Ctrtool/Things to do

General

  • Now that unprivileged overlayfs has been merged into the mainline kernel, document how to run Docker and LXC within an unprivileged ctrtool container.

mount_seq

  • The symlink check facility is non-atomic.
  • If a sequence of mounts fails partway, there is no way to tell how far it got. This is because mount_seq was meant to be used in a tmpfs context, where all of the targets are located within a single, newly created tmpfs mount; lazily unmounting that tmpfs would effectively undo all of the operations.
  • Specifying a filesystem type where the data argument of the mount() system call is expected to be something other than a null-terminated string (such as cifs) may lead to undefined behavior.
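
One possible way to make a symlink check atomic is sketched below. It assumes the check is meant to reject a mount target whose final path component is a symlink; that is a guess about the intent, not a description of how mount_seq works. The helper name open_mount_target is hypothetical. The idea is to pin the target with an O_PATH | O_NOFOLLOW file descriptor, inspect the descriptor rather than the path, and then refer to the target through /proc/self/fd so that the path cannot be swapped between the check and the mount.

```python
import errno
import os
import stat


def open_mount_target(path):
    """Pin `path` without following a trailing symlink (hypothetical helper).

    Returns (fd, proc_path). Linux only: relies on O_PATH and /proc.
    """
    # With O_PATH | O_NOFOLLOW, a trailing symlink is opened as the symlink
    # itself instead of being followed, so fstat() can detect it.
    fd = os.open(path, os.O_PATH | os.O_NOFOLLOW)
    try:
        if stat.S_ISLNK(os.fstat(fd).st_mode):
            raise OSError(errno.ELOOP, "mount target is a symlink", path)
    except BaseException:
        os.close(fd)
        raise
    # A later mount on this path acts on the object pinned by fd, even if
    # the original path is replaced in the meantime.
    return fd, f"/proc/self/fd/{fd}"
```

This only covers the final component; symlinks in intermediate components would need something like openat2() with RESOLVE_NO_SYMLINKS.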

ns_open_file

  • The TUN/TAP opening feature is not yet implemented.
  • There should be a way to specify a path other than '/' for the foreign mount namespace (this would be useful if entering the user namespace gives the process file-related capabilities such as CAP_DAC_READ_SEARCH over the user and group IDs in the user namespace). It should also support various open() and openat2() flags such as O_DIRECTORY and RESOLVE_NO_SYMLINKS, as well as setting the "filesystem" UID/GID. Mostly done.
  • Needs a way to specify more meaningful strings for domain, type, and protocol. Done, but which domains/types/protocols are the most useful?
  • The socket creation facility is only useful for server sockets (including proxy servers), and only where the server application supports listening on inherited file descriptors. In many cases, the systemd socket activation feature found in many modern server applications can be (ab)used for this; see Help:Ctrtool/set_fds.
  • Listening with a backlog of 0 is currently equivalent to not listening at all. Ideally, this should have been equivalent to listening with a maximum backlog of SOMAXCONN.
  • -U currently relies on the NS_GET_USERNS ioctl, and thus only works on kernels 4.9 and up.
  • Needs to support binding of protocol families other than IPv4 or IPv6 (e.g. Unix domain sockets, AF_VSOCK, packet sockets, etc.). Pathname and abstract Unix domain sockets are currently supported as well, though permissions for pathname sockets are not yet implemented.
  • Needs to support more socket options such as TCP_DEFER_ACCEPT, IPV6_V6ONLY, SO_KEEPALIVE, TCP_NODELAY, and IPV6_ADDRFORM.
  • Needs to support creating multiple sockets in a single network namespace without forking the same number of child processes.
  • Needs to support setting effective and filesystem UIDs and GIDs before and/or after entering the user namespace or creating a socket, with additional flags to keep any capabilities (which should be disabled by default).
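  • Needs to support connecting to a socket destination, though the same can be achieved with a Python one-liner run from a shell script, such as the following (substitute the real destination for "host" and "port"; Python expects the port to be an integer for IPv4 and IPv6 sockets):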
python3 -c 'import os, socket; s = socket.socket(fileno=int(os.getenv("CTRTOOL_NS_OPEN_FILE_FD_0"))); s.connect(("host", "port")); s.detach()'
  • It is theoretically possible for the sockets to share the same sets of bound IP addresses, though this may confuse the spawned application.
  • Setting the SOCK_CLOEXEC flag using the -t parameter may result in undefined behavior. Ideally this should have resulted in an error. (SOCK_NONBLOCK is OK.)
  • Needs to support spawning a new program with a pipe or Unix domain socket (similar to popen() but with a file descriptor, not a stdio object); also support SCM_RIGHTS operations.
  • Needs to support a few more modes of operation:
      • Instead of joining an existing network namespace, create a new, anonymous namespace with a socket in it; the network namespace can later be retrieved using SIOCGSKNS. Done. Also support opening the ns/net file itself and specifying a user namespace in which to create the network namespace. Done.
      • Support creating a new user namespace with a specified user and group ID map and setgroups=allow/deny.
      • Use a series of intermediate registers to store the namespace output of the above operations, much like with pidfd_ctl. Done.
  • The -I operations in the nsof_special branch are not yet documented.
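
For the point about server applications needing to listen on inherited file descriptors, here is a minimal sketch of what the application side could look like. It assumes the descriptor number is passed in the CTRTOOL_NS_OPEN_FILE_FD_0 environment variable, as in the one-liner in this list, and that the descriptor is a bound stream socket. The function name adopt_inherited_listener is hypothetical. Because a backlog of 0 currently means the socket is not listening, the sketch calls listen() itself with SOMAXCONN.

```python
import os
import socket


def adopt_inherited_listener(env_name="CTRTOOL_NS_OPEN_FILE_FD_0"):
    """Wrap an inherited, already-bound stream socket and start listening.

    Hypothetical helper; assumes env_name holds the descriptor number.
    """
    fd = int(os.environ[env_name])
    # With fileno=, Python detects family, type and protocol from the
    # descriptor and takes ownership of it.
    srv = socket.socket(fileno=fd)
    # Calling listen() on a socket that is already listening is harmless;
    # it only updates the backlog.
    srv.listen(socket.SOMAXCONN)
    return srv
```

A server would then call accept() on the returned socket as usual. This does not apply to datagram sockets, where listen() fails.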

pidfd_ctl

  • If requested, check that the PPID is correct before and just after calling pidfd_open, to allow a PID file descriptor for the parent process to be opened in a race-free manner.
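
The PPID check described above can be sketched as follows. This is an illustration of the idea, not pidfd_ctl's implementation; the function name open_parent_pidfd and its parameters are hypothetical, and the getppid/pidfd_open parameters exist only so the logic can be exercised without a real parent change. It relies on os.pidfd_open, which needs Python 3.9+ and Linux 5.3+. If the PPID is unchanged after pidfd_open returns, the parent was still alive when the descriptor was opened, so the descriptor cannot refer to a recycled PID.

```python
import os


def open_parent_pidfd(expected_ppid=None, getppid=os.getppid, pidfd_open=None):
    """Open a pidfd for the parent, verifying the PPID before and after.

    Hypothetical helper. Raises RuntimeError if the parent is not the
    expected one or changed while the descriptor was being opened.
    """
    opener = pidfd_open if pidfd_open is not None else os.pidfd_open
    before = getppid()
    if expected_ppid is not None and before != expected_ppid:
        raise RuntimeError(f"parent is {before}, expected {expected_ppid}")
    fd = opener(before)
    # If the parent exited before pidfd_open ran, this process has been
    # reparented and getppid() now returns a different PID.
    if getppid() != before:
        os.close(fd)
        raise RuntimeError("parent changed while opening the pidfd")
    return fd
```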