#
eac7078a |
| 07-May-2019 |
Linus Torvalds <torvalds@linux-foundation.org> |
Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process c
Merge tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
Pull pidfd updates from Christian Brauner: "This patchset makes it possible to retrieve pidfds at process creation time by introducing the new flag CLONE_PIDFD to the clone() system call. Linus originally suggested to implement this as a new flag to clone() instead of making it a separate system call.
After a thorough review from Oleg CLONE_PIDFD returns pidfds in the parent_tidptr argument. This means we can give back the associated pid and the pidfd at the same time. Access to process metadata information thus becomes rather trivial.
As has been agreed, CLONE_PIDFD creates file descriptors based on anonymous inodes similar to the new mount api. They are made unconditional by this patchset as they are now needed by core kernel code (vfs, pidfd) even more than they already were before (timerfd, signalfd, io_uring, epoll etc.). The core patchset is rather small. The bulky looking changelist is caused by David's very simple changes to Kconfig to make anon inodes unconditional.
A pidfd comes with additional information in fdinfo if the kernel supports procfs. The fdinfo file contains the pid of the process in the callers pid namespace in the same format as the procfs status file, i.e. "Pid:\t%d".
To remove worries about missing metadata access this patchset comes with a sample/test program that illustrates how a combination of CLONE_PIDFD and pidfd_send_signal() can be used to gain race-free access to process metadata through /proc/<pid>.
Further work based on this patchset has been done by Joel. His work makes pidfds pollable. It finished too late for this merge window. I would prefer to have it sitting in linux-next for a while and send it for inclusion during the 5.3 merge window"
* tag 'pidfd-v5.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: samples: show race-free pidfd metadata access signal: support CLONE_PIDFD with pidfd_send_signal clone: add CLONE_PIDFD Make anon_inodes unconditional
show more ...
|
#
43c6afee |
| 07-Apr-2019 |
Christian Brauner <christian@brauner.io> |
samples: show race-free pidfd metadata access
This is a sample program showing userspace how to get race-free access to process metadata from a pidfd. It is rather easy to do and userspace can actu
samples: show race-free pidfd metadata access
This is a sample program showing userspace how to get race-free access to process metadata from a pidfd. It is rather easy to do and userspace can actually simply reuse code that currently parses a process's status file in procfs. The program can easily be extended into a generic helper suitable for inclusion in a libc to make it even easier for userspace to gain metadata access.
Since this came up in a discussion because this API is going to be used in various service managers: A lot of programs will have a whitelist seccomp filter that returns <some-errno> for all new syscalls. This means that programs might get confused if CLONE_PIDFD works but the later pidfd_send_signal() syscall doesn't. Hence, here's a ahead of time check that pidfd_send_signal() is supported:
bool pidfd_send_signal_supported() { int procfd = open("/proc/self", O_DIRECTORY | O_RDONLY | O_CLOEXEC); if (procfd < 0) return false;
/* * A process is always allowed to signal itself so * pidfd_send_signal() should never fail this test. If it does * it must mean it is not available, blocked by an LSM, seccomp, * or other. */ return pidfd_send_signal(procfd, 0, NULL, 0) == 0; }
Signed-off-by: Christian Brauner <christian@brauner.io> Co-developed-by: Jann Horn <jannh@google.com> Signed-off-by: Jann Horn <jannh@google.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Kees Cook <keescook@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "Michael Kerrisk (man-pages)" <mtk.manpages@gmail.com> Cc: Andy Lutomirsky <luto@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Al Viro <viro@zeniv.linux.org.uk>
show more ...
|