#
8c75c15d |
| 06-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
jail: Avoid a potential use-after-free when destroying jails
prison_deref() and prison_deref_kill() have to handle the case where destruction of a jail will release the final reference on the jail's parent, resulting in destruction of the parent jail. They thus maintain a list of jails whose references have gone away; the loop at the end of prison_deref() then goes through the list and deallocates resources associated with each jail. In particular, if a jail's VNET is not the same as that of its parent, this loop destroys the VNET.
Suppose prison_deref() removes the last reference on a jail, releasing a reference to its parent and causing the jail to be placed in the "freeprison" list. Suppose then that the parent jail is destroyed before the "freeprison" list is processed. When destroying the now-orphaned child jail, prison_deref() dereferences its parent to see whether the child jail's VNET needs to be freed, but if this race occurs, this is a use-after-free.
Fix the problem by using PR_VNET to decide whether the jail's VNET is to be destroyed, rather than dereferencing the parent jail pointer. Set it earlier so that a subsequent failure in kern_jail_set() cleans up the nascent VNET.
Reviewed by: zlei (previous version), jamie MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D47992
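The difference between the two ways of deciding whether a jail owns its VNET can be illustrated with a small userspace model. This is only a sketch: the struct layouts, the `_sketch` names, and the flag value are hypothetical stand-ins, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flag value; it stands in for the kernel's PR_VNET. */
#define PR_VNET_SKETCH 0x1u

struct vnet_sketch {
	int id;
};

/* Simplified stand-in for struct prison. */
struct prison_sketch {
	struct prison_sketch *pr_parent; /* may be freed before teardown runs */
	struct vnet_sketch   *pr_vnet;
	unsigned int          pr_flags;
};

/*
 * Racy pattern: the answer depends on reading through pr_parent, which
 * may already have been freed when the deferred cleanup loop runs.
 */
static bool
owns_vnet_via_parent(const struct prison_sketch *pr)
{
	return (pr->pr_parent != NULL &&
	    pr->pr_vnet != pr->pr_parent->pr_vnet);
}

/*
 * Safer pattern: the answer depends only on state stored in the jail
 * being destroyed, so the parent's lifetime no longer matters.
 */
static bool
owns_vnet_via_flag(const struct prison_sketch *pr)
{
	return ((pr->pr_flags & PR_VNET_SKETCH) != 0);
}
```

While the parent is still alive both predicates agree; only the flag-based one remains safe to call once the parent may be gone.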
|
#
8cf955f3 |
| 21-Dec-2024 |
Mark Johnston <markj@FreeBSD.org> |
jail: Handle jail removal in a dedicated thread
Otherwise a deadlock is possible: the system taskqueue thread removes a prison and calls vnet_destroy(), vnet_vlan_uninit() destroys the if_vlan cloner, the vlan_clone_destroy() callback calls taskqueue_drain() on the thread taskqueue.
Fix the problem by introducing a new thread for jail removals.
Ideally, the taskqueue interface would let consumers define queues without having to map them to threads, as that'd make it possible to avoid such deadlocks without extra threads; for now, this is the only solution.
Reviewed by: jamie MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D47991
|
Revision tags: release/14.2.0, release/13.4.0 |
|
#
ddb3eb4e |
| 18-Jul-2024 |
Olivier Certner <olce@FreeBSD.org> |
New setcred() system call and associated MAC hooks
This new system call allows setting all necessary credentials of a process in one go: effective, real and saved UIDs, effective, real and saved GIDs, supplementary groups and the MAC label. Its advantage over standard credential-setting system calls (such as setuid(), seteuid(), etc.) is that it enables MAC modules, such as MAC/do, to restrict the set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call multiple credential system calls and in a specific order (setuid() must be last, so as to remain root for all other credential-setting calls, which would otherwise fail with insufficient privileges). This piecewise approach causes the process to transiently hold credentials that are neither the original nor the final ones. For the kernel to enforce that only certain transitions of credentials are allowed, either these possibly non-compliant transient states have to disappear (by setting all relevant attributes in one go), or the kernel must delay setting or checking the new credentials. Delaying setting credentials could be done, e.g., by having some mode where the standard system calls contribute to building new credentials but without committing them. It could be started and ended by a special system call. Delaying checking could mean that, e.g., the kernel only verifies the credentials transition at the next non-credential-setting system call (we just mention this possibility for completeness, but are certainly not endorsing it).
We chose the simpler approach of a new system call, as we don't expect the set of credentials one can set to change often. It has the advantages that the traditional system calls' code doesn't have to be changed and that we can establish a special MAC protocol for it, by having some cleanup function called just before returning (this is a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by setcred() (including the version) and both the old and new kernel's 'struct ucred' instead of 'struct setcred' as this should simplify evolving existing hooks as the 'struct setcred' structure evolves. The mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always called in pairs around potential calls to mac_cred_check_setcred(). They allow MAC modules to allocate/free data they may need in their mac_cred_check_setcred() hook, as the latter is called under the current process' lock, rendering sleepable allocations impossible. MAC/do is going to leverage these in a subsequent commit. A scheme where mac_cred_check_setcred() could return ERESTART was considered but is incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone inclusion of <sys/ucred.h> both from kernel and userspace (for the latter, it has been working thanks to <bsm/audit.h> already including <sys/types.h>).
Reviewed by: brooks Approved by: markj (mentor) Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47618
|
#
831531a8 |
| 06-Dec-2024 |
Konstantin Belousov <kib@FreeBSD.org> |
prison_proc_iterate(): make it work for prison0
Do not exclude processes owned by host/prison0 if there are jails configured.
PR: 283163 Reviewed by: jamie, markj Sponsored by: The FreeBSD Foundation MFC after: 1 week Differential revision: https://reviews.freebsd.org/D47943
|
#
d3bb35d4 |
| 28-Jun-2024 |
Mariusz Zaborski <oshogbo@FreeBSD.org> |
jail: allow adjustment of host time
Add a special permission to the jail to adjust and to set the host time. This can be useful if we want to compartmentalize the NTP daemon from the rest of the system.
Reviewed by: olce, imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D45545
|
Revision tags: release/14.1.0, release/13.3.0 |
|
#
61cc4830 |
| 18-Jan-2024 |
Alfredo Mazzinghi <am2419@cl.cam.ac.uk> |
Abstract UIO allocation and deallocation.
Introduce the allocuio() and freeuio() functions to allocate and deallocate struct uio. This hides the actual allocator interface, so it is easier to modify the sub-allocation layout of struct uio and the corresponding iovec array.
Obtained from: CheriBSD Reviewed by: kib, markj MFC after: 2 weeks Sponsored by: CHaOS, EPSRC grant EP/V000292/1 Differential Revision: https://reviews.freebsd.org/D43711
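The idea of hiding the allocation layout behind a pair of functions can be sketched in userspace as follows. The `uio_sketch` type and the `_sketch` function names are hypothetical, and placing the iovec array directly after the header is just one possible layout, not necessarily the one the kernel uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/uio.h>	/* struct iovec */

/* Hypothetical, heavily reduced stand-in for the kernel's struct uio. */
struct uio_sketch {
	struct iovec *uio_iov;    /* points at the iovec array */
	int           uio_iovcnt; /* number of entries in the array */
};

/*
 * Allocate the header and its iovec array in one block. Callers never
 * see how the two are laid out, so the layout can change later.
 */
static struct uio_sketch *
allocuio_sketch(int iovcnt)
{
	struct uio_sketch *uio;

	if (iovcnt < 0)
		return (NULL);
	uio = calloc(1, sizeof(*uio) + (size_t)iovcnt * sizeof(struct iovec));
	if (uio == NULL)
		return (NULL);
	uio->uio_iov = (struct iovec *)(uio + 1);
	uio->uio_iovcnt = iovcnt;
	return (uio);
}

/* Release everything obtained from allocuio_sketch() in one call. */
static void
freeuio_sketch(struct uio_sketch *uio)
{
	free(uio);
}
```

Because callers only use the alloc/free pair, switching to two separate allocations, or to a differently aligned sub-allocation, would not require touching them.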
|
#
ab0841bd |
| 26-Jan-2024 |
Jamie Gritton <jamie@FreeBSD.org> |
jail: expose children.max and children.cur via sysctl
Submitted by: Igor Ostapenko <igor.ostapenko_pm.me> Differential Revision: <https://reviews.freebsd.org/D43565>
|
#
9fd97868 |
| 04-Jan-2024 |
Baptiste Daroussin <bapt@FreeBSD.org> |
jail: add security.jail.mlock_allowed
When the parameter allow.mlock was added, a way for jails to check whether the parameter was set or not was not added; this change covers it.
MFC After: 3 days Reviewed by: jamie@ Differential Revision: https://reviews.freebsd.org/D43314
|
#
abbc260f |
| 26-Dec-2023 |
Mark Johnston <markj@FreeBSD.org> |
jail: Ignore errors from copyout() while copying the error string
Reviewed by: zlei, jamie MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D43142
|
#
ed31b3f4 |
| 30-Nov-2023 |
Jamie Gritton <jamie@FreeBSD.org> |
jail: Don't allow jail_set(2) to resurrect dying jails.
Currently, a prison in "dying" state (removed but still holding resources) can be brought back to alive state via "jail -d", or the JAIL_DYING flag to jail_set(2). This seemed like a good idea at the time.
Its main use was to improve support for specifying the jid when creating a jail, which also seemed like a good idea at the time. But resurrecting a jail that was partway through the process of shutting down is trouble waiting to happen.
This patch deprecates that flag, leaving it as a no-op for creating jails (but still useful for looking at dying jails). It still allows creating a new jail with the same jid as a dying one, but will renumber the old one in that case. That's imperfect, but allows for current behavior.
Reviewed by: bz Differential Revision: https://reviews.freebsd.org/D28150
|
Revision tags: release/14.0.0 |
|
#
7974ca1c |
| 18-Aug-2023 |
Olivier Certner <olce.freebsd@certner.fr> |
cr_canseejailproc(): New privilege, no direct check for UID 0
Use priv_check_cred() with a new privilege (PRIV_SEEJAILPROC) instead of explicitly testing for UID 0 (the former has been the rule for almost 20 years).
As a consequence, cr_canseejailproc() now abides by the 'security.bsd.suser_enabled' sysctl and MAC policies.
Update the MAC policies Biba and LOMAC, and prison_priv_check() so that they don't deny this privilege. This preserves the existing behavior (the 'root' user is not restricted, even when jailed, unless 'security.bsd.suser_enabled' is not 0) and is consistent with what is done for the related policies/privileges (PRIV_SEEOTHERGIDS, PRIV_SEEOTHERUIDS).
Reviewed by: emaste (earlier version), mhorne MFC after: 2 weeks Sponsored by: Kumacom SAS Differential Revision: https://reviews.freebsd.org/D40626
|
#
cb48780d |
| 01-Sep-2023 |
Shawn Webb <shawn.webb@hardenedbsd.org> |
jail: Add the ability to access system-level filesystem extended attributes
Prior to this commit, privileged accounts in a jail could not access the filesystem extended attributes in the system namespace. To control access to the system namespace on a per-jail basis, add a new configuration parameter, allow.extattr, which is off by default.
Reported by: zirias Tested by: zirias Obtained from: HardenedBSD Reviewed by: kevans, jamie Differential revision: https://reviews.freebsd.org/D41643 MFC after: 1 week Relnotes: yes
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
4d846d26 |
| 10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
Revision tags: release/13.2.0 |
|
#
04f75b98 |
| 26-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow netlink sockets in non-vnet jails.
This change allows opening Netlink sockets in non-vnet jails, even for unprivileged processes. The security model largely follows the existing one. To be more specific:
* By default, every `NETLINK_ROUTE` command is **NOT** allowed in a non-VNET jail UNLESS the `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command handler.
* All notifications are **disabled** for non-vnet jails (requests to subscribe for the notifications are ignored). This will change to a more fine-grained model once the first netlink provider requiring this gets committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including** interfaces w/o any addresses attached to the jail). The value of this is questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the current approach - currently we list static ARP/ND entries belonging to the addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in a non-VNET jail (as sub-families may be unrelated to network at all). It is the goal of the family author to implement the restriction if necessary.
Differential Revision: https://reviews.freebsd.org/D39206 MFC after: 1 month
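The default-deny rule for `NETLINK_ROUTE` commands described above can be modelled as a single predicate. This is a minimal sketch under stated assumptions: the flag value, the `_sketch` names, and the two boolean inputs are hypothetical, and real handlers perform further privilege checks that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical value; stands in for RTNL_F_ALLOW_NONVNET_JAIL. */
#define RTNL_F_ALLOW_NONVNET_JAIL_SKETCH 0x2u

/* Simplified stand-in for a registered route command handler. */
struct rtnl_cmd_sketch {
	const char  *name;
	unsigned int flags;
};

/*
 * Model of the gate only: a caller in a non-vnet jail may run a command
 * solely when the handler opted in through its flags. Callers on the
 * host or in a vnet jail pass this particular gate unconditionally.
 */
static bool
rtnl_cmd_passes_jail_gate(const struct rtnl_cmd_sketch *cmd, bool jailed,
    bool has_vnet)
{
	if (!jailed || has_vnet)
		return (true);
	return ((cmd->flags & RTNL_F_ALLOW_NONVNET_JAIL_SKETCH) != 0);
}
```

Making the opt-in a per-handler flag keeps newly added commands unavailable in non-vnet jails until their author explicitly decides otherwise.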
|
#
0b0ae2e4 |
| 15-Mar-2023 |
Mina Galić <freebsd@igalic.co> |
jail: convert several functions from int to bool
these functions exclusively return (0) and (1), so convert them to bool
We also convert some networking related jail functions from int to bool some of which were returning an error that was never used.
Differential Revision: https://reviews.freebsd.org/D29659 Reviewed by: imp, jamie (earlier version) Pull Request: https://github.com/freebsd/freebsd-src/pull/663
|
#
cbbb2203 |
| 02-Mar-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
kern_jail.c: Remove #ifdefs for VNET_NFSD
The consensus was that VNET_NFSD was not needed. This patch removes it from kern_jail.c.
With this patch, support for the "allow.nfsd" jail parameter is enabled in the kernel for kernels built with "options VIMAGE".
Reviewed by: markj MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D38808
|
#
2c33b456 |
| 28-Feb-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
jail: Improve readability
No functional change intended.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D37890
|
#
500f82d6 |
| 28-Feb-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
jail: Use flexible array member within struct prison_ip
The current implementation utilizes an off-by-one struct prison_ip to access the IPv[46] addresses. It is error prone, hence the regression fixes 21ad3e27fabc and ddbf879d79d4. Use a flexible array member so that the compiler will catch such errors and it will also be easier to review.
No functional change intended.
Reviewed by: melifaro, glebius Differential Revision: https://reviews.freebsd.org/D37874
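For readers unfamiliar with the idiom, a flexible array member lets the address storage be named directly instead of being reached by pointer arithmetic past the end of the header. The sketch below uses a hypothetical `ip_list_sketch` type holding IPv4 addresses as plain integers; it does not reproduce the real struct prison_ip.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical header followed by a variable number of addresses.
 * The older idiom would have omitted pr_ip[] and reached the storage
 * through a cast such as (uint32_t *)(list + 1).
 */
struct ip_list_sketch {
	unsigned int ips;     /* number of addresses that follow */
	uint32_t     pr_ip[]; /* flexible array member (C99) */
};

/* Allocate a zeroed list with room for 'ips' addresses. */
static struct ip_list_sketch *
ip_list_alloc_sketch(unsigned int ips)
{
	struct ip_list_sketch *list;

	list = calloc(1, sizeof(*list) + (size_t)ips * sizeof(list->pr_ip[0]));
	if (list != NULL)
		list->ips = ips;
	return (list);
}
```

With the named member, an access reads `list->pr_ip[i]`, which the compiler type-checks, whereas the cast-based form compiles even when the element type or offset is wrong.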
|
#
88175af8 |
| 21-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs_export: Add mnt_exjail to control exports done in prisons
If there are multiple instances of mountd(8) (in different prisons), there will be confusion if they manipulate the exports of the same file system. This patch adds mnt_exjail to "struct mount" so that the credentials (and, therefore, the prison) that did the exports for that file system can be recorded. If another prison has already exported the file system, vfs_export() will fail with an error. If mnt_exjail == NULL, the file system has not been exported. mnt_exjail is checked by the NFS server, so that exports done from within a different prison will not be used.
The patch also implements vfs_exjail_destroy(), which is called from prison_cleanup() to release all the mnt_exjail credential references, so that the prison can be removed. Mainly to avoid doing a scan of the mountlist for the case where there were no exports done from within the prison, a count of how many file systems have been exported from within the prison is kept in pr_exportcnt.
Reviewed by: markj Discussed with: jamie Differential Revision: https://reviews.freebsd.org/D38371 MFC after: 3 months
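The ownership rule and the export counter can be modelled in a few lines of userspace C. This is a simplified sketch: here mnt_exjail points straight at a prison rather than at credentials, the `_sketch` names are hypothetical, and -1 stands in for whatever error the kernel actually returns.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-ins for struct prison and struct mount. */
struct prison_sketch {
	int pr_id;
	int pr_exportcnt; /* file systems exported from this prison */
};

struct mount_sketch {
	struct prison_sketch *mnt_exjail; /* NULL means "not exported" */
};

/* Record the exporting prison, or refuse if another prison owns it. */
static int
export_sketch(struct mount_sketch *mp, struct prison_sketch *pr)
{
	if (mp->mnt_exjail == NULL) {
		mp->mnt_exjail = pr;
		pr->pr_exportcnt++;
		return (0);
	}
	return (mp->mnt_exjail == pr ? 0 : -1);
}

/* Drop every export owned by 'pr'; skip the scan if it has none. */
static void
exjail_destroy_sketch(struct mount_sketch *mounts, size_t n,
    struct prison_sketch *pr)
{
	if (pr->pr_exportcnt == 0)
		return;
	for (size_t i = 0; i < n; i++) {
		if (mounts[i].mnt_exjail == pr) {
			mounts[i].mnt_exjail = NULL;
			pr->pr_exportcnt--;
		}
	}
}
```

The counter exists purely as a shortcut: a prison that never exported anything can be torn down without walking the mount list at all.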
|
#
b2d76b52 |
| 21-Feb-2023 |
Zhenlei Huang <zlei@FreeBSD.org> |
jail: Fix redoing ip restricting
`prison_ip_restrict()` is called in the loop FOREACH_PRISON_DESCENDANT_LOCKED. Under low memory, it is still possible that in subsequent rounds `prison_ip_restrict()` succeeds and `redo_ip[46]` flips from true to false, thus leaving some prisons' IPv[46] addresses unrestricted.
Reviewed by: jamie Fixes: 8bce8d28abe6 jail: Avoid multipurpose return value of function prison_ip_restrict() Differential Revision: https://reviews.freebsd.org/D38697
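The class of bug described here, a "needs redo" flag that a later success can overwrite, can be shown with a small model. This is a sketch only: the array of per-prison outcomes and both function names are hypothetical and do not mirror the kernel loop line for line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Buggy pattern: the flag is assigned on every iteration, so a success
 * in a later round erases the record of an earlier failure.
 */
static bool
needs_redo_overwrite(const bool *ok, size_t n)
{
	bool redo = false;

	for (size_t i = 0; i < n; i++)
		redo = !ok[i];
	return (redo);
}

/*
 * Fixed pattern: the flag is only ever set, never cleared, so a single
 * failure anywhere in the walk is remembered until the loop ends.
 */
static bool
needs_redo_sticky(const bool *ok, size_t n)
{
	bool redo = false;

	for (size_t i = 0; i < n; i++) {
		if (!ok[i])
			redo = true;
	}
	return (redo);
}
```

For the outcome sequence "first descendant fails, second succeeds", the overwrite variant reports that no redo is needed while the sticky variant correctly requests one.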
|
#
27202b98 |
| 07-Feb-2023 |
Mark Johnston <markj@FreeBSD.org> |
jail: Use atomic(9) instead of CK atomics
There's no reason to use one over the other here, let's prefer the interface that's used elsewhere in the kernel.
No functional change intended.
Reviewed by: mjg Sponsored by: Klara, Inc. Differential Revision: https://reviews.freebsd.org/D38360
|
#
d94e0bdc |
| 04-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
Revert "vfs_export: Add checks for correct prison when updating exports"
This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913.
A new patch in D38371 is being considered for doing this.
|
#
7926a01e |
| 03-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
vfs_export: Add checks for correct prison when updating exports
mountd(8) basically does the following:
  getmntinfo()
  for each mount
    delete_exports
using nmount(2) to do the creation/deletion of individual exports.
For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo() returns all mount points, including ones being used within other prisons. This can cause confusion if the same file system is specified in the exports(5) file for multiple prisons.
This patch adds a permanent identifier to each prison and marks which prison did the exports in a field of the mount structure called mnt_exjail. This field can then be compared to the permanent identifier for the prison that the thread's credentials are in. Also required was a new function called prison_isalive_permid(), which returns whether the prison is alive, so that the check can be ignored for prisons that have been removed.
This prepares the system to allow mountd(8) to run in multiple prisons, including prison0.
Future commits will complete the modifications to allow mountd(8) to run in vnet prisons. Until then, these changes should not affect semantics.
Reviewed by: markj MFC after: 3 months Differential Revision: https://reviews.freebsd.org/D38144
|
#
99187c3a |
| 02-Feb-2023 |
Rick Macklem <rmacklem@FreeBSD.org> |
prison_check_nfsd: Add check for enforce_statfs != 0
Since mountd(8) will not be able to do exports when running in a vnet prison if enforce_statfs is set to 0, add a check for this to prison_check_nfsd().
Reviewed by: jamie, markj MFC after: 2 months Differential Revision: https://reviews.freebsd.org/D38189
|