History log of /freebsd/sys/kern/kern_jail.c (Results 1 – 25 of 578)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 8c75c15d 06-Jan-2025 Mark Johnston <markj@FreeBSD.org>

jail: Avoid a potential use-after-free when destroying jails

prison_deref() and prison_deref_kill() have to handle the case where
destruction of a jail will release the final reference on the jail's

jail: Avoid a potential use-after-free when destroying jails

prison_deref() and prison_deref_kill() have to handle the case where
destruction of a jail will release the final reference on the jail's
parent, resulting in destruction of the parent jail. They thus maintain
a list of jails whose references have gone away; the loop at the end of
prison_deref() then goes through the list and deallocates resources
associated with each jail. In particular, if a jail's VNET is not the
same as that of its parent, this loop destroys the VNET.

Suppose prison_deref() removes the last reference on a jail, releasing a
reference to its parent and causing the jail to be placed in the
"freeprison" list. Suppose then that the parent jail is destroyed
before the "freeprison" list is processed. When destroying the
now-orphaned child jail, prison_deref() derefences its parent to see
whether the child jail's VNET needs to be freed, but if this race
occurs, this is a use-after-free.

Fix the problem by using PR_VNET to decide whether the jail's VNET is to
be destroyed, rather than dereferencing the parent jail pointer. Set it
earlier so that a subsequent failure in kern_jail_set() cleans up the
nascent VNET.

Reviewed by: zlei (previous version), jamie
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47992

show more ...


# 8cf955f3 21-Dec-2024 Mark Johnston <markj@FreeBSD.org>

jail: Handle jail removal in a dedicated thread

Otherwise a deadlock is possible: the system taskqueue thread removes a
prison and calls vnet_destroy(), vnet_vlan_uninit() destroys the if_vlan
clone

jail: Handle jail removal in a dedicated thread

Otherwise a deadlock is possible: the system taskqueue thread removes a
prison and calls vnet_destroy(), vnet_vlan_uninit() destroys the if_vlan
cloner, the vlan_clone_destroy() callback calls taskqueue_drain() on the
thread taskqueue.

Fix the problem by introducing a new thread for jail removals.

Ideally, the taskqueue interface would let consumers define queues
without having to map them to threads, as that'd make it possible to
avoid such deadlocks without extra threads; for now, this is the only
solution.

Reviewed by: jamie
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D47991

show more ...


Revision tags: release/14.2.0, release/13.4.0
# ddb3eb4e 18-Jul-2024 Olivier Certner <olce@FreeBSD.org>

New setcred() system call and associated MAC hooks

This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs

New setcred() system call and associated MAC hooks

This new system call allows to set all necessary credentials of
a process in one go: Effective, real and saved UIDs, effective, real and
saved GIDs, supplementary groups and the MAC label. Its advantage over
standard credential-setting system calls (such as setuid(), seteuid(),
etc.) is that it enables MAC modules, such as MAC/do, to restrict the
set of credentials some process may gain in a fine-grained manner.

Traditionally, credential changes rely on setuid binaries that call
multiple credential system calls and in a specific order (setuid() must
be last, so as to remain root for all other credential-setting calls,
which would otherwise fail with insufficient privileges). This
piecewise approach causes the process to transiently hold credentials
that are neither the original nor the final ones. For the kernel to
enforce that only certain transitions of credentials are allowed, either
these possibly non-compliant transient states have to disappear (by
setting all relevant attributes in one go), or the kernel must delay
setting or checking the new credentials. Delaying setting credentials
could be done, e.g., by having some mode where the standard system calls
contribute to building new credentials but without committing them. It
could be started and ended by a special system call. Delaying checking
could mean that, e.g., the kernel only verifies the credentials
transition at the next non-credential-setting system call (we just
mention this possibility for completeness, but are certainly not
endorsing it).

We chose the simpler approach of a new system call, as we don't expect
the set of credentials one can set to change often. It has the
advantages that the traditional system calls' code doesn't have to be
changed and that we can establish a special MAC protocol for it, by
having some cleanup function called just before returning (this is
a requirement for MAC/do), without disturbing the existing ones.

The mac_cred_check_setcred() hook is passed the flags received by
setcred() (including the version) and both the old and new kernel's
'struct ucred' instead of 'struct setcred' as this should simplify
evolving existing hooks as the 'struct setcred' structure evolves. The
mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always
called by pairs around potential calls to mac_cred_check_setcred().
They allow MAC modules to allocate/free data they may need in their
mac_cred_check_setcred() hook, as the latter is called under the current
process' lock, rendering sleepable allocations impossible. MAC/do is
going to leverage these in a subsequent commit. A scheme where
mac_cred_check_setcred() could return ERESTART was considered but is
incompatible with proper composition of MAC modules.

While here, add missing includes and declarations for standalone
inclusion of <sys/ucred.h> both from kernel and userspace (for the
latter, it has been working thanks to <bsm/audit.h> already including
<sys/types.h>).

Reviewed by: brooks
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D47618

show more ...


# 831531a8 06-Dec-2024 Konstantin Belousov <kib@FreeBSD.org>

prison_proc_iterate(): make it work for prison0

Do not exclude processes owned by host/prison0 if there are jails
configured.

PR: 283163
Reviewed by: jamie, markj
Sponsored by: The FreeBSD Foundati

prison_proc_iterate(): make it work for prison0

Do not exclude processes owned by host/prison0 if there are jails
configured.

PR: 283163
Reviewed by: jamie, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D47943

show more ...


# d3bb35d4 28-Jun-2024 Mariusz Zaborski <oshogbo@FreeBSD.org>

jail: allow adjustment of host time

Add a special permission to the jail to adjust and to set the host time.
This can be useful if we want to compartmentalize the NTP daemon
from the rest of the sys

jail: allow adjustment of host time

Add a special permission to the jail to adjust and to set the host time.
This can be useful if we want to compartmentalize the NTP daemon
from the rest of the system.

Reviewed by: olce, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D45545

show more ...


Revision tags: release/14.1.0, release/13.3.0
# 61cc4830 18-Jan-2024 Alfredo Mazzinghi <am2419@cl.cam.ac.uk>

Abstract UIO allocation and deallocation.

Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify

Abstract UIO allocation and deallocation.

Introduce the allocuio() and freeuio() functions to allocate and
deallocate struct uio. This hides the actual allocator interface, so it
is easier to modify the sub-allocation layout of struct uio and the
corresponding iovec array.

Obtained from: CheriBSD
Reviewed by: kib, markj
MFC after: 2 weeks
Sponsored by: CHaOS, EPSRC grant EP/V000292/1
Differential Revision: https://reviews.freebsd.org/D43711

show more ...


# ab0841bd 26-Jan-2024 Jamie Gritton <jamie@FreeBSD.org>

jail: expose children.max and children.cur via sysctl

Submitted by: Igor Ostapenko <igor.ostapenko_pm.me>
Differential Revision: <https://reviews.freebsd.org/D43565>


# 9fd97868 04-Jan-2024 Baptiste Daroussin <bapt@FreeBSD.org>

jail: add security.jail.mlock_allowed

when the parameter allow.mlock was added a way for jails to check
if the parameter was set or now has not been added, this change
covers it.

MFC After: 3 days

jail: add security.jail.mlock_allowed

when the parameter allow.mlock was added a way for jails to check
if the parameter was set or now has not been added, this change
covers it.

MFC After: 3 days
Reviewed by: jamie@
Differential Revision: https://reviews.freebsd.org/D43314

show more ...


# abbc260f 26-Dec-2023 Mark Johnston <markj@FreeBSD.org>

jail: Ignore errors from copyout() while copying the error string

Reviewed by: zlei, jamie
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D43142


# ed31b3f4 30-Nov-2023 Jamie Gritton <jamie@FreeBSD.org>

jail: Don't allow jail_set(2) to resurrect dying jails.

Currently, a prison in "dying" state (removed but still holding
resources) can be brought back to alive state via "jail -d", or
the JAIL_DYING

jail: Don't allow jail_set(2) to resurrect dying jails.

Currently, a prison in "dying" state (removed but still holding
resources) can be brought back to alive state via "jail -d", or
the JAIL_DYING flag to jail_set(2). This seemed like a good idea
at the time.

Its main use was to improve support for specifying the jid when
creating a jail, which also seemed like a good idea at the time.
But resurrecting a jail that was partway through thr process of
shutting down is trouble waiting to happen.

This patch deprecates that flag, leaving it as a no-op for creating
jails (but still useful for looking at dying jails). It sill allows
creating a new jail with the same jid as a dying one, but will renumber
the old one in that case. That's imperfect, but allows for current
behavior.

Reviewed by: bz
Differential Revision: https://reviews.freebsd.org/D28150

show more ...


Revision tags: release/14.0.0
# 7974ca1c 18-Aug-2023 Olivier Certner <olce.freebsd@certner.fr>

cr_canseejailproc(): New privilege, no direct check for UID 0

Use priv_check_cred() with a new privilege (PRIV_SEEJAILPROC) instead of
explicitly testing for UID 0 (the former has been the rule for

cr_canseejailproc(): New privilege, no direct check for UID 0

Use priv_check_cred() with a new privilege (PRIV_SEEJAILPROC) instead of
explicitly testing for UID 0 (the former has been the rule for almost 20
years).

As a consequence, cr_canseejailproc() now abides by the
'security.bsd.suser_enabled' sysctl and MAC policies.

Update the MAC policies Biba and LOMAC, and prison_priv_check() so that
they don't deny this privilege. This preserves the existing behavior
(the 'root' user is not restricted, even when jailed, unless
'security.bsd.suser_enabled' is not 0) and is consistent with what is
done for the related policies/privileges (PRIV_SEEOTHERGIDS,
PRIV_SEEOTHERUIDS).

Reviewed by: emaste (earlier version), mhorne
MFC after: 2 weeks
Sponsored by: Kumacom SAS
Differential Revision: https://reviews.freebsd.org/D40626

show more ...


# cb48780d 01-Sep-2023 Shawn Webb <shawn.webb@hardenedbsd.org>

jail: Add the ability to access system-level filesystem extended attributes

Prior to this commit privileged accounts in a jail could not access to the
filesystem extended attributes in the system na

jail: Add the ability to access system-level filesystem extended attributes

Prior to this commit privileged accounts in a jail could not access to the
filesystem extended attributes in the system namespace. To control access to
the system namespace in a per-jail basis add a new configuration parameter
allow.extattr which is off by default.

Reported by: zirias
Tested by: zirias
Obtained from: HardenedBSD
Reviewed by: kevans, jamie
Differential revision: https://reviews.freebsd.org/D41643
MFC after: 1 week
Relnotes: yes

show more ...


# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix

show more ...


Revision tags: release/13.2.0
# 04f75b98 26-Mar-2023 Alexander V. Chernikov <melifaro@FreeBSD.org>

netlink: allow netlink sockets in non-vnet jails.

This change allow to open Netlink sockets in the non-vnet jails, even for
unpriviledged processes.
The security model largely follows the existing

netlink: allow netlink sockets in non-vnet jails.

This change allow to open Netlink sockets in the non-vnet jails, even for
unpriviledged processes.
The security model largely follows the existing one. To be more specific:
* by default, every `NETLINK_ROUTE` command is **NOT** allowed in non-VNET
jail UNLESS `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command
handler.
* All notifications are **disabled** for non-vnet jails (requests to
subscribe for the notifications are ignored). This will change to be more
fine-grained model once the first netlink provider requiring this gets
committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including**
interfaces w/o any addresses attached to the jail). The value of this is
questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the
current approach - currently we list static ARP/ND entries belonging to the
addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered
to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only
host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in non-VNET jail
(as sub-families may be unrelated to network at all).
It is the goal of the family author to implement the restriction if
necessary.

Differential Revision: https://reviews.freebsd.org/D39206
MFC after: 1 month

show more ...


# 0b0ae2e4 15-Mar-2023 Mina Galić <freebsd@igalic.co>

jail: convert several functions from int to bool

these functions exclusively return (0) and (1), so convert them to bool

We also convert some networking related jail functions from int to bool
some

jail: convert several functions from int to bool

these functions exclusively return (0) and (1), so convert them to bool

We also convert some networking related jail functions from int to bool
some of which were returning an error that was never used.

Differential Revision: https://reviews.freebsd.org/D29659
Reviewed by: imp, jamie (earlier version)
Pull Request: https://github.com/freebsd/freebsd-src/pull/663

show more ...


# cbbb2203 02-Mar-2023 Rick Macklem <rmacklem@FreeBSD.org>

kern_jail.c: Remove #ifdefs for VNET_NFSD

The consensus was that VNET_NFSD was not needed.
This patch removes it from kern_jail.c.

With this patch, support for the "allow.nfsd"
jail parameter is en

kern_jail.c: Remove #ifdefs for VNET_NFSD

The consensus was that VNET_NFSD was not needed.
This patch removes it from kern_jail.c.

With this patch, support for the "allow.nfsd"
jail parameter is enabled in the kernel for
kernels built with "options VIMAGE".

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D38808

show more ...


# 2c33b456 28-Feb-2023 Zhenlei Huang <zlei@FreeBSD.org>

jail: Improve readability

No functional change intended.

Reviewed by: melifaro
Differential Revision: https://reviews.freebsd.org/D37890


# 500f82d6 28-Feb-2023 Zhenlei Huang <zlei@FreeBSD.org>

jail: Use flexible array member within struct prison_ip

Current implementation utilize off-by-one struct prison_ip to access the
IPv[46] addresses. It is error prone and hence comes the regression f

jail: Use flexible array member within struct prison_ip

Current implementation utilize off-by-one struct prison_ip to access the
IPv[46] addresses. It is error prone and hence comes the regression fix
21ad3e27fabc and ddbf879d79d4. Use flexible array member so that compiler
will catch such errors and it will also be easier to review.

No functional change intended.

Reviewed by: melifaro, glebius
Differential Revision: https://reviews.freebsd.org/D37874

show more ...


# 88175af8 21-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add mnt_exjail to control exports done in prisons

If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same

vfs_export: Add mnt_exjail to control exports done in prisons

If there are multiple instances of mountd(8) (in different
prisons), there will be confusion if they manipulate the
exports of the same file system. This patch adds mnt_exjail
to "struct mount" so that the credentials (and, therefore,
the prison) that did the exports for that file system can
be recorded. If another prison has already exported the
file system, vfs_export() will fail with an error.
If mnt_exjail == NULL, the file system has not been exported.
mnt_exjail is checked by the NFS server, so that exports done
from within a different prison will not be used.

The patch also implements vfs_exjail_destroy(), which is
called from prison_cleanup() to release all the mnt_exjail
credential references, so that the prison can be removed.
Mainly to avoid doing a scan of the mountlist for the case
where there were no exports done from within the prison,
a count of how many file systems have been exported from
within the prison is kept in pr_exportcnt.

Reviewed by: markj
Discussed with: jamie
Differential Revision: https://reviews.freebsd.org/D38371
MFC after: 3 months

show more ...


# b2d76b52 21-Feb-2023 Zhenlei Huang <zlei@FreeBSD.org>

jail: Fix redoing ip restricting

`prison_ip_restrict()` is called in loop FOREACH_PRISON_DESCENDANT_LOCKED.
While under low memory, it is still possible that in subsequent rounds
`prison_ip_restrict

jail: Fix redoing ip restricting

`prison_ip_restrict()` is called in loop FOREACH_PRISON_DESCENDANT_LOCKED.
While under low memory, it is still possible that in subsequent rounds
`prison_ip_restrict()` succeed and `redo_ip[46]` flip over from true to
false, thus leave some prisons's IPv[46] addresses unrestricted.

Reviewed by: jamie
Fixes: 8bce8d28abe6 jail: Avoid multipurpose return value of function prison_ip_restrict()
Differential Revision: https://reviews.freebsd.org/D38697

show more ...


# 27202b98 07-Feb-2023 Mark Johnston <markj@FreeBSD.org>

jail: Use atomic(9) instead of CK atomics

There's no reason to use one over the other here, let's prefer the
interface that's used elsewhere in the kernel.

No functional change intended.

Reviewed

jail: Use atomic(9) instead of CK atomics

There's no reason to use one over the other here, let's prefer the
interface that's used elsewhere in the kernel.

No functional change intended.

Reviewed by: mjg
Sponsored by: Klara, Inc.
Differential Revision: https://reviews.freebsd.org/D38360

show more ...


# d94e0bdc 04-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

Revert "vfs_export: Add checks for correct prison when updating exports"

This reverts commit 7926a01ed7ae7cefd81ef4cc2142c35b84d81913.

A new patch in D38371 is being considered for doing this.


# 7926a01e 03-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

vfs_export: Add checks for correct prison when updating exports

mountd(8) basically does the following:
getmntinfo()
for each mount
delete_exports
using nmount(2) to do the creation/deletion o

vfs_export: Add checks for correct prison when updating exports

mountd(8) basically does the following:
getmntinfo()
for each mount
delete_exports
using nmount(2) to do the creation/deletion of individual exports.

For prison0 (and for other prisons if enforce_statfs == 0) getmntinfo()
returns all mount points, including ones being used within other prisons.
This can cause confusion if the same file system is specified in the
exports(5) file for multiple prisons.

This patch adds a perminent identifier to each prison
and marks which prison did the exports in a field of
the mount structure called mnt_exjail. This field can
then be compared to the perminent identifier for the
prison that the thread's credentials is in.
Also required was a new function called prison_isalive_permid()
which returns if the prison is alive, so that the check can be
ignored for prisons that have been removed.

This prepares the system to allow mountd(8) to run in multiple
prisons, including prison0.

Future commits will complete the modifications to allow mountd(8)
to run in vnet prisons. Until then, these changes should not affect
semantics.

Reviewed by: markj
MFC after: 3 months
Differential Revision: https://reviews.freebsd.org/D38144

show more ...


# 99187c3a 02-Feb-2023 Rick Macklem <rmacklem@FreeBSD.org>

prison_check_nfsd: Add check for enforce_statfs != 0

Since mountd(8) will not be able to do exports
when running in a vnet prison if enforce_statfs is
set to 0, add a check for this to prison_check_

prison_check_nfsd: Add check for enforce_statfs != 0

Since mountd(8) will not be able to do exports
when running in a vnet prison if enforce_statfs is
set to 0, add a check for this to prison_check_nfsd().

Reviewed by: jamie, markj
MFC after: 2 months
Differential Revision: https://reviews.freebsd.org/D38189

show more ...


12345678910>>...24