Revision tags: release/14.2.0, release/13.4.0 |
|
#
ddb3eb4e |
| 18-Jul-2024 |
Olivier Certner <olce@FreeBSD.org> |
New setcred() system call and associated MAC hooks
This new system call allows to set all necessary credentials of a process in one go: Effective, real and saved UIDs, effective, real and saved GIDs
New setcred() system call and associated MAC hooks
This new system call allows to set all necessary credentials of a process in one go: Effective, real and saved UIDs, effective, real and saved GIDs, supplementary groups and the MAC label. Its advantage over standard credential-setting system calls (such as setuid(), seteuid(), etc.) is that it enables MAC modules, such as MAC/do, to restrict the set of credentials some process may gain in a fine-grained manner.
Traditionally, credential changes rely on setuid binaries that call multiple credential system calls and in a specific order (setuid() must be last, so as to remain root for all other credential-setting calls, which would otherwise fail with insufficient privileges). This piecewise approach causes the process to transiently hold credentials that are neither the original nor the final ones. For the kernel to enforce that only certain transitions of credentials are allowed, either these possibly non-compliant transient states have to disappear (by setting all relevant attributes in one go), or the kernel must delay setting or checking the new credentials. Delaying setting credentials could be done, e.g., by having some mode where the standard system calls contribute to building new credentials but without committing them. It could be started and ended by a special system call. Delaying checking could mean that, e.g., the kernel only verifies the credentials transition at the next non-credential-setting system call (we just mention this possibility for completeness, but are certainly not endorsing it).
We chose the simpler approach of a new system call, as we don't expect the set of credentials one can set to change often. It has the advantages that the traditional system calls' code doesn't have to be changed and that we can establish a special MAC protocol for it, by having some cleanup function called just before returning (this is a requirement for MAC/do), without disturbing the existing ones.
The mac_cred_check_setcred() hook is passed the flags received by setcred() (including the version) and both the old and new kernel's 'struct ucred' instead of 'struct setcred' as this should simplify evolving existing hooks as the 'struct setcred' structure evolves. The mac_cred_setcred_enter() and mac_cred_setcred_exit() hooks are always called by pairs around potential calls to mac_cred_check_setcred(). They allow MAC modules to allocate/free data they may need in their mac_cred_check_setcred() hook, as the latter is called under the current process' lock, rendering sleepable allocations impossible. MAC/do is going to leverage these in a subsequent commit. A scheme where mac_cred_check_setcred() could return ERESTART was considered but is incompatible with proper composition of MAC modules.
While here, add missing includes and declarations for standalone inclusion of <sys/ucred.h> both from kernel and userspace (for the latter, it has been working thanks to <bsm/audit.h> already including <sys/types.h>).
Reviewed by: brooks Approved by: markj (mentor) Relnotes: yes Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D47618
show more ...
|
Revision tags: release/14.1.0, release/13.3.0, release/14.0.0 |
|
#
af93fea7 |
| 24-Aug-2023 |
Jake Freeland <jfree@freebsd.org> |
timerfd: Move implementation from linux compat to sys/kern
Move the timerfd impelemntation from linux compat code to sys/kern. Use it to implement the new system calls for timerfd. Add a hook to ker
timerfd: Move implementation from linux compat to sys/kern
Move the timerfd impelemntation from linux compat code to sys/kern. Use it to implement the new system calls for timerfd. Add a hook to kern_tc to allow timerfd to know when the system time has stepped. Add kqueue support to timerfd. Adjust a few names to be less Linux centric.
RelNotes: YES Reviewed by: markj (on irc), imp, kib (with reservations), jhb (slack) Differential Revision: https://reviews.freebsd.org/D38459
show more ...
|
#
95ee2897 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0, release/12.3.0 |
|
#
0dc332bf |
| 05-Aug-2021 |
Ka Ho Ng <khng@FreeBSD.org> |
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the
Add fspacectl(2), vn_deallocate(9) and VOP_DEALLOCATE(9).
fspacectl(2) is a system call to provide space management support to userspace applications. VOP_DEALLOCATE(9) is a VOP call to perform the deallocation. vn_deallocate(9) is a public KPI for kmods' use.
The purpose of proposing a new system call, a KPI and a VOP call is to allow bhyve or other hypervisor monitors to emulate the behavior of SCSI UNMAP/NVMe DEALLOCATE on a plain file.
fspacectl(2) comprises of cmd and flags parameters to specify the space management operation to be performed. Currently cmd has to be SPACECTL_DEALLOC, and flags has to be 0.
fo_fspacectl is added to fileops. VOP_DEALLOCATE(9) is added as a new VOP call. A trivial implementation of VOP_DEALLOCATE(9) is provided.
Sponsored by: The FreeBSD Foundation Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D28347
show more ...
|
Revision tags: release/13.0.0 |
|
#
022ca2fc |
| 03-Jan-2021 |
Alan Somers <asomers@FreeBSD.org> |
Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but the
Add aio_writev and aio_readv
POSIX AIO is great, but it lacks vectored I/O functions. This commit fixes that shortcoming by adding aio_writev and aio_readv. They aren't part of the standard, but they're an obvious extension. They work just like their synchronous equivalents pwritev and preadv.
It isn't yet possible to use vectored aiocbs with lio_listio, but that could be added in the future.
Reviewed by: jhb, kib, bcr Relnotes: yes Differential Revision: https://reviews.freebsd.org/D27743
show more ...
|
#
7a202823 |
| 23-Dec-2020 |
Konstantin Belousov <kib@FreeBSD.org> |
Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is
Expose eventfd in the native API/ABI using a new __specialfd syscall
eventfd is a Linux system call that produces special file descriptors for event notification. When porting Linux software, it is currently usually emulated by epoll-shim on top of kqueues. Unfortunately, kqueues are not passable between processes. And, as noted by the author of epoll-shim, even if they were, the library state would also have to be passed somehow. This came up when debugging strange HW video decode failures in Firefox. A native implementation would avoid these problems and help with porting Linux software.
Since we now already have an eventfd implementation in the kernel (for the Linuxulator), it's pretty easy to expose it natively, which is what this patch does.
Submitted by: greg@unrelenting.technology Reviewed by: markj (previous version) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26668
show more ...
|
Revision tags: release/12.2.0, release/11.4.0 |
|
#
ca9cba33 |
| 24-Apr-2020 |
Kyle Evans <kevans@FreeBSD.org> |
bsm: add AUE_CLOSERANGE
AUE_CLOSERANGE has been accepted upstream as 43265; AUE_REALPATHAT has now been upstreamed.
|
#
6c140a72 |
| 20-Feb-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r358131 through r358178.
|
#
0573d0a9 |
| 20-Feb-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a r
vfs: add realpathat syscall
realpath(3) is used a lot e.g., by clang and is a major source of getcwd and fstatat calls. This can be done more efficiently in the kernel.
This works by performing a regular lookup while saving the name and found parent directory. If the terminal vnode is a directory we can resolve it using usual means. Otherwise we can use the name saved by lookup and resolve the parent.
See the review for sample syscall counts.
Reviewed by: kib Differential Revision: https://reviews.freebsd.org/D23574
show more ...
|
#
2d5603fe |
| 18-Nov-2019 |
David Bright <dab@FreeBSD.org> |
Jail and capability mode for shm_rename; add audit support for shm_rename
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail, capability mode, and a few
Jail and capability mode for shm_rename; add audit support for shm_rename
Co-mingling two things here:
* Addressing some feedback from Konstantin and Kyle re: jail, capability mode, and a few other things * Adding audit support as promised.
The audit support change includes a partial refresh of OpenBSM from upstream, where the change to add shm_rename has already been accepted. Matthew doesn't plan to work on refreshing anything else to support audit for those new event types.
Submitted by: Matthew Bryan <matthew.bryan@isilon.com> Reviewed by: kib Relnotes: Yes Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D22083
show more ...
|
Revision tags: release/12.1.0, release/11.3.0, release/12.0.0, release/11.2.0 |
|
#
82725ba9 |
| 23-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325999 through r326131.
|
#
51369649 |
| 20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
show more ...
|
Revision tags: release/10.4.0, release/11.1.0 |
|
#
5e386598 |
| 26-Mar-2017 |
Robert Watson <rwatson@FreeBSD.org> |
Merge OpenBSM 1.2-alpha5 from vendor branch to FreeBSD -CURRENT:
- Add a new "qsize" parameter in audit_control and the getacqsize(3) API to query it, allowing to set the kernel's maximum audit qu
Merge OpenBSM 1.2-alpha5 from vendor branch to FreeBSD -CURRENT:
- Add a new "qsize" parameter in audit_control and the getacqsize(3) API to query it, allowing to set the kernel's maximum audit queue length. - Add support to push a mapping between audit event names and event numbers into the kernel (where supported) using new A_GETEVENT and A_SETEVENT auditon(2) operations. - Add audit event identifiers for a number of new (and not-so-new) FreeBSD system calls including those for asynchronous I/O, thread management, SCTP, jails, multi-FIB support, and misc. POSIX interfaces such as posix_fallocate(2) and posix_fadvise(2). - On operating systems supporting Capsicum, auditreduce(1) and praudit(1) now run sandboxed. - Empty "flags" and "naflags" fields are now permitted in audit_control(5).
Many thanks to Christian Brueffer for producing the OpenBSM release and importing/tagging it in the vendor branch. This release will allow improved auditing of a range of new FreeBSD functionality, as well as non-traditional events (e.g., fine-grained I/O auditing) not required by the Orange Book or Common Criteria.
Obtained from: TrustedBSD Project Sponsored by: DARPA, AFRL MFC after: 3 weeks
show more ...
|
#
47192295 |
| 06-Dec-2016 |
Christian Brueffer <brueffer@FreeBSD.org> |
Vendor import of OpenBSM 1.2-alpha5.
|
Revision tags: release/11.0.1, release/11.0.0, release/10.3.0 |
|
#
b626f5a7 |
| 04-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH r289384-r293170
Sponsored by: The FreeBSD Foundation
|
#
9a7cd2e6 |
| 22-Dec-2015 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFH @r292599
This includes the pluggable TCP framework and other chnages to the netstack to track for VNET stability.
Security: The FreeBSD Foundation
|
#
8a0f5c0b |
| 21-Dec-2015 |
Christian Brueffer <brueffer@FreeBSD.org> |
Merge from contrib/openbsm to bring the kernel audit bits up to date with OpenBSM 1.2 alpha 4:
- remove $P4$ - fix a comment
|
#
97aa9e73 |
| 09-Dec-2015 |
Christian Brueffer <brueffer@FreeBSD.org> |
Vendor import of OpenBSM 1.2-alpha4.
|
Revision tags: release/10.2.0, release/10.1.0, release/9.3.0, release/10.0.0 |
|
#
0bfd163f |
| 18-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r233826 through r256722.
|
#
1ccca3b5 |
| 10-Oct-2013 |
Alan Somers <asomers@FreeBSD.org> |
IFC @256277
Approved by: ken (mentor)
|
Revision tags: release/9.2.0 |
|
#
ef90af83 |
| 20-Sep-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r255692
Comment out IA32_MISC_ENABLE MSR access - this doesn't exist on AMD. Need to sort out how arch-specific MSRs will be handled.
|
#
47823319 |
| 11-Sep-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r255459
|
#
0fbf163e |
| 06-Sep-2013 |
Mark Murray <markm@FreeBSD.org> |
MFC
|
#
d1d01586 |
| 05-Sep-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head
|
#
7008be5b |
| 05-Sep-2013 |
Pawel Jakub Dawidek <pjd@FreeBSD.org> |
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use o
Change the cap_rights_t type from uint64_t to a structure that we can extend in the future in a backward compatible (API and ABI) way.
The cap_rights_t represents capability rights. We used to use one bit to represent one right, but we are running out of spare bits. Currently the new structure provides place for 114 rights (so 50 more than the previous cap_rights_t), but it is possible to grow the structure to hold at least 285 rights, although we can make it even larger if 285 rights won't be enough.
The structure definition looks like this:
struct cap_rights { uint64_t cr_rights[CAP_RIGHTS_VERSION + 2]; };
The initial CAP_RIGHTS_VERSION is 0.
The top two bits in the first element of the cr_rights[] array contain total number of elements in the array - 2. This means if those two bits are equal to 0, we have 2 array elements.
The top two bits in all remaining array elements should be 0. The next five bits in all array elements contain array index. Only one bit is used and bit position in this five-bits range defines array index. This means there can be at most five array elements in the future.
To define new right the CAPRIGHT() macro must be used. The macro takes two arguments - an array index and a bit to set, eg.
#define CAP_PDKILL CAPRIGHT(1, 0x0000000000000800ULL)
We still support aliases that combine few rights, but the rights have to belong to the same array element, eg:
#define CAP_LOOKUP CAPRIGHT(0, 0x0000000000000400ULL) #define CAP_FCHMOD CAPRIGHT(0, 0x0000000000002000ULL)
#define CAP_FCHMODAT (CAP_FCHMOD | CAP_LOOKUP)
There is new API to manage the new cap_rights_t structure:
cap_rights_t *cap_rights_init(cap_rights_t *rights, ...); void cap_rights_set(cap_rights_t *rights, ...); void cap_rights_clear(cap_rights_t *rights, ...); bool cap_rights_is_set(const cap_rights_t *rights, ...);
bool cap_rights_is_valid(const cap_rights_t *rights); void cap_rights_merge(cap_rights_t *dst, const cap_rights_t *src); void cap_rights_remove(cap_rights_t *dst, const cap_rights_t *src); bool cap_rights_contains(const cap_rights_t *big, const cap_rights_t *little);
Capability rights to the cap_rights_init(), cap_rights_set(), cap_rights_clear() and cap_rights_is_set() functions are provided by separating them with commas, eg:
cap_rights_t rights;
cap_rights_init(&rights, CAP_READ, CAP_WRITE, CAP_FSTAT);
There is no need to terminate the list of rights, as those functions are actually macros that take care of the termination, eg:
#define cap_rights_set(rights, ...) \ __cap_rights_set((rights), __VA_ARGS__, 0ULL) void __cap_rights_set(cap_rights_t *rights, ...);
Thanks to using one bit as an array index we can assert in those functions that there are no two rights belonging to different array elements provided together. For example this is illegal and will be detected, because CAP_LOOKUP belongs to element 0 and CAP_PDKILL to element 1:
cap_rights_init(&rights, CAP_LOOKUP | CAP_PDKILL);
Providing several rights that belongs to the same array's element this way is correct, but is not advised. It should only be used for aliases definition.
This commit also breaks compatibility with some existing Capsicum system calls, but I see no other way to do that. This should be fine as Capsicum is still experimental and this change is not going to 9.x.
Sponsored by: The FreeBSD Foundation
show more ...
|