#
06cf3651 |
| 05-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: provide genl_unregister_group()
Because generic netlink group IDs are dynamic, we go through all sockets and unsubscribe them from the group that goes away. Otherwise they could surprisingly find themselves subscribed to a group created later.
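The cleanup described above can be illustrated with a small userland sketch. It is not the kernel code: `struct sketch_sock`, `SKETCH_MAX_GROUPS` and all `sketch_*` functions are hypothetical names, and the per-socket subscription state is assumed to be a plain bitmask indexed by group ID.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_GROUPS 128	/* hypothetical limit, not taken from the source */

/* Hypothetical per-socket state: one subscription bit per group ID. */
struct sketch_sock {
	uint64_t groups[SKETCH_MAX_GROUPS / 64];
};

static void
sketch_subscribe(struct sketch_sock *s, uint32_t id)
{
	s->groups[id / 64] |= UINT64_C(1) << (id % 64);
}

static bool
sketch_is_subscribed(const struct sketch_sock *s, uint32_t id)
{
	return ((s->groups[id / 64] >> (id % 64)) & 1);
}

/*
 * When a group goes away, clear its bit on every socket, so that a group
 * registered later under the same ID starts with no subscribers.
 */
static void
sketch_unregister_group(struct sketch_sock *socks, size_t nsocks, uint32_t id)
{
	for (size_t i = 0; i < nsocks; i++)
		socks[i].groups[id / 64] &= ~(UINT64_C(1) << (id % 64));
}
```

Without the loop in `sketch_unregister_group()`, a stale bit would silently subscribe the socket to whichever group reuses the ID next.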
|
#
ee507b70 |
| 05-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: refactor KPI for generic Netlink modules
Now that the family and group are completely private to netlink_generic.c, provide a simple and robust KPI that requires only very simple guarantees from both the KPI and the module:
* Strings are used only for family and group registration, which return an ID:
    uint16_t genl_register_family(const char *name, ...
    uint32_t genl_register_group(uint16_t family, const char *name, ...
* Once created, families and groups are guaranteed not to disappear and to be addressable by their ID.
* All subsequent calls, including deregistration, shall use the ID.
Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D48845
|
#
ef3991d7 |
| 05-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: don't store an extra pointer to so_cred
|
#
841dcdcd |
| 05-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: initialize VNET context with VNET_SYSINIT()
With the initial check-in netlink(4) was very conservative with regard to memory use and intrusiveness to the kernel and network stack. In particular it would initialize the VNET context only on the first actual call to socket(PF_NETLINK), saving on allocation of a struct nl_control of size 224 bytes.
Now it is clear that netlink(4) is a primary citizen of FreeBSD, with a set of system tools using it. So resort to normal VNET_SYSINIT() and with that shave a lot of complexity, since after the change V_nl_ctl is immutable.
|
#
753a4acd |
| 04-Feb-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: make struct genl_family and genl_group private
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3 |
|
#
926d2ead |
| 11-Jan-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: some refactoring of NETLINK_GENERIC layer
- Statically initialize control family/group. This removes extra startup code and provides a strong guarantee that they reside at the 0 index of the respective arrays. Before this change, a genl_register_family() call with a higher SYSINIT order could try to hijack index 0.
- Remove the family_id field completely. Now the family ID as well as group ID are array indices and there is basically no place for a mistake. Previous code had a bug where a KPI user could induce an ID mismatch.
- Merge netlink_generic_kpi.c to netlink_generic.c. Both files are small and now there is more dependency between the control family and the family allocator. Ok'ed by melifaro@.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D48316
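A minimal userland sketch of the two ideas above: a statically initialized entry at index 0, and IDs that are nothing more than array indices. The names `sketch_family`, `sketch_families` and the `sketch_*` functions are hypothetical, and returning 0 on failure is an assumption of this sketch, not a statement about the actual KPI.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_FAMILIES 8	/* hypothetical table size */

struct sketch_family {
	const char *name;	/* NULL marks a free slot */
};

/* Index 0 is filled in at compile time, so no later registration can claim it. */
static struct sketch_family sketch_families[SKETCH_MAX_FAMILIES] = {
	[0] = { .name = "control" },
};

/* Returns the array index that serves as the family ID, or 0 if the table is full. */
static uint16_t
sketch_register_family(const char *name)
{
	for (uint16_t i = 1; i < SKETCH_MAX_FAMILIES; i++) {
		if (sketch_families[i].name == NULL) {
			sketch_families[i].name = name;
			return (i);
		}
	}
	return (0);
}

/* Deregistration takes the ID only; the reserved index 0 is never freed. */
static void
sketch_unregister_family(uint16_t id)
{
	if (id != 0 && id < SKETCH_MAX_FAMILIES)
		sketch_families[id].name = NULL;
}
```

Because the ID is derived from the slot position rather than stored in a separate field, the two cannot drift apart.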
|
#
0fda4ffd |
| 11-Jan-2025 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: augment group writer with priv(9) argument
This will allow broadcasting messages visible only to privileged subscribers.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D48307
|
#
f1c6edba |
| 03-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use size_t through the allocation KPI
This fixes some signedness bugs and potential underflows. The length of nl_buf is still limited by UINT_MAX and this is asserted now.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D47551
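To illustrate the class of bug this kind of change targets, here is a hedged sketch of a room check written with `size_t`. `struct sketch_buf` and both helpers are hypothetical and do not reproduce the actual nl_buf code; the only detail taken from the commit text is that the length must still fit in `UINT_MAX`.

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical linear buffer descriptor. */
struct sketch_buf {
	size_t size;	/* total capacity */
	size_t used;	/* bytes already written */
};

/* The capacity is a size_t, but it still has to fit into an unsigned int. */
static bool
sketch_size_ok(size_t size)
{
	return (size <= UINT_MAX);
}

/*
 * Check for room without ever computing "size - used - want", which would
 * wrap around to a huge value when the request is larger than what is left.
 */
static bool
sketch_has_room(const struct sketch_buf *b, size_t want)
{
	return (b->used <= b->size && want <= b->size - b->used);
}
```

With signed or mixed-width lengths, the same comparison can pass for a negative or oversized request; keeping every operand a `size_t` and ordering the subtraction safely avoids that.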
|
#
a034c0ae |
| 03-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: refactor writer initialization KPI
o Allow callers to initialize a writer that will malloc(9) with M_WAITOK.
o Use size_t for expected malloc size.
o Use correct types to initialize a group writer.
o Rename functions into nl_writer_ namespace instead of nlmsg_, because they work on nl_writer, not on nlmsg.
o Make the KPI responsible for sparsely initializing the writer structure.
o Garbage collect chain writer. Fixes 17083b94a915.
All current consumers are left as is, however some may benefit from M_WAITOK allocation as well as supplying a correct expected size.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D47549
|
#
edf5608b |
| 03-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use bitset(9)
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D47548
|
#
ac84ce05 |
| 03-Dec-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: consistently use uint16_t for family id
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D47547
|
Revision tags: release/14.2.0, release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
09fa78d4 |
| 09-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: fix regression with group writers
Refactoring of the argument list to nl_send_one() led to dereferencing the wrong union member. Rename nl_send_one() to a more generic name, isolate a new nl_send_one() as the callback only for the normal writer and provide the correct argument to nl_send() from nl_send_group().
Fixes: ff5ad900d2a0793659241eee96be53e6053b5081
|
#
ff5ad900 |
| 02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: refactor control data generation for recvmsg(2)
Netlink should return very simple control data on every recvmsg(2) syscall. This data is associated with a syscall, not with an nlmsg, nor with our internal representation (nl_bufs). There is no need to pre-allocate it in non-sleepable context and attach it to nl_buf. Allocate right in the syscall with M_WAITOK. This also shaves lots of code and simplifies things.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D42989
|
#
17083b94 |
| 02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use protocol specific receive buffer
Implement Netlink socket receive buffer as a simple TAILQ of nl_buf's, same part of struct sockbuf that is used for send buffer already. This shaves a lot of code and a lot of extra processing. The pcb gets rid of the I/O queues as the socket buffer is exactly the queue. The message writer is simplified a lot, as we now always deal with a linear buf. The notion of different buffer types goes away, as well as different kinds of writers. The only things remaining are: a socket writer and a group writer. The impact on the network stack is that we no longer use mbufs, so a workaround from d18715475071 disappears.
Note on message throttling. Now the taskqueue throttling mechanism needs to look at both socket buffers protected by their respective locks and on flags in the pcb that are protected by the pcb lock. There is definitely some room for optimization, but this change tries to preserve as much as possible.
Note on new nl_soreceive(). It emulates soreceive_generic(). It must undergo further optimization, see large comment put in there.
Note on tests/sys/netlink/test_netlink_message_writer.py. This test boiled down almost to nothing with mbufs removed. However, I left it with minimal functionality (it basically checks that allocating N bytes we get N bytes) as it is one of the few examples of the ktest framework, which allows testing KPIs with Python.
Note on Linux support. It got much simpler: the Netlink message writer loses the notion of Linux support lifetime, it is the same regardless of process ABI. On socket write from a Linux process we perform conversion immediately in nl_receive_message(), and on output the conversion to Linux happens in nl_send_one(). XXX: both conversions use M_NOWAIT allocation, which used to be the case before this change, too.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D42524
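The "TAILQ of linear buffers" idea can be sketched in userland with the `<sys/queue.h>` macros. This is only an illustration: `struct sketch_nl_buf`, `sketch_bufq` and the two helpers are hypothetical, the real nl_buf layout is not shown in this log, and locking, throttling and partial reads are deliberately left out.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <sys/queue.h>

/* Hypothetical linear buffer: queue linkage, length, then the bytes themselves. */
struct sketch_nl_buf {
	TAILQ_ENTRY(sketch_nl_buf) link;
	size_t datalen;
	char data[];
};

TAILQ_HEAD(sketch_bufq, sketch_nl_buf);

/* Copy one message into a freshly allocated buffer and queue it at the tail. */
static int
sketch_enqueue(struct sketch_bufq *q, const void *msg, size_t len)
{
	struct sketch_nl_buf *nb = malloc(sizeof(*nb) + len);

	if (nb == NULL)
		return (-1);
	nb->datalen = len;
	memcpy(nb->data, msg, len);
	TAILQ_INSERT_TAIL(q, nb, link);
	return (0);
}

/*
 * Pop the oldest buffer into out, copying at most outlen bytes.
 * Returns the number of bytes copied, or 0 if the queue is empty.
 */
static size_t
sketch_dequeue(struct sketch_bufq *q, void *out, size_t outlen)
{
	struct sketch_nl_buf *nb = TAILQ_FIRST(q);
	size_t n;

	if (nb == NULL)
		return (0);
	n = nb->datalen < outlen ? nb->datalen : outlen;
	memcpy(out, nb->data, n);
	TAILQ_REMOVE(q, nb, link);
	free(nb);
	return (n);
}
```

Each queued element is one contiguous buffer, so a reader never has to walk a chain of fragments the way it would with mbufs.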
|
#
660bd40a |
| 02-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: use domain specific send buffer
Instead of using generic socket code, create Netlink specific socket buffer. It is a simple TAILQ of writes that came from userland. This saves us one memory allocation that could fail and one memory copy.
Reviewed by: melifaro Differential Revision: https://reviews.freebsd.org/D42522
|
#
dbc46311 |
| 27-Dec-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netlink: remove unused structure
|
Revision tags: release/14.0.0 |
|
#
d1871547 |
| 31-May-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: use custom uma zone for the mbuf storage.
Netlink communicates with userland via sockets, utilising MCLBYTES-sized mbufs to append data to the socket buffers. These mbufs are never transmitted via logical or physical network.
It may be possible that the 2k mbuf zone is temporarily exhausted due to DDoS-style traffic, leading to Netlink failing to respond to requests.
To address it, this change introduces a custom Netlink-specific zone for the mbuf storage. It has the following benefits:
* no precious memory from UMA_ZONE_CONTIG zones is utilized for Netlink
* Netlink becomes (more) independent from the traffic spikes and other related network "corner" conditions.
* Netlink allocations are now isolated within a specific zone, making it easier to track Netlink mbuf usage and attribute mbufs.
Reviewed by: gallatin, adrian Differential Revision: https://reviews.freebsd.org/D40356 MFC after: 2 weeks
|
#
4d846d26 |
| 10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
|
#
30d7e724 |
| 28-Apr-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
route: show originator PID in netlink monitor
Replacing rtsock with netlink also means providing similar tracing facilities. rtsock provides the `route -n monitor` interface, where each message can be traced to the originating PID. This diff closes the feature gap between rtsock and netlink in that regard.
Netlink works slightly differently from rtsock, as it is a generic message "broker". It calls some kernel KPIs and returns the result to the caller. Other Netlink consumers get notified of the changed kernel state using the relevant subsystem callbacks. Typically, it is close to impossible to pass some data through these KPIs to enhance the notification.
This diff approaches the problem by using osd(9) to assign the relevant socket pointer (`nlp`) to the per-socket taskqueue execution thread. This change allows recovering the pointer in the aforementioned notification callbacks and extracting some additional data. Using `osd(9)` (and adding additional metadata) to the notification receiver comes with some additional cost attached, so this interface needs to be enabled explicitly by using a newly-created `NETLINK_MSG_INFO` `SOL_NETLINK` socket option.
The actual metadata (which includes the originator PID) is provided via control messages. To enable extensibility, the control message data is encoded in the standard netlink (TLV-based) fashion. The list of the currently-provided properties can be found in `nlmsginfo_attrs`. snl(3) is extended to enable decoding of netlink messages with metadata (`snl_read_message_dbg()` stores the parsed structure in the provided buffer).
Differential Revision: https://reviews.freebsd.org/D39391
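For readers unfamiliar with the TLV encoding mentioned above, the following sketch packs and looks up attributes using the usual netlink attribute layout: a 16-bit length that covers header plus payload, a 16-bit type, and padding to a 4-byte boundary. All `sketch_*` names are hypothetical, and the attribute type numbers a caller passes in are placeholders, not the values defined in `nlmsginfo_attrs`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as a netlink attribute header: length first, then type. */
struct sketch_attr {
	uint16_t len;	/* header plus payload, padding excluded */
	uint16_t type;
};

#define SKETCH_ALIGN(n)	(((size_t)(n) + 3) & ~(size_t)3)

/* Append one attribute at offset off. Returns the new offset, or 0 if it does not fit. */
static size_t
sketch_put_attr(uint8_t *buf, size_t buflen, size_t off, uint16_t type,
    const void *data, uint16_t dlen)
{
	struct sketch_attr hdr;
	size_t total;

	if (dlen > UINT16_MAX - sizeof(hdr))
		return (0);
	hdr.len = (uint16_t)(sizeof(hdr) + dlen);
	hdr.type = type;
	total = SKETCH_ALIGN(hdr.len);
	if (off > buflen || total > buflen - off)
		return (0);
	memset(buf + off, 0, total);		/* zero the padding bytes */
	memcpy(buf + off, &hdr, sizeof(hdr));
	memcpy(buf + off + sizeof(hdr), data, dlen);
	return (off + total);
}

/* Walk the attributes looking for type. Returns the payload, or NULL if absent or malformed. */
static const void *
sketch_find_attr(const uint8_t *buf, size_t len, uint16_t type, uint16_t *dlen)
{
	struct sketch_attr hdr;
	size_t off = 0;

	while (off <= len && len - off >= sizeof(hdr)) {
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.len < sizeof(hdr) || hdr.len > len - off)
			return (NULL);
		if (hdr.type == type) {
			*dlen = (uint16_t)(hdr.len - sizeof(hdr));
			return (buf + off + sizeof(hdr));
		}
		off += SKETCH_ALIGN(hdr.len);
	}
	return (NULL);
}
```

A receiver that does not know a given type simply skips `SKETCH_ALIGN(len)` bytes, which is what makes the format extensible.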
|
#
089104e0 |
| 19-Apr-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink interfaces to if_clone
This change adds netlink create/modify/dump interfaces to `if_clone.c`. The previous attempt at keeping the logic inside `netlink/route/iface_drivers.c` did not quite work, as, for example, dumping interface-specific state (like vlan id or vlan parent) required some peeking into the private interfaces.
The new interfaces are added in a compatible way - callers don't have to do anything unless they are extended with Netlink.
Reviewed by: kp Differential Revision: https://reviews.freebsd.org/D39032 MFC after: 1 month
|
Revision tags: release/13.2.0 |
|
#
19e43c16 |
| 27-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink KPI to the kernel by default
This change does the following:
Base Netlink KPIs (ability to register the family, parse and/or write a Netlink message) are always present in the kernel. Specifically,
* Implementation of genetlink family/group registration/removal, some base accessors (netlink_generic_kpi.c, 260 LoC) are compiled in unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are compiled in unconditionally.
* Glue functions (netlink<>rtsock), malloc/core sysctl definitions (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in netlink_glue.c, but their implementation calls a pointer to either the stub function or the actual function, depending on whether the module is loaded or not.
This approach keeps only 1k LoC out of ~3.7k LoC (current sys/netlink implementation) in the kernel, which will not grow further. It also allows the generic netlink kernel customers to load successfully without requiring the Netlink module and to operate correctly once the Netlink module is loaded.
Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D39269
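The stub-or-real dispatch described in the last bullet can be sketched as follows. Everything here is hypothetical: the `sketch_*` names, the single-entry operations table, and the choice of `ENOTSUP` as the stub's return value are assumptions for illustration, not the contents of netlink_glue.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical operations table; a real one would carry many more entries. */
struct sketch_nl_ops {
	int (*send_msg)(const void *msg, size_t len);
};

/* Stub used while the module is absent: fail cleanly instead of crashing. */
static int
sketch_send_stub(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	return (ENOTSUP);
}

/* Stand-in for what the loadable module would provide. */
static int sketch_sent;

static int
sketch_send_real(const void *msg, size_t len)
{
	(void)msg;
	(void)len;
	sketch_sent++;
	return (0);
}

static const struct sketch_nl_ops sketch_stub_ops = { .send_msg = sketch_send_stub };
static const struct sketch_nl_ops sketch_module_ops = { .send_msg = sketch_send_real };
static const struct sketch_nl_ops *sketch_ops = &sketch_stub_ops;

/* Always-present wrapper: callers link against this, never against the module. */
static int
sketch_nl_send_msg(const void *msg, size_t len)
{
	return (sketch_ops->send_msg(msg, len));
}

/* Called on module load (ops != NULL) and on unload (ops == NULL). */
static void
sketch_nl_set_ops(const struct sketch_nl_ops *ops)
{
	sketch_ops = (ops != NULL) ? ops : &sketch_stub_ops;
}
```

The design choice is that consumers resolve their symbols against the small always-present wrapper, so they can load before the module does and start working once the pointer is switched.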
|
#
04f75b98 |
| 26-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow netlink sockets in non-vnet jails.
This change allows opening Netlink sockets in non-vnet jails, even for unprivileged processes. The security model largely follows the existing one. To be more specific:
* By default, every `NETLINK_ROUTE` command is **NOT** allowed in a non-VNET jail UNLESS the `RTNL_F_ALLOW_NONVNET_JAIL` flag is specified in the command handler.
* All notifications are **disabled** for non-vnet jails (requests to subscribe for the notifications are ignored). This will change to a more fine-grained model once the first netlink provider requiring this gets committed.
* Listing interfaces (RTM_GETLINK) is **allowed** w/o limits (**including** interfaces w/o any addresses attached to the jail). The value of this is questionable, but it follows the existing approach.
* Listing ARP/NDP neighbours is **forbidden**. This is a **change** from the current approach - currently we list static ARP/ND entries belonging to the addresses attached to the jail.
* Listing interface addresses is **allowed**, but the addresses are filtered to match only ones attached to the jail.
* Listing routes is **allowed**, but the routes are filtered to provide only host routes matching the addresses attached to the jail.
* By default, every `NETLINK_GENERIC` command is **allowed** in a non-VNET jail (as sub-families may be unrelated to network at all). It is up to the family author to implement the restriction if necessary.
Differential Revision: https://reviews.freebsd.org/D39206 MFC after: 1 month
|
Revision tags: release/12.4.0 |
|
#
4dfd380e |
| 03-Nov-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: allow more than 64 groups per netlink socket.
|
#
dddafa8d |
| 01-Oct-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: make test-includes happy by hiding most of the header contents under _KERNEL.
|
Revision tags: release/13.1.0 |
|
#
7e5bf684 |
| 20-Jan-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink support
Netlink is a communication protocol currently used in the Linux kernel to modify, read and subscribe for nearly all networking state. Interfaces, addresses, routes, firewall, fibs, vnets, etc are controlled via netlink. It is an async, TLV-based protocol, providing 1-1 and 1-many communications.
The current implementation supports a subset of the NETLINK_ROUTE family. To be more specific, the following is supported:
* Dumps:
  - routes
  - nexthops / nexthop groups
  - interfaces
  - interface addresses
  - neighbors (arp/ndp)
* Notifications:
  - interface arrival/departure
  - interface address arrival/departure
  - route addition/deletion
* Modifications:
  - adding/deleting routes
  - adding/deleting nexthops/nexthop groups
  - adding/deleting neighbors
  - adding/deleting interfaces (basic support only)
* Rtsock interaction
  - route events are bridged both ways
The implementation also supports the NETLINK_GENERIC family framework.
Implementation notes: Netlink is implemented via a loadable/unloadable kernel module, not touching many kernel parts. Each netlink socket uses a dedicated taskqueue to support async operations that can sleep, such as interface creation. All message processing is performed within these taskqueues.
Compatibility: Most of the Netlink data models specified above map to FreeBSD concepts nicely. An unmodified ip(8) binary correctly works with interfaces, addresses, routes, nexthops and nexthop groups. Some software such as net/bird requires header-only modifications to compile and work with FreeBSD netlink.
Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D36002 MFC after: 2 months
|