#
3360a158 |
| 24-Oct-2024 |
Kyle Evans <kevans@FreeBSD.org> |
net: route: convert routing statistics to a sysctl
Exporting the relevant pcpustat is trivial, so let's do that. We will use it in a near-future change in netstat to avoid having to dig around in mem(4) for live kernel statistics.
Differential Revision: https://reviews.freebsd.org/D47231
|
Revision tags: release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree via automated scripting, with two minor fixups to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
|
Revision tags: release/14.0.0 |
|
#
2ff63af9 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .h pattern
Remove /^\s*\*+\s*\$FreeBSD\$.*$\n/
|
Revision tags: release/13.2.0 |
|
#
19e43c16 |
| 27-Mar-2023 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink KPI to the kernel by default
This change does the following:
Base Netlink KPIs (ability to register the family, parse and/or write a Netlink message) are always present in the kernel. Specifically:
* Implementation of genetlink family/group registration/removal and some base accessors (netlink_generic_kpi.c, 260 LoC) is compiled in unconditionally.
* Basic TLV parser functions (netlink_message_parser.c, 507 LoC) are compiled in unconditionally.
* Glue functions (netlink<>rtsock) and malloc/core sysctl definitions (netlink_glue.c, 259 LoC) are compiled in unconditionally.
* The rest of the KPI _functions_ are defined in netlink_glue.c, but their implementation calls a pointer to either the stub function or the actual function, depending on whether the module is loaded or not.
This approach keeps only 1k LoC out of ~3.7k LoC (current sys/netlink implementation) in the kernel, which will not grow further. It also allows generic netlink kernel customers to load successfully without requiring the Netlink module and to operate correctly once the Netlink module is loaded.
Reviewed by: imp MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D39269
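The stub-or-real dispatch described in this commit can be sketched in plain C as follows. This is only an illustration of the pattern: every name here (`struct nl_provider`, `nl_set_provider()`, `nl_send_msg()`) is hypothetical and does not come from netlink_glue.c.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical provider table: one function pointer per KPI call. */
struct nl_provider {
	int (*send_msg)(const void *buf, size_t len);
};

/* Stub used while the module is not loaded: always reports "unsupported". */
static int
stub_send_msg(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return (ENOTSUP);
}

/* Stand-in for the real implementation supplied by the loaded module. */
static int
real_send_msg(const void *buf, size_t len)
{
	(void)buf;
	return (len > 0 ? 0 : EINVAL);
}

static const struct nl_provider stub_provider = { .send_msg = stub_send_msg };
static const struct nl_provider real_provider = { .send_msg = real_send_msg };

/* The always-present glue starts out pointing at the stubs. */
static const struct nl_provider *active_provider = &stub_provider;

/* Hypothetical hook called on module load (1) and unload (0). */
static void
nl_set_provider(int loaded)
{
	active_provider = loaded ? &real_provider : &stub_provider;
}

/* KPI entry point that callers link against regardless of module state. */
static int
nl_send_msg(const void *buf, size_t len)
{
	return (active_provider->send_msg(buf, len));
}
```

Callers only ever reference `nl_send_msg()`, so they link and load without the module; the call starts succeeding once the provider pointer is switched.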
|
#
2c2b37ad |
| 13-Jan-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
ifnet/API: Move struct ifnet definition to a <net/if_private.h>
Hide the ifnet structure definition, no user serviceable parts inside, it's a netstack implementation detail. Include it temporarily in <net/if_var.h> until all drivers are updated to use the accessors exclusively.
Reviewed by: glebius Sponsored by: Juniper Networks, Inc. Differential Revision: https://reviews.freebsd.org/D38046
|
#
3636a967 |
| 15-Dec-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
route: allow RTM_CHANGE notifications in rt_routemsg().
MFC after: 2 weeks
|
#
1bcd230f |
| 03-Dec-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add interface notification on link status / flags change.
* Add link-state change notifications by subscribing to ifnet_link_event. In the Linux netlink model, link state is reported in 2 places: the first is IFLA_OPERSTATE, which stores state per RFC2863. The second is the IFF_LOWER_UP interface flag. As many applications rely on the latter, reserve 1 bit from if_flags, named IFF_NETLINK_1. This flag is mapped to IFF_LOWER_UP in the netlink headers. This is done to avoid making applications think this flag is actually supported / present in non-netlink outputs.
* Add flag change notifications by hooking into rt_ifmsg(). In the netlink model, a notification should include the bitmask of the changed flags. Update rt_ifmsg() to include such a bitmask.
Differential Revision: https://reviews.freebsd.org/D37597
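As a rough sketch of the "bitmask of changed flags" idea, a notification can carry both the new flags and the XOR of the old and new values. The struct, the helper, and the bit values below are illustrative assumptions, not the real FreeBSD or Linux definitions, and the commit does not say that XOR is how the mask is derived.

```c
#include <stdint.h>

/* Illustrative bit values only; not the real kernel constants. */
#define EX_IFF_UP        0x00001u
#define EX_IFF_RUNNING   0x00040u
#define EX_IFF_LOWER_UP  0x10000u	/* stands in for the reserved IFF_NETLINK_1 bit */

/* Hypothetical notification payload: current flags plus which bits changed. */
struct ex_ifmsg {
	uint32_t flags;
	uint32_t change;
};

/* Build a notification; a bit is set in 'change' iff it differs between old and new. */
static struct ex_ifmsg
ex_build_ifmsg(uint32_t old_flags, uint32_t new_flags)
{
	struct ex_ifmsg m = {
		.flags = new_flags,
		.change = old_flags ^ new_flags,
	};
	return (m);
}
```

A listener can then test `m.change & EX_IFF_LOWER_UP` to react only to link-state transitions instead of diffing the flags itself.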
|
Revision tags: release/12.4.0, release/13.1.0 |
|
#
7e5bf684 |
| 20-Jan-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
netlink: add netlink support
Netlink is a communication protocol currently used in the Linux kernel to modify, read and subscribe to nearly all networking state. Interfaces, addresses, routes, firewall, fibs, vnets, etc. are controlled via netlink. It is an async, TLV-based protocol, providing 1-1 and 1-many communications.
The current implementation supports a subset of the NETLINK_ROUTE family. To be more specific, the following is supported:
* Dumps:
  - routes
  - nexthops / nexthop groups
  - interfaces
  - interface addresses
  - neighbors (arp/ndp)
* Notifications:
  - interface arrival/departure
  - interface address arrival/departure
  - route addition/deletion
* Modifications:
  - adding/deleting routes
  - adding/deleting nexthops/nexthop groups
  - adding/deleting neighbors
  - adding/deleting interfaces (basic support only)
* Rtsock interaction
  - route events are bridged both ways
The implementation also supports the NETLINK_GENERIC family framework.
Implementation notes: Netlink is implemented via a loadable/unloadable kernel module, not touching many kernel parts. Each netlink socket uses a dedicated taskqueue to support async operations that can sleep, such as interface creation. All message processing is performed within these taskqueues.
Compatibility: Most of the Netlink data models specified above map to FreeBSD concepts nicely. An unmodified ip(8) binary works correctly with interfaces, addresses, routes, nexthops and nexthop groups. Some software such as net/bird requires header-only modifications to compile and work with FreeBSD netlink.
Reviewed by: imp Differential Revision: https://reviews.freebsd.org/D36002 MFC after: 2 months
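To illustrate what "TLV-based" means in practice, here is a minimal sketch of walking a buffer of type-length-value attributes. The layout (16-bit length that includes the header, 16-bit type, 4-byte alignment) follows the usual netlink attribute convention, but `struct ex_tlv` and `ex_tlv_find()` are hypothetical names for this example and not part of the FreeBSD parser.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical attribute header; 'len' covers the header plus the payload. */
struct ex_tlv {
	uint16_t len;
	uint16_t type;
};

/* Attributes start on 4-byte boundaries. */
#define EX_TLV_ALIGN(n)	(((n) + 3u) & ~(size_t)3u)

/*
 * Return a pointer to the payload of the first attribute of the given type
 * and store the payload length in *plen, or return NULL if the attribute is
 * absent or the buffer is malformed.
 */
static const uint8_t *
ex_tlv_find(const uint8_t *buf, size_t buflen, uint16_t type, size_t *plen)
{
	size_t off = 0;

	while (buflen - off >= sizeof(struct ex_tlv)) {
		struct ex_tlv h;
		size_t step;

		memcpy(&h, buf + off, sizeof(h));	/* avoid unaligned access */
		if (h.len < sizeof(h) || h.len > buflen - off)
			return (NULL);			/* malformed length */
		if (h.type == type) {
			*plen = h.len - sizeof(h);
			return (buf + off + sizeof(h));
		}
		step = EX_TLV_ALIGN((size_t)h.len);
		if (step > buflen - off)
			break;				/* last attribute, no padding */
		off += step;
	}
	return (NULL);
}
```

Unknown attribute types are simply skipped, which is what lets senders add new attributes without breaking older receivers.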
|
#
000250be |
| 08-Sep-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: add ability to set the protocol that installed route/nexthop.
Routing daemons such as bird need to know whether they installed a certain route so they can clean it up on startup, as a form of achieving consistent state during crash recovery. Currently they use a combination of routing flags (RTF_PROTO1) to detect these routes when interacting via the route(4) rtsock protocol. The Netlink protocol has a special "rtm_protocol" field that is filled and checked by the route originator. To prepare for the upcoming netlink introduction, add the ability to record the origin for both nexthops and nexthop groups via the <nhop|nhgrp>_<get|set>_origin() KPI. The actual calls will be used in the followup commits.
MFC after: 1 month
|
#
6d4f6e4c |
| 09-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: make rib_add_redirect() use new nhop-based KPI
MFC after: 1 month Differential Revision: https://reviews.freebsd.org/D36169
|
#
88a782fc |
| 15-Aug-2022 |
Mateusz Guzik <mjg@FreeBSD.org> |
routing: G/C rt_exportinfo declaration
Sponsored by: Rubicon Communications, LLC ("Netgate")
|
#
036f1bc6 |
| 14-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: retire rib_lookup_info()
This function was added in pre-epoch era ( 9a1b64d5a0224 ) to provide public rtentry access interface & hide rtentry internals. The implementation is based on the large on-stack copying and refcounting of the referenced objects (ifa/ifp). It has become obsolete after epoch & nexthop introduction. Convert the last remaining user and remove the function itself.
Differential Revision: https://reviews.freebsd.org/D36197
|
#
66230639 |
| 04-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: split nexthop creation and rtentry creation.
This change is required for the upcoming introduction of the next nexthop-based operations KPI, as it will create rtentry and nexthops at different stages of route table modification.
Differential Revision: https://reviews.freebsd.org/D36072 MFC after: 2 weeks
|
#
800c6846 |
| 29-Jul-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985 MFC after: 1 month
|
Revision tags: release/12.3.0 |
|
#
4b631fc8 |
| 07-Sep-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: fix source address selection rules for IPv4 over IPv6.
Current logic always selects an IFA of the same family from the outgoing interfaces. In an IPv4 over IPv6 setup there can be just a single non-127.0.0.1 ifa, attached to the loopback interface.
Create a separate rt_getifa_family() to handle entire ifa selection for the IPv4 over IPv6.
Differential Revision: https://reviews.freebsd.org/D31868 MFC after: 1 week
|
#
d98954e2 |
| 29-Aug-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: Bring back the ability to specify transmit interface via its name.
Some software references outgoing interfaces by specifying name instead of index.
Use rti_ifp from rt_addrinfo if provided instead of always using address interface when constructing nexthop.
PR: 255678 Reported by: martin.larsson2 at gmail.com MFC after: 1 week
|
#
a7581946 |
| 23-Jun-2021 |
Rozhuk Ivan <rozhuk.im@gmail.com> |
devctl: add ADDR_ADD and ADDR_DEL devctl event for IFNET
Add devd event on network iface address add/remove. Can be used to automate actions on any address change.
Reviewed by: imp@ (and minor style tweaks) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D30840
|
#
8e8f1cc9 |
| 23-Apr-2021 |
Mark Johnston <markj@FreeBSD.org> |
Re-enable network ioctls in capability mode
This reverts a portion of 274579831b61 ("capsicum: Limit socket operations in capability mode") as at least rtsol and dhcpcd rely on being able to configure network interfaces while in capability mode.
Reported by: bapt, Greg V Sponsored by: The FreeBSD Foundation
|
Revision tags: release/13.0.0 |
|
#
27457983 |
| 07-Apr-2021 |
Mark Johnston <markj@FreeBSD.org> |
capsicum: Limit socket operations in capability mode
Capsicum did not prevent certain privileged networking operations, specifically creation of raw sockets and network configuration ioctls. However, these facilities can be used to circumvent some of the restrictions that capability mode is supposed to enforce.
Add capability mode checks to disallow network configuration ioctls and creation of sockets other than PF_LOCAL and SOCK_DGRAM/STREAM/SEQPACKET internet sockets.
Reviewed by: oshogbo Discussed with: emaste Reported by: manu Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D29423
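The socket-creation rule stated above can be expressed as a small predicate. This is a userspace sketch under stated assumptions, not the kernel check itself: the function name is hypothetical, and "internet sockets" is assumed here to mean PF_INET and PF_INET6.

```c
#include <stdbool.h>
#include <sys/socket.h>

/*
 * Hypothetical predicate: may a process in capability mode create a socket
 * with this domain and type?  Local sockets are always allowed; internet
 * sockets only for the datagram, stream and seqpacket types.
 */
static bool
ex_cap_socket_allowed(int domain, int type)
{
	if (domain == PF_LOCAL)
		return (true);
	if (domain == PF_INET || domain == PF_INET6) {
		return (type == SOCK_DGRAM || type == SOCK_STREAM ||
		    type == SOCK_SEQPACKET);
	}
	return (false);
}
```

Under this rule a raw internet socket (`SOCK_RAW`) is rejected, which is the case the commit message singles out.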
|
#
b1d63265 |
| 08-Mar-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Flush remaining routes from the routing table during VNET shutdown.
Summary: This fixes rtentry leak for the cloned interfaces created inside the VNET.
PR: 253998 Reported by: rashey at superbox.pl MFC after: 3 days
Loopback teardown order is `SI_SUB_INIT_IF`, which happens after `SI_SUB_PROTO_DOMAIN` (route table teardown). Thus, any route table operations are too late to schedule. As the intent of the vnet teardown procedures is to minimise the amount of effort by doing global cleanups instead of per-interface ones, address this by adding a relatively light-weight routing table cleanup function, `rib_flush_routes()`. It removes all remaining routes from the routing table and schedules the deletion, which will happen later, when `rtables_destroy()` waits for the current epoch to finish.
Test Plan:
```
set_skip:set_skip_group_lo -> passed [0.053s]
tail -n 200 /var/log/messages | grep rtentry
```
Reviewers: #network, kp, bz
Reviewed By: kp
Subscribers: imp, ae
Differential Revision: https://reviews.freebsd.org/D29116
|
#
59641728 |
| 22-Feb-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Simplify ifa/ifp refcounting in the routing stack.
The routing stack control depends on quite a tree of functions to determine the proper attributes of a route such as a source address (ifa) or transmit ifp of a route.
When actually inserting a route, the stack needs to ensure that ifa and ifp point to entities that are still valid. Validity means slightly more than just pointer validity: the stack needs to guarantee that the provided objects are not scheduled for deletion.
Currently, callers either ignore it (most ifp parts, historically) or try to use refcounting (ifa parts). Even in case of ifa refcounting it's not always implemented in fully-safe manner. For example, some codepaths inside rt_getifa_fib() are referencing ifa while not holding any locks, resulting in possibility of referencing scheduled-for-deletion ifa.
Instead of trying to fix all of the callers by enforcing proper refcounting, switch to a different model. As the rib_action() already requires epoch, do not require any stability guarantees other than the epoch-provided one. Use newly-added conditional versions of the refcounting functions (ifa_try_ref(), if_try_ref()) and fail if any of these fails.
Reviewed by: donner MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D28837
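A conditional ("try") reference is commonly built as "increment only if the count is not already zero". The sketch below shows that pattern with C11 atomics; it is an assumption about the general technique, not the actual ifa_try_ref() / if_try_ref() code, and `struct ex_obj` and both helpers are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; a count of zero means "being destroyed". */
struct ex_obj {
	atomic_uint refcnt;
};

/* Take a reference only if the object is still alive; never revive it. */
static bool
ex_try_ref(struct ex_obj *o)
{
	unsigned int old = atomic_load(&o->refcnt);

	while (old != 0) {
		/* On failure 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&o->refcnt, &old, old + 1))
			return (true);
	}
	return (false);
}

/* Drop a reference; returns true when the caller released the last one. */
static bool
ex_unref(struct ex_obj *o)
{
	return (atomic_fetch_sub(&o->refcnt, 1) == 1);
}
```

A caller that gets `false` from `ex_try_ref()` fails the operation instead of using the object, matching the "fail if any of these fails" behaviour described above.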
|
#
cb984c62 |
| 30-Jan-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Fix multipath support for rib_lookup_info().
The initial plan was to remove rib_lookup_info() before FreeBSD 13. As several customers are still remaining, fix rib_lookup_info() for the multipath use case.
|
#
81728a53 |
| 09-Jan-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Split rtinit() into multiple functions.
rtinit[1]() is a function used to add or remove interface address prefix routes, similar to ifa_maintain_loopback_route(). It was intended to be family-agnostic. There is a problem with this approach in reality.
1) IPv6 code does not use it for the ifa routes. There is a separate layer, nd6_prelist_(), providing interface for maintaining interface routes. Its part, responsible for the actual route table interaction, mimics rtenty() code.
2) rtinit tries to combine multiple actions in the same function: constructing proper route attributes and handling iterations over multiple fibs, for the non-zero net.add_addr_allfibs use case. It notably increases the code complexity.
3) dstaddr handling. The flags parameter re-uses RTF_ flags. As there is no special flag for p2p connections, host routes and p2p routes are handled in the same way. Additionally, mapping IFA flags to RTF flags makes the interface pretty messy. It makes rtinit() clash with ifa_maintain_loopback_route() for IPv4 interface aliases.
4) rtinit() is the last customer passing non-masked prefixes to rib_action(), complicating rib_action() implementation.
5) rtinit() coupled ifa announce/withdrawal notifications, producing "false positive" ifa messages in certain corner cases.
To address all these points, the following has been done:
* rtinit() has been split into multiple functions:
  - Route attribute construction was moved to the per-address-family functions, dealing with (2), (3) and (4).
  - The function providing net.add_addr_allfibs handling and route rtsock notifications is the new routing table interface.
  - The rtsock ifa notification has been moved out as well. The resulting set of functions is only responsible for the actual route notifications.
Side effects:
* /32 alias does not result in interface routes (/32 route and "host" route)
* RTF_PINNED is now set for IPv6 prefixes corresponding to the interface addresses
Differential revision: https://reviews.freebsd.org/D28186
|
#
d68cf57b |
| 07-Jan-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Refactor rt_addrmsg() and rt_routemsg().
Summary:
* Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally.
* Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer.
* Refactor rtsock_routemsg(): avoid accessing rtentry fields directly.
* Simplify in_addprefix() by moving prefix search to a separate function.
Reviewers: #network
Subscribers: imp, ae, bz
Differential Revision: https://reviews.freebsd.org/D28011
|
#
f5baf8bb |
| 25-Dec-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add modular fib lookup framework.
This change introduces a framework that allows dynamically attaching or detaching longest prefix match (lpm) lookup algorithms to speed up datapath route table lookups.
Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table.
Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks.
The following algorithms are provided by default:
IPv4:
* bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes)
* radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes)
* radix4 (base system radix backend)
* dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastructure, optimized for large-fib (D27412)
IPv6:
* radix6_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes)
* radix6 (base system radix backend)
* dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastructure, optimized for large-fib (D27412)
Performance changes:
Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604):
IPv4:
  8 routes:
    radix4: ~20mpps
    radix4_lockless: ~24.8mpps
    bsearch4: ~69mpps
    dpdk_lpm4: ~67 mpps
  700k routes:
    radix4_lockless: 3.3mpps
    dpdk_lpm4: 46mpps
IPv6:
  8 routes:
    radix6_lockless: ~20mpps
    dpdk_lpm6: ~70mpps
  100k routes:
    radix6_lockless: 13.9mpps
    dpdk_lpm6: 57mpps
Forwarding benchmarks:
+ 10-15% IPv4 forwarding performance (small-fib, bsearch4)
+ 25% IPv4 forwarding performance (full-view, dpdk_lpm4)
+ 20% IPv6 forwarding performance (full-view, dpdk_lpm6)
Control: Framework adds the following runtime sysctls:
List algos:
* net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4
* net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6
Debug level (7=LOG_DEBUG, per-route):
  net.route.algo.debug_level: 5
Algo selection (currently only for fib 0):
  net.route.algo.inet.algo: bsearch4
  net.route.algo.inet6.algo: radix6_lockless
Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future.
Differential Revision: https://reviews.freebsd.org/D27401
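The idea of picking an algorithm from the route count can be sketched as a simple threshold function. The thresholds (<16 and <1000 routes) and algorithm names are taken from the commit message; the function itself, its name, and the fallback to radix4 when dpdk_lpm4 is unavailable are assumptions for illustration and do not reflect how the framework actually scores algorithms.

```c
#include <stddef.h>

/*
 * Hypothetical selector for IPv4: choose a lookup algorithm name from the
 * number of routes.  'have_dpdk_lpm' models whether the optional dpdk_lpm4
 * algorithm is available.
 */
static const char *
ex_pick_algo4(size_t nroutes, int have_dpdk_lpm)
{
	if (nroutes < 16)
		return ("bsearch4");		/* tiny tables: binary search */
	if (nroutes < 1000)
		return ("radix4_lockless");	/* small tables: immutable radix */
	return (have_dpdk_lpm ? "dpdk_lpm4" : "radix4");
}
```

Re-running such a selector whenever the route count crosses a threshold is one way to get the "on-the-fly" switching the commit describes.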
|