Revision tags: release/14.0.0 |
|
#
95ee2897 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: two-line .h pattern
Remove /^\s*\*\n \*\s+\$FreeBSD\$$\n/
|
Revision tags: release/13.2.0, release/12.4.0 |
|
#
fe05d1dd |
| 29-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: extend nhop(9) kpi
* add nhop_get_unlinked() used to prepare referenced but not linked nexthop, that can later be used as a clone source. * add nhop_check_gateway() to check for allowed ad
routing: extend nhop(9) kpi
* add nhop_get_unlinked() used to prepare referenced but not linked nexthop, that can later be used as a clone source. * add nhop_check_gateway() to check for allowed address family combinations between the rib family and neighbor family (useful for 4o6 or direct routes) * add nhop_set_upper_family() to allow copying IPv6 nexthops to IPv4 rib. * add rt_get_rnd() wrapper, returning both nexthop/group and its weight attached to the rtentry. * Add CHT_SLIST_FOREACH_SAFE(), allowing to delete items during iteration.
MFC after: 2 weeks
show more ...
|
#
db4ca190 |
| 29-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: add ability to store opaque indentifiers in nhops/nhgs
This is a pre-requisite for the direct nexthop/nexhop group operations via netlink.
MFC after: 2 weeks
|
#
40503b79 |
| 07-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: populate fibs with interface routes after growing net.fibs.
Currently it is possible to extend number of fibs in runtime, but this functionality is of limited use when net.add_addrs_all_fi
routing: populate fibs with interface routes after growing net.fibs.
Currently it is possible to extend number of fibs in runtime, but this functionality is of limited use when net.add_addrs_all_fibs is non-zero, as the routing tables are created empty.
This change automatically populate newly-created fibs with the kernel-originated interface routes (filtered by RTF_PINNED flag) if net.add_addrs_all_fibs is set.
``` -> sysctl net.add_addr_allfibs=1 net.add_addr_allfibs: 0 -> 1 -> sysctl net.fibs net.fibs: 2 -> sysctl net.fibs=3 net.fibs: 2 -> 3
BEFORE: -> setfib 2 netstat -rn Routing tables (fib: 2)
AFTER: -> setfib 2 netstat -rn Routing tables (fib: 2)
Internet: Destination Gateway Flags Netif Expire 10.0.0.0/24 link#1 U vtnet0 10.0.0.5 link#1 UHS lo0 127.0.0.1 link#2 UH lo0
Internet6: Destination Gateway Flags Netif Expire ::1 link#2 UHS lo0 2a01:4f9:3a:fa00::/64 link#1 U vtnet0 2a01:4f9:3a:fa00:5054:ff:fe15:4a3b link#1 UHS lo0 fe80::%vtnet0/64 link#1 U vtnet0 fe80::5054:ff:fe15:4a3b%vtnet0 link#1 UHS lo0 fe80::%lo0/64 link#2 U lo0 fe80::1%lo0 link#2 UHS lo0 ```
Differential Revision: https://reviews.freebsd.org/D36075 MFC after: 1 month
show more ...
|
#
5c4d2252 |
| 08-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: move rtentry and subscription code out of route_ctl.c
route_ctl.c size has grown considerably since initial introduction. Factor out non-relevant parts: * all rtentry logic, such as creatio
routing: move rtentry and subscription code out of route_ctl.c
route_ctl.c size has grown considerably since initial introduction. Factor out non-relevant parts: * all rtentry logic, such as creation/destruction and accessors goes to net/route/route_rtentry.c * all rtable subscription logic goes to net/route/route_subscription.c
Differential Revision: https://reviews.freebsd.org/D36074 MFC after: 1 month
show more ...
|
#
dedeec11 |
| 03-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: refactor #2
* Use same filter func (rib_filter_f_t) for nexhtop groups to simplify callbacks. * simplify conditional route deletion & remove the need to pass rt_addrinfo to the low-level
routing: refactor #2
* Use same filter func (rib_filter_f_t) for nexhtop groups to simplify callbacks. * simplify conditional route deletion & remove the need to pass rt_addrinfo to the low-level deletion functions * speedup rib_walk_del() by removing an additional per-prefix lookup
Differential Revision: https://reviews.freebsd.org/D36071 MFC after: 1 month
show more ...
|
#
0d60e88b |
| 02-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: refactor control cmds #1
This and the follow-up routing-related changes target to remove or reduce `struct rt_addrinfo` usage and use recently-landed nhop(9) KPI instead. Traditionally `r
routing: refactor control cmds #1
This and the follow-up routing-related changes target to remove or reduce `struct rt_addrinfo` usage and use recently-landed nhop(9) KPI instead. Traditionally `rt_addrinfo` structure has been used to propagate all necessary information between the protocol/rtsock and a routing layer. Many functions inside routing subsystem uses it internally. However, using this structure became somewhat complicated, as there are too many ways of specifying a single state and verifying data consistency is hard. For example, arerouting flgs consistent with mask/gateway sockaddr pointers? Is mask really a host mask? Are sockaddr "valid" (e.g. properly zeroed, masked, have proper length)? Are they mutable? Is the suggested interface specified by the interface index embedded into the sockadd_dl gateway, or passed as RTAX_IFP parameter, or directly provided by rti_ifp or it needs to be derived from the ifa? These (and other similar) questions have to be considered every time when a function has `rt_addrinfo` pointer as an argument.
The new approach is to bring more control back to the protocols and construct the desired routing objects themselves - in the end, it's the protocol/subsystem who knows the desired outcome.
This specific diff changes the following: * add explicit basic low-level radix operations: add_route() (renamed from add_route_nhop()) delete_route() (factored from change_route_nhop()) change_route() (renamed from change_route_nhop) * remove "info" parameter from change_route_conditional() as a part of reducing rt_addrinfo usage in the internal KPIs * add lookup_prefix_rt() wrapper for doing re-lookups after RIB lock/unlock
Differential Revision: https://reviews.freebsd.org/D36070 MFC after: 2 weeks
show more ...
|
#
ae6bfd12 |
| 01-Aug-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: refactor private KPI * Make nhgrp_get_nhops() return const struct weightened_nhop to indicate that the list is immutable * Make nhgrp_get_group() return the actual group, instead of group+
routing: refactor private KPI * Make nhgrp_get_nhops() return const struct weightened_nhop to indicate that the list is immutable * Make nhgrp_get_group() return the actual group, instead of group+weight.
MFC after: 2 weeks
show more ...
|
#
5c23343b |
| 29-Jul-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: convert remnants of DPRINTF to FIB_CTL_LOG().
Convert the last remaining pieces of old-style debug messages to the new debugging framework.
Differential Revision: https://reviews.freebsd.
routing: convert remnants of DPRINTF to FIB_CTL_LOG().
Convert the last remaining pieces of old-style debug messages to the new debugging framework.
Differential Revision: https://reviews.freebsd.org/D35994 MFC after: 2 weeks
show more ...
|
#
800c6846 |
| 29-Jul-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: add nhop(9) kpi.
Differential Revision: https://reviews.freebsd.org/D35985 MFC after: 1 month
|
#
29029b06 |
| 28-Jul-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving state between various functions inside the routing subsystem. As all of
routing: remove info argument from add/change_route_nhop().
Currently, rt_addrinfo(info) serves as a main "transport" moving state between various functions inside the routing subsystem. As all of the fields are filled in directly by the customers, it is problematic to maintain consistency, resulting in repeated checks inside many functions. Additionally, there are multiple ways of specifying the same value (RTAX_IFP vs rti_ifp / rti_ifa) and so on. With the upcoming nhop(9) kpi it is possible to store all of the required state in the nexthops in the consistent fashion, reducing the need to use "info" in the KPI calls. Finally, rt_addrinfo structure format was derived from the rtsock wire format, which is different from other kernel routing users or netlink.
This cleanup simplifies upcoming nhop(9) kpi and netlink introduction.
Reviewed by: zlei.huang@gmail.com Differential Revision: https://reviews.freebsd.org/D35972 MFC after: 2 weeks
show more ...
|
#
2717e958 |
| 28-Jul-2022 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property. Move its storage to nexthop to simplify upcoming nhop(9) KPI changes and netlin
routing: move route expiration time to its nexthop
Expiration time is actually a path property, not a route property. Move its storage to nexthop to simplify upcoming nhop(9) KPI changes and netlink introduction.
Differential Revision: https://reviews.freebsd.org/D35970 MFC after: 2 weeks
show more ...
|
Revision tags: release/13.1.0, release/12.3.0 |
|
#
8a0d57ba |
| 25-Apr-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
[fib algo] Delay algo init at fib growth to to allow to reliably use rib KPI.
Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib po
[fib algo] Delay algo init at fib growth to to allow to reliably use rib KPI.
Currently, most of the rib(9) KPI does not use rnh pointers, using fibnum and family parameters to determine the rib pointer instead. This works well except for the case when we initialize new rib pointers during fib growth. In that case, there is no mapping between fib/family and the new rib, as an entirely new rib pointer array is populated.
Address this by delaying fib algo initialization till after switching to the new pointer array and updating the number of fibs. Set datapath pointer to the dummy function, so the potential callers won't crash the kernel in the brief moment when the rib exists, but no fib algo is attached.
This change allows to avoid creating duplicates of existing rib functions, with altered signature.
Differential Revision: https://reviews.freebsd.org/D29969 MFC after: 1 week
show more ...
|
#
bc5ef45a |
| 27-Apr-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Fix drace CTF for the rib_head.
33cb3cb2e321 introduced an `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do
Fix drace CTF for the rib_head.
33cb3cb2e321 introduced an `rib_head` structure field under the FIB_ALGO define. This may be problematic for the CTF, as some of the files including `route_var.h` do not have `fib_algo` defined.
Make dtrace happy by making the field unconditional.
Suggested by: markj
show more ...
|
#
33cb3cb2 |
| 17-Apr-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Fix rib generation count for fib algo.
Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is no
Fix rib generation count for fib algo.
Currently, PCB caching mechanism relies on the rib generation counter (rnh_gen) to invalidate cached nhops/LLE entries.
With certain fib algorithms, it is now possible that the datapath lookup state applies RIB changes with some delay. In that scenario, PCB cache will invalidate on the RIB change, but the new lookup may result in the same nexthop being returned. When fib algo finally gets in sync with the RIB changes, PCB cache will not receive any notification and will end up caching the stale data.
To fix this, introduce additional counter, rnh_gen_rib, which is used only when FIB_ALGO is enabled. This counter is incremented by the control plane. Each time when fib algo synchronises with the RIB, it updates rnh_gen to the current rnh_gen_rib value.
Differential Revision: https://reviews.freebsd.org/D29812 Reviewed by: donner MFC after: 2 weeks
show more ...
|
Revision tags: release/13.0.0 |
|
#
d68cf57b |
| 07-Jan-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Refactor rt_addrmsg() and rt_routemsg().
Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface po
Refactor rt_addrmsg() and rt_routemsg().
Summary: * Refactor rt_addrmsg(): make V_rt_add_addr_allfibs decision locally. * Fix rt_routemsg() and multipath by accepting nexthop instead of interface pointer. * Refactor rtsock_routemsg(): avoid accessing rtentry fields directly. * Simplify in_addprefix() by moving prefix search to a separate function.
Reviewers: #network
Subscribers: imp, ae, bz
Differential Revision: https://reviews.freebsd.org/D28011
show more ...
|
#
833dbf1e |
| 28-Dec-2020 |
Ryan Libby <rlibby@FreeBSD.org> |
route: quiet -Wredundant-decls
Remove declaration duplicated in f5baf8bb12f39d0e8d64508c47eb6c4386ef716d
Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.f
route: quiet -Wredundant-decls
Remove declaration duplicated in f5baf8bb12f39d0e8d64508c47eb6c4386ef716d
Reviewed by: melifaro Sponsored by: Dell EMC Isilon Differential Revision: https://reviews.freebsd.org/D27790
show more ...
|
#
f5baf8bb |
| 25-Dec-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add modular fib lookup framework.
This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookup
Add modular fib lookup framework.
This change introduces framework that allows to dynamically attach or detach longest prefix match (lpm) lookup algorithms to speed up datapath route tables lookups.
Framework takes care of handling initial synchronisation, route subscription, nhop/nhop groups reference and indexing, dataplane attachments and fib instance algorithm setup/teardown. Framework features automatic algorithm selection, allowing for picking the best matching algorithm on-the-fly based on the amount of routes in the routing table.
Currently framework code is guarded under FIB_ALGO config option. An idea is to enable it by default in the next couple of weeks.
The following algorithms are provided by default: IPv4: * bsearch4 (lockless binary search in a special IP array), tailored for small-fib (<16 routes) * radix4_lockless (lockless immutable radix, re-created on every rtable change), tailored for small-fib (<1000 routes) * radix4 (base system radix backend) * dpdk_lpm4 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412) IPv6: * radix6_lockless (lockless immutable radix, re-created on every rtable change), tailed for small-fib (<1000 routes) * radix6 (base system radix backend) * dpdk_lpm6 (DPDK DIR24-8-based lookups), lockless datastrucure, optimized for large-fib (D27412)
Performance changes: Micro benchmarks (I7-7660U, single-core lookups, 2048k dst, code in D27604): IPv4: 8 routes: radix4: ~20mpps radix4_lockless: ~24.8mpps bsearch4: ~69mpps dpdk_lpm4: ~67 mpps 700k routes: radix4_lockless: 3.3mpps dpdk_lpm4: 46mpps
IPv6: 8 routes: radix6_lockless: ~20mpps dpdk_lpm6: ~70mpps 100k routes: radix6_lockless: 13.9mpps dpdk_lpm6: 57mpps
Forwarding benchmarks: + 10-15% IPv4 forwarding performance (small-fib, bsearch4) + 25% IPv4 forwarding performance (full-view, dpdk_lpm4) + 20% IPv6 forwarding performance (full-view, dpdk_lpm6)
Control: Framwork adds the following runtime sysctls:
List algos * net.route.algo.inet.algo_list: bsearch4, radix4_lockless, radix4 * net.route.algo.inet6.algo_list: radix6_lockless, radix6, dpdk_lpm6 Debug level (7=LOG_DEBUG, per-route) net.route.algo.debug_level: 5 Algo selection (currently only for fib 0): net.route.algo.inet.algo: bsearch4 net.route.algo.inet6.algo: radix6_lockless
Support for manually changing algos in non-default fib will be added soon. Some sysctl names will be changed in the near future.
Differential Revision: https://reviews.freebsd.org/D27401
show more ...
|
#
df905392 |
| 03-Dec-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add IPv4/IPv6 rtentry prefix accessors.
Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to
Add IPv4/IPv6 rtentry prefix accessors.
Multiple consumers like ipfw, netflow or new route lookup algorithms need to get the prefix data out of struct rtentry. Instead of providing direct access to the rtentry, create IPv4/IPv6 accessors to abstract struct rtentry internals and avoid including internal routing headers for external consumers.
While here, move struct route_nhop_data to the public header, so external customers can actually use lookup functions returning rt&nhop data.
Differential Revision: https://reviews.freebsd.org/D27416
show more ...
|
#
f47fa260 |
| 29-Nov-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add nhop_ref_any() to unify referencing nhop or nexthop group.
It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstractin
Add nhop_ref_any() to unify referencing nhop or nexthop group.
It allows code within routing subsystem to transparently reference nexthops and nexthop groups, similar to nhop_free_any(), abstracting ROUTE_MPATH details.
Differential Revision: https://reviews.freebsd.org/D27410
show more ...
|
#
98d5c4e5 |
| 29-Nov-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors.
The resulting KPI can be used by routing table consumers to estimate the required scale for route table export.
*
Add tracking for rib/nhops/nhgrp objects and provide cumulative number accessors.
The resulting KPI can be used by routing table consumers to estimate the required scale for route table export.
* Add tracking for rib routes * Add accessors for number of nexthops/nexthop objects * Simplify rib_unsubscribe: store rnh we're attached to instead of requiring it up again on destruction. This helps in the cases when rnh is not linked yet/already unlinked.
Differential Revision: https://reviews.freebsd.org/D27404
show more ...
|
#
ef6ef7e5 |
| 28-Nov-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Add nhgrp_get_idx() as a counterpart for nhop_get_idx().
It allows the routing-related code to reference nexthop groups by index instead of storing a pointer.
|
Revision tags: release/12.2.0 |
|
#
0c325f53 |
| 18-Oct-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Implement flowid calculation for outbound connections to balance connections over multiple paths.
Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code f
Implement flowid calculation for outbound connections to balance connections over multiple paths.
Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections.
This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system.
Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523
show more ...
|
#
fedeb08b |
| 03-Oct-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups. Each group contains the collection of nexthops
Introduce scalable route multipath.
This change is based on the nexthop objects landed in D24232.
The change introduces the concept of nexthop groups. Each group contains the collection of nexthops with their relative weights and a dataplane-optimized structure to enable efficient nexthop selection.
Simular to the nexthops, nexthop groups are immutable. Dataplane part gets compiled during group creation and is basically an array of nexthop pointers, compiled w.r.t their weights.
With this change, `rt_nhop` field of `struct rtentry` contains either nexthop or nexthop group. They are distinguished by the presense of NHF_MULTIPATH flag. All dataplane lookup functions returns pointer to the nexthop object, leaving nexhop groups details inside routing subsystem.
User-visible changes:
The change is intended to be backward-compatible: all non-mpath operations should work as before with ROUTE_MPATH and net.route.multipath=1.
All routes now comes with weight, default weight is 1, maximum is 2^24-1.
Current maximum multipath group width is statically set to 64. This will become sysctl-tunable in the followup changes.
Using functionality: * Recompile kernel with ROUTE_MPATH * set net.route.multipath to 1
route add -6 2001:db8::/32 2001:db8::2 -weight 10 route add -6 2001:db8::/32 2001:db8::3 -weight 20
netstat -6On
Nexthop groups data
Internet6: GrpIdx NhIdx Weight Slots Gateway Netif Refcnt 1 ------- ------- ------- --------------------------------------- --------- 1 13 10 1 2001:db8::2 vlan2 14 20 2 2001:db8::3 vlan2
Next steps: * Land outbound hashing for locally-originated routes ( D26523 ). * Fix net/bird multipath (net/frr seems to work fine) * Add ROUTE_MPATH to GENERIC * Set net.route.multipath=1 by default
Tested by: olivier Reviewed by: glebius Relnotes: yes Differential Revision: https://reviews.freebsd.org/D26449
show more ...
|
#
2259a030 |
| 21-Sep-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different
Rework part of routing code to reduce difference to D26449.
* Split rt_setmetrics into get_info_weight() and rt_set_expire_info(), as these two can be applied at different entities and at different times. * Start filling route weight in route change notifications * Pass flowid to UDP/raw IP route lookups * Rework nd6_subscription_cb() and sysctl_dumpentry() to prepare for the fact that rtentry can contain multiple nexthops.
Differential Revision: https://reviews.freebsd.org/D26497
show more ...
|