#
42b3c16e |
| 17-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: code hygiene, prune old code, no functional changes
The !DXR2 code corresponds to the original DXR encoding proposal from 2012 with a single direct-lookup stage, which is inferior to the mo
fib_dxr: code hygiene, prune old code, no functional changes
The !DXR2 code corresponds to the original DXR encoding proposal from 2012 with a single direct-lookup stage, which is inferior to the more recent (DXR2) variant with two-stage trie both in terms of memory footprint of the lookup structures, and in terms of overall lookup througput.
I'm axing the old code chunks to (hopefully) somewhat improve readability, as well as to simplify future maintenance and updates.
MFC after: 1 week
show more ...
|
#
19bd24ca |
| 17-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: do not leak memory if FIB constellation hits structural limit
DXR lookup table encoding has an inherent structural limit on the amount of binary search ranges it can accomodate. With the c
fib_dxr: do not leak memory if FIB constellation hits structural limit
DXR lookup table encoding has an inherent structural limit on the amount of binary search ranges it can accomodate. With the current IPv4 BGP views (circa 1 M prefixes) and default DXR encoding we are only at around 5% of that limit, so far, far away from hitting it. Just in case it ever gets hit, make sure we free the allocated structures, instead of leaking it.
MFC after: 1 week
show more ...
|
#
4ab122e8 |
| 17-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: check if cached fib_data matches the new request in dxr_init()
When calling dxr_init(), the FIB_ALGO infrastructure may provide a pointer to a previous dxr instance, which permits reuse of
fib_dxr: check if cached fib_data matches the new request in dxr_init()
When calling dxr_init(), the FIB_ALGO infrastructure may provide a pointer to a previous dxr instance, which permits reuse of auxiliary dxr structures, i.e. incremental lookup structure updates. For dxr this is a crucial feature provided by FIB_ALGO, since dxr incremental updates are typically several orders of magnitude faster than full lookup table rebuilds.
However, the auxiliary dxr structure caches a pointer to struct fib_data and relies upon it for performing incremental updates. Apparently, incremental rebuild requests from FIB_ALGO, i.e. a calls to dxr_init() with a pointer old_data set, may (under not yet fully understood circumstances) be invoked within a different fib_data context than the one cached in the previous version of dxr auxiliary structures. In such (rare) events, we ignore the offered old dxr context, and proceed with a full lookup structure rebuild instead of attempting an incremental one using a fib_data context which may or may not no longer be valid, and thus lead to a system crash.
PR: 278422 MFC after: 1 week
show more ...
|
#
b24e353f |
| 07-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: set fib_data field in struct dxr_aux early enough
Previously it was possible for dxr_build() to return with da->fd unset in case of range_tbl or x_tbl malloc() failures. This may have led
fib_dxr: set fib_data field in struct dxr_aux early enough
Previously it was possible for dxr_build() to return with da->fd unset in case of range_tbl or x_tbl malloc() failures. This may have led to NULL ptr dereferencing in dxr_change_rib_batch().
MFC after: 1 week
PR: 278422
show more ...
|
#
4aa275f1 |
| 07-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: s/KASSERT/MPASS/
MFC after: 1 week
|
#
7a5de1d4 |
| 07-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: KASSERTs for chasing NULL ptr and runaway refcount suspects
MFC after: 1 week
|
#
ed541e20 |
| 07-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: move the bulko of malloc() failure logging into dxr_build()
|
#
5295e891 |
| 06-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: update comment.
MFC after: 1 week
|
#
85801064 |
| 06-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: free() does nothing if arg is NULL, so remove a redundant check.
MFC after: 1 week
|
#
308caa38 |
| 06-May-2024 |
Marko Zec <zec@FreeBSD.org> |
fib_dxr: log malloc() failures.
MFC after: 1 week
|
Revision tags: release/13.3.0, release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
4d846d26 |
| 10-May-2023 |
Warner Losh <imp@FreeBSD.org> |
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of
spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch up to that fact and revert to their recommended match of BSD-2-Clause.
Discussed with: pfg MFC After: 3 days Sponsored by: Netflix
show more ...
|
Revision tags: release/13.2.0, release/12.4.0, release/13.1.0 |
|
#
e7abe200 |
| 17-Jan-2022 |
Marko Zec <zec@FreeBSD.org> |
fib_algo: shift / mask by constants in dxr_lookup()
Since trie configuration remains invariant during each DXR instance lifetime, instead of shifting and masking lookup keys by values computed at ru
fib_algo: shift / mask by constants in dxr_lookup()
Since trie configuration remains invariant during each DXR instance lifetime, instead of shifting and masking lookup keys by values computed at runtime, compile upfront several dxr_lookup() configurations with hardcoded shift / mask constants, and choose the apropriate lookup function version after each DXR instance rebuild.
In synthetic tests this yields small but measurable (5-10%) lookup throughput improvement, depending on FIB size and prefix patterns.
MFC after: 3 days
show more ...
|
Revision tags: release/12.3.0 |
|
#
bc8b8e10 |
| 09-Oct-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib_algo][dxr] Retire counters which are no longer used
The number of chunks can still be tracked via vmstat -z|fgrep dxr.
MFC after: 3 days
|
#
1549575f |
| 09-Oct-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib_algo][dxr] Improve incremental updating strategy
Tracking the number of unused holes in the trie and the range table was a bad metric based on which full trie and / or range rebuilds were trigg
[fib_algo][dxr] Improve incremental updating strategy
Tracking the number of unused holes in the trie and the range table was a bad metric based on which full trie and / or range rebuilds were triggered, which would happen in vain by far too frequently, particularly with live BGP feeds.
Instead, track the total unused space inside the trie and range table structures, and trigger rebuilds if the percentage of unused space exceeds a sysctl-tunable threshold.
MFC after: 3 days PR: 257965
show more ...
|
#
43880c51 |
| 25-Sep-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib_algo][dxr] Split unused range chunk list in multiple buckets
Traversing a single list of unused range chunks in search for a block of optimal size was suboptimal.
The experience with real-worl
[fib_algo][dxr] Split unused range chunk list in multiple buckets
Traversing a single list of unused range chunks in search for a block of optimal size was suboptimal.
The experience with real-world BGP workloads has shown that on average unused range chunks are tiny, mostly in length from 1 to 4 or 5, when DXR is configured with K = 20 which is the current default (D16X4R).
Therefore, introduce a limited amount of buckets to accomodate descriptors of empty blocks of fixed (small) size, so that those can be found in O(1) time. If no empty chunks of the requested size can be found in fixed-size buckets, the search continues in an unsorted list of empty chunks of variable lengths, which should only happen infrequently.
This change should permit us to manage significantly more empty range chunks without sacrifying the speed of incremental range table updating.
MFC after: 3 days
show more ...
|
#
2ac039f7 |
| 20-Sep-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib_algo][dxr] Merge adjacent empty range table chunks.
MFC after: 3 days
|
#
eb3148cc |
| 16-Sep-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib algo][dxr] Fix division by zero.
A division by zero would occur if DXR would be activated on a vnet with no IP addresses configured on any interfaces.
PR: 257965 MFC after: 3 days Reported by
[fib algo][dxr] Fix division by zero.
A division by zero would occur if DXR would be activated on a vnet with no IP addresses configured on any interfaces.
PR: 257965 MFC after: 3 days Reported by: Raul Munoz
show more ...
|
#
b51f8bae |
| 15-Sep-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib algo][dxr] Optimize trie updating.
Don't rebuild in vain trie parts unaffected by accumulated incremental RIB updates.
PR: 257965 Tested by: Konrad Kreciwilk MFC after: 3 days
|
#
442c8a24 |
| 15-Sep-2021 |
Marko Zec <zec@FreeBSD.org> |
[fib algo][dxr] Fix undefined behavior.
The result of shifting uint32_t by 32 (or more) is undefined: fix it.
|
#
2aca58e1 |
| 05-May-2021 |
Marko Zec <zec@FreeBSD.org> |
Introduce DXR as an IPv4 longest prefix matching / FIB module
DXR maintains compressed lookup structures with a trivial search procedure. A two-stage trie is indexed by the more significant bits of
Introduce DXR as an IPv4 longest prefix matching / FIB module
DXR maintains compressed lookup structures with a trivial search procedure. A two-stage trie is indexed by the more significant bits of the search key (IPv4 address), while the remaining bits are used for finding the next hop in a sorted array. The tradeoff between memory footprint and search speed depends on the split between the trie and the remaining binary search. The default of 20 bits of the key being used for trie indexing yields good performance (see below) with footprints of around 2.5 Bytes per prefix with current BGP snapshots.
Rebuilding lookup structures takes some time, which is compensated for by batching several RIB change requests into a single FIB update, i.e. FIB synchronization with the RIB may be delayed for a fraction of a second. RIB to FIB synchronization, next-hop table housekeeping, and lockless lookup capability is provided by the FIB_ALGO infrastructure.
DXR works well on modern CPUs with several MBytes of caches, especially in VMs, where is outperforms other currently available IPv4 FIB algorithms by a large margin.
Synthetic single-thread LPM throughput test method:
kldload test_lookup; kldload dpdk_lpm4; kldload fib_dxr sysctl net.route.test.run_lps_rnd=N sysctl net.route.test.run_lps_seq=N
where N is the number of randomly generated keys (IPv4 addresses) which should be chosen so that each test iteration runs for several seconds.
Each reported score represents the best of three runs, in million lookups per second (MLPS), for two bechmarks (RND & SEQ) with two FIBs:
host: single interface address, local subnet route + default route BGP: snapshot from linx.routeviews.org, 887957 prefixes, 496 next hops
Bhyve VM on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60 GHz: inet.algo host, RND host, SEQ BGP, RND BGP, SEQ bsearch4 40.6 20.2 N/A N/A radix4 7.8 3.8 1.2 0.6 radix4_lockless 18.0 9.0 1.6 0.8 dpdk_lpm4 14.4 5.0 14.6 5.0 dxr 70.3 34.7 43.0 19.5
Intel(R) Core(TM) i5-5300U CPU @ 2.30 GHz: inet.algo host, RND host, SEQ BGP, RND BGP, SEQ bsearch4 47.0 23.1 N/A N/A radix4 8.5 4.2 1.9 1.0 radix4_lockless 19.2 9.5 2.5 1.2 dpdk_lpm4 31.2 9.4 31.6 9.3 dxr 84.9 41.4 51.7 23.6
Intel(R) Core(TM) i7-4771 CPU @ 3.50 GHz: inet.algo host, RND host, SEQ BGP, RND BGP, SEQ bsearch4 59.5 29.4 N/A N/A radix4 10.8 5.5 2.5 1.3 radix4_lockless 24.7 12.0 3.1 1.6 dpdk_lpm4 29.1 9.0 30.2 9.1 dxr 101.3 49.9 69.8 32.5
AMD Ryzen 7 3700X 8-Core Processor @ 3.60 GHz: inet.algo host, RND host, SEQ BGP, RND BGP, SEQ bsearch4 70.8 35.4 N/A N/A radix4 14.4 7.2 2.8 1.4 radix4_lockless 30.2 15.1 3.7 1.8 dpdk_lpm4 29.9 9.0 30.0 8.9 dxr 163.3 81.5 99.5 44.4
AMD Ryzen 5 5600X 6-Core Processor @ 3.70 GHz: inet.algo host, RND host, SEQ BGP, RND BGP, SEQ bsearch4 93.6 46.7 N/A N/A radix4 18.9 9.3 4.3 2.1 radix4_lockless 37.2 18.6 5.3 2.7 dpdk_lpm4 51.8 15.1 51.6 14.9 dxr 218.2 103.3 114.0 49.0
Reviewed by: melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D29821
show more ...
|