History log of /freebsd/sys/netinet/in_fib_dxr.c (Results 1 – 21 of 21)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 42b3c16e 17-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: code hygiene, prune old code, no functional changes

The !DXR2 code corresponds to the original DXR encoding proposal from
2012 with a single direct-lookup stage, which is inferior to the mo

fib_dxr: code hygiene, prune old code, no functional changes

The !DXR2 code corresponds to the original DXR encoding proposal from
2012 with a single direct-lookup stage, which is inferior to the more
recent (DXR2) variant with two-stage trie both in terms of memory
footprint of the lookup structures, and in terms of overall lookup
througput.

I'm axing the old code chunks to (hopefully) somewhat improve readability,
as well as to simplify future maintenance and updates.

MFC after: 1 week

show more ...


# 19bd24ca 17-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: do not leak memory if FIB constellation hits structural limit

DXR lookup table encoding has an inherent structural limit on the amount
of binary search ranges it can accomodate. With the c

fib_dxr: do not leak memory if FIB constellation hits structural limit

DXR lookup table encoding has an inherent structural limit on the amount
of binary search ranges it can accomodate. With the current IPv4 BGP views
(circa 1 M prefixes) and default DXR encoding we are only at around 5% of
that limit, so far, far away from hitting it. Just in case it ever gets
hit, make sure we free the allocated structures, instead of leaking it.

MFC after: 1 week

show more ...


# 4ab122e8 17-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: check if cached fib_data matches the new request in dxr_init()

When calling dxr_init(), the FIB_ALGO infrastructure may provide a
pointer to a previous dxr instance, which permits reuse of

fib_dxr: check if cached fib_data matches the new request in dxr_init()

When calling dxr_init(), the FIB_ALGO infrastructure may provide a
pointer to a previous dxr instance, which permits reuse of auxiliary
dxr structures, i.e. incremental lookup structure updates. For dxr this
is a crucial feature provided by FIB_ALGO, since dxr incremental updates
are typically several orders of magnitude faster than full lookup table
rebuilds.

However, the auxiliary dxr structure caches a pointer to struct fib_data and
relies upon it for performing incremental updates. Apparently, incremental
rebuild requests from FIB_ALGO, i.e. a calls to dxr_init() with a pointer
old_data set, may (under not yet fully understood circumstances) be invoked
within a different fib_data context than the one cached in the previous
version of dxr auxiliary structures. In such (rare) events, we ignore the
offered old dxr context, and proceed with a full lookup structure rebuild
instead of attempting an incremental one using a fib_data context which
may or may not no longer be valid, and thus lead to a system crash.

PR: 278422
MFC after: 1 week

show more ...


# b24e353f 07-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: set fib_data field in struct dxr_aux early enough

Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures. This
may have led

fib_dxr: set fib_data field in struct dxr_aux early enough

Previously it was possible for dxr_build() to return with da->fd
unset in case of range_tbl or x_tbl malloc() failures. This
may have led to NULL ptr dereferencing in dxr_change_rib_batch().

MFC after: 1 week

PR: 278422

show more ...


# 4aa275f1 07-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: s/KASSERT/MPASS/

MFC after: 1 week


# 7a5de1d4 07-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: KASSERTs for chasing NULL ptr and runaway refcount suspects

MFC after: 1 week


# ed541e20 07-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: move the bulko of malloc() failure logging into dxr_build()


# 5295e891 06-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: update comment.

MFC after: 1 week


# 85801064 06-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: free() does nothing if arg is NULL, so remove a redundant check.

MFC after: 1 week


# 308caa38 06-May-2024 Marko Zec <zec@FreeBSD.org>

fib_dxr: log malloc() failures.

MFC after: 1 week


Revision tags: release/13.3.0, release/14.0.0
# 685dc743 16-Aug-2023 Warner Losh <imp@FreeBSD.org>

sys: Remove $FreeBSD$: one-line .c pattern

Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/


# 4d846d26 10-May-2023 Warner Losh <imp@FreeBSD.org>

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of

spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD

The SPDX folks have obsoleted the BSD-2-Clause-FreeBSD identifier. Catch
up to that fact and revert to their recommended match of BSD-2-Clause.

Discussed with: pfg
MFC After: 3 days
Sponsored by: Netflix

show more ...


Revision tags: release/13.2.0, release/12.4.0, release/13.1.0
# e7abe200 17-Jan-2022 Marko Zec <zec@FreeBSD.org>

fib_algo: shift / mask by constants in dxr_lookup()

Since trie configuration remains invariant during each DXR instance
lifetime, instead of shifting and masking lookup keys by values
computed at ru

fib_algo: shift / mask by constants in dxr_lookup()

Since trie configuration remains invariant during each DXR instance
lifetime, instead of shifting and masking lookup keys by values
computed at runtime, compile upfront several dxr_lookup()
configurations with hardcoded shift / mask constants, and choose the
apropriate lookup function version after each DXR instance rebuild.

In synthetic tests this yields small but measurable (5-10%) lookup
throughput improvement, depending on FIB size and prefix patterns.

MFC after: 3 days

show more ...


Revision tags: release/12.3.0
# bc8b8e10 09-Oct-2021 Marko Zec <zec@FreeBSD.org>

[fib_algo][dxr] Retire counters which are no longer used

The number of chunks can still be tracked via vmstat -z|fgrep dxr.

MFC after: 3 days


# 1549575f 09-Oct-2021 Marko Zec <zec@FreeBSD.org>

[fib_algo][dxr] Improve incremental updating strategy

Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were trigg

[fib_algo][dxr] Improve incremental updating strategy

Tracking the number of unused holes in the trie and the range table
was a bad metric based on which full trie and / or range rebuilds
were triggered, which would happen in vain by far too frequently,
particularly with live BGP feeds.

Instead, track the total unused space inside the trie and range table
structures, and trigger rebuilds if the percentage of unused space
exceeds a sysctl-tunable threshold.

MFC after: 3 days
PR: 257965

show more ...


# 43880c51 25-Sep-2021 Marko Zec <zec@FreeBSD.org>

[fib_algo][dxr] Split unused range chunk list in multiple buckets

Traversing a single list of unused range chunks in search for a block
of optimal size was suboptimal.

The experience with real-worl

[fib_algo][dxr] Split unused range chunk list in multiple buckets

Traversing a single list of unused range chunks in search for a block
of optimal size was suboptimal.

The experience with real-world BGP workloads has shown that on average
unused range chunks are tiny, mostly in length from 1 to 4 or 5, when
DXR is configured with K = 20 which is the current default (D16X4R).

Therefore, introduce a limited amount of buckets to accomodate descriptors
of empty blocks of fixed (small) size, so that those can be found in O(1)
time. If no empty chunks of the requested size can be found in fixed-size
buckets, the search continues in an unsorted list of empty chunks of
variable lengths, which should only happen infrequently.

This change should permit us to manage significantly more empty range
chunks without sacrifying the speed of incremental range table updating.

MFC after: 3 days

show more ...


# 2ac039f7 20-Sep-2021 Marko Zec <zec@FreeBSD.org>

[fib_algo][dxr] Merge adjacent empty range table chunks.

MFC after: 3 days


# eb3148cc 16-Sep-2021 Marko Zec <zec@FreeBSD.org>

[fib algo][dxr] Fix division by zero.

A division by zero would occur if DXR would be activated on a vnet
with no IP addresses configured on any interfaces.

PR: 257965
MFC after: 3 days
Reported by

[fib algo][dxr] Fix division by zero.

A division by zero would occur if DXR would be activated on a vnet
with no IP addresses configured on any interfaces.

PR: 257965
MFC after: 3 days
Reported by: Raul Munoz

show more ...


# b51f8bae 15-Sep-2021 Marko Zec <zec@FreeBSD.org>

[fib algo][dxr] Optimize trie updating.

Don't rebuild in vain trie parts unaffected by accumulated incremental
RIB updates.

PR: 257965
Tested by: Konrad Kreciwilk
MFC after: 3 days


# 442c8a24 15-Sep-2021 Marko Zec <zec@FreeBSD.org>

[fib algo][dxr] Fix undefined behavior.

The result of shifting uint32_t by 32 (or more) is undefined: fix it.


# 2aca58e1 05-May-2021 Marko Zec <zec@FreeBSD.org>

Introduce DXR as an IPv4 longest prefix matching / FIB module

DXR maintains compressed lookup structures with a trivial search
procedure. A two-stage trie is indexed by the more significant bits of

Introduce DXR as an IPv4 longest prefix matching / FIB module

DXR maintains compressed lookup structures with a trivial search
procedure. A two-stage trie is indexed by the more significant bits of
the search key (IPv4 address), while the remaining bits are used for
finding the next hop in a sorted array. The tradeoff between memory
footprint and search speed depends on the split between the trie and
the remaining binary search. The default of 20 bits of the key being
used for trie indexing yields good performance (see below) with
footprints of around 2.5 Bytes per prefix with current BGP snapshots.

Rebuilding lookup structures takes some time, which is compensated for by
batching several RIB change requests into a single FIB update, i.e. FIB
synchronization with the RIB may be delayed for a fraction of a second.
RIB to FIB synchronization, next-hop table housekeeping, and lockless
lookup capability is provided by the FIB_ALGO infrastructure.

DXR works well on modern CPUs with several MBytes of caches, especially
in VMs, where is outperforms other currently available IPv4 FIB
algorithms by a large margin.

Synthetic single-thread LPM throughput test method:

kldload test_lookup; kldload dpdk_lpm4; kldload fib_dxr
sysctl net.route.test.run_lps_rnd=N
sysctl net.route.test.run_lps_seq=N

where N is the number of randomly generated keys (IPv4 addresses) which
should be chosen so that each test iteration runs for several seconds.

Each reported score represents the best of three runs, in million
lookups per second (MLPS), for two bechmarks (RND & SEQ) with two FIBs:

host: single interface address, local subnet route + default route
BGP: snapshot from linx.routeviews.org, 887957 prefixes, 496 next hops

Bhyve VM on an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 40.6 20.2 N/A N/A
radix4 7.8 3.8 1.2 0.6
radix4_lockless 18.0 9.0 1.6 0.8
dpdk_lpm4 14.4 5.0 14.6 5.0
dxr 70.3 34.7 43.0 19.5

Intel(R) Core(TM) i5-5300U CPU @ 2.30 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 47.0 23.1 N/A N/A
radix4 8.5 4.2 1.9 1.0
radix4_lockless 19.2 9.5 2.5 1.2
dpdk_lpm4 31.2 9.4 31.6 9.3
dxr 84.9 41.4 51.7 23.6

Intel(R) Core(TM) i7-4771 CPU @ 3.50 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 59.5 29.4 N/A N/A
radix4 10.8 5.5 2.5 1.3
radix4_lockless 24.7 12.0 3.1 1.6
dpdk_lpm4 29.1 9.0 30.2 9.1
dxr 101.3 49.9 69.8 32.5

AMD Ryzen 7 3700X 8-Core Processor @ 3.60 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 70.8 35.4 N/A N/A
radix4 14.4 7.2 2.8 1.4
radix4_lockless 30.2 15.1 3.7 1.8
dpdk_lpm4 29.9 9.0 30.0 8.9
dxr 163.3 81.5 99.5 44.4

AMD Ryzen 5 5600X 6-Core Processor @ 3.70 GHz:
inet.algo host, RND host, SEQ BGP, RND BGP, SEQ
bsearch4 93.6 46.7 N/A N/A
radix4 18.9 9.3 4.3 2.1
radix4_lockless 37.2 18.6 5.3 2.7
dpdk_lpm4 51.8 15.1 51.6 14.9
dxr 218.2 103.3 114.0 49.0

Reviewed by: melifaro
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D29821

show more ...