#
1f628be8 |
| 05-Aug-2024 |
Andrew Gallatin <gallatin@FreeBSD.org> |
tcp_ratelimit: provide an api for drivers to release ratesets at detach
When the kernel is compiled with options RATELIMIT, the mlx5en driver cannot detach. It gets stuck waiting for all kernel users of its rates to drop to zero before finally calling ether_ifdetach.
The tcp ratelimit code has an eventhandler for ifnet departure which causes rates to be released. However, this is called as an ifnet departure eventhandler, which is invoked as part of ifdetach(), via ether_ifdetach(). This means that the tcp ratelimit code holds down many hw rates when the mlx5en driver is waiting for the rate count to go to 0. Thus devctl detach will deadlock on mlx5 with this stack:
mi_switch+0xcf
sleepq_timedwait+0x2f
_sleep+0x1a3
pause_sbt+0x77
mlx5e_destroy_ifp+0xaf
mlx5_remove_device+0xa7
mlx5_unregister_device+0x78
mlx5_unload_one+0x10a
remove_one+0x1e
linux_pci_detach_device+0x36
linux_pci_detach+0x24
device_detach+0x180
devctl2_ioctl+0x3dc
devfs_ioctl+0xbb
vn_ioctl+0xca
devfs_ioctl_f+0x1e
kern_ioctl+0x1c3
sys_ioctl+0x10a
To fix this, provide an explicit API for a driver to call the tcp ratelimit code telling it to detach itself from an ifnet. This allows the mlx5 driver to unload cleanly. I considered adding an ifnet pre-departure eventhandler. However, that would need to be invoked by the driver, so a simple function call seemed better.
The mlx5en driver has been updated to call this function.
Reviewed by: kib, rrs
Differential Revision: https://reviews.freebsd.org/D46221 Sponsored by: Netflix
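The ordering problem described in this commit can be illustrated with a small userspace model. This is only a sketch under stated assumptions: every `toy_` name is hypothetical and none of it is the kernel's real API. It shows why a release that only happens from the departure event (fired inside the detach step) can never unblock a driver that waits for the rate count to reach zero before detaching, and how an explicit release call made by the driver first breaks the cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the kernel's structures. */
struct toy_ifnet {
	int hw_rates_in_use;	/* hardware rates still held by users */
	bool detached;
};

struct toy_rate_set {
	struct toy_ifnet *ifp;	/* interface whose rates are held */
	int held;		/* number of rates held on that interface */
};

/* Explicit release: drop every rate the rate-limit layer holds on ifp. */
static void
toy_rl_release_ifnet(struct toy_rate_set *rs, struct toy_ifnet *ifp)
{
	if (rs->ifp != ifp)
		return;
	assert(ifp->hw_rates_in_use >= rs->held);
	ifp->hw_rates_in_use -= rs->held;
	rs->held = 0;
	rs->ifp = NULL;
}

/*
 * Driver detach. Returns false at the point where the modelled driver
 * would sleep forever waiting for the rate count to reach zero.
 */
static bool
toy_driver_detach(struct toy_ifnet *ifp, struct toy_rate_set *rs,
    bool call_release_first)
{
	if (call_release_first)
		toy_rl_release_ifnet(rs, ifp);
	if (ifp->hw_rates_in_use != 0)
		return (false);	/* departure event is never reached */
	/* Detach step: the departure event would only fire here. */
	toy_rl_release_ifnet(rs, ifp);
	ifp->detached = true;
	return (true);
}
```

With `call_release_first` set to false the model reports the stall; with it set to true the detach completes.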
|
Revision tags: release/14.1.0, release/13.3.0, release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0 |
|
#
69c7c811 |
| 16-Mar-2023 |
Randall Stewart <rrs@FreeBSD.org> |
Move access to tcp's t_logstate into inline functions and provide new tracepoint and bbpoint capabilities.
The TCP stacks have long accessed t_logstate directly, but in order to do tracepoints and the new bbpoints we need to move to using the new inline functions. This adds them and moves rack to now use the tcp_tracepoints.
Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D38831
|
#
c0e4090e |
| 08-Feb-2023 |
Andrew Gallatin <gallatin@FreeBSD.org> |
ktls: Accurately track if ifnet ktls is enabled
This allows us to avoid spurious calls to ktls_disable_ifnet()
When we implemented ifnet kTLS, we set a flag in the tx socket buffer (SB_TLS_IFNET) to indicate ifnet kTLS. This flag meant that now, or in the past, ifnet ktls was active on a socket. Later, I added code to switch ifnet ktls sessions to software in the case of lossy TCP connections that have a high retransmit rate. Because TCP was using SB_TLS_IFNET to know if it needed to do math to calculate the retransmit ratio and potentially call into ktls_disable_ifnet(), it was doing unneeded work long after a session was moved to software.
This patch carefully tracks whether or not ifnet ktls is still enabled on a TCP connection. Because the inp is now embedded in the tcpcb, and because TCP is the most frequent accessor of this state, it made sense to move this from the socket buffer flags to the tcpcb. Because we now need reliable access to the tcpcb, we take a ref on the inp when creating a tx ktls session.
While here, I noticed that rack/bbr were incorrectly implementing tfb_hwtls_change(), and applying the change to all pending sends, when it should apply only to future sends.
This change reduces spurious calls to ktls_disable_ifnet() by 95% or so in a Netflix CDN environment.
Reviewed by: markj, rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D38380
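The effect of tracking "still enabled" rather than "was ever enabled" can be sketched in userspace C. This is an illustration only: the `toy_` names, the byte counters, and the percentage threshold are all assumptions, not the kernel's fields or its actual retransmit-ratio formula. The point is that once the session has moved to software, the flag is cleared and later retransmits skip the ratio math entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-connection state; not the real tcpcb layout. */
struct toy_tcpcb {
	bool hw_tls;		/* ifnet kTLS is active right now */
	uint64_t sent_bytes;
	uint64_t rexmit_bytes;
	unsigned disable_calls;	/* how often the disable path ran */
};

/* Stand-in for switching the session to software TLS. */
static void
toy_ktls_disable_ifnet(struct toy_tcpcb *tp)
{
	tp->disable_calls++;
	tp->hw_tls = false;
}

/* Called for each retransmission of len bytes. */
static void
toy_on_retransmit(struct toy_tcpcb *tp, uint64_t len, unsigned max_pct)
{
	tp->rexmit_bytes += len;
	if (!tp->hw_tls)
		return;		/* already in software: no ratio math */
	if (tp->sent_bytes == 0)
		return;
	/* Assumed policy: disable when retransmits exceed max_pct percent. */
	if (tp->rexmit_bytes * 100 > tp->sent_bytes * max_pct)
		toy_ktls_disable_ifnet(tp);
}
```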
|
#
3d0d5b21 |
| 23-Jan-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
IfAPI: Explicitly include <net/if_private.h> in netstack
Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future.
Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38200
|
#
26bdd35c |
| 05-Jan-2023 |
Randall Stewart <rrs@FreeBSD.org> |
rack and bbr not loading if TCP_RATELIMIT is not configured.
So it turns out that rack and bbr still will not load without TCP_RATELIMIT. This needs to be fixed, and let's also at the same time bring tcp_ratelimit up to date so that the transports can set a divisor (while keeping a default path with the default divisor of 1000) for setting the burst size.
Reviewed by: tuexen, gallatin Sponsored by: Netflix Inc Differential Revision: https://reviews.freebsd.org/D37954
|
#
eaabc937 |
| 14-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: retire TCPDEBUG
This subsystem is superseded by modern debugging facilities, e.g. DTrace probes and TCP black box logging.
We intentionally leave SO_DEBUG in place, as many utilities may set it on a socket. Also the tcp::debug DTrace probes look at this flag on a socket.
Reviewed by: gnn, tuexen Discussed with: rscheff, rrs, jtl Differential revision: https://reviews.freebsd.org/D37694
|
Revision tags: release/12.4.0 |
|
#
9eb0e832 |
| 08-Nov-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: provide macros to access inpcb and socket from a tcpcb
There should be no functional changes with this commit.
Reviewed by: rscheff Differential revision: https://reviews.freebsd.org/D37123
|
#
0ab46f28 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
tcp: remove unnecessary include of tcp6_var.h
Reviewed by: rscheff, melifaro Differential revision: https://reviews.freebsd.org/D36725
|
Revision tags: release/13.1.0 |
|
#
d782385e |
| 01-Feb-2022 |
John Baldwin <jhb@FreeBSD.org> |
tcp_ratelimit: Handle some edge cases with TLS + RL send tags.
- After a connection has fallen back from NIC TLS to SW TLS, any pacing rate changes should modify the inpcb send tag even though SB_TLS_IFNET is set.
- If a connection tries to modify the pacing rate before the send tag has been converted from plain TLS to TLS + RL, don't fail the rate request set but let it fall through to setting the rate on the non-TLS inpcb RL tag.
Reviewed by: gallatin, rrs, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34085
|
#
8a7404b2 |
| 27-Jan-2022 |
Andrew Gallatin <gallatin@FreeBSD.org> |
tcp: fix leaks in tcp_chg_pacing_rate error paths
tcp_chg_pacing_rate() is expected to release the hw rate limit table, but failed to do so in several error cases, leading to ever increasing counts of flows using the rate.
This patch was mostly done by rrs
Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D34058 Reviewed by: hselasky, rrs, jhb (initial version, outside of Differential)
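The class of bug fixed here, a reference that is dropped on the success path but not on every error path, is commonly avoided by funnelling all exits through one release point. The following is a minimal userspace sketch of that pattern, not the actual tcp_chg_pacing_rate() code: the `toy_` names, the arguments, and the assumption that the callee always consumes the caller's reference on the old rate are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical rate table entry with a count of flows using it. */
struct toy_rate_table {
	int flows_using;
};

static void
toy_rt_release(struct toy_rate_table *rt)
{
	assert(rt->flows_using > 0);
	rt->flows_using--;
}

/*
 * Change the pacing rate. The reference on old_rt is released on every
 * path, including the error returns, so the count cannot creep upward.
 */
static int
toy_chg_pacing_rate(struct toy_rate_table *old_rt, bool have_tag, bool hw_ok)
{
	int error = 0;

	if (!have_tag) {
		error = EINVAL;
		goto done;
	}
	if (!hw_ok) {
		error = ENOBUFS;
		goto done;
	}
	/* Success: the flow would now run at the new rate. */
done:
	toy_rt_release(old_rt);
	return (error);
}
```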
|
#
aac52f94 |
| 18-Jan-2022 |
Randall Stewart <rrs@FreeBSD.org> |
tcp: Warning cleanup from new compiler.
The clang compiler recently got an update that generates warnings of unused variables where they were set, and then never used. This revision goes through the tcp stack and cleans all of those up.
Reviewed by: Michael Tuexen, Gleb Smirnoff Sponsored by: Netflix Inc. Differential Revision:
|
Revision tags: release/12.3.0 |
|
#
c782ea8b |
| 14-Sep-2021 |
John Baldwin <jhb@FreeBSD.org> |
Add a switch structure for send tags.
Move the type and function pointers for operations on existing send tags (modify, query, next, free) out of 'struct ifnet' and into a new 'struct if_snd_tag_sw'. A pointer to this structure is added to the generic part of send tags and is initialized by m_snd_tag_init() (which now accepts a switch structure as a new argument in place of the type).
Previously, device driver ifnet methods switched on the type to call type-specific functions. Now, those type-specific functions are saved in the switch structure and invoked directly. In addition, this more gracefully permits multiple implementations of the same tag within a driver. In particular, NIC TLS for future Chelsio adapters will use a different implementation than the existing NIC TLS support for T6 adapters.
Reviewed by: gallatin, hselasky, kib (older version) Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D31572
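The design described above, a per-tag pointer to a table of operations instead of a switch on the tag type, can be sketched as follows. This is a simplified userspace model: the `toy_` structures, their fields, and the two example implementations are assumptions for illustration and do not reproduce the real `struct if_snd_tag_sw` layout or the m_snd_tag_init() signature.

```c
#include <assert.h>
#include <errno.h>

enum toy_tag_type { TOY_TAG_RATE_LIMIT, TOY_TAG_TLS };

struct toy_snd_tag;

/* Operation table shared by all tags of one implementation. */
struct toy_snd_tag_sw {
	int (*modify)(struct toy_snd_tag *, unsigned long new_rate);
	void (*free_tag)(struct toy_snd_tag *);
	enum toy_tag_type type;
};

struct toy_snd_tag {
	const struct toy_snd_tag_sw *sw;	/* set once at init */
	unsigned long rate;
	int freed;
};

/* Init takes the operation table rather than a bare type value. */
static void
toy_snd_tag_init(struct toy_snd_tag *t, const struct toy_snd_tag_sw *sw)
{
	t->sw = sw;
	t->rate = 0;
	t->freed = 0;
}

static int
toy_rl_modify(struct toy_snd_tag *t, unsigned long new_rate)
{
	t->rate = new_rate;
	return (0);
}

static int
toy_tls_modify(struct toy_snd_tag *t, unsigned long new_rate)
{
	(void)t;
	(void)new_rate;
	return (EINVAL);	/* a plain TLS tag has no rate to change */
}

static void
toy_tag_free(struct toy_snd_tag *t)
{
	t->freed = 1;
}

static const struct toy_snd_tag_sw toy_rl_sw = {
	toy_rl_modify, toy_tag_free, TOY_TAG_RATE_LIMIT
};
static const struct toy_snd_tag_sw toy_tls_sw = {
	toy_tls_modify, toy_tag_free, TOY_TAG_TLS
};

/* Generic entry point: dispatches directly, no switch on type. */
static int
toy_snd_tag_modify(struct toy_snd_tag *t, unsigned long new_rate)
{
	return (t->sw->modify(t, new_rate));
}
```

Because each tag carries its own table, two implementations of the same tag type can coexist in one driver simply by defining two tables.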
|
#
5d8fd932 |
| 06-May-2021 |
Randall Stewart <rrs@FreeBSD.org> |
This brings FreeBSD into sync with the Netflix versions of rack and bbr. This fixes several breakages (panics) that have been reported since the tcp_lro code was committed. Quite a few new features are now in rack (perfecting of DGP -- Dynamic Goodput Pacing among the largest). There is also support for ack-war prevention. Documents coming soon on rack.
Sponsored by: Netflix Reviewed by: rscheff, mtuexen Differential Revision: https://reviews.freebsd.org/D30036
|
Revision tags: release/13.0.0 |
|
#
db46c0d0 |
| 01-Feb-2021 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Fix LINT kernel builds after 1a714ff20419.
MFC after: 1 week Discussed with: rrs@ Differential Revision: https://reviews.freebsd.org/D28357 Sponsored by: Mellanox Technologies // NVIDIA Networking
|
#
1a714ff2 |
| 26-Jan-2021 |
Randall Stewart <rrs@FreeBSD.org> |
This pulls over all the changes that are in the Netflix tree that fix the ratelimit code. There were several bugs in tcp_ratelimit itself and we needed further work to support the multiple tag format coming for the joint TLS and Ratelimit dances.
Sponsored by: Netflix Inc. Differential Revision: https://reviews.freebsd.org/D28357
|
#
36e0a362 |
| 30-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Add m_snd_tag_alloc() as a wrapper around if_snd_tag_alloc().
This gives a more uniform API for send tag life cycle management.
Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D27000
|
#
98d7a8d9 |
| 29-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Call m_snd_tag_rele() to free send tags.
Send tags are refcounted and if_snd_tag_free() is called by m_snd_tag_rele() when the last reference is dropped on a send tag.
Reviewed by: gallatin, hselasky Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26995
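The reference-counting rule stated in this commit (callers drop references, and the free routine runs only when the last one goes away) looks roughly like this in a userspace model. The `toy_` names and the callback field are hypothetical and are not the real m_snd_tag_rele() or if_snd_tag_free() interfaces.

```c
#include <assert.h>

/* Hypothetical refcounted tag. */
struct toy_ref_tag {
	unsigned refcount;
	void (*free_fn)(struct toy_ref_tag *);	/* driver's free routine */
	int freed;
};

static void
toy_tag_mark_freed(struct toy_ref_tag *t)
{
	t->freed = 1;
}

static void
toy_tag_ref(struct toy_ref_tag *t)
{
	t->refcount++;
}

/* Drop one reference; the free routine runs only on the last drop. */
static void
toy_tag_rele(struct toy_ref_tag *t)
{
	assert(t->refcount > 0);
	if (--t->refcount == 0)
		t->free_fn(t);
}
```

Callers never invoke the free routine themselves; they only call the release function, which keeps a tag alive while any other holder still references it.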
|
#
7552deb2 |
| 29-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Remove an extra if_ref().
In r348254, if_snd_tag_alloc() routines were changed to bump the ifp refcount via m_snd_tag_init(). This function wasn't in the tree at the time and wasn't updated for the new semantics, so was still doing a separate bump after if_snd_tag_alloc() returned.
Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26999
|
#
521eac97 |
| 29-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Support hardware rate limiting (pacing) with TLS offload.
- Add a new send tag type for a send tag that supports both rate limiting (packet pacing) and TLS offload (mostly similar to D22669 but adds a separate structure when allocating the new tag type).
- When allocating a send tag for TLS offload, check to see if the connection already has a pacing rate. If so, allocate a tag that supports both rate limiting and TLS offload rather than a plain TLS offload tag.
- When setting an initial rate on an existing ifnet KTLS connection, set the rate in the TCP control block inp and then reset the TLS send tag (via ktls_output_eagain) to reallocate a TLS + ratelimit send tag. This allocates the TLS send tag asynchronously from a task queue, so the TLS rate limit tag alloc is always sleepable.
- When modifying a rate on a connection using KTLS, look for a TLS send tag. If the send tag is only a plain TLS send tag, assume we failed to allocate a TLS ratelimit tag (either during the TCP_TXTLS_ENABLE socket option, or during the send tag reset triggered by ktls_output_eagain) and ignore the new rate. If the send tag is a ratelimit TLS send tag, change the rate on the TLS tag and leave the inp tag alone.
- Lock the inp lock when setting sb_tls_info for a socket send buffer so that the routines in tcp_ratelimit can safely dereference the pointer without needing to grab the socket buffer lock.
- Add an IFCAP_TXTLS_RTLMT capability flag and associated administrative controls in ifconfig(8). TLS rate limit tags are only allocated if this capability is enabled. Note that TLS offload (whether unlimited or rate limited) always requires IFCAP_TXTLS[46].
Reviewed by: gallatin, hselasky Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26691
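The rate-modification rule from the list above (as of this commit, before later edge-case changes) can be summarised as a small decision function. This is a sketch under assumptions: the enum, the structure, and the boolean return value are hypothetical simplifications and do not mirror the kernel's data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Which kind of TLS send tag, if any, the connection currently has. */
enum toy_tls_tag { TOY_NO_TLS, TOY_TLS_PLAIN, TOY_TLS_RATE_LIMIT };

struct toy_conn {
	enum toy_tls_tag tls_tag;
	unsigned long tls_tag_rate;	/* rate carried by the TLS + RL tag */
	unsigned long inp_tag_rate;	/* rate carried by the inp tag */
};

/* Returns true if the new rate was applied to some tag. */
static bool
toy_modify_rate(struct toy_conn *c, unsigned long rate)
{
	switch (c->tls_tag) {
	case TOY_TLS_RATE_LIMIT:
		/* Change the TLS tag and leave the inp tag alone. */
		c->tls_tag_rate = rate;
		return (true);
	case TOY_TLS_PLAIN:
		/* Assume the TLS + RL allocation failed; ignore the rate. */
		return (false);
	case TOY_NO_TLS:
	default:
		c->inp_tag_rate = rate;
		return (true);
	}
}
```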
|
#
ce398115 |
| 29-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Save the current TCP pacing rate in t_pacing_rate.
Reviewed by: gallatin, gnn Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26875
|
Revision tags: release/12.2.0 |
|
#
9aed26b9 |
| 06-Oct-2020 |
John Baldwin <jhb@FreeBSD.org> |
Check if_capenable, not if_capabilities when enabling rate limiting.
if_capabilities is a read-only mask of supported capabilities. if_capenable is a mask under administrative control via ifconfig(8).
Reviewed by: gallatin Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D26690
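The distinction between the two masks can be shown with a short userspace sketch. The structure and the flag value here are hypothetical stand-ins (the bit value is not the kernel's definition); the point is only that the check consults the administratively enabled mask, not the mask of what the hardware supports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_IFCAP_TXRTLMT 0x1u	/* illustrative bit, not the real value */

struct toy_if {
	uint32_t capabilities;	/* what the device supports (read-only) */
	uint32_t capenable;	/* what the administrator has enabled */
};

/* Rate limiting is allowed only when the capability is enabled. */
static bool
toy_rate_limit_allowed(const struct toy_if *ifp)
{
	return ((ifp->capenable & TOY_IFCAP_TXRTLMT) != 0);
}
```

A device that supports rate limiting but has it switched off via ifconfig(8) is therefore treated as not rate-limit capable.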
|
#
662c1305 |
| 01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
Revision tags: release/11.4.0 |
|
#
28540ab1 |
| 08-Apr-2020 |
Warner Losh <imp@FreeBSD.org> |
Fix copyright year and eliminate the obsolete all rights reserved line.
Reviewed by: rrs@
|
#
c012cfe6 |
| 28-Mar-2020 |
Ed Maste <emaste@FreeBSD.org> |
sys/netinet: remove spurious doubled ;s
|