History log of /linux/tools/testing/selftests/net/tcp_port_share.c (Results 1 – 3 of 3)
Revision (<<< Hide revision tags) (Show revision tags >>>) Date Author Comments
# 07fdad3a 03-Oct-2025 Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core & protocols:

- Improve drop account scalability on NUM

Merge tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next

Pull networking updates from Paolo Abeni:
"Core & protocols:

- Improve drop account scalability on NUMA hosts for RAW and UDP
sockets and the backlog, almost doubling the Pps capacity under DoS

- Optimize the UDP RX performance under stress, reducing contention,
revisiting the binary layout of the involved data structs and
implementing NUMA-aware locking. This improves UDP RX performance
by an additional 50%, even more under extreme conditions

- Add support for PSP encryption of TCP connections; this mechanism
has some similarities with IPsec and TLS, but offers superior HW
offloads capabilities

- Ongoing work to support Accurate ECN for TCP. AccECN allows more
than one congestion notification signal per RTT and is a building
block for Low Latency, Low Loss, and Scalable Throughput (L4S)

- Reorganize the TCP socket binary layout for data locality, reducing
the number of touched cachelines in the fastpath

- Refactor skb deferral free to better scale on large multi-NUMA
hosts, this improves TCP and UDP RX performances significantly on
such HW

- Increase the default socket memory buffer limits from 256K to 4M to
better fit modern link speeds

- Improve handling of setups with a large number of nexthop, making
dump operating scaling linearly and avoiding unneeded
synchronize_rcu() on delete

- Improve bridge handling of VLAN FDB, storing a single entry per
bridge instead of one entry per port; this makes the dump order of
magnitude faster on large switches

- Restore IP ID correctly for encapsulated packets at GSO
segmentation time, allowing GRO to merge packets in more scenarios

- Improve netfilter matching performance on large sets

- Improve MPTCP receive path performance by leveraging recently
introduced core infrastructure (skb deferral free) and adopting
recent TCP autotuning changes

- Allow bridges to redirect to a backup port when the bridge port is
administratively down

- Introduce MPTCP 'laminar' endpoint that con be used only once per
connection and simplify common MPTCP setups

- Add RCU safety to dst->dev, closing a lot of possible races

- A significant crypto library API for SCTP, MPTCP and IPv6 SR,
reducing code duplication

- Supports pulling data from an skb frag into the linear area of an
XDP buffer

Things we sprinkled into general kernel code:

- Generate netlink documentation from YAML using an integrated YAML
parser

Driver API:

- Support using IPv6 Flow Label in Rx hash computation and RSS queue
selection

- Introduce API for fetching the DMA device for a given queue,
allowing TCP zerocopy RX on more H/W setups

- Make XDP helpers compatible with unreadable memory, allowing more
easily building DevMem-enabled drivers with a unified XDP/skbs
datapath

- Add a new dedicated ethtool callback enabling drivers to provide
the number of RX rings directly, improving efficiency and clarity
in RX ring queries and RSS configuration

- Introduce a burst period for the health reporter, allowing better
handling of multiple errors due to the same root cause

- Support for DPLL phase offset exponential moving average,
controlling the average smoothing factor

Device drivers:

- Add a new Huawei driver for 3rd gen NIC (hinic3)

- Add a new SpacemiT driver for K1 ethernet MAC

- Add a generic abstraction for shared memory communication
devices (dibps)

- Ethernet high-speed NICs:
- nVidia/Mellanox:
- Use multiple per-queue doorbell, to avoid MMIO contention
issues
- support adjacent functions, allowing them to delegate their
SR-IOV VFs to sibling PFs
- support RSS for IPSec offload
- support exposing raw cycle counters in PTP and mlx5
- support for disabling host PFs.
- Intel (100G, ice, idpf):
- ice: support for SRIOV VFs over an Active-Active link
aggregate
- ice: support for firmware logging via debugfs
- ice: support for Earliest TxTime First (ETF) hardware offload
- idpf: support basic XDP functionalities and XSk
- Broadcom (bnxt):
- support Hyper-V VF ID
- dynamic SRIOV resource allocations for RoCE
- Meta (fbnic):
- support queue API, zero-copy Rx and Tx
- support basic XDP functionalities
- devlink health support for FW crashes and OTP mem corruptions
- expand hardware stats coverage to FEC, PHY, and Pause
- Wangxun:
- support ethtool coalesce options
- support for multiple RSS contexts

- Ethernet virtual:
- Macsec:
- replace custom netlink attribute checks with policy-level
checks
- Bonding:
- support aggregator selection based on port priority
- Microsoft vNIC:
- use page pool fragments for RX buffers instead of full pages
to improve memory efficiency

- Ethernet NICs consumer, and embedded:
- Qualcomm: support Ethernet function for IPQ9574 SoC
- Airoha: implement wlan offloading via NPU
- Freescale
- enetc: add NETC timer PTP driver and add PTP support
- fec: enable the Jumbo frame support for i.MX8QM
- Renesas (R-Car S4):
- support HW offloading for layer 2 switching
- support for RZ/{T2H, N2H} SoCs
- Cadence (macb): support TAPRIO traffic scheduling
- TI:
- support for Gigabit ICSS ethernet SoC (icssm-prueth)
- Synopsys (stmmac): a lot of cleanups

- Ethernet PHYs:
- Support 10g-qxgmi phy-mode for AQR412C, Felix DSA and Lynx PCS
driver
- Support bcm63268 GPHY power control
- Support for Micrel lan8842 PHY and PTP
- Support for Aquantia AQR412 and AQR115

- CAN:
- a large CAN-XL preparation work
- reorganize raw_sock and uniqframe struct to minimize memory
usage
- rcar_canfd: update the CAN-FD handling

- WiFi:
- extended Neighbor Awareness Networking (NAN) support
- S1G channel representation cleanup
- improve S1G support

- WiFi drivers:
- Intel (iwlwifi):
- major refactor and cleanup
- Broadcom (brcm80211):
- support for AP isolation
- RealTek (rtw88/89) rtw88/89:
- preparation work for RTL8922DE support
- MediaTek (mt76):
- HW restart improvements
- MLO support
- Qualcomm/Atheros (ath10k):
- GTK rekey fixes

- Bluetooth drivers:
- btusb: support for several new IDs for MT7925
- btintel: support for BlazarIW core
- btintel_pcie: support for _suspend() / _resume()
- btintel_pcie: support for Scorpious, Panther Lake-H484 IDs"

* tag 'net-next-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1536 commits)
net: stmmac: Add support for Allwinner A523 GMAC200
dt-bindings: net: sun8i-emac: Add A523 GMAC200 compatible
Revert "Documentation: net: add flow control guide and document ethtool API"
octeontx2-pf: fix bitmap leak
octeontx2-vf: fix bitmap leak
net/mlx5e: Use extack in set rxfh callback
net/mlx5e: Introduce mlx5e_rss_params for RSS configuration
net/mlx5e: Introduce mlx5e_rss_init_params
net/mlx5e: Remove unused mdev param from RSS indir init
net/mlx5: Improve QoS error messages with actual depth values
net/mlx5e: Prevent entering switchdev mode with inconsistent netns
net/mlx5: HWS, Generalize complex matchers
net/mlx5: Improve write-combining test reliability for ARM64 Grace CPUs
selftests/net: add tcp_port_share to .gitignore
Revert "net/mlx5e: Update and set Xon/Xoff upon MTU set"
net: add NUMA awareness to skb_attempt_defer_free()
net: use llist for sd->defer_list
net: make softnet_data.defer_count an atomic
selftests: drv-net: psp: add tests for destroying devices
selftests: drv-net: psp: add test for auto-adjusting TCP MSS
...

show more ...


Revision tags: v6.17
# 6e2f1484 23-Sep-2025 Paolo Abeni <pabeni@redhat.com>

Merge branch 'tcp-update-bind-bucket-state-on-port-release'

Jakub Sitnicki says:

====================
tcp: Update bind bucket state on port release

TL;DR
-----

This is another take on addressing

Merge branch 'tcp-update-bind-bucket-state-on-port-release'

Jakub Sitnicki says:

====================
tcp: Update bind bucket state on port release

TL;DR
-----

This is another take on addressing the issue we already raised earlier [1].

This time around, instead of trying to relax the bind-conflict checks in
connect(), we make an attempt to fix the tcp bind bucket state accounting.

The goal of this patch set is to make the bind buckets return to "port
reusable by ephemeral connections" state when all sockets blocking the port
from reuse get unhashed.

Changelog
---------
Changes in v5:
- Fix initial port-addr bucket state on saddr update with ip_dynaddr=1
- Add Kuniyuki's tag for tests
- Link to v4: https://lore.kernel.org/r/20250913-update-bind-bucket-state-on-unhash-v4-0-33a567594df7@cloudflare.com

Changes in v4:
- Drop redundant sk_is_connect_bind helper doc comment
- Link to v3: https://lore.kernel.org/r/20250910-update-bind-bucket-state-on-unhash-v3-0-023caaf4ae3c@cloudflare.com

Changes in v3:
- Move the flag from inet_flags to sk_userlocks (Kuniyuki)
- Rename the flag from AUTOBIND to CONNECT_BIND to avoid a name clash (Kuniyuki)
- Drop unreachable code for sk_state == TCP_NEW_SYN_RECV (Kuniyuki)
- Move the helper to inet_hashtables where it's used
- Reword patch 1 description for conciseness
- Link to v2: https://lore.kernel.org/r/20250821-update-bind-bucket-state-on-unhash-v2-0-0c204543a522@cloudflare.com

Changes in v2:
- Rename the inet_sock flag from LAZY_BIND to AUTOBIND (Eric)
- Clear the AUTOBIND flag on disconnect path (Eric)
- Add a test to cover the disconnect case (Eric)
- Link to RFC v1: https://lore.kernel.org/r/20250808-update-bind-bucket-state-on-unhash-v1-0-faf85099d61b@cloudflare.com

Situation
---------

We observe the following scenario in production:

inet_bind_bucket
state for port 54321
--------------------

(bucket doesn't exist)

// Process A opens a long-lived connection:
s1 = socket(AF_INET, SOCK_STREAM)
s1.setsockopt(IP_BIND_ADDRESS_NO_PORT)
s1.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
s1.bind(192.0.2.10, 0)
s1.connect(192.51.100.1, 443)
tb->fastreuse = -1
tb->fastreuseport = -1
s1.getsockname() -> 192.0.2.10:54321
s1.send()
s1.recv()
// ... s1 stays open.

// Process B opens a short-lived connection:
s2 = socket(AF_INET, SOCK_STREAM)
s2.setsockopt(SO_REUSEADDR)
s2.bind(192.0.2.20, 0)
tb->fastreuse = 0
tb->fastreuseport = 0
s2.connect(192.51.100.2, 53)
s2.getsockname() -> 192.0.2.20:54321
s2.send()
s2.recv()
s2.close()

// bucket remains in this
// state even though port
// was released by s2
tb->fastreuse = 0
tb->fastreuseport = 0

// Process A attempts to open another connection
// when there is connection pressure from
// 192.0.2.30:54000..54500 to 192.51.100.1:443.
// Assume only port 54321 is still available.

s3 = socket(AF_INET, SOCK_STREAM)
s3.setsockopt(IP_BIND_ADDRESS_NO_PORT)
s3.setsockopt(IP_LOCAL_PORT_RANGE, 54000..54500)
s3.bind(192.0.2.30, 0)
s3.connect(192.51.100.1, 443) -> EADDRNOTAVAIL (99)

Problem
-------

We end up in a state where Process A can't reuse ephemeral port 54321 for
as long as there are sockets, like s1, that keep the bind bucket alive. The
bucket does not return to "reusable" state even when all sockets which
blocked it from reuse, like s2, are gone.

The ephemeral port becomes available for use again only after all sockets
bound to it are gone and the bind bucket is destroyed.

Programs which behave like Process B in this scenario - that is, binding to
an IP address without setting IP_BIND_ADDRESS_NO_PORT - might be considered
poorly written. However, the reality is that such implementation is not
actually uncommon. Trying to fix each and every such program is like
playing whack-a-mole.

For instance, it could be any software using Golang's net.Dialer with
LocalAddr provided:

dialer := &net.Dialer{
LocalAddr: &net.TCPAddr{IP: srcIP},
}
conn, err := dialer.Dial("tcp4", dialTarget)

Or even a ubiquitous tool like dig when using a specific local address:

$ dig -b 127.1.1.1 +tcp +short example.com

Hence, we are proposing a systematic fix in the network stack itself.

Solution
--------

Please see the description in patch 1.

[1] https://lore.kernel.org/r/20250714-connect-port-search-harder-v3-0-b1a41f249865@cloudflare.com

Reported-by: Lee Valentine <lvalentine@cloudflare.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
====================

Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-0-57168b661b47@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...


Revision tags: v6.17-rc7
# 8a8241cd 17-Sep-2025 Jakub Sitnicki <jakub@cloudflare.com>

selftests/net: Test tcp port reuse after unbinding a socket

Exercise the scenario described in detail in the cover letter:

1) socket A: connect() from ephemeral port X
2) socket B: explicitly b

selftests/net: Test tcp port reuse after unbinding a socket

Exercise the scenario described in detail in the cover letter:

1) socket A: connect() from ephemeral port X
2) socket B: explicitly bind() to port X
3) check that port X is now excluded from ephemeral ports
4) close socket B to release the port bind
5) socket C: connect() from ephemeral port X

As well as a corner case to test that the connect-bind flag is cleared:

1) connect() from ephemeral port X
2) disconnect the socket with connect(AF_UNSPEC)
3) bind() it explicitly to port X
4) check that port X is now excluded from ephemeral ports

Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://patch.msgid.link/20250917-update-bind-bucket-state-on-unhash-v5-2-57168b661b47@cloudflare.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

show more ...