#
d74b7bae |
| 04-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is be
ifnet_byindex() actually requires network epoch
Sweep over potentially unsafe calls to ifnet_byindex() and wrap them in epoch. Most of the code touched remains unsafe, as the returned pointer is being used after epoch exit. Mark that with a comment.
Validate the index argument inside the function, reducing argument validation requirement from the callers and making V_if_index private to if.c.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D33263
show more ...
|
#
44775b16 |
| 24-Nov-2021 |
Mark Johnston <markj@FreeBSD.org> |
netinet: Remove unneeded mb_unmapped_to_ext() calls
in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted.
Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The Fr
netinet: Remove unneeded mb_unmapped_to_ext() calls
in_cksum_skip() now handles unmapped mbufs on platforms where they're permitted.
Reviewed by: glebius, jhb MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D33097
show more ...
|
#
a8d54fc9 |
| 09-Aug-2021 |
Michael Tuexen <tuexen@FreeBSD.org> |
ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options
Fix getsockopt() for the IPPROTO_IPV6 level socket options with the following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS, IP
ipv6: Fix getsockopt() for some IPPROTO_IPV6 level socket options
Fix getsockopt() for the IPPROTO_IPV6 level socket options with the following names: IPV6_HOPOPTS, IPV6_RTHDR, IPV6_RTHDRDSTOPTS, IPV6_DSTOPTS, and IPV6_NEXTHOP.
Reviewed by: markj MFC after: 3 days Sponsored by: Netflix, Inc. Differential Revision: https://reviews.freebsd.org/D31458
show more ...
|
#
2290dfb4 |
| 19-May-2021 |
Ryan Stone <rstone@FreeBSD.org> |
Enter the net epoch before calling ip6_setpktopts
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before call
Enter the net epoch before calling ip6_setpktopts
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before calling this function.
Sponsored by: Dell EMC Isilon MFC after: 4 weeks Reviewed by: donner, kp Differential Revision: https://reviews.freebsd.org/D30630
show more ...
|
Revision tags: release/13.0.0 |
|
#
bb4a7d94 |
| 04-Mar-2021 |
Kristof Provost <kp@FreeBSD.org> |
net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros
Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header.
Use them where appropriate.
net: Introduce IPV6_DSCP(), IPV6_ECN() and IPV6_TRAFFIC_CLASS() macros
Introduce convenience macros to retrieve the DSCP, ECN or traffic class bits from an IPv6 header.
Use them where appropriate.
Reviewed by: ae (previous version), rscheff, tuexen, rgrimes MFC after: 2 weeks Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D29056
show more ...
|
#
1bd44b11 |
| 14-Feb-2021 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Do not reference returned ifa in in6_ifawithifp().
The only place where in6_ifawithifp() is used is ip6_output(), which uses the returned ifa to bump traffic counters. Given ifa stability guarantee
Do not reference returned ifa in in6_ifawithifp().
The only place where in6_ifawithifp() is used is ip6_output(), which uses the returned ifa to bump traffic counters. Given ifa stability guarantees is provided by epoch, do not refcount ifa.
This eliminates 2 atomic ops from IPv6 fast path.
Reviewed By: rstone MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D28649
show more ...
|
#
3f43ada9 |
| 28-Jan-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was e
Catch up with 6edfd179c86: mechanically rename IFCAP_NOMAP to IFCAP_MEXTPG.
Originally IFCAP_NOMAP meant that the mbuf has external storage pointer that points to unmapped address. Then, this was extended to array of such pointers. Then, such mbufs were augmented with header/trailer. Basically, extended mbufs are extended, and set of features is subject to change. The new name should be generic enough to avoid further renaming.
show more ...
|
Revision tags: release/12.2.0 |
|
#
0c325f53 |
| 18-Oct-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Implement flowid calculation for outbound connections to balance connections over multiple paths.
Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code f
Implement flowid calculation for outbound connections to balance connections over multiple paths.
Multipath routing relies on mbuf flowid data for both transit and outbound traffic. Current code fills mbuf flowid from inp_flowid for connection-oriented sockets. However, inp_flowid is currently not calculated for outbound connections.
This change creates simple hashing functions and starts calculating hashes for TCP,UDP/UDP-Lite and raw IP if multipath routes are present in the system.
Reviewed by: glebius (previous version),ae Differential Revision: https://reviews.freebsd.org/D26523
show more ...
|
#
868aabb4 |
| 09-Oct-2020 |
Richard Scheffenegger <rscheff@FreeBSD.org> |
Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to
Add IP(V6)_VLAN_PCP to set 802.1 priority per-flow.
This adds a new IP_PROTO / IPV6_PROTO setsockopt (getsockopt) option IP(V6)_VLAN_PCP, which can be set to -1 (interface default), or explicitly to any priority between 0 and 7.
Note that for untagged traffic, explicitly adding a priority will insert a special 801.1Q vlan header with vlan ID = 0 to carry the priority setting
Reviewed by: gallatin, rrs MFC after: 2 weeks Sponsored by: NetApp, Inc. Differential Revision: https://reviews.freebsd.org/D26409
show more ...
|
#
b092fd6c |
| 18-Sep-2020 |
Navdeep Parhar <np@FreeBSD.org> |
if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable
if_vxlan(4): add support for hardware assisted checksumming, TSO, and RSS.
This lets a VXLAN pseudo-interface take advantage of hardware checksumming (tx and rx), TSO, and RSS if the NIC is capable of performing these operations on inner VXLAN traffic.
A VXLAN interface inherits the capabilities of its vxlandev interface if one is specified or of the interface that hosts the vxlanlocal address. If other interfaces will carry traffic for that VXLAN then they must have the same hardware capabilities.
On transmit, if_vxlan verifies that the outbound interface has the required capabilities and then translates the CSUM_ flags to their inner equivalents. This tells the hardware ifnet that it needs to operate on the inner frame and not the outer VXLAN headers.
An event is generated when a VXLAN ifnet starts. This allows hardware drivers to configure their devices to expect VXLAN traffic on the specified incoming port.
On receive, the hardware does RSS and checksum verification on the inner frame. if_vxlan now does a direct netisr dispatch to take full advantage of RSS. It is not very clear why it didn't do this already.
Future work: Rx: it should be possible to avoid the first trip up the protocol stack to get the frame to if_vxlan just so it can decapsulate and requeue for a second trip up the stack. The hardware NIC driver could directly call an if_vxlan receive routine for VXLAN traffic instead.
Rx: LRO. depends on what happens with the previous item. There will have to to be a mechanism to indicate that it's time for if_vxlan to flush its LRO state.
Reviewed by: kib@ Relnotes: Yes Sponsored by: Chelsio Communications Differential Revision: https://reviews.freebsd.org/D25873
show more ...
|
#
662c1305 |
| 01-Sep-2020 |
Mateusz Guzik <mjg@FreeBSD.org> |
net: clean up empty lines in .c and .h files
|
#
c7aa572c |
| 31-Jul-2020 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: Rubicon Communications, LLC (netgate.com)
|
#
17996960 |
| 31-Jul-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r363583 through r363738.
|
#
19afc65a |
| 30-Jul-2020 |
Mark Johnston <markj@FreeBSD.org> |
ip6_output(): Check the return value of in6_getlinkifnet().
If the destination address has an embedded scope ID, make sure that it corresponds to a valid ifnet before proceeding. Otherwise a sendto
ip6_output(): Check the return value of in6_getlinkifnet().
If the destination address has an embedded scope ID, make sure that it corresponds to a valid ifnet before proceeding. Otherwise a sendto() with a bogus link-local address can trigger a NULL pointer dereference.
Reported by: syzkaller Reviewed by: ae Fixes: r358572 Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D25887
show more ...
|
#
95033af9 |
| 18-Jun-2020 |
Mark Johnston <markj@FreeBSD.org> |
Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to suppor
Add the SCTP_SUPPORT kernel option.
This is in preparation for enabling a loadable SCTP stack. Analogous to IPSEC/IPSEC_SUPPORT, the SCTP_SUPPORT kernel option must be configured in order to support a loadable SCTP implementation.
Discussed with: tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation
show more ...
|
Revision tags: release/11.4.0 |
|
#
1483c1c5 |
| 28-May-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Replace ip6_ouput fib6_lookup_nh_<ext|basic> calls with fib6_lookup().
fib6_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-s
Replace ip6_ouput fib6_lookup_nh_<ext|basic> calls with fib6_lookup().
fib6_lookup_nh_ represents pre-epoch generation of fib api, providing less guarantees over pointer validness and requiring on-stack data copying.
Conversion is straight-forwarded, as the only 2 differences are requirement of running in network epoch and the need to handle RTF_GATEWAY case in the caller code.
Reviewed by: ae Differential Revision: https://reviews.freebsd.org/D24973
show more ...
|
#
d7452d89 |
| 12-May-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
IPv6: sync IP_NO_SND_TAG_RL support from IPv4
The IP_NO_SND_TAG_RL flag to ip{,6}_output() means that the packets being sent should bypass hardware rate limiting. This is typically used by modern TC
IPv6: sync IP_NO_SND_TAG_RL support from IPv4
The IP_NO_SND_TAG_RL flag to ip{,6}_output() means that the packets being sent should bypass hardware rate limiting. This is typically used by modern TCP stacks for rexmits.
This support was added to IPv4 in r352657, but never added to IPv6, even though rack and bbr call ip6_output() with this flag.
Reviewed by: rrs Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24822
show more ...
|
#
84af4cc1 |
| 11-May-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Fix the build
Back out the IPv6 portion of r360903, as the stamp_tag param is apparently not supported in upstream FreeBSD.
Sponsored by: Netflix Pointy hat to: gallatin
|
#
6043ac20 |
| 11-May-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Ktls: never skip stamping tags for NIC TLS
The newer RACK and BBR TCP stacks have added a mechanism to disable hardware packet pacing for TCP retransmits. This mechanism works by skipping the send-t
Ktls: never skip stamping tags for NIC TLS
The newer RACK and BBR TCP stacks have added a mechanism to disable hardware packet pacing for TCP retransmits. This mechanism works by skipping the send-tag stamp on rate-limited connections when the TCP stack calls ip_output() with the IP_NO_SND_TAG_RL flag set.
When doing NIC TLS, we must ignore this flag, as NIC TLS packets must always be stamped. Failure to stamp a NIC TLS packet will result in crypto issues.
Reviewed by: hselasky, rrs Sponsored by: Netflix, Mellanox
show more ...
|
#
7b6c99d0 |
| 03-May-2020 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed:
Step 3: anonymize struct mbuf_ext_pgs and move all its fields into mbuf within m_epg namespace. All edits except the 'struct mbuf' declaration and mb_dupcl() were done mechanically with sed:
s/->m_ext_pgs.nrdy/->m_epg_nrdy/g s/->m_ext_pgs.hdr_len/->m_epg_hdrlen/g s/->m_ext_pgs.trail_len/->m_epg_trllen/g s/->m_ext_pgs.first_pg_off/->m_epg_1st_off/g s/->m_ext_pgs.last_pg_len/->m_epg_last_len/g s/->m_ext_pgs.flags/->m_epg_flags/g s/->m_ext_pgs.record_type/->m_epg_record_type/g s/->m_ext_pgs.enc_cnt/->m_epg_enc_cnt/g s/->m_ext_pgs.tls/->m_epg_tls/g s/->m_ext_pgs.so/->m_epg_so/g s/->m_ext_pgs.seqno/->m_epg_seqno/g s/->m_ext_pgs.stailq/->m_epg_stailq/g
Reviewed by: gallatin Differential Revision: https://reviews.freebsd.org/D24598
show more ...
|
#
983066f0 |
| 25-Apr-2020 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information to perfor
Convert route caching to nexthop caching.
This change is build on top of nexthop objects introduced in r359823.
Nexthops are separate datastructures, containing all necessary information to perform packet forwarding such as gateway interface and mtu. Nexthops are shared among the routes, providing more pre-computed cache-efficient data while requiring less memory. Splitting the LPM code and the attached data solves multiple long-standing problems in the routing layer, drastically reduces the coupling with outher parts of the stack and allows to transparently introduce faster lookup algorithms.
Route caching was (re)introduced to minimise (slow) routing lookups, allowing for notably better performance for large TCP senders. Caching works by acquiring rtentry reference, which is protected by per-rtentry mutex. If the routing table is changed (checked by comparing the rtable generation id) or link goes down, cache record gets withdrawn.
Nexthops have the same reference counting interface, backed by refcount(9). This change merely replaces rtentry with the actual forwarding nextop as a cached object, which is mostly mechanical. Other moving parts like cache cleanup on rtable change remains the same.
Differential Revision: https://reviews.freebsd.org/D24340
show more ...
|
#
23feb563 |
| 14-Apr-2020 |
Andrew Gallatin <gallatin@FreeBSD.org> |
KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to
KTLS: Re-work unmapped mbufs to carry ext_pgs in the mbuf itself.
While the original implementation of unmapped mbufs was a large step forward in terms of reducing cache misses by enabling mbufs to carry more than a single page for sendfile, they are rather cache unfriendly when accessing the ext_pgs metadata and data. This is because the ext_pgs part of the mbuf is allocated separately, and almost guaranteed to be cold in cache.
This change takes advantage of the fact that unmapped mbufs are never used at the same time as pkthdr mbufs. Given this fact, we can overlap the ext_pgs metadata with the mbuf pkthdr, and carry the ext_pgs meta directly in the mbuf itself. Similarly, we can carry the ext_pgs data (TLS hdr/trailer/array of pages) directly after the existing m_ext.
In order to be able to carry 5 pages (which is the minimum required for a 16K TLS record which is not perfectly aligned) on LP64, I've had to steal ext_arg2. The only user of this in the xmit path is sendfile, and I've adjusted it to use arg1 when using unmapped mbufs.
This change is almost entirely mechanical, except that we change mb_alloc_ext_pgs() to no longer allow allocating pkthdrs, the change to avoid ext_arg2 as mentioned above, and the removal of the ext_pgs zone,
This change saves roughly 2% "raw" CPU (~59% -> 57%), or over 3% "scaled" CPU on a Netflix 100% software kTLS workload at 90+ Gb/s on Broadwell Xeons.
In a follow-on commit, I plan to remove some hacks to avoid access ext_pgs fields of mbufs, since they will now be in cache.
Many thanks to glebius for helping to make this better in the Netflix tree.
Reviewed by: hselasky, jhb, rrs, glebius (early version) Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D24213
show more ...
|
#
e02582d1 |
| 19-Mar-2020 |
Mark Johnston <markj@FreeBSD.org> |
Fix synchronization in the IPV6_2292PKTOPTIONS set handler.
The inpcb needs to be locked when we update output packet options. Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free pa
Fix synchronization in the IPV6_2292PKTOPTIONS set handler.
The inpcb needs to be locked when we update output packet options. Otherwise it is possible for the IPV6_2292PKTOPTIONS handler to free packet option structures while another thread is reading or updating them.
Note that the option handler is still kind of broken. For instance it frees all options before performing privilege checks for individual options. However, this can be fixed separately.
Reported by: syzbot+52eb0fd4ddc119787f9d@syzkaller.appspotmail.com Reviewed by: bz, tuexen MFC after: 2 weeks Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D24125
show more ...
|
#
e43d33d2 |
| 05-Mar-2020 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r358466 through r358677.
|
#
8483fce6 |
| 03-Mar-2020 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the fibnum as an extra argument compared to in6_selectroute() to keep the
ip6: retire in6_selectroute_fib() as promised 8 years ago
In r231852 I added in6_selectroute_fib() as a compat function with the fibnum as an extra argument compared to in6_selectroute() to keep the KPI stable. Way too late retire this function again and add the fib to in6_selectroute() which also only has a single consumer now and was an orphan function before.
show more ...
|