Revision tags: release/11.3.0 |
|
#
7f49ce7a |
| 28-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @349476
Sponsored by: The FreeBSD Foundation
|
#
59854ecf |
| 25-Jun-2019 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array.
The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), w
Convert all IPv4 and IPv6 multicast memberships into using a STAILQ instead of a linear array.
The multicast memberships for the inpcb structure are protected by a non-sleepable lock, INP_WLOCK(), which needs to be dropped when calling the underlying possibly sleeping if_ioctl() method. When using a linear array to keep track of multicast memberships, the computed memory location of the multicast filter may suddenly change, due to concurrent insertion or removal of elements in the linear array. This in turn leads to various invalid memory access issues and kernel panics.
To avoid this problem, put all multicast memberships on a STAILQ based list. Then the memory location of the IPv4 and IPv6 multicast filters become fixed during their lifetime and use after free and memory leak issues are easier to track, for example by: vmstat -m | grep multi
All list manipulation has been factored into inline functions including some macros, to easily allow for a future hash-list implementation, if needed.
This patch has been tested by pho@ .
Differential Revision: https://reviews.freebsd.org/D20080 Reviewed by: markj @ MFC after: 1 week Sponsored by: Mellanox Technologies
show more ...
|
Revision tags: release/12.0.0, release/11.2.0 |
|
#
4f6c66cc |
| 23-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput incre
UDP: further performance improvements on tx
Cumulative throughput while running 64 netperf -H $DUT -t UDP_STREAM -- -m 1 on a 2x8x2 SKL went from 1.1Mpps to 2.5Mpps
Single stream throughput increases from 910kpps to 1.18Mpps
Baseline: https://people.freebsd.org/~mmacy/2018.05.11/udpsender2.svg
- Protect read access to global ifnet list with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender3.svg
- Protect short lived ifaddr references with epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender4.svg
- Convert if_afdata read lock path to epoch https://people.freebsd.org/~mmacy/2018.05.11/udpsender5.svg
A fix for the inpcbhash contention is pending sufficient time on a canary at LLNW.
Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15409
show more ...
|
#
d7c5a620 |
| 18-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@
gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I
ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@
gallatin: Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5 based ConnectX 4-LX NIC, I see an almost 12% improvement in received packet rate, and a larger improvement in bytes delivered all the way to userspace.
When the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1, I see, using nstat -I mce0 1 before the patch:
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 4.98 0.00 4.42 0.00 4235592 33 83.80 4720653 2149771 1235 247.32 4.73 0.00 4.20 0.00 4025260 33 82.99 4724900 2139833 1204 247.32 4.72 0.00 4.20 0.00 4035252 33 82.14 4719162 2132023 1264 247.32 4.71 0.00 4.21 0.00 4073206 33 83.68 4744973 2123317 1347 247.32 4.72 0.00 4.21 0.00 4061118 33 80.82 4713615 2188091 1490 247.32 4.72 0.00 4.21 0.00 4051675 33 85.29 4727399 2109011 1205 247.32 4.73 0.00 4.21 0.00 4039056 33 84.65 4724735 2102603 1053 247.32
After the patch
InMpps OMpps InGbs OGbs err TCP Est %CPU syscalls csw irq GBfree 5.43 0.00 4.20 0.00 3313143 33 84.96 5434214 1900162 2656 245.51 5.43 0.00 4.20 0.00 3308527 33 85.24 5439695 1809382 2521 245.51 5.42 0.00 4.19 0.00 3316778 33 87.54 5416028 1805835 2256 245.51 5.42 0.00 4.19 0.00 3317673 33 90.44 5426044 1763056 2332 245.51 5.42 0.00 4.19 0.00 3314839 33 88.11 5435732 1792218 2499 245.52 5.44 0.00 4.19 0.00 3293228 33 91.84 5426301 1668597 2121 245.52
Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch
Reviewed by: gallatin Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D15366
show more ...
|
#
b6f6f880 |
| 06-May-2018 |
Matt Macy <mmacy@FreeBSD.org> |
r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's
r333175 introduced deferred deletion of multicast addresses in order to permit the driver ioctl to sleep on commands to the NIC when updating multicast filters. More generally this permitted driver's to use an sx as a softc lock. Unfortunately this change introduced a race whereby a a multicast update would still be queued for deletion when ifconfig deleted the interface thus calling down in to _purgemaddrs and synchronously deleting _all_ of the multicast addresses on the interface.
Synchronously remove all external references to a multicast address before enqueueing for delete.
Reported by: lwhsu Approved by: sbruno
show more ...
|
#
f3e1324b |
| 02-May-2018 |
Stephen Hurd <shurd@FreeBSD.org> |
Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a
Separate list manipulation locking from state change in multicast
Multicast incorrectly calls in to drivers with a mutex held causing drivers to have to go through all manner of contortions to use a non sleepable lock. Serialize multicast updates instead.
Submitted by: mmacy <mmacy@mattmacy.io> Reviewed by: shurd, sbruno Sponsored by: Limelight Networks Differential Revision: https://reviews.freebsd.org/D14969
show more ...
|
#
82725ba9 |
| 23-Nov-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r325999 through r326131.
|
#
51369649 |
| 20-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for
sys: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
Special thanks to Wind River for providing access to "The Duke of Highlander" tool: an older (2014) run over FreeBSD tree was useful as a starting point.
show more ...
|
Revision tags: release/10.4.0, release/11.1.0 |
|
#
348238db |
| 01-Mar-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r314420 through r314481.
|
#
fbbd9655 |
| 01-Mar-2017 |
Warner Losh <imp@FreeBSD.org> |
Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is
Renumber copyright clause 4
Renumber cluase 4 to 3, per what everybody else did when BSD granted them permission to remove clause 3. My insistance on keeping the same numbering for legal reasons is too pedantic, so give up on that point.
Submitted by: Jan Schaumann <jschauma@stevens.edu> Pull Request: https://github.com/freebsd/freebsd/pull/96
show more ...
|
#
242b2482 |
| 09-Oct-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r306412 through r306905.
|
#
2d9db0bc |
| 02-Oct-2016 |
Eric van Gyzen <vangyzen@FreeBSD.org> |
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4 address is added to an interface, and that is usually sufficient. However, in some circumstances, suc
Add GARP retransmit capability
A single gratuitous ARP (GARP) is always transmitted when an IPv4 address is added to an interface, and that is usually sufficient. However, in some circumstances, such as when a shared address is passed between cluster nodes, this single GARP may occasionally be dropped or lost. This can lead to neighbors on the network link working with a stale ARP cache and sending packets destined for that address to the node that previously owned the address, which may not respond.
To avoid this situation, GARP retransmissions can be enabled by setting the net.link.ether.inet.garp_rexmit_count sysctl to a value greater than zero. The setting represents the maximum number of retransmissions. The interval between retransmissions is calculated using an exponential backoff algorithm, doubling each time, so the retransmission intervals are: {1, 2, 4, 8, 16, ...} (seconds).
Due to the exponential backoff algorithm used for the interval between GARP retransmissions, the maximum number of retransmissions is limited to 16 for sanity. This limit corresponds to a maximum interval between retransmissions of 2^16 seconds ~= 18 hours. Increasing this limit is possible, but sending out GARPs spaced days apart would be of little use.
Submitted by: David A. Bright <david.a.bright@dell.com> MFC after: 1 month Relnotes: yes Sponsored by: Dell EMC Differential Revision: https://reviews.freebsd.org/D7695
show more ...
|
Revision tags: release/11.0.1, release/11.0.0 |
|
#
89856f7e |
| 21-Jun-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be
Get closer to a VIMAGE network stack teardown from top to bottom rather than removing the network interfaces first. This change is rather larger and convoluted as the ordering requirements cannot be separated.
Move the pfil(9) framework to SI_SUB_PROTO_PFIL, move Firewalls and related modules to their own SI_SUB_PROTO_FIREWALL. Move initialization of "physical" interfaces to SI_SUB_DRIVERS, move virtual (cloned) interfaces to SI_SUB_PSEUDO. Move Multicast to SI_SUB_PROTO_MC.
Re-work parts of multicast initialisation and teardown, not taking the huge amount of memory into account if used as a module yet.
For interface teardown we try to do as many of them as we can on SI_SUB_INIT_IF, but for some this makes no sense, e.g., when tunnelling over a higher layer protocol such as IP. In that case the interface has to go along (or before) the higher layer protocol is shutdown.
Kernel hhooks need to go last on teardown as they may be used at various higher layers and we cannot remove them before we cleaned up the higher layers.
For interface teardown there are multiple paths: (a) a cloned interface is destroyed (inside a VIMAGE or in the base system), (b) any interface is moved from a virtual network stack to a different network stack ("vmove"), or (c) a virtual network stack is being shut down. All code paths go through if_detach_internal() where we, depending on the vmove flag or the vnet state, make a decision on how much to shut down; in case we are destroying a VNET the individual protocol layers will cleanup their own parts thus we cannot do so again for each interface as we end up with, e.g., double-frees, destroying locks twice or acquiring already destroyed locks. When calling into protocol cleanups we equally have to tell them whether they need to detach upper layer protocols ("ulp") or not (e.g., in6_ifdetach()).
Provide or enahnce helper functions to do proper cleanup at a protocol rather than at an interface level.
Approved by: re (hrs) Obtained from: projects/vnet Reviewed by: gnn, jhb Sponsored by: The FreeBSD Foundation MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D6747
show more ...
|
Revision tags: release/10.3.0 |
|
#
2414e864 |
| 03-Feb-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MfH @r295202
Expect to see panics in routing code at least now.
|
#
14d5c08b |
| 26-Jan-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r294599 through r294776.
|
#
f9421853 |
| 25-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: The FreeBSD Foundation
|
#
61eee0e2 |
| 25-Jan-2016 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don'
MFP r287070,r287073: split radix implementation and route table structure.
There are number of radix consumers in kernel land (pf,ipfw,nfs,route) with different requirements. In fact, first 3 don't have _any_ requirements and first 2 does not use radix locking. On the other hand, routing structure do have these requirements (rnh_gen, multipath, custom to-be-added control plane functions, different locking). Additionally, radix should not known anything about its consumers internals.
So, radix code now uses tiny 'struct radix_head' structure along with internal 'struct radix_mask_head' instead of 'struct radix_node_head'. Existing consumers still uses the same 'struct radix_node_head' with slight modifications: they need to pass pointer to (embedded) 'struct radix_head' to all radix callbacks.
Routing code now uses new 'struct rib_head' with different locking macro: RADIX_NODE_HEAD prefix was renamed to RIB_ (which stands for routing information base).
New net/route_var.h header was added to hold routing subsystem internal data. 'struct rib_head' was placed there. 'struct rtentry' will also be moved there soon.
show more ...
|
#
009e81b1 |
| 22-Jan-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFH @r294567
|
#
7f38eb34 |
| 15-Jan-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r293850 through r294089.
|
#
a11378bd |
| 14-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: The FreeBSD Foundation
|
#
10e0e235 |
| 14-Jan-2016 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Remove now-unused wrappers for various routing functions.
|
#
b626f5a7 |
| 04-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH r289384-r293170
Sponsored by: The FreeBSD Foundation
|
#
d6e82913 |
| 17-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Revert r292275 & r292379
glebius has concerns about these changes so reverting those can be discussed and addressed.
Sponsored by: Multiplay
|
#
52e53e2d |
| 15-Dec-2015 |
Steven Hartland <smh@FreeBSD.org> |
Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the addr
Fix lagg failover due to missing notifications
When using lagg failover mode neither Gratuitous ARP (IPv4) or Unsolicited Neighbour Advertisements (IPv6) are sent to notify other nodes that the address may have moved.
This results is slow failover, dropped packets and network outages for the lagg interface when the primary link goes down.
We now use the new if_link_state_change_cond with the force param set to allow lagg to force through link state changes and hence fire a ifnet_link_event which are now monitored by rip and nd6.
Upon receiving these events each protocol trigger the relevant notifications: * inet4 => Gratuitous ARP * inet6 => Unsolicited Neighbour Announce
This also fixes the carp IPv6 NA's that stopped working after r251584 which added the ipv6_route__llma route.
The new behavour can be controlled using the sysctls: * net.link.ether.inet.arp_on_link * net.inet6.icmp6.nd6_on_link
Also removed unused param from lagg_port_state and added descriptions for the sysctls while here.
PR: 156226 MFC after: 1 month Sponsored by: Multiplay Differential Revision: https://reviews.freebsd.org/D4111
show more ...
|
#
a5d8944a |
| 19-Nov-2015 |
Navdeep Parhar <np@FreeBSD.org> |
Catch up with head (r291075).
|