#
8b3bc70a |
| 08-Oct-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352764 through r353315.
|
#
1e80e4f2 |
| 08-Oct-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove epoch assertion from if_setlladdr(). Originally this function was protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous if_setlladdr() can't execute. Later it was switched
Remove epoch assertion from if_setlladdr(). Originally this function was protected by IF_ADDR_LOCK(), which was a mutex, so that two simultaneous if_setlladdr() can't execute. Later it was switched to IF_ADDR_RLOCK(), likely by a mistake. Later it was switched to NET_EPOCH_ENTER(). Then I incorrectly added NET_EPOCH_ASSERT() here.
In reality ifp->if_addr never goes away and never changes its length. So, doing bcopy() in it is always "safe", meaning it won't dereference a wrong pointer or write into someone's else memory. Of course doing two bcopy() in parallel would result in a mess of two addresses, but net epoch doesn't protect against that, neither IF_ADDR_RLOCK() did.
So for now, just remove the assertion and leave for later a proper fix.
Reported by: markj
show more ...
|
#
e9dc46cc |
| 08-Oct-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
In DIAGNOSTIC block of if_delmulti_ifma_flags() enter the network epoch. This quickly plugs the regression from r353292. The locking of multicast definitely needs a broader review today...
Reported
In DIAGNOSTIC block of if_delmulti_ifma_flags() enter the network epoch. This quickly plugs the regression from r353292. The locking of multicast definitely needs a broader review today...
Reported by: pho, dhw
show more ...
|
#
b8a6e03f |
| 08-Oct-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex cover
Widen NET_EPOCH coverage.
When epoch(9) was introduced to network stack, it was basically dropped in place of existing locking, which was mutexes and rwlocks. For the sake of performance mutex covered areas were as small as possible, so became epoch covered areas.
However, epoch doesn't introduce any contention, it just delays memory reclaim. So, there is no point to minimise epoch covered areas in sense of performance. Meanwhile entering/exiting epoch also has non-zero CPU usage, so doing this less often is a win.
Not the least is also code maintainability. In the new paradigm we can assume that at any stage of processing a packet, we are inside network epoch. This makes coding both input and output path way easier.
On output path we already enter epoch quite early - in the ip_output(), in the ip6_output().
This patch does the same for the input path. All ISR processing, network related callouts, other ways of packet injection to the network stack shall be performed in net_epoch. Any leaf function that walks network configuration now asserts epoch.
Tricky part is configuration code paths - ioctls, sysctls. They also call into leaf functions, so some need to be changed.
This patch would introduce more epoch recursions (see EPOCH_TRACE) than we had before. They will be cleaned up separately, as several of them aren't trivial. Note, that unlike a lock recursion the epoch recursion is safe and just wastes a bit of resources.
Reviewed by: gallatin, hselasky, cy, adrian, kristof Differential Revision: https://reviews.freebsd.org/D19111
show more ...
|
#
204e2f30 |
| 07-Oct-2019 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Factor out VNET shutdown check into an own vnet structure field. Remove the now obsolete vnet_state field. This greatly simplifies the detection of VNET shutdown and avoids code duplication.
Discuss
Factor out VNET shutdown check into an own vnet structure field. Remove the now obsolete vnet_state field. This greatly simplifies the detection of VNET shutdown and avoids code duplication.
Discussed with: bz@ MFC after: 1 week Sponsored by: Mellanox Technologies
show more ...
|
#
668ee101 |
| 26-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352587 through r352763.
|
#
dd902d01 |
| 25-Sep-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Add debugging facility EPOCH_TRACE that checks that epochs entered are properly nested and warns about recursive entrances. Unlike with locks, there is nothing fundamentally wrong with such use, the
Add debugging facility EPOCH_TRACE that checks that epochs entered are properly nested and warns about recursive entrances. Unlike with locks, there is nothing fundamentally wrong with such use, the intent of tracer is to help to review complex epoch-protected code paths, and we mean the network stack here.
Reviewed by: hselasky Sponsored by: Netflix Pull Request: https://reviews.freebsd.org/D21610
show more ...
|
#
0f80acb9 |
| 19-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352436 through r352536.
|
#
247cf566 |
| 17-Sep-2019 |
Konstantin Belousov <kib@FreeBSD.org> |
Add SIOCGIFDOWNREASON.
The ioctl(2) is intended to provide more details about the cause of the down for the link.
Eventually we might define a comprehensive list of codes for the situations. But i
Add SIOCGIFDOWNREASON.
The ioctl(2) is intended to provide more details about the cause of the down for the link.
Eventually we might define a comprehensive list of codes for the situations. But interface also allows the driver to provide free-form null-terminated ASCII string to provide arbitrary non-formalized information. Sample implementation exists for mlx5(4), where the string is fetched from firmware controlling the port.
Reviewed by: hselasky, rrs Sponsored by: Mellanox Technologies MFC after: 1 week Differential revision: https://reviews.freebsd.org/D21527
show more ...
|
#
61c1328e |
| 13-Sep-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r352105 through r352307.
|
#
40b1c921 |
| 12-Sep-2019 |
Kyle Evans <kevans@FreeBSD.org> |
SIOCSIFNAME: Do nothing if we're not actually changing
Instead of throwing EEXIST, just succeed if the name isn't actually changing. We don't need to trigger departure or any of that because there's
SIOCSIFNAME: Do nothing if we're not actually changing
Instead of throwing EEXIST, just succeed if the name isn't actually changing. We don't need to trigger departure or any of that because there's no change from consumers' perspective.
PR: 240539 Reviewed by: brooks MFC after: 5 days Differential Revision: https://reviews.freebsd.org/D21618
show more ...
|
#
a63915c2 |
| 28-Jul-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @r350386
Sponsored by: The FreeBSD Foundation
|
Revision tags: release/11.3.0 |
|
#
0dbdf041 |
| 28-Jun-2019 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Need to wait for epoch callbacks to complete before detaching a network interface.
This particularly manifests itself when an INP has multicast options attached during a network interface detach. Th
Need to wait for epoch callbacks to complete before detaching a network interface.
This particularly manifests itself when an INP has multicast options attached during a network interface detach. Then the IPv4 and IPv6 leave group call which results from freeing the multicast address, may access a freed ifnet structure. These are the steps to reproduce:
service mdnsd onestart # installed from ports
ifconfig epair create ifconfig epair0a 0/24 up ifconfig epair0a destroy
Tested by: pho @ MFC after: 1 week Sponsored by: Mellanox Technologies
show more ...
|
#
0269ae4c |
| 06-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @348740
Sponsored by: The FreeBSD Foundation
|
#
fb3bc596 |
| 25-May-2019 |
John Baldwin <jhb@FreeBSD.org> |
Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), i
Restructure mbuf send tags to provide stronger guarantees.
- Perform ifp mismatch checks (to determine if a send tag is allocated for a different ifp than the one the packet is being output on), in ip_output() and ip6_output(). This avoids sending packets with send tags to ifnet drivers that don't support send tags.
Since we are now checking for ifp mismatches before invoking if_output, we can now try to allocate a new tag before invoking if_output sending the original packet on the new tag if allocation succeeds.
To avoid code duplication for the fragment and unfragmented cases, add ip_output_send() and ip6_output_send() as wrappers around if_output and nd6_output_ifp, respectively. All of the logic for setting send tags and dealing with send tag-related errors is done in these wrapper functions.
For pseudo interfaces that wrap other network interfaces (vlan and lagg), wrapper send tags are now allocated so that ip*_output see the wrapper ifp as the ifp in the send tag. The if_transmit routines rewrite the send tags after performing an ifp mismatch check. If an ifp mismatch is detected, the transmit routines fail with EAGAIN.
- To provide clearer life cycle management of send tags, especially in the presence of vlan and lagg wrapper tags, add a reference count to send tags managed via m_snd_tag_ref() and m_snd_tag_rele(). Provide a helper function (m_snd_tag_init()) for use by drivers supporting send tags. m_snd_tag_init() takes care of the if_ref on the ifp meaning that code alloating send tags via if_snd_tag_alloc no longer has to manage that manually. Similarly, m_snd_tag_rele drops the refcount on the ifp after invoking if_snd_tag_free when the last reference to a send tag is dropped.
This also closes use after free races if there are pending packets in driver tx rings after the socket is closed (e.g. from tcpdrop).
In order for m_free to work reliably, add a new CSUM_SND_TAG flag in csum_flags to indicate 'snd_tag' is set (rather than 'rcvif'). Drivers now also check this flag instead of checking snd_tag against NULL. This avoids false positive matches when a forwarded packet has a non-NULL rcvif that was treated as a send tag.
- cxgbe was relying on snd_tag_free being called when the inp was detached so that it could kick the firmware to flush any pending work on the flow. This is because the driver doesn't require ACK messages from the firmware for every request, but instead does a kind of manual interrupt coalescing by only setting a flag to request a completion on a subset of requests. If all of the in-flight requests don't have the flag when the tag is detached from the inp, the flow might never return the credits. The current snd_tag_free command issues a flush command to force the credits to return. However, the credit return is what also frees the mbufs, and since those mbufs now hold references on the tag, this meant that snd_tag_free would never be called.
To fix, explicitly drop the mbuf's reference on the snd tag when the mbuf is queued in the firmware work queue. This means that once the inp's reference on the tag goes away and all in-flight mbufs have been queued to the firmware, tag's refcount will drop to zero and snd_tag_free will kick in and send the flush request. Note that we need to avoid doing this in the middle of ethofld_tx(), so the driver grabs a temporary reference on the tag around that loop to defer the free to the end of the function in case it sends the last mbuf to the queue after the inp has dropped its reference on the tag.
- mlx5 preallocates send tags and was using the ifp pointer even when the send tag wasn't in use. Explicitly use the ifp from other data structures instead.
- Sprinkle some assertions in various places to assert that received packets don't have a send tag, and that other places that overwrite rcvif (e.g. 802.11 transmit) don't clobber a send tag pointer.
Reviewed by: gallatin, hselasky, rgrimes, ae Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20117
show more ...
|
#
e2e050c8 |
| 20-May-2019 |
Conrad Meyer <cem@FreeBSD.org> |
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces hea
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change).
No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
show more ...
|
#
2ad7ed6e |
| 19-May-2019 |
Alexander V. Chernikov <melifaro@FreeBSD.org> |
Fix rt_ifa selection during loopback route insertion process. Currently such routes are added with a link-level IFA, which is plain wrong. Only after the insertion they get fixed by the special
Fix rt_ifa selection during loopback route insertion process. Currently such routes are added with a link-level IFA, which is plain wrong. Only after the insertion they get fixed by the special link_rtrequest() ifa handler. This behaviour complicates routing code and makes ifa selection more complex. Streamline this process by explicitly moving link_rtrequest() logic to the pre-insertion rt_getifa_fib() ifa selector. Avoid calling all this logic in the loopback route case by explicitly specifying proper rt_ifa inside the ifa_maintain_loopback_route().§
MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D20076
show more ...
|
#
7648bc9f |
| 13-May-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @347527
Sponsored by: The FreeBSD Foundation
|
#
7687707d |
| 22-Apr-2019 |
Andrew Gallatin <gallatin@FreeBSD.org> |
Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to allocate ifnets. When called with a domain on a NUMA ma
Track device's NUMA domain in ifnet & alloc ifnet from NUMA local memory
This commit adds new if_alloc_domain() and if_alloc_dev() methods to allocate ifnets. When called with a domain on a NUMA machine, ifalloc_domain() will record the NUMA domain in the ifnet, and it will allocate the ifnet struct from memory which is local to that NUMA node. Similarly, if_alloc_dev() is a wrapper for if_alloc_domain which uses a driver supplied device_t to call ifalloc_domain() with the appropriate domain.
Note that the new if_numa_domain field fits in an alignment pad in struct ifnet, and so does not alter the size of the structure.
Reviewed by: glebius, kib, markj Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D19930
show more ...
|
#
88148a07 |
| 22-Jan-2019 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r343202 through r343319.
|
#
bc6f170e |
| 22-Jan-2019 |
Brooks Davis <brooks@FreeBSD.org> |
Rework CASE_IOC_IFGROUPREQ() to require a case before the macro.
This is more compatible with formatting tools and looks more normal.
Reported by: jhb (on a different review) Sponsored by: DARPA, A
Rework CASE_IOC_IFGROUPREQ() to require a case before the macro.
This is more compatible with formatting tools and looks more normal.
Reported by: jhb (on a different review) Sponsored by: DARPA, AFRL Differential Revision: https://reviews.freebsd.org/D18442
show more ...
|
#
a68cc388 |
| 09-Jan-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro
Mechanical cleanup of epoch(9) usage in network stack.
- Remove macros that covertly create epoch_tracker on thread stack. Such macros a quite unsafe, e.g. will produce a buggy code if same macro is used in embedded scopes. Explicitly declare epoch_tracker always.
- Unmask interface list IFNET_RLOCK_NOSLEEP(), interface address list IF_ADDR_RLOCK() and interface AF specific data IF_AFDATA_RLOCK() read locking macros to what they actually are - the net_epoch. Keeping them as is is very misleading. They all are named FOO_RLOCK(), while they no longer have lock semantics. Now they allow recursion and what's more important they now no longer guarantee protection against their companion WLOCK macros. Note: INP_HASH_RLOCK() has same problems, but not touched by this commit.
This is non functional mechanical change. The only functionally changed functions are ni6_addrs() and ni6_store_addrs(), where we no longer enter epoch recursively.
Discussed with: jtl, gallatin
show more ...
|
#
21ee61a6 |
| 09-Jan-2019 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Remove part of comment that doesn't match reality.
|
#
67350cb5 |
| 09-Dec-2018 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r340918 through r341763.
|
Revision tags: release/12.0.0 |
|
#
85107355 |
| 30-Nov-2018 |
Andrey V. Elsukov <ae@FreeBSD.org> |
Adapt the fix in r341008 to correctly work with EBR.
IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because unl
Adapt the fix in r341008 to correctly work with EBR.
IFNET_RLOCK_NOSLEEP() is epoch_enter_preempt() in FreeBSD 12+. Holding it in sysctl_rtsock() doesn't protect us from ifnet unlinking, because unlinking occurs with IFNET_WLOCK(), that is rw_wlock+sx_xlock, and it doesn check that concurrent code is running in epoch section. But while we are in epoch section, we should be able to do access to ifnet's fields, even it was unlinked. Thus do not change if_addr and if_hw_addr fields in ifnet_detach_internal() to NULL, since rtsock code can do access to these fields and this is allowed while it is running in epoch section.
This should fix the race, when ifnet_detach_internal() unlinks ifnet after we checked it for IFF_DYING in sysctl_dumpentry.
Move free(ifp->if_hw_addr) into ifnet_free_internal(). Also remove the NULL check for ifp->if_description, since free(9) can correctly handle NULL pointer.
MFC after: 1 week
show more ...
|