#
20162e6f |
| 14-Nov-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
rip6: don't lock the inpcb list
There is no point in doing that when we operate on a particular inpcb.
|
Revision tags: release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
60d8dbbe |
| 18-Jan-2024 |
Kristof Provost <kp@FreeBSD.org> |
netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters
When debugging network issues one common clue is an unexpectedly incrementing error counter. This is helpful, in that
netinet: add a probe point for IP, IP6, ICMP, ICMP6, UDP and TCP stats counters
When debugging network issues one common clue is an unexpectedly incrementing error counter. This is helpful, in that it gives us an idea of what might be going wrong, but often these counters may be incremented in different functions.
Add a static probe point for them so that we can use dtrace to get futher information (e.g. a stack trace).
For example: dtrace -n 'mib:ip:count: { printf("%d", arg0); stack(); }'
This can be disabled by setting the following kernel option: options KDTRACE_NO_MIB_SDT
Reviewed by: gallatin, tuexen (previous version), gnn (previous version) Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D43504
show more ...
|
#
ce69e373 |
| 03-Feb-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Revert "sockets: retire sorflush()"
Provide a comment in sorflush() why the socket I/O sx(9) lock is actually important.
This reverts commit 507f87a799cf0811ce30f0ae7f10ba19b2fd3db3.
|
#
507f87a7 |
| 16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols th
sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols that use generic socket buffers.
The socket I/O sx(9) lock acquisition in sorflush() is not relevant for shutdown(2) operation as it doesn't do any I/O that may interleave with read(2) or write(2). The socket buffer mutex acquisition inside sbrelease() is what guarantees thread safety. This sx(9) acquisition in soshutdown() can be tracked down to 4.4BSD times, where it used to be sblock(), and it was carried over through the years evolving together with sockets with no reconsideration of why do we carry it over. I can't tell if that sblock() made sense back then, but it doesn't make any today.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43415
show more ...
|
#
5bba2728 |
| 16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more
sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more self documented, as protocol specific method would execute only the code that is relevant to that protocol and nothing else. This also fixes a couple recent regressions and reduces risk of future regressions. The extended KPI for the new pr_shutdown removes need for the extra pr_flush which was added for the sake of SCTP which could not perform its shutdown properly with the old one. Particularly for SCTP this change streamlines a lot of code.
Some notes on why certain parts of code were copied or were not to certain protocols: * The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is needed only for those protocols that may be connected or disconnected. * The above reduces into only SS_ISCONNECTED for those protocols that always connect instantly. * The ENOTCONN and continue processing hack is left only for datagram protocols. * The SOLISTENING(so) block is copied to those protocols that listen(2). * sorflush() on SHUT_RD is copied almost to every protocol, but that will be refactored later. * wakeup(&so->so_timeo) is copied to protocols that can make a non-instant connect(2), can SO_LINGER or can accept(2).
There are three protocols (netgraph(4), Bluetooth, SDP) that did not have pr_shutdown, but old soshutdown() would still perform sorflush() on SHUT_RD for them and also wakeup(9). Those protocols partially supported shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay about that and SDP is almost abandoned anyway.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43413
show more ...
|
#
a13039e2 |
| 27-Dec-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: reoder inpcb destruction
First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we com
inpcb: reoder inpcb destruction
First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions.
PR: 273890 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D43122
show more ...
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
Revision tags: release/13.2.0 |
|
#
3d0d5b21 |
| 23-Jan-2023 |
Justin Hibbits <jhibbits@FreeBSD.org> |
IfAPI: Explicitly include <net/if_private.h> in netstack
Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop in
IfAPI: Explicitly include <net/if_private.h> in netstack
Summary: In preparation of making if_t completely opaque outside of the netstack, explicitly include the header. <net/if_var.h> will stop including the header in the future.
Sponsored by: Juniper Networks, Inc. Reviewed by: glebius, melifaro Differential Revision: https://reviews.freebsd.org/D38200
show more ...
|
Revision tags: release/12.4.0 |
|
#
fcb3f813 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two
netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two very different kinds of events: internal system events and receival of an ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp*_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols.
- Change ipproto_ctlinput_t to pass just pointer to ICMP header. This allows protocols to not deduct it from the internal IP header. - Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer. It has all the information needed to the protocols. In the structure, change ip6c_finaldst fields to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr. - For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either ICMP header pointer or struct ip6ctlparam pointer. - In icmp_input() and icmp6_input() do only validation of ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols. - Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap, inet6ctlerrmap arrays. - In protocol ctlinput methods either trust what icmp_errmap() recommend, or do our own logic based on the ICMP header.
Differential revision: https://reviews.freebsd.org/D36731
show more ...
|
#
43d39ca7 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: de-void control input IP protocol methods
After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icm
netinet*: de-void control input IP protocol methods
After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed only and only from icmp6_input(). This allows to use protocol specific argument types in these methods instead of struct sockaddr and void.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727
show more ...
|
#
46ddeb6b |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet6: retire ip6protosw.h
The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with the former removed in f0ffb944d25 in 2001 and the latter survived until today. It has been red
netinet6: retire ip6protosw.h
The netinet/ipprotosw.h and netinet6/ip6protosw.h were KAME relics, with the former removed in f0ffb944d25 in 2001 and the latter survived until today. It has been reduced down to only one useful declaration that moves to ip6_var.h
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36726
show more ...
|
#
74ed2e8a |
| 02-Sep-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
raw ip: fix regression with multicast and RSVP
With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p.
Reviewed by: karels Reported by: k
raw ip: fix regression with multicast and RSVP
With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p.
Reviewed by: karels Reported by: karels Differential revision: https://reviews.freebsd.org/D36429 Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4
show more ...
|
#
61f7427f |
| 31-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same to demultiplex incoming IP
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same to demultiplex incoming IPv4 datagrams (later copied to IPv6). This story ended with 78b1fc05b20.
These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch packets in ip_input(), they would also be returned by pffindproto() if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw sockets to work correctly, all the entries were pointing at raw_usrreq differentiating only in the value of pr_protocol.
With 78b1fc05b20 all these entries are no longer needed, as ip_protox is independent of protosw. Any socket syscall requesting SOCK_RAW type would end up with rip_protosw. And this protosw has its pr_protocol set to 0, allowing to mark socket with any protocol.
For IPv6 raw socket the change required two small fixes: o Validate user provided protocol value o Always use protocol number stored in inp in rip6_attach, instead of protosw value, which is now always 0.
Differential revision: https://reviews.freebsd.org/D36380
show more ...
|
#
e7d02be1 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see 7b187005d18ef), and later reiterated in 2006 by rwatson@ (see 6fbb9cf860dcd). o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ dff3237ee54ea). o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code. o Refactor pf_proto_[un]register() into protosw_[un]register(). o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232
show more ...
|
#
e0b40500 |
| 11-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
raw ip6: merge rip6_output() into rip6_send()
While here remove some code that was compat legacy back in 2005, added in a1f7e5f8ee7fe.
Reviewed by: melifaro Differential revision: https://reviews.
raw ip6: merge rip6_output() into rip6_send()
While here remove some code that was compat legacy back in 2005, added in a1f7e5f8ee7fe.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36128
show more ...
|
#
c7a62c92 |
| 10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36062
|
#
a14465e1 |
| 14-Jun-2022 |
Mark Johnston <markj@FreeBSD.org> |
rip6: Fix a lock order reversal in rip6_bind()
See also commit 71a1539e3783.
Reported by: syzbot+9b461b6a07a83cc10daa@syzkaller.appspotmail.com Reported by: syzbot+b6ce0aec16f5fdab3282@syzkaller.ap
rip6: Fix a lock order reversal in rip6_bind()
See also commit 71a1539e3783.
Reported by: syzbot+9b461b6a07a83cc10daa@syzkaller.appspotmail.com Reported by: syzbot+b6ce0aec16f5fdab3282@syzkaller.appspotmail.com Reviewed by: glebius MFC after: 1 week Sponsored by: The FreeBSD Foundation Differential Revision: https://reviews.freebsd.org/D35472
show more ...
|
Revision tags: release/13.1.0 |
|
#
a98bb75f |
| 14-Apr-2022 |
John Baldwin <jhb@FreeBSD.org> |
netinet6: Use __diagused for variables only used in KASSERT().
|
#
71a1539e |
| 16-Dec-2021 |
Mateusz Guzik <mjg@FreeBSD.org> |
inet6: fix a LOR between rip and rawinp
Running sys/netpfil/pf/fragmentation v6 results in:
lock order reversal: 1st 0xfffffe00050429a8 rip (rip, sleep mutex) @ /usr/src/sys/netinet6/raw_ip6.c:803
inet6: fix a LOR between rip and rawinp
Running sys/netpfil/pf/fragmentation v6 results in:
lock order reversal: 1st 0xfffffe00050429a8 rip (rip, sleep mutex) @ /usr/src/sys/netinet6/raw_ip6.c:803 2nd 0xfffff8009491e1d0 rawinp (rawinp, rw) @ /usr/src/sys/netinet6/raw_ip6.c:804 lock order rawinp -> rip established at: 0xffffffff8068e26a at witness_lock_order_add+0x28a 0xffffffff8068d087 at witness_checkorder+0x627 0xffffffff805a9f05 at __mtx_lock_flags+0x205 0xffffffff808102e4 at in_pcballoc+0x204 0xffffffff808d53c6 at rip6_attach+0x116 0xffffffff806dc4e8 at socreate+0x368 0xffffffff806eaedc at kern_socket+0xfc 0xffffffff806eadcd at sys_socket+0x2d 0xffffffff80abc774 at syscallenter+0x5c4 0xffffffff80abbeeb at amd64_syscall+0x1b 0xffffffff80a8044b at fast_syscall_common+0xf8 lock order rip -> rawinp attempted at: 0xffffffff8068dc2a at witness_checkorder+0x11ca 0xffffffff805d1b7f at _rw_wlock_cookie+0x18f 0xffffffff808d596c at rip6_connect+0x19c 0xffffffff806e0842 at soconnectat+0x142 0xffffffff806ebe36 at kern_connectat+0x136 0xffffffff806ebcdf at sys_connect+0x4f 0xffffffff80abc774 at syscallenter+0x5c4 0xffffffff80abbeeb at amd64_syscall+0x1b 0xffffffff80a8044b at fast_syscall_common+0xf8
Reviewed by: glebius Fixes: de2d47842e880281 ("SMR protection for inpcbs") Sponsored by: Rubicon Communications, LLC ("Netgate") Differential Revision: https://reviews.freebsd.org/D33508
show more ...
|
#
db0ac6de |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mism
Revert "wpa: Import wpa_supplicant/hostapd commit 14ab4a816"
This reverts commit 266f97b5e9a7958e365e78288616a459b40d924a, reversing changes made to a10253cffea84c0c980a36ba6776b00ed96c3e3b.
A mismerge of a merge to catch up to main resulted in files being committed which should not have been.
show more ...
|
#
266f97b5 |
| 02-Dec-2021 |
Cy Schubert <cy@FreeBSD.org> |
wpa: Import wpa_supplicant/hostapd commit 14ab4a816
This is the November update to vendor/wpa committed upstream 2021-11-26.
MFC after: 1 month
|
#
de2d4784 |
| 02-Dec-2021 |
Gleb Smirnoff <glebius@FreeBSD.org> |
SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addre
SMR protection for inpcbs
With introduction of epoch(9) synchronization to network stack the inpcb database became protected by the network epoch together with static network data (interfaces, addresses, etc). However, inpcb aren't static in nature, they are created and destroyed all the time, which creates some traffic on the epoch(9) garbage collector.
Fairly new feature of uma(9) - Safe Memory Reclamation allows to safely free memory in page-sized batches, with virtually zero overhead compared to uma_zfree(). However, unlike epoch(9), it puts stricter requirement on the access to the protected memory, needing the critical(9) section to access it. Details:
- The database is already build on CK lists, thanks to epoch(9). - For write access nothing is changed. - For a lookup in the database SMR section is now required. Once the desired inpcb is found we need to transition from SMR section to r/w lock on the inpcb itself, with a check that inpcb isn't yet freed. This requires some compexity, since SMR section itself is a critical(9) section. The complexity is hidden from KPI users in inp_smr_lock(). - For a inpcb list traversal (a pcblist sysctl, or broadcast notification) also a new KPI is provided, that hides internals of the database - inp_next(struct inp_iterator *).
Reviewed by: rrs Differential revision: https://reviews.freebsd.org/D33022
show more ...
|
Revision tags: release/12.3.0 |
|
#
7045b160 |
| 28-Jul-2021 |
Roy Marples <roy@marples.name> |
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they
socket: Implement SO_RERROR
SO_RERROR indicates that receive buffer overflows should be handled as errors. Historically receive buffer overflows have been ignored and programs could not tell if they missed messages or messages had been truncated because of overflows. Since programs historically do not expect to get receive overflow errors, this behavior is not the default.
This is really really important for programs that use route(4) to keep in sync with the system. If we loose a message then we need to reload the full system state, otherwise the behaviour from that point is undefined and can lead to chasing bogus bug reports.
Reviewed by: philip (network), kbowling (transport), gbe (manpages) MFC after: 2 weeks Differential Revision: https://reviews.freebsd.org/D26652
show more ...
|
#
2290dfb4 |
| 19-May-2021 |
Ryan Stone <rstone@FreeBSD.org> |
Enter the net epoch before calling ip6_setpktopts
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before call
Enter the net epoch before calling ip6_setpktopts
ip6_setpktopts() can look up ifnets via ifnet_by_index(), which is only safe in the net epoch. Ensure that callers are in the net epoch before calling this function.
Sponsored by: Dell EMC Isilon MFC after: 4 weeks Reviewed by: donner, kp Differential Revision: https://reviews.freebsd.org/D30630
show more ...
|