#
70703aa9 |
| 03-Mar-2025 |
acazuc <acazuc@acazuc.fr> |
netinet: allow per protocol random IP id control, single out IPSEC
A globally enabled random IP id generation maybe useful in most IP contexts, but it may be unnecessary in the case of IPsec encapsu
netinet: allow per protocol random IP id control, single out IPSEC
A globally enabled random IP id generation maybe useful in most IP contexts, but it may be unnecessary in the case of IPsec encapsulated packets because IPsec can be configured to use anti-replay windows.
This commit adds a new net.inet.ipsec.random_id sysctl to control whether or not IPsec packets should use random IP id generation.
Rest of the protocols/modules are still controlled by the global net.inet.ip.random_id, but can be easily augmented with a knob.
Reviewed by: glebius Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D49164
show more ...
|
Revision tags: release/14.2.0-p2, release/14.1.0-p8, release/13.4.0-p4 |
|
#
4009a98f |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
rawip: Add a bind_all_fibs sysctl
As with net.inet.{tcp,udp}.bind_all_fibs, this causes raw sockets to accept only packets from the same FIB.
Reviewed by: glebius Sponsored by: Klara, Inc. Sponsore
rawip: Add a bind_all_fibs sysctl
As with net.inet.{tcp,udp}.bind_all_fibs, this causes raw sockets to accept only packets from the same FIB.
Reviewed by: glebius Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48707
show more ...
|
#
caccbaef |
| 06-Feb-2025 |
Mark Johnston <markj@FreeBSD.org> |
socket: Move SO_SETFIB handling to protocol layers
In particular, we store a FIB number in both struct socket and in struct inpcb. When updating the FIB number with setsockopt(SO_SETFIB), make the
socket: Move SO_SETFIB handling to protocol layers
In particular, we store a FIB number in both struct socket and in struct inpcb. When updating the FIB number with setsockopt(SO_SETFIB), make the update atomic. This is required to support the new bind_all_fibs mode, since in that mode changing the FIB of a bound socket is not permitted.
This requires a bit more code, but avoids a layering violation in sosetopt(), where we hard-code the list of protocol families that implement SO_SETFIB.
Reviewed by: glebius MFC after: 2 weeks Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48666
show more ...
|
Revision tags: release/14.1.0-p7, release/14.2.0-p1, release/13.4.0-p3 |
|
#
fd94571c |
| 07-Jan-2025 |
Mark Johnston <markj@FreeBSD.org> |
rawip: Take the inpcb lock when appropriate in rip_ctloutput()
Reviewed by: glebius MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freeb
rawip: Take the inpcb lock when appropriate in rip_ctloutput()
Reviewed by: glebius MFC after: 1 week Sponsored by: Klara, Inc. Sponsored by: Stormshield Differential Revision: https://reviews.freebsd.org/D48344
show more ...
|
Revision tags: release/14.2.0, release/13.4.0, release/14.1.0, release/13.3.0 |
|
#
ce69e373 |
| 03-Feb-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Revert "sockets: retire sorflush()"
Provide a comment in sorflush() why the socket I/O sx(9) lock is actually important.
This reverts commit 507f87a799cf0811ce30f0ae7f10ba19b2fd3db3.
|
#
507f87a7 |
| 16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols th
sockets: retire sorflush()
With removal of dom_dispose method the function boils down to two meaningful function calls: socantrcvmore() and sbrelease(). The latter is only relevant for protocols that use generic socket buffers.
The socket I/O sx(9) lock acquisition in sorflush() is not relevant for shutdown(2) operation as it doesn't do any I/O that may interleave with read(2) or write(2). The socket buffer mutex acquisition inside sbrelease() is what guarantees thread safety. This sx(9) acquisition in soshutdown() can be tracked down to 4.4BSD times, where it used to be sblock(), and it was carried over through the years evolving together with sockets with no reconsideration of why do we carry it over. I can't tell if that sblock() made sense back then, but it doesn't make any today.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43415
show more ...
|
#
5bba2728 |
| 16-Jan-2024 |
Gleb Smirnoff <glebius@FreeBSD.org> |
sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more
sockets: make pr_shutdown fully protocol specific method
Disassemble a one-for-all soshutdown() into protocol specific methods. This creates a small amount of copy & paste, but makes code a lot more self documented, as protocol specific method would execute only the code that is relevant to that protocol and nothing else. This also fixes a couple recent regressions and reduces risk of future regressions. The extended KPI for the new pr_shutdown removes need for the extra pr_flush which was added for the sake of SCTP which could not perform its shutdown properly with the old one. Particularly for SCTP this change streamlines a lot of code.
Some notes on why certain parts of code were copied or were not to certain protocols: * The (SS_ISCONNECTED | SS_ISCONNECTING | SS_ISDISCONNECTING) check is needed only for those protocols that may be connected or disconnected. * The above reduces into only SS_ISCONNECTED for those protocols that always connect instantly. * The ENOTCONN and continue processing hack is left only for datagram protocols. * The SOLISTENING(so) block is copied to those protocols that listen(2). * sorflush() on SHUT_RD is copied almost to every protocol, but that will be refactored later. * wakeup(&so->so_timeo) is copied to protocols that can make a non-instant connect(2), can SO_LINGER or can accept(2).
There are three protocols (netgraph(4), Bluetooth, SDP) that did not have pr_shutdown, but old soshutdown() would still perform sorflush() on SHUT_RD for them and also wakeup(9). Those protocols partially supported shutdown(2) returning EOPNOTSUP for SHUT_WR/SHUT_RDWR, now they fully lost shutdown(2) support. I'm pretty sure netgraph(4) and Bluetooth are okay about that and SDP is almost abandoned anyway.
Reviewed by: tuexen Differential Revision: https://reviews.freebsd.org/D43413
show more ...
|
#
a13039e2 |
| 27-Dec-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: reoder inpcb destruction
First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we com
inpcb: reoder inpcb destruction
First, merge in_pcbdetach() with in_pcbfree(). The comment for in_pcbdetach() was no longer correct. Then, make sure we remove the inpcb from the hash before we commit any destructive actions on it. There are couple functions that rely on the hash lock skipping SMR + inpcb lock to lookup an inpcb. Although there are no known functions that similarly rely on the global inpcb list lock, also do list removal before destructive actions.
PR: 273890 Reviewed by: markj Differential Revision: https://reviews.freebsd.org/D43122
show more ...
|
#
29363fb4 |
| 23-Nov-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl s
sys: Remove ancient SCCS tags.
Remove ancient SCCS tags from the tree, automated scripting, with two minor fixup to keep things compiling. All the common forms in the tree were removed with a perl script.
Sponsored by: Netflix
show more ...
|
Revision tags: release/14.0.0 |
|
#
685dc743 |
| 16-Aug-2023 |
Warner Losh <imp@FreeBSD.org> |
sys: Remove $FreeBSD$: one-line .c pattern
Remove /^[\s*]*__FBSDID\("\$FreeBSD\$"\);?\s*\n/
|
#
fdb987be |
| 20-Apr-2023 |
Mark Johnston <markj@FreeBSD.org> |
inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and bound PCBs. Since we started using net_epoch to synchronize hash table lookups, there's been a b
inpcb: Split PCB hash tables
Currently we use a single hash table per PCB database for connected and bound PCBs. Since we started using net_epoch to synchronize hash table lookups, there's been a bug, noted in a comment above in_pcbrehash(): connecting a socket can cause an inpcb to move between hash chains, and this can cause a concurrent lookup to follow the wrong linkage pointers. I believe this could cause rare, spurious ECONNREFUSED errors in the worse case.
Address the problem by introducing a second hash table and adding more linkage pointers to struct inpcb. Now the database has one table each for connected and unconnected sockets.
When inserting an inpcb into the hash table, in_pcbinhash() now looks at the foreign address of the inpcb to figure out which table to use. This ensures that queue linkage pointers are stable until the socket is disconnected, so the problem described above goes away. There is also a small benefit in that in_pcblookup_*() can now search just one of the two possible hash buckets.
I also made the "rehash" parameter of in(6)_pcbconnect() unused. This parameter seems confusing and it is simpler to let the inpcb code figure out what to do using the existing INP_INHASHLIST flag.
UDP sockets pose a special problem since they can be connected and disconnected multiple times during their lifecycle. To handle this, the patch plugs a hole in the inpcb structure and uses it to store an SMR sequence number. When an inpcb is disconnected - an operation which requires the global PCB database hash lock - the write sequence number is advanced, and in order to reconnect, the connecting thread must wait for readers to drain before reusing the inpcb's hash chain linkage pointers.
raw_ip (ab)uses the hash table without using the corresponding accessors. Since there are now two hash tables, it arbitrarily uses the "connected" table for all of its PCBs. This will be addressed in some way in the future.
inp interators which specify a hash bucket will only visit connected PCBs. This is not really correct, but nothing in the tree uses that functionality except raw_ip, which as mentioned above places all of its PCBs in the "connected" table and so is unaffected.
Discussed with: glebius Tested by: glebius Sponsored by: Klara, Inc. Sponsored by: Modirum MDPay Differential Revision: https://reviews.freebsd.org/D38569
show more ...
|
Revision tags: release/13.2.0 |
|
#
7fc82fd1 |
| 03-Mar-2023 |
Gleb Smirnoff <glebius@FreeBSD.org> |
ipfw: garbage collect ip_fw_chk_ptr
It is a relict left from the old times when ipfw(4) was hooked into IP stack directly, without pfil(9).
|
#
0aa120d5 |
| 02-Dec-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: allow to provide protocol specific pcb size
The protocol specific structure shall start with inpcb.
Differential revision: https://reviews.freebsd.org/D37126
|
Revision tags: release/12.4.0 |
|
#
fcb3f813 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two
netinet*: remove PRC_ constants and streamline ICMP processing
In the original design of the network stack from the protocol control input method pr_ctlinput was used notify the protocols about two very different kinds of events: internal system events and receival of an ICMP messages from outside. These events were coded with PRC_ codes. Today these methods are removed from the protosw(9) and are isolated to IPv4 and IPv6 stacks and are called only from icmp*_input(). The PRC_ codes now just create a shim layer between ICMP codes and errors or actions taken by protocols.
- Change ipproto_ctlinput_t to pass just pointer to ICMP header. This allows protocols to not deduct it from the internal IP header. - Change ip6proto_ctlinput_t to pass just struct ip6ctlparam pointer. It has all the information needed to the protocols. In the structure, change ip6c_finaldst fields to sockaddr_in6. The reason is that icmp6_input() already has this address wrapped in sockaddr, and the protocols want this address as sockaddr. - For UDP tunneling control input, as well as for IPSEC control input, change the prototypes to accept a transparent union of either ICMP header pointer or struct ip6ctlparam pointer. - In icmp_input() and icmp6_input() do only validation of ICMP header and count bad packets. The translation of ICMP codes to errors/actions is done by protocols. - Provide icmp_errmap() and icmp6_errmap() as substitute to inetctlerrmap, inet6ctlerrmap arrays. - In protocol ctlinput methods either trust what icmp_errmap() recommend, or do our own logic based on the ICMP header.
Differential revision: https://reviews.freebsd.org/D36731
show more ...
|
#
43d39ca7 |
| 04-Oct-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet*: de-void control input IP protocol methods
After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icm
netinet*: de-void control input IP protocol methods
After decoupling of protosw(9) and IP wire protocols in 78b1fc05b205 for IPv4 we got vector ip_ctlprotox[] that is executed only and only from icmp_input() and respectively for IPv6 we got ip6_ctlprotox[] executed only and only from icmp6_input(). This allows to use protocol specific argument types in these methods instead of struct sockaddr and void.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36727
show more ...
|
#
74ed2e8a |
| 02-Sep-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
raw ip: fix regression with multicast and RSVP
With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p.
Reviewed by: karels Reported by: k
raw ip: fix regression with multicast and RSVP
With 61f7427f02a raw sockets protosw has wildcard pr_protocol. Protocol of a specific pcb is stored in inp_ip_p.
Reviewed by: karels Reported by: karels Differential revision: https://reviews.freebsd.org/D36429 Fixes: 61f7427f02a307d28af674a12c45dd546e3898e4
show more ...
|
#
61f7427f |
| 31-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same to demultiplex incoming IP
protosw: cleanup protocols that existed merely to provide pr_input
Since 4.4BSD the protosw was used to implement socket types created by socket(2) syscall and at the same to demultiplex incoming IPv4 datagrams (later copied to IPv6). This story ended with 78b1fc05b20.
These entries (e.g. IPPROTO_ICMP) in inetsw that were added to catch packets in ip_input(), they would also be returned by pffindproto() if user says socket(AF_INET, SOCK_RAW, IPPROTO_ICMP). Thus, for raw sockets to work correctly, all the entries were pointing at raw_usrreq differentiating only in the value of pr_protocol.
With 78b1fc05b20 all these entries are no longer needed, as ip_protox is independent of protosw. Any socket syscall requesting SOCK_RAW type would end up with rip_protosw. And this protosw has its pr_protocol set to 0, allowing to mark socket with any protocol.
For IPv6 raw socket the change required two small fixes: o Validate user provided protocol value o Always use protocol number stored in inp in rip6_attach, instead of protosw value, which is now always 0.
Differential revision: https://reviews.freebsd.org/D36380
show more ...
|
#
e7d02be1 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge
protosw: refactor protosw and domain static declaration and load
o Assert that every protosw has pr_attach. Now this structure is only for socket protocols declarations and nothing else. o Merge struct pr_usrreqs into struct protosw. This was suggested in 1996 by wollman@ (see 7b187005d18ef), and later reiterated in 2006 by rwatson@ (see 6fbb9cf860dcd). o Make struct domain hold a variable sized array of protosw pointers. For most protocols these pointers are initialized statically. Those domains that may have loadable protocols have spacers. IPv4 and IPv6 have 8 spacers each (andre@ dff3237ee54ea). o For inetsw and inet6sw leave a comment noting that many protosw entries very likely are dead code. o Refactor pf_proto_[un]register() into protosw_[un]register(). o Isolate pr_*_notsupp() methods into uipc_domain.c
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36232
show more ...
|
#
78b1fc05 |
| 17-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
protosw: separate pr_input and pr_ctlinput out of protosw
The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols t
protosw: separate pr_input and pr_ctlinput out of protosw
The protosw KPI historically has implemented two quite orthogonal things: protocols that implement a certain kind of socket, and protocols that are IPv4/IPv6 protocol. These two things do not make one-to-one correspondence. The pr_input and pr_ctlinput methods were utilized only in IP protocols. This strange duality required IP protocols that doesn't have a socket to declare protosw, e.g. carp(4). On the other hand developers of socket protocols thought that they need to define pr_input/pr_ctlinput always, which lead to strange dead code, e.g. div_input() or sdp_ctlinput().
With this change pr_input and pr_ctlinput as part of protosw disappear and IPv4/IPv6 get their private single level protocol switch table ip_protox[] and ip6_protox[] respectively, pointing at array of ipproto_input_t functions. The pr_ctlinput that was used for control input coming from the network (ICMP, ICMPv6) is now represented by ip_ctlprotox[] and ip6_ctlprotox[].
ipproto_register() becomes the only official way to register in the table. Those protocols that were always static and unlikely anybody is interested in making them loadable, are now registered by ip_init(), ip6_init(). An IP protocol that considers itself unloadable shall register itself within its own private SYSINIT().
Reviewed by: tuexen, melifaro Differential revision: https://reviews.freebsd.org/D36157
show more ...
|
#
3d2041c0 |
| 11-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
raw ip: merge rip_output() into rip_send()
While here, address the unlocked 'dst' read. Solve that by storing a pointer either to the inpcb or to the sockaddr. If we end up copying address out of
raw ip: merge rip_output() into rip_send()
While here, address the unlocked 'dst' read. Solve that by storing a pointer either to the inpcb or to the sockaddr. If we end up copying address out of the inpcb, that would be done under the read lock section.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36127
show more ...
|
#
b8103ca7 |
| 11-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
netinet: get interface event notifications directly via EVENTHANDLER(9)
The old mechanism of getting them via domains/protocols control input is a relict from the previous century, when nothing like
netinet: get interface event notifications directly via EVENTHANDLER(9)
The old mechanism of getting them via domains/protocols control input is a relict from the previous century, when nothing like EVENTHANDLER(9) existed yet. Retire PRC_IFDOWN/PRC_IFUP as netinet was the only one to use them.
Reviewed by: melifaro Differential revision: https://reviews.freebsd.org/D36116
show more ...
|
#
c7a62c92 |
| 10-Aug-2022 |
Gleb Smirnoff <glebius@FreeBSD.org> |
inpcb: gather v4/v6 handling code into in_pcballoc() from protocols
Reviewed by: rrs, tuexen Differential revision: https://reviews.freebsd.org/D36062
|
Revision tags: release/13.1.0 |
|
#
9f70c04d |
| 01-Mar-2022 |
Mark Johnston <markj@FreeBSD.org> |
rip: Fix a -Wunused-but-set-variable warning
Fixes: 81728a538d24 ("Split rtinit() into multiple functions.") Reviewed by: imp, melifaro MFC after: 1 week Differential Revision: https://reviews.free
rip: Fix a -Wunused-but-set-variable warning
Fixes: 81728a538d24 ("Split rtinit() into multiple functions.") Reviewed by: imp, melifaro MFC after: 1 week Differential Revision: https://reviews.freebsd.org/D34395
show more ...
|
#
77223d98 |
| 25-Jan-2022 |
Wojciech Macek <wma@FreeBSD.org> |
ip_mroute: refactor epoch-basd locking
Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines. Remove unnecessary macros as well.
Obtained from: Semihalf Spponsored by: Stormshield
ip_mroute: refactor epoch-basd locking
Remove duplicated epoch_enter and epoch_exit in IP inp/outp routines. Remove unnecessary macros as well.
Obtained from: Semihalf Spponsored by: Stormshield Reviewed by: glebius Differential revision: https://reviews.freebsd.org/D34030
show more ...
|
#
889c6050 |
| 22-Jan-2022 |
Wojciech Macek <wma@FreeBSD.org> |
ip_mroute: release epoch lock if mrouter is not configured
Add mising "else" branch to release a lock if mrouter is not configured.
Obtained from: Semihalf Sponsored by: Stormshield
|