.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
.Dd August 1, 2022
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/tcp.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data.
It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq "port addresses" .
Thus, each address is composed
of an Internet address specifying the host and network,
with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
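.Pp
As a minimal sketch of the distinction above, and not a definitive
implementation, the following C fragment creates a passive socket with
.Xr bind 2
and
.Xr listen 2 ,
and an active socket with
.Xr connect 2 .
The helper names
.Fn tcp_listen ,
.Fn tcp_connect
and
.Fn fill_addr
are hypothetical and are not part of any system interface.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/* Fill in an IPv4 socket address; addr and port are in host byte order. */
static void
fill_addr(struct sockaddr_in *sin, in_addr_t addr, in_port_t port)
{
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(addr);
	sin->sin_port = htons(port);
}

/* Passive open: bind(2), then listen(2).  Returns a descriptor or -1. */
int
tcp_listen(in_addr_t addr, in_port_t port)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	fill_addr(&sin, addr, port);
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Active open: connect(2) on a new socket.  Returns a descriptor or -1. */
int
tcp_connect(in_addr_t addr, in_port_t port)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	fill_addr(&sin, addr, port);
	if (connect(s, (struct sockaddr *)&sin, sizeof(sin)) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}
```

A server would call
.Xr accept 2
on the descriptor returned by the hypothetical
.Fn tcp_listen ;
a client would read and write directly on the descriptor returned by
.Fn tcp_connect .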
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks.
This technique, termed
.Dq "wildcard addressing" ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.
Normally, this address corresponds to the peer entity's network.
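.Pp
The following sketch illustrates wildcard addressing under the assumptions
stated in its comments.
It binds
.Dv INADDR_ANY
with an unspecified port and uses
.Xr getsockname 2
to observe the address the system has assigned.
The helper names
.Fn wildcard_listen
and
.Fn bound_address
are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Create a passive socket that listens on all networks.  The port is left
 * unspecified (0), so the system assigns one.  Returns a descriptor or -1.
 */
int
wildcard_listen(void)
{
	struct sockaddr_in sin;
	int s;

	if ((s = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* wildcard address */
	sin.sin_port = htons(0);			/* system picks the port */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(s, 5) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/* Retrieve the local address currently associated with a socket. */
int
bound_address(int s, struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);

	memset(sin, 0, sizeof(*sin));
	return (getsockname(s, (struct sockaddr *)sin, &len));
}
```

On the listening socket,
.Fn bound_address
is expected to report
.Dv INADDR_ANY
together with the assigned port; on a socket returned by
.Xr accept 2
it reports the specific local address of the established connection.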
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width ".Dv TCP_FUNCTION_BLK"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Fx
specific additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.It Dv TCP_CCALGOOPT
Set or query congestion control algorithm specific parameters.
See
.Xr mod_cc 4
for details.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
connection.
See
.Xr mod_cc 4
for details.
.It Dv TCP_FASTOPEN
Enable or disable TCP Fast Open (TFO).
To use this option, the kernel must be built with the
.Dv TCP_RFC7413
option.
.Pp
This option can be set on the socket either before or after
.Xr listen 2
is invoked.
Clearing this option on a listen socket after it has been set has no effect on
existing TFO connections or TFO connections in progress; it only prevents new
TFO connections from being established.
.Pp
For passively-created sockets, the
.Dv TCP_FASTOPEN
socket option can be queried to determine whether the connection was established
using TFO.
Note that connections that are established via a TFO
.Tn SYN ,
but that fall back to using a non-TFO
.Tn SYN|ACK
will have the
.Dv TCP_FASTOPEN
socket option set.
.Pp
In addition to the facilities defined in RFC7413, this implementation supports a
pre-shared key (PSK) mode of operation in which the TFO server requires the
client to be in possession of a shared secret in order for the client to be able
to successfully open TFO connections with the server.
This is useful, for example, in environments where TFO servers are exposed to
both internal and external clients and only wish to allow TFO connections from
internal clients.
.Pp
In the PSK mode of operation, the server generates and sends TFO cookies to
requesting clients as usual.
However, when validating cookies received in TFO SYNs from clients, the server
requires the client-supplied cookie to equal
.Bd -literal -offset left
SipHash24(key=\fI16-byte-psk\fP, msg=\fIcookie-sent-to-client\fP)
.Ed
.Pp
Multiple concurrent valid pre-shared keys are supported so that time-based
rolling PSK invalidation policies can be implemented in the system.
The default number of concurrent pre-shared keys is 2.
.Pp
This can be adjusted with the
.Dv TCP_RFC7413_MAX_PSKS
kernel option.
.It Dv TCP_FUNCTION_BLK
Select or query the set of functions that TCP will use for this connection.
This allows a user to select an alternate TCP stack.
The alternate TCP stack must already be loaded in the kernel.
To list the available TCP stacks, see
.Va functions_available
in the
.Sx MIB (sysctl) Variables
section further down.
To list the default TCP stack, see
.Va functions_default
in the
.Sx MIB (sysctl) Variables
section.
.It Dv TCP_KEEPINIT
This
.Xr setsockopt 2
option accepts a per-socket timeout argument of
.Vt "u_int"
in seconds, for new, non-established
.Tn TCP
connections.
For the global default in milliseconds see
.Va keepinit
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPIDLE
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
for the amount of time, in seconds, that the connection must be idle
before keepalive probes (if enabled) are sent for the connection of this
socket.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepidle
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPINTVL
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
to set the per-socket interval, in seconds, between keepalive probes sent
to a peer.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepintvl
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPCNT
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
and allows a per-socket tuning of the number of probes sent, with no response,
before the connection will be dropped.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default see
.Va keepcnt
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender- and
.No receiver- Ns Tn TCP
will negotiate between themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation.
The boolean option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the
.No sender- Ns Tn TCP
will set the
.Dq push
bit, and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When this option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
Outgoing traffic is digested;
digests on incoming traffic are verified.
When this option is enabled on a socket, all inbound and outgoing
TCP segments must be signed with MD5 digests.
.Pp
One common use for this in a
.Fx
router deployment is to enable
such routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry can only be specified on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination,
the system does not send any outgoing segments and drops any inbound segments.
However, during connection negotiation, a non-signed segment will be accepted if
an SADB entry does not exist between hosts.
When a non-signed segment is accepted, the established connection is not
protected with MD5 digests.
.Pp
Each dropped segment is taken into account in the TCP protocol statistics.
.It Dv TCP_STATS
Manage collection of connection level statistics using the
.Xr stats 3
framework.
.It Dv TCP_TXTLS_ENABLE
Enable in-kernel Transport Layer Security (TLS) for data written to this
socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_TXTLS_MODE
The integer argument can be used to get or set the current TLS transmit mode
of a socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_RXTLS_ENABLE
Enable in-kernel TLS for data read from this socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_REUSPORT_LB_NUMA
Changes NUMA affinity filtering for an established TCP listen
socket.
This option takes a single integer argument which specifies
the NUMA domain to filter on for this listen socket.
The argument can also have the following special values:
.Bl -tag -width "Dv TCP_REUSPORT_LB_NUMA"
.It Dv TCP_REUSPORT_LB_NUMA_NODOM
Remove NUMA filtering for this listen socket.
.It Dv TCP_REUSPORT_LB_NUMA_CURDOM
Filter traffic associated with the domain where the calling thread is
currently executing.
This is typically used after a process or thread inherits a listen
socket from its parent, and sets its CPU affinity to a particular core.
.El
.It Dv TCP_REMOTE_UDP_ENCAPS_PORT
Set and get the remote UDP encapsulation port.
It can only be set on a closed TCP socket.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
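.Pp
A minimal sketch of passing
.Dv IPPROTO_TCP
as the option level, using the keepalive options as the example, might look
as follows.
The helper name
.Fn tcp_set_keepalive
is hypothetical; the arguments are in seconds (idle time and probe interval)
and probes (count), whereas the corresponding global defaults for the two
times are expressed in milliseconds.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Enable keepalives on socket s and tune them per socket.
 * idle and intvl are in seconds; cnt is a number of probes.
 * Returns 0 on success or -1 with errno set.
 */
int
tcp_set_keepalive(int s, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
	int on = 1;

	/* SO_KEEPALIVE is a socket-level option, hence SOL_SOCKET. */
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
	/* The TCP_KEEP* options live at the TCP protocol level. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof(idle)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof(intvl)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt,
	    sizeof(cnt)) == -1)
		return (-1);
	return (0);
}
```

For example, a call with the values 60, 10 and 5 asks for probing to begin
after one minute of idle time, with probes ten seconds apart, and for the
connection to be dropped after five unanswered probes.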
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
The default congestion control algorithm for
.Tn TCP
is
.Xr cc_newreno 4 .
Other congestion control algorithms can be made available using the
.Xr mod_cc 4
framework.
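.Pp
The algorithm in use on a particular socket can be inspected or changed with
the
.Dv TCP_CONGESTION
socket option.
The following is a sketch only; the helper names
.Fn tcp_get_cc
and
.Fn tcp_set_cc
are hypothetical, and the caller is assumed to supply a buffer large enough
for the algorithm name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>

/*
 * Copy the name of the congestion control algorithm used by socket s
 * into name, which has room for size bytes.  The result is always
 * NUL-terminated.  Returns 0 on success or -1 on error.
 */
int
tcp_get_cc(int s, char *name, socklen_t size)
{
	socklen_t len = size;

	if (size == 0)
		return (-1);
	memset(name, 0, size);
	if (getsockopt(s, IPPROTO_TCP, TCP_CONGESTION, name, &len) == -1)
		return (-1);
	name[size - 1] = '\0';
	return (0);
}

/*
 * Ask for the named algorithm on socket s.  The algorithm must already
 * be available in the kernel.  Returns 0 on success or -1 on error.
 */
int
tcp_set_cc(int s, const char *name)
{
	return (setsockopt(s, IPPROTO_TCP, TCP_CONGESTION, name,
	    (socklen_t)strlen(name)));
}
```

A 16-byte buffer is assumed to be sufficient here; see
.Xr mod_cc 4
for the authoritative limit.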
.Ss MIB (sysctl) Variables
The
.Tn TCP
protocol implements a number of variables in the
.Va net.inet.tcp
branch of the
.Xr sysctl 3
MIB, which can also be read or modified with
.Xr sysctl 8 .
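.Pp
A program can read an integer-valued variable from this branch with
.Xr sysctlbyname 3 ,
as in the following sketch.
The helper name
.Fn tcp_sysctl_int
is hypothetical, and the preprocessor guard is an assumption added only so
that the fragment still compiles on systems without
.Xr sysctlbyname 3 ,
where it simply fails.

```c
#include <sys/types.h>
#if defined(__FreeBSD__)
#include <sys/sysctl.h>
#endif
#include <errno.h>
#include <stdio.h>

/*
 * Read the integer variable net.inet.tcp.<leaf> into *value.
 * Returns 0 on success or -1 with errno set.
 */
int
tcp_sysctl_int(const char *leaf, int *value)
{
#if defined(__FreeBSD__)
	char name[128];
	size_t len = sizeof(*value);
	int n;

	/* Build the full MIB name from the leaf, e.g. "keepcnt". */
	n = snprintf(name, sizeof(name), "net.inet.tcp.%s", leaf);
	if (n < 0 || (size_t)n >= sizeof(name)) {
		errno = ENAMETOOLONG;
		return (-1);
	}
	/* A NULL new value makes this a read-only query. */
	return (sysctlbyname(name, value, &len, NULL, 0));
#else
	(void)leaf;
	(void)value;
	errno = ENOSYS;		/* no sysctlbyname(3) on this system */
	return (-1);
#endif
}
```

Variables that are not plain integers, such as the list-valued entries, need
a buffer of the appropriate type and size instead.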
.Bl -tag -width ".Va v6pmtud_blackhole_mss"
.It Va always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It Va blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is no socket accepting connections.
See
.Xr blackhole 4 .
.It Va blackhole_local
See
.Xr blackhole 4 .
.It Va cc
A number of variables for congestion control are under the
.Va net.inet.tcp.cc
node.
See
.Xr mod_cc 4 .
.It Va cc.newreno
Variables for NewReno congestion control are under the
.Va net.inet.tcp.cc.newreno
node.
See
.Xr cc_newreno 4 .
.It Va delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va delayed_ack
Delay ACK to try and piggyback it onto a data packet or another ACK.
.It Va do_lrd
Enable Lost Retransmission Detection for SACK-enabled sessions, disabled by
default.
Under severe congestion, a retransmission can be lost which then leads to a
mandatory Retransmission Timeout (RTO), followed by slow-start.
LRD will try to resend the repeatedly lost packet, preventing the time-consuming
RTO and performance reducing slow-start.
.It Va do_prr
Perform SACK loss recovery using the Proportional Rate Reduction (PRR) algorithm
described in RFC6937.
This improves the effectiveness of retransmissions, particularly in environments
with ACK thinning or burst loss events, as chances to run out of the ACK clock
are reduced, preventing lengthy and performance reducing RTO based loss recovery
(default is true).
.It Va do_prr_conservative
While doing Proportional Rate Reduction, remain strictly in a packet conserving
mode, sending only one new packet for each ACK received.
Helpful when a misconfigured token bucket traffic policer causes persistent
high losses leading to RTO, but reduces PRR effectiveness in more common settings
(default is false).
.It Va do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It Va drop_synfin
Drop TCP packets with both SYN and FIN set.
.It Va ecn.enable
Enable support for TCP Explicit Congestion Notification (ECN).
ECN allows a TCP sender to reduce the transmission rate in order to
avoid packet drops.
.Bl -tag -compact
.It 0
Disable ECN.
.It 1
Allow incoming connections to request ECN.
Outgoing connections will request ECN.
.It 2
Allow incoming connections to request ECN.
Outgoing connections will not request ECN.
(default)
.It 3
Negotiate on incoming connection for Accurate ECN, ECN, or no ECN.
Outgoing connections will request Accurate ECN and fall back to
ECN depending on the capabilities of the server.
.It 4
Negotiate on incoming connection for Accurate ECN, ECN, or no ECN.
Outgoing connections will not request ECN.
.El
.It Va ecn.maxretries
Number of retries (SYN or SYN/ACK retransmits) before disabling ECN on a
specific connection.
This is needed to help with connection establishment
when a broken firewall is in the network path.
.It Va fast_finwait2_recycle
Recycle
.Tn TCP
.Dv FIN_WAIT_2
connections faster when the socket is marked as
.Dv SBS_CANTRCVMORE
(no user process has the socket open, data received on
the socket cannot be read).
The timeout used here is
.Va finwait2_timeout .
.It Va fastopen.acceptany
When non-zero, all client-supplied TFO cookies will be considered to be valid.
The default is 0.
.It Va fastopen.autokey
When this and
.Va net.inet.tcp.fastopen.server_enable
are non-zero, a new key will be automatically generated after the specified
number of seconds.
The default is 120.
.It Va fastopen.ccache_bucket_limit
The maximum number of entries in a client cookie cache bucket.
The default value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKET_LIMIT_DEFAULT
kernel option or by setting
.Va net.inet.tcp.fastopen_ccache_bucket_limit
in the
.Xr loader 8 .
.It Va fastopen.ccache_buckets
The number of client cookie cache buckets.
Read-only.
The value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKETS_DEFAULT
kernel option or by setting
.Va fastopen.ccache_buckets
in the
.Xr loader 8 .
.It Va fastopen.ccache_list
Print the client cookie cache.
Read-only.
.It Va fastopen.client_enable
When zero, no new active (i.e., client) TFO connections can be created.
On the transition from enabled to disabled, the client cookie cache is cleared
and disabled.
The transition from enabled to disabled does not affect any active TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.keylen
The key length in bytes.
Read-only.
.It Va fastopen.maxkeys
The maximum number of keys supported.
Read-only.
.It Va fastopen.maxpsks
The maximum number of pre-shared keys supported.
Read-only.
.It Va fastopen.numkeys
The current number of keys installed.
Read-only.
.It Va fastopen.numpsks
The current number of pre-shared keys installed.
Read-only.
.It Va fastopen.path_disable_time
When a failure occurs while trying to create a new active (i.e., client) TFO
connection, new active connections on the same path, as determined by the tuple
.Brq client_ip, server_ip, server_port ,
will be forced to be non-TFO for this many seconds.
Note that the path disable mechanism relies on state stored in client cookie
cache entries, so it is possible for the disable time for a given path to be
reduced if the corresponding client cookie cache entry is reused due to resource
pressure before the disable period has elapsed.
The default is
.Dv TCP_FASTOPEN_PATH_DISABLE_TIME_DEFAULT .
.It Va fastopen.psk_enable
When non-zero, pre-shared key (PSK) mode is enabled for all TFO servers.
On the transition from enabled to disabled, all installed pre-shared keys are
removed.
The default is 0.
.It Va fastopen.server_enable
When zero, no new passive (i.e., server) TFO connections can be created.
On the transition from enabled to disabled, all installed keys and pre-shared
keys are removed.
On the transition from disabled to enabled, if
.Va fastopen.autokey
is non-zero and there are no keys installed, a new key will be generated
immediately.
The transition from enabled to disabled does not affect any passive TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.setkey
Install a new key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va fastopen.setpsk
Install a new pre-shared key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va finwait2_timeout
Timeout to use for fast recycling of
.Tn TCP
.Dv FIN_WAIT_2
connections
.Pq Va fast_finwait2_recycle .
Defaults to 60 seconds.
.It Va functions_available
List of available TCP function blocks (TCP stacks).
.It Va functions_default
The default TCP function block (TCP stack).
.It Va functions_inherit_listen_socket_stack
Determines whether to inherit listen socket's TCP stack or use the current
system default TCP stack, as defined by
.Va functions_default .
Default is true.
622Default is true.
623.It Va hostcache
624The TCP host cache is used to cache connection details and metrics to
625improve future performance of connections between the same hosts.
626At the completion of a TCP connection, a host will cache information
627for the connection for some defined period of time.
628There are a number of
629.Va hostcache
630variables under this node.
631See
632.Va hostcache.enable .
633.It Va hostcache.bucketlimit
634The maximum number of entries for the same hash.
635Defaults to 30.
636.It Va hostcache.cachelimit
637Overall entry limit for hostcache.
638Defaults to
639.Va hashsize
640*
641.Va bucketlimit .
642.It Va hostcache.count
643The current number of entries in the host cache.
644.It Va hostcache.enable
645Enable/disable the host cache:
646.Bl -tag -compact
647.It 0
648Disable the host cache.
649.It 1
650Enable the host cache. (default)
651.El
652.It Va hostcache.expire
653Time in seconds, how long a entry should be kept in the
654host cache since last accessed.
655Defaults to 3600 (1 hour).
656.It Va hostcache.hashsize
657Size of TCP hostcache hashtable.
658This number has to be a power of two, or will be rejected.
659Defaults to 512.
660.It Va hostcache.histo
661Provide a Histogram of the hostcache hash utilization.
662.It Va hostcache.list
663Provide a complete list of all current entries in the host
664cache.
665.It Va hostcache.prune
666Time in seconds between pruning expired host cache entries.
667Defaults to 300 (5 minutes).
668.It Va hostcache.purge
669Expire all entires on next pruning of host cache entries.
670Any non-zero setting will be reset to zero, once the purge
671is running.
672.Bl -tag -compact
673.It 0
674Do not purge all entries when pruning the host cache (default).
675.It 1
676Purge all entries when doing the next pruning.
677.It 2
678Purge all entries and also reseed the hash salt.
679.El
680.It Va hostcache.purgenow
681Immediately purge all entries once set to any value.
682Setting this to 2 will also reseed the hash salt.
683.It Va icmp_may_rst
684Certain
685.Tn ICMP
686unreachable messages may abort connections in
687.Tn SYN-SENT
688state.
689.It Va initcwnd_segments
690Enable the ability to specify initial congestion window in number of segments.
691The default value is 10 as suggested by RFC 6928.
692Changing the value on the fly would not affect connections
693using congestion window from the hostcache.
694Caution:
695This regulates the burst of packets allowed to be sent in the first RTT.
696The value should be relative to the link capacity.
697Start with small values for lower-capacity links.
698Large bursts can cause buffer overruns and packet drops if routers have small
699buffers or the link is experiencing congestion.
700.It Va insecure_rst
701Use criteria defined in RFC793 instead of RFC5961 for accepting RST segments.
702Default is false.
703.It Va insecure_syn
704Use criteria defined in RFC793 instead of RFC5961 for accepting SYN segments.
705Default is false.
706.It Va isn_reseed_interval
707The interval (in seconds) specifying how often the secret data used in
708RFC 1948 initial sequence number calculations should be reseeded.
709By default, this variable is set to zero, indicating that
710no reseeding will occur.
711Reseeding should not be necessary, and will break
712.Dv TIME_WAIT
713recycling for a few minutes.
.It Va keepcnt
Number of keepalive probes sent, with no response, before a connection
is dropped.
The default is 8 packets.
.It Va keepidle
Amount of time, in milliseconds, that the connection must be idle
before sending keepalive probes (if enabled).
The default is 7200000 msec (7.2M msec, 2 hours).
.It Va keepinit
Timeout, in milliseconds, for new, non-established
.Tn TCP
connections.
The default is 75000 msec (75K msec, 75 sec).
.It Va keepintvl
The interval, in milliseconds, between keepalive probes sent to remote
machines, when no response is received on a
.Va keepidle
probe.
The default is 75000 msec (75K msec, 75 sec).
.It Va log_in_vain
Log any connection attempts to ports where there is no socket
accepting connections.
The value of 1 limits the logging to
.Tn SYN
(connection establishment) packets only.
A value of 2 results in any
.Tn TCP
packets to closed ports being logged.
Any value not listed above disables the logging
(default is 0, i.e., the logging is disabled).
.It Va maxtcptw
When a TCP connection enters the
.Dv TIME_WAIT
state, its associated socket structure is freed, since it is of
negligible size and use, and a new structure is allocated to contain a
minimal amount of information necessary for sustaining a connection in
this state, called the compressed TCP
.Dv TIME_WAIT
state.
Since this structure is smaller than a socket structure, it can save
a significant amount of system memory.
The
.Va net.inet.tcp.maxtcptw
MIB variable controls the maximum number of these structures allocated.
By default, it is initialized to
.Va kern.ipc.maxsockets
/ 5.
.It Va minmss
Minimum TCP Maximum Segment Size; used to prevent a denial of service attack
from an unreasonably low MSS.
.It Va msl
The Maximum Segment Lifetime, in milliseconds, for a packet.
.It Va mssdflt
The default value used for the TCP Maximum Segment Size
.Pq Dq MSS
for IPv4 when no advice to the contrary is received from MSS negotiation.
.It Va newcwd
Enable the New Congestion Window Validation mechanism as described in RFC 7661.
This gently reduces the congestion window during periods when TCP is
application-limited and the network bandwidth is not utilized completely.
That prevents self-inflicted packet losses once the application starts to
transmit data at a higher speed.
.It Va nolocaltimewait
Suppress creation of compressed TCP
.Dv TIME_WAIT
states for connections in
which both endpoints are local.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
.It Va pcbcount
Number of active process control blocks
(read-only).
.It Va perconn_stats_enable
Controls the default collection of statistics for all connections using the
.Xr stats 3
framework.
0 disables, 1 enables, 2 enables random sampling across log id connection
groups with all connections in a group receiving the same setting.
.It Va perconn_stats_sample_rates
A CSV list of template_spec=percent key-value pairs which controls the per
template sampling rates when
.Xr stats 3
sampling is enabled.
.It Va persmax
Maximum persistence interval, msec.
.It Va persmin
Minimum persistence interval, msec.
.It Va pmtud_blackhole_detection
Enable automatic path MTU blackhole detection.
In case of retransmits of MSS sized segments,
the OS will lower the MSS to check if it's an MTU problem.
If the current MSS is greater than the configured value to try
.Po Va net.inet.tcp.pmtud_blackhole_mss
and
.Va net.inet.tcp.v6pmtud_blackhole_mss
.Pc ,
it will be set to this value, otherwise,
the MSS will be set to the default values
.Po Va net.inet.tcp.mssdflt
and
.Va net.inet.tcp.v6mssdflt
.Pc .
Settings:
.Bl -tag -compact
.It 0
Disable path MTU blackhole detection.
.It 1
Enable path MTU blackhole detection for IPv4 and IPv6.
.It 2
Enable path MTU blackhole detection only for IPv4.
.It 3
Enable path MTU blackhole detection only for IPv6.
.El
.It Va pmtud_blackhole_mss
MSS to try for IPv4 if PMTU blackhole detection is turned on.
.It Va reass.cursegments
The current total number of segments present in all reassembly queues.
.It Va reass.maxqueuelen
The maximum number of segments allowed in each reassembly queue.
By default, the system chooses a limit based on each TCP connection's
receive buffer size and maximum segment size (MSS).
The actual limit applied to a session's reassembly queue will be the lower of
the system-calculated automatic limit and the user-specified
.Va reass.maxqueuelen
limit.
.It Va reass.maxsegments
The maximum limit on the total number of segments across all reassembly
queues.
The limit can be adjusted as a tunable.
.It Va recvbuf_auto
Enable automatic receive buffer sizing as a connection progresses.
.It Va recvbuf_max
Maximum size of automatic receive buffer.
.It Va recvspace
Initial
.Tn TCP
receive window (buffer size).
.It Va require_unique_port
Require unique ephemeral port for outgoing connections;
otherwise, the 4-tuple of local and remote ports and addresses must be unique.
Requiring a unique port limits the number of outgoing connections.
.It Va rexmit_drop_options
Drop TCP options from third and later retransmitted SYN segments
of a connection.
.It Va rexmit_initial , rexmit_min , rexmit_slop
Adjust the retransmit timer calculation for
.Tn TCP .
The slop is
typically added to the raw calculation to take into account
occasional variances that the
.Tn SRTT
(smoothed round-trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.
While a number of
.Tn TCP
RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior,
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code.
For this reason, we use 200ms of slop and a near-0
minimum, which gives us an effective minimum of 200ms (similar to
.Tn Linux ) .
The initial value is used before an RTT measurement has been performed.
.It Va rfc1323
Implement the window scaling and timestamp options of RFC 1323/RFC 7323
(default is true).
.It Va rfc3042
Enable the Limited Transmit algorithm as described in RFC 3042.
It helps avoid timeouts on lossy links and also when the congestion window
is small, as happens on short transfers.
.It Va rfc3390
Enable support for RFC 3390, which allows for a variable-sized
starting congestion window on new connections, depending on the
maximum segment size.
This helps throughput in general, but
particularly affects short transfers and high-bandwidth large
propagation-delay connections.
.It Va rfc6675_pipe
Deprecated and superseded by
.Va sack.revised .
.It Va sack.enable
Enable support for RFC 2018, TCP Selective Acknowledgment option,
which allows the receiver to inform the sender about all successfully
arrived segments, allowing the sender to retransmit the missing segments
only.
.It Va sack.globalholes
Global number of TCP SACK holes currently allocated.
.It Va sack.globalmaxholes
Maximum number of SACK holes per system, across all connections.
Defaults to 65536.
.It Va sack.maxholes
Maximum number of SACK holes per connection.
Defaults to 128.
.It Va sack.revised
Enable three updated mechanisms from RFC 6675 (default is true).
First, the bytes in flight are calculated using the algorithm described in
RFC 6675, which is also an improvement when Proportional Rate Reduction is
enabled.
Next, Rescue Retransmission helps timely loss recovery when the trailing
segments of a transmission are lost and no additional data is ready to be sent.
In case a partial ACK without a SACK block is received during SACK loss
recovery, the trailing segment is immediately resent, rather than waiting
for a retransmission timeout.
Finally, SACK loss recovery is also engaged once two segments plus one byte are
SACKed, even if no traditional duplicate ACKs were observed.
.It Va sendbuf_auto
Enable automatic send buffer sizing.
.It Va sendbuf_auto_lowat
Modify threshold for auto send buffer growth to account for
.Dv SO_SNDLOWAT .
.It Va sendbuf_inc
Increment step size of the automatic send buffer.
.It Va sendbuf_max
Maximum size of automatic send buffer.
.It Va sendspace
Initial
.Tn TCP
send window (buffer size).
.It Va syncache
Variables under the
.Va net.inet.tcp.syncache
node are documented in
.Xr syncache 4 .
.It Va syncookies
Determines whether or not
.Tn SYN
cookies should be generated for outbound
.Tn SYN-ACK
packets.
.Tn SYN
cookies are a great help during
.Tn SYN
flood attacks, and are enabled by default.
(See
.Xr syncookies 4 . )
.It Va syncookies_only
See
.Xr syncookies 4 .
.It Va tcbhashsize
Size of the
.Tn TCP
control-block hash table
(read-only).
This is tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It Va tolerate_missing_ts
Tolerate missing timestamps (RFC 1323/RFC 7323) in
.Tn TCP
segments belonging to
.Tn TCP
connections for which support of
.Tn TCP
timestamps has been negotiated.
As of June 2021, several TCP stacks are known to violate RFC 7323, including
modern widely deployed ones.
Therefore the default is 1, i.e., missing timestamps are tolerated.
.It Va ts_offset_per_conn
When initializing the TCP timestamps, use a per-connection offset instead of a
per-host-pair offset.
Default is to use per-connection offsets as recommended in RFC 7323.
.It Va tso
Enable TCP Segmentation Offload.
.It Va udp_tunneling_overhead
The overhead taken into account when using UDP encapsulation.
Since MSS clamping by middleboxes will most likely not work, values larger than
8 (the size of the UDP header) are also supported.
Supported values are between 8 and 1024.
The default is 8.
.It Va udp_tunneling_port
The local UDP encapsulation port.
A value of 0 indicates that UDP encapsulation is disabled.
The default is 0.
.It Va v6mssdflt
The default value used for the TCP Maximum Segment Size
.Pq Dq MSS
for IPv6 when no advice to the contrary is received from MSS negotiation.
.It Va v6pmtud_blackhole_mss
MSS to try for IPv6 if PMTU blackhole detection is turned on.
See
.Va pmtud_blackhole_detection .
.El
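.Pp
The interaction between
.Va rexmit_min
and
.Va rexmit_slop
described above can be sketched in C.
This is a simplified illustration, not the kernel's code: the function name
.Fn effective_rto_ms ,
the millisecond units, and the exact order of operations (add the slop to the
raw estimate, then apply the minimum as a floor) are assumptions made for the
example.

```c
/*
 * Hypothetical helper (not a kernel function): combine a raw
 * retransmit timeout estimate with the slop and minimum described
 * for rexmit_slop and rexmit_min.  All values are in milliseconds.
 */
static unsigned int
effective_rto_ms(unsigned int raw_ms, unsigned int min_ms,
    unsigned int slop_ms)
{
	/* The slop is added to the raw calculation. */
	unsigned int rto = raw_ms + slop_ms;

	/* The minimum acts as an absolute floor on the result. */
	return (rto < min_ms ? min_ms : rto);
}
```

Under these assumptions, a near-0 minimum and 200 ms of slop never yield less
than 200 ms, however small the raw estimate is, which matches the effective
minimum stated above.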
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address;
.It Bq Er EINVAL
when trying to change TCP function blocks at an invalid point in the session;
.It Bq Er ENOENT
when trying to use a TCP function block that is not available.
.El
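.Pp
An application might translate the errors listed above into short diagnostics
along the following lines.
This is a minimal sketch: the function name
.Fn tcp_error_hint
is hypothetical, and the strings merely paraphrase the list above; they are not
messages produced by the system.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper: map an errno value from a failed TCP socket
 * operation to a short hint paraphrasing the ERRORS list.  Returns
 * NULL for values that are not in that list.
 */
static const char *
tcp_error_hint(int error)
{
	switch (error) {
	case EISCONN:
		return ("socket is already connected");
	case ENOBUFS:
	case ENOMEM:
		return ("out of memory for an internal data structure");
	case ETIMEDOUT:
		return ("dropped after excessive retransmissions");
	case ECONNRESET:
		return ("connection closed by the remote peer");
	case ECONNREFUSED:
		return ("remote peer refused the connection");
	case EADDRINUSE:
		return ("port is already allocated");
	case EADDRNOTAVAIL:
		return ("no network interface has that address");
	case EAFNOSUPPORT:
		return ("multicast addresses cannot be used");
	case EINVAL:
		return ("function block change at an invalid point");
	case ENOENT:
		return ("requested function block is not available");
	default:
		return (NULL);
	}
}
```

A caller would typically pass the value of
.Va errno
saved immediately after the failing call and fall back to
.Xr strerror 3
when the helper returns
.Dv NULL .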
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr stats 3 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ktls 4 ,
.Xr mod_cc 4 ,
.Xr siftr 4 ,
.Xr syncache 4 ,
.Xr tcp_bbr 4 ,
.Xr setkey 8 ,
.Xr sysctl 8 ,
.Xr tcp_functions 9
.Rs
.%A "V. Jacobson"
.%A "B. Braden"
.%A "D. Borman"
.%T "TCP Extensions for High Performance"
.%O "RFC 1323"
.Re
.Rs
.%A "D. Borman"
.%A "B. Braden"
.%A "V. Jacobson"
.%A "R. Scheffenegger"
.%T "TCP Extensions for High Performance"
.%O "RFC 7323"
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Rs
.%A "K. Ramakrishnan"
.%A "S. Floyd"
.%A "D. Black"
.%T "The Addition of Explicit Congestion Notification (ECN) to IP"
.%O "RFC 3168"
.Re
.Sh HISTORY
The
.Tn TCP
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
The
.Dv TCP_INFO
option was introduced in
.Tn Linux 2.6
and is
.Em subject to change .