.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
.Dd January 8, 2022
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/tcp.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data.
It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq "port addresses" .
Thus, each address is composed
of an Internet address specifying the host and network,
with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks.
This technique, termed
.Dq "wildcard addressing" ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.
Normally, this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width ".Dv TCP_FUNCTION_BLK"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Fx
specific additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.It Dv TCP_CCALGOOPT
Set or query congestion control algorithm specific parameters.
See
.Xr mod_cc 4
for details.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
connection.
See
.Xr mod_cc 4
for details.
.It Dv TCP_FASTOPEN
Enable or disable TCP Fast Open (TFO).
To use this option, the kernel must be built with the
.Dv TCP_RFC7413
option.
.Pp
This option can be set on the socket either before or after the
.Xr listen 2
is invoked.
Clearing this option on a listen socket after it has been set has no effect on
existing TFO connections or TFO connections in progress; it only prevents new
TFO connections from being established.
.Pp
For passively-created sockets, the
.Dv TCP_FASTOPEN
socket option can be queried to determine whether the connection was established
using TFO.
Note that connections that are established via a TFO
.Tn SYN ,
but that fall back to using a non-TFO
.Tn SYN|ACK
will have the
.Dv TCP_FASTOPEN
socket option set.
.Pp
In addition to the facilities defined in RFC 7413, this implementation supports a
pre-shared key (PSK) mode of operation in which the TFO server requires the
client to be in possession of a shared secret in order for the client to be able
to successfully open TFO connections with the server.
This is useful, for example, in environments where TFO servers are exposed to
both internal and external clients and only wish to allow TFO connections from
internal clients.
.Pp
In the PSK mode of operation, the server generates and sends TFO cookies to
requesting clients as usual.
However, when validating cookies received in TFO SYNs from clients, the server
requires the client-supplied cookie to equal
.Bd -literal -offset left
SipHash24(key=\fI16-byte-psk\fP, msg=\fIcookie-sent-to-client\fP)
.Ed
.Pp
Multiple concurrent valid pre-shared keys are supported so that time-based
rolling PSK invalidation policies can be implemented in the system.
The default number of concurrent pre-shared keys is 2.
.Pp
This can be adjusted with the
.Dv TCP_RFC7413_MAX_PSKS
kernel option.
.It Dv TCP_FUNCTION_BLK
Select or query the set of functions that TCP will use for this connection.
This allows a user to select an alternate TCP stack.
The alternate TCP stack must already be loaded in the kernel.
To list the available TCP stacks, see
.Va functions_available
in the
.Sx MIB Variables
section further down.
To list the default TCP stack, see
.Va functions_default
in the
.Sx MIB Variables
section.
.It Dv TCP_KEEPINIT
This
.Xr setsockopt 2
option accepts a per-socket timeout argument of
.Vt "u_int"
in seconds, for new, non-established
.Tn TCP
connections.
For the global default in milliseconds see
.Va keepinit
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPIDLE
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
for the amount of time, in seconds, that the connection must be idle
before keepalive probes (if enabled) are sent for the connection of this
socket.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepidle
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPINTVL
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
to set the per-socket interval, in seconds, between keepalive probes sent
to a peer.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepintvl
in the
.Sx MIB Variables
section further down.
.It Dv TCP_KEEPCNT
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
and allows a per-socket tuning of the number of probes sent, with no response,
before the connection will be dropped.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default see
.Va keepcnt
in the
.Sx MIB Variables
section further down.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender- and
.No receiver- Ns Tn TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation.
The boolean option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the
.No sender- Ns Tn TCP
will set the
.Dq push
bit, and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When this option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
Outgoing traffic is digested;
digests on incoming traffic are verified.
When this option is enabled on a socket, all inbound and outgoing
TCP segments must be signed with MD5 digests.
.Pp
One common use for this in a
.Fx
router deployment is to enable
.Fx Ns -based
routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry can only be specified on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination,
the system does not send any outgoing segments and drops any inbound segments.
However, during connection negotiation, a non-signed segment will be accepted if
an SADB entry does not exist between hosts.
When a non-signed segment is accepted, the established connection is not
protected with MD5 digests.
.Pp
Each dropped segment is taken into account in the TCP protocol statistics.
.It Dv TCP_STATS
Manage collection of connection level statistics using the
.Xr stats 3
framework.
.It Dv TCP_TXTLS_ENABLE
Enable in-kernel Transport Layer Security (TLS) for data written to this
socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_TXTLS_MODE
The integer argument can be used to get or set the current TLS transmit mode
of a socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_RXTLS_ENABLE
Enable in-kernel TLS for data read from this socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_REUSPORT_LB_NUMA
Changes NUMA affinity filtering for an established TCP listen
socket.
This option takes a single integer argument which specifies
the NUMA domain to filter on for this listen socket.
The argument can also have the following special values:
.Bl -tag -width "Dv TCP_REUSPORT_LB_NUMA"
.It Dv TCP_REUSPORT_LB_NUMA_NODOM
Remove NUMA filtering for this listen socket.
.It Dv TCP_REUSPORT_LB_NUMA_CURDOM
Filter traffic associated with the domain where the calling thread is
currently executing.
This is typically used after a process or thread inherits a listen
socket from its parent, and sets its CPU affinity to a particular core.
.El
.It Dv TCP_REMOTE_UDP_ENCAPS_PORT
Set and get the remote UDP encapsulation port.
It can only be set on a closed TCP socket.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
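.Pp
As a minimal sketch, not taken from the system sources, the socket options
described above might be set and queried as follows.
The helper names
.Fn tune_tcp_socket
and
.Fn get_nodelay
are hypothetical, and the sketch assumes that the keepalive arguments may be
passed as
.Vt "unsigned int" .
Keepalive probes are only sent if
.Dv SO_KEEPALIVE
is also enabled at the
.Dv SOL_SOCKET
level, so the sketch enables it as well.
.Bd -literal -offset left
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>

/*
 * Hypothetical helper: disable the Nagle algorithm and set per-socket
 * keepalive timing (idle and intvl are in seconds).
 * Returns 0 on success, or -1 with errno set by the failing call.
 */
int
tune_tcp_socket(int s, unsigned int idle, unsigned int intvl,
    unsigned int cnt)
{
	int on = 1;

	/* Send small writes immediately instead of coalescing them. */
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1)
		return (-1);
	/* Keepalive probes are only sent when SO_KEEPALIVE is enabled. */
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
	/* Idle time before the first keepalive probe. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle,
	    sizeof(idle)) == -1)
		return (-1);
	/* Interval between successive probes. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl,
	    sizeof(intvl)) == -1)
		return (-1);
	/* Number of unanswered probes before the connection is dropped. */
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return (-1);
	return (0);
}

/*
 * Hypothetical helper: report whether TCP_NODELAY is set.
 * Returns 1 if set, 0 if not set, or -1 on error.
 */
int
get_nodelay(int s)
{
	int val = 0;
	socklen_t len = sizeof(val);

	if (getsockopt(s, IPPROTO_TCP, TCP_NODELAY, &val, &len) == -1)
		return (-1);
	return (val != 0);
}
```
.Ed
.Pp
A caller might, for example, invoke
.Fn tune_tcp_socket
with an idle time of 60 seconds, an interval of 10 seconds and a count of 5
on a descriptor returned by
.Xr socket 2
before calling
.Xr connect 2 ;
these values are illustrative only.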
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
The default congestion control algorithm for
.Tn TCP
is
.Xr cc_newreno 4 .
Other congestion control algorithms can be made available using the
.Xr mod_cc 4
framework.
.Ss MIB Variables
The
.Tn TCP
protocol implements a number of variables in the
.Va net.inet.tcp
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width ".Va TCPCTL_DO_RFC1323"
.It Dv TCPCTL_DO_RFC1323
.Pq Va rfc1323
Implement the window scaling and timestamp options of RFC 1323/RFC 7323
(default is true).
.It Va tolerate_missing_ts
Tolerate missing timestamps (RFC 1323/RFC 7323) in
.Tn TCP
segments belonging to
.Tn TCP
connections for which support of
.Tn TCP
timestamps has been negotiated.
As of June 2021, several TCP stacks are known to violate RFC 7323, including
modern widely deployed ones.
Therefore the default is 1, i.e., missing timestamps are tolerated.
.It Dv TCPCTL_MSSDFLT
.Pq Va mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq Va sendspace
Maximum
.Tn TCP
send window.
.It Dv TCPCTL_RECVSPACE
.Pq Va recvspace
Maximum
.Tn TCP
receive window.
.It Va log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
The value of 1 limits the logging to
.Tn SYN
(connection establishment) packets only.
That of 2 results in any
.Tn TCP
packets to closed ports being logged.
Any value unlisted above disables the logging
(default is 0, i.e., the logging is disabled).
.It Va msl
The Maximum Segment Lifetime, in milliseconds, for a packet.
.It Va keepinit
Timeout, in milliseconds, for new, non-established
.Tn TCP
connections.
The default is 75000 msec.
.It Va keepidle
Amount of time, in milliseconds, that the connection must be idle
before keepalive probes (if enabled) are sent.
The default is 7200000 msec (2 hours).
.It Va keepintvl
The interval, in milliseconds, between keepalive probes sent to remote
machines, when no response is received on a
.Va keepidle
probe.
The default is 75000 msec.
.It Va keepcnt
Number of probes sent, with no response, before a connection
is dropped.
The default is 8 packets.
.It Va always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It Va icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It Va do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It Va blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It Va delayed_ack
Delay ACK to try and piggyback it onto a data packet.
.It Va delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
.It Va tcbhashsize
Size of the
.Tn TCP
control-block hash table
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It Va pcbcount
Number of active protocol control blocks
(read-only).
.It Va syncookies
Determines whether or not
.Tn SYN
cookies should be generated for outbound
.Tn SYN-ACK
packets.
.Tn SYN
cookies are a great help during
.Tn SYN
flood attacks, and are enabled by default.
(See
.Xr syncookies 4 . )
.It Va isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It Va reass.cursegments
The current total number of segments present in all reassembly queues.
.It Va reass.maxsegments
The maximum limit on the total number of segments across all reassembly
queues.
The limit can be adjusted as a tunable.
.It Va reass.maxqueuelen
The maximum number of segments allowed in each reassembly queue.
By default, the system chooses a limit based on each TCP connection's
receive buffer size and maximum segment size (MSS).
The actual limit applied to a session's reassembly queue will be the lower of
the system-calculated automatic limit and the user-specified
.Va reass.maxqueuelen
limit.
.It Va rexmit_initial , rexmit_min , rexmit_slop
Adjust the retransmit timer calculation for
.Tn TCP .
The slop is
typically added to the raw calculation to take into account
occasional variances that the
.Tn SRTT
(smoothed round-trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.
While a number of
.Tn TCP
RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior,
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code.
For this reason, we use 200ms of slop and a near-0
minimum, which gives us an effective minimum of 200ms (similar to
.Tn Linux ) .
The initial value is used before an RTT measurement has been performed.
.It Va initcwnd_segments
Enable the ability to specify initial congestion window in number of segments.
The default value is 10 as suggested by RFC 6928.
Changing the value on the fly does not affect connections using a congestion
window from the hostcache.
Caution:
This regulates the burst of packets allowed to be sent in the first RTT.
The value should be relative to the link capacity.
Start with small values for lower-capacity links.
Large bursts can cause buffer overruns and packet drops if routers have small
buffers or the link is experiencing congestion.
.It Va newcwd
Enable the New Congestion Window Validation mechanism as described in RFC 7661.
This gently reduces the congestion window during periods where TCP is
application limited and the network bandwidth is not utilized completely.
That prevents self-inflicted packet losses once the application starts to
transmit data at a higher speed.
.It Va do_lrd
Enable Lost Retransmission Detection for SACK-enabled sessions, disabled by
default.
Under severe congestion, a retransmission can be lost which then leads to a
mandatory Retransmission Timeout (RTO), followed by slow-start.
LRD will try to resend the repeatedly lost packet, preventing the time-consuming
RTO and performance reducing slow-start.
.It Va do_prr
Perform SACK loss recovery using the Proportional Rate Reduction (PRR) algorithm
described in RFC 6937.
This improves the effectiveness of retransmissions, particularly in environments
with ACK thinning or burst loss events, as chances to run out of the ACK clock
are reduced, preventing lengthy and performance reducing RTO based loss recovery
(default is true).
.It Va do_prr_conservative
While doing Proportional Rate Reduction, remain strictly in a packet conserving
mode, sending only one new packet for each ACK received.
Helpful when a misconfigured token bucket traffic policer causes persistent
high losses leading to RTO, but reduces PRR effectiveness in more common settings
(default is false).
.It Va rfc6675_pipe
Deprecated and superseded by
.Va sack.revised .
.It Va rfc3042
Enable the Limited Transmit algorithm as described in RFC 3042.
It helps avoid timeouts on lossy links and also when the congestion window
is small, as happens on short transfers.
.It Va rfc3390
Enable support for RFC 3390, which allows for a variable-sized
starting congestion window on new connections, depending on the
maximum segment size.
This helps throughput in general, but
particularly affects short transfers and high-bandwidth large
propagation-delay connections.
.It Va sack.enable
Enable support for RFC 2018, TCP Selective Acknowledgment option,
which allows the receiver to inform the sender about all successfully
arrived segments, allowing the sender to retransmit the missing segments
only.
.It Va sack.revised
Enables three updated mechanisms from RFC 6675 (default is true).
First, the bytes in flight are calculated using the algorithm described in
RFC 6675, which is also an improvement when Proportional Rate Reduction is
enabled.
Next, Rescue Retransmission helps timely loss recovery when the trailing segments
of a transmission are lost, while no additional data is ready to be sent.
In case a partial ACK without a SACK block is received during SACK loss
recovery, the trailing segment is immediately resent, rather than waiting
for a Retransmission timeout.
Finally, SACK loss recovery is also engaged once two segments plus one byte are
SACKed - even if no traditional duplicate ACKs were observed.
.It Va sack.maxholes
Maximum number of SACK holes per connection.
Defaults to 128.
.It Va sack.globalmaxholes
Maximum number of SACK holes per system, across all connections.
Defaults to 65536.
.It Va maxtcptw
When a TCP connection enters the
.Dv TIME_WAIT
state, its associated socket structure is freed, since it is of
negligible size and use, and a new structure is allocated to contain a
minimal amount of information necessary for sustaining a connection in
this state, called the compressed TCP TIME_WAIT state.
Since this structure is smaller than a socket structure, it can save
a significant amount of system memory.
The
.Va net.inet.tcp.maxtcptw
MIB variable controls the maximum number of these structures allocated.
By default, it is initialized to
.Va kern.ipc.maxsockets
/ 5.
.It Va nolocaltimewait
Suppress creation of compressed TCP TIME_WAIT states for connections in
which both endpoints are local.
.It Va fast_finwait2_recycle
Recycle
.Tn TCP
.Dv FIN_WAIT_2
connections faster when the socket is marked as
.Dv SBS_CANTRCVMORE
(no user process has the socket open, data received on
the socket cannot be read).
The timeout used here is
.Va finwait2_timeout .
.It Va finwait2_timeout
Timeout to use for fast recycling of
.Tn TCP
.Dv FIN_WAIT_2
connections.
Defaults to 60 seconds.
.It Va ecn.enable
Enable support for TCP Explicit Congestion Notification (ECN).
ECN allows a TCP sender to reduce the transmission rate in order to
avoid packet drops.
.Bl -tag -compact
.It 0
Disable ECN.
.It 1
Allow incoming connections to request ECN.
Outgoing connections will request ECN.
.It 2
Allow incoming connections to request ECN.
Outgoing connections will not request ECN.
(default)
.El
.It Va ecn.maxretries
Number of retries (SYN or SYN/ACK retransmits) before disabling ECN on a
specific connection.
This is needed to help with connection establishment
when a broken firewall is in the network path.
.It Va pmtud_blackhole_detection
Enable automatic path MTU blackhole detection.
In case of retransmits of MSS sized segments,
the OS will lower the MSS to check if it's an MTU problem.
If the current MSS is greater than the configured value to try
.Po Va net.inet.tcp.pmtud_blackhole_mss
and
.Va net.inet.tcp.v6pmtud_blackhole_mss
.Pc ,
it will be set to this value; otherwise,
the MSS will be set to the default values
.Po Va net.inet.tcp.mssdflt
and
.Va net.inet.tcp.v6mssdflt
.Pc .
Settings:
.Bl -tag -compact
.It 0
Disable path MTU blackhole detection.
.It 1
Enable path MTU blackhole detection for IPv4 and IPv6.
.It 2
Enable path MTU blackhole detection only for IPv4.
.It 3
Enable path MTU blackhole detection only for IPv6.
.El
.It Va pmtud_blackhole_mss
MSS to try for IPv4 if PMTU blackhole detection is turned on.
.It Va v6pmtud_blackhole_mss
MSS to try for IPv6 if PMTU blackhole detection is turned on.
.It Va fastopen.acceptany
When non-zero, all client-supplied TFO cookies will be considered to be valid.
The default is 0.
.It Va fastopen.autokey
When this and
.Va net.inet.tcp.fastopen.server_enable
are non-zero, a new key will be automatically generated after this many
seconds.
The default is 120.
.It Va fastopen.ccache_bucket_limit
The maximum number of entries in a client cookie cache bucket.
The default value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKET_LIMIT_DEFAULT
kernel option or by setting
.Va net.inet.tcp.fastopen_ccache_bucket_limit
in the
.Xr loader 8 .
.It Va fastopen.ccache_buckets
The number of client cookie cache buckets.
Read-only.
The value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKETS_DEFAULT
kernel option or by setting
.Va fastopen.ccache_buckets
in the
.Xr loader 8 .
.It Va fastopen.ccache_list
Print the client cookie cache.
Read-only.
.It Va fastopen.client_enable
When zero, no new active (i.e., client) TFO connections can be created.
On the transition from enabled to disabled, the client cookie cache is cleared
and disabled.
The transition from enabled to disabled does not affect any active TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.keylen
The key length in bytes.
Read-only.
.It Va fastopen.maxkeys
The maximum number of keys supported.
Read-only.
.It Va fastopen.maxpsks
The maximum number of pre-shared keys supported.
Read-only.
.It Va fastopen.numkeys
The current number of keys installed.
Read-only.
.It Va fastopen.numpsks
The current number of pre-shared keys installed.
Read-only.
.It Va fastopen.path_disable_time
When a failure occurs while trying to create a new active (i.e., client) TFO
connection, new active connections on the same path, as determined by the tuple
.Brq client_ip, server_ip, server_port ,
will be forced to be non-TFO for this many seconds.
Note that the path disable mechanism relies on state stored in client cookie
cache entries, so it is possible for the disable time for a given path to be
reduced if the corresponding client cookie cache entry is reused due to resource
pressure before the disable period has elapsed.
The default is
.Dv TCP_FASTOPEN_PATH_DISABLE_TIME_DEFAULT .
.It Va fastopen.psk_enable
When non-zero, pre-shared key (PSK) mode is enabled for all TFO servers.
On the transition from enabled to disabled, all installed pre-shared keys are
removed.
The default is 0.
.It Va fastopen.server_enable
When zero, no new passive (i.e., server) TFO connections can be created.
On the transition from enabled to disabled, all installed keys and pre-shared
keys are removed.
On the transition from disabled to enabled, if
.Va fastopen.autokey
is non-zero and there are no keys installed, a new key will be generated
immediately.
The transition from enabled to disabled does not affect any passive TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.setkey
Install a new key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va fastopen.setpsk
Install a new pre-shared key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va hostcache.enable
The TCP host cache is used to cache connection details and metrics to
improve future performance of connections between the same hosts.
At the completion of a TCP connection, a host will cache information
for the connection for some defined period of time.
.Bl -tag -compact
.It 0
Disable the host cache.
.It 1
Enable the host cache. (default)
.El
.It Va hostcache.purgenow
Immediately purge all entries once set to any value.
Setting this to 2 will also reseed the hash salt.
.It Va hostcache.purge
Expire all entries on next pruning of host cache entries.
Any non-zero setting will be reset to zero once the purge
is running.
.Bl -tag -compact
.It 0
Do not purge all entries when pruning the host cache. (default)
.It 1
Purge all entries when doing the next pruning.
.It 2
Purge all entries, and also reseed the hash salt.
.El
.It Va hostcache.prune
Time in seconds between pruning expired host cache entries.
Defaults to 300 (5 minutes).
.It Va hostcache.expire
Time in seconds that an entry is kept in the
host cache since it was last accessed.
Defaults to 3600 (1 hour).
.It Va hostcache.count
The current number of entries in the host cache.
.It Va hostcache.bucketlimit
The maximum number of entries for the same hash.
Defaults to 30.
.It Va hostcache.hashsize
Size of the TCP hostcache hash table.
This number has to be a power of two, or will be rejected.
Defaults to 512.
.It Va hostcache.cachelimit
Overall entry limit for hostcache.
Defaults to hashsize * bucketlimit.
.It Va hostcache.histo
Provide a histogram of the hostcache hash utilization.
.It Va hostcache.list
Provide a complete list of all current entries in the host
cache.
.It Va functions_available
List of available TCP function blocks (TCP stacks).
.It Va functions_default
The default TCP function block (TCP stack).
.It Va functions_inherit_listen_socket_stack
Determines whether to inherit the listen socket's TCP stack or use the current
system default TCP stack, as defined by
.Va functions_default .
Default is true.
.It Va insecure_rst
Use criteria defined in RFC 793 instead of RFC 5961 for accepting RST segments.
Default is false.
.It Va insecure_syn
Use criteria defined in RFC 793 instead of RFC 5961 for accepting SYN segments.
Default is false.
.It Va ts_offset_per_conn
When initializing the TCP timestamps, use a per connection offset instead of a
per host pair offset.
Default is to use per connection offsets as recommended in RFC 7323.
.It Va perconn_stats_enable
Controls the default collection of statistics for all connections using the
.Xr stats 3
framework.
0 disables, 1 enables, 2 enables random sampling across log id connection
groups with all connections in a group receiving the same setting.
.It Va perconn_stats_sample_rates
A CSV list of template_spec=percent key-value pairs which controls the per
template sampling rates when
.Xr stats 3
sampling is enabled.
.It Va udp_tunneling_port
The local UDP encapsulation port.
A value of 0 indicates that UDP encapsulation is disabled.
The default is 0.
.It Va udp_tunneling_overhead
The overhead taken into account when using UDP encapsulation.
Since MSS clamping by middleboxes will most likely not work, values larger than
8 (the size of the UDP header) are also supported.
Supported values are between 8 and 1024.
The default is 8.
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address;
.It Bq Er EINVAL
when trying to change TCP function blocks at an invalid point in the session;
.It Bq Er ENOENT
when trying to use a TCP function block that is not available.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr stats 3 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ktls 4 ,
.Xr mod_cc 4 ,
.Xr siftr 4 ,
.Xr syncache 4 ,
.Xr tcp_bbr 4 ,
.Xr setkey 8 ,
.Xr tcp_functions 9
.Rs
.%A "V. Jacobson"
.%A "B. Braden"
.%A "D. Borman"
.%T "TCP Extensions for High Performance"
.%O "RFC 1323"
.Re
.Rs
.%A "D. Borman"
.%A "B. Braden"
.%A "V. Jacobson"
.%A "R. Scheffenegger"
.%T "TCP Extensions for High Performance"
.%O "RFC 7323"
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Rs
.%A "K. Ramakrishnan"
.%A "S. Floyd"
.%A "D. Black"
.%T "The Addition of Explicit Congestion Notification (ECN) to IP"
.%O "RFC 3168"
.Re
.Sh HISTORY
The
.Tn TCP
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
The
.Dv TCP_INFO
option was introduced in
.Tn Linux 2.6
and is
.Em subject to change .