.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes under sponsorship from the FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
.Dd October 7, 2022
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/tcp.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data.
It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction.
.Tn TCP
uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq "port addresses" .
Thus, each address is composed
of an Internet address specifying the host and network,
with a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets utilizing the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.
By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.
Only passive sockets may use the
.Xr accept 2
call to accept incoming connections.
Only active sockets may use the
.Xr connect 2
call to initiate connections.
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks.
This technique, termed
.Dq "wildcard addressing" ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.
The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.
The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.
Normally, this address corresponds to the peer entity's network.
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width ".Dv TCP_FUNCTION_BLK"
.It Dv TCP_INFO
Information about a socket's underlying TCP session may be retrieved
by passing the read-only option
.Dv TCP_INFO
to
.Xr getsockopt 2 .
It accepts a single argument: a pointer to an instance of
.Vt "struct tcp_info" .
.Pp
This API is subject to change; consult the source to determine
which fields are currently filled out by this option.
.Fx
specific additions include
send window size,
receive window size,
and
bandwidth-controlled window space.
.It Dv TCP_CCALGOOPT
Set or query congestion control algorithm specific parameters.
See
.Xr mod_cc 4
for details.
.It Dv TCP_CONGESTION
Select or query the congestion control algorithm that TCP will use for the
connection.
See
.Xr mod_cc 4
for details.
.It Dv TCP_FASTOPEN
Enable or disable TCP Fast Open (TFO).
To use this option, the kernel must be built with the
.Dv TCP_RFC7413
option.
.Pp
This option can be set on the socket either before or after the
.Xr listen 2
is invoked.
Clearing this option on a listen socket after it has been set has no effect on
existing TFO connections or TFO connections in progress; it only prevents new
TFO connections from being established.
.Pp
For passively-created sockets, the
.Dv TCP_FASTOPEN
socket option can be queried to determine whether the connection was established
using TFO.
Note that connections that are established via a TFO
.Tn SYN ,
but that fall back to using a non-TFO
.Tn SYN|ACK
will have the
.Dv TCP_FASTOPEN
socket option set.
.Pp
In addition to the facilities defined in RFC7413, this implementation supports a
pre-shared key (PSK) mode of operation in which the TFO server requires the
client to be in possession of a shared secret in order for the client to be able
to successfully open TFO connections with the server.
This is useful, for example, in environments where TFO servers are exposed to
both internal and external clients and only wish to allow TFO connections from
internal clients.
.Pp
In the PSK mode of operation, the server generates and sends TFO cookies to
requesting clients as usual.
However, when validating cookies received in TFO SYNs from clients, the server
requires the client-supplied cookie to equal
.Bd -literal -offset left
SipHash24(key=\fI16-byte-psk\fP, msg=\fIcookie-sent-to-client\fP)
.Ed
.Pp
Multiple concurrent valid pre-shared keys are supported so that time-based
rolling PSK invalidation policies can be implemented in the system.
The default number of concurrent pre-shared keys is 2.
.Pp
This can be adjusted with the
.Dv TCP_RFC7413_MAX_PSKS
kernel option.
.It Dv TCP_FUNCTION_BLK
Select or query the set of functions that TCP will use for this connection.
This allows a user to select an alternate TCP stack.
The alternate TCP stack must already be loaded in the kernel.
To list the available TCP stacks, see
.Va functions_available
in the
.Sx MIB (sysctl) Variables
section further down.
To list the default TCP stack, see
.Va functions_default
in the
.Sx MIB (sysctl) Variables
section.
.It Dv TCP_KEEPINIT
This
.Xr setsockopt 2
option accepts a per-socket timeout argument of
.Vt "u_int"
in seconds, for new, non-established
.Tn TCP
connections.
For the global default in milliseconds see
.Va keepinit
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPIDLE
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
for the amount of time, in seconds, that the connection must be idle
before keepalive probes (if enabled) are sent for the connection of this
socket.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepidle
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPINTVL
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
to set the per-socket interval, in seconds, between keepalive probes sent
to a peer.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default in milliseconds see
.Va keepintvl
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_KEEPCNT
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
and allows a per-socket tuning of the number of probes sent, with no response,
before the connection will be dropped.
If set on a listening socket, the value is inherited by the newly created
socket upon
.Xr accept 2 .
For the global default see the
.Va keepcnt
in the
.Sx MIB (sysctl) Variables
section further down.
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
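.Pp
As an illustration of the keepalive options and
.Dv TCP_NODELAY
described above, the following is a minimal sketch, not a definitive
implementation.
The helper name
.Fn tune_tcp_socket
and the chosen values (60, 10, and 5) are assumptions made for this example;
the keepalive arguments are passed as
.Vt "unsigned int" ,
which corresponds to
.Vt "u_int" .
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <assert.h>

/*
 * Hypothetical helper (not part of any system API): apply per-socket
 * keepalive tuning and disable coalescing of small writes.
 * Returns 0 on success, -1 if any setsockopt(2) call fails.
 */
static int
tune_tcp_socket(int s)
{
	int on = 1;
	/* Illustrative values only; all three are per-socket settings. */
	unsigned int idle = 60;		/* seconds idle before first probe */
	unsigned int intvl = 10;	/* seconds between probes */
	unsigned int cnt = 5;		/* unanswered probes before drop */

	assert(s >= 0);

	/* Probes are only sent if keepalives are enabled on the socket. */
	if (setsockopt(s, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPIDLE, &idle, sizeof(idle)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPINTVL, &intvl, sizeof(intvl)) == -1)
		return (-1);
	if (setsockopt(s, IPPROTO_TCP, TCP_KEEPCNT, &cnt, sizeof(cnt)) == -1)
		return (-1);

	/* Send small writes immediately instead of gathering them. */
	if (setsockopt(s, IPPROTO_TCP, TCP_NODELAY, &on, sizeof(on)) == -1)
		return (-1);
	return (0);
}
```
.Ed
.Pp
A caller would typically invoke such a helper on a descriptor returned by
.Xr socket 2
before calling
.Xr connect 2
or
.Xr listen 2 ;
the resulting values can be read back with
.Xr getsockopt 2 .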
.It Dv TCP_MAXSEG
By default, a sender- and
.No receiver- Ns Tn TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection.
The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_MAXUNACKTIME
This
.Xr setsockopt 2
option accepts an argument of
.Vt "u_int"
to set the per-socket interval, in seconds, in which the connection must
make progress.
Progress is defined by at least 1 byte being acknowledged within
the set time period.
If a connection fails to make progress, then the
.Tn TCP
stack will terminate the connection with a reset.
Note that the default
value for this is zero which indicates no progress checks should be made.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation.
The boolean option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the
.No sender- Ns Tn TCP
will set the
.Dq push
bit, and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
When this option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.It Dv TCP_MD5SIG
This option enables the use of MD5 digests (also known as TCP-MD5)
on writes to the specified socket.
Outgoing traffic is digested;
digests on incoming traffic are verified.
When this option is enabled on a socket, all inbound and outgoing
TCP segments must be signed with MD5 digests.
.Pp
One common use for this in a
.Fx
router deployment is to enable
such routers to interwork with Cisco equipment at peering points.
Support for this feature conforms to RFC 2385.
.Pp
In order for this option to function correctly, it is necessary for the
administrator to add a tcp-md5 key entry to the system's security
associations database (SADB) using the
.Xr setkey 8
utility.
This entry can only be specified on a per-host basis at this time.
.Pp
If an SADB entry cannot be found for the destination,
the system does not send any outgoing segments and drops any inbound segments.
However, during connection negotiation, a non-signed segment will be accepted if
an SADB entry does not exist between hosts.
When a non-signed segment is accepted, the established connection is not
protected with MD5 digests.
.It Dv TCP_STATS
Manage collection of connection level statistics using the
.Xr stats 3
framework.
.Pp
Each dropped segment is taken into account in the TCP protocol statistics.
.It Dv TCP_TXTLS_ENABLE
Enable in-kernel Transport Layer Security (TLS) for data written to this
socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_TXTLS_MODE
The integer argument can be used to get or set the current TLS transmit mode
of a socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_RXTLS_ENABLE
Enable in-kernel TLS for data read from this socket.
See
.Xr ktls 4
for more details.
.It Dv TCP_REUSPORT_LB_NUMA
Changes NUMA affinity filtering for an established TCP listen
socket.
This option takes a single integer argument which specifies
the NUMA domain to filter on for this listen socket.
The argument can also have the following special values:
.Bl -tag -width "Dv TCP_REUSPORT_LB_NUMA"
.It Dv TCP_REUSPORT_LB_NUMA_NODOM
Remove NUMA filtering for this listen socket.
.It Dv TCP_REUSPORT_LB_NUMA_CURDOM
Filter traffic associated with the domain where the calling thread is
currently executing.
This is typically used after a process or thread inherits a listen
socket from its parent, and sets its CPU affinity to a particular core.
.El
.It Dv TCP_REMOTE_UDP_ENCAPS_PORT
Set and get the remote UDP encapsulation port.
It can only be set on a closed TCP socket.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.In netinet/tcp.h .
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Pp
The default congestion control algorithm for
.Tn TCP
is
.Xr cc_newreno 4 .
Other congestion control algorithms can be made available using the
.Xr mod_cc 4
framework.
.Ss MIB (sysctl) Variables
The
.Tn TCP
protocol implements a number of variables in the
.Va net.inet.tcp
branch of the
.Xr sysctl 3
MIB, which can also be read or modified with
.Xr sysctl 8 .
.Bl -tag -width ".Va v6pmtud_blackhole_mss"
.It Va always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will
periodically send a packet to the remote host to verify the connection
is still up.
.It Va blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is no socket accepting connections.
See
.Xr blackhole 4 .
.It Va blackhole_local
See
.Xr blackhole 4 .
.It Va cc
A number of variables for congestion control are under the
.Va net.inet.tcp.cc
node.
See
.Xr mod_cc 4 .
.It Va cc.newreno
Variables for NewReno congestion control are under the
.Va net.inet.tcp.cc.newreno
node.
See
.Xr cc_newreno 4 .
.It Va delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It Va delayed_ack
Delay ACK to try and piggyback it onto a data packet or another ACK.
.It Va do_lrd
Enable Lost Retransmission Detection for SACK-enabled sessions, disabled by
default.
Under severe congestion, a retransmission can be lost which then leads to a
mandatory Retransmission Timeout (RTO), followed by slow-start.
LRD will try to resend the repeatedly lost packet, preventing the time-consuming
RTO and performance reducing slow-start.
.It Va do_prr
Perform SACK loss recovery using the Proportional Rate Reduction (PRR) algorithm
described in RFC6937.
This improves the effectiveness of retransmissions, particularly in environments
with ACK thinning or burst loss events, as chances to run out of the ACK clock
are reduced, preventing lengthy and performance reducing RTO based loss recovery
(default is true).
.It Va do_prr_conservative
While doing Proportional Rate Reduction, remain strictly in a packet conserving
mode, sending only one new packet for each ACK received.
Helpful when a misconfigured token bucket traffic policer causes persistent
high losses leading to RTO, but reduces PRR effectiveness in more common settings
(default is false).
.It Va do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It Va drop_synfin
Drop TCP packets with both SYN and FIN set.
.It Va ecn.enable
Enable support for TCP Explicit Congestion Notification (ECN).
ECN allows a TCP sender to reduce the transmission rate in order to
avoid packet drops.
.Bl -tag -compact
.It 0
Disable ECN.
.It 1
Allow incoming connections to request ECN.
Outgoing connections will request ECN.
.It 2
Allow incoming connections to request ECN.
Outgoing connections will not request ECN.
(default)
.It 3
Negotiate on incoming connection for Accurate ECN, ECN, or no ECN.
Outgoing connections will request Accurate ECN and fall back to
ECN depending on the capabilities of the server.
.It 4
Negotiate on incoming connection for Accurate ECN, ECN, or no ECN.
Outgoing connections will not request ECN.
.El
.It Va ecn.maxretries
Number of retries (SYN or SYN/ACK retransmits) before disabling ECN on a
specific connection.
This is needed to help with connection establishment
when a broken firewall is in the network path.
.It Va fast_finwait2_recycle
Recycle
.Tn TCP
.Dv FIN_WAIT_2
connections faster when the socket is marked as
.Dv SBS_CANTRCVMORE
(no user process has the socket open, data received on
the socket cannot be read).
The timeout used here is
.Va finwait2_timeout .
.It Va fastopen.acceptany
When non-zero, all client-supplied TFO cookies will be considered to be valid.
The default is 0.
.It Va fastopen.autokey
When this and
.Va net.inet.tcp.fastopen.server_enable
are non-zero, a new key will be automatically generated after this many
seconds.
The default is 120.
.It Va fastopen.ccache_bucket_limit
The maximum number of entries in a client cookie cache bucket.
The default value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKET_LIMIT_DEFAULT
kernel option or by setting
.Va net.inet.tcp.fastopen_ccache_bucket_limit
in the
.Xr loader 8 .
.It Va fastopen.ccache_buckets
The number of client cookie cache buckets.
Read-only.
The value can be tuned with the
.Dv TCP_FASTOPEN_CCACHE_BUCKETS_DEFAULT
kernel option or by setting
.Va fastopen.ccache_buckets
in the
.Xr loader 8 .
.It Va fastopen.ccache_list
Print the client cookie cache.
Read-only.
.It Va fastopen.client_enable
When zero, no new active (i.e., client) TFO connections can be created.
On the transition from enabled to disabled, the client cookie cache is cleared
and disabled.
The transition from enabled to disabled does not affect any active TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.keylen
The key length in bytes.
Read-only.
.It Va fastopen.maxkeys
The maximum number of keys supported.
Read-only.
.It Va fastopen.maxpsks
The maximum number of pre-shared keys supported.
Read-only.
.It Va fastopen.numkeys
The current number of keys installed.
Read-only.
.It Va fastopen.numpsks
The current number of pre-shared keys installed.
Read-only.
.It Va fastopen.path_disable_time
When a failure occurs while trying to create a new active (i.e., client) TFO
connection, new active connections on the same path, as determined by the tuple
.Brq client_ip, server_ip, server_port ,
will be forced to be non-TFO for this many seconds.
Note that the path disable mechanism relies on state stored in client cookie
cache entries, so it is possible for the disable time for a given path to be
reduced if the corresponding client cookie cache entry is reused due to resource
pressure before the disable period has elapsed.
The default is
.Dv TCP_FASTOPEN_PATH_DISABLE_TIME_DEFAULT .
.It Va fastopen.psk_enable
When non-zero, pre-shared key (PSK) mode is enabled for all TFO servers.
On the transition from enabled to disabled, all installed pre-shared keys are
removed.
The default is 0.
.It Va fastopen.server_enable
When zero, no new passive (i.e., server) TFO connections can be created.
On the transition from enabled to disabled, all installed keys and pre-shared
keys are removed.
On the transition from disabled to enabled, if
.Va fastopen.autokey
is non-zero and there are no keys installed, a new key will be generated
immediately.
The transition from enabled to disabled does not affect any passive TFO
connections in progress; it only prevents new ones from being established.
The default is 0.
.It Va fastopen.setkey
Install a new key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va fastopen.setpsk
Install a new pre-shared key by writing
.Va net.inet.tcp.fastopen.keylen
bytes to this sysctl.
.It Va finwait2_timeout
Timeout to use for fast recycling of
.Tn TCP
.Dv FIN_WAIT_2
connections
.Pq Va fast_finwait2_recycle .
Defaults to 60 seconds.
.It Va functions_available
List of available TCP function blocks (TCP stacks).
.It Va functions_default
The default TCP function block (TCP stack).
.It Va functions_inherit_listen_socket_stack
Determines whether to inherit listen socket's TCP stack or use the current
system default TCP stack, as defined by
.Va functions_default .
Default is true.
.It Va hostcache
The TCP host cache is used to cache connection details and metrics to
improve future performance of connections between the same hosts.
At the completion of a TCP connection, a host will cache information
for the connection for some defined period of time.
There are a number of
.Va hostcache
variables under this node.
See
.Va hostcache.enable .
.It Va hostcache.bucketlimit
The maximum number of entries for the same hash.
Defaults to 30.
.It Va hostcache.cachelimit
Overall entry limit for hostcache.
Defaults to
.Va hashsize
*
.Va bucketlimit .
.It Va hostcache.count
The current number of entries in the host cache.
.It Va hostcache.enable
Enable/disable the host cache:
.Bl -tag -compact
.It 0
Disable the host cache.
.It 1
Enable the host cache. (default)
.El
.It Va hostcache.expire
Time, in seconds, that an entry is kept in the
host cache after it was last accessed.
Defaults to 3600 (1 hour).
.It Va hostcache.hashsize
Size of TCP hostcache hashtable.
This number has to be a power of two, or will be rejected.
Defaults to 512.
.It Va hostcache.histo
Provide a histogram of the hostcache hash utilization.
.It Va hostcache.list
Provide a complete list of all current entries in the host
cache.
.It Va hostcache.prune
Time in seconds between pruning expired host cache entries.
Defaults to 300 (5 minutes).
.It Va hostcache.purge
Expire all entries on the next pruning of host cache entries.
Any non-zero setting will be reset to zero once the purge
is running.
.Bl -tag -compact
.It 0
Do not purge all entries when pruning the host cache (default).
.It 1
Purge all entries when doing the next pruning.
.It 2
Purge all entries and also reseed the hash salt.
.El
.It Va hostcache.purgenow
Immediately purge all entries once set to any value.
Setting this to 2 will also reseed the hash salt.
.It Va icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It Va initcwnd_segments
Enable the ability to specify initial congestion window in number of segments.
The default value is 10 as suggested by RFC 6928.
Changing the value on the fly would not affect connections
using congestion window from the hostcache.
Caution:
This regulates the burst of packets allowed to be sent in the first RTT.
The value should be relative to the link capacity.
Start with small values for lower-capacity links.
Large bursts can cause buffer overruns and packet drops if routers have small
buffers or the link is experiencing congestion.
.It Va insecure_rst
Use criteria defined in RFC793 instead of RFC5961 for accepting RST segments.
Default is false.
.It Va insecure_syn
Use criteria defined in RFC793 instead of RFC5961 for accepting SYN segments.
Default is false.
.It Va isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It Va keepcnt
Number of keepalive probes sent, with no response, before a connection
is dropped.
The default is 8 packets.
.It Va keepidle
Amount of time, in milliseconds, that the connection must be idle
before sending keepalive probes (if enabled).
The default is 7200000 msec (7.2M msec, 2 hours).
.It Va keepinit
Timeout, in milliseconds, for new, non-established
.Tn TCP
connections.
The default is 75000 msec (75K msec, 75 sec).
.It Va keepintvl
The interval, in milliseconds, between keepalive probes sent to remote
machines, when no response is received on a
.Va keepidle
probe.
The default is 75000 msec (75K msec, 75 sec).
.It Va log_in_vain
Log any connection attempts to ports where there is no socket
accepting connections.
The value of 1 limits the logging to
.Tn SYN
(connection establishment) packets only.
A value of 2 results in any
.Tn TCP
packets to closed ports being logged.
Any value not listed above disables the logging
(default is 0, i.e., the logging is disabled).
.It Va maxtcptw
When a TCP connection enters the
.Dv TIME_WAIT
state, its associated socket structure is freed, since it is of
negligible size and use, and a new structure is allocated to contain a
minimal amount of information necessary for sustaining a connection in
this state, called the compressed TCP
.Dv TIME_WAIT
state.
Since this structure is smaller than a socket structure, it can save
a significant amount of system memory.
The
.Va net.inet.tcp.maxtcptw
MIB variable controls the maximum number of these structures allocated.
By default, it is initialized to
.Va kern.ipc.maxsockets
/ 5.
.It Va minmss
Minimum TCP Maximum Segment Size; used to prevent a denial of service attack
from an unreasonably low MSS.
.It Va msl
The Maximum Segment Lifetime, in milliseconds, for a packet.
.It Va mssdflt
The default value used for the TCP Maximum Segment Size
.Pq Dq MSS
for IPv4 when no advice to the contrary is received from MSS negotiation.
.It Va newcwd
Enable the New Congestion Window Validation mechanism as described in RFC 7661.
This gently reduces the congestion window during periods where TCP is
application limited and the network bandwidth is not utilized completely.
That prevents self-inflicted packet losses once the application starts to
transmit data at a higher speed.
.It Va nolocaltimewait
Suppress creation of compressed TCP
.Dv TIME_WAIT
states for connections in
which both endpoints are local.
.It Va path_mtu_discovery
Enable Path MTU Discovery.
.It Va pcbcount
Number of active process control blocks
(read-only).
.It Va perconn_stats_enable
Controls the default collection of statistics for all connections using the
.Xr stats 3
framework.
0 disables, 1 enables, 2 enables random sampling across log id connection
groups with all connections in a group receiving the same setting.
.It Va perconn_stats_sample_rates
A CSV list of template_spec=percent key-value pairs which controls the per
template sampling rates when
.Xr stats 3
sampling is enabled.
.It Va persmax
Maximum persistence interval, msec.
.It Va persmin
Minimum persistence interval, msec.
.It Va pmtud_blackhole_detection
Enable automatic path MTU blackhole detection.
In case of retransmits of MSS sized segments,
the OS will lower the MSS to check if it's an MTU problem.
If the current MSS is greater than the configured value to try
.Po Va net.inet.tcp.pmtud_blackhole_mss
and
.Va net.inet.tcp.v6pmtud_blackhole_mss
.Pc ,
it will be set to this value, otherwise,
the MSS will be set to the default values
.Po Va net.inet.tcp.mssdflt
and
.Va net.inet.tcp.v6mssdflt
.Pc .
Settings:
.Bl -tag -compact
.It 0
Disable path MTU blackhole detection.
.It 1
Enable path MTU blackhole detection for IPv4 and IPv6.
.It 2
Enable path MTU blackhole detection only for IPv4.
.It 3
Enable path MTU blackhole detection only for IPv6.
.El
.It Va pmtud_blackhole_mss
MSS to try for IPv4 if PMTU blackhole detection is turned on.
.It Va reass.cursegments
The current total number of segments present in all reassembly queues.
.It Va reass.maxqueuelen
The maximum number of segments allowed in each reassembly queue.
By default, the system chooses a limit based on each TCP connection's
receive buffer size and maximum segment size (MSS).
The actual limit applied to a session's reassembly queue will be the lower of
the system-calculated automatic limit and the user-specified
.Va reass.maxqueuelen
limit.
.It Va reass.maxsegments
The maximum limit on the total number of segments across all reassembly
queues.
The limit can be adjusted as a tunable.
.It Va recvbuf_auto
Enable automatic receive buffer sizing as a connection progresses.
.It Va recvbuf_max
Maximum size of automatic receive buffer.
.It Va recvspace
Initial
.Tn TCP
receive window (buffer size).
.It Va require_unique_port
Require unique ephemeral port for outgoing connections;
otherwise, the 4-tuple of local and remote ports and addresses must be unique.
Requiring a unique port limits the number of outgoing connections.
.It Va rexmit_drop_options
Drop TCP options from third and later retransmitted SYN segments
of a connection.
.It Va rexmit_initial , rexmit_min , rexmit_slop
Adjust the retransmit timer calculation for
.Tn TCP .
The slop is
typically added to the raw calculation to take into account
occasional variances that the
.Tn SRTT
(smoothed round-trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.
While a number of
.Tn TCP
RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior,
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code.
For this reason, we use 200ms of slop and a near-0
minimum, which gives us an effective minimum of 200ms (similar to
.Tn Linux ) .
The initial value is used before an RTT measurement has been performed.
.It Va rfc1323
Implement the window scaling and timestamp options of RFC 1323/RFC 7323
(default is 1).
Settings:
.Bl -tag -compact
.It 0
Disable window scaling and timestamp option.
.It 1
Enable window scaling and timestamp option.
.It 2
Enable only window scaling.
.It 3
Enable only timestamp option.
.El
.It Va rfc3042
Enable the Limited Transmit algorithm as described in RFC 3042.
It helps avoid timeouts on lossy links and also when the congestion window
is small, as happens on short transfers.
.It Va rfc3390
Enable support for RFC 3390, which allows for a variable-sized
starting congestion window on new connections, depending on the
maximum segment size.
This helps throughput in general, but
particularly affects short transfers and high-bandwidth large
propagation-delay connections.
.It Va rfc6675_pipe
Deprecated and superseded by
.Va sack.revised .
.It Va sack.enable
Enable support for RFC 2018, TCP Selective Acknowledgment option,
which allows the receiver to inform the sender about all successfully
arrived segments, allowing the sender to retransmit the missing segments
only.
.It Va sack.globalholes
Global number of TCP SACK holes currently allocated.
.It Va sack.globalmaxholes
Maximum number of SACK holes per system, across all connections.
Defaults to 65536.
.It Va sack.maxholes
Maximum number of SACK holes per connection.
Defaults to 128.
.It Va sack.revised
Enable three updated mechanisms from RFC 6675 (default is true).
First, the bytes in flight are calculated using the algorithm described in
RFC 6675, which is also an improvement when Proportional Rate Reduction is
enabled.
Next, Rescue Retransmission helps timely loss recovery when the trailing
segments of a transmission are lost while no additional data is ready to be
sent.
If a partial ACK without a SACK block is received during SACK loss
recovery, the trailing segment is immediately resent rather than waiting
for a retransmission timeout.
Finally, SACK loss recovery is also engaged once two segments plus one byte are
SACKed, even if no traditional duplicate ACKs were observed.
.It Va sendbuf_auto
Enable automatic send buffer sizing.
.It Va sendbuf_auto_lowat
Modify threshold for auto send buffer growth to account for
.Dv SO_SNDLOWAT .
.It Va sendbuf_inc
Incrementor step size of automatic send buffer.
.It Va sendbuf_max
Maximum size of automatic send buffer.
.It Va sendspace
Initial
.Tn TCP
send window (buffer size).
.It Va syncache
Variables under the
.Va net.inet.tcp.syncache
node are documented in
.Xr syncache 4 .
.It Va syncookies
Determines whether or not
.Tn SYN
cookies should be generated for outbound
.Tn SYN-ACK
packets.
.Tn SYN
cookies are a great help during
.Tn SYN
flood attacks, and are enabled by default.
(See
.Xr syncookies 4 . )
.It Va syncookies_only
See
.Xr syncookies 4 .
.It Va tcbhashsize
Size of the
.Tn TCP
control-block hash table
(read-only).
This is tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It Va tolerate_missing_ts
Tolerate missing timestamps (RFC 1323/RFC 7323) in
.Tn TCP
segments belonging to
.Tn TCP
connections for which support of
.Tn TCP
timestamps has been negotiated.
As of June 2021, several TCP stacks are known to violate RFC 7323, including
modern widely deployed ones.
Therefore, the default is 1, i.e., missing timestamps are tolerated.
.It Va ts_offset_per_conn
When initializing the TCP timestamps, use a per-connection offset instead of a
per-host-pair offset.
Default is to use per-connection offsets as recommended in RFC 7323.
.It Va tso
Enable TCP Segmentation Offload.
.It Va udp_tunneling_overhead
The overhead taken into account when using UDP encapsulation.
Since MSS clamping by middleboxes will most likely not work, values larger than
8 (the size of the UDP header) are also supported.
Supported values are between 8 and 1024.
The default is 8.
.It Va udp_tunneling_port
The local UDP encapsulation port.
A value of 0 indicates that UDP encapsulation is disabled.
The default is 0.
.It Va v6mssdflt
The default value used for the TCP Maximum Segment Size
.Pq Dq MSS
for IPv6 when no advice to the contrary is received from MSS negotiation.
.It Va v6pmtud_blackhole_mss
MSS to try for IPv6 if PMTU blackhole detection is turned on.
See
.Va pmtud_blackhole_detection .
.El
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bo Er ENOBUFS Bc or Bo Er ENOMEM Bc
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening on the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address;
.It Bq Er EINVAL
when trying to change TCP function blocks at an invalid point in the session;
.It Bq Er ENOENT
when trying to use a TCP function block that is not available.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr stats 3 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ktls 4 ,
.Xr mod_cc 4 ,
.Xr siftr 4 ,
.Xr syncache 4 ,
.Xr tcp_bbr 4 ,
.Xr setkey 8 ,
.Xr sysctl 8 ,
.Xr tcp_functions 9
.Rs
.%A "V. Jacobson"
.%A "B. Braden"
.%A "D. Borman"
.%T "TCP Extensions for High Performance"
.%O "RFC 1323"
.Re
.Rs
.%A "D. Borman"
.%A "B. Braden"
.%A "V. Jacobson"
.%A "R. Scheffenegger"
.%T "TCP Extensions for High Performance"
.%O "RFC 7323"
.Re
.Rs
.%A "A. Heffernan"
.%T "Protection of BGP Sessions via the TCP MD5 Signature Option"
.%O "RFC 2385"
.Re
.Rs
.%A "K. Ramakrishnan"
.%A "S. Floyd"
.%A "D. Black"
.%T "The Addition of Explicit Congestion Notification (ECN) to IP"
.%O "RFC 3168"
.Re
.Sh HISTORY
The
.Tn TCP
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
The
.Dv TCP_INFO
option was introduced in
.Tn Linux 2.6
and is
.Em subject to change .