.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. All advertising materials mentioning features or use of this software
.\"    must display the following acknowledgement:
.\"	This product includes software developed by the University of
.\"	California, Berkeley and its contributors.
.\" 4. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\"     From: @(#)tcp.4	8.1 (Berkeley) 6/5/93
.\" $FreeBSD$
.\"
.Dd February 14, 1995
.Dt TCP 4
.Os
.Sh NAME
.Nm tcp
.Nd Internet Transmission Control Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_STREAM 0
.Sh DESCRIPTION
The
.Tn TCP
protocol provides reliable, flow-controlled, two-way
transmission of data.  It is a byte-stream protocol used to
support the
.Dv SOCK_STREAM
abstraction.  TCP uses the standard
Internet address format and, in addition, provides a per-host
collection of
.Dq port addresses .
Thus, each address is composed
of an Internet address specifying the host and network, with
a specific
.Tn TCP
port on the host identifying the peer entity.
.Pp
Sockets using the
.Tn TCP
protocol are either
.Dq active
or
.Dq passive .
Active sockets initiate connections to passive
sockets.  By default,
.Tn TCP
sockets are created active; to create a
passive socket, the
.Xr listen 2
system call must be used
after binding the socket with the
.Xr bind 2
system call.  Only
passive sockets may use the
.Xr accept 2
call to accept incoming connections.  Only active sockets may
use the
.Xr connect 2
call to initiate connections.
.Tn TCP
also supports a more datagram-like mode, called Transaction
.Tn TCP ,
which is described in
.Xr ttcp 4 .
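.Pp
The active and passive roles described above can be sketched as follows.
This is a minimal illustration rather than a reference implementation:
the function name
.Fn loopback_connect_accept
is hypothetical, and the use of the loopback address with a
system-assigned port is an assumption made so that the example is
self-contained.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a passive socket on the loopback
 * address, connect an active socket to it, and accept the connection.
 * Returns 0 on success and -1 on any failure.
 */
int
loopback_connect_accept(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int lsock, csock, asock = -1, ret = -1;

	lsock = socket(AF_INET, SOCK_STREAM, 0);
	csock = socket(AF_INET, SOCK_STREAM, 0);
	if (lsock == -1 || csock == -1)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = 0;	/* let the system choose a port */

	/* bind(2) followed by listen(2) makes the socket passive. */
	if (bind(lsock, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lsock, 1) == -1)
		goto out;
	/* Learn which port the system assigned. */
	if (getsockname(lsock, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	/* The second socket is still active, so it may connect(2). */
	if (connect(csock, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;
	/* Only the passive socket may accept(2). */
	asock = accept(lsock, NULL, NULL);
	if (asock != -1)
		ret = 0;
out:
	if (asock != -1)
		close(asock);
	if (csock != -1)
		close(csock);
	if (lsock != -1)
		close(lsock);
	return (ret);
}
```
.Ed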
.Pp
Passive sockets may
.Dq underspecify
their location to match
incoming connection requests from multiple networks.  This
technique, termed
.Dq wildcard addressing ,
allows a single
server to provide service to clients on multiple networks.
To create a socket which listens on all networks, the Internet
address
.Dv INADDR_ANY
must be bound.  The
.Tn TCP
port may still be specified
at this time; if the port is not specified, the system will assign one.
Once a connection has been established, the socket's address is
fixed by the peer entity's location.  The address assigned to the
socket is the address associated with the network interface
through which packets are being transmitted and received.  Normally
this address corresponds to the peer entity's network.
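.Pp
As an illustration of wildcard addressing, the sketch below binds a
listener to
.Dv INADDR_ANY
with a system-assigned port and then inspects the local address of an
accepted connection.
The function name
.Fn wildcard_accepted_addr
is hypothetical, and connecting over the loopback interface is an
assumption made to keep the example self-contained; in that case the
accepted socket is expected to report the loopback address rather than
the wildcard.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: listen on INADDR_ANY, connect over loopback,
 * and return the local address (network byte order) that the system
 * fixed for the accepted socket.  Returns INADDR_NONE on failure.
 */
in_addr_t
wildcard_accepted_addr(void)
{
	struct sockaddr_in sin, local;
	socklen_t len;
	in_addr_t ret = INADDR_NONE;
	int lsock, csock, asock = -1;

	lsock = socket(AF_INET, SOCK_STREAM, 0);
	csock = socket(AF_INET, SOCK_STREAM, 0);
	if (lsock == -1 || csock == -1)
		goto out;

	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);	/* all networks */
	sin.sin_port = 0;				/* system assigns */
	if (bind(lsock, (struct sockaddr *)&sin, sizeof(sin)) == -1 ||
	    listen(lsock, 1) == -1)
		goto out;

	/* The port is now known; the address is still the wildcard. */
	len = sizeof(sin);
	if (getsockname(lsock, (struct sockaddr *)&sin, &len) == -1)
		goto out;

	/* Reach the listener through the loopback interface. */
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (connect(csock, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		goto out;
	asock = accept(lsock, NULL, NULL);
	if (asock == -1)
		goto out;

	/* The accepted socket has a specific local address. */
	len = sizeof(local);
	if (getsockname(asock, (struct sockaddr *)&local, &len) == 0)
		ret = local.sin_addr.s_addr;
out:
	if (asock != -1)
		close(asock);
	if (csock != -1)
		close(csock);
	if (lsock != -1)
		close(lsock);
	return (ret);
}
```
.Ed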
.Pp
.Tn TCP
supports a number of socket options which can be set with
.Xr setsockopt 2
and tested with
.Xr getsockopt 2 :
.Bl -tag -width TCP_NODELAYx
.It Dv TCP_NODELAY
Under most circumstances,
.Tn TCP
sends data when it is presented;
when outstanding data has not yet been acknowledged, it gathers
small amounts of output to be sent in a single packet once
an acknowledgement is received.
For a small number of clients, such as window systems
that send a stream of mouse events which receive no replies,
this packetization may cause significant delays.
The boolean option
.Dv TCP_NODELAY
defeats this algorithm.
.It Dv TCP_MAXSEG
By default, a sender\- and receiver-TCP
will negotiate among themselves to determine the maximum segment size
to be used for each connection.  The
.Dv TCP_MAXSEG
option allows the user to determine the result of this negotiation,
and to reduce it if desired.
.It Dv TCP_NOOPT
.Tn TCP
usually sends a number of options in each packet, corresponding to
various
.Tn TCP
extensions which are provided in this implementation.  The boolean
option
.Dv TCP_NOOPT
is provided to disable
.Tn TCP
option use on a per-connection basis.
.It Dv TCP_NOPUSH
By convention, the sender-TCP
will set the
.Dq push
bit and begin transmission immediately (if permitted) at the end of
every user call to
.Xr write 2
or
.Xr writev 2 .
The
.Dv TCP_NOPUSH
option is provided to allow servers to easily make use of Transaction
TCP (see
.Xr ttcp 4 ) .
When the option is set to a non-zero value,
.Tn TCP
will delay sending any data at all until either the socket is closed,
or the internal send buffer is filled.
.El
.Pp
The option level for the
.Xr setsockopt 2
call is the protocol number for
.Tn TCP ,
available from
.Xr getprotobyname 3 ,
or
.Dv IPPROTO_TCP .
All options are declared in
.Aq Pa netinet/tcp.h .
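.Pp
A minimal sketch of setting and testing one of these options follows.
The helper names
.Fn tcp_option_level
and
.Fn nodelay_roundtrip
are hypothetical; the fallback to
.Dv IPPROTO_TCP
when the protocol database has no entry is an assumption of the
example, not documented behavior.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netdb.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up the option level for TCP with
 * getprotobyname(3), falling back to IPPROTO_TCP if no entry
 * is found.
 */
int
tcp_option_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return (pe != NULL ? pe->p_proto : IPPROTO_TCP);
}

/*
 * Hypothetical helper: set TCP_NODELAY on a new socket and read it
 * back.  Returns 1 if the option reads back as enabled, 0 if it
 * reads back as disabled, and -1 on error.
 */
int
nodelay_roundtrip(void)
{
	int s, on = 1, val = 0, ret = -1;
	int level = tcp_option_level();
	socklen_t len = sizeof(val);

	s = socket(AF_INET, SOCK_STREAM, 0);
	if (s == -1)
		return (-1);
	if (setsockopt(s, level, TCP_NODELAY, &on, sizeof(on)) == 0 &&
	    getsockopt(s, level, TCP_NODELAY, &val, &len) == 0)
		ret = (val != 0);
	close(s);
	return (ret);
}
```
.Ed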
.Pp
Options at the
.Tn IP
transport level may be used with
.Tn TCP ;
see
.Xr ip 4 .
Incoming connection requests that are source-routed are noted,
and the reverse source route is used in responding.
.Sh MIB VARIABLES
The
.Nm
protocol implements a number of variables in the
.Li net.inet
branch of the
.Xr sysctl 3
MIB.
.Bl -tag -width TCPCTL_DO_RFC1644
.It Dv TCPCTL_DO_RFC1323
.Pq tcp.rfc1323
Implement the window scaling and timestamp options of RFC 1323
(default true).
.It Dv TCPCTL_DO_RFC1644
.Pq tcp.rfc1644
Implement Transaction
.Tn TCP ,
as described in RFC 1644.
.It Dv TCPCTL_MSSDFLT
.Pq tcp.mssdflt
The default value used for the maximum segment size
.Pq Dq MSS
when no advice to the contrary is received from MSS negotiation.
.It Dv TCPCTL_SENDSPACE
.Pq tcp.sendspace
Maximum TCP send window.
.It Dv TCPCTL_RECVSPACE
.Pq tcp.recvspace
Maximum TCP receive window.
.It tcp.log_in_vain
Log any connection attempts to ports where there is not a socket
accepting connections.
A value of 1 limits the logging to SYN (connection establishment)
packets only.
A value of 2 causes any TCP packet sent to a closed port to be logged.
Any other value disables the logging
(default is 0, i.e., the logging is disabled).
.It tcp.slowstart_flightsize
The number of packets allowed to be in-flight during the
.Tn TCP
slow-start phase on a non-local network.
.It tcp.local_slowstart_flightsize
The number of packets allowed to be in-flight during the
.Tn TCP
slow-start phase to local machines in the same subnet.
.It tcp.msl
The Maximum Segment Lifetime, in milliseconds, for a packet.
.It tcp.keepinit
Timeout, in milliseconds, for new, non-established TCP connections.
.It tcp.keepidle
Amount of time, in milliseconds, that the connection must be idle
before keepalive probes (if enabled) are sent.
.It tcp.keepintvl
The interval, in milliseconds, between keepalive probes sent to remote
machines.
After
.Dv TCPTV_KEEPCNT
(default 8) probes are sent with no response, the connection is dropped.
.It tcp.always_keepalive
Assume that
.Dv SO_KEEPALIVE
is set on all
.Tn TCP
connections; the kernel will then
periodically send a packet to the remote host to verify that the
connection is still up.
.It tcp.icmp_may_rst
Certain
.Tn ICMP
unreachable messages may abort connections in
.Tn SYN-SENT
state.
.It tcp.do_tcpdrain
Flush packets in the
.Tn TCP
reassembly queue if the system is low on mbufs.
.It tcp.blackhole
If enabled, disable sending of RST when a connection is attempted
to a port where there is not a socket accepting connections.
See
.Xr blackhole 4 .
.It tcp.delayed_ack
Delay ACK to try to piggyback it onto a data packet.
.It tcp.delacktime
Maximum amount of time, in milliseconds, before a delayed ACK is sent.
.It tcp.newreno
Enable TCP NewReno Fast Recovery algorithm,
as described in RFC 2582.
.It tcp.path_mtu_discovery
Enable Path MTU Discovery.
.It tcp.tcbhashsize
Size of the
.Tn TCP
control-block hashtable
(read-only).
This may be tuned using the kernel option
.Dv TCBHASHSIZE
or by setting
.Va net.inet.tcp.tcbhashsize
in the
.Xr loader 8 .
.It tcp.pcbcount
Number of active protocol control blocks
(read-only).
.It tcp.syncookies
Determines whether or not SYN cookies should be generated for
outbound SYN-ACK packets.  SYN cookies are a great help during
SYN flood attacks, and are enabled by default.
.It tcp.isn_reseed_interval
The interval (in seconds) specifying how often the secret data used in
RFC 1948 initial sequence number calculations should be reseeded.
By default, this variable is set to zero, indicating that
no reseeding will occur.
Reseeding should not be necessary, and will break
.Dv TIME_WAIT
recycling for a few minutes.
.It tcp.rexmit_{min,slop}
Adjust the retransmit timer calculation for TCP.  The slop is
typically added to the raw calculation to take into account
occasional variances that the SRTT (smoothed round trip time)
is unable to accommodate, while the minimum specifies an
absolute minimum.  While a number of TCP RFCs suggest a 1
second minimum, these RFCs tend to focus on streaming behavior
and fail to deal with the fact that a 1 second minimum has severe
detrimental effects over lossy interactive connections, such
as an 802.11b wireless link, and over very fast but lossy
connections for those cases not covered by the fast retransmit
code.  For this reason we use 200ms of slop and a near-0
minimum, which gives us an effective minimum of 200ms (similar to Linux).
.It tcp.inflight_enable
Enable
.Tn TCP
bandwidth delay product limiting.  An attempt will be made to calculate
the bandwidth delay product for each individual TCP connection and limit
the amount of inflight data being transmitted to avoid building up
unnecessary packets in the network.  This option is recommended if you
are serving a lot of data over connections with high bandwidth-delay
products, such as modems, GigE links, and fast long-haul WANs, and/or
you have configured your machine to accommodate large TCP windows.  In such
situations, without this option, you may experience high interactive
latencies or packet loss due to the overloading of intermediate routers
and switches.  Note that bandwidth delay product limiting affects only
the transmit side of a TCP connection.
.It tcp.inflight_debug
Enable debugging for the bandwidth delay product algorithm.  This may
default to on (1), so if you enable the algorithm you should probably also
disable debugging by setting this variable to 0.
.It tcp.inflight_min
This puts a lower bound on the bandwidth delay product window, in bytes.
A value of 1024 is typically used for debugging.  6000-16000 is more typical
in a production installation.  Setting this value too low may result in
slow ramp-up times for bursty connections.  Setting this value too high
effectively disables the algorithm.
.It tcp.inflight_max
This puts an upper bound on the bandwidth delay product window, in bytes.
This value should not generally be modified but may be used to set a
global per-connection limit on queued data, potentially allowing you to
intentionally set a less than optimum limit to smooth data flow over a
network while still being able to specify huge internal TCP buffers.
.It tcp.inflight_stab
The bandwidth delay product algorithm requires a slightly larger window
than it otherwise calculates for stability.  This parameter determines the
extra window in maximal packets / 10.  The default value of 20 represents
2 maximal packets.  Reducing this value is not recommended, but you may
come across a situation with very slow links where the ping time
reduction of the default inflight code is not sufficient.  If this case
occurs, you should first try reducing
.Va tcp.inflight_min
and, if that does not
work, reduce both
.Va tcp.inflight_min
and
.Va tcp.inflight_stab ,
trying values of
15, 10, or 5 for the latter.  Never use a value less than 5.  Reducing
.Va tcp.inflight_stab
can lead to upwards of a 20% underutilization of the link
as well as reducing the algorithm's ability to adapt to changing
situations and should only be done as a last resort.
.El
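.Pp
The unit of
.Va tcp.inflight_stab
(tenths of a maximal packet) can be made concrete with a small worked
calculation.
The helper
.Fn inflight_stab_bytes
is hypothetical, and treating a
.Dq maximal packet
as one maximum segment of
.Fa maxseg
bytes is an assumption of this sketch, not a statement about the kernel
implementation.
.Bd -literal -offset indent
```c
/*
 * Hypothetical helper: extra window, in bytes, implied by a
 * tcp.inflight_stab setting, assuming a maximal packet is one
 * maximum segment of maxseg bytes.
 */
long
inflight_stab_bytes(int stab, int maxseg)
{
	/* stab is expressed in tenths of a maximal packet. */
	return ((long)stab * maxseg / 10);
}
```
.Ed
.Pp
Under that assumption, the default of 20 with a 1460-byte segment adds
2920 bytes (2 segments), and the lowest recommended value of 5 adds
730 bytes (half a segment).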
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er ETIMEDOUT
when a connection was dropped
due to excessive retransmissions;
.It Bq Er ECONNRESET
when the remote peer
forces the connection to be closed;
.It Bq Er ECONNREFUSED
when the remote
peer actively refuses connection establishment (usually because
no process is listening to the port);
.It Bq Er EADDRINUSE
when an attempt
is made to create a socket with a port which has already been
allocated;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists;
.It Bq Er EAFNOSUPPORT
when an attempt is made to bind or connect a socket to a multicast
address.
.El
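.Pp
Two of the errors above can be provoked deliberately, which may help
when testing error handling.
The sketch below is illustrative only: the helper names
.Fn bind_loopback ,
.Fn second_bind_errno
and
.Fn refused_connect_errno
are hypothetical, and the example assumes that the loopback interface
is available and that
.Va tcp.blackhole
is disabled.
.Bd -literal -offset indent
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: bind a new socket to a system-assigned
 * loopback port and store the resulting address in *sin.
 * The socket is bound but never made passive with listen(2).
 * Returns the socket, or -1 on failure.
 */
static int
bind_loopback(struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);
	int s = socket(AF_INET, SOCK_STREAM, 0);

	if (s == -1)
		return (-1);
	memset(sin, 0, sizeof(*sin));
	sin->sin_family = AF_INET;
	sin->sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	if (bind(s, (struct sockaddr *)sin, sizeof(*sin)) == -1 ||
	    getsockname(s, (struct sockaddr *)sin, &len) == -1) {
		close(s);
		return (-1);
	}
	return (s);
}

/*
 * Bind a second socket to an address and port already in use.
 * Returns the resulting errno, 0 if bind(2) unexpectedly
 * succeeded, or -1 if the setup failed.
 */
int
second_bind_errno(void)
{
	struct sockaddr_in sin;
	int a, b, err = -1;

	a = bind_loopback(&sin);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a != -1 && b != -1) {
		if (bind(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
			err = errno;
		else
			err = 0;
	}
	if (a != -1)
		close(a);
	if (b != -1)
		close(b);
	return (err);
}

/*
 * Connect to a port on which no socket is listening.
 * Returns the resulting errno, 0 if connect(2) unexpectedly
 * succeeded, or -1 if the setup failed.
 */
int
refused_connect_errno(void)
{
	struct sockaddr_in sin;
	int a, b, err = -1;

	a = bind_loopback(&sin);
	b = socket(AF_INET, SOCK_STREAM, 0);
	if (a != -1 && b != -1) {
		if (connect(b, (struct sockaddr *)&sin, sizeof(sin)) == -1)
			err = errno;
		else
			err = 0;
	}
	if (a != -1)
		close(a);
	if (b != -1)
		close(b);
	return (err);
}
```
.Ed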
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr socket 2 ,
.Xr sysctl 3 ,
.Xr blackhole 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr syncache 4 ,
.Xr ttcp 4
.Rs
.%A V. Jacobson
.%A R. Braden
.%A D. Borman
.%T "TCP Extensions for High Performance"
.%O RFC 1323
.Re
.Rs
.%A R. Braden
.%T "T/TCP \- TCP Extensions for Transactions"
.%O RFC 1644
.Re
.Sh HISTORY
The
.Nm
protocol appeared in
.Bx 4.2 .
The RFC 1323 extensions for window scaling and timestamps were added
in
.Bx 4.4 .
427.Bx 4.4 .
428