============
SNMP counter
============

This document explains the meaning of SNMP counters.

General IPv4 counters
=====================
All layer 4 packets and ICMP packets will change these counters, but
these counters won't be changed by layer 2 packets (such as STP) or
ARP packets.

* IpInReceives
Defined in `RFC1213 ipInReceives`_

.. _RFC1213 ipInReceives: https://tools.ietf.org/html/rfc1213#page-26

The number of packets received by the IP layer. It is increased at the
beginning of the ip_rcv function and is always updated together with
IpExtInOctets. It counts aggregated segments after GRO/LRO, so frames
merged by GRO/LRO are counted only once.

* IpInDelivers
Defined in `RFC1213 ipInDelivers`_

.. _RFC1213 ipInDelivers: https://tools.ietf.org/html/rfc1213#page-28

The number of packets delivered to the upper layer protocols, e.g. TCP,
UDP, ICMP and so on. If no one listens on a raw socket, only packets of
kernel-supported protocols will be delivered; if someone listens on a
raw socket, all valid IP packets will be delivered.

* IpOutRequests
Defined in `RFC1213 ipOutRequests`_

.. _RFC1213 ipOutRequests: https://tools.ietf.org/html/rfc1213#page-28

The number of packets sent via the IP layer, for both unicast and
multicast packets. It is always updated together with IpExtOutOctets.

* IpExtInOctets and IpExtOutOctets
They are Linux kernel extensions with no RFC definitions. Please note
that RFC1213 does define ifInOctets and ifOutOctets, but they are
different things. The ifInOctets and ifOutOctets include the MAC layer
header size, but IpExtInOctets and IpExtOutOctets don't; they only
include the IP layer header and the IP layer data.

* IpExtInNoECTPkts, IpExtInECT1Pkts, IpExtInECT0Pkts, IpExtInCEPkts
They indicate the number of four kinds of ECN IP packets; please refer
to `Explicit Congestion Notification`_ for more details.

.. _Explicit Congestion Notification: https://tools.ietf.org/html/rfc3168#page-6

These 4 counters count how many packets are received per ECN
status. They count the real frame number regardless of LRO/GRO. So
for the same packet, you might find that IpInReceives counts 1, but
IpExtInNoECTPkts counts 2 or more.
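
As a sketch of how a received packet is sorted into these four
counters: per RFC 3168, the ECN field is the two low-order bits of the
IPv4 TOS byte (00 Not-ECT, 01 ECT(1), 10 ECT(0), 11 CE). The Python
helper below is hypothetical, not a kernel interface; it only maps a
TOS byte to the name of the counter such a packet would be counted in::

  # Hypothetical helper: map an IPv4 TOS byte to the nstat ECN counter name.
  # The ECN codepoint is the two low-order bits of the TOS byte (RFC 3168).
  ECN_COUNTERS = {
      0b00: "IpExtInNoECTPkts",  # Not-ECT
      0b01: "IpExtInECT1Pkts",   # ECT(1)
      0b10: "IpExtInECT0Pkts",   # ECT(0)
      0b11: "IpExtInCEPkts",     # CE, congestion experienced
  }


  def ecn_counter(tos):
      """Return the counter that a packet with this TOS byte is counted in."""
      if not 0 <= tos <= 0xFF:
          raise ValueError("TOS must fit in one byte")
      return ECN_COUNTERS[tos & 0b11]


  for tos in (0x00, 0x01, 0x02, 0x03, 0xB8):
      print(hex(tos), ecn_counter(tos))

For example, a TOS byte of 0xb8 (DSCP EF, Not-ECT) maps to
IpExtInNoECTPkts.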

ICMP counters
=============
* IcmpInMsgs and IcmpOutMsgs
Defined by `RFC1213 icmpInMsgs`_ and `RFC1213 icmpOutMsgs`_

.. _RFC1213 icmpInMsgs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutMsgs: https://tools.ietf.org/html/rfc1213#page-43

As mentioned in RFC1213, these two counters include errors; they
would be increased even if the ICMP packet has an invalid type. The
ICMP output path will check the header of a raw socket, so
IcmpOutMsgs would still be updated if the IP header is constructed by
a userspace program.

* ICMP named types
| These counters include most of the common ICMP types, they are:
| IcmpInDestUnreachs: `RFC1213 icmpInDestUnreachs`_
| IcmpInTimeExcds: `RFC1213 icmpInTimeExcds`_
| IcmpInParmProbs: `RFC1213 icmpInParmProbs`_
| IcmpInSrcQuenchs: `RFC1213 icmpInSrcQuenchs`_
| IcmpInRedirects: `RFC1213 icmpInRedirects`_
| IcmpInEchos: `RFC1213 icmpInEchos`_
| IcmpInEchoReps: `RFC1213 icmpInEchoReps`_
| IcmpInTimestamps: `RFC1213 icmpInTimestamps`_
| IcmpInTimestampReps: `RFC1213 icmpInTimestampReps`_
| IcmpInAddrMasks: `RFC1213 icmpInAddrMasks`_
| IcmpInAddrMaskReps: `RFC1213 icmpInAddrMaskReps`_
| IcmpOutDestUnreachs: `RFC1213 icmpOutDestUnreachs`_
| IcmpOutTimeExcds: `RFC1213 icmpOutTimeExcds`_
| IcmpOutParmProbs: `RFC1213 icmpOutParmProbs`_
| IcmpOutSrcQuenchs: `RFC1213 icmpOutSrcQuenchs`_
| IcmpOutRedirects: `RFC1213 icmpOutRedirects`_
| IcmpOutEchos: `RFC1213 icmpOutEchos`_
| IcmpOutEchoReps: `RFC1213 icmpOutEchoReps`_
| IcmpOutTimestamps: `RFC1213 icmpOutTimestamps`_
| IcmpOutTimestampReps: `RFC1213 icmpOutTimestampReps`_
| IcmpOutAddrMasks: `RFC1213 icmpOutAddrMasks`_
| IcmpOutAddrMaskReps: `RFC1213 icmpOutAddrMaskReps`_

.. _RFC1213 icmpInDestUnreachs: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInTimeExcds: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpInParmProbs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInRedirects: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchos: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInEchoReps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestamps: https://tools.ietf.org/html/rfc1213#page-42
.. _RFC1213 icmpInTimestampReps: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMasks: https://tools.ietf.org/html/rfc1213#page-43
.. _RFC1213 icmpInAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-43

.. _RFC1213 icmpOutDestUnreachs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutTimeExcds: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutParmProbs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutSrcQuenchs: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutRedirects: https://tools.ietf.org/html/rfc1213#page-44
.. _RFC1213 icmpOutEchos: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutEchoReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestamps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutTimestampReps: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMasks: https://tools.ietf.org/html/rfc1213#page-45
.. _RFC1213 icmpOutAddrMaskReps: https://tools.ietf.org/html/rfc1213#page-46

Every ICMP type has two counters: 'In' and 'Out'. E.g., for the ICMP
Echo packet, they are IcmpInEchos and IcmpOutEchos. Their meanings are
straightforward: the 'In' counter means the kernel receives such a
packet and the 'Out' counter means the kernel sends such a packet.

* ICMP numeric types
They are IcmpMsgInType[N] and IcmpMsgOutType[N]; the [N] indicates the
ICMP type number. These counters track all kinds of ICMP packets. The
ICMP type number definition can be found in the `ICMP parameters`_
document.

.. _ICMP parameters: https://www.iana.org/assignments/icmp-parameters/icmp-parameters.xhtml

For example, if the Linux kernel sends an ICMP Echo packet,
IcmpMsgOutType8 would increase by 1. And if the kernel gets an ICMP
Echo Reply packet, IcmpMsgInType0 would increase by 1.
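
When reading nstat output, it can help to translate IcmpMsgInType[N]
and IcmpMsgOutType[N] back into type names. A minimal Python sketch
follows; the helper name describe_icmp_counter is hypothetical, and its
table holds only a few well-known types from RFC 792 and RFC 950, so
the IANA registry above remains the complete reference::

  import re

  # A few well-known ICMP type numbers (RFC 792, RFC 950). This table is
  # deliberately incomplete; see the IANA "ICMP parameters" registry.
  ICMP_TYPE_NAMES = {
      0: "Echo Reply",
      3: "Destination Unreachable",
      4: "Source Quench",
      5: "Redirect",
      8: "Echo",
      11: "Time Exceeded",
      12: "Parameter Problem",
      13: "Timestamp",
      14: "Timestamp Reply",
      17: "Address Mask Request",
      18: "Address Mask Reply",
  }

  COUNTER_RE = re.compile(r"IcmpMsg(In|Out)Type(\d+)")


  def describe_icmp_counter(name):
      """Turn a counter name such as 'IcmpMsgOutType8' into readable text."""
      match = COUNTER_RE.fullmatch(name)
      if match is None:
          raise ValueError("not an IcmpMsg counter: " + name)
      direction, number = match.group(1), int(match.group(2))
      label = ICMP_TYPE_NAMES.get(number, "not in this table")
      return "%s: %s (type %d)" % (direction, label, number)


  print(describe_icmp_counter("IcmpMsgOutType8"))  # Out: Echo (type 8)
  print(describe_icmp_counter("IcmpMsgInType0"))   # In: Echo Reply (type 0)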

* IcmpInCsumErrors
This counter indicates that the checksum of the ICMP packet is
wrong. The kernel verifies the checksum after updating IcmpInMsgs and
before updating IcmpMsgInType[N]. If a packet has a bad checksum,
IcmpInMsgs would be updated but none of IcmpMsgInType[N] would be
updated.

* IcmpInErrors and IcmpOutErrors
Defined by `RFC1213 icmpInErrors`_ and `RFC1213 icmpOutErrors`_

.. _RFC1213 icmpInErrors: https://tools.ietf.org/html/rfc1213#page-41
.. _RFC1213 icmpOutErrors: https://tools.ietf.org/html/rfc1213#page-43

When an error occurs in the ICMP packet handler path, these two
counters would be updated. The receiving packet path uses IcmpInErrors
and the sending packet path uses IcmpOutErrors. When IcmpInCsumErrors
is increased, IcmpInErrors would always be increased too.

relationship of the ICMP counters
---------------------------------
The sum of IcmpMsgOutType[N] is always equal to IcmpOutMsgs, as they
are updated at the same time. The sum of IcmpMsgInType[N] plus
IcmpInErrors should be equal to or larger than IcmpInMsgs. When the
kernel receives an ICMP packet, the kernel follows the logic below:

1. increase IcmpInMsgs
2. if there is any error, update IcmpInErrors and finish the process
3. update IcmpMsgInType[N]
4. handle the packet depending on the type; if there is any error,
   update IcmpInErrors and finish the process

So if all errors occur in step (2), IcmpInMsgs should be equal to the
sum of IcmpMsgInType[N] plus IcmpInErrors. If all errors occur in
step (4), IcmpInMsgs should be equal to the sum of
IcmpMsgInType[N]. If the errors occur in both step (2) and step (4),
IcmpInMsgs should be less than the sum of IcmpMsgInType[N] plus
IcmpInErrors.
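
The bookkeeping above can be simulated to see why the inequality
holds. This Python sketch is a simplified model of the four steps, not
kernel code; receive_icmp and in_type_sum are hypothetical helpers, and
the traffic mix is made up::

  from collections import Counter


  def receive_icmp(counters, icmp_type, error_at=0):
      """Apply steps 1 to 4 above to one received ICMP packet.

      error_at: 0 for no error, 2 for an error found in step (2) such as a
      bad checksum, or 4 for an error found while handling the type in step (4).
      """
      counters["IcmpInMsgs"] += 1                   # step 1
      if error_at == 2:                             # step 2
          counters["IcmpInErrors"] += 1
          return
      counters["IcmpMsgInType%d" % icmp_type] += 1  # step 3
      if error_at == 4:                             # step 4
          counters["IcmpInErrors"] += 1


  def in_type_sum(counters):
      """Sum of all IcmpMsgInType[N] counters."""
      return sum(v for k, v in counters.items() if k.startswith("IcmpMsgInType"))


  # Made-up traffic: 10 good Echo Replies, 2 errors in step (2), 3 in step (4).
  counters = Counter()
  for _ in range(10):
      receive_icmp(counters, 0)
  for _ in range(2):
      receive_icmp(counters, 8, error_at=2)
  for _ in range(3):
      receive_icmp(counters, 3, error_at=4)

  print(counters["IcmpInMsgs"], in_type_sum(counters), counters["IcmpInErrors"])  # 15 13 5

With errors in both step (2) and step (4), IcmpInMsgs (15) is larger
than the IcmpMsgInType[N] sum (13) but less than that sum plus
IcmpInErrors (18), which is the last case described above.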

General TCP counters
====================
* TcpInSegs
Defined in `RFC1213 tcpInSegs`_

.. _RFC1213 tcpInSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets received by the TCP layer. As mentioned in
RFC1213, it includes the packets received in error, such as checksum
errors, invalid TCP headers and so on. Only one kind of error is not
included: when the layer 2 destination address is not the NIC's layer
2 address. It might happen if the packet is a multicast or broadcast
packet, or the NIC is in promiscuous mode. In these situations, the
packets would be delivered to the TCP layer, but the TCP layer will
discard these packets before increasing TcpInSegs. The TcpInSegs
counter isn't aware of GRO. So if two packets are merged by GRO, the
TcpInSegs counter would only increase by 1.

* TcpOutSegs
Defined in `RFC1213 tcpOutSegs`_

.. _RFC1213 tcpOutSegs: https://tools.ietf.org/html/rfc1213#page-48

The number of packets sent by the TCP layer. As mentioned in RFC1213,
it excludes the retransmitted packets. But it includes the SYN, ACK
and RST packets. Unlike TcpInSegs, TcpOutSegs is aware of GSO, so if
a packet would be split into 2 by GSO, TcpOutSegs will increase by 2.

* TcpActiveOpens
Defined in `RFC1213 tcpActiveOpens`_

.. _RFC1213 tcpActiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer sends a SYN and comes into the SYN-SENT
state. Every time TcpActiveOpens increases by 1, TcpOutSegs should
always increase by 1.

* TcpPassiveOpens
Defined in `RFC1213 tcpPassiveOpens`_

.. _RFC1213 tcpPassiveOpens: https://tools.ietf.org/html/rfc1213#page-47

It means the TCP layer receives a SYN, replies with a SYN+ACK, and
comes into the SYN-RCVD state.

* TcpExtTCPRcvCoalesce
When packets are received by the TCP layer and are not read by the
application, the TCP layer will try to merge them. This counter
indicates how many packets are merged in such a situation. If GRO is
enabled, lots of packets would be merged by GRO, and these packets
wouldn't be counted in TcpExtTCPRcvCoalesce.

* TcpExtTCPAutoCorking
When sending packets, the TCP layer will try to merge small packets
into a bigger one. This counter increases by 1 for every packet merged
in such a situation. Please refer to the LWN article for more details:
https://lwn.net/Articles/576263/

* TcpExtTCPOrigDataSent
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::

  TCPOrigDataSent: number of outgoing packets with original data (excluding
  retransmission but including data-in-SYN). This counter is different from
  TcpOutSegs because TcpOutSegs also tracks pure ACKs. TCPOrigDataSent is
  more useful to track the TCP retransmission rate.

* TCPSynRetrans
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::

  TCPSynRetrans: number of SYN and SYN/ACK retransmits to break down
  retransmissions into SYN, fast-retransmits, timeout retransmits, etc.

* TCPFastOpenActiveFail
This counter is explained by `kernel commit f19c29e3e391`_, I pasted the
explanation below::

  TCPFastOpenActiveFail: Fast Open attempts (SYN/data) failed because
  the remote does not accept it or the attempts timed out.

.. _kernel commit f19c29e3e391: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f19c29e3e391a66a273e9afebaf01917245148cd
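
The quote above says TCPOrigDataSent is a better base than TcpOutSegs
for a retransmission rate, since pure ACKs are excluded. A minimal
Python sketch, assuming the number of retransmitted segments is taken
from TcpRetransSegs (an RFC1213 counter not described in this section)
and that both inputs are increases over the same interval, e.g. from
two nstat runs; the function name is hypothetical::

  def retransmission_rate(retrans_segs, orig_data_sent):
      """Approximate TCP retransmission rate for one sampling interval.

      retrans_segs:   increase of TcpRetransSegs (assumed numerator)
      orig_data_sent: increase of TcpExtTCPOrigDataSent; pure ACKs are not
                      included, so they don't dilute the ratio
      """
      if orig_data_sent <= 0:
          return 0.0  # no original data was sent in this interval
      return retrans_segs / orig_data_sent


  # Illustrative numbers, not measurements.
  print("%.2f%%" % (100 * retransmission_rate(12, 4000)))  # 0.30%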

* TcpExtListenOverflows and TcpExtListenDrops
When the kernel receives a SYN from a client, and if the TCP accept
queue is full, the kernel will drop the SYN and add 1 to
TcpExtListenOverflows. At the same time the kernel will also add 1 to
TcpExtListenDrops. When a TCP socket is in the LISTEN state, and the
kernel needs to drop a packet, the kernel would always add 1 to
TcpExtListenDrops. So an increase of TcpExtListenOverflows is always
accompanied by an increase of TcpExtListenDrops, but TcpExtListenDrops
can also increase without TcpExtListenOverflows increasing, e.g. a
memory allocation failure would also increase TcpExtListenDrops.

Note: The above explanation is based on kernel 4.10 or above. On an
old kernel, the TCP stack has different behavior when the TCP accept
queue is full. On the old kernel, the TCP stack won't drop the SYN; it
would complete the 3-way handshake. As the accept queue is full, the
TCP stack will keep the socket in the TCP half-open queue. As it is in
the half-open queue, the TCP stack will send SYN+ACK on an exponential
backoff timer. After the client replies with ACK, the TCP stack checks
whether the accept queue is still full: if it is not full, it moves
the socket to the accept queue; if it is full, it keeps the socket in
the half-open queue, and the next time the client replies with ACK,
this socket will get another chance to move to the accept queue.


TCP Fast Path
=============
When the kernel receives a TCP packet, it has two paths to handle the
packet: one is the fast path, the other is the slow path. The comment
in the kernel code provides a good explanation of them, I pasted them
below::

  It is split into a fast path and a slow path. The fast path is
  disabled when:

  - A zero window was announced from us
  - zero window probing
    is only handled properly on the slow path.
  - Out of order segments arrived.
  - Urgent data is expected.
  - There is no buffer space left
  - Unexpected TCP flags/window values/header lengths are received
    (detected by checking the TCP header against pred_flags)
  - Data is sent in both directions. The fast path only supports pure senders
    or pure receivers (this means either the sequence number or the ack
    value must stay constant)
  - Unexpected TCP option.

The kernel will try to use the fast path unless any of the above
conditions are satisfied. If the packets are out of order, the kernel
will handle them in the slow path, which means the performance might
not be very good. The kernel would also come into the slow path if
"Delayed ack" is used, because when using "Delayed ack", the data is
sent in both directions. When the TCP window scale option is not used,
the kernel will try to enable the fast path immediately when the
connection comes into the established state, but if the TCP window
scale option is used, the kernel will disable the fast path at first,
and try to enable it after the kernel receives packets.

* TcpExtTCPPureAcks and TcpExtTCPHPAcks
If a packet has the ACK flag set and has no data, it is a pure ACK
packet. If the kernel handles it in the fast path, TcpExtTCPHPAcks will
increase by 1; if the kernel handles it in the slow path,
TcpExtTCPPureAcks will increase by 1.

* TcpExtTCPHPHits
If a TCP packet has data (which means it is not a pure ACK packet),
and this packet is handled in the fast path, TcpExtTCPHPHits will
increase by 1.
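
The rules for these three counters can be condensed into a small
decision function. It is a simplified model of the text above, not
kernel code: the function name is hypothetical, and whether the fast
path applied is passed in by the caller instead of being derived from
the conditions listed earlier::

  def fast_path_counter(ack_flag, payload_len, fast_path):
      """Name the counter a received segment is accounted to, or None.

      ack_flag:    True if the segment has the ACK flag set
      payload_len: number of TCP payload bytes
      fast_path:   True if the kernel handled the segment in the fast path
      """
      if ack_flag and payload_len == 0:
          # Pure ACK: the fast path and the slow path use different counters.
          return "TcpExtTCPHPAcks" if fast_path else "TcpExtTCPPureAcks"
      if payload_len > 0 and fast_path:
          # Data segment handled in the fast path.
          return "TcpExtTCPHPHits"
      # None of these three counters is described for other cases.
      return None


  print(fast_path_counter(True, 0, False))  # TcpExtTCPPureAcks
  print(fast_path_counter(True, 6, True))   # TcpExtTCPHPHits

The TCP normal traffic example later in this document shows the pure
ACK case on both the slow path and the fast path.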


TCP abort
=========


* TcpExtTCPAbortOnData
It means the TCP layer has data in flight, but needs to close the
connection. So the TCP layer sends a RST to the other side, indicating
that the connection is not closed gracefully. An easy way to increase
this counter is using the SO_LINGER option. Please refer to the
SO_LINGER section of the `socket man page`_:

.. _socket man page: http://man7.org/linux/man-pages/man7/socket.7.html

By default, when an application closes a connection, the close function
will return immediately and the kernel will try to send the in-flight
data asynchronously. If you use the SO_LINGER option, set l_onoff to 1,
and set l_linger to a positive number, the close function won't return
immediately, but will wait until the in-flight data is acked by the
other side; the max wait time is l_linger seconds. If you set l_onoff
to 1 and set l_linger to 0, when the application closes a connection,
the kernel will send a RST immediately and increase the
TcpExtTCPAbortOnData counter.
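
A minimal Python sketch of the zero-linger close described above, using
only the standard socket and struct modules. It assumes Linux, where
struct linger is two C int fields (l_onoff, l_linger); the function
names are hypothetical, and the loopback connection exists only so that
close() has a connected peer. Running it and then nstat should, per the
explanation above, show TcpExtTCPAbortOnData increased::

  import socket
  import struct


  def enable_abortive_close(sock):
      """Set SO_LINGER with l_onoff=1 and l_linger=0 on a TCP socket."""
      # struct linger { int l_onoff; int l_linger; }
      sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack("ii", 1, 0))


  def linger_setting(sock):
      """Read SO_LINGER back as an (l_onoff, l_linger) tuple."""
      raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.calcsize("ii"))
      return struct.unpack("ii", raw)


  def demo():
      # Loopback listener on a kernel-chosen port, so nothing is hard-coded.
      listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      listener.bind(("127.0.0.1", 0))
      listener.listen(1)
      client = socket.create_connection(listener.getsockname())
      server, _ = listener.accept()

      client.sendall(b"hello\n")
      server.recv(16)              # make sure the data reached the peer
      enable_abortive_close(client)
      setting = linger_setting(client)
      client.close()               # sends a RST instead of a FIN

      server.close()
      listener.close()
      return setting


  print(demo())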

* TcpExtTCPAbortOnClose
This counter means the application has unread data in the TCP layer when
the application wants to close the TCP connection. In such a situation,
the kernel will send a RST to the other side of the TCP connection.

* TcpExtTCPAbortOnMemory
When an application closes a TCP connection, the kernel still needs to
track the connection and let it complete the TCP disconnect process.
E.g. an app calls the close method of a socket, the kernel sends a FIN
to the other side of the connection, and then the app has no
relationship with the socket any more, but the kernel needs to keep the
socket. This socket becomes an orphan socket; the kernel waits for the
reply of the other side, and the socket would come to the TIME_WAIT
state finally. When the kernel doesn't have enough memory to keep the
orphan socket, the kernel would send a RST to the other side and delete
the socket; in such a situation, the kernel will increase
TcpExtTCPAbortOnMemory by 1. Two conditions would trigger
TcpExtTCPAbortOnMemory:

1. the memory used by the TCP protocol is higher than the third value of
   the tcp_mem. Please refer to the tcp_mem section in the `TCP man page`_.

2. the orphan socket count is higher than net.ipv4.tcp_max_orphans

.. _TCP man page: http://man7.org/linux/man-pages/man7/tcp.7.html
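
The two conditions can be checked from userspace. The Python sketch
below is a simplified illustration, not the kernel's exact test. It
assumes that the TCP: line of /proc/net/sockstat reports orphan and mem
fields, with mem counted in pages like tcp_mem; the helper names are
hypothetical. The functions take file contents as strings so they can
be tried on sample text, and the files are only read when they exist::

  import os


  def parse_sockstat_tcp(sockstat_text):
      """Return the 'TCP:' line of /proc/net/sockstat as a dict of ints."""
      for line in sockstat_text.splitlines():
          if line.startswith("TCP:"):
              fields = line.split()[1:]  # e.g. inuse 13 orphan 10 tw 0 ...
              return {fields[i]: int(fields[i + 1]) for i in range(0, len(fields) - 1, 2)}
      raise ValueError("no TCP: line found")


  def abort_on_memory_conditions(sockstat_text, tcp_mem_text, max_orphans_text):
      """Evaluate the two conditions listed above (simplified)."""
      tcp = parse_sockstat_tcp(sockstat_text)
      tcp_mem_high = int(tcp_mem_text.split()[2])  # third value of tcp_mem
      max_orphans = int(max_orphans_text.strip())
      return {
          "memory_over_tcp_mem": tcp["mem"] > tcp_mem_high,
          "orphans_over_max": tcp["orphan"] > max_orphans,
      }


  def read(path):
      with open(path) as f:
          return f.read()


  PATHS = ("/proc/net/sockstat", "/proc/sys/net/ipv4/tcp_mem",
           "/proc/sys/net/ipv4/tcp_max_orphans")

  if __name__ == "__main__" and all(os.path.exists(p) for p in PATHS):
      print(abort_on_memory_conditions(*(read(p) for p in PATHS)))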


* TcpExtTCPAbortOnTimeout
This counter will increase when any of the TCP timers expire. In such a
situation, the kernel won't send a RST; it just gives up the connection.

* TcpExtTCPAbortOnLinger
When a TCP connection comes into the FIN_WAIT_2 state, instead of
waiting for the FIN packet from the other side, the kernel could send a
RST and delete the socket immediately. This is not the default behavior
of the Linux kernel TCP stack. By configuring the TCP_LINGER2 socket
option, you could let the kernel follow this behavior.
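
A minimal Python sketch of setting that option. It assumes Linux, where
socket.TCP_LINGER2 is available, and it assumes, based on the kernel's
TCP code rather than on this document, that a negative TCP_LINGER2
value is what selects the immediate RST behavior; the helper name is
hypothetical::

  import socket


  def enable_rst_in_fin_wait2(sock):
      """Set TCP_LINGER2 to a negative value on a TCP socket.

      Assumption: on Linux a negative value makes the kernel reset the
      connection instead of waiting in FIN_WAIT_2.
      """
      sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2, -1)


  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  enable_rst_in_fin_wait2(s)
  # Read the option back; a negative value means the setting took effect.
  value = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_LINGER2)
  s.close()
  print(value)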

* TcpExtTCPAbortFailed
The kernel TCP layer will send a RST if the `RFC2525 2.17 section`_ is
satisfied. If an internal error occurs during this process,
TcpExtTCPAbortFailed will be increased.

.. _RFC2525 2.17 section: https://tools.ietf.org/html/rfc2525#page-50

TCP Hybrid Slow Start
=====================
The Hybrid Slow Start algorithm is an enhancement of the traditional
TCP congestion window Slow Start algorithm. It uses two pieces of
information to detect whether the max bandwidth of the TCP path is
approached. The two pieces of information are ACK train length and
increase in packet delay. For detailed information, please refer to
the `Hybrid Slow Start paper`_. When either the ACK train length or
the packet delay hits a specific threshold, the congestion control
algorithm will come into the Congestion Avoidance state. As of v4.20,
two congestion control algorithms are using Hybrid Slow Start: cubic
(the default congestion control algorithm) and cdg. Four SNMP counters
relate to the Hybrid Slow Start algorithm.

.. _Hybrid Slow Start paper: https://pdfs.semanticscholar.org/25e9/ef3f03315782c7f1cbcd31b587857adae7d1.pdf

* TcpExtTCPHystartTrainDetect
How many times the ACK train length threshold is detected.

* TcpExtTCPHystartTrainCwnd
The sum of CWND detected by ACK train length. Dividing this value by
TcpExtTCPHystartTrainDetect gives the average CWND detected by the
ACK train length.

* TcpExtTCPHystartDelayDetect
How many times the packet delay threshold is detected.

* TcpExtTCPHystartDelayCwnd
The sum of CWND detected by packet delay. Dividing this value by
TcpExtTCPHystartDelayDetect gives the average CWND detected by the
packet delay.
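
The averages described above are plain ratios. A minimal Python sketch
with a hypothetical helper name, guarding against a zero detection
count::

  def hystart_average_cwnd(cwnd_sum, detect_count):
      """Average CWND at which a HyStart threshold was detected.

      Use (TcpExtTCPHystartTrainCwnd, TcpExtTCPHystartTrainDetect) for the
      ACK train length, or (TcpExtTCPHystartDelayCwnd,
      TcpExtTCPHystartDelayDetect) for the packet delay.
      """
      if detect_count == 0:
          return None  # the threshold was never detected
      return cwnd_sum / detect_count


  # Illustrative values, not measurements.
  print(hystart_average_cwnd(1200, 40))  # 30.0
  print(hystart_average_cwnd(0, 0))      # None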

examples
========

ping test
---------
Run the ping command against the public DNS server 8.8.8.8::

  nstatuser@nstat-a:~$ ping 8.8.8.8 -c 1
  PING 8.8.8.8 (8.8.8.8) 56(84) bytes of data.
  64 bytes from 8.8.8.8: icmp_seq=1 ttl=119 time=17.8 ms

  --- 8.8.8.8 ping statistics ---
  1 packets transmitted, 1 received, 0% packet loss, time 0ms
  rtt min/avg/max/mdev = 17.875/17.875/17.875/0.000 ms

The nstat result::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0

The Linux server sent an ICMP Echo packet, so IpOutRequests,
IcmpOutMsgs, IcmpOutEchos and IcmpMsgOutType8 were increased by 1. The
server got an ICMP Echo Reply from 8.8.8.8, so IpInReceives, IcmpInMsgs,
IcmpInEchoReps and IcmpMsgInType0 were increased by 1. The ICMP Echo
Reply was passed to the ICMP layer via the IP layer, so IpInDelivers
was increased by 1. The default ping data size is 56 bytes (the "56(84)"
in the ping output), so an ICMP Echo packet and its corresponding Echo
Reply packet are constructed by:

* 14 bytes MAC header (not counted by IpExtInOctets or IpExtOutOctets)
* 20 bytes IP header
* 8 bytes ICMP header
* 56 bytes data (default value of the ping command)

So the IpExtInOctets and IpExtOutOctets are 20+8+56=84.
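
Output like the nstat listing above can be checked mechanically. A
minimal Python sketch, assuming nstat's default layout as shown in this
document (a #kernel header line, then the counter name, value and rate
columns); parse_nstat is a hypothetical helper, and the sample text is
copied from the ping example::

  SAMPLE = """\
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  IcmpInMsgs                      1                  0.0
  IcmpInEchoReps                  1                  0.0
  IcmpOutMsgs                     1                  0.0
  IcmpOutEchos                    1                  0.0
  IcmpMsgInType0                  1                  0.0
  IcmpMsgOutType8                 1                  0.0
  IpExtInOctets                   84                 0.0
  IpExtOutOctets                  84                 0.0
  IpExtInNoECTPkts                1                  0.0
  """


  def parse_nstat(text):
      """Map counter name -> value, skipping '#' header lines."""
      counters = {}
      for line in text.splitlines():
          fields = line.split()
          if not fields or fields[0].startswith("#"):
              continue
          counters[fields[0]] = int(fields[1])
      return counters


  counters = parse_nstat(SAMPLE)
  # ping printed "56(84) bytes of data": 84 is the IP packet size, which is
  # what IpExtInOctets/IpExtOutOctets count (no MAC header).
  assert counters["IpExtInOctets"] == 84
  assert counters["IpExtOutOctets"] == 84
  # The sum of IcmpMsgOutType[N] should equal IcmpOutMsgs.
  out_types = sum(v for k, v in counters.items() if k.startswith("IcmpMsgOutType"))
  assert out_types == counters["IcmpOutMsgs"]
  print("ping example is consistent")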

tcp 3-way handshake
-------------------
On the server side, we run::

  nstatuser@nstat-b:~$ nc -lknv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client side, we run::

  nstatuser@nstat-a:~$ nc -nv 192.168.122.251 9000
  Connection to 192.168.122.251 9000 port [tcp/*] succeeded!

The server listened on TCP port 9000, the client connected to it, and
they completed the 3-way handshake.

On the server side, we can find the nstat output below::

  nstatuser@nstat-b:~$ nstat | grep -i tcp
  TcpPassiveOpens                 1                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0

On the client side, we can find the nstat output below::

  nstatuser@nstat-a:~$ nstat | grep -i tcp
  TcpActiveOpens                  1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      2                  0.0

When the server received the first SYN, it replied with a SYN+ACK and
came into the SYN-RCVD state, so TcpPassiveOpens increased by 1. The
server received a SYN, sent a SYN+ACK, and received an ACK, so the
server sent 1 packet and received 2 packets: TcpInSegs increased by 2
and TcpOutSegs increased by 1. The last ACK of the 3-way handshake is a
pure ACK without data, so TcpExtTCPPureAcks increased by 1.

When the client sent the SYN, the client came into the SYN-SENT state,
so TcpActiveOpens increased by 1. The client sent a SYN, received a
SYN+ACK, and sent an ACK, so the client sent 2 packets and received 1
packet: TcpInSegs increased by 1 and TcpOutSegs increased by 2.
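
The accounting above can be replayed as a small model of the three
segments, seen from each side. It is a simplified illustration of the
text, not kernel logic, and handshake_counters is a hypothetical name;
its results match the two nstat outputs above::

  from collections import Counter

  # The three segments of a handshake, as (direction, flags) seen by each side.
  SEGMENTS = {
      "client": [("out", "SYN"), ("in", "SYN+ACK"), ("out", "ACK")],
      "server": [("in", "SYN"), ("out", "SYN+ACK"), ("in", "ACK")],
  }


  def handshake_counters(role):
      """Counter increases for one side of a 3-way handshake (simplified)."""
      counters = Counter()
      for direction, flags in SEGMENTS[role]:
          counters["TcpOutSegs" if direction == "out" else "TcpInSegs"] += 1
          if direction == "out" and flags == "SYN":
              counters["TcpActiveOpens"] += 1     # client enters SYN-SENT
          elif direction == "in" and flags == "SYN":
              counters["TcpPassiveOpens"] += 1    # server replies, enters SYN-RCVD
          elif direction == "in" and flags == "ACK":
              counters["TcpExtTCPPureAcks"] += 1  # final ACK carries no data
      return dict(counters)


  print(handshake_counters("server"))
  print(handshake_counters("client"))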

TCP normal traffic
------------------
Run nc on server::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

Run nc on client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

Input a string in the nc client ('hello' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello

The client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPPureAcks               1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

The server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Input a string in the nc client side again ('world' in our example)::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!
  hello
  world

Client side nstat output::

  nstatuser@nstat-a:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPAcks                 1                  0.0
  TcpExtTCPOrigDataSent           1                  0.0
  IpExtInOctets                   52                 0.0
  IpExtOutOctets                  58                 0.0
  IpExtInNoECTPkts                1                  0.0

Server side nstat output::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    1                  0.0
  IpInDelivers                    1                  0.0
  IpOutRequests                   1                  0.0
  TcpInSegs                       1                  0.0
  TcpOutSegs                      1                  0.0
  TcpExtTCPHPHits                 1                  0.0
  IpExtInOctets                   58                 0.0
  IpExtOutOctets                  52                 0.0
  IpExtInNoECTPkts                1                  0.0

Comparing the first client-side nstat and the second client-side nstat,
we can find one difference: the first one had a 'TcpExtTCPPureAcks',
but the second one had a 'TcpExtTCPHPAcks'. The first server-side
nstat and the second server-side nstat had a difference too: the
second server-side nstat had a TcpExtTCPHPHits, but the first
server-side nstat didn't have it. The network traffic patterns were
exactly the same: the client sent a packet to the server, and the
server replied with an ACK. But the kernel handled them in different
ways. When the TCP window scale option is not used, the kernel will try
to enable the fast path immediately when the connection comes into the
established state, but if the TCP window scale option is used, the
kernel will disable the fast path at first, and try to enable it after
the kernel receives packets. We can use the 'ss' command to verify
whether the window scale option is used, e.g. run the command below on
either the server or the client::

  nstatuser@nstat-a:~$ ss -o state established -i '( dport = :9000 or sport = :9000 )'
  Netid    Recv-Q     Send-Q            Local Address:Port             Peer Address:Port
  tcp      0          0               192.168.122.250:40654         192.168.122.251:9000
             ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448 pmtu:1500 rcvmss:536 advmss:1448 cwnd:10 bytes_acked:1 segs_out:2 segs_in:1 send 118.2Mbps lastsnd:46572 lastrcv:46572 lastack:46572 pacing_rate 236.4Mbps rcv_space:29200 rcv_ssthresh:29200 minrtt:0.98

The 'wscale:7,7' means both the server and the client set the window
scale option to 7. Now we can explain the nstat output in our test:

In the first nstat output of the client side, the client sent a packet
and the server replied with an ACK. When the kernel handled this ACK,
the fast path was not enabled, so the ACK was counted into
'TcpExtTCPPureAcks'.

In the second nstat output of the client side, the client sent a packet
again and received another ACK from the server. This time, the fast
path was enabled, and the ACK qualified for the fast path, so it was
handled by the fast path and counted into 'TcpExtTCPHPAcks'.

In the first nstat output of the server side, the fast path was not
enabled, so there was no 'TcpExtTCPHPHits'.

In the second nstat output of the server side, the fast path was
enabled, and the packet received from the client qualified for the fast
path, so it was counted into 'TcpExtTCPHPHits'.
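
To check the window scale option without reading the whole ss line, the
wscale field can be extracted from the ss -i output. A minimal Python
sketch, assuming the wscale:<snd>,<rcv> field format shown above and
assuming ss prints the field only when the option was negotiated;
window_scales is a hypothetical helper, and the sample is an
abbreviated copy of the ss output above::

  import re

  WSCALE_RE = re.compile(r"\bwscale:(\d+),(\d+)\b")


  def window_scales(ss_output):
      """Return (snd_wscale, rcv_wscale) per connection found in ss -i output."""
      return [(int(snd), int(rcv)) for snd, rcv in WSCALE_RE.findall(ss_output)]


  SAMPLE = "ts sack cubic wscale:7,7 rto:204 rtt:0.98/0.49 mss:1448"
  scales = window_scales(SAMPLE)
  print(scales)  # [(7, 7)]
  if scales:
      print("window scale option is in use")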

TcpExtTCPAbortOnClose
---------------------
On the server side, we run the python script below::

  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # Keep the connection open without ever reading from it.
  while True:
      time.sleep(9999999)

This python script listens on port 9000, but doesn't read anything
from the connection.

On the client side, we send the string "hello" by nc::

  nstatuser@nstat-a:~$ echo "hello" | nc nstat-b 9000

Then, we come back to the server side. The server has received the
"hello" packet, and the TCP layer has acked this packet, but the
application hasn't read it yet. We type Ctrl-C to terminate the server
script. Then we can find TcpExtTCPAbortOnClose increased by 1 on the
server side::

  nstatuser@nstat-b:~$ nstat | grep -i abort
  TcpExtTCPAbortOnClose           1                  0.0

If we run tcpdump on the server side, we can find that the server sent
a RST after we typed Ctrl-C.

TcpExtTCPAbortOnMemory and TcpExtTCPAbortOnTimeout
--------------------------------------------------
Below is an example which lets the orphan socket count be higher than
net.ipv4.tcp_max_orphans. Change tcp_max_orphans to a smaller value on
the client::

  sudo bash -c "echo 10 > /proc/sys/net/ipv4/tcp_max_orphans"

Client code (create 64 connections to the server)::

  nstatuser@nstat-a:~$ cat client_orphan.py
  import socket
  import time

  server = 'nstat-b' # server address
  port = 9000

  count = 64

  connection_list = []

  for i in range(count):
      s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
      s.connect((server, port))
      connection_list.append(s)
      print("connection_count: %d" % len(connection_list))

  # Keep all connections open until the script is interrupted (Ctrl-C).
  while True:
      time.sleep(99999)

Server code (accept 64 connections from the client)::

  nstatuser@nstat-b:~$ cat server_orphan.py
  import socket
  import time

  port = 9000
  count = 64

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(count)
  connection_list = []
  # Accept connections forever and keep references so they stay open.
  while True:
      sock, addr = s.accept()
      connection_list.append((sock, addr))
      print("connection_count: %d" % len(connection_list))

Run the python scripts on the server and the client.

On server::

  python3 server_orphan.py

On client::

  python3 client_orphan.py

Run iptables on the server::

  sudo iptables -A INPUT -i ens3 -p tcp --destination-port 9000 -j DROP

Type Ctrl-C on the client to stop client_orphan.py.

Check TcpExtTCPAbortOnMemory on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnMemory          54                 0.0

Check the orphan socket count on the client::

  nstatuser@nstat-a:~$ ss -s
  Total: 131 (kernel 0)
  TCP:   14 (estab 1, closed 0, orphaned 10, synrecv 0, timewait 0/0), ports 0

  Transport Total     IP        IPv6
  *         0         -         -
  RAW       1         0         1
  UDP       1         1         0
  TCP       14        13        1
  INET      16        14        2
  FRAG      0         0         0

The explanation of the test: after running server_orphan.py and
client_orphan.py, we set up 64 connections between the server and the
client. Once the iptables command runs, the server drops all packets
from the client. When we type Ctrl-C on client_orphan.py, the client
system tries to close these connections, and before they are closed
gracefully, they become orphan sockets. As the iptables rule on the
server blocks packets from the client, the server never receives the
FIN from the client, so all connections on the client are stuck in the
FIN_WAIT_1 state and remain orphan sockets until they time out. We
have echoed 10 to /proc/sys/net/ipv4/tcp_max_orphans, so the client
system only keeps 10 orphan sockets; for all other orphan sockets, the
client system sends an RST and deletes them. We have 64 connections,
so the 'ss -s' command shows that the system has 10 orphan sockets,
and the value of TcpExtTCPAbortOnMemory is 54 (64 - 10).
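The 10 kept and 54 aborted sockets can be reproduced with a toy model. This is a simplified sketch, not kernel code: it assumes every closed connection becomes an orphan, that an exact orphan count is compared with tcp_max_orphans right after it is incremented, and that the new orphan is reset (counted in TcpExtTCPAbortOnMemory) when the count exceeds the limit. The function name is hypothetical.

```python
def close_connections(n_connections, max_orphans):
    """Toy model: close n connections whose FINs are never acked.

    Assumes an exact orphan count is checked after each close.
    Returns (orphans_kept, aborted_on_memory).
    """
    orphans = 0
    aborted = 0
    for _ in range(n_connections):
        orphans += 1               # socket becomes an orphan in FIN_WAIT_1
        if orphans > max_orphans:  # too many orphans: send RST, destroy socket
            orphans -= 1
            aborted += 1           # models TcpExtTCPAbortOnMemory += 1
    return orphans, aborted


print(close_connections(64, 10))  # (10, 54)
```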

An additional explanation about the orphan socket count: you can find
the exact orphan socket count with the 'ss -s' command, but when the
kernel decides whether to increase TcpExtTCPAbortOnMemory and send an
RST, it doesn't always check the exact orphan socket count. To improve
performance, the kernel checks an approximate count first; only if the
approximate count is more than tcp_max_orphans does it check the exact
count. So if the approximate count is less than tcp_max_orphans but the
exact count is more than tcp_max_orphans, you would find that
TcpExtTCPAbortOnMemory is not increased at all. If tcp_max_orphans is
large enough, this won't occur, but if you decrease tcp_max_orphans to
a small value as in our test, you might hit this issue. That is why the
client in our test sets up 64 connections although tcp_max_orphans is
10. If the client only set up 11 connections, we would not see any
change in TcpExtTCPAbortOnMemory.
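The two-step check can be illustrated with a simplified model of a per-CPU counter, the kind of structure older (4.x) kernels use for the orphan count: each CPU accumulates a local delta and folds it into a shared total only when the delta reaches a batch size, so reading the shared total is cheap but can lag behind the exact sum. The class, function names, CPU count and batch size below are assumptions for illustration, not kernel identifiers or values.

```python
class PerCpuCounter:
    """Simplified per-CPU counter: cheap approximate reads, costly exact sums."""

    def __init__(self, ncpus, batch):
        self.total = 0            # shared value, readable without summing
        self.local = [0] * ncpus  # per-CPU deltas not yet folded in
        self.batch = batch

    def add(self, cpu, delta):
        self.local[cpu] += delta
        if abs(self.local[cpu]) >= self.batch:  # fold into the shared total
            self.total += self.local[cpu]
            self.local[cpu] = 0

    def read_approx(self):
        return max(self.total, 0)

    def sum_exact(self):
        return max(self.total + sum(self.local), 0)


def too_many_orphans(counter, max_orphans):
    # Step 1: cheap approximate check; most calls stop here.
    if counter.read_approx() > max_orphans:
        # Step 2: confirm with the exact (more expensive) sum.
        return counter.sum_exact() > max_orphans
    return False


# 11 orphans on CPU 0 with a batch of 32: nothing is folded in yet,
# so the approximate read is 0 and no abort is triggered.
orphans = PerCpuCounter(ncpus=4, batch=32)
for _ in range(11):
    orphans.add(0, 1)
print(orphans.read_approx(), orphans.sum_exact(), too_many_orphans(orphans, 10))

# 29 more: the CPU 0 delta reaches 32 and is folded in, the approximate
# read now exceeds 10, and the exact sum confirms it.
for _ in range(29):
    orphans.add(0, 1)
print(orphans.read_approx(), orphans.sum_exact(), too_many_orphans(orphans, 10))
```

The first line printed is `0 11 False`, the second `32 40 True`: the same limit of 10 is ignored while the shared total lags, which is why a larger number of connections is needed to see TcpExtTCPAbortOnMemory move.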

Continuing the previous test, we wait for several minutes. Because the
iptables rule on the server blocks the traffic, the server never
receives the FIN, and all the client's orphan sockets eventually time
out in the FIN_WAIT_1 state. After a few minutes, we can find 10
timeouts on the client::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnTimeout         10                 0.0

TcpExtTCPAbortOnLinger
----------------------
The server side code::

  nstatuser@nstat-b:~$ cat server_linger.py
  import socket
  import time

  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  while True:
      time.sleep(9999999)

The client side code::

  nstatuser@nstat-a:~$ cat client_linger.py
  import socket
  import struct

  server = 'nstat-b'  # server address
  port = 9000

  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  # struct linger {l_onoff=1, l_linger=10}: close() lingers up to 10 seconds
  s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack('ii', 1, 10))
  # negative TCP_LINGER2: reset the socket instead of waiting in FIN_WAIT_2
  s.setsockopt(socket.SOL_TCP, socket.TCP_LINGER2, struct.pack('i', -1))
  s.connect((server, port))
  s.close()

Run server_linger.py on the server::

  nstatuser@nstat-b:~$ python3 server_linger.py

Run client_linger.py on the client::

  nstatuser@nstat-a:~$ python3 client_linger.py

After running client_linger.py, check the output of nstat::

  nstatuser@nstat-a:~$ nstat | grep -i abort
  TcpExtTCPAbortOnLinger          1                  0.0
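The setsockopt() calls in client_linger.py pass raw C structures. SO_LINGER takes a `struct linger { int l_onoff; int l_linger; }`, which is why the script packs two native ints with `'ii'`, while TCP_LINGER2 takes a single int. The sketch below packs the same SO_LINGER value and reads it back with getsockopt() to show the layout; it opens a socket but never connects it. The helper name is hypothetical, and the 8-byte size assumes a 4-byte C int.

```python
import socket
import struct

LINGER_FMT = "ii"  # struct linger { int l_onoff; int l_linger; }


def linger_roundtrip(onoff, seconds):
    """Set SO_LINGER on an unconnected TCP socket and read it back."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                     struct.pack(LINGER_FMT, onoff, seconds))
        raw = s.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                           struct.calcsize(LINGER_FMT))
        return struct.unpack(LINGER_FMT, raw)


print(struct.calcsize(LINGER_FMT))  # 8: two 4-byte ints
print(linger_roundtrip(1, 10))
```

On Linux the second line prints `(1, 10)`: lingering is enabled with a 10 second timeout.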

TcpExtTCPRcvCoalesce
--------------------
On the server, we run a program which listens on TCP port 9000, but
doesn't read any data::

  import socket
  import time
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.bind(('0.0.0.0', port))
  s.listen(1)
  sock, addr = s.accept()
  # never call sock.recv(), so received data stays in the receive queue
  while True:
      time.sleep(9999999)

Save the above code as server_coalesce.py, and run::

  python3 server_coalesce.py

On the client, save the code below as client_coalesce.py::

  import socket
  server = 'nstat-b'
  port = 9000
  s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
  s.connect((server, port))

Run::

  nstatuser@nstat-a:~$ python3 -i client_coalesce.py

We use '-i' to enter interactive mode, then send a packet::

  >>> s.send(b'foo')
  3

Send a packet again::

  >>> s.send(b'bar')
  3

On the server, run nstat::

  ubuntu@nstat-b:~$ nstat
  #kernel
  IpInReceives                    2                  0.0
  IpInDelivers                    2                  0.0
  IpOutRequests                   2                  0.0
  TcpInSegs                       2                  0.0
  TcpOutSegs                      2                  0.0
  TcpExtTCPRcvCoalesce            1                  0.0
  IpExtInOctets                   110                0.0
  IpExtOutOctets                  104                0.0
  IpExtInNoECTPkts                2                  0.0

The client sent two packets, and the server didn't read any data. When
the second packet arrived at the server, the first packet was still in
the receive queue, so the TCP layer merged the two packets, and we can
see that TcpExtTCPRcvCoalesce increased by 1.
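The octet counters in this output can be cross-checked by hand, since IpExtInOctets and IpExtOutOctets count the IP header plus the IP payload. The sketch assumes 20-byte IPv4 headers without options and a 32-byte TCP header (20 fixed bytes plus 12 bytes of timestamp option, which Linux enables by default); each incoming packet carries 3 payload bytes and each of the server's two ACKs carries none. The constant and function names are hypothetical.

```python
IPV4_HEADER = 20        # bytes, assuming no IP options
TCP_HEADER = 20         # fixed part of the TCP header
TCP_TIMESTAMP_OPT = 12  # NOP, NOP and the 10-byte timestamp option (assumed on)


def ip_octets(packets, payload_bytes):
    """IP-layer bytes (IP header + IP payload) for `packets` identical segments."""
    per_packet = IPV4_HEADER + TCP_HEADER + TCP_TIMESTAMP_OPT + payload_bytes
    return packets * per_packet


in_octets = ip_octets(2, 3)   # b'foo' and b'bar' sent by the client
out_octets = ip_octets(2, 0)  # two pure ACKs sent by the server
print(in_octets, out_octets)  # 110 104
```

Both results match the IpExtInOctets and IpExtOutOctets values above, which supports the timestamp-option assumption.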

TcpExtListenOverflows and TcpExtListenDrops
-------------------------------------------
On the server, run the nc command to listen on port 9000::

  nstatuser@nstat-b:~$ nc -lkv 0.0.0.0 9000
  Listening on [0.0.0.0] (family 0, port 9000)

On the client, run 3 nc commands in different terminals::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000
  Connection to nstat-b 9000 port [tcp/*] succeeded!

The nc command only accepts 1 connection, and the accept queue length
is 1. In the current Linux implementation, setting the queue length to
n means the actual queue length is n+1. Now we create 3 connections: 1
is accepted by nc and 2 are in the accept queue, so the accept queue is
full.
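The n+1 behavior comes from the comparison the kernel uses to decide whether the accept queue is full: sk_acceptq_is_full() treats the queue as full only when the number of queued connections is greater than the backlog, not when it is equal. A minimal Python model of that comparison, assuming a backlog of 1 as used by nc and treating the first connection as taken off the queue by nc; the function names are hypothetical.

```python
def acceptq_is_full(queued, backlog):
    # Full only when strictly greater than the backlog, so a backlog of n
    # lets n + 1 established connections wait for accept().
    return queued > backlog


def simulate(connections, backlog, accepted_by_app):
    """Model which connection attempts are dropped at SYN time."""
    queued = 0
    results = []
    for i in range(connections):
        if i < accepted_by_app:
            results.append("accepted")  # the application took it off the queue
            continue
        if acceptq_is_full(queued, backlog):
            results.append("SYN dropped")
        else:
            queued += 1
            results.append("queued")
    return results


print(simulate(connections=4, backlog=1, accepted_by_app=1))
# ['accepted', 'queued', 'queued', 'SYN dropped']
```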

Before running the 4th nc, we clean the nstat history on the server::

  nstatuser@nstat-b:~$ nstat -n

Run the 4th nc on the client::

  nstatuser@nstat-a:~$ nc -v nstat-b 9000

If the nc server is running on kernel 4.10 or a newer version, you
won't see the "Connection to ... succeeded!" string, because the kernel
drops the SYN if the accept queue is full. If the nc server is running
on an older kernel, you would see that the connection succeeded,
because the kernel would complete the 3-way handshake and keep the
socket in the half-open queue. I did the test on kernel 4.15. Below is
the nstat output on the server::

  nstatuser@nstat-b:~$ nstat
  #kernel
  IpInReceives                    4                  0.0
  IpInDelivers                    4                  0.0
  TcpInSegs                       4                  0.0
  TcpExtListenOverflows           4                  0.0
  TcpExtListenDrops               4                  0.0
  IpExtInOctets                   240                0.0
  IpExtInNoECTPkts                4                  0.0

Both TcpExtListenOverflows and TcpExtListenDrops were 4. If the time
between the 4th nc and the nstat was longer, the values of
TcpExtListenOverflows and TcpExtListenDrops would be larger, because
the SYN of the 4th nc was dropped and the client kept retrying.

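The counters grow with the client's SYN retransmissions. As a rough sketch, assuming Linux's usual initial retransmission timeout of 1 second, doubled after each retry, and the default net.ipv4.tcp_syn_retries of 6, the client sends SYNs at 0, 1, 3, 7, 15, 31 and 63 seconds. Under these assumptions, four dropped SYNs fit nstat being run roughly 7 to 15 seconds after the 4th nc was started. Each of those SYNs is 60 bytes at the IP layer (240 / 4), consistent with a 20-byte IP header plus a 40-byte TCP header carrying the usual SYN options. The function names are hypothetical.

```python
def syn_send_times(initial_rto=1.0, syn_retries=6):
    """Times (seconds after connect()) at which SYNs are sent.

    Assumes exponential backoff: the timeout doubles after every retry.
    """
    times = [0.0]
    rto = initial_rto
    for _ in range(syn_retries):
        times.append(times[-1] + rto)
        rto *= 2
    return times


def syns_sent_by(elapsed, **kwargs):
    """Number of SYNs the client has sent `elapsed` seconds after connect()."""
    return sum(1 for t in syn_send_times(**kwargs) if t <= elapsed)


print(syn_send_times())                           # [0.0, 1.0, 3.0, 7.0, 15.0, 31.0, 63.0]
print([syns_sent_by(t) for t in (0.5, 5, 10, 20)])  # [1, 3, 4, 5]
```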