.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 13, 2021
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CCV
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/tcp.h
.In netinet/cc/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CCV "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	size_t	(*cc_data_sz)(void);
	int	(*cb_init) (struct cc_var *ccv, void *ptr);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
	int	(*ctl_output)(struct cc_var *, struct sockopt *, void *);
	void	(*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
	void	(*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cc_data_sz
function is called by the socket option code to get the size of
data that the
.Va cb_init
function needs.
The socket option code then preallocates the module's memory so that the
.Va cb_init
function will not fail (the socket option code uses M_WAITOK with
no locks held to do this).
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
Note that the
.Fa ptr
argument passed to the function should be checked:
if it is non-NULL, it points to preallocated memory that the
.Va cb_init
function must use instead of calling
.Xr malloc 9
itself.
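.Pp
The pointer handling described above can be sketched in userland C.
This is a minimal illustration under stated assumptions, not kernel code:
the mock structure is reduced to a cc_data field, the example_state type and
the example_data_sz and example_cb_init names are hypothetical, and
malloc(3) stands in for the kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Reduced stand-in for the kernel's struct cc_var (assumption). */
struct mock_cc_var {
	void	*cc_data;
};

/* Hypothetical private per-connection state. */
struct example_state {
	uint32_t	acked_bytes;
	uint32_t	loss_events;
};

/* cc_data_sz hook: report how much memory cb_init needs. */
static size_t
example_data_sz(void)
{
	return (sizeof(struct example_state));
}

/* cb_init hook: use the preallocated buffer if one was supplied. */
static int
example_cb_init(struct mock_cc_var *ccv, void *ptr)
{
	struct example_state *st;

	assert(ccv != NULL);
	if (ptr != NULL)
		st = ptr;	/* Sized by the caller via cc_data_sz. */
	else if ((st = malloc(sizeof(*st))) == NULL)
		return (ENOMEM);
	memset(st, 0, sizeof(*st));
	ccv->cc_data = st;
	return (0);
}
```

Because the size reported by the cc_data_sz hook and the size used by the
cb_init hook come from the same type, the two cannot drift apart.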
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data we have already received
an ACK for.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission timeout timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (N=3 currently as per
RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Va ctl_output
function is called when
.Xr getsockopt 2
or
.Xr setsockopt 2
is called on a
.Xr tcp 4
socket.
It is passed the
.Vt struct sockopt
pointer, forwarded unmodified from the TCP control code, and a
.Vt void *
pointer to an algorithm-specific argument.
.Pp
The
.Va rttsample
function is called to pass round trip time information to the
congestion controller.
The additional arguments are the RTT being reported, in microseconds,
the number of times the data being acknowledged was retransmitted,
and the flight size at the time the data was sent.
For transports that do not track flight size at send, this argument
will be the current cwnd at the time of the call.
.Pp
The
.Va newround
function is called each time a new round trip time begins.
The monotonically increasing round number is also passed to the
congestion controller.
This can be used for various purposes by the congestion controller (e.g., HyStart++).
.Pp
Note that not all TCP stacks currently call the
.Va rttsample
and
.Va newround
functions, so whether a module can rely on them depends on
which TCP stack is in use.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
Note that if a module defines the
.Va cb_init
function it must also define a
.Va cc_data_sz
function.
This is because when switching from one congestion control
module to another the socket option code will preallocate memory for the
.Va cb_init
function.
If no memory is allocated by the module's
.Va cb_init
then the
.Va cc_data_sz
function should return 0.
.Pp
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers (with the exception of
the
.Va cb_init
and
.Va cc_data_sz
dependency noted above).
Using the C99 designated initialiser feature to set fields is encouraged.
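.Pp
The combination of designated initialisers and NULL hooks can be sketched in
userland C as follows.
The mock_cc_algo structure is a reduced, hypothetical stand-in for
struct cc_algo, MOCK_NAME_MAX stands in for TCP_CA_NAME_MAX, and
mock_call_hooks only imitates how a caller might skip unset hooks.

```c
#include <assert.h>
#include <stddef.h>

#define	MOCK_NAME_MAX	16	/* stand-in for TCP_CA_NAME_MAX (assumption) */

/* Reduced stand-in for struct cc_var (assumption). */
struct mock_cc_var {
	int	events;
};

/* Reduced stand-in for struct cc_algo (assumption). */
struct mock_cc_algo {
	char	name[MOCK_NAME_MAX];
	void	(*conn_init)(struct mock_cc_var *ccv);
	void	(*after_idle)(struct mock_cc_var *ccv);
};

static void
example_conn_init(struct mock_cc_var *ccv)
{
	ccv->events = 1;
}

/* Only the hooks this algorithm needs are named; the rest stay NULL. */
static struct mock_cc_algo example_algo = {
	.name = "example",
	.conn_init = example_conn_init,
};

/* Call each hook only if it is set; return how many were called. */
static int
mock_call_hooks(struct mock_cc_algo *algo, struct mock_cc_var *ccv)
{
	int called = 0;

	assert(algo != NULL && ccv != NULL);
	if (algo->conn_init != NULL) {
		algo->conn_init(ccv);
		called++;
	}
	if (algo->after_idle != NULL) {
		algo->after_idle(ccv);
		called++;
	}
	return (called);
}
```

Members omitted from a designated initialiser are zero-initialised, which is
what leaves the unused hooks NULL without listing them.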
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
	uint16_t	nsegs;
	uint8_t		labc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, and so the
.Fn CCV
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccvc
fields, the fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms requiring additional per-connection state to
attach a dynamic memory pointer to.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has counted a window's worth
of bytes sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should not grow the congestion window while this flag is clear,
to avoid accumulating a large difference between the congestion window and
send window.
.Pp
The
.Va nsegs
variable is used to pass in how much compression was done by the local
LRO system.
For example, if LRO coalesced three in-order acknowledgements into
one acknowledgement, the variable would be set to three.
.Pp
The
.Va labc
variable is used in conjunction with the CCF_USE_LOCAL_ABC flag
to override the ABC limit (the L variable of RFC3465) that the congestion
controller will use for this particular acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_dctcp 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .