.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd May 13, 2021
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CCV
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/tcp.h
.In netinet/cc/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CCV "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and a set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	size_t	(*cc_data_sz)(void);
	int	(*cb_init) (struct cc_var *ccv, void *ptr);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
	int	(*ctl_output)(struct cc_var *, struct sockopt *, void *);
	void	(*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
	void	(*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters, leaving room for the terminating NUL byte (the
TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
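.Pp
The length rule can be illustrated with a small userland sketch.
It is not kernel code: TCP_CA_NAME_MAX is assumed to be 16 here (the
authoritative value is in netinet/tcp.h), and the helper cc_name_valid() is
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Assumed value for this sketch; the kernel takes it from <netinet/tcp.h>. */
#define	TCP_CA_NAME_MAX	16

/*
 * Hypothetical helper: a name is usable if it is non-empty and fits in
 * char name[TCP_CA_NAME_MAX] together with its terminating NUL byte,
 * i.e. it has at most TCP_CA_NAME_MAX - 1 characters.
 */
static bool
cc_name_valid(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strlen(name);
	return (len > 0 && len < TCP_CA_NAME_MAX);
}
```

Under these assumptions the name newreno fits, while any 16-character name is
rejected because it leaves no room for the NUL byte.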
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
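.Pp
The load and unload ordering described above can be sketched in userland C.
Everything here is hypothetical: struct cc_algo_sketch keeps only the two
module hooks, and sketch_load() and sketch_unload() stand in for the
framework's real module event handling, which this page does not describe.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, trimmed stand-in for struct cc_algo: module hooks only. */
struct cc_algo_sketch {
	const char	*name;
	int		(*mod_init)(void);
	int		(*mod_destroy)(void);
};

/* Hypothetical flag standing in for "present in the algorithm list". */
static int sketch_registered;

/* Load path: run mod_init first; a non-zero result aborts the load. */
static int
sketch_load(const struct cc_algo_sketch *algo)
{
	int error;

	assert(algo != NULL);
	if (algo->mod_init != NULL) {
		error = algo->mod_init();
		if (error != 0)
			return (error);	/* nothing was registered */
	}
	sketch_registered = 1;		/* registration completes last */
	return (0);
}

/* Unload path: call mod_destroy if present; its return value is ignored. */
static void
sketch_unload(const struct cc_algo_sketch *algo)
{
	assert(algo != NULL);
	if (algo->mod_destroy != NULL)
		(void)algo->mod_destroy();
	sketch_registered = 0;
}

/* Example hooks used to exercise both paths. */
static int init_ok(void) { return (0); }
static int init_fails(void) { return (ENOMEM); }
static int destroy_busy(void) { return (EBUSY); }
```

A module whose mod_init returns ENOMEM never becomes available, whereas the
EBUSY returned by destroy_busy() does not prevent the unload.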
.Pp
The
.Va cc_data_sz
function is called by the socket option code to get the size of
the data that the
.Va cb_init
function needs.
The socket option code then preallocates the module's memory so that the
.Va cb_init
function will not fail (the socket option code uses
.Dv M_WAITOK
with no locks held to do this).
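.Pp
A userland sketch of the preallocation sequence follows.
The names are hypothetical, malloc(3) stands in for malloc(9) with M_WAITOK,
and locking is only indicated by a comment; the kernel's actual socket option
code may order these steps differently.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed stand-ins for struct cc_var and struct cc_algo. */
struct cc_var_sketch {
	void	*cc_data;
};

struct cc_algo_sketch {
	size_t	(*cc_data_sz)(void);
	int	(*cb_init)(struct cc_var_sketch *ccv, void *ptr);
};

/*
 * Hypothetical "switch algorithm" step: ask the new module how much memory
 * cb_init needs, allocate it before any locks would be taken, then pass it
 * to cb_init so that cb_init has no allocation of its own to fail.
 */
static int
sketch_switch_algo(struct cc_var_sketch *ccv, const struct cc_algo_sketch *algo)
{
	void *mem = NULL;
	size_t sz = 0;
	int error;

	assert(ccv != NULL && algo != NULL);
	if (algo->cb_init == NULL)
		return (0);		/* no per-connection state to set up */
	if (algo->cc_data_sz != NULL)
		sz = algo->cc_data_sz();
	if (sz > 0 && (mem = malloc(sz)) == NULL)
		return (ENOMEM);	/* userland malloc() can fail; M_WAITOK cannot */
	/* In the kernel, the connection would be locked from here on. */
	error = algo->cb_init(ccv, mem);
	if (error != 0)
		free(mem);
	return (error);
}

/* Example module: a per-connection ACK counter. */
struct counter_state {
	unsigned long	acks;
};

static size_t
counter_data_sz(void)
{
	return (sizeof(struct counter_state));
}

static int
counter_cb_init(struct cc_var_sketch *ccv, void *ptr)
{
	struct counter_state *st = ptr;

	/* Use preallocated memory when given; allocate only as a fallback. */
	if (st == NULL && (st = malloc(sizeof(*st))) == NULL)
		return (ENOMEM);
	st->acks = 0;
	ccv->cc_data = st;
	return (0);
}
```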
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
Note that the
.Fa ptr
argument passed to the function should be checked: if it is non-NULL, it points
to preallocated memory that the
.Va cb_init
function must use instead of calling
.Xr malloc 9
itself.
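.Pp
On the module side, a cb_init and cb_destroy pair that honours the
preallocation rule might look like the following userland sketch.
The state structure and function names are hypothetical, malloc(3) and free(3)
stand in for their kernel counterparts, and the sketch assumes that ownership
of preallocated memory passes to the module, so cb_destroy frees it in either
case.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical, trimmed stand-in for struct cc_var. */
struct cc_var_sketch {
	void	*cc_data;
};

/* Hypothetical per-connection state for an example algorithm. */
struct example_state {
	unsigned int	rounds;
	unsigned int	min_rtt_us;
};

static size_t
example_data_sz(void)
{
	return (sizeof(struct example_state));
}

/* cb_init: prefer the caller's preallocated memory, else allocate. */
static int
example_cb_init(struct cc_var_sketch *ccv, void *ptr)
{
	struct example_state *st;

	assert(ccv != NULL);
	if (ptr != NULL)
		st = ptr;
	else if ((st = malloc(sizeof(*st))) == NULL)
		return (ENOMEM);	/* non-zero aborts connection setup */
	st->rounds = 0;
	st->min_rtt_us = 0;
	ccv->cc_data = st;
	return (0);
}

/* cb_destroy: release whatever cb_init attached to cc_data. */
static void
example_cb_destroy(struct cc_var_sketch *ccv)
{
	assert(ccv != NULL);
	free(ccv->cc_data);
	ccv->cc_data = NULL;
}
```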
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data for which an ACK has
already been received.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion
notification (RFC 3168).
CC_RTO is reported when the retransmission timeout (RTO) timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (currently N=3, as per
RFC 5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Va ctl_output
function is called when
.Xr getsockopt 2
or
.Xr setsockopt 2
is called on a
.Xr tcp 4
socket, with the
.Vt struct sockopt
pointer forwarded unmodified by the TCP socket option code, and a
.Vt void *
pointer to an algorithm-specific argument.
.Pp
The
.Va rttsample
function is called to pass round trip time information to the
congestion controller.
The additional arguments to the function are the RTT being noted, in
microseconds; the number of times the data being acknowledged was
retransmitted; and the flight size at the time the data was sent.
For transports that do not track the flight size at send time, this argument
is set to the current congestion window (cwnd) at the time of the call.
.Pp
The
.Va newround
function is called each time a new round trip begins.
The monotonically increasing round number is also passed to the
congestion controller.
This can be used for various purposes by the congestion controller (e.g.,
HyStart++).
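.Pp
The following userland sketch shows one way to use the round number, loosely
modelled on HyStart++ (RFC 9406), which compares the minimum RTT of the
current round with that of the previous round.
All names are hypothetical, and the RTT increase threshold is a plain
parameter rather than the formula given in the RFC.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-round RTT bookkeeping; UINT32_MAX means "no sample". */
struct round_state {
	uint32_t	round;			/* last round number seen */
	uint32_t	last_round_min_us;	/* min RTT of previous round */
	uint32_t	curr_round_min_us;	/* min RTT of current round */
};

static void
sketch_init(struct round_state *st)
{
	st->round = 0;
	st->last_round_min_us = UINT32_MAX;
	st->curr_round_min_us = UINT32_MAX;
}

/* newround-shaped: roll the per-round minimum over on each new round. */
static void
sketch_newround(struct round_state *st, uint32_t round_cnt)
{
	st->round = round_cnt;
	st->last_round_min_us = st->curr_round_min_us;
	st->curr_round_min_us = UINT32_MAX;
}

/* rttsample-shaped: track the minimum within the current round. */
static void
sketch_rtt(struct round_state *st, uint32_t usec_rtt)
{
	if (usec_rtt < st->curr_round_min_us)
		st->curr_round_min_us = usec_rtt;
}

/* Has the per-round minimum RTT grown by at least thresh_us? */
static int
sketch_rtt_increased(const struct round_state *st, uint32_t thresh_us)
{
	if (st->last_round_min_us == UINT32_MAX ||
	    st->curr_round_min_us == UINT32_MAX)
		return (0);
	return (st->curr_round_min_us >=
	    (uint64_t)st->last_round_min_us + thresh_us);
}
```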
.Pp
Note that not all TCP stacks currently call the
.Va rttsample
and
.Va newround
functions, so whether a module can rely on them depends on which TCP stack is
in use.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
Note that if a module defines the
.Va cb_init
function it must also define a
.Va cc_data_sz
function.
This is because when switching from one congestion control
module to another the socket option code will preallocate memory for the
.Va cb_init
function.
If the module's
.Va cb_init
does not allocate any memory, then the
.Va cc_data_sz
function should return 0.
.Pp
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers (with the exception of
the
.Va cb_init
and
.Va cc_data_sz
dependency noted above).
Using the C99 designated initialiser feature to set fields is encouraged.
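.Pp
The sketch below shows a designated initialiser that sets only the name and one
hook, a caller that skips NULL hooks, and a registration-time check for the
cb_init and cc_data_sz dependency.
It uses a trimmed, hypothetical copy of the structure, TCP_CA_NAME_MAX is
assumed to be 16, and sketch_validate() is not a real framework function;
a real module would fill in a struct cc_algo and pass it to
DECLARE_CC_MODULE() instead.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define	TCP_CA_NAME_MAX	16	/* assumed value, see <netinet/tcp.h> */

struct cc_var;			/* opaque in this sketch */

/* Trimmed, hypothetical copy of struct cc_algo with a few hooks. */
struct cc_algo_sketch {
	char	name[TCP_CA_NAME_MAX];
	size_t	(*cc_data_sz)(void);
	int	(*cb_init)(struct cc_var *ccv, void *ptr);
	void	(*ack_received)(struct cc_var *ccv, uint16_t type);
};

static int acks_seen;

static void
example_ack_received(struct cc_var *ccv, uint16_t type)
{
	(void)ccv;
	(void)type;
	acks_seen++;
}

static int
example_cb_init(struct cc_var *ccv, void *ptr)
{
	(void)ccv;
	(void)ptr;
	return (0);
}

/* Valid: only name and one hook are set; other members are NULL. */
static struct cc_algo_sketch example_algo = {
	.name = "example",
	.ack_received = example_ack_received,
};

/* Invalid: defines cb_init without cc_data_sz. */
static struct cc_algo_sketch broken_algo = {
	.name = "broken",
	.cb_init = example_cb_init,
};

/* Hypothetical check for the cb_init <-> cc_data_sz rule. */
static int
sketch_validate(const struct cc_algo_sketch *algo)
{
	if (algo->cb_init != NULL && algo->cc_data_sz == NULL)
		return (EINVAL);
	return (0);
}

/* A caller skips NULL hooks instead of calling through them. */
static void
sketch_dispatch_ack(struct cc_algo_sketch *algo, struct cc_var *ccv,
    uint16_t type)
{
	if (algo->ack_received != NULL)
		algo->ack_received(ccv, type);
}
```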
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
	uint16_t	nsegs;
	uint8_t		labc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, so the
.Fn CCV
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
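.Pp
The accessor pattern can be sketched in userland with stand-in structures.
Both structures are hypothetical and trimmed, and the macro below is only
assumed to be shaped like the kernel's CCV(); consult netinet/cc/cc.h for the
actual definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed stand-ins for the kernel structures. */
struct tcpcb_sketch {
	uint32_t	snd_cwnd;
	uint32_t	snd_ssthresh;
	uint32_t	t_maxseg;
};

struct cc_var_sketch {
	void	*cc_data;
	union ccv_container_sketch {	/* the real union also has an SCTP member */
		struct tcpcb_sketch	*tcp;
	} ccvc;
};

/* Reach the transport's control block through the container union. */
#define	CCV(ccv, what)	((ccv)->ccvc.tcp->what)

/* post_recovery-shaped hook: deflate cwnd to ssthresh when recovery ends. */
static void
sketch_post_recovery(struct cc_var_sketch *ccv)
{
	assert(ccv != NULL && ccv->ccvc.tcp != NULL);
	if (CCV(ccv, snd_cwnd) > CCV(ccv, snd_ssthresh))
		CCV(ccv, snd_cwnd) = CCV(ccv, snd_ssthresh);
}
```

Because every access goes through the union, supporting another transport would
mainly mean changing the macro rather than every module.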
.Pp
Apart from the
.Va type
and
.Va ccvc
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms that require additional per-connection
state; they can attach a pointer to dynamically allocated memory to it.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC 3465) has counted that a
window's worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should use the absence of this flag to avoid accumulating
a large difference between the congestion window and send window.
.Pp
The
.Va nsegs
variable indicates how much compression was done by the local
LRO (large receive offload) system.
For example, if LRO merged three in-order acknowledgements into
one acknowledgement, the variable would be set to three.
.Pp
The
.Va labc
variable is used in conjunction with the CCF_USE_LOCAL_ABC flag
to override the appropriate byte counting limit that the congestion controller
would otherwise use for this particular acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_dctcp 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .