.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd May 13, 2021
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CCV
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/tcp.h
.In netinet/cc/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CCV "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	size_t	(*cc_data_sz)(void);
	int	(*cb_init) (struct cc_var *ccv, void *ptr);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
	int	(*ctl_output)(struct cc_var *, struct sockopt *, void *);
	void	(*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
	void	(*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cc_data_sz
function is called by the socket option code to get the size of
data that the
.Va cb_init
function needs.
The socket option code then preallocates the module's memory so that the
.Va cb_init
function will not fail (the socket option code uses M_WAITOK with
no locks held to do this).
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
The
.Fa ptr
argument passed to the function must be checked.
If it is non-NULL, it points to preallocated memory that
.Va cb_init
must use instead of calling
.Xr malloc 9
itself.
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data we have already received
an ACK for.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission timeout timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (N=3 currently as per
RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Va ctl_output
function is called when
.Xr getsockopt 2
or
.Xr setsockopt 2
is called on a
.Xr tcp 4
socket with the
.Va struct sockopt
pointer forwarded unmodified from the TCP control, and a
.Va void *
pointer to an algorithm-specific argument.
.Pp
The
.Va rttsample
function is called to pass round trip time information to the
congestion controller.
The additional arguments to the function are the RTT being reported,
in microseconds, the number of times that the data being
acknowledged was retransmitted, and the flightsize at send.
For transports that do not track flightsize at send, this variable
will be the current cwnd at the time of the call.
.Pp
The
.Va newround
function is called each time a new round trip time begins.
The monotonically increasing round number is also passed to the
congestion controller.
This can be used for various purposes by the congestion controller (e.g., HyStart++).
.Pp
Note that currently not all TCP stacks call the
.Va rttsample
and
.Va newround
functions, so whether a module can depend on them
depends on which TCP stack is in use.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
Note that if a module defines the
.Va cb_init
function it also must define a
.Va cc_data_sz
function.
This is because when switching from one congestion control
module to another the socket option code will preallocate memory for the
.Va cb_init
function.
If no memory is allocated by the module's
.Va cb_init
then the
.Va cc_data_sz
function should return 0.
.Pp
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers (with the exception of
the cb_init <-> cc_data_sz dependency noted above).
Using the C99 designated initialiser feature to set fields is encouraged.
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
	uint16_t	nsegs;
	uint8_t		labc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion-aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable,
and so the
.Fn CCV
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms requiring additional per-connection state to
attach a dynamic memory pointer to.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has counted a window's worth
of bytes sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
When this flag is not set, algorithms should avoid increasing the congestion
window, so that a large difference between the congestion window and the send
window does not accumulate.
.Pp
The
.Va nsegs
variable is used to pass in how much compression was done by the local
LRO system.
For example, if LRO compressed three in-order acknowledgements into
one acknowledgement, the variable would be set to three.
.Pp
The
.Va labc
variable is used in conjunction with the CCF_USE_LOCAL_ABC flag
to override the labc value the congestion controller will use
for this particular acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_dctcp 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
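.Sh EXAMPLES
The following is a minimal userland sketch, not kernel code, of three points
made in the DESCRIPTION: setting hooks with C99 designated initialisers,
honouring the preallocated
.Fa ptr
argument in
.Va cb_init ,
and skipping hooks that are NULL.
All identifiers beginning with sketch_, SKETCH_ or example_ are hypothetical
stand-ins invented for illustration.
They are reduced copies of the structures described above and are not part of
the
.Nm
framework.
.Bd -literal -offset indent
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define	SKETCH_NAME_MAX	16	/* stand-in for TCP_CA_NAME_MAX */
#define	SKETCH_ACK	0x0001	/* stand-in for CC_ACK; value is arbitrary */

/* Reduced, hypothetical stand-in for struct cc_var. */
struct sketch_var {
	void	*cc_data;
	int	bytes_this_ack;
};

/* Reduced, hypothetical stand-in for struct cc_algo; hooks may be NULL. */
struct sketch_algo {
	char	name[SKETCH_NAME_MAX];
	size_t	(*cc_data_sz)(void);
	int	(*cb_init)(struct sketch_var *ccv, void *ptr);
	void	(*cb_destroy)(struct sketch_var *ccv);
	void	(*ack_received)(struct sketch_var *ccv, uint16_t type);
	void	(*after_idle)(struct sketch_var *ccv);
};

/* Private per-connection state of the example algorithm. */
struct example_state {
	uint64_t	acked_bytes;
};

static size_t
example_data_sz(void)
{
	return (sizeof(struct example_state));
}

static int
example_cb_init(struct sketch_var *ccv, void *ptr)
{
	struct example_state *st = ptr;

	/* Use the preallocated memory if given, else allocate. */
	if (st == NULL)
		st = malloc(sizeof(*st));
	if (st == NULL)
		return (-1);
	st->acked_bytes = 0;
	ccv->cc_data = st;
	return (0);
}

static void
example_cb_destroy(struct sketch_var *ccv)
{
	free(ccv->cc_data);
	ccv->cc_data = NULL;
}

static void
example_ack_received(struct sketch_var *ccv, uint16_t type)
{
	struct example_state *st = ccv->cc_data;

	if (type == SKETCH_ACK && ccv->bytes_this_ack > 0)
		st->acked_bytes += (uint64_t)ccv->bytes_this_ack;
}

/* Only the hooks that are needed are set; after_idle stays NULL. */
struct sketch_algo example_algo = {
	.name = "example",
	.cc_data_sz = example_data_sz,
	.cb_init = example_cb_init,
	.cb_destroy = example_cb_destroy,
	.ack_received = example_ack_received,
};

/* Caller-side dispatch: a NULL hook is skipped. */
void
sketch_ack_received(struct sketch_algo *algo, struct sketch_var *ccv,
    uint16_t type)
{
	assert(algo != NULL && ccv != NULL);
	if (algo->ack_received != NULL)
		algo->ack_received(ccv, type);
}

void
sketch_after_idle(struct sketch_algo *algo, struct sketch_var *ccv)
{
	assert(algo != NULL && ccv != NULL);
	if (algo->after_idle != NULL)
		algo->after_idle(ccv);
}
```
.Ed
.Pp
A real module would instead fill in a
.Vt struct cc_algo ,
allocate with
.Xr malloc 9 ,
and register itself with
.Fn DECLARE_CC_MODULE .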
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .