.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd May 13, 2021
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CCV
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/tcp.h
.In netinet/cc/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CCV "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	size_t	(*cc_data_sz)(void);
	int	(*cb_init) (struct cc_var *ccv, void *ptr);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
	int	(*ctl_output)(struct cc_var *, struct sockopt *, void *);
	void	(*rttsample)(struct cc_var *, uint32_t, uint32_t, uint32_t);
	void	(*newround)(struct cc_var *, uint32_t);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cc_data_sz
function is called by the socket option code to get the size of
data that the
.Va cb_init
function needs.
The socket option code then preallocates the module's memory so that the
.Va cb_init
function will not fail (the socket option code uses M_WAITOK with
no locks held to do this).
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
Note that the
.Fa ptr
argument passed to the function should be checked to see if it is non-NULL;
if so, it is preallocated memory that the
.Va cb_init
function must use instead of calling malloc itself.
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data we have already received
an ACK for.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission timeout timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (N=3 currently as per
RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Va ctl_output
function is called when
.Xr getsockopt 2
or
.Xr setsockopt 2
is called on a
.Xr tcp 4
socket with the
.Va struct sockopt
pointer forwarded unmodified from the TCP control, and a
.Va void *
pointer to an algorithm-specific argument.
.Pp
The
.Va rttsample
function is called to pass round trip time information to the
congestion controller.
The additional arguments to the function include the microsecond RTT
that is being noted, the number of times that the data being
acknowledged was retransmitted, and the flightsize at send.
For transports that do not track flightsize at send, this variable
will be the current cwnd at the time of the call.
.Pp
The
.Va newround
function is called each time a new round trip time begins.
The monotonically increasing round number is also passed to the
congestion controller.
This can be used for various purposes by the congestion controller
(e.g., HyStart++).
.Pp
Note that currently not all TCP stacks call the
.Va rttsample
and
.Va newround
functions, so whether a module can rely on them
depends on which TCP stack is in use.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field, and optionally any of the function
pointers.
Note that if a module defines the
.Va cb_init
function it also must define a
.Va cc_data_sz
function.
This is because when switching from one congestion control
module to another the socket option code will preallocate memory for the
.Va cb_init
function.
If no memory is allocated by the module's
.Va cb_init
then the
.Va cc_data_sz
function should return 0.
.Pp
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers (with the exception of
the cb_init <-> cc_data_sz dependency noted above).
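.Pp
As a rough illustration of the
.Va cb_init
and
.Va cc_data_sz
pairing, the following userland sketch mimics the contract in plain C.
It is not kernel code: the
.Vt struct ex_cc_var
and
.Vt struct ex_state
types and the ex_*() functions are hypothetical stand-ins, and
.Xr calloc 3
is assumed in place of the kernel allocator.
.Bd -literal -offset indent
```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct cc_var; only cc_data is modelled. */
struct ex_cc_var {
	void *cc_data;
};

/* Hypothetical private per-connection state of the example algorithm. */
struct ex_state {
	unsigned int acks_seen;
};

/* Plays the role of cc_data_sz: report how much memory cb_init needs. */
static size_t
ex_data_sz(void)
{
	return (sizeof(struct ex_state));
}

/*
 * Plays the role of cb_init: use the preallocated buffer when ptr is
 * non-NULL, otherwise allocate.  A non-zero return reports failure.
 */
static int
ex_cb_init(struct ex_cc_var *ccv, void *ptr)
{
	struct ex_state *st;

	st = (ptr != NULL) ? ptr : calloc(1, sizeof(*st));
	if (st == NULL)
		return (ENOMEM);
	st->acks_seen = 0;
	ccv->cc_data = st;
	return (0);
}

/* Plays the role of cb_destroy: release what cb_init attached. */
static void
ex_cb_destroy(struct ex_cc_var *ccv)
{
	free(ccv->cc_data);
	ccv->cc_data = NULL;
}
```
.Ed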
Using the C99 designated initialiser feature to set fields is encouraged.
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
	uint16_t	nsegs;
	uint8_t		labc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, and so the
.Fn CCV
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms requiring additional per-connection state to
attach a dynamic memory pointer to.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has counted that a window's
worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should use the absence of this flag to avoid accumulating
a large difference between the congestion window and send window.
.Pp
The
.Va nsegs
variable is used to pass in how much compression was done by the local
LRO system.
For example, if LRO compressed three in-order acknowledgements into
one acknowledgement, the variable would be set to three.
.Pp
The
.Va labc
variable is used in conjunction with the CCF_USE_LOCAL_ABC flag
to override the labc value that the congestion controller will use
for this particular acknowledgement.
.Sh SEE ALSO
.Xr cc_cdg 4 ,
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_dctcp 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
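.Sh EXAMPLES
The following userland sketch illustrates how an
.Va ack_received
hook might consume the
.Fa type
argument together with the
.Va bytes_this_ack
and
.Va flags
fields.
It is not kernel code: struct ex_ack_var, the EX_* and EXF_* constants (whose
numeric values are made up for this sketch) and ex_ack_received() are
hypothetical stand-ins for struct cc_var, CC_ACK, CC_DUPACK,
CCF_ABC_SENTAWND, CCF_CWND_LIMITED and a module's real hook.
The window arithmetic is a placeholder and does not reproduce any algorithm
shipped with the stack.
.Bd -literal -offset indent
```c
#include <stdint.h>

/* Hypothetical stand-ins; the real values live in the kernel headers. */
#define	EX_ACK			0x0001	/* plays the role of CC_ACK */
#define	EX_DUPACK		0x0002	/* plays the role of CC_DUPACK */
#define	EXF_ABC_SENTAWND	0x0001	/* plays the role of CCF_ABC_SENTAWND */
#define	EXF_CWND_LIMITED	0x0002	/* plays the role of CCF_CWND_LIMITED */

/* Reduced mock of struct cc_var plus a window owned by the sketch. */
struct ex_ack_var {
	int		bytes_this_ack;
	uint32_t	flags;
	uint32_t	cwnd;		/* placeholder congestion window */
};

/* Mock ack_received hook: grow the window only on new, limited ACKs. */
static void
ex_ack_received(struct ex_ack_var *ccv, uint16_t type)
{
	/* Duplicate ACKs acknowledge no new data; nothing to add. */
	if (type != EX_ACK)
		return;

	/* Not cwnd-limited: do not let cwnd run ahead of actual use. */
	if ((ccv->flags & EXF_CWND_LIMITED) == 0)
		return;

	if (ccv->bytes_this_ack > 0)
		ccv->cwnd += (uint32_t)ccv->bytes_this_ack;

	/* The module clears the flag once it has handled the signal. */
	ccv->flags &= ~(uint32_t)EXF_ABC_SENTAWND;
}
```
.Ed
.Pp
A real module would install such a function through the
.Va ack_received
member of its
.Vt struct cc_algo .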
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .