.\"
.\" Copyright (c) 2008-2009 Lawrence Stewart <lstewart@FreeBSD.org>
.\" Copyright (c) 2010-2011 The FreeBSD Foundation
.\" All rights reserved.
.\"
.\" Portions of this documentation were written at the Centre for Advanced
.\" Internet Architectures, Swinburne University of Technology, Melbourne,
.\" Australia by David Hayes and Lawrence Stewart under sponsorship from the
.\" FreeBSD Foundation.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR
.\" ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.\" $FreeBSD$
.\"
.Dd September 15, 2011
.Dt MOD_CC 9
.Os
.Sh NAME
.Nm mod_cc ,
.Nm DECLARE_CC_MODULE ,
.Nm CC_VAR
.Nd Modular Congestion Control
.Sh SYNOPSIS
.In netinet/cc.h
.In netinet/cc/cc_module.h
.Fn DECLARE_CC_MODULE "ccname" "ccalgo"
.Fn CC_VAR "ccv" "what"
.Sh DESCRIPTION
The
.Nm
framework allows congestion control algorithms to be implemented as dynamically
loadable kernel modules via the
.Xr kld 4
facility.
Transport protocols can select from the list of available algorithms on a
connection-by-connection basis, or use the system default (see
.Xr mod_cc 4
for more details).
.Pp
.Nm
modules are identified by an
.Xr ascii 7
name and set of hook functions encapsulated in a
.Vt "struct cc_algo" ,
which has the following members:
.Bd -literal -offset indent
struct cc_algo {
	char	name[TCP_CA_NAME_MAX];
	int	(*mod_init) (void);
	int	(*mod_destroy) (void);
	int	(*cb_init) (struct cc_var *ccv);
	void	(*cb_destroy) (struct cc_var *ccv);
	void	(*conn_init) (struct cc_var *ccv);
	void	(*ack_received) (struct cc_var *ccv, uint16_t type);
	void	(*cong_signal) (struct cc_var *ccv, uint32_t type);
	void	(*post_recovery) (struct cc_var *ccv);
	void	(*after_idle) (struct cc_var *ccv);
};
.Ed
.Pp
The
.Va name
field identifies the unique name of the algorithm, and should be no longer than
TCP_CA_NAME_MAX-1 characters in length (the TCP_CA_NAME_MAX define lives in
.In netinet/tcp.h
for compatibility reasons).
.Pp
The
.Va mod_init
function is called when a new module is loaded into the system but before the
registration process is complete.
It should be implemented if a module needs to set up some global state prior to
being available for use by new connections.
Returning a non-zero value from
.Va mod_init
will cause the loading of the module to fail.
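.Pp
The load-time contract described above can be pictured with the following
userland sketch.
It is not the kernel's registration code:
struct demo_cc_algo, demo_register() and the two init functions are
hypothetical stand-ins assumed only for illustration, and ENOMEM is merely one
example of a non-zero return value.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, cut-down stand-in for struct cc_algo (userland only). */
struct demo_cc_algo {
	const char *name;
	int (*mod_init)(void);
};

/* A mod_init hook that succeeds. */
static int
demo_init_ok(void)
{
	return (0);
}

/* A mod_init hook that reports failure with a non-zero value. */
static int
demo_init_fail(void)
{
	return (ENOMEM);
}

/*
 * Hypothetical loader: run mod_init if one is set, and refuse to
 * register the algorithm when it returns non-zero.
 * Returns 0 when the algorithm would become available.
 */
static int
demo_register(const struct demo_cc_algo *algo)
{
	int error;

	assert(algo != NULL);
	if (algo->mod_init != NULL) {
		error = algo->mod_init();
		if (error != 0)
			return (error);	/* the load fails */
	}
	return (0);			/* registration may proceed */
}
```

In this sketch an algorithm without a
.Va mod_init
hook registers unconditionally, mirroring the rule that NULL hooks are skipped.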
.Pp
The
.Va mod_destroy
function is called prior to unloading an existing module from the kernel.
It should be implemented if a module needs to clean up any global state before
being removed from the kernel.
The return value is currently ignored.
.Pp
The
.Va cb_init
function is called when a TCP control block
.Vt struct tcpcb
is created.
It should be implemented if a module needs to allocate memory for storing
private per-connection state.
Returning a non-zero value from
.Va cb_init
will cause the connection setup to be aborted, terminating the connection as a
result.
.Pp
The
.Va cb_destroy
function is called when a TCP control block
.Vt struct tcpcb
is destroyed.
It should be implemented if a module needs to free memory allocated in
.Va cb_init .
.Pp
The
.Va conn_init
function is called when a new connection has been established and variables are
being initialised.
It should be implemented to initialise congestion control algorithm variables
for the newly established connection.
.Pp
The
.Va ack_received
function is called when a TCP acknowledgement (ACK) packet is received.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The ACK types currently reported by the stack are CC_ACK and CC_DUPACK.
CC_ACK indicates the received ACK acknowledges previously unacknowledged data.
CC_DUPACK indicates the received ACK acknowledges data that has already been
acknowledged.
.Pp
The
.Va cong_signal
function is called when a congestion event is detected by the TCP stack.
Modules use the
.Fa type
argument as an input to their congestion management algorithms.
The congestion event types currently reported by the stack are CC_ECN, CC_RTO,
CC_RTO_ERR and CC_NDUPACK.
CC_ECN is reported when the TCP stack receives an explicit congestion notification
(RFC3168).
CC_RTO is reported when the retransmission timeout timer fires.
CC_RTO_ERR is reported if the retransmission timeout timer fired in error.
CC_NDUPACK is reported if N duplicate ACKs have been received back-to-back,
where N is the fast retransmit duplicate ACK threshold (N=3 currently as per
RFC5681).
.Pp
The
.Va post_recovery
function is called after the TCP connection has recovered from a congestion event.
It should be implemented to adjust state as required.
.Pp
The
.Va after_idle
function is called when data transfer resumes after an idle period.
It should be implemented to adjust state as required.
.Pp
The
.Fn DECLARE_CC_MODULE
macro provides a convenient wrapper around the
.Xr DECLARE_MODULE 9
macro, and is used to register a
.Nm
module with the
.Nm
framework.
The
.Fa ccname
argument specifies the module's name.
The
.Fa ccalgo
argument points to the module's
.Vt struct cc_algo .
.Pp
.Nm
modules must instantiate a
.Vt struct cc_algo ,
but are only required to set the name field; setting any of the function
pointers is optional.
The stack will skip calling any function pointer which is NULL, so there is no
requirement to implement any of the function pointers.
Using the C99 designated initialiser feature to set fields is encouraged.
.Pp
Each function pointer which deals with congestion control state is passed a
pointer to a
.Vt struct cc_var ,
which has the following members:
.Bd -literal -offset indent
struct cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	tcp_seq		curack;
	uint32_t	flags;
	int		type;
	union ccv_container {
		struct tcpcb		*tcp;
		struct sctp_nets	*sctp;
	} ccvc;
};
.Ed
.Pp
.Vt struct cc_var
groups congestion control related variables into a single, embeddable structure
and adds a layer of indirection to accessing transport protocol control blocks.
The eventual goal is to allow a single set of
.Nm
modules to be shared between all congestion aware transport protocols, though
currently only
.Xr tcp 4
is supported.
.Pp
To aid the eventual transition towards this goal, direct use of variables from
the transport protocol's data structures is strongly discouraged.
However, access to some of these variables is currently unavoidable, and so the
.Fn CC_VAR
macro exists as a convenience accessor.
The
.Fa ccv
argument points to the
.Vt struct cc_var
passed into the function by the
.Nm
framework.
The
.Fa what
argument specifies the name of the variable to access.
.Pp
Apart from the
.Va type
and
.Va ccv_container
fields, the remaining fields in
.Vt struct cc_var
are for use by
.Nm
modules.
.Pp
The
.Va cc_data
field is available for algorithms requiring additional per-connection state to
attach a dynamic memory pointer to.
The memory should be allocated and attached in the module's
.Va cb_init
hook function.
.Pp
The
.Va bytes_this_ack
field specifies the number of new bytes acknowledged by the most recently
received ACK packet.
It is only valid in the
.Va ack_received
hook function.
.Pp
The
.Va curack
field specifies the sequence number of the most recently received ACK packet.
It is only valid in the
.Va ack_received ,
.Va cong_signal
and
.Va post_recovery
hook functions.
.Pp
The
.Va flags
field is used to pass useful information from the stack to a
.Nm
module.
The CCF_ABC_SENTAWND flag is relevant in
.Va ack_received
and is set when appropriate byte counting (RFC3465) has counted that a window's
worth of bytes has been sent.
It is the module's responsibility to clear the flag after it has processed the
signal.
The CCF_CWND_LIMITED flag is relevant in
.Va ack_received
and is set when the connection's ability to send data is currently constrained
by the value of the congestion window.
Algorithms should use the absence of this flag to avoid accumulating
a large difference between the congestion window and send window.
.Sh SEE ALSO
.Xr cc_chd 4 ,
.Xr cc_cubic 4 ,
.Xr cc_hd 4 ,
.Xr cc_htcp 4 ,
.Xr cc_newreno 4 ,
.Xr cc_vegas 4 ,
.Xr mod_cc 4 ,
.Xr tcp 4
.Sh ACKNOWLEDGEMENTS
Development and testing of this software were made possible in part by grants
from the FreeBSD Foundation and Cisco University Research Program Fund at
Community Foundation Silicon Valley.
.Sh FUTURE WORK
Integrate with
.Xr sctp 4 .
.Sh HISTORY
The modular Congestion Control (CC) framework first appeared in
.Fx 9.0 .
.Pp
The framework was first released in 2007 by James Healy and Lawrence Stewart
whilst working on the NewTCP research project at Swinburne University of
Technology's Centre for Advanced Internet Architectures, Melbourne, Australia,
which was made possible in part by a grant from the Cisco University Research
Program Fund at Community Foundation Silicon Valley.
More details are available at:
.Pp
http://caia.swin.edu.au/urp/newtcp/
.Sh AUTHORS
.An -nosplit
The
.Nm
framework was written by
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org ,
.An James Healy Aq Mt jimmy@deefa.com
and
.An David Hayes Aq Mt david.hayes@ieee.org .
.Pp
This manual page was written by
.An David Hayes Aq Mt david.hayes@ieee.org
and
.An Lawrence Stewart Aq Mt lstewart@FreeBSD.org .
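.Sh EXAMPLES
The hooks and fields described in
.Sx DESCRIPTION
can be pictured with the following userland sketch.
It is an illustration under stated assumptions, not code from an existing
module: every demo_ and DEMO_ name is a hypothetical stand-in for the kernel
definition it mimics, the constant values are arbitrary, and calloc() and
free() stand in for whatever kernel allocator a real module would use.
The sketch shows designated initialisers, private state attached to
.Va cc_data
in
.Va cb_init ,
and an
.Va ack_received
hook that reads
.Va bytes_this_ack
and clears the window-sent flag once it has processed it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch builds outside the kernel. */
#define DEMO_NAME_MAX		16
#define DEMO_ACK		1	/* plays the role of CC_ACK */
#define DEMO_DUPACK		2	/* plays the role of CC_DUPACK */
#define DEMO_ABC_SENTAWND	0x1	/* plays the role of CCF_ABC_SENTAWND */

/* Cut-down stand-in for struct cc_var. */
struct demo_cc_var {
	void		*cc_data;
	int		bytes_this_ack;
	uint32_t	flags;
};

/* Cut-down stand-in for struct cc_algo. */
struct demo_cc_algo {
	char	name[DEMO_NAME_MAX];
	int	(*cb_init)(struct demo_cc_var *ccv);
	void	(*cb_destroy)(struct demo_cc_var *ccv);
	void	(*ack_received)(struct demo_cc_var *ccv, uint16_t type);
	void	(*after_idle)(struct demo_cc_var *ccv);
};

/* Private per-connection state hung off cc_data. */
struct demo_state {
	uint64_t	acked_bytes;
	unsigned int	dupacks;
	unsigned int	windows;
};

/* Allocate the private state; non-zero means the connection is aborted. */
static int
demo_cb_init(struct demo_cc_var *ccv)
{
	ccv->cc_data = calloc(1, sizeof(struct demo_state));
	return (ccv->cc_data == NULL);
}

/* Release what demo_cb_init() allocated. */
static void
demo_cb_destroy(struct demo_cc_var *ccv)
{
	free(ccv->cc_data);
	ccv->cc_data = NULL;
}

static void
demo_ack_received(struct demo_cc_var *ccv, uint16_t type)
{
	struct demo_state *st = ccv->cc_data;

	assert(st != NULL);
	if (type == DEMO_ACK && ccv->bytes_this_ack > 0)
		st->acked_bytes += (uint64_t)ccv->bytes_this_ack;
	else if (type == DEMO_DUPACK)
		st->dupacks++;

	/* Process the window-sent signal, then clear it ourselves. */
	if ((ccv->flags & DEMO_ABC_SENTAWND) != 0) {
		st->windows++;
		ccv->flags &= ~(uint32_t)DEMO_ABC_SENTAWND;
	}
}

/* Hooks left unset (after_idle here) stay NULL and would be skipped. */
static struct demo_cc_algo demo_algo = {
	.name = "demo",
	.cb_init = demo_cb_init,
	.cb_destroy = demo_cb_destroy,
	.ack_received = demo_ack_received,
};
```

.Pp
In a real module the structure would be a
.Vt struct cc_algo
and would be registered by passing the module's name as
.Fa ccname
and a pointer to the structure as
.Fa ccalgo
to
.Fn DECLARE_CC_MODULE .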