1# 2# $FreeBSD$ 3# 4 5Notes on the internal structure of dummynet (2010 version) 6by Riccardo Panicucci and Luigi Rizzo 7Work supported by the EC project ONELAB2 8 9 10********* 11* INDEX * 12********* 13Implementation of new dummynet 14 Internal structure 15 Files 16Packet arrival 17 The reconfiguration routine 18dummynet_task() 19Configuration 20 Add a pipe 21 Add a scheduler 22 Add a flowset 23Listing object 24Delete of object 25 Delete a pipe 26 Delete a flowset 27 Delete a scheduler 28Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary 29 ip_dummynet_glue.c 30 ip_fw_glue.c 31How to configure dummynet 32How to implement a new scheduler 33 34 35 36OPEN ISSUES 37------------------------------ 3820100131 deleting RR causes infinite loop 39 presumably in the rr_free_queue() call -- seems to hang 40 forever when deleting a live flow 41------------------------------ 42 43Dummynet is a traffic shaper and network emulator. Packets are 44selected by an external filter such as ipfw, and passed to the emulator 45with a tag such as "pipe 10" or "queue 5" which tells what to 46do with the packet. As an example 47 48 ipfw add queue 5 icmp from 10.0.0.2 to all 49 50All packets with the same tag belong to a "flowset", or a set 51of flows which can be further partitioned according to a mask. 52Flowsets are then passed to a scheduler for processing. The 53association of flowsets and schedulers is configurable e.g. 54 55 ipfw queue 5 config sched 10 weight 3 flow_mask xxxx 56 ipfw queue 8 config sched 10 weight 1 ... 57 ipfw queue 3 config sched 20 weight 1 ... 58 59"sched 10" represents one or more scheduler instances, 60selected through a mask on the 5-tuple itself. 61 62 ipfw sched 20 config type FIFO sched_mask yyy ... 63 64There are in fact two masks applied to each packet: 65+ the "sched_mask" sends packets arriving to a scheduler_id to 66 one of many instances. 67+ the "flow_mask" together with the flowset_id is used to 68 collect packets into independent flows on each scheduler. 69 70As an example, we can have 71 ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff 72 ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00 73 74means that sched 10 will have one instance per /24 source subnet, 75and within that, each individual source will be a flow. 76 77Internal structure 78----------------- 79Dummynet-related data is split into several data structures, 80part of them constituting the userland-kernel API, and others 81specific to the kernel. 82NOTE: for up-to-date details please look at the relevant source 83 headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h) 84 85USERLAND-KERNEL API (ip_dummynet.h) 86 87 struct dn_link: 88 contains data about the physical link such as 89 bandwith, delay, burst size; 90 91 struct dn_fs: 92 describes a flowset, i.e. a template for queues. 93 Main parameters are the scheduler we attach to, a flow_mask, 94 buckets, queue size, plr, weight, and other scheduler-specific 95 parameters. 96 97 struct dn_flow 98 contains information on a flow, including masks and 99 statistics 100 101 struct dn_sch: 102 defines a scheduler (and a link attached to it). 103 Parameters include scheduler type, sched_mask, number of 104 buckets, and possibly other scheduler-specific parameters, 105 106 struct dn_profile: 107 fields to simulate a delay profile 108 109 110KERNEL REPRESENTATION (ip_dn_private.h) 111 112 struct mq 113 a queue of mbufs with head and tail. 114 115 struct dn_queue 116 individual queue of packets, created by a flowset using 117 flow_mask and attached to a scheduler instance selected 118 through sched_mask. 119 A dn_queue has a pointer to the dn_fsk (which in turn counts 120 how many queues point to it), a pointer to the 121 dn_sch_inst it attaches to, and is in a hash table in the 122 flowset. scheduler instances also should store queues in 123 their own containers used for scheduling (lists, trees, etc.) 124 CREATE: done on packet arrivals when a flow matches a flowset. 125 DELETE: done only when deleting the parent dn_sch_inst 126 or draining memory. 127 128 struct dn_fsk 129 includes a dn_fs; a pointer to the dn_schk; a link field 130 for the list of dn_fsk attached to the same scheduler, 131 or for the unlinked list; 132 a refcount for the number of queues pointing to it; 133 The dn_fsk is in a hash table, fshash. 134 CREATE: done on configuration commands. 135 DELETE: on configuration commands. 136 137 struct dn_sch_inst 138 a scheduler instance, created from a dn_schk applying sched_mask. 139 Contains a delay line, a reference to the parent, and scheduler- 140 specific info. Both dn_sch_inst and its delay line can be in the 141 evheap if they have events to be processed. 142 CREATE: created from a dn_schk applying sched_mask 143 DELETE: configuration command delete a scheduler which in turn 144 sweeps the hash table of instances deleting them 145 146 struct dn_schk 147 includes dn_sch, dn_link, a pointer to dn_profile, 148 a hash table of dn_sch_inst, a list of dn_fsk 149 attached to it. 150 CREATE: configuration command. If there are flowsets that 151 refer to this number, they are attached and moved 152 to the hash table 153 DELETE: manual, see dn_sch_inst 154 155 156 fshash schedhash 157 +---------------+ sched +--------------+ 158 | sched-------------------->| NEW_SCHK| 159 -<----*sch_chain |<-----------------*fsk_list | 160 |NEW_FSK |<----. | [dn_link] | 161 +---------------+ | +--------------+ 162 |qht (hash) | | | siht(hash) | 163 | [dn_queue] | | | [dn_si] | 164 | [dn_queue] | | | [dn_si] | 165 | ... | | | ... | 166 | +--------+ | | | +---------+ | 167 | |dn_queue| | | | |dn_si | | 168 | | fs *----------' | | | | 169 | | si *---------------------->| | | 170 | +---------+ | | +---------+ | 171 +---------------+ +--------------+ 172 173The following global data structures contain all 174schedulers and flowsets. 175 176- schedhash[x]: contains all scheduler templates in the system. 177 Looked up only on manual configurations, where flowsets 178 are attached to matching schedulers. 179 We have one entry per 'sched X config' command 180 (plus one for each 'pipe X config'). 181 182- fshash[x]: contains all flowsets. 183 We do a lookup on this for each packet. 184 We have one entry for each 'queue X config' 185 (plus one for each 'pipe X config'). 186 187Additionally, a list that contains all unlinked flowset: 188- fsu: contains flowset that are not linked with any scheduler. 189 Flowset are put in this list when they refer to a non 190 existing scheduler. 191 We don't need an efficient data structure as we never search 192 here on a packet arrivals. 193 194Scheduler instances and the delay lines associated with each scheduler 195instance need to be woken up at certain times. Because we have many 196such objects, we keep them in a priority heap (system_heap). 197 198Almost all objects in this implementation are preceded by a structure 199(struct dn_id) which makes it easier to identify them. 200 201 202Files 203----- 204The dummynet code is split in several files. 205All kernel code is in sys/netinet/ipfw except ip_dummynet.h 206All userland code is in sbin/ipfw. 207Files are 208- sys/netinet/ip_dummynet.h defines the kernel-userland API 209- ip_dn_private.h contains the kernel-specific APIs 210 and data structures 211- dn_sched.h defines the scheduler API 212- ip_dummynet.c cointains module glue and sockopt handlers, with all 213 functions to configure and list objects. 214- ip_dn_io.c contains the functions directly related to packet processing, 215 and run in the critical path. It also contains some functions 216 exported to the schedulers. 217- dn_heap.[ch] implement a binary heap and a generic hash table 218- dn_sched_* implement the various scheduler modules 219 220- dummynet.c is the file used to implement the user side of dummynet. 221 It contains the function to parsing command line, and functions to 222 show the output of dummynet objects. 223Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that 224are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and 225FreeBSD 8. 226 227LOCKING 228======= 229At the moment the entire processing occurs under a single lock 230which is expected to be acquired in exclusive mode 231DN_BH_WLOCK() / DN_BH_WUNLOCK(). 232 233In perspective we aim at the following: 234- the 'busy' flag, 'pending' list and all structures modified by packet 235 arrivals and departures are protected by the BH_WLOCK. 236 This is normally acquired in exclusive mode by the packet processing 237 functions for short sections of code (exception -- the timer). 238 If 'busy' is not set, we can do regular packet processing. 239 If 'busy' is set, no pieces can be accessed. 240 We must enqueue the packet on 'pending' and return immediately. 241 242- the 'busy' flag is set/cleared by long sections of code as follows: 243 UH_WLOCK(); KASSERT(busy == 0); 244 BH_WLOCK(); busy=1; BH_WUNLOCK(); 245 ... do processing ... 246 BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK(); 247 UH_WUNLOCK(); 248 this normally happens when the upper half has something heavy 249 to do. The prologue and epilogue are not in the critical path. 250 251- the main containers (fshash, schedhash, ...) are protected by 252 UH_WLOCK. 253 254Packet processing 255================= 256A packet enters dummynet through dummynet_io(). We first lookup 257the flowset number in fshash using dn_ht_find(), then find the scheduler 258instance using ipdn_si_find(), then possibly identify the correct 259queue with ipdn_q_find(). 260If successful, we call the scheduler's enqueue function(), and 261if needed start I/O on the link calling serve_sched(). 262If the packet can be returned immediately, this is done by 263leaving *m0 set. Otherwise, the packet is absorbed by dummynet 264and we simply return, possibly with some appropriate error code. 265 266Reconfiguration 267--------------- 268Reconfiguration is the complex part of the system because we need to 269keep track of the various objects and containers. 270At the moment we do not use reference counts for objects so all 271processing must be done under a lock. 272 273The main entry points for configuration is the ip_dn_ctl() handler 274for the IP_DUMMYNET3 sockopt (others are provided only for backward 275compatibility). Modifications to the configuration call do_config(). 276The argument is a sequence of blocks each starting with a struct dn_id 277which specifies its content. 278The first dn_id must contain as obj.id the DN_API_VERSION 279The obj.type is DN_CMD_CONFIG (followed by actual objects), 280DN_CMD_DELETE (with the correct subtype and list of objects), or 281DN_CMD_FLUSH. 282 283DN_CMD_CONFIG is followed by objects to add/reconfigure. In general, 284if an object already exists it is reconfigured, otherwise it is 285created in a way that keeps the structure consistent. 286We have the following objects in the system, normally numbered with 287an identifier N between 1 and 65535. For certain objects we have 288"shadow" copies numbered I+NMAX and I+ 2*NMAX which are used to 289implement certain backward compatibility features. 290 291In general we have the following linking 292 293 TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..." 294 corresponds to a dn_fs object numbered N 295 296 TRADITIONAL DUMMYNET PIPES "pipe N config ..." 297 dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX 298 299 GENERIC SCHEDULER "sched N config ... " 300 [dn_fs N+NMAX] --> dn_sch N --> dn_link N 301 The flowset N+NMAX is created only if the scheduler is not 302 of type MULTIQUEUE. 303 304 DELAY PROFILE "pipe N config profile ..." 305 it is always attached to an existing dn_link N 306 307Because traditional dummynet pipes actually configure both a 308'standalone' instance and one that can be used by queues, 309we do the following: 310 311 "pipe N config ..." configures: 312 dn_sched N type WF2Q+ 313 dn_sched N+NMAX type FIFO 314 dn_fs N+2NMAX attached to dn_sched N+NMAX 315 dn_pipe N 316 dn_pipe N+NMAX 317 318 "queue N config" configures 319 dn_fs N 320 321 "sched N config" configures 322 dn_sched N type as desired 323 dn_fs N+NMAX attached to dn_sched N 324 325 326dummynet_task() 327=============== 328The dummynet_task() function is the main dummynet processing function and is 329called every tick. This function first calculate the new current time, then 330it checks if it is the time to wake up object from the system_heap comparing 331the current time and the key of the heap. Two types of object (really the 332heap contains pointer to objects) are in the 333system_heap: 334 335- scheduler instance: if a scheduler instance is waked up, the dequeue() 336 function is called until it has credit. If the dequeue() returns packets, 337 the scheduler instance is inserted in the heap with a new key depending of 338 the data that will be send out. If the scheduler instance remains with 339 some credit, it means that is hasn't other packet to send and so the 340 instance is no longer inserted in the heap. 341 342 If the scheduler instance extracted from the heap has the DELETE flag set, 343 the dequeue() is not called and the instance is destroyed now. 344 345- delay line: when extracting a delay line, the function transmit_event() is 346 called to send out packet from delay line. 347 348 If the scheduler instance associated with this delay line doesn't exists, 349 the delay line will be delete now. 350 351Configuration 352============= 353To create a pipe, queue or scheduler, the user should type commands like: 354"ipfw pipe x config" 355"ipfw queue y config pipe x" 356"ipfw pipe x config sched <type>" 357 358The userland side of dummynet will prepare a buffer contains data to pass to 359kernel side. 360The buffer contains all struct needed to configure an object. In more detail, 361to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are needed, 362plus the delay profile struct if the pipe has a delay profile. 363 364If configuring a scheduler only the struct dn_sch is wrote in the buffer, 365while if configuring a flowset only the dn_fs struct is wrote. 366 367The first struct in the buffer contains the type of command request, that is 368if it is configuring a pipe, a queue, or a scheduler. Then there are structs 369need to configure the object, and finally there is the struct that mark 370the end of the buffer. 371 372To support the insertion of pipe and queue using the old syntax, when adding 373a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which 374have a number x + DN_PIPEOFFSET. 375 376Add a pipe 377---------- 378A pipe is only a template for a link. 379If the pipe already exists, parameters are updated. If a delay profile exists 380it is deleted and a new one is created. 381If the pipe doesn't exist a new one is created. After the creation, the 382flowset unlinked list is scanned to see if there are some flowset that would 383be linked with this pipe. If so, these flowset will be of wf2q+ type (for 384compatibility) and a new wf2q+ scheduler is created now. 385 386Add a scheduler 387--------------- 388If the scheduler already exists, and the type and the mask are the same, the 389scheduler is simply reconfigured calling the config_scheduler() scheduler 390function with the RECONFIGURE flag active. 391If the type or the mask differ, it is necessary to delete the old scheduler 392and create a new one. 393If the scheduler doesn't exists, a new one is created. If the scheduler has 394a mask, the hash table is created to store pointers to scheduler instances. 395When a new scheduler is created, it is necessary to scan the unlinked 396flowset list to search eventually flowset that would be linked with this 397scheduler number. If some are found, flowsets became of the type of this 398scheduler and they are configured properly. 399 400Add a flowset 401------------- 402Flowset pointers are store in the system in two list. The unlinked flowset list 403contains all flowset that aren't linked with a scheduler, the flowset list 404contains flowset linked to a scheduler, and so they have a type. 405When adding a new flowset, first it is checked if the flowset exists (that is, 406it is in the flowset list) and if it doesn't exists a new flowset is created 407and added to unlinked flowset list if the scheduler which the flowset would be 408linked doesn't exists, or added in the flowset list and configured properly if 409the scheduler exists. If the flowset (before to be created) was in the 410unlinked flowset list, it is removed and deleted, and then recreated. 411If the flowset exists, to allow reconfiguration of this flowset, the 412scheduler number and types must match with the one in memory. If this isn't 413so, the flowset is deleted and a new one will be created. Really, the flowset 414it isn't deleted now, but it is removed from flowset list and it will be 415deleted later because there could be some queues that are using it. 416 417Listing of object 418================= 419The user can request a list of object present in dummynet through the command 420"ipfw [-v] pipe|queue [x] list|show" 421The kernel side of dummynet send a buffer to user side that contains all 422pipe, all scheduler, all flowset, plus all scheduler instances and all queues. 423The dummynet user land will format the output and show only the relevant 424information. 425The buffer sent start with all pipe from the system. The entire struct dn_link 426is passed, except the delay_profile struct that is useless in user space. 427After pipes, all flowset are wrote in the buffer. The struct contains 428scheduler flowset specific data is linked with the flowset writing the 429'obj' id of the extension into the 'alg_fs' pointer. 430Then schedulers are wrote. If a scheduler has one or more scheduler instance, 431these are linked to the parent scheduler writing the id of the parent in the 432'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in 433the buffer and linked thorugh the 'obj' and 'sched_inst' pointer. 434Finally, flowsets in the unlinked flowset list are write in the buffer, and 435then a struct gen in saved in the buffer to mark the last struct in the buffer. 436 437 438Delete of object 439================ 440An object is usually removed by user through a command like 441"ipfw pipe|queue x delete". XXX sched? 442ipfw pass to the kernel a struct gen that contains the type and the number 443of the object to remove 444 445Delete of pipe x 446---------------- 447A pipe can be deleted by the user throught the command 'ipfw pipe x delete'. 448To delete a pipe, the pipe is removed from the pipe list, and then deleted. 449Also the scheduler associated with this pipe should be deleted. 450For compatibility with old dummynet syntax, the associated FIFO scheduler and 451FIFO flowset must be deleted. 452 453Delete of flowset x 454------------------- 455To remove a flowset, we must be sure that is no loger referenced by any object. 456If the flowset to remove is in the unlinked flowset list, there is not any 457issue, the flowset can be safely removed calling a free() (the flowset 458extension is not yet created if the flowset is in this list). 459If the flowset is in the flowset list, first we remove from it so new packet 460are discarded when arrive. Next, the flowset is marked as delete. 461Now we must check if some queue is using this flowset. 462To do this, a counter (active_f) is provided. This counter indicate how many 463queues exist using this flowset. 464The active_f counter is automatically incremented when a queue is created 465and decremented when a queue is deleted. 466If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs() 467scheduler function is called before deallocate memory. 468If the counter is not 0, the flowset remain in memory until the counter become 469zero. When a queue is delete (by dn_delete_queue() function) it is checked if 470the linked flowset is deleting and if so the counter is decrementing. If the 471counter reaches 0, the flowset is deleted. 472The deletion of a queue can be done only by the scheduler, or when the scheduler 473is destroyed. 474 475Delete of scheduler x 476--------------------- 477To delete a scheduler we must be sure that any scheduler instance of this type 478are in the system_heap. To do so, a counter (inst_counter) is provided. 479This counter is managed by the system: it is incremented every time it is 480inserted in the system_heap, and decremented every time it is extracted from it. 481To delete the scheduler, first we remove it from the scheduler list, so new 482packet are discarded when they arrive, and mark the scheduler as deleting. 483 484If the counter is 0, we can remove the scheduler safely calling the 485really_deletescheduler() function. This function will scan all scheduler 486instances and call the delete_scheduler_instance() function that will delete 487the instance. When all instance are deleted, the scheduler template is 488deleted calling the delete_scheduler_template(). If the delay line associate 489with the scheduler is empty, it is deleted now, else it will be deleted when 490it will became empy. 491If the counter was not 0, we wait for it. Every time the dummynet_task() 492function extract a scheduler from the system_heap, the counter is decremented. 493If the scheduler has the delete flag enabled the dequeue() is not called and 494delete_scheduler_instance() is called to delete the instance. 495Obviously this scheduler instance is no loger inserted in the system_heap. 496If the counter reaches 0, the delete_scheduler_template() function is called 497all memory is released. 498NOTE: Flowsets that belong to this scheduler are not deleted, so if a new 499 scheduler with the same number is inserted will use these flowsets. 500 To do so, the best approach would be insert these flowset in the 501 unlinked flowset list, but doing this now will be very expensive. 502 So flowsets will remain in memory and linked with a scheduler that no 503 longer exists until a packet belonging to this flowset arrives. When 504 this packet arrives, the reconfigure() function is called because the 505 generation number mismatch with one contains in the flowset and so 506 the flowset will be moved into the flowset unlinked list, or will be 507 linked with the new scheduler if a new one was created. 508 509 510COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY 511========================================================== 512Dummynet is not compatible with old ipfw binary because internal structs are 513changed. Moreover, the old ipfw binary is not compatible with new kernels 514because the struct that represents a firewall rule has changed. So, if a user 515install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other 516commands) will not work. 517New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get. 518The old option can be used to allow compatibility with the 'ipfw' binary of 519older version (tested with 7.2 and 8.0) of FreeBSD. 520Two file are provided for this purpose: 521- ip_dummynet_glue.c translates old dummynet requests to the new ones, 522- ip_fw_glue.c converts the rule format between 7.2 and 8 versions. 523Let see in detail these two files. 524 525IP_DUMMYNET_GLUE.C 526------------------ 527The internal structs of new dummynet are very different from the original. 528Because of there are some difference from between dummynet in FreeBSD 7.2 and 529dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay 530profile and burst option), I have to include both header files. I copied 531the revision 191715 (for version 7.2) and the revision 196045 (for version 8) 532and I appended a number to each struct to mark them. 533 534The main function of this file is ip_dummynet_compat() that is called by 535ip_dn_ctl() when it receive a request of old socket option. 536 537A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using. 538This variable is set every time a request of configuration is done, because 539with this request we receive a buffer of which size depending of ipfw version. 540Because of in general the first action is a configuration, this variable is 541usually set accordly. If the first action is a request of listing of pipes 542or queues, the system cannot know the version of ipfw, and we suppose that 543version 7.2 is used. If version is wrong, the output can be senseless, but 544the application should not crash. 545 546There are four request for old dummynet: 547- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the 548 dummynet_flush() function is called; 549- IP_DUMMYNET_DEL: the delete option need to be translate. 550 It is only necessary to extract the number and the type of the object 551 (pipe or queue) to delete from the buffer received and build a new struct 552 gen contains the right parameters, then call the delete_object() function; 553- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of 554 the ipfw version. After the properly extraction of all data, that depends 555 by the ipfw version used, new structures are filled and then the dummynet 556 config_link() function is properly called. Note that the 7.2 version does 557 not support some parameter as burst or delay profile. 558- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer 559 depending of its version. There are two function that build the 560 corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These 561 functions reproduce the buffer exactly as 'ipfw' expect. The only difference 562 is that the weight parameter for a queue is no loger sent by dummynet and so 563 it is set to 0. 564 Moreover, because of the internal structure has changed, the bucket size 565 of a queue could not be correct, because now all flowset share the hash 566 table. 567 If the version of ipfw is wrong, the output could be senseless or truncated, 568 but the application should not crash. 569 570IP_FW_GLUE.C 571------------ 572The ipfw binary also is used to add rules to FreeBSD firewall. Because of the 573struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary 574to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel 575provided with FreeBSD 8. 576This file contains two functions to convert a rule from FreeBSD 7.2 format to 577FreeBSD 8 format, and viceversa. 578The conversion should be done when a rule passes from userspace to kernel space 579and viceversa. 580I have to modify the ip_fw2.c file to manage these two case, and added a 581variable (is7) to store the ipfw version used, using an approach like the 582previous file: 583- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the 584 size of the rule received corrispond to FreeBSD 7.2 ipfw version. If so, the 585 rule is converted to version 8 calling the function convert_rule_to_8(). 586 Moreover, after the insertion of the rule, the rule is now reconverted to 587 version 7 because the ipfw binary will print it. 588- when the user request a list of rules (option IP_FW_GET) the is7 variable 589 should be set correctly because we suppose that a configure command was done, 590 else we suppose that the FreeBSD version is 8. The function ipfw_getrules() 591 in ip_fw2.c file return all rules, eventually converted to version 7 (if 592 the is7 is set) to the ipfw binary. 593The conversion of a rule is quite simple. The only difference between the 594two structures (struct ip_fw) is that in the new there is a new field 595(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in 596the right position in the new (or old) struct. The size of commands are not 597changed, and the copy is done into a cicle. 598 599How to configure dummynet 600========================= 601It is possible to configure dummynet through two main commands: 602'ipfw pipe' and 'ipfw queue'. 603To allow compatibility with old version, it is possible configure dummynet 604using the old command syntax. Doing so, obviously, it is only possible to 605configure a FIFO scheduler or a wf2q+ scheduler. 606A new command, 'ipfw pipe x config sched <type>' is supported to add a new 607scheduler to the system. 608 609- ipfw pipe x config ... 610 create a new pipe with the link parameters 611 create a new scheduler fifo (x + offset) 612 create a new flowset fifo (x + offset) 613 the mask is eventually stored in the FIFO scheduler 614 615- ipfw queue y config pipe x ... 616 create a new flowset y linked to sched x. 617 The type of flowset depends by the specified scheduler. 618 If the scheduler does not exist, this flowset is inserted in a special 619 list and will be not active. 620 If pipe x exists and sched does not exist, a new wf2q+ scheduler is 621 created and the flowset will be linked to this new scheduler (this is 622 done for compatibility with old syntax). 623 624- ipfw pipe x config sched <type> ... 625 create a new scheduler x of type <type>. 626 Search into the flowset unlinked list if there are some flowset that 627 should be linked with this new scheduler. 628 629- ipfw pipe x delete 630 delete the pipe x 631 delete the scheduler fifo (x + offset) 632 delete the scheduler x 633 delete the flowset fifo (x + offset) 634 635- ipfw queue x delete 636 delete the flowset x 637 638- ipfw sched x delete ///XXX 639 delete the scheduler x 640 641Follow now some examples to how configure dummynet: 642- Ex1: 643 ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay 644 A FIFO flowset and scheduler is 645 also created 646 ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset 647 will be of wf2q+ because a pipe 10 648 exists. Moreover, the wf2q+ 649 scheduler is created now. 650- Ex2: 651 ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10 652 does not exist, so this flowset 653 is inserted in the unlinked 654 flowset list. 655 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 656 Because of a flowset with 'pipe 10' exists, 657 a wf2q+ scheduler is created now and that 658 flowset is linked with this sceduler. 659 660- Ex3: 661 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 662 ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to 663 pipe 10 664 ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset 665 will belong to scheduler 10 and 666 it is of type RR 667 668- Ex4: 669 ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to 670 pipe 10 (not exist yet) 671 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 672 ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset 673 will belong to scheduler 10 and 674 it is of type RR 675 ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It 676 becomes a wf2q+ scheduler. 677 When a new packet of flowset 5 arrives, 678 the flowset 5 becomes to wf2q+ type. 679 680How to implement a new scheduler 681================================ 682In dummynet, a scheduler algorithm is represented by two main structs, some 683functions and other minor structs. 684- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm 685 implemented) contains data relative to scheduler, as global parameter that 686 are common to all instances of the scheduler 687- A struct dn_sch_inst_xyz contains data relative to a single scheduler 688 instance, as local status variable depending for example by flows that 689 are linked with the scheduler, and so on. 690To add a scheduler to dummynet, the user should type a command like: 691'ipfw pipe x config sched <type> [mask ... ...]' 692This command creates a new struct dn_sch_xyz of type <type>, and 693store the optional parameter in that struct. 694 695The parameter mask determines how many scheduler instance of this 696scheduler may exist. For example, it is possible to divide traffic 697depending on the source port (or destination, or ip address...), 698so that every scheduler instance act as an independent scheduler. 699If the mask is not set, all traffic goes to the same instance. 700 701When a packet arrives to a scheduler, the system search the corrected 702scheduler instance, and if it does not exist it is created now (the 703struct dn_sch_inst_xyz is allocated by the system, and the scheduler 704fills the field correctly). It is a task of the scheduler to create 705the struct that contains all queues for a scheduler instance. 706Dummynet provides some function to create an hash table to store 707queues, but the schedule algorithm can choice the own struct. 708 709To link a flow to a scheduler, the user should type a command like: 710'ipfw queue z config pipe x [mask... ...]' 711 712This command creates a new 'dn_fs' struct that will be inserted 713in the system. If the scheduler x exists, this flowset will be 714linked to that scheduler and the flowset type become the same as 715the scheduler type. At this point, the function create_alg_fs_xyz() 716is called to allow store eventually parameter for the flowset that 717depend by scheduler (for example the 'weight' parameter for a wf2q+ 718scheduler, or some priority...). A parameter mask can be used for 719a flowset. If the mask parameter is set, the scheduler instance can 720separate packet according to its flow id (src and dst ip, ports...) 721and assign it to a separate queue. This is done by the scheduler, 722so it can ignore the mask if it wants. 723 724See now the two main structs: 725struct dn_sch_xyz { 726 struct gen g; /* important the name g */ 727 /* global params */ 728}; 729struct dn_sch_inst_xyz { 730 struct gen g; /* important the name g */ 731 /* params of the instance */ 732}; 733It is important to embed the struct gen as first parameter. The struct gen 734contains some values that the scheduler instance must fill (the 'type' of 735scheduler, the 'len' of the struct...) 736The function create_scheduler_xyz() should be implemented to initialize global 737parameters in the first struct, and if memory allocation is done it is 738mandatory to implement the delete_scheduler_template() function to free that 739memory. 740The function create_scheduler_instance_xyz() must be implemented even if the 741scheduler instance does not use extra parameters. In this function the struct 742gen fields must be filled with corrected infos. The 743delete_scheduler_instance_xyz() function must bu implemented if the instance 744has allocated some memory in the previous function. 745 746To store data belonging to a flowset the follow struct is used: 747struct alg_fs_xyz { 748 struct gen g; 749 /* fill correctly the gen struct 750 g.subtype = DN_XYZ; 751 g.len = sizeof(struct alg_fs_xyz) 752 ... 753 */ 754 /* params for the flow */ 755}; 756The create_alg_fs_xyz() function is mandatory, because it must fill the struct 757gen, but the delete_alg_fs_xyz() is mandatory only if the previous function 758has allocated some memory. 759 760A struct dn_queue contains packets belonging to a queue and some statistical 761data. The scheduler could have to store data in this struct, so it must define 762a dn_queue_xyz struct: 763struct dn_queue_xyz { 764 struct dn_queue q; 765 /* parameter for a queue */ 766} 767 768All structures are allocated by the system. To do so, the scheduler must 769set the size of its structs in the scheduler descriptor: 770scheduler_size: sizeof(dn_sch_xyz) 771scheduler_i_size: sizeof(dn_sch_inst_xyz) 772flowset_size: sizeof(alg_fs_xyz) 773queue_size: sizeof(dn_queue_xyz); 774The scheduler_size could be 0, but other struct must have at least a struct gen. 775 776 777After the definition of structs, it is necessary to implement the 778scheduler functions. 779 780- int (*config_scheduler)(char *command, void *sch, int reconfigure); 781 Configure a scheduler, or reconfigure if 'reconfigure' == 1. 782 This function performs additional allocation and initialization of global 783 parameter for this scheduler. 784 If memory is allocated here, the delete_scheduler_template() function 785 should be implemented to remove this memory. 786- int (*delete_scheduler_template)(void* sch); 787 Delete a scheduler template. This function is mandatory if the scheduler 788 uses extra data respect the struct dn_sch. 789- int (*create_scheduler_instance)(void *s); 790 Create a new scheduler instance. The system allocate the necessary memory 791 and the schedulet can access it using the 's' pointer. 792 The scheduler instance stores all queues, and to do this can use the 793 hash table provided by the system. 794- int (*delete_scheduler_instance)(void *s); 795 Delete a scheduler instance. It is important to free memory allocated 796 by create_scheduler_instance() function. The memory allocated by system 797 is freed by the system itself. The struct contains all queue also has 798 to be deleted. 799- int (*enqueue)(void *s, struct gen *f, struct mbuf *m, 800 struct ipfw_flow_id *id); 801 Called when a packet arrives. The packet 'm' belongs to the scheduler 802 instance 's', has a flowset 'f' and the flowid 'id' has already been 803 masked. The enqueue() must call dn_queue_packet(q, m) function to really 804 enqueue packet in the queue q. The queue 'q' is chosen by the scheduler 805 and if it does not exist should be created calling the dn_create_queue() 806 function. If the schedule want to drop the packet, it must call the 807 dn_drop_packet() function and then return 1. 808- struct mbuf * (*dequeue)(void *s); 809 Called when the timer expires (or when a packet arrives and the scheduler 810 instance is idle). 811 This function is called when at least a packet can be send out. The 812 scheduler choices the packet and returns it; if no packet are in the 813 schedulerinstance, the function must return NULL. 814 Before return a packet, it is important to call the function 815 dn_return_packet() to update some statistic of the queue and update the 816 queue counters. 817- int (*drain_queue)(void *s, int flag); 818 The system request to scheduler to delete all queues that is not using 819 to free memory. The flag parameter indicate if a queue must be deleted 820 even if it is active. 821 822- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure); 823 It is called when a flowset is linked with a scheduler. This is done 824 when the scheduler is defined, so we can know the type of flowset. 825 The function initialize the flowset paramenter parsing the command 826 line. The parameter will be stored in the g struct that have the right 827 size allocated by the system. If the reconfigure flag is set, it means 828 that the flowset is reconfiguring 829- int (*delete_alg_fs)(struct gen *f); 830 It is called when a flowset is deleting. Must remove the memory allocate 831 by the create_alg_fs() function. 832 833- int (*create_queue_alg)(struct dn_queue *q, struct gen *f); 834 Called when a queue is created. The function should link the queue 835 to the struct used by the scheduler instance to store all queues. 836- int (*delete_queue_alg)(struct dn_queue *q); 837 Called when a queue is deleting. The function should remove extra data 838 and update the struct contains all queues in the scheduler instance. 839 840The struct scheduler represent the scheduler descriptor that is passed to 841dummynet when a scheduler module is loaded. 842This struct contains the type of scheduler, the length of all structs and 843all function pointers. 844If a function is not implemented should be initialize to NULL. Some functions 845are mandatory, other are mandatory if some memory should be freed. 846Mandatory functions: 847- create_scheduler_instance() 848- enqueue() 849- dequeue() 850- create_alg_fs() 851- drain_queue() 852Optional functions: 853- config_scheduler() 854- create_queue_alg() 855Mandatory functions if the corresponding create...() has allocated memory: 856- delete_scheduler_template() 857- delete_scheduler_instance() 858- delete_alg_fs() 859- delete_queue_alg() 860 861