1# 2# 3 4Notes on the internal structure of dummynet (2010 version) 5by Riccardo Panicucci and Luigi Rizzo 6Work supported by the EC project ONELAB2 7 8 9********* 10* INDEX * 11********* 12Implementation of new dummynet 13 Internal structure 14 Files 15Packet arrival 16 The reconfiguration routine 17dummynet_task() 18Configuration 19 Add a pipe 20 Add a scheduler 21 Add a flowset 22Listing object 23Delete of object 24 Delete a pipe 25 Delete a flowset 26 Delete a scheduler 27Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary 28 ip_dummynet_glue.c 29 ip_fw_glue.c 30How to configure dummynet 31How to implement a new scheduler 32 33 34 35OPEN ISSUES 36------------------------------ 3720100131 deleting RR causes infinite loop 38 presumably in the rr_free_queue() call -- seems to hang 39 forever when deleting a live flow 40------------------------------ 41 42Dummynet is a traffic shaper and network emulator. Packets are 43selected by an external filter such as ipfw, and passed to the emulator 44with a tag such as "pipe 10" or "queue 5" which tells what to 45do with the packet. As an example 46 47 ipfw add queue 5 icmp from 10.0.0.2 to all 48 49All packets with the same tag belong to a "flowset", or a set 50of flows which can be further partitioned according to a mask. 51Flowsets are then passed to a scheduler for processing. The 52association of flowsets and schedulers is configurable e.g. 53 54 ipfw queue 5 config sched 10 weight 3 flow_mask xxxx 55 ipfw queue 8 config sched 10 weight 1 ... 56 ipfw queue 3 config sched 20 weight 1 ... 57 58"sched 10" represents one or more scheduler instances, 59selected through a mask on the 5-tuple itself. 60 61 ipfw sched 20 config type FIFO sched_mask yyy ... 62 63There are in fact two masks applied to each packet: 64+ the "sched_mask" sends packets arriving to a scheduler_id to 65 one of many instances. 66+ the "flow_mask" together with the flowset_id is used to 67 collect packets into independent flows on each scheduler. 68 69As an example, we can have 70 ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff 71 ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00 72 73means that sched 10 will have one instance per /24 source subnet, 74and within that, each individual source will be a flow. 75 76Internal structure 77----------------- 78Dummynet-related data is split into several data structures, 79part of them constituting the userland-kernel API, and others 80specific to the kernel. 81NOTE: for up-to-date details please look at the relevant source 82 headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h) 83 84USERLAND-KERNEL API (ip_dummynet.h) 85 86 struct dn_link: 87 contains data about the physical link such as 88 bandwidth, delay, burst size; 89 90 struct dn_fs: 91 describes a flowset, i.e. a template for queues. 92 Main parameters are the scheduler we attach to, a flow_mask, 93 buckets, queue size, plr, weight, and other scheduler-specific 94 parameters. 95 96 struct dn_flow 97 contains information on a flow, including masks and 98 statistics 99 100 struct dn_sch: 101 defines a scheduler (and a link attached to it). 102 Parameters include scheduler type, sched_mask, number of 103 buckets, and possibly other scheduler-specific parameters, 104 105 struct dn_profile: 106 fields to simulate a delay profile 107 108 109KERNEL REPRESENTATION (ip_dn_private.h) 110 111 struct mq 112 a queue of mbufs with head and tail. 113 114 struct dn_queue 115 individual queue of packets, created by a flowset using 116 flow_mask and attached to a scheduler instance selected 117 through sched_mask. 118 A dn_queue has a pointer to the dn_fsk (which in turn counts 119 how many queues point to it), a pointer to the 120 dn_sch_inst it attaches to, and is in a hash table in the 121 flowset. scheduler instances also should store queues in 122 their own containers used for scheduling (lists, trees, etc.) 123 CREATE: done on packet arrivals when a flow matches a flowset. 124 DELETE: done only when deleting the parent dn_sch_inst 125 or draining memory. 126 127 struct dn_fsk 128 includes a dn_fs; a pointer to the dn_schk; a link field 129 for the list of dn_fsk attached to the same scheduler, 130 or for the unlinked list; 131 a refcount for the number of queues pointing to it; 132 The dn_fsk is in a hash table, fshash. 133 CREATE: done on configuration commands. 134 DELETE: on configuration commands. 135 136 struct dn_sch_inst 137 a scheduler instance, created from a dn_schk applying sched_mask. 138 Contains a delay line, a reference to the parent, and scheduler- 139 specific info. Both dn_sch_inst and its delay line can be in the 140 evheap if they have events to be processed. 141 CREATE: created from a dn_schk applying sched_mask 142 DELETE: configuration command delete a scheduler which in turn 143 sweeps the hash table of instances deleting them 144 145 struct dn_schk 146 includes dn_sch, dn_link, a pointer to dn_profile, 147 a hash table of dn_sch_inst, a list of dn_fsk 148 attached to it. 149 CREATE: configuration command. If there are flowsets that 150 refer to this number, they are attached and moved 151 to the hash table 152 DELETE: manual, see dn_sch_inst 153 154 155 fshash schedhash 156 +---------------+ sched +--------------+ 157 | sched-------------------->| NEW_SCHK| 158 -<----*sch_chain |<-----------------*fsk_list | 159 |NEW_FSK |<----. | [dn_link] | 160 +---------------+ | +--------------+ 161 |qht (hash) | | | siht(hash) | 162 | [dn_queue] | | | [dn_si] | 163 | [dn_queue] | | | [dn_si] | 164 | ... | | | ... | 165 | +--------+ | | | +---------+ | 166 | |dn_queue| | | | |dn_si | | 167 | | fs *----------' | | | | 168 | | si *---------------------->| | | 169 | +---------+ | | +---------+ | 170 +---------------+ +--------------+ 171 172The following global data structures contain all 173schedulers and flowsets. 174 175- schedhash[x]: contains all scheduler templates in the system. 176 Looked up only on manual configurations, where flowsets 177 are attached to matching schedulers. 178 We have one entry per 'sched X config' command 179 (plus one for each 'pipe X config'). 180 181- fshash[x]: contains all flowsets. 182 We do a lookup on this for each packet. 183 We have one entry for each 'queue X config' 184 (plus one for each 'pipe X config'). 185 186Additionally, a list that contains all unlinked flowset: 187- fsu: contains flowset that are not linked with any scheduler. 188 Flowset are put in this list when they refer to a non 189 existing scheduler. 190 We don't need an efficient data structure as we never search 191 here on a packet arrivals. 192 193Scheduler instances and the delay lines associated with each scheduler 194instance need to be woken up at certain times. Because we have many 195such objects, we keep them in a priority heap (system_heap). 196 197Almost all objects in this implementation are preceded by a structure 198(struct dn_id) which makes it easier to identify them. 199 200 201Files 202----- 203The dummynet code is split in several files. 204All kernel code is in sys/netpfil/ipfw except ip_dummynet.h 205All userland code is in sbin/ipfw. 206Files are 207- sys/netpfil/ip_dummynet.h defines the kernel-userland API 208- ip_dn_private.h contains the kernel-specific APIs 209 and data structures 210- dn_sched.h defines the scheduler API 211- ip_dummynet.c cointains module glue and sockopt handlers, with all 212 functions to configure and list objects. 213- ip_dn_io.c contains the functions directly related to packet processing, 214 and run in the critical path. It also contains some functions 215 exported to the schedulers. 216- dn_heap.[ch] implement a binary heap and a generic hash table 217- dn_sched_* implement the various scheduler modules 218 219- dummynet.c is the file used to implement the user side of dummynet. 220 It contains the function to parsing command line, and functions to 221 show the output of dummynet objects. 222Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that 223are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and 224FreeBSD 8. 225 226LOCKING 227======= 228At the moment the entire processing occurs under a single lock 229which is expected to be acquired in exclusive mode 230DN_BH_WLOCK() / DN_BH_WUNLOCK(). 231 232In perspective we aim at the following: 233- the 'busy' flag, 'pending' list and all structures modified by packet 234 arrivals and departures are protected by the BH_WLOCK. 235 This is normally acquired in exclusive mode by the packet processing 236 functions for short sections of code (exception -- the timer). 237 If 'busy' is not set, we can do regular packet processing. 238 If 'busy' is set, no pieces can be accessed. 239 We must enqueue the packet on 'pending' and return immediately. 240 241- the 'busy' flag is set/cleared by long sections of code as follows: 242 UH_WLOCK(); KASSERT(busy == 0); 243 BH_WLOCK(); busy=1; BH_WUNLOCK(); 244 ... do processing ... 245 BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK(); 246 UH_WUNLOCK(); 247 this normally happens when the upper half has something heavy 248 to do. The prologue and epilogue are not in the critical path. 249 250- the main containers (fshash, schedhash, ...) are protected by 251 UH_WLOCK. 252 253Packet processing 254================= 255A packet enters dummynet through dummynet_io(). We first lookup 256the flowset number in fshash using dn_ht_find(), then find the scheduler 257instance using ipdn_si_find(), then possibly identify the correct 258queue with ipdn_q_find(). 259If successful, we call the scheduler's enqueue function(), and 260if needed start I/O on the link calling serve_sched(). 261If the packet can be returned immediately, this is done by 262leaving *m0 set. Otherwise, the packet is absorbed by dummynet 263and we simply return, possibly with some appropriate error code. 264 265Reconfiguration 266--------------- 267Reconfiguration is the complex part of the system because we need to 268keep track of the various objects and containers. 269At the moment we do not use reference counts for objects so all 270processing must be done under a lock. 271 272The main entry points for configuration is the ip_dn_ctl() handler 273for the IP_DUMMYNET3 sockopt (others are provided only for backward 274compatibility). Modifications to the configuration call do_config(). 275The argument is a sequence of blocks each starting with a struct dn_id 276which specifies its content. 277The first dn_id must contain as obj.id the DN_API_VERSION 278The obj.type is DN_CMD_CONFIG (followed by actual objects), 279DN_CMD_DELETE (with the correct subtype and list of objects), or 280DN_CMD_FLUSH. 281 282DN_CMD_CONFIG is followed by objects to add/reconfigure. In general, 283if an object already exists it is reconfigured, otherwise it is 284created in a way that keeps the structure consistent. 285We have the following objects in the system, normally numbered with 286an identifier N between 1 and 65535. For certain objects we have 287"shadow" copies numbered I+NMAX and I+ 2*NMAX which are used to 288implement certain backward compatibility features. 289 290In general we have the following linking 291 292 TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..." 293 corresponds to a dn_fs object numbered N 294 295 TRADITIONAL DUMMYNET PIPES "pipe N config ..." 296 dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX 297 298 GENERIC SCHEDULER "sched N config ... " 299 [dn_fs N+NMAX] --> dn_sch N --> dn_link N 300 The flowset N+NMAX is created only if the scheduler is not 301 of type MULTIQUEUE. 302 303 DELAY PROFILE "pipe N config profile ..." 304 it is always attached to an existing dn_link N 305 306Because traditional dummynet pipes actually configure both a 307'standalone' instance and one that can be used by queues, 308we do the following: 309 310 "pipe N config ..." configures: 311 dn_sched N type WF2Q+ 312 dn_sched N+NMAX type FIFO 313 dn_fs N+2NMAX attached to dn_sched N+NMAX 314 dn_pipe N 315 dn_pipe N+NMAX 316 317 "queue N config" configures 318 dn_fs N 319 320 "sched N config" configures 321 dn_sched N type as desired 322 dn_fs N+NMAX attached to dn_sched N 323 324 325dummynet_task() 326=============== 327The dummynet_task() function is the main dummynet processing function and is 328called every tick. This function first calculate the new current time, then 329it checks if it is the time to wake up object from the system_heap comparing 330the current time and the key of the heap. Two types of object (really the 331heap contains pointer to objects) are in the 332system_heap: 333 334- scheduler instance: if a scheduler instance is waked up, the dequeue() 335 function is called until it has credit. If the dequeue() returns packets, 336 the scheduler instance is inserted in the heap with a new key depending of 337 the data that will be send out. If the scheduler instance remains with 338 some credit, it means that is hasn't other packet to send and so the 339 instance is no longer inserted in the heap. 340 341 If the scheduler instance extracted from the heap has the DELETE flag set, 342 the dequeue() is not called and the instance is destroyed now. 343 344- delay line: when extracting a delay line, the function transmit_event() is 345 called to send out packet from delay line. 346 347 If the scheduler instance associated with this delay line doesn't exists, 348 the delay line will be delete now. 349 350Configuration 351============= 352To create a pipe, queue or scheduler, the user should type commands like: 353"ipfw pipe x config" 354"ipfw queue y config pipe x" 355"ipfw pipe x config sched <type>" 356 357The userland side of dummynet will prepare a buffer contains data to pass to 358kernel side. 359The buffer contains all struct needed to configure an object. In more detail, 360to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are needed, 361plus the delay profile struct if the pipe has a delay profile. 362 363If configuring a scheduler only the struct dn_sch is wrote in the buffer, 364while if configuring a flowset only the dn_fs struct is wrote. 365 366The first struct in the buffer contains the type of command request, that is 367if it is configuring a pipe, a queue, or a scheduler. Then there are structs 368need to configure the object, and finally there is the struct that mark 369the end of the buffer. 370 371To support the insertion of pipe and queue using the old syntax, when adding 372a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which 373have a number x + DN_PIPEOFFSET. 374 375Add a pipe 376---------- 377A pipe is only a template for a link. 378If the pipe already exists, parameters are updated. If a delay profile exists 379it is deleted and a new one is created. 380If the pipe doesn't exist a new one is created. After the creation, the 381flowset unlinked list is scanned to see if there are some flowset that would 382be linked with this pipe. If so, these flowset will be of wf2q+ type (for 383compatibility) and a new wf2q+ scheduler is created now. 384 385Add a scheduler 386--------------- 387If the scheduler already exists, and the type and the mask are the same, the 388scheduler is simply reconfigured calling the config_scheduler() scheduler 389function with the RECONFIGURE flag active. 390If the type or the mask differ, it is necessary to delete the old scheduler 391and create a new one. 392If the scheduler doesn't exists, a new one is created. If the scheduler has 393a mask, the hash table is created to store pointers to scheduler instances. 394When a new scheduler is created, it is necessary to scan the unlinked 395flowset list to search eventually flowset that would be linked with this 396scheduler number. If some are found, flowsets became of the type of this 397scheduler and they are configured properly. 398 399Add a flowset 400------------- 401Flowset pointers are store in the system in two list. The unlinked flowset list 402contains all flowset that aren't linked with a scheduler, the flowset list 403contains flowset linked to a scheduler, and so they have a type. 404When adding a new flowset, first it is checked if the flowset exists (that is, 405it is in the flowset list) and if it doesn't exists a new flowset is created 406and added to unlinked flowset list if the scheduler which the flowset would be 407linked doesn't exists, or added in the flowset list and configured properly if 408the scheduler exists. If the flowset (before to be created) was in the 409unlinked flowset list, it is removed and deleted, and then recreated. 410If the flowset exists, to allow reconfiguration of this flowset, the 411scheduler number and types must match with the one in memory. If this isn't 412so, the flowset is deleted and a new one will be created. Really, the flowset 413it isn't deleted now, but it is removed from flowset list and it will be 414deleted later because there could be some queues that are using it. 415 416Listing of object 417================= 418The user can request a list of object present in dummynet through the command 419"ipfw [-v] pipe|queue [x] list|show" 420The kernel side of dummynet send a buffer to user side that contains all 421pipe, all scheduler, all flowset, plus all scheduler instances and all queues. 422The dummynet user land will format the output and show only the relevant 423information. 424The buffer sent start with all pipe from the system. The entire struct dn_link 425is passed, except the delay_profile struct that is useless in user space. 426After pipes, all flowset are wrote in the buffer. The struct contains 427scheduler flowset specific data is linked with the flowset writing the 428'obj' id of the extension into the 'alg_fs' pointer. 429Then schedulers are wrote. If a scheduler has one or more scheduler instance, 430these are linked to the parent scheduler writing the id of the parent in the 431'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in 432the buffer and linked thorugh the 'obj' and 'sched_inst' pointer. 433Finally, flowsets in the unlinked flowset list are write in the buffer, and 434then a struct gen in saved in the buffer to mark the last struct in the buffer. 435 436 437Delete of object 438================ 439An object is usually removed by user through a command like 440"ipfw pipe|queue x delete". XXX sched? 441ipfw pass to the kernel a struct gen that contains the type and the number 442of the object to remove 443 444Delete of pipe x 445---------------- 446A pipe can be deleted by the user through the command 'ipfw pipe x delete'. 447To delete a pipe, the pipe is removed from the pipe list, and then deleted. 448Also the scheduler associated with this pipe should be deleted. 449For compatibility with old dummynet syntax, the associated FIFO scheduler and 450FIFO flowset must be deleted. 451 452Delete of flowset x 453------------------- 454To remove a flowset, we must be sure that is no longer referenced by any object. 455If the flowset to remove is in the unlinked flowset list, there is not any 456issue, the flowset can be safely removed calling a free() (the flowset 457extension is not yet created if the flowset is in this list). 458If the flowset is in the flowset list, first we remove from it so new packet 459are discarded when arrive. Next, the flowset is marked as delete. 460Now we must check if some queue is using this flowset. 461To do this, a counter (active_f) is provided. This counter indicate how many 462queues exist using this flowset. 463The active_f counter is automatically incremented when a queue is created 464and decremented when a queue is deleted. 465If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs() 466scheduler function is called before deallocate memory. 467If the counter is not 0, the flowset remain in memory until the counter become 468zero. When a queue is delete (by dn_delete_queue() function) it is checked if 469the linked flowset is deleting and if so the counter is decrementing. If the 470counter reaches 0, the flowset is deleted. 471The deletion of a queue can be done only by the scheduler, or when the scheduler 472is destroyed. 473 474Delete of scheduler x 475--------------------- 476To delete a scheduler we must be sure that any scheduler instance of this type 477are in the system_heap. To do so, a counter (inst_counter) is provided. 478This counter is managed by the system: it is incremented every time it is 479inserted in the system_heap, and decremented every time it is extracted from it. 480To delete the scheduler, first we remove it from the scheduler list, so new 481packet are discarded when they arrive, and mark the scheduler as deleting. 482 483If the counter is 0, we can remove the scheduler safely calling the 484really_deletescheduler() function. This function will scan all scheduler 485instances and call the delete_scheduler_instance() function that will delete 486the instance. When all instance are deleted, the scheduler template is 487deleted calling the delete_scheduler_template(). If the delay line associate 488with the scheduler is empty, it is deleted now, else it will be deleted when 489it will became empy. 490If the counter was not 0, we wait for it. Every time the dummynet_task() 491function extract a scheduler from the system_heap, the counter is decremented. 492If the scheduler has the delete flag enabled the dequeue() is not called and 493delete_scheduler_instance() is called to delete the instance. 494Obviously this scheduler instance is no longer inserted in the system_heap. 495If the counter reaches 0, the delete_scheduler_template() function is called 496all memory is released. 497NOTE: Flowsets that belong to this scheduler are not deleted, so if a new 498 scheduler with the same number is inserted will use these flowsets. 499 To do so, the best approach would be insert these flowset in the 500 unlinked flowset list, but doing this now will be very expensive. 501 So flowsets will remain in memory and linked with a scheduler that no 502 longer exists until a packet belonging to this flowset arrives. When 503 this packet arrives, the reconfigure() function is called because the 504 generation number mismatch with one contains in the flowset and so 505 the flowset will be moved into the flowset unlinked list, or will be 506 linked with the new scheduler if a new one was created. 507 508 509COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY 510========================================================== 511Dummynet is not compatible with old ipfw binary because internal structs are 512changed. Moreover, the old ipfw binary is not compatible with new kernels 513because the struct that represents a firewall rule has changed. So, if a user 514install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other 515commands) will not work. 516New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get. 517The old option can be used to allow compatibility with the 'ipfw' binary of 518older version (tested with 7.2 and 8.0) of FreeBSD. 519Two file are provided for this purpose: 520- ip_dummynet_glue.c translates old dummynet requests to the new ones, 521- ip_fw_glue.c converts the rule format between 7.2 and 8 versions. 522Let see in detail these two files. 523 524IP_DUMMYNET_GLUE.C 525------------------ 526The internal structs of new dummynet are very different from the original. 527Because of there are some difference from between dummynet in FreeBSD 7.2 and 528dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay 529profile and burst option), I have to include both header files. I copied 530the revision 191715 (for version 7.2) and the revision 196045 (for version 8) 531and I appended a number to each struct to mark them. 532 533The main function of this file is ip_dummynet_compat() that is called by 534ip_dn_ctl() when it receive a request of old socket option. 535 536A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using. 537This variable is set every time a request of configuration is done, because 538with this request we receive a buffer of which size depending of ipfw version. 539Because of in general the first action is a configuration, this variable is 540usually set accordly. If the first action is a request of listing of pipes 541or queues, the system cannot know the version of ipfw, and we suppose that 542version 7.2 is used. If version is wrong, the output can be senseless, but 543the application should not crash. 544 545There are four request for old dummynet: 546- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the 547 dummynet_flush() function is called; 548- IP_DUMMYNET_DEL: the delete option need to be translate. 549 It is only necessary to extract the number and the type of the object 550 (pipe or queue) to delete from the buffer received and build a new struct 551 gen contains the right parameters, then call the delete_object() function; 552- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of 553 the ipfw version. After the properly extraction of all data, that depends 554 by the ipfw version used, new structures are filled and then the dummynet 555 config_link() function is properly called. Note that the 7.2 version does 556 not support some parameter as burst or delay profile. 557- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer 558 depending of its version. There are two function that build the 559 corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These 560 functions reproduce the buffer exactly as 'ipfw' expect. The only difference 561 is that the weight parameter for a queue is no longer sent by dummynet and so 562 it is set to 0. 563 Moreover, because of the internal structure has changed, the bucket size 564 of a queue could not be correct, because now all flowset share the hash 565 table. 566 If the version of ipfw is wrong, the output could be senseless or truncated, 567 but the application should not crash. 568 569IP_FW_GLUE.C 570------------ 571The ipfw binary also is used to add rules to FreeBSD firewall. Because of the 572struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary 573to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel 574provided with FreeBSD 8. 575This file contains two functions to convert a rule from FreeBSD 7.2 format to 576FreeBSD 8 format, and viceversa. 577The conversion should be done when a rule passes from userspace to kernel space 578and viceversa. 579I have to modify the ip_fw2.c file to manage these two case, and added a 580variable (is7) to store the ipfw version used, using an approach like the 581previous file: 582- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the 583 size of the rule received correspond to FreeBSD 7.2 ipfw version. If so, the 584 rule is converted to version 8 calling the function convert_rule_to_8(). 585 Moreover, after the insertion of the rule, the rule is now reconverted to 586 version 7 because the ipfw binary will print it. 587- when the user request a list of rules (option IP_FW_GET) the is7 variable 588 should be set correctly because we suppose that a configure command was done, 589 else we suppose that the FreeBSD version is 8. The function ipfw_getrules() 590 in ip_fw2.c file return all rules, eventually converted to version 7 (if 591 the is7 is set) to the ipfw binary. 592The conversion of a rule is quite simple. The only difference between the 593two structures (struct ip_fw) is that in the new there is a new field 594(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in 595the right position in the new (or old) struct. The size of commands are not 596changed, and the copy is done into a cicle. 597 598How to configure dummynet 599========================= 600It is possible to configure dummynet through two main commands: 601'ipfw pipe' and 'ipfw queue'. 602To allow compatibility with old version, it is possible configure dummynet 603using the old command syntax. Doing so, obviously, it is only possible to 604configure a FIFO scheduler or a wf2q+ scheduler. 605A new command, 'ipfw pipe x config sched <type>' is supported to add a new 606scheduler to the system. 607 608- ipfw pipe x config ... 609 create a new pipe with the link parameters 610 create a new scheduler fifo (x + offset) 611 create a new flowset fifo (x + offset) 612 the mask is eventually stored in the FIFO scheduler 613 614- ipfw queue y config pipe x ... 615 create a new flowset y linked to sched x. 616 The type of flowset depends by the specified scheduler. 617 If the scheduler does not exist, this flowset is inserted in a special 618 list and will be not active. 619 If pipe x exists and sched does not exist, a new wf2q+ scheduler is 620 created and the flowset will be linked to this new scheduler (this is 621 done for compatibility with old syntax). 622 623- ipfw pipe x config sched <type> ... 624 create a new scheduler x of type <type>. 625 Search into the flowset unlinked list if there are some flowset that 626 should be linked with this new scheduler. 627 628- ipfw pipe x delete 629 delete the pipe x 630 delete the scheduler fifo (x + offset) 631 delete the scheduler x 632 delete the flowset fifo (x + offset) 633 634- ipfw queue x delete 635 delete the flowset x 636 637- ipfw sched x delete ///XXX 638 delete the scheduler x 639 640Follow now some examples to how configure dummynet: 641- Ex1: 642 ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay 643 A FIFO flowset and scheduler is 644 also created 645 ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset 646 will be of wf2q+ because a pipe 10 647 exists. Moreover, the wf2q+ 648 scheduler is created now. 649- Ex2: 650 ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10 651 does not exist, so this flowset 652 is inserted in the unlinked 653 flowset list. 654 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 655 Because of a flowset with 'pipe 10' exists, 656 a wf2q+ scheduler is created now and that 657 flowset is linked with this sceduler. 658 659- Ex3: 660 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 661 ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to 662 pipe 10 663 ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset 664 will belong to scheduler 10 and 665 it is of type RR 666 667- Ex4: 668 ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to 669 pipe 10 (not exist yet) 670 ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler. 671 ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset 672 will belong to scheduler 10 and 673 it is of type RR 674 ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It 675 becomes a wf2q+ scheduler. 676 When a new packet of flowset 5 arrives, 677 the flowset 5 becomes to wf2q+ type. 678 679How to implement a new scheduler 680================================ 681In dummynet, a scheduler algorithm is represented by two main structs, some 682functions and other minor structs. 683- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm 684 implemented) contains data relative to scheduler, as global parameter that 685 are common to all instances of the scheduler 686- A struct dn_sch_inst_xyz contains data relative to a single scheduler 687 instance, as local status variable depending for example by flows that 688 are linked with the scheduler, and so on. 689To add a scheduler to dummynet, the user should type a command like: 690'ipfw pipe x config sched <type> [mask ... ...]' 691This command creates a new struct dn_sch_xyz of type <type>, and 692store the optional parameter in that struct. 693 694The parameter mask determines how many scheduler instance of this 695scheduler may exist. For example, it is possible to divide traffic 696depending on the source port (or destination, or ip address...), 697so that every scheduler instance act as an independent scheduler. 698If the mask is not set, all traffic goes to the same instance. 699 700When a packet arrives to a scheduler, the system search the corrected 701scheduler instance, and if it does not exist it is created now (the 702struct dn_sch_inst_xyz is allocated by the system, and the scheduler 703fills the field correctly). It is a task of the scheduler to create 704the struct that contains all queues for a scheduler instance. 705Dummynet provides some function to create an hash table to store 706queues, but the schedule algorithm can choice the own struct. 707 708To link a flow to a scheduler, the user should type a command like: 709'ipfw queue z config pipe x [mask... ...]' 710 711This command creates a new 'dn_fs' struct that will be inserted 712in the system. If the scheduler x exists, this flowset will be 713linked to that scheduler and the flowset type become the same as 714the scheduler type. At this point, the function create_alg_fs_xyz() 715is called to allow store eventually parameter for the flowset that 716depend by scheduler (for example the 'weight' parameter for a wf2q+ 717scheduler, or some priority...). A parameter mask can be used for 718a flowset. If the mask parameter is set, the scheduler instance can 719separate packet according to its flow id (src and dst ip, ports...) 720and assign it to a separate queue. This is done by the scheduler, 721so it can ignore the mask if it wants. 722 723See now the two main structs: 724struct dn_sch_xyz { 725 struct gen g; /* important the name g */ 726 /* global params */ 727}; 728struct dn_sch_inst_xyz { 729 struct gen g; /* important the name g */ 730 /* params of the instance */ 731}; 732It is important to embed the struct gen as first parameter. The struct gen 733contains some values that the scheduler instance must fill (the 'type' of 734scheduler, the 'len' of the struct...) 735The function create_scheduler_xyz() should be implemented to initialize global 736parameters in the first struct, and if memory allocation is done it is 737mandatory to implement the delete_scheduler_template() function to free that 738memory. 739The function create_scheduler_instance_xyz() must be implemented even if the 740scheduler instance does not use extra parameters. In this function the struct 741gen fields must be filled with corrected infos. The 742delete_scheduler_instance_xyz() function must bu implemented if the instance 743has allocated some memory in the previous function. 744 745To store data belonging to a flowset the follow struct is used: 746struct alg_fs_xyz { 747 struct gen g; 748 /* fill correctly the gen struct 749 g.subtype = DN_XYZ; 750 g.len = sizeof(struct alg_fs_xyz) 751 ... 752 */ 753 /* params for the flow */ 754}; 755The create_alg_fs_xyz() function is mandatory, because it must fill the struct 756gen, but the delete_alg_fs_xyz() is mandatory only if the previous function 757has allocated some memory. 758 759A struct dn_queue contains packets belonging to a queue and some statistical 760data. The scheduler could have to store data in this struct, so it must define 761a dn_queue_xyz struct: 762struct dn_queue_xyz { 763 struct dn_queue q; 764 /* parameter for a queue */ 765} 766 767All structures are allocated by the system. To do so, the scheduler must 768set the size of its structs in the scheduler descriptor: 769scheduler_size: sizeof(dn_sch_xyz) 770scheduler_i_size: sizeof(dn_sch_inst_xyz) 771flowset_size: sizeof(alg_fs_xyz) 772queue_size: sizeof(dn_queue_xyz); 773The scheduler_size could be 0, but other struct must have at least a struct gen. 774 775 776After the definition of structs, it is necessary to implement the 777scheduler functions. 778 779- int (*config_scheduler)(char *command, void *sch, int reconfigure); 780 Configure a scheduler, or reconfigure if 'reconfigure' == 1. 781 This function performs additional allocation and initialization of global 782 parameter for this scheduler. 783 If memory is allocated here, the delete_scheduler_template() function 784 should be implemented to remove this memory. 785- int (*delete_scheduler_template)(void* sch); 786 Delete a scheduler template. This function is mandatory if the scheduler 787 uses extra data respect the struct dn_sch. 788- int (*create_scheduler_instance)(void *s); 789 Create a new scheduler instance. The system allocate the necessary memory 790 and the schedulet can access it using the 's' pointer. 791 The scheduler instance stores all queues, and to do this can use the 792 hash table provided by the system. 793- int (*delete_scheduler_instance)(void *s); 794 Delete a scheduler instance. It is important to free memory allocated 795 by create_scheduler_instance() function. The memory allocated by system 796 is freed by the system itself. The struct contains all queue also has 797 to be deleted. 798- int (*enqueue)(void *s, struct gen *f, struct mbuf *m, 799 struct ipfw_flow_id *id); 800 Called when a packet arrives. The packet 'm' belongs to the scheduler 801 instance 's', has a flowset 'f' and the flowid 'id' has already been 802 masked. The enqueue() must call dn_queue_packet(q, m) function to really 803 enqueue packet in the queue q. The queue 'q' is chosen by the scheduler 804 and if it does not exist should be created calling the dn_create_queue() 805 function. If the schedule want to drop the packet, it must call the 806 dn_drop_packet() function and then return 1. 807- struct mbuf * (*dequeue)(void *s); 808 Called when the timer expires (or when a packet arrives and the scheduler 809 instance is idle). 810 This function is called when at least a packet can be send out. The 811 scheduler choices the packet and returns it; if no packet are in the 812 schedulerinstance, the function must return NULL. 813 Before return a packet, it is important to call the function 814 dn_return_packet() to update some statistic of the queue and update the 815 queue counters. 816- int (*drain_queue)(void *s, int flag); 817 The system request to scheduler to delete all queues that is not using 818 to free memory. The flag parameter indicate if a queue must be deleted 819 even if it is active. 820 821- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure); 822 It is called when a flowset is linked with a scheduler. This is done 823 when the scheduler is defined, so we can know the type of flowset. 824 The function initialize the flowset paramenter parsing the command 825 line. The parameter will be stored in the g struct that have the right 826 size allocated by the system. If the reconfigure flag is set, it means 827 that the flowset is reconfiguring 828- int (*delete_alg_fs)(struct gen *f); 829 It is called when a flowset is deleting. Must remove the memory allocate 830 by the create_alg_fs() function. 831 832- int (*create_queue_alg)(struct dn_queue *q, struct gen *f); 833 Called when a queue is created. The function should link the queue 834 to the struct used by the scheduler instance to store all queues. 835- int (*delete_queue_alg)(struct dn_queue *q); 836 Called when a queue is deleting. The function should remove extra data 837 and update the struct contains all queues in the scheduler instance. 838 839The struct scheduler represent the scheduler descriptor that is passed to 840dummynet when a scheduler module is loaded. 841This struct contains the type of scheduler, the length of all structs and 842all function pointers. 843If a function is not implemented should be initialize to NULL. Some functions 844are mandatory, other are mandatory if some memory should be freed. 845Mandatory functions: 846- create_scheduler_instance() 847- enqueue() 848- dequeue() 849- create_alg_fs() 850- drain_queue() 851Optional functions: 852- config_scheduler() 853- create_queue_alg() 854Mandatory functions if the corresponding create...() has allocated memory: 855- delete_scheduler_template() 856- delete_scheduler_instance() 857- delete_alg_fs() 858- delete_queue_alg() 859 860