xref: /freebsd/sys/netpfil/ipfw/dummynet.txt (revision f37852c17391fdf0e8309bcf684384dd0d854e43)
1#
2# $FreeBSD$
3#
4
5Notes on the internal structure of dummynet (2010 version)
6by Riccardo Panicucci and Luigi Rizzo
7Work supported by the EC project ONELAB2
8
9
10*********
11* INDEX *
12*********
13Implementation of new dummynet
14    Internal structure
15    Files
16Packet arrival
17    The reconfiguration routine
18dummynet_task()
19Configuration
20    Add a pipe
21    Add a scheduler
22    Add a flowset
23Listing object
24Delete of object
25    Delete a pipe
26    Delete a flowset
27    Delete a scheduler
28Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary
29    ip_dummynet_glue.c
30    ip_fw_glue.c
31How to configure dummynet
32How to implement a new scheduler
33
34
35
36OPEN ISSUES
37------------------------------
3820100131 deleting RR causes infinite loop
39	presumably in the rr_free_queue() call -- seems to hang
40	forever when deleting a live flow
41------------------------------
42
43Dummynet is a traffic shaper and network emulator. Packets are
44selected by an external filter such as ipfw, and passed to the emulator
45with a tag such as "pipe 10" or "queue 5" which tells what to
46do with the packet. As an example
47
48	ipfw add queue 5 icmp from 10.0.0.2 to all
49
50All packets with the same tag belong to a "flowset", or a set
51of flows which can be further partitioned according to a mask.
52Flowsets are then passed to a scheduler for processing. The
53association of flowsets and schedulers is configurable e.g.
54
55	ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
56	ipfw queue 8 config sched 10 weight 1 ...
57	ipfw queue 3 config sched 20 weight 1 ...
58
59"sched 10" represents one or more scheduler instances,
60selected through a mask on the 5-tuple itself.
61
62	ipfw sched 20 config type FIFO sched_mask yyy ...
63
64There are in fact two masks applied to each packet:
65+ the "sched_mask" sends packets arriving to a scheduler_id to
66  one of many instances.
67+ the "flow_mask" together with the flowset_id is used to
68  collect packets into independent flows on each scheduler.
69
70As an example, we can have
71	ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
72	ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00
73
74means that sched 10 will have one instance per /24 source subnet,
75and within that, each individual source will be a flow.
76
77Internal structure
78-----------------
79Dummynet-related data is split into several data structures,
80part of them constituting the userland-kernel API, and others
81specific to the kernel.
82NOTE: for up-to-date details please look at the relevant source
83	headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h)
84
85USERLAND-KERNEL API	(ip_dummynet.h)
86
87    struct dn_link:
88	contains data about the physical link such as
89	bandwidth, delay, burst size;
90
91    struct dn_fs:
92	describes a flowset, i.e. a template for queues.
93	Main parameters are the scheduler we attach to, a flow_mask,
94	buckets, queue size, plr, weight, and other scheduler-specific
95	parameters.
96
97    struct dn_flow
98	contains information on a flow, including masks and
99	statistics
100
101    struct dn_sch:
102	defines a scheduler (and a link attached to it).
103	Parameters include scheduler type, sched_mask, number of
104	buckets, and possibly other scheduler-specific parameters,
105
106    struct dn_profile:
107	fields to simulate a delay profile
108
109
110KERNEL REPRESENTATION	(ip_dn_private.h)
111
112    struct mq
113	a queue of mbufs with head and tail.
114
115    struct dn_queue
116	individual queue of packets, created by a flowset using
117	flow_mask and attached to a scheduler instance selected
118	through sched_mask.
119	A dn_queue has a pointer to the dn_fsk (which in turn counts
120	how many queues point to it), a pointer to the
121	dn_sch_inst it attaches to, and is in a hash table in the
122	flowset. scheduler instances also should store queues in
123	their own containers used for scheduling (lists, trees, etc.)
124	CREATE: done on packet arrivals when a flow matches a flowset.
125	DELETE: done only when deleting the parent dn_sch_inst
126		or draining memory.
127
128    struct dn_fsk
129	includes a dn_fs; a pointer to the dn_schk; a link field
130	for the list of dn_fsk attached to the same scheduler,
131	or for the unlinked list;
132	a refcount for the number of queues pointing to it;
133	The dn_fsk is in a hash table, fshash.
134	CREATE: done on configuration commands.
135	DELETE: on configuration commands.
136
137    struct dn_sch_inst
138	a scheduler instance, created from a dn_schk applying sched_mask.
139	Contains a delay line, a reference to the parent, and scheduler-
140	specific info.  Both dn_sch_inst and its delay line can be in the
141	evheap if they have events to be processed.
142	CREATE: created from a dn_schk applying sched_mask
143	DELETE: configuration command delete a scheduler which in turn
144		sweeps the hash table of instances deleting them
145
146    struct dn_schk
147	includes dn_sch, dn_link, a pointer to dn_profile,
148	a hash table of dn_sch_inst, a list of dn_fsk
149	attached to it.
150	CREATE: configuration command. If there are flowsets that
151		refer to this number, they are attached and moved
152		to the hash table
153	DELETE: manual, see dn_sch_inst
154
155
156	fshash                            schedhash
157      +---------------+   sched        +--------------+
158      |      sched-------------------->|      NEW_SCHK|
159  -<----*sch_chain    |<-----------------*fsk_list    |
160      |NEW_FSK        |<----.          | [dn_link]    |
161      +---------------+     |          +--------------+
162      |qht (hash)     |     |          |  siht(hash)  |
163      |   [dn_queue]  |     |          |  [dn_si]     |
164      |   [dn_queue]  |     |          |  [dn_si]     |
165      |     ...       |     |          |   ...        |
166      |   +--------+  |     |          | +---------+  |
167      |   |dn_queue|  |     |          | |dn_si    |  |
168      |  |    fs *----------'          | |         |  |
169      |  |    si *---------------------->|         |  |
170      |  +---------+  |                | +---------+  |
171      +---------------+                +--------------+
172
173The following global data structures contain all
174schedulers and flowsets.
175
176- schedhash[x]: contains all scheduler templates in the system.
177	Looked up only on manual configurations, where flowsets
178	are attached to matching schedulers.
179	We have one entry per 'sched X config' command
180	(plus one for each 'pipe X config').
181
182- fshash[x]: contains all flowsets.
183	We do a lookup on this for each packet.
184	We have one entry for each 'queue X config'
185	(plus one for each 'pipe X config').
186
187Additionally, a list that contains all unlinked flowset:
188- fsu:  contains flowset that are not linked with any scheduler.
189	Flowset are put in this list when they refer to a non
190	existing scheduler.
191	We don't need an efficient data structure as we never search
192	here on a packet arrivals.
193
194Scheduler instances and the delay lines associated with each scheduler
195instance need to be woken up at certain times. Because we have many
196such objects, we keep them in a priority heap (system_heap).
197
198Almost all objects in this implementation are preceded by a structure
199(struct dn_id) which makes it easier to identify them.
200
201
202Files
203-----
204The dummynet code is split in several files.
205All kernel code is in sys/netpfil/ipfw except ip_dummynet.h
206All userland code is in sbin/ipfw.
207Files are
208- sys/netpfil/ip_dummynet.h defines the kernel-userland API
209- ip_dn_private.h contains the kernel-specific APIs
210  and data structures
211- dn_sched.h defines the scheduler API
212- ip_dummynet.c cointains module glue and sockopt handlers, with all
213  functions to configure and list objects.
214- ip_dn_io.c contains the functions directly related to packet processing,
215  and run in the critical path. It also contains some functions
216  exported to the schedulers.
217- dn_heap.[ch] implement a binary heap and a generic hash table
218- dn_sched_* implement the various scheduler modules
219
220- dummynet.c is the file used to implement the user side of dummynet.
221  It contains the function to parsing command line, and functions to
222  show the output of dummynet objects.
223Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that
224are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and
225FreeBSD 8.
226
227LOCKING
228=======
229At the moment the entire processing occurs under a single lock
230which is expected to be acquired in exclusive mode
231DN_BH_WLOCK() / DN_BH_WUNLOCK().
232
233In perspective we aim at the following:
234- the 'busy' flag, 'pending' list and all structures modified by packet
235  arrivals and departures are protected by the BH_WLOCK.
236  This is normally acquired in exclusive mode by the packet processing
237  functions for short sections of code (exception -- the timer).
238  If 'busy' is not set, we can do regular packet processing.
239  If 'busy' is set, no pieces can be accessed.
240  We must enqueue the packet on 'pending' and return immediately.
241
242- the 'busy' flag is set/cleared by long sections of code as follows:
243	UH_WLOCK(); KASSERT(busy == 0);
244	BH_WLOCK(); busy=1; BH_WUNLOCK();
245	... do processing ...
246	BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK();
247	UH_WUNLOCK();
248  this normally happens when the upper half has something heavy
249  to do. The prologue and epilogue are not in the critical path.
250
251- the main containers (fshash, schedhash, ...) are protected by
252  UH_WLOCK.
253
254Packet processing
255=================
256A packet enters dummynet through dummynet_io(). We first lookup
257the flowset number in fshash using dn_ht_find(), then find the scheduler
258instance using ipdn_si_find(), then possibly identify the correct
259queue with ipdn_q_find().
260If successful, we call the scheduler's enqueue function(), and
261if needed start I/O on the link calling serve_sched().
262If the packet can be returned immediately, this is done by
263leaving *m0 set. Otherwise, the packet is absorbed by dummynet
264and we simply return, possibly with some appropriate error code.
265
266Reconfiguration
267---------------
268Reconfiguration is the complex part of the system because we need to
269keep track of the various objects and containers.
270At the moment we do not use reference counts for objects so all
271processing must be done under a lock.
272
273The main entry points for configuration is the ip_dn_ctl() handler
274for the IP_DUMMYNET3 sockopt (others are provided only for backward
275compatibility). Modifications to the configuration call do_config().
276The argument is a sequence of blocks each starting with a  struct dn_id
277which specifies its content.
278The first dn_id must contain as obj.id the DN_API_VERSION
279The obj.type is DN_CMD_CONFIG (followed by actual objects),
280DN_CMD_DELETE (with the correct subtype and list of objects), or
281DN_CMD_FLUSH.
282
283DN_CMD_CONFIG is followed by objects to add/reconfigure. In general,
284if an object already exists it is reconfigured, otherwise it is
285created in a way that keeps the structure consistent.
286We have the following objects in the system, normally numbered with
287an identifier N between 1 and 65535. For certain objects we have
288"shadow" copies numbered I+NMAX and I+ 2*NMAX which are used to
289implement certain backward compatibility features.
290
291In general we have the following linking
292
293  TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..."
294	corresponds to a dn_fs object numbered N
295
296  TRADITIONAL DUMMYNET PIPES "pipe N config ..."
297	dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX
298
299  GENERIC SCHEDULER "sched N config ... "
300	[dn_fs N+NMAX] --> dn_sch N --> dn_link N
301	The flowset N+NMAX is created only if the scheduler is not
302	of type MULTIQUEUE.
303
304  DELAY PROFILE	"pipe N config profile ..."
305	it is always attached to an existing dn_link N
306
307Because traditional dummynet pipes actually configure both a
308'standalone' instance and one that can be used by queues,
309we do the following:
310
311    "pipe N config ..." configures:
312	dn_sched N type WF2Q+
313	dn_sched N+NMAX type FIFO
314	dn_fs N+2NMAX attached to dn_sched N+NMAX
315	dn_pipe N
316	dn_pipe N+NMAX
317
318    "queue N config" configures
319	dn_fs N
320
321    "sched N config" configures
322	dn_sched N type as desired
323	dn_fs N+NMAX attached to dn_sched N
324
325
326dummynet_task()
327===============
328The dummynet_task() function is the main dummynet processing function and is
329called every tick. This function first calculate the new current time, then
330it checks if it is the time to wake up object from the system_heap comparing
331the current time and the key of the heap. Two types of object (really the
332heap contains pointer to objects) are in the
333system_heap:
334
335- scheduler instance: if a scheduler instance is waked up, the dequeue()
336  function is called until it has credit. If the dequeue() returns packets,
337  the scheduler instance is inserted in the heap with a new key depending of
338  the data that will be send out. If the scheduler instance remains with
339  some credit, it means that is hasn't other packet to send and so the
340  instance is no longer inserted in the heap.
341
342  If the scheduler instance extracted from the heap has the DELETE flag set,
343  the dequeue() is not called and the instance is destroyed now.
344
345- delay line: when extracting a delay line, the function transmit_event() is
346  called to send out packet from delay line.
347
348  If the scheduler instance associated with this delay line doesn't exists,
349  the delay line will be delete now.
350
351Configuration
352=============
353To create a pipe, queue or scheduler, the user should type commands like:
354"ipfw pipe x config"
355"ipfw queue y config pipe x"
356"ipfw pipe x config sched <type>"
357
358The userland side of dummynet will prepare a buffer contains data to pass to
359kernel side.
360The buffer contains all struct needed to configure an object. In more detail,
361to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are needed,
362plus the delay profile struct if the pipe has a delay profile.
363
364If configuring a scheduler only the struct dn_sch is wrote in the buffer,
365while if configuring a flowset only the dn_fs struct is wrote.
366
367The first struct in the buffer contains the type of command request, that is
368if it is configuring a pipe, a queue, or a scheduler. Then there are structs
369need to configure the object, and finally there is the struct that mark
370the end of the buffer.
371
372To support the insertion of pipe and queue using the old syntax, when adding
373a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which
374have a number x + DN_PIPEOFFSET.
375
376Add a pipe
377----------
378A pipe is only a template for a link.
379If the pipe already exists, parameters are updated. If a delay profile exists
380it is deleted and a new one is created.
381If the pipe doesn't exist a new one is created. After the creation, the
382flowset unlinked list is scanned to see if there are some flowset that would
383be linked with this pipe. If so, these flowset will be of wf2q+ type (for
384compatibility) and a new wf2q+ scheduler is created now.
385
386Add a scheduler
387---------------
388If the scheduler already exists, and the type and the mask are the same, the
389scheduler is simply reconfigured calling the config_scheduler() scheduler
390function with the RECONFIGURE flag active.
391If the type or the mask differ, it is necessary to delete the old scheduler
392and create a new one.
393If the scheduler doesn't exists, a new one is created. If the scheduler has
394a mask, the hash table is created to store pointers to scheduler instances.
395When a new scheduler is created, it is necessary to scan the unlinked
396flowset list to search eventually flowset that would be linked with this
397scheduler number. If some are found, flowsets became of the type of this
398scheduler and they are configured properly.
399
400Add a flowset
401-------------
402Flowset pointers are store in the system in two list. The unlinked flowset list
403contains all flowset that aren't linked with a scheduler, the flowset list
404contains flowset linked to a scheduler, and so they have a type.
405When adding a new flowset, first it is checked if the flowset exists (that is,
406it is in the flowset list) and if it doesn't exists a new flowset is created
407and added to unlinked flowset list if the scheduler which the flowset would be
408linked doesn't exists, or added in the flowset list and configured properly if
409the scheduler exists. If the flowset (before to be created) was in the
410unlinked flowset list, it is removed and deleted, and then recreated.
411If the flowset exists, to allow reconfiguration of this flowset, the
412scheduler number and types must match with the one in memory. If this isn't
413so, the flowset is deleted and a new one will be created. Really, the flowset
414it isn't deleted now, but it is removed from flowset list and it will be
415deleted later because there could be some queues that are using it.
416
417Listing of object
418=================
419The user can request a list of object present in dummynet through the command
420"ipfw [-v] pipe|queue [x] list|show"
421The kernel side of dummynet send a buffer to user side that contains all
422pipe, all scheduler, all flowset, plus all scheduler instances and all queues.
423The dummynet user land will format the output and show only the relevant
424information.
425The buffer sent start with all pipe from the system. The entire struct dn_link
426is passed, except the delay_profile struct that is useless in user space.
427After pipes, all flowset are wrote in the buffer. The struct contains
428scheduler flowset specific data is linked with the flowset writing the
429'obj' id of the extension into the 'alg_fs' pointer.
430Then schedulers are wrote. If a scheduler has one or more scheduler instance,
431these are linked to the parent scheduler writing the id of the parent in the
432'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in
433the buffer and linked thorugh the 'obj' and 'sched_inst' pointer.
434Finally, flowsets in the unlinked flowset list  are write in the buffer, and
435then a struct gen in saved in the buffer to mark the last struct in the buffer.
436
437
438Delete of object
439================
440An object is usually removed by user through a command like
441"ipfw pipe|queue x delete". XXX sched?
442ipfw pass to the kernel a struct gen that contains the type and the number
443of the object to remove
444
445Delete of pipe x
446----------------
447A pipe can be deleted by the user through the command 'ipfw pipe x delete'.
448To delete a pipe, the pipe is removed from the pipe list, and then deleted.
449Also the scheduler associated with this pipe should be deleted.
450For compatibility with old dummynet syntax, the associated FIFO scheduler and
451FIFO flowset must be deleted.
452
453Delete of flowset x
454-------------------
455To remove a flowset, we must be sure that is no longer referenced by any object.
456If the flowset to remove is in the unlinked flowset list, there is not any
457issue, the flowset can be safely removed calling a free() (the flowset
458extension is not yet created if the flowset is in this list).
459If the flowset is in the flowset list, first we remove from it so new packet
460are discarded when arrive. Next, the flowset is marked as delete.
461Now we must check if some queue is using this flowset.
462To do this, a counter (active_f) is provided. This counter indicate how many
463queues exist using this flowset.
464The active_f counter is automatically incremented when a queue is created
465and decremented when a queue is deleted.
466If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs()
467scheduler function is called before deallocate memory.
468If the counter is not 0, the flowset remain in memory until the counter become
469zero. When a queue is delete (by dn_delete_queue() function) it is checked if
470the linked flowset is deleting and if so the counter is decrementing. If the
471counter reaches 0, the flowset is deleted.
472The deletion of a queue can be done only by the scheduler, or when the scheduler
473is destroyed.
474
475Delete of scheduler x
476---------------------
477To delete a scheduler we must be sure that any scheduler instance of this type
478are in the system_heap. To do so, a counter (inst_counter) is provided.
479This counter is managed by the system: it is incremented every time it is
480inserted in the system_heap, and decremented every time it is extracted from it.
481To delete the scheduler, first we remove it from the scheduler list, so new
482packet are discarded when they arrive, and mark the scheduler as deleting.
483
484If the counter is 0, we can remove the scheduler safely calling the
485really_deletescheduler() function. This function will scan all scheduler
486instances and call the delete_scheduler_instance() function that will delete
487the instance. When all instance are deleted, the scheduler template is
488deleted calling the delete_scheduler_template(). If the delay line associate
489with the scheduler is empty, it is deleted now, else it will be deleted when
490it will became empy.
491If the counter was not 0, we wait for it. Every time the dummynet_task()
492function extract a scheduler from the system_heap, the counter is decremented.
493If the scheduler has the delete flag enabled the dequeue() is not called and
494delete_scheduler_instance() is called to delete the instance.
495Obviously this scheduler instance is no longer inserted in the system_heap.
496If the counter reaches 0, the delete_scheduler_template() function is called
497all memory is released.
498NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
499      scheduler with the same number is inserted will use these flowsets.
500      To do so, the best approach would be insert these flowset in the
501      unlinked flowset list, but doing this now will be very expensive.
502      So flowsets will remain in memory and linked with a scheduler that no
503      longer exists until a packet belonging to this flowset arrives. When
504      this packet arrives, the reconfigure() function is called because the
505      generation number mismatch with one contains in the flowset and so
506      the flowset will be moved into the flowset unlinked list, or will be
507      linked with the new scheduler if a new one was created.
508
509
510COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
511==========================================================
512Dummynet is not compatible with old ipfw binary because internal structs are
513changed. Moreover, the old ipfw binary is not compatible with new kernels
514because the struct that represents a firewall rule has changed. So, if a user
515install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other
516commands) will not work.
517New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get.
518The old option can be used to allow compatibility with the 'ipfw' binary of
519older version (tested with 7.2 and 8.0) of FreeBSD.
520Two file are provided for this purpose:
521- ip_dummynet_glue.c translates old dummynet requests to the new ones,
522- ip_fw_glue.c converts the rule format between 7.2 and 8 versions.
523Let see in detail these two files.
524
525IP_DUMMYNET_GLUE.C
526------------------
527The internal structs of new dummynet are very different from the original.
528Because of there are some difference from between dummynet in FreeBSD 7.2 and
529dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay
530profile and burst option), I have to include both header files. I copied
531the revision 191715 (for version 7.2) and the revision 196045 (for version 8)
532and I appended a number to each struct to mark them.
533
534The main function of this file is ip_dummynet_compat() that is called by
535ip_dn_ctl() when it receive a request of old socket option.
536
537A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using.
538This variable is set every time a request of configuration is done, because
539with this request we receive a buffer of which size depending of ipfw version.
540Because of in general the first action is a configuration, this variable is
541usually set accordly. If the first action is a request of listing of pipes
542or queues, the system cannot know the version of ipfw, and we suppose that
543version 7.2 is used. If version is wrong, the output can be senseless, but
544the application should not crash.
545
546There are four request for old dummynet:
547- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the
548  dummynet_flush() function is called;
549- IP_DUMMYNET_DEL: the delete option need to be translate.
550  It is only necessary to extract the number and the type of the object
551  (pipe or queue) to delete from the buffer received and build a new struct
552  gen contains the right parameters, then call the delete_object() function;
553- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of
554  the ipfw version. After the properly extraction of all data, that depends
555  by the ipfw version used, new structures are filled and then the dummynet
556  config_link() function is properly called. Note that the 7.2 version does
557  not support some parameter as burst or delay profile.
558- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer
559  depending of its version. There are two function that build the
560  corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
561  functions reproduce the buffer exactly as 'ipfw' expect. The only difference
562  is that the weight parameter for a queue is no longer sent by dummynet and so
563  it is set to 0.
564  Moreover, because of the internal structure has changed, the bucket size
565  of a queue could not be correct, because now all flowset share the hash
566  table.
567  If the version of ipfw is wrong, the output could be senseless or truncated,
568  but the application should not crash.
569
570IP_FW_GLUE.C
571------------
572The ipfw binary also is used to add rules to FreeBSD firewall. Because of the
573struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary
574to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel
575provided with FreeBSD 8.
576This file contains two functions to convert a rule from FreeBSD 7.2 format to
577FreeBSD 8 format, and viceversa.
578The conversion should be done when a rule passes from userspace to kernel space
579and viceversa.
580I have to modify the ip_fw2.c file to manage these two case, and added a
581variable (is7) to store the ipfw version used, using an approach like the
582previous file:
583- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
584  size of the rule received correspond to FreeBSD 7.2 ipfw version. If so, the
585  rule is converted to version 8 calling the function convert_rule_to_8().
586  Moreover, after the insertion of the rule, the rule is now reconverted to
587  version 7 because the ipfw binary will print it.
588- when the user request a list of rules (option IP_FW_GET) the is7 variable
589  should be set correctly because we suppose that a configure command was done,
590  else we suppose that the FreeBSD version is 8. The function ipfw_getrules()
591  in ip_fw2.c file return all rules, eventually converted to version 7 (if
592  the is7 is set) to the ipfw binary.
593The conversion of a rule is quite simple. The only difference between the
594two structures (struct ip_fw) is that in the new there is a new field
595(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in
596the right position in the new (or old) struct. The size of commands are not
597changed, and the copy is done into a cicle.
598
599How to configure dummynet
600=========================
601It is possible to configure dummynet through two main commands:
602'ipfw pipe' and 'ipfw queue'.
603To allow compatibility with old version, it is possible configure dummynet
604using the old command syntax. Doing so, obviously, it is only possible to
605configure a FIFO scheduler or a wf2q+ scheduler.
606A new command, 'ipfw pipe x config sched <type>' is supported to add a new
607scheduler to the system.
608
609- ipfw pipe x config ...
610  create a new pipe with the link parameters
611  create a new scheduler fifo (x + offset)
612  create a new flowset fifo (x + offset)
613  the mask is eventually stored in the FIFO scheduler
614
615- ipfw queue y config pipe x ...
616  create a new flowset y linked to sched x.
617    The type of flowset depends by the specified scheduler.
618    If the scheduler does not exist, this flowset is inserted in a special
619    list and will be not active.
620    If pipe x exists and sched does not exist, a new wf2q+ scheduler is
621    created and the flowset will be linked to this new scheduler (this is
622    done for compatibility with old syntax).
623
624- ipfw pipe x config sched <type> ...
625  create a new scheduler x of type <type>.
626  Search into the flowset unlinked list if there are some flowset that
627  should be linked with this new scheduler.
628
629- ipfw pipe x delete
630  delete the pipe x
631  delete the scheduler fifo (x + offset)
632  delete the scheduler x
633  delete the flowset fifo (x + offset)
634
635- ipfw queue x delete
636  delete the flowset x
637
638- ipfw sched x delete ///XXX
639  delete the scheduler x
640
641Follow now some examples to how configure dummynet:
642- Ex1:
643  ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay
644                                        A FIFO flowset and scheduler is
645                                        also created
646  ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset
647                                           will be of wf2q+ because a pipe 10
648                                           exists. Moreover, the wf2q+
649                                           scheduler is created now.
650- Ex2:
651  ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10
652                                           does not exist, so this flowset
653                                           is inserted in the unlinked
654                                           flowset list.
655  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
656                               Because of a flowset with 'pipe 10' exists,
657                               a wf2q+ scheduler is created now and that
658                               flowset is linked with this sceduler.
659
660- Ex3:
661  ipfw pipe 10 config bw...    // Create a pipe, a FIFO flowset and scheduler.
662  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
663                                  pipe 10
664  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset
665                                           will belong to scheduler 10 and
666                                           it is of type RR
667
668- Ex4:
669  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
670                                  pipe 10 (not exist yet)
671  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
672  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset
673                                           will belong to scheduler 10 and
674                                           it is of type RR
675  ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It
676                                     becomes a wf2q+ scheduler.
677                                     When a new packet of flowset 5 arrives,
678                                     the flowset 5 becomes to wf2q+ type.
679
680How to implement a new scheduler
681================================
682In dummynet, a scheduler algorithm is represented by two main structs, some
683functions and other minor structs.
684- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm
685  implemented) contains data relative to scheduler, as global parameter that
686  are common to all instances of the scheduler
687- A struct dn_sch_inst_xyz contains data relative to a single scheduler
688  instance, as local status variable depending for example by flows that
689  are linked with the scheduler, and so on.
690To add a scheduler to dummynet, the user should type a command like:
691'ipfw pipe x config sched <type> [mask ... ...]'
692This command creates a new struct dn_sch_xyz of type <type>, and
693store the optional parameter in that struct.
694
695The parameter mask determines how many scheduler instance of this
696scheduler may exist. For example, it is possible to divide traffic
697depending on the source port (or destination, or ip address...),
698so that every scheduler instance act as an independent scheduler.
699If the mask is not set, all traffic goes to the same instance.
700
701When a packet arrives to a scheduler, the system search the corrected
702scheduler instance, and if it does not exist it is created now (the
703struct dn_sch_inst_xyz is allocated by the system, and the scheduler
704fills the field correctly). It is a task of the scheduler to create
705the struct that contains all queues for a scheduler instance.
706Dummynet provides some function to create an hash table to store
707queues, but the schedule algorithm can choice the own struct.
708
709To link a flow to a scheduler, the user should type a command like:
710'ipfw queue z config pipe x [mask... ...]'
711
712This command creates a new 'dn_fs' struct that will be inserted
713in the system.  If the scheduler x exists, this flowset will be
714linked to that scheduler and the flowset type become the same as
715the scheduler type. At this point, the function create_alg_fs_xyz()
716is called to allow store eventually parameter for the flowset that
717depend by scheduler (for example the 'weight' parameter for a wf2q+
718scheduler, or some priority...). A parameter mask can be used for
719a flowset. If the mask parameter is set, the scheduler instance can
720separate packet according to its flow id (src and dst ip, ports...)
721and assign it to a separate queue. This is done by the scheduler,
722so it can ignore the mask if it wants.
723
724See now the two main structs:
725struct dn_sch_xyz {
726    struct gen g; /* important the name g */
727    /* global params */
728};
729struct dn_sch_inst_xyz {
730    struct gen g; /* important the name g */
731    /* params of the instance */
732};
733It is important to embed the struct gen as first parameter. The struct gen
734contains some values that the scheduler instance must fill (the 'type' of
735scheduler, the 'len' of the struct...)
736The function create_scheduler_xyz() should be implemented to initialize global
737parameters in the first struct, and if memory allocation is done it is
738mandatory to implement the delete_scheduler_template() function to free that
739memory.
740The function create_scheduler_instance_xyz() must be implemented even if the
741scheduler instance does not use extra parameters. In this function the struct
742gen fields must be filled with corrected infos. The
743delete_scheduler_instance_xyz() function must bu implemented if the instance
744has allocated some memory in the previous function.
745
746To store data belonging to a flowset the follow struct is used:
747struct alg_fs_xyz {
748    struct gen g;
749    /* fill correctly the gen struct
750     g.subtype = DN_XYZ;
751     g.len = sizeof(struct alg_fs_xyz)
752     ...
753     */
754    /* params for the flow */
755};
756The create_alg_fs_xyz() function is mandatory, because it must fill the struct
757gen, but the delete_alg_fs_xyz() is mandatory only if the previous function
758has allocated some memory.
759
760A struct dn_queue contains packets belonging to a queue and some statistical
761data. The scheduler could have to store data in this struct, so it must define
762a dn_queue_xyz struct:
763struct dn_queue_xyz {
764    struct dn_queue q;
765    /* parameter for a queue */
766}
767
768All structures are allocated by the system. To do so, the scheduler must
769set the size of its structs in the scheduler descriptor:
770scheduler_size:     sizeof(dn_sch_xyz)
771scheduler_i_size:   sizeof(dn_sch_inst_xyz)
772flowset_size:       sizeof(alg_fs_xyz)
773queue_size:         sizeof(dn_queue_xyz);
774The scheduler_size could be 0, but other struct must have at least a struct gen.
775
776
777After the definition of structs, it is necessary to implement the
778scheduler functions.
779
780- int (*config_scheduler)(char *command, void *sch, int reconfigure);
781    Configure a scheduler, or reconfigure if 'reconfigure' == 1.
782    This function performs additional allocation and initialization of global
783    parameter for this scheduler.
784    If memory is allocated here, the delete_scheduler_template() function
785    should be implemented to remove this memory.
786- int (*delete_scheduler_template)(void* sch);
787    Delete a scheduler template. This function is mandatory if the scheduler
788    uses extra data respect the struct dn_sch.
789- int (*create_scheduler_instance)(void *s);
790    Create a new scheduler instance. The system allocate the necessary memory
791    and the schedulet can access it using the 's' pointer.
792    The scheduler instance stores all queues, and to do this can use the
793    hash table provided by the system.
794- int (*delete_scheduler_instance)(void *s);
795    Delete a scheduler instance. It is important to free memory allocated
796    by create_scheduler_instance() function. The memory allocated by system
797    is freed by the system itself. The struct contains all queue also has
798    to be deleted.
799- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
800                 struct ipfw_flow_id *id);
801    Called when a packet arrives. The packet 'm' belongs to the scheduler
802    instance 's', has a flowset 'f' and the flowid 'id' has already been
803    masked. The enqueue() must call dn_queue_packet(q, m) function to really
804    enqueue packet in the queue q. The queue 'q' is chosen by the scheduler
805    and if it does not exist should be created calling the dn_create_queue()
806    function. If the schedule want to drop the packet, it must call the
807    dn_drop_packet() function and then return 1.
808- struct mbuf * (*dequeue)(void *s);
809    Called when the timer expires (or when a packet arrives and the scheduler
810    instance is idle).
811    This function is called when at least a packet can be send out. The
812    scheduler choices the packet and returns it; if no packet are in the
813    schedulerinstance, the function must return NULL.
814    Before return a packet, it is important to call the function
815    dn_return_packet() to update some statistic of the queue and update the
816    queue counters.
817- int (*drain_queue)(void *s, int flag);
818    The system request to scheduler to delete all queues that is not using
819    to free memory. The flag parameter indicate if a queue must be deleted
820    even if it is active.
821
822- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
823    It is called when a flowset is linked with a scheduler. This is done
824    when the scheduler is defined, so we can know the type of flowset.
825    The function initialize the flowset paramenter parsing the command
826    line. The parameter will be stored in the g struct that have the right
827    size allocated by the system. If the reconfigure flag is set, it means
828    that the flowset is reconfiguring
829- int (*delete_alg_fs)(struct gen *f);
830    It is called when a flowset is deleting. Must remove the memory allocate
831    by the create_alg_fs() function.
832
833- int (*create_queue_alg)(struct dn_queue *q, struct gen *f);
834    Called when a queue is created. The function should link the queue
835    to the struct used by the scheduler instance to store all queues.
836- int (*delete_queue_alg)(struct dn_queue *q);
837    Called when a queue is deleting. The function should remove extra data
838    and update the struct contains all queues in the scheduler instance.
839
840The struct scheduler represent the scheduler descriptor that is passed to
841dummynet when a scheduler module is loaded.
842This struct contains the type of scheduler, the length of all structs and
843all function pointers.
844If a function is not implemented should be initialize to NULL. Some functions
845are mandatory, other are mandatory if some memory should be freed.
846Mandatory functions:
847- create_scheduler_instance()
848- enqueue()
849- dequeue()
850- create_alg_fs()
851- drain_queue()
852Optional functions:
853- config_scheduler()
854- create_queue_alg()
855Mandatory functions if the corresponding create...() has allocated memory:
856- delete_scheduler_template()
857- delete_scheduler_instance()
858- delete_alg_fs()
859- delete_queue_alg()
860
861