xref: /freebsd/sys/netpfil/ipfw/dummynet.txt (revision 51e235148a4becba94e824a44bd69687644a7f56)
1#
2#
3
4Notes on the internal structure of dummynet (2010 version)
5by Riccardo Panicucci and Luigi Rizzo
6Work supported by the EC project ONELAB2
7
8
9*********
10* INDEX *
11*********
12Implementation of new dummynet
13    Internal structure
14    Files
15Packet arrival
16    The reconfiguration routine
17dummynet_task()
18Configuration
19    Add a pipe
20    Add a scheduler
21    Add a flowset
22Listing object
23Delete of object
24    Delete a pipe
25    Delete a flowset
26    Delete a scheduler
27Compatibility with FreeBSD7.2 and FreeBSD 8 ipfw binary
28    ip_dummynet_glue.c
29    ip_fw_glue.c
30How to configure dummynet
31How to implement a new scheduler
32
33
34
35OPEN ISSUES
36------------------------------
3720100131 deleting RR causes infinite loop
38	presumably in the rr_free_queue() call -- seems to hang
39	forever when deleting a live flow
40------------------------------
41
42Dummynet is a traffic shaper and network emulator. Packets are
43selected by an external filter such as ipfw, and passed to the emulator
44with a tag such as "pipe 10" or "queue 5" which tells what to
45do with the packet. As an example
46
47	ipfw add queue 5 icmp from 10.0.0.2 to all
48
49All packets with the same tag belong to a "flowset", or a set
50of flows which can be further partitioned according to a mask.
51Flowsets are then passed to a scheduler for processing. The
52association of flowsets and schedulers is configurable e.g.
53
54	ipfw queue 5 config sched 10 weight 3 flow_mask xxxx
55	ipfw queue 8 config sched 10 weight 1 ...
56	ipfw queue 3 config sched 20 weight 1 ...
57
58"sched 10" represents one or more scheduler instances,
59selected through a mask on the 5-tuple itself.
60
61	ipfw sched 20 config type FIFO sched_mask yyy ...
62
63There are in fact two masks applied to each packet:
64+ the "sched_mask" sends packets arriving to a scheduler_id to
65  one of many instances.
66+ the "flow_mask" together with the flowset_id is used to
67  collect packets into independent flows on each scheduler.
68
69As an example, we can have
70	ipfw queue 5 config sched 10 flow_mask src-ip 0x000000ff
71	ipfw sched 10 config type WF2Q+ sched_mask src-ip 0xffffff00
72
73means that sched 10 will have one instance per /24 source subnet,
74and within that, each individual source will be a flow.
75
76Internal structure
77-----------------
78Dummynet-related data is split into several data structures,
79part of them constituting the userland-kernel API, and others
80specific to the kernel.
81NOTE: for up-to-date details please look at the relevant source
82	headers (ip_dummynet.h, ip_dn_private.h, dn_sched.h)
83
84USERLAND-KERNEL API	(ip_dummynet.h)
85
86    struct dn_link:
87	contains data about the physical link such as
88	bandwidth, delay, burst size;
89
90    struct dn_fs:
91	describes a flowset, i.e. a template for queues.
92	Main parameters are the scheduler we attach to, a flow_mask,
93	buckets, queue size, plr, weight, and other scheduler-specific
94	parameters.
95
96    struct dn_flow
97	contains information on a flow, including masks and
98	statistics
99
100    struct dn_sch:
101	defines a scheduler (and a link attached to it).
102	Parameters include scheduler type, sched_mask, number of
103	buckets, and possibly other scheduler-specific parameters,
104
105    struct dn_profile:
106	fields to simulate a delay profile
107
108
109KERNEL REPRESENTATION	(ip_dn_private.h)
110
111    struct mq
112	a queue of mbufs with head and tail.
113
114    struct dn_queue
115	individual queue of packets, created by a flowset using
116	flow_mask and attached to a scheduler instance selected
117	through sched_mask.
118	A dn_queue has a pointer to the dn_fsk (which in turn counts
119	how many queues point to it), a pointer to the
120	dn_sch_inst it attaches to, and is in a hash table in the
121	flowset. scheduler instances also should store queues in
122	their own containers used for scheduling (lists, trees, etc.)
123	CREATE: done on packet arrivals when a flow matches a flowset.
124	DELETE: done only when deleting the parent dn_sch_inst
125		or draining memory.
126
127    struct dn_fsk
128	includes a dn_fs; a pointer to the dn_schk; a link field
129	for the list of dn_fsk attached to the same scheduler,
130	or for the unlinked list;
131	a refcount for the number of queues pointing to it;
132	The dn_fsk is in a hash table, fshash.
133	CREATE: done on configuration commands.
134	DELETE: on configuration commands.
135
136    struct dn_sch_inst
137	a scheduler instance, created from a dn_schk applying sched_mask.
138	Contains a delay line, a reference to the parent, and scheduler-
139	specific info.  Both dn_sch_inst and its delay line can be in the
140	evheap if they have events to be processed.
141	CREATE: created from a dn_schk applying sched_mask
142	DELETE: configuration command delete a scheduler which in turn
143		sweeps the hash table of instances deleting them
144
145    struct dn_schk
146	includes dn_sch, dn_link, a pointer to dn_profile,
147	a hash table of dn_sch_inst, a list of dn_fsk
148	attached to it.
149	CREATE: configuration command. If there are flowsets that
150		refer to this number, they are attached and moved
151		to the hash table
152	DELETE: manual, see dn_sch_inst
153
154
155	fshash                            schedhash
156      +---------------+   sched        +--------------+
157      |      sched-------------------->|      NEW_SCHK|
158  -<----*sch_chain    |<-----------------*fsk_list    |
159      |NEW_FSK        |<----.          | [dn_link]    |
160      +---------------+     |          +--------------+
161      |qht (hash)     |     |          |  siht(hash)  |
162      |   [dn_queue]  |     |          |  [dn_si]     |
163      |   [dn_queue]  |     |          |  [dn_si]     |
164      |     ...       |     |          |   ...        |
165      |   +--------+  |     |          | +---------+  |
166      |   |dn_queue|  |     |          | |dn_si    |  |
167      |  |    fs *----------'          | |         |  |
168      |  |    si *---------------------->|         |  |
169      |  +---------+  |                | +---------+  |
170      +---------------+                +--------------+
171
172The following global data structures contain all
173schedulers and flowsets.
174
175- schedhash[x]: contains all scheduler templates in the system.
176	Looked up only on manual configurations, where flowsets
177	are attached to matching schedulers.
178	We have one entry per 'sched X config' command
179	(plus one for each 'pipe X config').
180
181- fshash[x]: contains all flowsets.
182	We do a lookup on this for each packet.
183	We have one entry for each 'queue X config'
184	(plus one for each 'pipe X config').
185
186Additionally, a list that contains all unlinked flowset:
187- fsu:  contains flowset that are not linked with any scheduler.
188	Flowset are put in this list when they refer to a non
189	existing scheduler.
190	We don't need an efficient data structure as we never search
191	here on a packet arrivals.
192
193Scheduler instances and the delay lines associated with each scheduler
194instance need to be woken up at certain times. Because we have many
195such objects, we keep them in a priority heap (system_heap).
196
197Almost all objects in this implementation are preceded by a structure
198(struct dn_id) which makes it easier to identify them.
199
200
201Files
202-----
203The dummynet code is split in several files.
204All kernel code is in sys/netpfil/ipfw except ip_dummynet.h
205All userland code is in sbin/ipfw.
206Files are
207- sys/netpfil/ip_dummynet.h defines the kernel-userland API
208- ip_dn_private.h contains the kernel-specific APIs
209  and data structures
210- dn_sched.h defines the scheduler API
211- ip_dummynet.c cointains module glue and sockopt handlers, with all
212  functions to configure and list objects.
213- ip_dn_io.c contains the functions directly related to packet processing,
214  and run in the critical path. It also contains some functions
215  exported to the schedulers.
216- dn_heap.[ch] implement a binary heap and a generic hash table
217- dn_sched_* implement the various scheduler modules
218
219- dummynet.c is the file used to implement the user side of dummynet.
220  It contains the function to parsing command line, and functions to
221  show the output of dummynet objects.
222Moreover, there are two new file (ip_dummynet_glue.c and ip_fw_glue.c) that
223are used to allow compatibility with the "ipfw" binary from FreeBSD 7.2 and
224FreeBSD 8.
225
226LOCKING
227=======
228At the moment the entire processing occurs under a single lock
229which is expected to be acquired in exclusive mode
230DN_BH_WLOCK() / DN_BH_WUNLOCK().
231
232In perspective we aim at the following:
233- the 'busy' flag, 'pending' list and all structures modified by packet
234  arrivals and departures are protected by the BH_WLOCK.
235  This is normally acquired in exclusive mode by the packet processing
236  functions for short sections of code (exception -- the timer).
237  If 'busy' is not set, we can do regular packet processing.
238  If 'busy' is set, no pieces can be accessed.
239  We must enqueue the packet on 'pending' and return immediately.
240
241- the 'busy' flag is set/cleared by long sections of code as follows:
242	UH_WLOCK(); KASSERT(busy == 0);
243	BH_WLOCK(); busy=1; BH_WUNLOCK();
244	... do processing ...
245	BH_WLOCK(); busy=0; drain_queue(pending); BH_WUNLOCK();
246	UH_WUNLOCK();
247  this normally happens when the upper half has something heavy
248  to do. The prologue and epilogue are not in the critical path.
249
250- the main containers (fshash, schedhash, ...) are protected by
251  UH_WLOCK.
252
253Packet processing
254=================
255A packet enters dummynet through dummynet_io(). We first lookup
256the flowset number in fshash using dn_ht_find(), then find the scheduler
257instance using ipdn_si_find(), then possibly identify the correct
258queue with ipdn_q_find().
259If successful, we call the scheduler's enqueue function(), and
260if needed start I/O on the link calling serve_sched().
261If the packet can be returned immediately, this is done by
262leaving *m0 set. Otherwise, the packet is absorbed by dummynet
263and we simply return, possibly with some appropriate error code.
264
265Reconfiguration
266---------------
267Reconfiguration is the complex part of the system because we need to
268keep track of the various objects and containers.
269At the moment we do not use reference counts for objects so all
270processing must be done under a lock.
271
272The main entry points for configuration is the ip_dn_ctl() handler
273for the IP_DUMMYNET3 sockopt (others are provided only for backward
274compatibility). Modifications to the configuration call do_config().
275The argument is a sequence of blocks each starting with a  struct dn_id
276which specifies its content.
277The first dn_id must contain as obj.id the DN_API_VERSION
278The obj.type is DN_CMD_CONFIG (followed by actual objects),
279DN_CMD_DELETE (with the correct subtype and list of objects), or
280DN_CMD_FLUSH.
281
282DN_CMD_CONFIG is followed by objects to add/reconfigure. In general,
283if an object already exists it is reconfigured, otherwise it is
284created in a way that keeps the structure consistent.
285We have the following objects in the system, normally numbered with
286an identifier N between 1 and 65535. For certain objects we have
287"shadow" copies numbered I+NMAX and I+ 2*NMAX which are used to
288implement certain backward compatibility features.
289
290In general we have the following linking
291
292  TRADITIONAL DUMMYNET QUEUES "queue N config ... pipe M ..."
293	corresponds to a dn_fs object numbered N
294
295  TRADITIONAL DUMMYNET PIPES "pipe N config ..."
296	dn_fs N+2*NMAX --> dn_sch N+NMAX type FIFO --> dn_link N+NMAX
297
298  GENERIC SCHEDULER "sched N config ... "
299	[dn_fs N+NMAX] --> dn_sch N --> dn_link N
300	The flowset N+NMAX is created only if the scheduler is not
301	of type MULTIQUEUE.
302
303  DELAY PROFILE	"pipe N config profile ..."
304	it is always attached to an existing dn_link N
305
306Because traditional dummynet pipes actually configure both a
307'standalone' instance and one that can be used by queues,
308we do the following:
309
310    "pipe N config ..." configures:
311	dn_sched N type WF2Q+
312	dn_sched N+NMAX type FIFO
313	dn_fs N+2NMAX attached to dn_sched N+NMAX
314	dn_pipe N
315	dn_pipe N+NMAX
316
317    "queue N config" configures
318	dn_fs N
319
320    "sched N config" configures
321	dn_sched N type as desired
322	dn_fs N+NMAX attached to dn_sched N
323
324
325dummynet_task()
326===============
327The dummynet_task() function is the main dummynet processing function and is
328called every tick. This function first calculate the new current time, then
329it checks if it is the time to wake up object from the system_heap comparing
330the current time and the key of the heap. Two types of object (really the
331heap contains pointer to objects) are in the
332system_heap:
333
334- scheduler instance: if a scheduler instance is waked up, the dequeue()
335  function is called until it has credit. If the dequeue() returns packets,
336  the scheduler instance is inserted in the heap with a new key depending of
337  the data that will be send out. If the scheduler instance remains with
338  some credit, it means that is hasn't other packet to send and so the
339  instance is no longer inserted in the heap.
340
341  If the scheduler instance extracted from the heap has the DELETE flag set,
342  the dequeue() is not called and the instance is destroyed now.
343
344- delay line: when extracting a delay line, the function transmit_event() is
345  called to send out packet from delay line.
346
347  If the scheduler instance associated with this delay line doesn't exists,
348  the delay line will be delete now.
349
350Configuration
351=============
352To create a pipe, queue or scheduler, the user should type commands like:
353"ipfw pipe x config"
354"ipfw queue y config pipe x"
355"ipfw pipe x config sched <type>"
356
357The userland side of dummynet will prepare a buffer contains data to pass to
358kernel side.
359The buffer contains all struct needed to configure an object. In more detail,
360to configure a pipe all three structs (dn_link, dn_sch, dn_fs) are needed,
361plus the delay profile struct if the pipe has a delay profile.
362
363If configuring a scheduler only the struct dn_sch is wrote in the buffer,
364while if configuring a flowset only the dn_fs struct is wrote.
365
366The first struct in the buffer contains the type of command request, that is
367if it is configuring a pipe, a queue, or a scheduler. Then there are structs
368need to configure the object, and finally there is the struct that mark
369the end of the buffer.
370
371To support the insertion of pipe and queue using the old syntax, when adding
372a pipe it's necessary to create a FIFO flowset and a FIFO scheduler, which
373have a number x + DN_PIPEOFFSET.
374
375Add a pipe
376----------
377A pipe is only a template for a link.
378If the pipe already exists, parameters are updated. If a delay profile exists
379it is deleted and a new one is created.
380If the pipe doesn't exist a new one is created. After the creation, the
381flowset unlinked list is scanned to see if there are some flowset that would
382be linked with this pipe. If so, these flowset will be of wf2q+ type (for
383compatibility) and a new wf2q+ scheduler is created now.
384
385Add a scheduler
386---------------
387If the scheduler already exists, and the type and the mask are the same, the
388scheduler is simply reconfigured calling the config_scheduler() scheduler
389function with the RECONFIGURE flag active.
390If the type or the mask differ, it is necessary to delete the old scheduler
391and create a new one.
392If the scheduler doesn't exists, a new one is created. If the scheduler has
393a mask, the hash table is created to store pointers to scheduler instances.
394When a new scheduler is created, it is necessary to scan the unlinked
395flowset list to search eventually flowset that would be linked with this
396scheduler number. If some are found, flowsets became of the type of this
397scheduler and they are configured properly.
398
399Add a flowset
400-------------
401Flowset pointers are store in the system in two list. The unlinked flowset list
402contains all flowset that aren't linked with a scheduler, the flowset list
403contains flowset linked to a scheduler, and so they have a type.
404When adding a new flowset, first it is checked if the flowset exists (that is,
405it is in the flowset list) and if it doesn't exists a new flowset is created
406and added to unlinked flowset list if the scheduler which the flowset would be
407linked doesn't exists, or added in the flowset list and configured properly if
408the scheduler exists. If the flowset (before to be created) was in the
409unlinked flowset list, it is removed and deleted, and then recreated.
410If the flowset exists, to allow reconfiguration of this flowset, the
411scheduler number and types must match with the one in memory. If this isn't
412so, the flowset is deleted and a new one will be created. Really, the flowset
413it isn't deleted now, but it is removed from flowset list and it will be
414deleted later because there could be some queues that are using it.
415
416Listing of object
417=================
418The user can request a list of object present in dummynet through the command
419"ipfw [-v] pipe|queue [x] list|show"
420The kernel side of dummynet send a buffer to user side that contains all
421pipe, all scheduler, all flowset, plus all scheduler instances and all queues.
422The dummynet user land will format the output and show only the relevant
423information.
424The buffer sent start with all pipe from the system. The entire struct dn_link
425is passed, except the delay_profile struct that is useless in user space.
426After pipes, all flowset are wrote in the buffer. The struct contains
427scheduler flowset specific data is linked with the flowset writing the
428'obj' id of the extension into the 'alg_fs' pointer.
429Then schedulers are wrote. If a scheduler has one or more scheduler instance,
430these are linked to the parent scheduler writing the id of the parent in the
431'ptr_sched' pointer. If a scheduler instance has queues, there are wrote in
432the buffer and linked thorugh the 'obj' and 'sched_inst' pointer.
433Finally, flowsets in the unlinked flowset list  are write in the buffer, and
434then a struct gen in saved in the buffer to mark the last struct in the buffer.
435
436
437Delete of object
438================
439An object is usually removed by user through a command like
440"ipfw pipe|queue x delete". XXX sched?
441ipfw pass to the kernel a struct gen that contains the type and the number
442of the object to remove
443
444Delete of pipe x
445----------------
446A pipe can be deleted by the user through the command 'ipfw pipe x delete'.
447To delete a pipe, the pipe is removed from the pipe list, and then deleted.
448Also the scheduler associated with this pipe should be deleted.
449For compatibility with old dummynet syntax, the associated FIFO scheduler and
450FIFO flowset must be deleted.
451
452Delete of flowset x
453-------------------
454To remove a flowset, we must be sure that is no longer referenced by any object.
455If the flowset to remove is in the unlinked flowset list, there is not any
456issue, the flowset can be safely removed calling a free() (the flowset
457extension is not yet created if the flowset is in this list).
458If the flowset is in the flowset list, first we remove from it so new packet
459are discarded when arrive. Next, the flowset is marked as delete.
460Now we must check if some queue is using this flowset.
461To do this, a counter (active_f) is provided. This counter indicate how many
462queues exist using this flowset.
463The active_f counter is automatically incremented when a queue is created
464and decremented when a queue is deleted.
465If the counter is 0, the flowset can be safely deleted, and the delete_alg_fs()
466scheduler function is called before deallocate memory.
467If the counter is not 0, the flowset remain in memory until the counter become
468zero. When a queue is delete (by dn_delete_queue() function) it is checked if
469the linked flowset is deleting and if so the counter is decrementing. If the
470counter reaches 0, the flowset is deleted.
471The deletion of a queue can be done only by the scheduler, or when the scheduler
472is destroyed.
473
474Delete of scheduler x
475---------------------
476To delete a scheduler we must be sure that any scheduler instance of this type
477are in the system_heap. To do so, a counter (inst_counter) is provided.
478This counter is managed by the system: it is incremented every time it is
479inserted in the system_heap, and decremented every time it is extracted from it.
480To delete the scheduler, first we remove it from the scheduler list, so new
481packet are discarded when they arrive, and mark the scheduler as deleting.
482
483If the counter is 0, we can remove the scheduler safely calling the
484really_deletescheduler() function. This function will scan all scheduler
485instances and call the delete_scheduler_instance() function that will delete
486the instance. When all instance are deleted, the scheduler template is
487deleted calling the delete_scheduler_template(). If the delay line associate
488with the scheduler is empty, it is deleted now, else it will be deleted when
489it will became empy.
490If the counter was not 0, we wait for it. Every time the dummynet_task()
491function extract a scheduler from the system_heap, the counter is decremented.
492If the scheduler has the delete flag enabled the dequeue() is not called and
493delete_scheduler_instance() is called to delete the instance.
494Obviously this scheduler instance is no longer inserted in the system_heap.
495If the counter reaches 0, the delete_scheduler_template() function is called
496all memory is released.
497NOTE: Flowsets that belong to this scheduler are not deleted, so if a new
498      scheduler with the same number is inserted will use these flowsets.
499      To do so, the best approach would be insert these flowset in the
500      unlinked flowset list, but doing this now will be very expensive.
501      So flowsets will remain in memory and linked with a scheduler that no
502      longer exists until a packet belonging to this flowset arrives. When
503      this packet arrives, the reconfigure() function is called because the
504      generation number mismatch with one contains in the flowset and so
505      the flowset will be moved into the flowset unlinked list, or will be
506      linked with the new scheduler if a new one was created.
507
508
509COMPATIBILITY WITH FREEBSD 7.2 AND FREEBSD 8 'IPFW' BINARY
510==========================================================
511Dummynet is not compatible with old ipfw binary because internal structs are
512changed. Moreover, the old ipfw binary is not compatible with new kernels
513because the struct that represents a firewall rule has changed. So, if a user
514install a new kernel on a FreeBSD 7.2, the ipfw (and possibly many other
515commands) will not work.
516New dummynet uses a new socket option: IP_DUMMYNET3, used for both set and get.
517The old option can be used to allow compatibility with the 'ipfw' binary of
518older version (tested with 7.2 and 8.0) of FreeBSD.
519Two file are provided for this purpose:
520- ip_dummynet_glue.c translates old dummynet requests to the new ones,
521- ip_fw_glue.c converts the rule format between 7.2 and 8 versions.
522Let see in detail these two files.
523
524IP_DUMMYNET_GLUE.C
525------------------
526The internal structs of new dummynet are very different from the original.
527Because of there are some difference from between dummynet in FreeBSD 7.2 and
528dummynet in FreeBSD 8 (the FreeBSD 8 version includes support to pipe delay
529profile and burst option), I have to include both header files. I copied
530the revision 191715 (for version 7.2) and the revision 196045 (for version 8)
531and I appended a number to each struct to mark them.
532
533The main function of this file is ip_dummynet_compat() that is called by
534ip_dn_ctl() when it receive a request of old socket option.
535
536A global variabile ('is7') store the version of 'ipfw' that FreeBSD is using.
537This variable is set every time a request of configuration is done, because
538with this request we receive a buffer of which size depending of ipfw version.
539Because of in general the first action is a configuration, this variable is
540usually set accordly. If the first action is a request of listing of pipes
541or queues, the system cannot know the version of ipfw, and we suppose that
542version 7.2 is used. If version is wrong, the output can be senseless, but
543the application should not crash.
544
545There are four request for old dummynet:
546- IP_DUMMYNET_FLUSH: the flush options have no parameter, so simply the
547  dummynet_flush() function is called;
548- IP_DUMMYNET_DEL: the delete option need to be translate.
549  It is only necessary to extract the number and the type of the object
550  (pipe or queue) to delete from the buffer received and build a new struct
551  gen contains the right parameters, then call the delete_object() function;
552- IP_DUMMYNET_CONFIGURE: the configure command receive a buffer depending of
553  the ipfw version. After the properly extraction of all data, that depends
554  by the ipfw version used, new structures are filled and then the dummynet
555  config_link() function is properly called. Note that the 7.2 version does
556  not support some parameter as burst or delay profile.
557- IP_DUMMYNET_GET: The get command should send to the ipfw the correct buffer
558  depending of its version. There are two function that build the
559  corrected buffer, ip_dummynet_get7() and ip_dummynet_get8(). These
560  functions reproduce the buffer exactly as 'ipfw' expect. The only difference
561  is that the weight parameter for a queue is no longer sent by dummynet and so
562  it is set to 0.
563  Moreover, because of the internal structure has changed, the bucket size
564  of a queue could not be correct, because now all flowset share the hash
565  table.
566  If the version of ipfw is wrong, the output could be senseless or truncated,
567  but the application should not crash.
568
569IP_FW_GLUE.C
570------------
571The ipfw binary also is used to add rules to FreeBSD firewall. Because of the
572struct ip_fw is changed from FreeBsd 7.2 to FreeBSD 8, it is necessary
573to write some glue code to allow use ipfw from FreeBSD 7.2 with the kernel
574provided with FreeBSD 8.
575This file contains two functions to convert a rule from FreeBSD 7.2 format to
576FreeBSD 8 format, and viceversa.
577The conversion should be done when a rule passes from userspace to kernel space
578and viceversa.
579I have to modify the ip_fw2.c file to manage these two case, and added a
580variable (is7) to store the ipfw version used, using an approach like the
581previous file:
582- when a new rule is added (option IP_FW_ADD) the is7 variable is set if the
583  size of the rule received correspond to FreeBSD 7.2 ipfw version. If so, the
584  rule is converted to version 8 calling the function convert_rule_to_8().
585  Moreover, after the insertion of the rule, the rule is now reconverted to
586  version 7 because the ipfw binary will print it.
587- when the user request a list of rules (option IP_FW_GET) the is7 variable
588  should be set correctly because we suppose that a configure command was done,
589  else we suppose that the FreeBSD version is 8. The function ipfw_getrules()
590  in ip_fw2.c file return all rules, eventually converted to version 7 (if
591  the is7 is set) to the ipfw binary.
592The conversion of a rule is quite simple. The only difference between the
593two structures (struct ip_fw) is that in the new there is a new field
594(uint32_t id). So, I copy the entire rule in a buffer and the copy the rule in
595the right position in the new (or old) struct. The size of commands are not
596changed, and the copy is done into a cicle.
597
598How to configure dummynet
599=========================
600It is possible to configure dummynet through two main commands:
601'ipfw pipe' and 'ipfw queue'.
602To allow compatibility with old version, it is possible configure dummynet
603using the old command syntax. Doing so, obviously, it is only possible to
604configure a FIFO scheduler or a wf2q+ scheduler.
605A new command, 'ipfw pipe x config sched <type>' is supported to add a new
606scheduler to the system.
607
608- ipfw pipe x config ...
609  create a new pipe with the link parameters
610  create a new scheduler fifo (x + offset)
611  create a new flowset fifo (x + offset)
612  the mask is eventually stored in the FIFO scheduler
613
614- ipfw queue y config pipe x ...
615  create a new flowset y linked to sched x.
616    The type of flowset depends by the specified scheduler.
617    If the scheduler does not exist, this flowset is inserted in a special
618    list and will be not active.
619    If pipe x exists and sched does not exist, a new wf2q+ scheduler is
620    created and the flowset will be linked to this new scheduler (this is
621    done for compatibility with old syntax).
622
623- ipfw pipe x config sched <type> ...
624  create a new scheduler x of type <type>.
625  Search into the flowset unlinked list if there are some flowset that
626  should be linked with this new scheduler.
627
628- ipfw pipe x delete
629  delete the pipe x
630  delete the scheduler fifo (x + offset)
631  delete the scheduler x
632  delete the flowset fifo (x + offset)
633
634- ipfw queue x delete
635  delete the flowset x
636
637- ipfw sched x delete ///XXX
638  delete the scheduler x
639
640Follow now some examples to how configure dummynet:
641- Ex1:
642  ipfw pipe 10 config bw 1M delay 15 // create a pipe with band and delay
643                                        A FIFO flowset and scheduler is
644                                        also created
645  ipfw queue 5 config pipe 10 weight 56 // create a flowset. This flowset
646                                           will be of wf2q+ because a pipe 10
647                                           exists. Moreover, the wf2q+
648                                           scheduler is created now.
649- Ex2:
650  ipfw queue 5 config pipe 10 weight 56 // Create a flowset. Scheduler 10
651                                           does not exist, so this flowset
652                                           is inserted in the unlinked
653                                           flowset list.
654  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
655                               Because of a flowset with 'pipe 10' exists,
656                               a wf2q+ scheduler is created now and that
657                               flowset is linked with this sceduler.
658
659- Ex3:
660  ipfw pipe 10 config bw...    // Create a pipe, a FIFO flowset and scheduler.
661  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
662                                  pipe 10
663  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5. This flowset
664                                           will belong to scheduler 10 and
665                                           it is of type RR
666
667- Ex4:
668  ipfw pipe 10 config sched rr // Create a scheduler of type RR, linked to
669                                  pipe 10 (not exist yet)
670  ipfw pipe 10 config bw... // Create a pipe, a FIFO flowset and scheduler.
671  ipfw queue 5 config pipe 10 weight 56 // Create a flowset 5.This flowset
672                                           will belong to scheduler 10 and
673                                           it is of type RR
674  ipfw pipe 10 config sched wf2q+ // Modify the type of scheduler 10. It
675                                     becomes a wf2q+ scheduler.
676                                     When a new packet of flowset 5 arrives,
677                                     the flowset 5 becomes to wf2q+ type.
678
679How to implement a new scheduler
680================================
681In dummynet, a scheduler algorithm is represented by two main structs, some
682functions and other minor structs.
683- A struct dn_sch_xyz (where xyz is the 'type' of scheduler algorithm
684  implemented) contains data relative to scheduler, as global parameter that
685  are common to all instances of the scheduler
686- A struct dn_sch_inst_xyz contains data relative to a single scheduler
687  instance, as local status variable depending for example by flows that
688  are linked with the scheduler, and so on.
689To add a scheduler to dummynet, the user should type a command like:
690'ipfw pipe x config sched <type> [mask ... ...]'
691This command creates a new struct dn_sch_xyz of type <type>, and
692store the optional parameter in that struct.
693
694The parameter mask determines how many scheduler instance of this
695scheduler may exist. For example, it is possible to divide traffic
696depending on the source port (or destination, or ip address...),
697so that every scheduler instance act as an independent scheduler.
698If the mask is not set, all traffic goes to the same instance.
699
700When a packet arrives to a scheduler, the system search the corrected
701scheduler instance, and if it does not exist it is created now (the
702struct dn_sch_inst_xyz is allocated by the system, and the scheduler
703fills the field correctly). It is a task of the scheduler to create
704the struct that contains all queues for a scheduler instance.
705Dummynet provides some function to create an hash table to store
706queues, but the schedule algorithm can choice the own struct.
707
708To link a flow to a scheduler, the user should type a command like:
709'ipfw queue z config pipe x [mask... ...]'
710
711This command creates a new 'dn_fs' struct that will be inserted
712in the system.  If the scheduler x exists, this flowset will be
713linked to that scheduler and the flowset type become the same as
714the scheduler type. At this point, the function create_alg_fs_xyz()
715is called to allow store eventually parameter for the flowset that
716depend by scheduler (for example the 'weight' parameter for a wf2q+
717scheduler, or some priority...). A parameter mask can be used for
718a flowset. If the mask parameter is set, the scheduler instance can
719separate packet according to its flow id (src and dst ip, ports...)
720and assign it to a separate queue. This is done by the scheduler,
721so it can ignore the mask if it wants.
722
723See now the two main structs:
724struct dn_sch_xyz {
725    struct gen g; /* important the name g */
726    /* global params */
727};
728struct dn_sch_inst_xyz {
729    struct gen g; /* important the name g */
730    /* params of the instance */
731};
732It is important to embed the struct gen as first parameter. The struct gen
733contains some values that the scheduler instance must fill (the 'type' of
734scheduler, the 'len' of the struct...)
735The function create_scheduler_xyz() should be implemented to initialize global
736parameters in the first struct, and if memory allocation is done it is
737mandatory to implement the delete_scheduler_template() function to free that
738memory.
739The function create_scheduler_instance_xyz() must be implemented even if the
740scheduler instance does not use extra parameters. In this function the struct
741gen fields must be filled with corrected infos. The
742delete_scheduler_instance_xyz() function must bu implemented if the instance
743has allocated some memory in the previous function.
744
745To store data belonging to a flowset the follow struct is used:
746struct alg_fs_xyz {
747    struct gen g;
748    /* fill correctly the gen struct
749     g.subtype = DN_XYZ;
750     g.len = sizeof(struct alg_fs_xyz)
751     ...
752     */
753    /* params for the flow */
754};
755The create_alg_fs_xyz() function is mandatory, because it must fill the struct
756gen, but the delete_alg_fs_xyz() is mandatory only if the previous function
757has allocated some memory.
758
759A struct dn_queue contains packets belonging to a queue and some statistical
760data. The scheduler could have to store data in this struct, so it must define
761a dn_queue_xyz struct:
762struct dn_queue_xyz {
763    struct dn_queue q;
764    /* parameter for a queue */
765}
766
767All structures are allocated by the system. To do so, the scheduler must
768set the size of its structs in the scheduler descriptor:
769scheduler_size:     sizeof(dn_sch_xyz)
770scheduler_i_size:   sizeof(dn_sch_inst_xyz)
771flowset_size:       sizeof(alg_fs_xyz)
772queue_size:         sizeof(dn_queue_xyz);
773The scheduler_size could be 0, but other struct must have at least a struct gen.
774
775
776After the definition of structs, it is necessary to implement the
777scheduler functions.
778
779- int (*config_scheduler)(char *command, void *sch, int reconfigure);
780    Configure a scheduler, or reconfigure if 'reconfigure' == 1.
781    This function performs additional allocation and initialization of global
782    parameter for this scheduler.
783    If memory is allocated here, the delete_scheduler_template() function
784    should be implemented to remove this memory.
785- int (*delete_scheduler_template)(void* sch);
786    Delete a scheduler template. This function is mandatory if the scheduler
787    uses extra data respect the struct dn_sch.
788- int (*create_scheduler_instance)(void *s);
789    Create a new scheduler instance. The system allocate the necessary memory
790    and the schedulet can access it using the 's' pointer.
791    The scheduler instance stores all queues, and to do this can use the
792    hash table provided by the system.
793- int (*delete_scheduler_instance)(void *s);
794    Delete a scheduler instance. It is important to free memory allocated
795    by create_scheduler_instance() function. The memory allocated by system
796    is freed by the system itself. The struct contains all queue also has
797    to be deleted.
798- int (*enqueue)(void *s, struct gen *f, struct mbuf *m,
799                 struct ipfw_flow_id *id);
800    Called when a packet arrives. The packet 'm' belongs to the scheduler
801    instance 's', has a flowset 'f' and the flowid 'id' has already been
802    masked. The enqueue() must call dn_queue_packet(q, m) function to really
803    enqueue packet in the queue q. The queue 'q' is chosen by the scheduler
804    and if it does not exist should be created calling the dn_create_queue()
805    function. If the schedule want to drop the packet, it must call the
806    dn_drop_packet() function and then return 1.
807- struct mbuf * (*dequeue)(void *s);
808    Called when the timer expires (or when a packet arrives and the scheduler
809    instance is idle).
810    This function is called when at least a packet can be send out. The
811    scheduler choices the packet and returns it; if no packet are in the
812    schedulerinstance, the function must return NULL.
813    Before return a packet, it is important to call the function
814    dn_return_packet() to update some statistic of the queue and update the
815    queue counters.
816- int (*drain_queue)(void *s, int flag);
817    The system request to scheduler to delete all queues that is not using
818    to free memory. The flag parameter indicate if a queue must be deleted
819    even if it is active.
820
821- int (*create_alg_fs)(char *command, struct gen *g, int reconfigure);
822    It is called when a flowset is linked with a scheduler. This is done
823    when the scheduler is defined, so we can know the type of flowset.
824    The function initialize the flowset paramenter parsing the command
825    line. The parameter will be stored in the g struct that have the right
826    size allocated by the system. If the reconfigure flag is set, it means
827    that the flowset is reconfiguring
828- int (*delete_alg_fs)(struct gen *f);
829    It is called when a flowset is deleting. Must remove the memory allocate
830    by the create_alg_fs() function.
831
832- int (*create_queue_alg)(struct dn_queue *q, struct gen *f);
833    Called when a queue is created. The function should link the queue
834    to the struct used by the scheduler instance to store all queues.
835- int (*delete_queue_alg)(struct dn_queue *q);
836    Called when a queue is deleting. The function should remove extra data
837    and update the struct contains all queues in the scheduler instance.
838
839The struct scheduler represent the scheduler descriptor that is passed to
840dummynet when a scheduler module is loaded.
841This struct contains the type of scheduler, the length of all structs and
842all function pointers.
843If a function is not implemented should be initialize to NULL. Some functions
844are mandatory, other are mandatory if some memory should be freed.
845Mandatory functions:
846- create_scheduler_instance()
847- enqueue()
848- dequeue()
849- create_alg_fs()
850- drain_queue()
851Optional functions:
852- config_scheduler()
853- create_queue_alg()
854Mandatory functions if the corresponding create...() has allocated memory:
855- delete_scheduler_template()
856- delete_scheduler_instance()
857- delete_alg_fs()
858- delete_queue_alg()
859
860