xref: /freebsd/share/man/man4/netgraph.4 (revision d056fa046c6a91b90cd98165face0e42a33a5173)
1.\" Copyright (c) 1996-1999 Whistle Communications, Inc.
2.\" All rights reserved.
3.\"
4.\" Subject to the following obligations and disclaimer of warranty, use and
5.\" redistribution of this software, in source or object code forms, with or
6.\" without modifications are expressly permitted by Whistle Communications;
7.\" provided, however, that:
8.\" 1. Any and all reproductions of the source or object code must include the
9.\"    copyright notice above and the following disclaimer of warranties; and
10.\" 2. No rights are granted, in any manner or form, to use Whistle
11.\"    Communications, Inc. trademarks, including the mark "WHISTLE
12.\"    COMMUNICATIONS" on advertising, endorsements, or otherwise except as
13.\"    such appears in the above copyright notice or in the software.
14.\"
15.\" THIS SOFTWARE IS BEING PROVIDED BY WHISTLE COMMUNICATIONS "AS IS", AND
16.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, WHISTLE COMMUNICATIONS MAKES NO
17.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING THIS SOFTWARE,
18.\" INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED WARRANTIES OF
19.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
20.\" WHISTLE COMMUNICATIONS DOES NOT WARRANT, GUARANTEE, OR MAKE ANY
21.\" REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS OF THE USE OF THIS
22.\" SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY OR OTHERWISE.
23.\" IN NO EVENT SHALL WHISTLE COMMUNICATIONS BE LIABLE FOR ANY DAMAGES
24.\" RESULTING FROM OR ARISING OUT OF ANY USE OF THIS SOFTWARE, INCLUDING
25.\" WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
26.\" PUNITIVE, OR CONSEQUENTIAL DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR
27.\" SERVICES, LOSS OF USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY
28.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
29.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
30.\" THIS SOFTWARE, EVEN IF WHISTLE COMMUNICATIONS IS ADVISED OF THE POSSIBILITY
31.\" OF SUCH DAMAGE.
32.\"
33.\" Authors: Julian Elischer <julian@FreeBSD.org>
34.\"          Archie Cobbs <archie@FreeBSD.org>
35.\"
36.\" $Whistle: netgraph.4,v 1.7 1999/01/28 23:54:52 julian Exp $
37.\" $FreeBSD$
38.\"
39.Dd July 1, 2004
40.Dt NETGRAPH 4
41.Os
42.Sh NAME
43.Nm netgraph
44.Nd "graph based kernel networking subsystem"
45.Sh DESCRIPTION
46The
47.Nm
48system provides a uniform and modular system for the implementation
49of kernel objects which perform various networking functions.
50The objects, known as
51.Em nodes ,
52can be arranged into arbitrarily complicated graphs.
53Nodes have
54.Em hooks
55which are used to connect two nodes together, forming the edges in the graph.
56Nodes communicate along the edges to process data, implement protocols, etc.
57.Pp
58The aim of
59.Nm
60is to supplement rather than replace the existing kernel networking
61infrastructure.
62It provides:
63.Pp
64.Bl -bullet -compact
65.It
66A flexible way of combining protocol and link level drivers.
67.It
68A modular way to implement new protocols.
69.It
70A common framework for kernel entities to inter-communicate.
71.It
72A reasonably fast, kernel-based implementation.
73.El
74.Ss Nodes and Types
75The most fundamental concept in
76.Nm
77is that of a
78.Em node .
79All nodes implement a number of predefined methods which allow them
80to interact with other nodes in a well defined manner.
81.Pp
82Each node has a
83.Em type ,
84which is a static property of the node determined at node creation time.
85A node's type is described by a unique
86.Tn ASCII
87type name.
88The type implies what the node does and how it may be connected
89to other nodes.
90.Pp
91In object-oriented language, types are classes, and nodes are instances
92of their respective class.
93All node types are subclasses of the generic node
94type, and hence inherit certain common functionality and capabilities
95(e.g., the ability to have an
96.Tn ASCII
97name).
98.Pp
99Nodes may be assigned a globally unique
100.Tn ASCII
101name which can be
102used to refer to the node.
103The name must not contain the characters
104.Ql .\&
105or
106.Ql \&: ,
107and is limited to
108.Dv NG_NODESIZ
109characters (including the terminating
110.Dv NUL
111character).
112.Pp
113Each node instance has a unique
114.Em ID number
115which is expressed as a 32-bit hexadecimal value.
116This value may be used to refer to a node when there is no
117.Tn ASCII
118name assigned to it.
119.Ss Hooks
120Nodes are connected to other nodes by connecting a pair of
121.Em hooks ,
122one from each node.
123Data flows bidirectionally between nodes along
124connected pairs of hooks.
125A node may have as many hooks as it
126needs, and may assign whatever meaning it wants to a hook.
127.Pp
128Hooks have these properties:
129.Bl -bullet
130.It
131A hook has an
132.Tn ASCII
133name which is unique among all hooks
134on that node (other hooks on other nodes may have the same name).
135The name must not contain the characters
136.Ql .\&
137or
138.Ql \&: ,
139and is
140limited to
141.Dv NG_HOOKSIZ
142characters (including the terminating
143.Dv NUL
144character).
145.It
146A hook is always connected to another hook.
147That is, hooks are
148created at the time they are connected, and breaking an edge by
149removing either hook destroys both hooks.
150.It
151A hook can be set into a state where incoming packets are always queued
152by the input queueing system, rather than being delivered directly.
153This can be used when the data is sent from an interrupt handler,
154and processing must be quick so as not to block other interrupts.
155.It
156A hook may supply overriding receive data and receive message functions,
157which should be used for data and messages received through that hook
158in preference to the general node-wide methods.
159.El
160.Pp
161A node may decide to assign special meaning to some hooks.
162For example, connecting to the hook named
163.Va debug
164might trigger
165the node to start sending debugging information to that hook.
166.Ss Data Flow
167Two types of information flow between nodes: data messages and
168control messages.
169Data messages are passed in
170.Vt mbuf chains
171along the edges
172in the graph, one edge at a time.
173The first
174.Vt mbuf
175in a chain must have the
176.Dv M_PKTHDR
177flag set.
178Each node decides how to handle data received through one of its hooks.
179.Pp
180Along with data, nodes can also receive control messages.
181There are generic and type-specific control messages.
182Control messages have a common
183header format, followed by type-specific data, and are binary structures
184for efficiency.
185However, node types may also support conversion of the
186type-specific data between binary and
187.Tn ASCII
188formats,
189for debugging and human interface purposes (see the
190.Dv NGM_ASCII2BINARY
191and
192.Dv NGM_BINARY2ASCII
193generic control messages below).
194Nodes are not required to support these conversions.
195.Pp
196There are three ways to address a control message.
197If there is a sequence of edges connecting the two nodes, the message
198may be
199.Dq source routed
200by specifying the corresponding sequence
201of
202.Tn ASCII
203hook names as the destination address for the message (relative
204addressing).
205If the destination is adjacent to the source, then the source
206node may simply specify (as a pointer in the code) the hook across which the
207message should be sent.
208Otherwise, the recipient node's global
209.Tn ASCII
210name
211(or equivalent ID-based name) is used as the destination address
212for the message (absolute addressing).
213The two types of
214.Tn ASCII
215addressing
216may be combined, by specifying an absolute start node and a sequence
217of hooks.
218Only the
219.Tn ASCII
220addressing modes are available to control programs outside the kernel;
221use of direct pointers is limited to kernel modules.
222.Pp
223Messages often represent commands that are followed by a reply message
224in the reverse direction.
225To facilitate this, the recipient of a
226control message is supplied with a
227.Dq return address
228that is suitable for addressing a reply.
229.Pp
230Each control message contains a 32-bit value, called a
231.Dq typecookie ,
232indicating the type of the message, i.e.\& how to interpret it.
233Typically each type defines a unique typecookie for the messages
234that it understands.
235However, a node may choose to recognize and
236implement more than one type of messages.
237.Pp
238If a message is delivered to an address that implies that it arrived
239at that node through a particular hook (as opposed to having been directly
240addressed using its ID or global name) then that hook is identified to the
241receiving node.
242This allows a message to be re-routed or passed on, should
243a node decide that this is required, in much the same way that data packets
244are passed around between nodes.
245A set of standard
246messages for flow control and link management purposes are
247defined by the base system that are usually
248passed around in this manner.
249Flow control message would usually travel
250in the opposite direction to the data to which they pertain.
251.Ss Netgraph is (Usually) Functional
252In order to minimize latency, most
253.Nm
254operations are functional.
255That is, data and control messages are delivered by making function
256calls rather than by using queues and mailboxes.
257For example, if node
258A wishes to send a data
259.Vt mbuf
260to neighboring node B, it calls the
261generic
262.Nm
263data delivery function.
264This function in turn locates
265node B and calls B's
266.Dq receive data
267method.
268There are exceptions to this.
269.Pp
270Each node has an input queue, and some operations can be considered to
271be
272.Em writers
273in that they alter the state of the node.
274Obviously, in an SMP
275world it would be bad if the state of a node were changed while another
276data packet were transiting the node.
277For this purpose, the input queue implements a
278.Em reader/writer
279semantic so that when there is a writer in the node, all other requests
280are queued, and while there are readers, a writer, and any following
281packets are queued.
282In the case where there is no reason to queue the
283data, the input method is called directly, as mentioned above.
284.Pp
285A node may declare that all requests should be considered as writers,
286or that requests coming in over a particular hook should be considered to
287be a writer, or even that packets leaving or entering across a particular
288hook should always be queued, rather than delivered directly (often useful
289for interrupt routines who want to get back to the hardware quickly).
290By default, all control message packets are considered to be writers
291unless specifically declared to be a reader in their definition.
292(See
293.Dv NGM_READONLY
294in
295.In ng_message.h . )
296.Pp
297While this mode of operation
298results in good performance, it has a few implications for node
299developers:
300.Bl -bullet
301.It
302Whenever a node delivers a data or control message, the node
303may need to allow for the possibility of receiving a returning
304message before the original delivery function call returns.
305.It
306.Nm Netgraph
307provides internal synchronization between nodes.
308Data always enters a
309.Dq graph
310at an
311.Em edge node .
312An
313.Em edge node
314is a node that interfaces between
315.Nm
316and some other part of the system.
317Examples of
318.Dq edge nodes
319include device drivers, the
320.Vt socket , ether , tty ,
321and
322.Vt ksocket
323node type.
324In these
325.Em edge nodes ,
326the calling thread directly executes code in the node, and from that code
327calls upon the
328.Nm
329framework to deliver data across some edge
330in the graph.
331From an execution point of view, the calling thread will execute the
332.Nm
333framework methods, and if it can acquire a lock to do so,
334the input methods of the next node.
335This continues until either the data is discarded or queued for some
336device or system entity, or the thread is unable to acquire a lock on
337the next node.
338In that case, the data is queued for the node, and execution rewinds
339back to the original calling entity.
340The queued data will be picked up and processed by either the current
341holder of the lock when they have completed their operations, or by
342a special
343.Nm
344thread that is activated when there are such items
345queued.
346.It
347It is possible for an infinite loop to occur if the graph contains cycles.
348.El
349.Pp
350So far, these issues have not proven problematical in practice.
351.Ss Interaction with Other Parts of the Kernel
352A node may have a hidden interaction with other components of the
353kernel outside of the
354.Nm
355subsystem, such as device hardware,
356kernel protocol stacks, etc.
357In fact, one of the benefits of
358.Nm
359is the ability to join disparate kernel networking entities together in a
360consistent communication framework.
361.Pp
362An example is the
363.Vt socket
364node type which is both a
365.Nm
366node and a
367.Xr socket 2
368in the protocol family
369.Dv PF_NETGRAPH .
370Socket nodes allow user processes to participate in
371.Nm .
372Other nodes communicate with socket nodes using the usual methods, and the
373node hides the fact that it is also passing information to and from a
374cooperating user process.
375.Pp
376Another example is a device driver that presents
377a node interface to the hardware.
378.Ss Node Methods
379Nodes are notified of the following actions via function calls
380to the following node methods,
381and may accept or reject that action (by returning the appropriate
382error code):
383.Bl -tag -width 2n
384.It Creation of a new node
385The constructor for the type is called.
386If creation of a new node is allowed, constructor method may allocate any
387special resources it needs.
388For nodes that correspond to hardware, this is typically done during the
389device attach routine.
390Often a global
391.Tn ASCII
392name corresponding to the
393device name is assigned here as well.
394.It Creation of a new hook
395The hook is created and tentatively
396linked to the node, and the node is told about the name that will be
397used to describe this hook.
398The node sets up any special data structures
399it needs, or may reject the connection, based on the name of the hook.
400.It Successful connection of two hooks
401After both ends have accepted their
402hooks, and the links have been made, the nodes get a chance to
403find out who their peer is across the link, and can then decide to reject
404the connection.
405Tear-down is automatic.
406This is also the time at which
407a node may decide whether to set a particular hook (or its peer) into
408the
409.Em queueing
410mode.
411.It Destruction of a hook
412The node is notified of a broken connection.
413The node may consider some hooks
414to be critical to operation and others to be expendable: the disconnection
415of one hook may be an acceptable event while for another it
416may effect a total shutdown for the node.
417.It Preshutdown of a node
418This method is called before real shutdown, which is discussed below.
419While in this method, the node is fully operational and can send a
420.Dq goodbye
421message to its peers, or it can exclude itself from the chain and reconnect
422its peers together, like the
423.Xr ng_tee 4
424node type does.
425.It Shutdown of a node
426This method allows a node to clean up
427and to ensure that any actions that need to be performed
428at this time are taken.
429The method is called by the generic (i.e., superclass)
430node destructor which will get rid of the generic components of the node.
431Some nodes (usually associated with a piece of hardware) may be
432.Em persistent
433in that a shutdown breaks all edges and resets the node,
434but does not remove it.
435In this case, the shutdown method should not
436free its resources, but rather, clean up and then call the
437.Fn NG_NODE_REVIVE
438macro to signal the generic code that the shutdown is aborted.
439In the case where the shutdown is started by the node itself due to hardware
440removal or unloading (via
441.Fn ng_rmnode_self ) ,
442it should set the
443.Dv NGF_REALLY_DIE
444flag to signal to its own shutdown method that it is not to persist.
445.El
446.Ss Sending and Receiving Data
447Two other methods are also supported by all nodes:
448.Bl -tag -width 2n
449.It Receive data message
450A
451.Nm
452.Em queueable request item ,
453usually referred to as an
454.Em item ,
455is received by this function.
456The item contains a pointer to an
457.Vt mbuf .
458.Pp
459The node is notified on which hook the item has arrived,
460and can use this information in its processing decision.
461The receiving node must always
462.Fn NG_FREE_M
463the
464.Vt mbuf chain
465on completion or error, or pass it on to another node
466(or kernel module) which will then be responsible for freeing it.
467Similarly, the
468.Em item
469must be freed if it is not to be passed on to another node, by using the
470.Fn NG_FREE_ITEM
471macro.
472If the item still holds references to
473.Vt mbufs
474at the time of
475freeing then they will also be appropriately freed.
476Therefore, if there is any chance that the
477.Vt mbuf
478will be
479changed or freed separately from the item, it is very important
480that it be retrieved using the
481.Fn NGI_GET_M
482macro that also removes the reference within the item.
483(Or multiple frees of the same object will occur.)
484.Pp
485If it is only required to examine the contents of the
486.Vt mbufs ,
487then it is possible to use the
488.Fn NGI_M
489macro to both read and rewrite
490.Vt mbuf
491pointer inside the item.
492.Pp
493If developer needs to pass any meta information along with the
494.Vt mbuf chain ,
495he should use
496.Xr mbuf_tags 9
497framework.
498.Bf -symbolic
499Note that old
500.Nm
501specific meta-data format is obsoleted now.
502.Ef
503.Pp
504The receiving node may decide to defer the data by queueing it in the
505.Nm
506NETISR system (see below).
507It achieves this by setting the
508.Dv HK_QUEUE
509flag in the flags word of the hook on which that data will arrive.
510The infrastructure will respect that bit and queue the data for delivery at
511a later time, rather than deliver it directly.
512A node may decide to set
513the bit on the
514.Em peer
515node, so that its own output packets are queued.
516.Pp
517The node may elect to nominate a different receive data function
518for data received on a particular hook, to simplify coding.
519It uses the
520.Fn NG_HOOK_SET_RCVDATA hook fn
521macro to do this.
522The function receives the same arguments in every way
523other than it will receive all (and only) packets from that hook.
524.It Receive control message
525This method is called when a control message is addressed to the node.
526As with the received data, an
527.Em item
528is received, with a pointer to the control message.
529The message can be examined using the
530.Fn NGI_MSG
531macro, or completely extracted from the item using the
532.Fn NGI_GET_MSG
533which also removes the reference within the item.
534If the Item still holds a reference to the message when it is freed
535(using the
536.Fn NG_FREE_ITEM
537macro), then the message will also be freed appropriately.
538If the
539reference has been removed, the node must free the message itself using the
540.Fn NG_FREE_MSG
541macro.
542A return address is always supplied, giving the address of the node
543that originated the message so a reply message can be sent anytime later.
544The return address is retrieved from the
545.Em item
546using the
547.Fn NGI_RETADDR
548macro and is of type
549.Vt ng_ID_t .
550All control messages and replies are
551allocated with the
552.Xr malloc 9
553type
554.Dv M_NETGRAPH_MSG ,
555however it is more convenient to use the
556.Fn NG_MKMESSAGE
557and
558.Fn NG_MKRESPONSE
559macros to allocate and fill out a message.
560Messages must be freed using the
561.Fn NG_FREE_MSG
562macro.
563.Pp
564If the message was delivered via a specific hook, that hook will
565also be made known, which allows the use of such things as flow-control
566messages, and status change messages, where the node may want to forward
567the message out another hook to that on which it arrived.
568.Pp
569The node may elect to nominate a different receive message function
570for messages received on a particular hook, to simplify coding.
571It uses the
572.Fn NG_HOOK_SET_RCVMSG hook fn
573macro to do this.
574The function receives the same arguments in every way
575other than it will receive all (and only) messages from that hook.
576.El
577.Pp
578Much use has been made of reference counts, so that nodes being
579freed of all references are automatically freed, and this behaviour
580has been tested and debugged to present a consistent and trustworthy
581framework for the
582.Dq type module
583writer to use.
584.Ss Addressing
585The
586.Nm
587framework provides an unambiguous and simple to use method of specifically
588addressing any single node in the graph.
589The naming of a node is
590independent of its type, in that another node, or external component
591need not know anything about the node's type in order to address it so as
592to send it a generic message type.
593Node and hook names should be
594chosen so as to make addresses meaningful.
595.Pp
596Addresses are either absolute or relative.
597An absolute address begins
598with a node name or ID, followed by a colon, followed by a sequence of hook
599names separated by periods.
600This addresses the node reached by starting
601at the named node and following the specified sequence of hooks.
602A relative address includes only the sequence of hook names, implicitly
603starting hook traversal at the local node.
604.Pp
605There are a couple of special possibilities for the node name.
606The name
607.Ql .\&
608(referred to as
609.Ql .: )
610always refers to the local node.
611Also, nodes that have no global name may be addressed by their ID numbers,
612by enclosing the hexadecimal representation of the ID number within
613the square brackets.
614Here are some examples of valid
615.Nm
616addresses:
617.Bd -literal -offset indent
618\&.:
619[3f]:
620foo:
621\&.:hook1
622foo:hook1.hook2
623[d80]:hook1
624.Ed
625.Pp
626The following set of nodes might be created for a site with
627a single physical frame relay line having two active logical DLCI channels,
628with RFC 1490 frames on DLCI 16 and PPP frames over DLCI 20:
629.Bd -literal
630[type SYNC ]                  [type FRAME]                 [type RFC1490]
631[ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named>  ]
632[    A     ]                  [    B     ](dlci20)<---+    [     C      ]
633                                                      |
634                                                      |      [ type PPP ]
635                                                      +>(mux)[<un-named>]
636                                                             [    D     ]
637.Ed
638.Pp
639One could always send a control message to node C from anywhere
640by using the name
641.Dq Li Frame1:uplink.dlci16 .
642In this case, node C would also be notified that the message
643reached it via its hook
644.Va mux .
645Similarly,
646.Dq Li Frame1:uplink.dlci20
647could reliably be used to reach node D, and node A could refer
648to node B as
649.Dq Li .:uplink ,
650or simply
651.Dq Li uplink .
652Conversely, B can refer to A as
653.Dq Li data .
654The address
655.Dq Li mux.data
656could be used by both nodes C and D to address a message to node A.
657.Pp
658Note that this is only for
659.Em control messages .
660In each of these cases, where a relative addressing mode is
661used, the recipient is notified of the hook on which the
662message arrived, as well as
663the originating node.
664This allows the option of hop-by-hop distribution of messages and
665state information.
666Data messages are
667.Em only
668routed one hop at a time, by specifying the departing
669hook, with each node making
670the next routing decision.
671So when B receives a frame on hook
672.Va data ,
673it decodes the frame relay header to determine the DLCI,
674and then forwards the unwrapped frame to either C or D.
675.Pp
676In a similar way, flow control messages may be routed in the reverse
677direction to outgoing data.
678For example a
679.Dq "buffer nearly full"
680message from
681.Dq Li Frame1:
682would be passed to node B
683which might decide to send similar messages to both nodes
684C and D.
685The nodes would use
686.Em "direct hook pointer"
687addressing to route the messages.
688The message may have travelled from
689.Dq Li Frame1:
690to B
691as a synchronous reply, saving time and cycles.
692.Pp
693A similar graph might be used to represent multi-link PPP running
694over an ISDN line:
695.Bd -literal
696[ type BRI ](B1)<--->(link1)[ type MPP  ]
697[  "ISDN1" ](B2)<--->(link2)[ (no name) ]
698[          ](D) <-+
699                  |
700 +----------------+
701 |
702 +->(switch)[ type Q.921 ](term1)<---->(datalink)[ type Q.931 ]
703            [ (no name)  ]                       [ (no name)  ]
704.Ed
705.Ss Netgraph Structures
706Structures are defined in
707.In netgraph/netgraph.h
708(for kernel structures only of interest to nodes)
709and
710.In netgraph/ng_message.h
711(for message definitions also of interest to user programs).
712.Pp
713The two basic object types that are of interest to node authors are
714.Em nodes
715and
716.Em hooks .
717These two objects have the following
718properties that are also of interest to the node writers.
719.Bl -tag -width 2n
720.It Vt "struct ng_node"
721Node authors should always use the following
722.Ic typedef
723to declare
724their pointers, and should never actually declare the structure.
725.Pp
726.Fd "typedef struct ng_node *node_p;"
727.Pp
728The following properties are associated with a node, and can be
729accessed in the following manner:
730.Bl -tag -width 2n
731.It Validity
732A driver or interrupt routine may want to check whether
733the node is still valid.
734It is assumed that the caller holds a reference
735on the node so it will not have been freed, however it may have been
736disabled or otherwise shut down.
737Using the
738.Fn NG_NODE_IS_VALID node
739macro will return this state.
740Eventually it should be almost impossible
741for code to run in an invalid node but at this time that work has not been
742completed.
743.It Node ID Pq Vt ng_ID_t
744This property can be retrieved using the macro
745.Fn NG_NODE_ID node .
746.It Node name
747Optional globally unique name,
748.Dv NUL
749terminated string.
750If there
751is a value in here, it is the name of the node.
752.Bd -literal -offset indent
753if (NG_NODE_NAME(node)[0] != '\e0') ...
754
755if (strcmp(NG_NODE_NAME(node), "fred") == 0) ...
756.Ed
757.It A node dependent opaque cookie
758Anything of the pointer type can be placed here.
759The macros
760.Fn NG_NODE_SET_PRIVATE node value
761and
762.Fn NG_NODE_PRIVATE node
763set and retrieve this property, respectively.
764.It Number of hooks
765The
766.Fn NG_NODE_NUMHOOKS node
767macro is used
768to retrieve this value.
769.It Hooks
770The node may have a number of hooks.
771A traversal method is provided to allow all the hooks to be
772tested for some condition.
773.Fn NG_NODE_FOREACH_HOOK node fn arg rethook
774where
775.Fa fn
776is a function that will be called for each hook
777with the form
778.Fn fn hook arg
779and returning 0 to terminate the search.
780If the search is terminated, then
781.Fa rethook
782will be set to the hook at which the search was terminated.
783.El
784.It Vt "struct ng_hook"
785Node authors should always use the following
786.Ic typedef
787to declare
788their hook pointers.
789.Pp
790.Fd "typedef struct ng_hook *hook_p;"
791.Pp
792The following properties are associated with a hook, and can be
793accessed in the following manner:
794.Bl -tag -width 2n
795.It A hook dependent opaque cookie
796Anything of the pointer type can be placed here.
797The macros
798.Fn NG_HOOK_SET_PRIVATE hook value
799and
800.Fn NG_HOOK_PRIVATE hook
801set and retrieve this property, respectively.
802.It \&An associate node
803The macro
804.Fn NG_HOOK_NODE hook
805finds the associated node.
806.It A peer hook Pq Vt hook_p
807The other hook in this connected pair.
808The
809.Fn NG_HOOK_PEER hook
810macro finds the peer.
811.It References
812The
813.Fn NG_HOOK_REF hook
814and
815.Fn NG_HOOK_UNREF hook
816macros
817increment and decrement the hook reference count accordingly.
818After decrement you should always assume the hook has been freed
819unless you have another reference still valid.
820.It Override receive functions
821The
822.Fn NG_HOOK_SET_RCVDATA hook fn
823and
824.Fn NG_HOOK_SET_RCVMSG hook fn
825macros can be used to set override methods that will be used in preference
826to the generic receive data and receive message functions.
827To unset these, use the macros to set them to
828.Dv NULL .
829They will only be used for data and
830messages received on the hook on which they are set.
831.El
832.Pp
833The maintenance of the names, reference counts, and linked list
834of hooks for each node is handled automatically by the
835.Nm
836subsystem.
837Typically a node's private info contains a back-pointer to the node or hook
838structure, which counts as a new reference that must be included
839in the reference count for the node.
840When the node constructor is called,
841there is already a reference for this calculated in, so that
842when the node is destroyed, it should remember to do a
843.Fn NG_NODE_UNREF
844on the node.
845.Pp
846From a hook you can obtain the corresponding node, and from
847a node, it is possible to traverse all the active hooks.
848.Pp
849A current example of how to define a node can always be seen in
850.Pa src/sys/netgraph/ng_sample.c
851and should be used as a starting point for new node writers.
852.El
853.Ss Netgraph Message Structure
854Control messages have the following structure:
855.Bd -literal
856#define NG_CMDSTRSIZ    32      /* Max command string (including nul) */
857
858struct ng_mesg {
859  struct ng_msghdr {
860    u_char      version;        /* Must equal NG_VERSION */
861    u_char      spare;          /* Pad to 2 bytes */
862    u_short     arglen;         /* Length of cmd/resp data */
863    u_long      flags;          /* Message status flags */
864    u_long      token;          /* Reply should have the same token */
865    u_long      typecookie;     /* Node type understanding this message */
866    u_long      cmd;            /* Command identifier */
867    u_char      cmdstr[NG_CMDSTRSIZ]; /* Cmd string (for debug) */
868  } header;
869  char  data[0];                /* Start of cmd/resp data */
870};
871
872#define NG_ABI_VERSION  5               /* Netgraph kernel ABI version */
873#define NG_VERSION      4               /* Netgraph message version */
874#define NGF_ORIG        0x0000          /* Command */
875#define NGF_RESP        0x0001          /* Response */
876.Ed
877.Pp
878Control messages have the fixed header shown above, followed by a
879variable length data section which depends on the type cookie
880and the command.
881Each field is explained below:
882.Bl -tag -width indent
883.It Va version
884Indicates the version of the
885.Nm
886message protocol itself.
887The current version is
888.Dv NG_VERSION .
889.It Va arglen
890This is the length of any extra arguments, which begin at
891.Va data .
892.It Va flags
893Indicates whether this is a command or a response control message.
894.It Va token
895The
896.Va token
897is a means by which a sender can match a reply message to the
898corresponding command message; the reply always has the same token.
899.It Va typecookie
900The corresponding node type's unique 32-bit value.
901If a node does not recognize the type cookie it must reject the message
902by returning
903.Er EINVAL .
904.Pp
905Each type should have an include file that defines the commands,
906argument format, and cookie for its own messages.
907The typecookie
908insures that the same header file was included by both sender and
909receiver; when an incompatible change in the header file is made,
910the typecookie
911.Em must
912be changed.
913The de-facto method for generating unique type cookies is to take the
914seconds from the Epoch at the time the header file is written
915(i.e., the output of
916.Dq Nm date Fl u Li +%s ) .
917.Pp
918There is a predefined typecookie
919.Dv NGM_GENERIC_COOKIE
920for the
921.Vt generic
922node type, and
923a corresponding set of generic messages which all nodes understand.
924The handling of these messages is automatic.
925.It Va cmd
926The identifier for the message command.
927This is type specific,
928and is defined in the same header file as the typecookie.
929.It Va cmdstr
930Room for a short human readable version of
931.Va command
932(for debugging purposes only).
933.El
934.Pp
935Some modules may choose to implement messages from more than one
936of the header files and thus recognize more than one type cookie.
937.Ss Control Message ASCII Form
938Control messages are in binary format for efficiency.
939However, for
940debugging and human interface purposes, and if the node type supports
941it, control messages may be converted to and from an equivalent
942.Tn ASCII
943form.
944The
945.Tn ASCII
946form is similar to the binary form, with two exceptions:
947.Bl -enum
948.It
949The
950.Va cmdstr
951header field must contain the
952.Tn ASCII
953name of the command, corresponding to the
954.Va cmd
955header field.
956.It
957The arguments field contains a
958.Dv NUL Ns
959-terminated
960.Tn ASCII
961string version of the message arguments.
962.El
963.Pp
964In general, the arguments field of a control message can be any
965arbitrary C data type.
966.Nm Netgraph
967includes parsing routines to support
968some pre-defined datatypes in
969.Tn ASCII
970with this simple syntax:
971.Bl -bullet
972.It
973Integer types are represented by base 8, 10, or 16 numbers.
974.It
975Strings are enclosed in double quotes and respect the normal
976C language backslash escapes.
977.It
978IP addresses have the obvious form.
979.It
980Arrays are enclosed in square brackets, with the elements listed
981consecutively starting at index zero.
982An element may have an optional index and equals sign
983.Pq Ql =
984preceding it.
985Whenever an element
986does not have an explicit index, the index is implicitly the previous
987element's index plus one.
988.It
989Structures are enclosed in curly braces, and each field is specified
990in the form
991.Ar fieldname Ns = Ns Ar value .
992.It
993Any array element or structure field whose value is equal to its
994.Dq default value
995may be omitted.
996For integer types, the default value
997is usually zero; for string types, the empty string.
998.It
999Array elements and structure fields may be specified in any order.
1000.El
1001.Pp
1002Each node type may define its own arbitrary types by providing
1003the necessary routines to parse and unparse.
1004.Tn ASCII
1005forms defined
1006for a specific node type are documented in the corresponding man page.
1007.Ss Generic Control Messages
1008There are a number of standard predefined messages that will work
1009for any node, as they are supported directly by the framework itself.
1010These are defined in
1011.In netgraph/ng_message.h
1012along with the basic layout of messages and other similar information.
1013.Bl -tag -width indent
1014.It Dv NGM_CONNECT
1015Connect to another node, using the supplied hook names on either end.
1016.It Dv NGM_MKPEER
1017Construct a node of the given type and then connect to it using the
1018supplied hook names.
1019.It Dv NGM_SHUTDOWN
1020The target node should disconnect from all its neighbours and shut down.
1021Persistent nodes such as those representing physical hardware
1022might not disappear from the node namespace, but only reset themselves.
1023The node must disconnect all of its hooks.
1024This may result in neighbors shutting themselves down, and possibly a
1025cascading shutdown of the entire connected graph.
1026.It Dv NGM_NAME
1027Assign a name to a node.
1028Nodes can exist without having a name, and this
1029is the default for nodes created using the
1030.Dv NGM_MKPEER
1031method.
1032Such nodes can only be addressed relatively or by their ID number.
1033.It Dv NGM_RMHOOK
1034Ask the node to break a hook connection to one of its neighbours.
1035Both nodes will have their
1036.Dq disconnect
1037method invoked.
1038Either node may elect to totally shut down as a result.
1039.It Dv NGM_NODEINFO
1040Asks the target node to describe itself.
1041The four returned fields
1042are the node name (if named), the node type, the node ID and the
1043number of hooks attached.
1044The ID is an internal number unique to that node.
1045.It Dv NGM_LISTHOOKS
1046This returns the information given by
1047.Dv NGM_NODEINFO ,
1048but in addition
1049includes an array of fields describing each link, and the description for
1050the node at the far end of that link.
1051.It Dv NGM_LISTNAMES
1052This returns an array of node descriptions (as for
1053.Dv NGM_NODEINFO )
1054where each entry of the array describes a named node.
1055All named nodes will be described.
1056.It Dv NGM_LISTNODES
1057This is the same as
1058.Dv NGM_LISTNAMES
1059except that all nodes are listed regardless of whether they have a name or not.
1060.It Dv NGM_LISTTYPES
1061This returns a list of all currently installed
1062.Nm
1063types.
1064.It Dv NGM_TEXT_STATUS
1065The node may return a text formatted status message.
1066The status information is determined entirely by the node type.
1067It is the only
1068.Dq generic
1069message
1070that requires any support within the node itself and as such the node may
1071elect to not support this message.
1072The text response must be less than
1073.Dv NG_TEXTRESPONSE
1074bytes in length (presently 1024).
1075This can be used to return general
1076status information in human readable form.
1077.It Dv NGM_BINARY2ASCII
1078This message converts a binary control message to its
1079.Tn ASCII
1080form.
1081The entire control message to be converted is contained within the
1082arguments field of the
1083.Dv NGM_BINARY2ASCII
1084message itself.
1085If successful, the reply will contain the same control
1086message in
1087.Tn ASCII
1088form.
1089A node will typically only know how to translate messages that it
1090itself understands, so the target node of the
1091.Dv NGM_BINARY2ASCII
1092is often the same node that would actually receive that message.
1093.It Dv NGM_ASCII2BINARY
1094The opposite of
1095.Dv NGM_BINARY2ASCII .
1096The entire control message to be converted, in
1097.Tn ASCII
1098form, is contained
1099in the arguments section of the
1100.Dv NGM_ASCII2BINARY
1101and need only have the
1102.Va flags , cmdstr ,
1103and
1104.Va arglen
1105header fields filled in, plus the
1106.Dv NUL Ns
1107-terminated string version of
1108the arguments in the arguments field.
1109If successful, the reply
1110contains the binary version of the control message.
1111.El
1112.Ss Flow Control Messages
1113In addition to the control messages that affect nodes with respect to the
1114graph, there are also a number of
1115.Em flow control
1116messages defined.
1117At present these are
1118.Em not
1119handled automatically by the system, so
1120nodes need to handle them if they are going to be used in a graph utilising
1121flow control, and will be in the likely path of these messages.
1122The default action of a node that does not understand these messages should
1123be to pass them onto the next node.
1124Hopefully some helper functions will assist in this eventually.
1125These messages are also defined in
1126.In netgraph/ng_message.h
1127and have a separate cookie
1128.Dv NG_FLOW_COOKIE
1129to help identify them.
1130They will not be covered in depth here.
1131.Sh INITIALIZATION
1132The base
1133.Nm
1134code may either be statically compiled
1135into the kernel or else loaded dynamically as a KLD via
1136.Xr kldload 8 .
1137In the former case, include
1138.Pp
1139.D1 Cd "options NETGRAPH"
1140.Pp
1141in your kernel configuration file.
1142You may also include selected
1143node types in the kernel compilation, for example:
1144.Pp
1145.D1 Cd "options NETGRAPH"
1146.D1 Cd "options NETGRAPH_SOCKET"
1147.D1 Cd "options NETGRAPH_ECHO"
1148.Pp
1149Once the
1150.Nm
1151subsystem is loaded, individual node types may be loaded at any time
1152as KLD modules via
1153.Xr kldload 8 .
1154Moreover,
1155.Nm
1156knows how to automatically do this; when a request to create a new
1157node of unknown type
1158.Ar type
1159is made,
1160.Nm
1161will attempt to load the KLD module
1162.Pa ng_ Ns Ao Ar type Ac Ns Pa .ko .
1163.Pp
1164Types can also be installed at boot time, as certain device drivers
1165may want to export each instance of the device as a
1166.Nm
1167node.
1168.Pp
1169In general, new types can be installed at any time from within the
1170kernel by calling
1171.Fn ng_newtype ,
1172supplying a pointer to the type's
1173.Vt "struct ng_type"
1174structure.
1175.Pp
1176The
1177.Fn NETGRAPH_INIT
1178macro automates this process by using a linker set.
1179.Sh EXISTING NODE TYPES
1180Several node types currently exist.
1181Each is fully documented in its own man page:
1182.Bl -tag -width indent
1183.It SOCKET
1184The socket type implements two new sockets in the new protocol domain
1185.Dv PF_NETGRAPH .
1186The new sockets protocols are
1187.Dv NG_DATA
1188and
1189.Dv NG_CONTROL ,
1190both of type
1191.Dv SOCK_DGRAM .
1192Typically one of each is associated with a socket node.
1193When both sockets have closed, the node will shut down.
1194The
1195.Dv NG_DATA
1196socket is used for sending and receiving data, while the
1197.Dv NG_CONTROL
1198socket is used for sending and receiving control messages.
1199Data and control messages are passed using the
1200.Xr sendto 2
1201and
1202.Xr recvfrom 2
1203system calls, using a
1204.Vt "struct sockaddr_ng"
1205socket address.
1206.It HOLE
1207Responds only to generic messages and is a
1208.Dq black hole
1209for data.
1210Useful for testing.
1211Always accepts new hooks.
1212.It ECHO
1213Responds only to generic messages and always echoes data back through the
1214hook from which it arrived.
1215Returns any non-generic messages as their own response.
1216Useful for testing.
1217Always accepts new hooks.
1218.It TEE
1219This node is useful for
1220.Dq snooping .
1221It has 4 hooks:
1222.Va left , right , left2right ,
1223and
1224.Va right2left .
1225Data entering from the
1226.Va right
1227is passed to the
1228.Va left
1229and duplicated on
1230.Va right2left ,
1231and data entering from the
1232.Va left
1233is passed to the
1234.Va right
1235and duplicated on
1236.Va left2right .
1237Data entering from
1238.Va left2right
1239is sent to the
1240.Va right
1241and data from
1242.Va right2left
1243to
1244.Va left .
1245.It RFC1490 MUX
1246Encapsulates/de-encapsulates frames encoded according to RFC 1490.
1247Has a hook for the encapsulated packets
1248.Pq Va downstream
1249and one hook
1250for each protocol (i.e., IP, PPP, etc.).
1251.It FRAME RELAY MUX
1252Encapsulates/de-encapsulates Frame Relay frames.
1253Has a hook for the encapsulated packets
1254.Pq Va downstream
1255and one hook
1256for each DLCI.
1257.It FRAME RELAY LMI
1258Automatically handles frame relay
1259.Dq LMI
1260(link management interface) operations and packets.
1261Automatically probes and detects which of several LMI standards
1262is in use at the exchange.
1263.It TTY
1264This node is also a line discipline.
1265It simply converts between
1266.Vt mbuf
1267frames and sequential serial data, allowing a TTY to appear as a
1268.Nm
1269node.
1270It has a programmable
1271.Dq hotkey
1272character.
1273.It ASYNC
1274This node encapsulates and de-encapsulates asynchronous frames
1275according to RFC 1662.
1276This is used in conjunction with the TTY node
1277type for supporting PPP links over asynchronous serial lines.
1278.It ETHERNET
1279This node is attached to every Ethernet interface in the system.
1280It allows capturing raw Ethernet frames from the network, as well as
1281sending frames out of the interface.
1282.It INTERFACE
1283This node is also a system networking interface.
1284It has hooks representing
1285each protocol family (IP, AppleTalk, IPX, etc.) and appears in the output of
1286.Xr ifconfig 8 .
1287The interfaces are named
1288.Dq Li ng0 ,
1289.Dq Li ng1 ,
1290etc.
1291.It ONE2MANY
1292This node implements a simple round-robin multiplexer.
1293It can be used
1294for example to make several LAN ports act together to get a higher speed
1295link between two machines.
1296.It Various PPP related nodes
1297There is a full multilink PPP implementation that runs in
1298.Nm .
1299The
1300.Pa net/mpd
1301port can use these modules to make a very low latency high
1302capacity PPP system.
1303It also supports
1304.Tn PPTP
1305VPNs using the PPTP node.
1306.It PPPOE
1307A server and client side implementation of PPPoE.
1308Used in conjunction with
1309either
1310.Xr ppp 8
1311or the
1312.Pa net/mpd
1313port.
1314.It BRIDGE
1315This node, together with the Ethernet nodes, allows a very flexible
1316bridging system to be implemented.
1317.It KSOCKET
1318This intriguing node looks like a socket to the system but diverts
1319all data to and from the
1320.Nm
1321system for further processing.
1322This allows
1323such things as UDP tunnels to be almost trivially implemented from the
1324command line.
1325.El
1326.Pp
1327Refer to the section at the end of this man page for more nodes types.
1328.Sh NOTES
1329Whether a named node exists can be checked by trying to send a control message
1330to it (e.g.,
1331.Dv NGM_NODEINFO ) .
1332If it does not exist,
1333.Er ENOENT
1334will be returned.
1335.Pp
1336All data messages are
1337.Vt mbuf chains
1338with the
1339.Dv M_PKTHDR
1340flag set.
1341.Pp
1342Nodes are responsible for freeing what they allocate.
1343There are three exceptions:
1344.Bl -enum
1345.It
1346.Vt Mbufs
1347sent across a data link are never to be freed by the sender.
1348In the
1349case of error, they should be considered freed.
1350.It
1351Messages sent using one of
1352.Fn NG_SEND_MSG_*
1353family macros are freed by the recipient.
1354As in the case above, the addresses
1355associated with the message are freed by whatever allocated them so the
1356recipient should copy them if it wants to keep that information.
1357.It
1358Both control messages and data are delivered and queued with a
1359.Nm
1360.Em item .
1361The item must be freed using
1362.Fn NG_FREE_ITEM item
1363or passed on to another node.
1364.El
1365.Sh FILES
1366.Bl -tag -width indent
1367.It In netgraph/netgraph.h
1368Definitions for use solely within the kernel by
1369.Nm
1370nodes.
1371.It In netgraph/ng_message.h
1372Definitions needed by any file that needs to deal with
1373.Nm
1374messages.
1375.It In netgraph/ng_socket.h
1376Definitions needed to use
1377.Nm
1378.Vt socket
1379type nodes.
1380.It In netgraph/ng_ Ns Ao Ar type Ac Ns Pa .h
1381Definitions needed to use
1382.Nm
1383.Ar type
1384nodes, including the type cookie definition.
1385.It Pa /boot/kernel/netgraph.ko
1386The
1387.Nm
1388subsystem loadable KLD module.
1389.It Pa /boot/kernel/ng_ Ns Ao Ar type Ac Ns Pa .ko
1390Loadable KLD module for node type
1391.Ar type .
1392.It Pa src/sys/netgraph/ng_sample.c
1393Skeleton
1394.Nm
1395node.
1396Use this as a starting point for new node types.
1397.El
1398.Sh USER MODE SUPPORT
1399There is a library for supporting user-mode programs that wish
1400to interact with the
1401.Nm
1402system.
1403See
1404.Xr netgraph 3
1405for details.
1406.Pp
1407Two user-mode support programs,
1408.Xr ngctl 8
1409and
1410.Xr nghook 8 ,
1411are available to assist manual configuration and debugging.
1412.Pp
1413There are a few useful techniques for debugging new node types.
1414First, implementing new node types in user-mode first
1415makes debugging easier.
1416The
1417.Vt tee
1418node type is also useful for debugging, especially in conjunction with
1419.Xr ngctl 8
1420and
1421.Xr nghook 8 .
1422.Pp
1423Also look in
1424.Pa /usr/share/examples/netgraph
1425for solutions to several
1426common networking problems, solved using
1427.Nm .
1428.Sh SEE ALSO
1429.Xr socket 2 ,
1430.Xr netgraph 3 ,
1431.Xr ng_async 4 ,
1432.Xr ng_atm 4 ,
1433.Xr ng_atmllc 4 ,
1434.Xr ng_atmpif 4 ,
1435.Xr ng_bluetooth 4 ,
1436.Xr ng_bpf 4 ,
1437.Xr ng_bridge 4 ,
1438.Xr ng_bt3c 4 ,
1439.Xr ng_btsocket 4 ,
1440.Xr ng_cisco 4 ,
1441.Xr ng_device 4 ,
1442.Xr ng_echo 4 ,
1443.Xr ng_eiface 4 ,
1444.Xr ng_etf 4 ,
1445.Xr ng_ether 4 ,
1446.Xr ng_fec 4 ,
1447.Xr ng_frame_relay 4 ,
1448.Xr ng_gif 4 ,
1449.Xr ng_gif_demux 4 ,
1450.Xr ng_h4 4 ,
1451.Xr ng_hci 4 ,
1452.Xr ng_hole 4 ,
1453.Xr ng_hub 4 ,
1454.Xr ng_iface 4 ,
1455.Xr ng_ip_input 4 ,
1456.Xr ng_ksocket 4 ,
1457.Xr ng_l2cap 4 ,
1458.Xr ng_l2tp 4 ,
1459.Xr ng_lmi 4 ,
1460.Xr ng_mppc 4 ,
1461.Xr ng_netflow 4 ,
1462.Xr ng_one2many 4 ,
1463.Xr ng_ppp 4 ,
1464.Xr ng_pppoe 4 ,
1465.Xr ng_pptpgre 4 ,
1466.Xr ng_rfc1490 4 ,
1467.Xr ng_socket 4 ,
1468.Xr ng_split 4 ,
1469.Xr ng_sppp 4 ,
1470.Xr ng_sscfu 4 ,
1471.Xr ng_sscop 4 ,
1472.Xr ng_tee 4 ,
1473.Xr ng_tty 4 ,
1474.Xr ng_ubt 4 ,
1475.Xr ng_UI 4 ,
1476.Xr ng_uni 4 ,
1477.Xr ng_vjc 4 ,
1478.Xr ng_vlan 4 ,
1479.Xr ngctl 8 ,
1480.Xr nghook 8
1481.Sh HISTORY
1482The
1483.Nm
1484system was designed and first implemented at Whistle Communications, Inc.\&
1485in a version of
1486.Fx 2.2
1487customized for the Whistle InterJet.
1488It first made its debut in the main tree in
1489.Fx 3.4 .
1490.Sh AUTHORS
1491.An -nosplit
1492.An Julian Elischer Aq julian@FreeBSD.org ,
1493with contributions by
1494.An Archie Cobbs Aq archie@FreeBSD.org .
1495