.\" Copyright (c) 1996-1999 Whistle Communications, Inc.
.\" All rights reserved.
.\"
.\" Subject to the following obligations and disclaimer of warranty, use and
.\" redistribution of this software, in source or object code forms, with or
.\" without modifications are expressly permitted by Whistle Communications;
.\" provided, however, that:
.\" 1. Any and all reproductions of the source or object code must include the
.\"    copyright notice above and the following disclaimer of warranties; and
.\" 2. No rights are granted, in any manner or form, to use Whistle
.\"    Communications, Inc. trademarks, including the mark "WHISTLE
.\"    COMMUNICATIONS" on advertising, endorsements, or otherwise except as
.\"    such appears in the above copyright notice or in the software.
.\"
.\" THIS SOFTWARE IS BEING PROVIDED BY WHISTLE COMMUNICATIONS "AS IS", AND
.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, WHISTLE COMMUNICATIONS MAKES NO
.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING THIS SOFTWARE,
.\" INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
.\" WHISTLE COMMUNICATIONS DOES NOT WARRANT, GUARANTEE, OR MAKE ANY
.\" REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS OF THE USE OF THIS
.\" SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY OR OTHERWISE.
.\" IN NO EVENT SHALL WHISTLE COMMUNICATIONS BE LIABLE FOR ANY DAMAGES
.\" RESULTING FROM OR ARISING OUT OF ANY USE OF THIS SOFTWARE, INCLUDING
.\" WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
.\" PUNITIVE, OR CONSEQUENTIAL DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR
.\" SERVICES, LOSS OF USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF WHISTLE COMMUNICATIONS IS ADVISED OF THE POSSIBILITY
.\" OF SUCH DAMAGE.
.\"
.\" Authors: Julian Elischer <julian@FreeBSD.org>
.\"          Archie Cobbs <archie@FreeBSD.org>
.\"
.\" $FreeBSD$
.\" $Whistle: netgraph.4,v 1.7 1999/01/28 23:54:52 julian Exp $
.\"
.Dd January 19, 1999
.Dt NETGRAPH 4
.Os FreeBSD
.Sh NAME
.Nm netgraph
.Nd graph based kernel networking subsystem
.Sh DESCRIPTION
The
.Nm
system provides a uniform and modular system for the implementation
of kernel objects which perform various networking functions.
The objects, known as
.Em nodes ,
can be arranged into arbitrarily complicated graphs.
Nodes have
.Em hooks
which are used to connect two nodes together, forming the edges in the graph.
Nodes communicate along the edges to process data, implement protocols, etc.
.Pp
The aim of
.Nm
is to supplement rather than replace the existing kernel networking
infrastructure.
It provides:
.Pp
.Bl -bullet -compact -offset 2n
.It
A flexible way of combining protocol and link level drivers
.It
A modular way to implement new protocols
.It
A common framework for kernel entities to inter-communicate
.It
A reasonably fast, kernel-based implementation
.El
.Sh Nodes and Types
The most fundamental concept in
.Nm
is that of a
.Em node .
All nodes implement a number of predefined methods which allow them
to interact with other nodes in a well-defined manner.
.Pp
Each node has a
.Em type ,
which is a static property of the node determined at node creation time.
A node's type is described by a unique
.Tn ASCII
type name.
The type implies what the node does and how it may be connected
to other nodes.
.Pp
In object-oriented language, types are classes and nodes are instances
of their respective class.
All node types are subclasses of the generic node
type, and hence inherit certain common functionality and capabilities
(e.g., the ability to have an
.Tn ASCII
name).
.Pp
Nodes may be assigned a globally unique
.Tn ASCII
name which can be
used to refer to the node.
The name must not contain the characters
.Dq .\&
or
.Dq \&:
and is limited to
.Dv "NG_NODELEN + 1"
characters (including the NUL byte).
.Pp
Each node instance has a unique
.Em ID number
which is expressed as a 32-bit hex value.
This value may be used to
refer to a node when there is no
.Tn ASCII
name assigned to it.
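.Pp
These naming rules are simple to check mechanically.
The following is a minimal user-space sketch, not code from the
.Nm
sources: the function
.Fn toy_name_ok
is hypothetical, the caller passes the length limit rather than relying on a
particular value of
.Dv NG_NODELEN ,
and rejecting the empty string is an assumption of the sketch.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return 1 if "name" is acceptable as a node
 * (or hook) name, 0 otherwise.  "maxlen" stands in for NG_NODELEN
 * (or NG_HOOKLEN): the name may hold at most maxlen characters, so
 * it fits in a buffer of maxlen + 1 bytes including the NUL byte.
 */
int
toy_name_ok(const char *name, size_t maxlen)
{
    size_t len = strlen(name);

    if (len == 0 || len > maxlen)
        return (0);     /* empty, or too long for the buffer */
    if (strchr(name, '.') != NULL || strchr(name, ':') != NULL)
        return (0);     /* would be ambiguous inside an address */
    return (1);
}
.Ed
.Pp
With a limit of 15, for example, it accepts
.Dq Frame1
but rejects
.Dq foo.bar
and
.Dq foo: .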
.Sh Hooks
Nodes are connected to other nodes by connecting a pair of
.Em hooks ,
one from each node.
Data flows bidirectionally between nodes along
connected pairs of hooks.
A node may have as many hooks as it
needs, and may assign whatever meaning it wants to a hook.
.Pp
Hooks have these properties:
.Pp
.Bl -bullet -compact -offset 2n
.It
A hook has an
.Tn ASCII
name which is unique among all hooks
on that node (other hooks on other nodes may have the same name).
The name must not contain a
.Dq .\&
or a
.Dq \&:
and is
limited to
.Dv "NG_HOOKLEN + 1"
characters (including the NUL byte).
.It
A hook is always connected to another hook.
That is, hooks are
created at the time they are connected, and breaking an edge by
removing either hook destroys both hooks.
.It
A hook can be set into a state where incoming packets are always queued
by the input queuing system, rather than being delivered directly.
This is used when the two joined nodes need to be decoupled, e.g., if they are
running at different processor priority levels
.Pq spl .
.It
A hook may supply overriding receive data and receive message functions
which should be used for data and messages received through that hook
in preference to the general node-wide methods.
.El
.Pp
A node may decide to assign special meaning to some hooks.
For example, connecting to the hook named
.Dq debug
might trigger
the node to start sending debugging information to that hook.
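.Pp
The rule that hooks only ever exist in connected pairs can be illustrated
with a small user-space model.
This is a hypothetical sketch: the structure
.Vt struct toy_hook
and the functions
.Fn toy_connect
and
.Fn toy_disconnect
are made up for illustration and are much simpler than the kernel's hook
handling, which also involves both nodes' methods and reference counts.
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a hook; not the kernel's struct ng_hook. */
struct toy_hook {
    char name[16];
    struct toy_hook *peer;      /* other half of the same edge */
};

/*
 * Create both hooks of an edge at once, so that no hook ever exists
 * without a peer.  Returns the first hook, or NULL on failure.
 */
struct toy_hook *
toy_connect(const char *name_a, const char *name_b)
{
    struct toy_hook *a = calloc(1, sizeof(*a));
    struct toy_hook *b = calloc(1, sizeof(*b));

    if (a == NULL || b == NULL) {
        free(a);
        free(b);
        return (NULL);
    }
    /* calloc() zeroed the buffers, so the names stay NUL-terminated. */
    strncpy(a->name, name_a, sizeof(a->name) - 1);
    strncpy(b->name, name_b, sizeof(b->name) - 1);
    a->peer = b;
    b->peer = a;
    return (a);
}

/* Removing either hook breaks the edge and destroys both hooks. */
void
toy_disconnect(struct toy_hook *h)
{
    if (h == NULL)
        return;
    free(h->peer);
    free(h);
}
.Ed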
.Sh Data Flow
Two types of information flow between nodes: data messages and
control messages.
Data messages are passed in mbuf chains along the edges
in the graph, one edge at a time.
The first mbuf in a chain must have the
.Dv M_PKTHDR
flag set.
Each node decides how to handle data coming in on its hooks.
.Pp
Control messages are type-specific C structures sent from one node
directly to some arbitrary other node.
Control messages have a common
header format, followed by type-specific data, and are binary structures
for efficiency.
However, node types also may support conversion of the
type-specific data between binary and
.Tn ASCII
for debugging and human interface purposes (see the
.Dv NGM_ASCII2BINARY
and
.Dv NGM_BINARY2ASCII
generic control messages below).
Nodes are not required to support
these conversions.
.Pp
There are three ways to address a control message.
If
there is a sequence of edges connecting the two nodes, the message
may be
.Dq source routed
by specifying the corresponding sequence
of
.Tn ASCII
hook names as the destination address for the message (relative
addressing).
If the destination is adjacent to the source, then the source
node may simply specify (as a pointer in the code) the hook across which the
message should be sent.
Otherwise, the recipient node's global
.Tn ASCII
name
(or equivalent ID-based name) is used as the destination address
for the message (absolute addressing).
The two types of
.Tn ASCII
addressing
may be combined, by specifying an absolute start node and a sequence
of hooks.
Only the
.Tn ASCII
addressing modes are available to control programs outside the kernel,
since direct pointers can, of course, only be used by kernel modules.
.Pp
Messages often represent commands that are followed by a reply message
in the reverse direction.
To facilitate this, the recipient of a
control message is supplied with a
.Dq return address
that is suitable for addressing a reply.
.Pp
Each control message contains a 32-bit value called a
.Em typecookie
indicating the type of the message, i.e., how to interpret it.
Typically each type defines a unique typecookie for the messages
that it understands.
However, a node may choose to recognize and
implement more than one type of message.
.Pp
If a message is delivered to an address that implies that it arrived
at that node through a particular hook (as opposed to having been directly
addressed using its ID or global name), then that hook is identified to the
receiving node.
This allows a message to be rerouted or passed on, should
a node decide that this is required, in much the same way that data packets
are passed around between nodes.
The base system defines a set of standard messages for flow control
and link management purposes, which are usually
passed around in this manner.
Flow control messages usually travel
in the opposite direction to the data to which they pertain.
.Sh Netgraph is (usually) Functional
In order to minimize latency, most
.Nm
operations are functional.
That is, data and control messages are delivered by making function
calls rather than by using queues and mailboxes.
For example, if node
A wishes to send a data mbuf to neighboring node B, it calls the
generic
.Nm
data delivery function.
This function in turn locates
node B and calls B's
.Dq receive data
method.
There are exceptions to this, described below.
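.Pp
This call chain can be modeled in a few lines of user-space C.
The sketch is hypothetical: all of its names
.Pq for example Fn toy_send_data and Fn toy_record
are made up, and an integer stands in for the mbuf chain.
It only shows that delivery is an ordinary function call into the
receiving node, with a per-hook override, as described under
.Sx Hooks ,
taking precedence over the node-wide method.
.Bd -literal -offset indent
#include <errno.h>
#include <stddef.h>

/* Hypothetical user-space model of direct ("functional") delivery. */
struct toy_hook;

typedef int toy_rcvdata_t(struct toy_hook *hook, int datum);

struct toy_node {
    const char    *name;
    toy_rcvdata_t *rcvdata;     /* node-wide receive data method */
    int            last;        /* last datum received, for the demo */
};

struct toy_hook {
    struct toy_node *node;      /* node owning this hook */
    struct toy_hook *peer;      /* hook at the far end of the edge */
    toy_rcvdata_t   *rcvdata;   /* per-hook override, or NULL */
};

/*
 * Send "datum" out through "hook".  The receiver's method runs right
 * away on the caller's stack; nothing is queued.
 */
int
toy_send_data(struct toy_hook *hook, int datum)
{
    struct toy_hook *peer = hook->peer;

    if (peer == NULL)
        return (ENOTCONN);      /* illustrative error choice */
    if (peer->rcvdata != NULL)
        return (peer->rcvdata(peer, datum));
    return (peer->node->rcvdata(peer, datum));
}

/* Example receive method: remember what arrived. */
int
toy_record(struct toy_hook *hook, int datum)
{
    hook->node->last = datum;
    return (0);
}
.Ed
.Pp
Because the receiver runs inside the sender's call, any reply it sends
may reach the sender before
.Fn toy_send_data
returns, which is the first implication listed below.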
.Pp
Each node has an input queue, and some operations can be considered to
be
.Dq writers
in that they alter the state of the node.
Obviously, in an SMP
world it would be bad if the state of a node were changed while another
data packet were transiting the node.
For this purpose, the input queue
implements
.Em reader/writer
semantics so that when there is a writer in the node, all other requests
are queued, and while there are readers, a writer and any packets that
follow it are queued.
In the case where there is no reason to queue the
data, the input method is called directly, as mentioned above.
.Pp
A node may declare that all requests should be considered as writers,
or that requests coming in over a particular hook should be considered to
be writers, or even that packets leaving or entering across a particular
hook should always be queued, rather than delivered directly (often useful
for interrupt routines that want to get back to the hardware quickly).
By default, all control messages are considered to be writers
unless specifically declared to be readers in their definition
(see
.Dv NGM_READONLY
in
.Pa ng_message.h ) .
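.Pp
The admission rule of the input queue can be modeled as follows.
This is a hypothetical, single-threaded sketch that only decides whether
a request may enter the node now or must be queued; it leaves out locking,
queue draining, and the per-hook and per-message settings described above.
The names
.Vt struct toy_gate ,
.Fn toy_enter
and
.Fn toy_leave
are illustrative only.
.Bd -literal -offset indent
#include <stdbool.h>

/* Hypothetical model of a node's reader/writer input gate. */
struct toy_gate {
    int  readers;       /* requests currently inside as readers */
    bool writer;        /* a writer is currently inside the node */
    int  queued;        /* requests waiting in the input queue */
};

/* Returns true if the request may run now, false if it was queued. */
bool
toy_enter(struct toy_gate *g, bool is_writer)
{
    /*
     * Queue if a writer is inside, if a writer would have to share
     * the node with readers, or if others are already waiting (so
     * later requests cannot overtake a queued writer).
     */
    if (g->writer || g->queued > 0 || (is_writer && g->readers > 0)) {
        g->queued++;
        return (false);
    }
    if (is_writer)
        g->writer = true;
    else
        g->readers++;
    return (true);
}

/* Called when a request that entered directly has finished. */
void
toy_leave(struct toy_gate *g, bool is_writer)
{
    if (is_writer)
        g->writer = false;
    else
        g->readers--;
}
.Ed
.Pp
Two readers may share the node, but a writer arriving while they are
inside is queued, and so is a reader arriving after that writer.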
.Pp
While this mode of operation
results in good performance, it has a few implications for node
developers:
.Pp
.Bl -bullet -compact -offset 2n
.It
Whenever a node delivers a data or control message, the node
may need to allow for the possibility of receiving a returning
message before the original delivery function call returns.
.It
Netgraph nodes and support routines generally run at
.Fn splnet .
However, some nodes may want to send data and control messages
from a different priority level.
Netgraph supplies a mechanism which
utilizes the NETISR system to move message and data delivery to
.Fn splnet .
Nodes that run at other priorities (e.g., interfaces) can be directly
linked to other nodes so that the combination runs at the other priority;
however, any interaction with nodes running at
.Fn splnet
MUST be achieved via the
queueing functions (which use the
.Fn netisr
feature of the kernel).
Note that messages are always received at
.Fn splnet .
.It
It is possible for an infinite loop to occur if the graph contains cycles.
.El
.Pp
So far, these issues have not proven problematic in practice.
.Sh Interaction With Other Parts of the Kernel
A node may have a hidden interaction with other components of the
kernel outside of the
.Nm
subsystem, such as device hardware,
kernel protocol stacks, etc.
In fact, one of the benefits of
.Nm
is the ability to join disparate kernel networking entities together in a
consistent communication framework.
.Pp
An example is the node type
.Em socket
which is both a netgraph node and a
.Xr socket 2
BSD socket in the protocol family
.Dv PF_NETGRAPH .
Socket nodes allow user processes to participate in
.Nm .
Other nodes communicate with socket nodes using the usual methods, and the
node hides the fact that it is also passing information to and from a
cooperating user process.
.Pp
Another example is a device driver that presents
a node interface to the hardware.
.Sh Node Methods
Nodes are notified of the following actions via function calls
to the following node methods (all at
.Fn splnet )
and may accept or reject that action (by returning the appropriate
error code):
.Bl -tag -width xxx
.It Creation of a new node
The constructor for the type is called.
If creation of a new node is
allowed, the constructor must call the generic node creation
function (in object-oriented terms, the superclass constructor)
and then allocate any special resources it needs.
For nodes that
correspond to hardware, this is typically done during the device
attach routine.
Often a global
.Tn ASCII
name corresponding to the
device name is assigned here as well.
.It Creation of a new hook
The hook is created and tentatively
linked to the node, and the node is told about the name that will be
used to describe this hook.
The node sets up any special data structures
it needs, or may reject the connection, based on the name of the hook.
.It Successful connection of two hooks
After both ends have accepted their
hooks, and the links have been made, the nodes get a chance to
find out who their peer is across the link, and can then decide to reject
the connection.
Tear-down is automatic.
This is also the time at which
a node may decide whether to set a particular hook (or its peer) into
.Em queuing
mode.
.It Destruction of a hook
The node is notified of a broken connection.
The node may consider some hooks
to be critical to operation and others to be expendable: the disconnection
of one hook may be an acceptable event while for another it
may effect a total shutdown for the node.
.It Shutdown of a node
This method allows a node to clean up
and to ensure that any actions that need to be performed
at this time are taken.
The method is called by the generic (i.e., superclass)
node destructor which will get rid of the generic components of the node.
Some nodes (usually associated with a piece of hardware) may be
.Em persistent
in that a shutdown breaks all edges and resets the node,
but does not remove it.
In this case the shutdown method should not
free its resources, but rather, clean up and then clear the
.Dv NG_INVALID
flag to signal the generic code that the shutdown is aborted.
In
the case where the shutdown is started by the node itself due to hardware
removal or unloading (via
.Fn ng_rmnode_self ) ,
it should set the
.Dv NG_REALLY_DIE
flag to signal to its own shutdown method that it is not to persist.
.El
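.Pp
The decision a persistent node makes in its shutdown method can be
sketched as below.
The flag names
.Dv TOY_INVALID
and
.Dv TOY_REALLY_DIE
and their values are hypothetical stand-ins for
.Dv NG_INVALID
and
.Dv NG_REALLY_DIE ,
and the sketch assumes, as the text implies, that the generic code has
already set the invalid flag before calling the method.
.Bd -literal -offset indent
/*
 * Hypothetical flag values; the real NG_INVALID and NG_REALLY_DIE
 * are defined by the kernel headers.
 */
#define TOY_INVALID     0x0001  /* node is being shut down */
#define TOY_REALLY_DIE  0x0002  /* node asked for its own removal */

struct toy_node {
    unsigned flags;
    int      persistent;        /* e.g. backed by a piece of hardware */
};

/*
 * Sketch of a shutdown method for a node that normally persists.
 * Returns 1 if the node should really go away, 0 if it was reset.
 */
int
toy_shutdown(struct toy_node *node)
{
    if (node->persistent && (node->flags & TOY_REALLY_DIE) == 0) {
        /*
         * Reset instead of dying: clearing the flag tells the
         * generic code that the shutdown was aborted.
         */
        node->flags &= ~TOY_INVALID;
        return (0);
    }
    /* Hardware removal or unloading: let the node be destroyed. */
    return (1);
}
.Ed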
.Sh Sending and Receiving Data
Two other methods are also supported by all nodes:
.Bl -tag -width xxx
.It Receive data message
A
.Em Netgraph queueable request item ,
usually referred to as an
.Em item ,
is received by the function.
The item contains a pointer to an mbuf and metadata about the packet.
.Pp
The node is notified on which hook the item arrived,
and can use this information in its processing decision.
The receiving node must always
.Fn NG_FREE_M
the mbuf chain on completion or error, or pass it on to another node
(or kernel module) which will then be responsible for freeing it.
Similarly, the
.Em item
must be freed if it is not to be passed on to another node, by using the
.Fn NG_FREE_ITEM
macro.
If the item still holds references to mbufs or metadata at the time of
freeing, then they will also be appropriately freed.
Therefore, if there is any chance that the mbuf or metadata will be
changed or freed separately from the item, it is very important
that these fields be retrieved using the
.Fn NGI_GET_M
and
.Fn NGI_GET_META
macros, which also remove the reference within the item
(otherwise multiple frees of the same object will occur).
.Pp
If it is only required to examine the contents of the mbufs or the
metadata, then it is possible to use the
.Fn NGI_M
and
.Fn NGI_META
macros to both read and rewrite these fields.
.Pp
In addition to the mbuf chain itself there may also be a pointer to a
structure describing meta-data about the message
(e.g., priority information).
This pointer may be
.Dv NULL
if there is no additional information.
The format for this information is
described in
.Pa sys/netgraph/netgraph.h .
The memory for meta-data must be allocated via
.Fn malloc
with type
.Dv M_NETGRAPH_META .
As with the data itself, it is the receiver's responsibility to
.Fn free
the meta-data.
If the mbuf chain is freed, the meta-data must
be freed at the same time.
If the meta-data is freed but the
real data is passed on, then a
.Dv NULL
pointer must be substituted.
It is also the duty of the receiver to free
the request item itself, or to use it to pass the message on further.
.Pp
The receiving node may decide to defer the data by queueing it in the
.Nm
NETISR system (see below).
It achieves this by setting the
.Dv HK_QUEUE
flag in the flags word of the hook on which that data will arrive.
The infrastructure will respect that bit and queue the data for delivery at
a later time, rather than deliver it directly.
A node may decide to set
the bit on the
.Em peer
hook, so that its own output packets are queued.
This is used
by device drivers running at different processor priorities to transfer
packet delivery to the
.Fn splnet
level at which the bulk of
.Nm
runs.
.Pp
The structure and use of meta-data is still experimental, but is
presently used in frame-relay to indicate that management packets
should be queued for transmission
at a higher priority than data packets.
This is required for
conformance with Frame Relay standards.
.Pp
The node may elect to nominate a different receive data function
for data received on a particular hook, to simplify coding.
It uses
the
.Fn NG_HOOK_SET_RCVDATA hook fn
macro to do this.
The function receives the same arguments as the node-wide method,
but it receives all (and only) packets arriving on that hook.
.It Receive control message
This method is called when a control message is addressed to the node.
As with the received data, an
.Em item
is received, with a pointer to the control message.
The message can be examined using the
.Fn NGI_MSG
macro, or completely extracted from the item using the
.Fn NGI_GET_MSG
macro, which also removes the reference within the item.
If the item still holds a reference to the message when it is freed
(using the
.Fn NG_FREE_ITEM
macro), then the message will also be freed appropriately.
If the
reference has been removed, the node must free the message itself using the
.Fn NG_FREE_MSG
macro.
A return address is always supplied, giving the address of the node
that originated the message so a reply message can be sent anytime later.
The return address is retrieved from the
.Em item
using the
.Fn NGI_RETADDR
macro and is of type
.Em ng_ID_t .
All control messages and replies are
allocated with
.Fn malloc
type
.Dv M_NETGRAPH_MSG ;
however, it is more usual to use the
.Fn NG_MKMESSAGE
and
.Fn NG_MKRESPONSE
macros to allocate and fill out a message.
Messages must be freed using the
.Fn NG_FREE_MSG
macro.
.Pp
If the message was delivered via a specific hook, that hook will
also be made known, which allows the use of such things as flow-control
messages and status change messages, where the node may want to forward
the message out a hook other than the one on which it arrived.
.Pp
The node may elect to nominate a different receive message function
for messages received on a particular hook, to simplify coding.
It uses
the
.Fn NG_HOOK_SET_RCVMSG hook fn
macro to do this.
The function receives the same arguments as the node-wide method,
but it receives all (and only) messages arriving on that hook.
.El
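.Pp
The ownership rules for items can be modeled in user space to show why the
.Fn NGI_GET_M
style of access matters.
In this hypothetical sketch, none of whose names are the real macros,
.Fn toy_get_m
hands the buffer to the caller and clears the item's pointer, so that a later
.Fn toy_free_item
does not free the same buffer a second time.
.Bd -literal -offset indent
#include <stdlib.h>

/* Hypothetical stand-ins for an mbuf and a request item. */
struct toy_mbuf { int len; };
struct toy_item { struct toy_mbuf *m; };

static int mbufs_freed;         /* counts frees, for illustration */

static void
toy_free_m(struct toy_mbuf *m)
{
    if (m != NULL) {
        free(m);
        mbufs_freed++;
    }
}

/* Like NGI_GET_M: take the mbuf AND drop the item's reference. */
struct toy_mbuf *
toy_get_m(struct toy_item *item)
{
    struct toy_mbuf *m = item->m;

    item->m = NULL;             /* the caller now owns the mbuf */
    return (m);
}

/* Like NG_FREE_ITEM: also frees anything the item still references. */
void
toy_free_item(struct toy_item *item)
{
    toy_free_m(item->m);
    free(item);
}
.Ed
.Pp
Had the node read the pointer directly and then freed both the buffer and
the item, the buffer would have been freed twice.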
.Pp
Much use has been made of reference counts, so that nodes are
automatically freed once all references to them have been released.
This behaviour
has been tested and debugged to present a consistent and trustworthy
framework for the
.Dq type module
writer to use.
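.Pp
The reference counting idea can be sketched with a plain counter.
This is a hypothetical user-space illustration only: the names are made
up, and kernel code would need atomic operations rather than the ordinary
increments and decrements used here.
.Bd -literal -offset indent
#include <stdlib.h>

/* Hypothetical reference-counted object standing in for a node. */
struct toy_node {
    int refs;
};

struct toy_node *
toy_node_create(void)
{
    struct toy_node *n = calloc(1, sizeof(*n));

    if (n != NULL)
        n->refs = 1;            /* reference held by the creator */
    return (n);
}

void
toy_node_ref(struct toy_node *n)
{
    n->refs++;
}

/* Returns the remaining count; 0 means the node has been freed. */
int
toy_node_unref(struct toy_node *n)
{
    int left = --n->refs;

    if (left == 0)
        free(n);
    return (left);
}
.Ed
.Pp
A node that is shut down while a packet still holds a reference thus
stays allocated until that last reference is dropped.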
.Sh Addressing
The
.Nm
framework provides an unambiguous and simple-to-use method of specifically
addressing any single node in the graph.
The naming of a node is
independent of its type, in that another node or external component
need not know anything about the node's type in order to address it so as
to send it a generic message type.
Node and hook names should be
chosen so as to make addresses meaningful.
.Pp
Addresses are either absolute or relative.
An absolute address begins
with a node name (or ID), followed by a colon, followed by a sequence of hook
names separated by periods.
This addresses the node reached by starting
at the named node and following the specified sequence of hooks.
A relative address includes only the sequence of hook names, implicitly
starting hook traversal at the local node.
.Pp
There are a couple of special possibilities for the node name.
The name
.Dq .\&
(referred to as
.Dq \&.: )
always refers to the local node.
Also, nodes that have no global name may be addressed by their ID numbers,
by enclosing the hex representation of the ID number within square brackets.
Here are some examples of valid netgraph addresses:
.Bd -literal -offset 4n -compact

  .:
  [3f]:
  foo:
  .:hook1
  foo:hook1.hook2
  [d80]:hook1
.Ed
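.Pp
The address grammar above is simple enough to split with a few string
operations.
The following is a hypothetical user-space parser, not the kernel's: it
separates an address into its node part and hook path and decodes the
bracketed ID form, but it does not validate names or resolve anything.
The buffer sizes in
.Vt struct toy_addr
are arbitrary, and treating an empty node part as an error is an assumption.
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

struct toy_addr {
    char node[64];      /* name, "[id]" or "."; empty when relative */
    char path[192];     /* hook names separated by periods */
    int  absolute;      /* 1 if the address contained a colon */
};

/* Split "s" into node part and hook path; returns 0 or -1. */
int
toy_parse_addr(const char *s, struct toy_addr *a)
{
    const char *colon = strchr(s, ':');
    size_t nlen;

    memset(a, 0, sizeof(*a));
    if (colon == NULL) {                /* relative: hooks only */
        if (strlen(s) >= sizeof(a->path))
            return (-1);
        strcpy(a->path, s);
        return (0);
    }
    nlen = (size_t)(colon - s);
    if (nlen == 0 || nlen >= sizeof(a->node) ||
        strlen(colon + 1) >= sizeof(a->path))
        return (-1);
    memcpy(a->node, s, nlen);           /* NUL comes from memset */
    strcpy(a->path, colon + 1);
    a->absolute = 1;
    return (0);
}

/* Decode a "[hex]" node part into *id; returns -1 if it is not one. */
int
toy_addr_id(const struct toy_addr *a, unsigned long *id)
{
    char *end;

    if (a->node[0] != '[')
        return (-1);
    *id = strtoul(a->node + 1, &end, 16);
    if (end == a->node + 1 || *end != ']' || end[1] != 0)
        return (-1);
    return (0);
}
.Ed
.Pp
For example,
.Dq foo:hook1.hook2
yields the node
.Dq foo
and the path
.Dq hook1.hook2 ,
while
.Dq [d80]:hook1
yields the ID 0xd80.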
.Pp
Consider the following set of nodes that might be created for a site with
a single physical frame relay line having two active logical DLCI channels,
with RFC-1490 frames on DLCI 16 and PPP frames over DLCI 20:
.Pp
.Bd -literal
[type SYNC ]                  [type FRAME]                 [type RFC1490]
[ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named>  ]
[    A     ]                  [    B     ](dlci20)<---+    [     C      ]
                                                      |
                                                      |      [ type PPP ]
                                                      +>(mux)[<un-named>]
                                                             [    D     ]
.Ed
.Pp
One could always send a control message to node C from anywhere
by using the name
.Em "Frame1:uplink.dlci16" .
In this case, node C would also be notified that the message
reached it via its hook
.Dq mux .
Similarly,
.Em "Frame1:uplink.dlci20"
could reliably be used to reach node D, and node A could refer
to node B as
.Em ".:uplink" ,
or simply
.Em "uplink" .
Conversely, B can refer to A as
.Em "data" .
The address
.Em "mux.data"
could be used by both nodes C and D to address a message to node A.
.Pp
Note that this is only for
.Em control messages .
In each of these cases, where a relative addressing mode is
used, the recipient is notified of the hook on which the
message arrived, as well as
the originating node.
This allows the option of hop-by-hop distribution of messages and
state information.
Data messages are
.Em only
routed one hop at a time, by specifying the departing
hook, with each node making
the next routing decision.
So when B receives a frame on hook
.Dq data ,
it decodes the frame relay header to determine the DLCI,
and then forwards the unwrapped frame to either C or D.
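.Pp
The routing decision made by node B can be sketched as a small
demultiplexing function.
This sketch assumes the common two-octet Q.922 address field, in which the
first octet carries the upper six DLCI bits and the second octet the lower
four; the function names are hypothetical, and the hook names are those of
the example graph above.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

/*
 * Extract the DLCI from a two-octet address field: octet 0 holds the
 * upper six DLCI bits in bits 7..2, octet 1 holds the lower four
 * DLCI bits in bits 7..4.  Returns -1 if the frame is too short.
 */
int
toy_dlci(const uint8_t *frame, size_t len)
{
    if (len < 2)
        return (-1);
    return (((frame[0] & 0xfc) << 2) | ((frame[1] & 0xf0) >> 4));
}

/* Pick the outgoing hook of node B for a received frame. */
const char *
toy_route(const uint8_t *frame, size_t len)
{
    switch (toy_dlci(frame, len)) {
    case 16:
        return ("dlci16");      /* towards node C (RFC-1490) */
    case 20:
        return ("dlci20");      /* towards node D (PPP) */
    default:
        return (NULL);          /* unknown channel: drop the frame */
    }
}
.Ed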
.Pp
In a similar way, flow control messages may be routed in the reverse
direction to outgoing data.
For example, a
.Dq buffer nearly full
message from
.Em "Frame1:"
would be passed to node
.Em B
which might decide to send similar messages to both nodes
.Em C
and
.Em D .
The nodes would use
.Em "Direct hook pointer"
addressing to route the messages.
The message may have travelled from
.Em "Frame1:"
to
.Em B
as a synchronous reply, saving time and cycles.
.Pp
A similar graph might be used to represent multi-link PPP running
over an ISDN line:
.Pp
.Bd -literal
[ type BRI ](B1)<--->(link1)[ type MPP  ]
[  "ISDN1" ](B2)<--->(link2)[ (no name) ]
[          ](D) <-+
                  |
 +----------------+
 |
 +->(switch)[ type Q.921 ](term1)<---->(datalink)[ type Q.931 ]
            [ (no name)  ]                       [ (no name)  ]
.Ed
.Sh Netgraph Structures
Structures are defined in
.Pa sys/netgraph/netgraph.h
(for kernel structures only of interest to nodes)
and
.Pa sys/netgraph/ng_message.h
(for message definitions also of interest to user programs).
.Pp
The two basic object types that are of interest to node authors are
.Em nodes
and
.Em hooks .
These two objects have the following
properties that are also of interest to the node writers.
.Bl -tag -width xxx
.It struct ng_node
Node authors should always use the following typedef to declare
their pointers, and should never actually declare the structure.
.Pp
typedef struct ng_node *node_p;
.Pp
The following properties are associated with a node, and can be
accessed in the following manner:
.Bl -bullet -compact -offset 2n
.Pp
.It
Validity
.Pp
A driver or interrupt routine may want to check whether
the node is still valid.
It is assumed that the caller holds a reference
on the node so it will not have been freed; however, it may have been
disabled or otherwise shut down.
Using the
.Fn NG_NODE_IS_VALID "node"
macro will return this state.
Eventually it should be almost impossible
for code to run in an invalid node, but at this time that work has not been
completed.
.Pp
.It
node ID
.Pp
Of type
.Em ng_ID_t ,
this property can be retrieved using the macro
.Fn NG_NODE_ID "node" .
.Pp
.It
node name
.Pp
Optional globally unique name, a NUL-terminated string.
If there
is a value in here, it is the name of the node.
For example, to test whether the node has a name, and whether that name is
.Dq fred :
.Pp
.Dl if (NG_NODE_NAME(node)[0] != 0) ...
.Pp
.Dl if (strncmp(NG_NODE_NAME(node), \(dqfred\(dq, NG_NODELEN) == 0) ...
.Pp
.It
A node-dependent opaque cookie
.Pp
You may place anything of type
.Em pointer
here.
Use the macros
.Fn NG_NODE_SET_PRIVATE node value
and
.Fn NG_NODE_PRIVATE "node"
to set and retrieve this property.
.Pp
.It
number of hooks
.Pp
Use
.Fn NG_NODE_NUMHOOKS "node"
to retrieve this value.
.Pp
.It
hooks
.Pp
The node may have a number of hooks.
A traversal method is provided to allow all the hooks to be
tested for some condition.
The traversal is performed by
.Fn NG_NODE_FOREACH_HOOK node fn arg rethook ,
where
.Fa fn
is a function that will be called for each hook
in the form
.Fn fn hook arg
and which returns 0 to terminate the search.
If the search is terminated, then
.Fa rethook
will be set to the hook at which the search was terminated.
.El
.It struct ng_hook
Node authors should always use the following typedef to declare
their hook pointers.
.Pp
typedef struct ng_hook *hook_p;
.Pp
The following properties are associated with a hook, and can be
accessed in the following manner:
.Bl -bullet -compact -offset 2n
.Pp
.It
A node-dependent opaque cookie.
.Pp
You may place anything of type
.Em pointer
here.
Use the macros
.Fn NG_HOOK_SET_PRIVATE hook value
and
.Fn NG_HOOK_PRIVATE "hook"
to set and retrieve this property.
.Pp
.It
An associated node.
.Pp
You may use the macro
.Fn NG_HOOK_NODE "hook"
to find the associated node.
.Pp
.It
A peer hook
.Pp
The other hook in this connected pair, of type hook_p.
You can
use
.Fn NG_HOOK_PEER "hook"
to find the peer.
.Pp
.It
references
.Pp
.Fn NG_HOOK_REF "hook"
and
.Fn NG_HOOK_UNREF "hook"
increment and decrement the hook reference count accordingly.
After decrement you should always assume the hook has been freed
unless you have another reference still valid.
.Pp
.It
Override receive functions.
.Pp
The
.Fn NG_HOOK_SET_RCVDATA hook fn
and
.Fn NG_HOOK_SET_RCVMSG hook fn
macros can be used to set override methods that will be used in preference
to the generic receive data and receive message functions.
To unset these,
use the macros to set them to
.Dv NULL .
They will only be used for data and
messages received on the hook on which they are set.
.El
.Pp
The maintenance of the names, reference counts, and linked list
of hooks for each node is handled automatically by the
.Nm
subsystem.
Typically a node's private info contains a back-pointer to the node or hook
structure, which counts as a new reference that must be included
in the reference count for the node.
When the node constructor is called,
there is already a reference for this calculated in, so that
when the node is destroyed, it should remember to do a
.Fn NG_NODE_UNREF
on the node.
.Pp
From a hook you can obtain the corresponding node, and from
a node, it is possible to traverse all the active hooks.
.Pp
A current example of how to define a node can always be seen in
.Pa sys/netgraph/ng_sample.c
and should be used as a starting point for new node writers.
.El
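.Pp
The traversal performed by
.Fn NG_NODE_FOREACH_HOOK
can be modeled in user space with a linked list of hooks.
In this hypothetical sketch the callback returns 0 to stop the walk, as
described above, and the hook at which it stopped is returned; returning
.Dv NULL
when the walk completes is an assumption of the sketch, and none of the
names below exist in the kernel.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* Hypothetical hook record; hooks of one node form a list. */
struct toy_hook {
    const char      *name;
    struct toy_hook *next;      /* next hook on the same node */
};

typedef int toy_hook_fn(struct toy_hook *hook, void *arg);

/* Call fn for each hook until it returns 0; return that hook. */
struct toy_hook *
toy_foreach_hook(struct toy_hook *list, toy_hook_fn *fn, void *arg)
{
    struct toy_hook *h;

    for (h = list; h != NULL; h = h->next)
        if (fn(h, arg) == 0)
            return (h);         /* search terminated at this hook */
    return (NULL);              /* callback never returned 0 */
}

/* Example callback: continue (1) until the named hook is found (0). */
int
toy_not_named(struct toy_hook *hook, void *arg)
{
    return (strcmp(hook->name, (const char *)arg) != 0);
}
.Ed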
.Sh Netgraph Message Structure
Control messages have the following structure:
.Bd -literal
#define NG_CMDSTRLEN    15      /* Max command string (16 with null) */

struct ng_mesg {
  struct ng_msghdr {
    u_char      version;        /* Must equal NG_VERSION */
    u_char      spare;          /* Pad to 2 bytes */
    u_short     arglen;         /* Length of cmd/resp data */
    u_long      flags;          /* Message status flags */
    u_long      token;          /* Reply should have the same token */
    u_long      typecookie;     /* Node type understanding this message */
    u_long      cmd;            /* Command identifier */
    u_char      cmdstr[NG_CMDSTRLEN+1]; /* Cmd string (for debug) */
  } header;
  char  data[0];                /* Start of cmd/resp data */
};

#define NG_ABI_VERSION  5               /* Netgraph kernel ABI version */
#define NG_VERSION      4               /* Netgraph message version */
#define NGF_ORIG        0x0000          /* Command */
#define NGF_RESP        0x0001          /* Response */
.Ed
.Pp
Control messages have the fixed header shown above, followed by a
variable length data section which depends on the type cookie
and the command.
Each field is explained below:
.Bl -tag -width xxx
.It Dv version
Indicates the version of the netgraph message protocol itself.
The current version is
.Dv NG_VERSION .
.It Dv arglen
This is the length of any extra arguments, which begin at
.Dv data .
.It Dv flags
Indicates whether this is a command
.Pq Dv NGF_ORIG
or a response
.Pq Dv NGF_RESP
control message.
.It Dv token
The
.Dv token
is a means by which a sender can match a reply message to the
corresponding command message; the reply always has the same token.
.It Dv typecookie
The corresponding node type's unique 32-bit value.
If a node doesn't recognize the type cookie, it must reject the message
by returning
.Er EINVAL .
.Pp
Each type should have an include file that defines the commands,
argument format, and cookie for its own messages.
The typecookie
ensures that the same header file was included by both sender and
receiver; when an incompatible change in the header file is made,
the typecookie
.Em must
be changed.
The de facto method for generating unique type cookies is to take the
seconds from the epoch at the time the header file is written
(i.e., the output of
.Dv "date -u +'%s'" ) .
.Pp
There is a predefined typecookie
.Dv NGM_GENERIC_COOKIE
for the
.Dq generic
node type, and
a corresponding set of generic messages which all nodes understand.
The handling of these messages is automatic.
.It Dv cmd
The identifier for the message command.
This is type specific, and is defined in the same header file as the typecookie.
.It Dv cmdstr
Room for a short human readable version of
.Dv cmd
(for debugging purposes only).
.El
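.Pp
As an illustration of how the
.Dv flags
and
.Dv token
fields work together, the following C sketch builds a reply header from
a command header and checks whether a received header answers a given
command.
It uses a hypothetical mirror of the header,
.Vt struct msghdr_mirror ,
with fixed-width types in place of
.Vt u_char ,
.Vt u_short
and
.Vt u_long ;
this layout is an assumption made for the example, not the definition in
.Pa ng_message.h .
Copying the whole command header into the reply is likewise a
simplification of this sketch.
.Bd -literal -offset indent
#include <stdint.h>

#define NG_CMDSTRLEN	15
#define NGF_ORIG	0x0000
#define NGF_RESP	0x0001

/*
 * Hypothetical mirror of struct ng_msghdr using fixed-width types;
 * for illustration only, not the kernel's definition.
 */
struct msghdr_mirror {
	uint8_t		version;
	uint8_t		spare;
	uint16_t	arglen;
	uint32_t	flags;
	uint32_t	token;
	uint32_t	typecookie;
	uint32_t	cmd;
	char		cmdstr[NG_CMDSTRLEN + 1];
};

/* Build a reply header: copy the command's header, then mark it. */
static void
make_reply_hdr(const struct msghdr_mirror *cmd,
    struct msghdr_mirror *resp, uint16_t arglen)
{
	*resp = *cmd;			/* keeps the same token */
	resp->flags |= NGF_RESP;	/* this is now a response */
	resp->arglen = arglen;		/* length of the reply data */
}

/* Return 1 if resp is a response carrying the token of cmd, else 0. */
static int
is_reply_to(const struct msghdr_mirror *cmd,
    const struct msghdr_mirror *resp)
{
	return ((resp->flags & NGF_RESP) != 0 &&
	    resp->token == cmd->token);
}
.Ed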
.Pp
Some modules may choose to implement messages from more than one
of the header files and thus recognize more than one type cookie.
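.Pp
A node that recognizes more than one type cookie typically switches
first on
.Dv typecookie
and then on
.Dv cmd .
The sketch below uses made-up cookie and command values; every name in
it is hypothetical.
It returns
.Er EINVAL
for an unrecognized type cookie, as required above; rejecting unknown
commands with
.Er EINVAL
as well is an assumption of the sketch, and the handling of generic
messages is not shown.
.Bd -literal -offset indent
#include <errno.h>
#include <stdint.h>

/* Hypothetical cookies; real ones come from each type's header file. */
#define MYTYPE_COOKIE		944123456u
#define OTHER_COOKIE		944123789u

/* Hypothetical command identifiers, one set per cookie. */
#define MYTYPE_CMD_RESET	1
#define MYTYPE_CMD_GETSTATS	2
#define OTHER_CMD_PING		1

/*
 * Return 0 if this (hypothetical) node recognizes the message, or
 * EINVAL if the cookie or the command is unknown to it.
 */
static int
dispatch_msg(uint32_t typecookie, uint32_t cmd)
{
	switch (typecookie) {
	case MYTYPE_COOKIE:
		switch (cmd) {
		case MYTYPE_CMD_RESET:
		case MYTYPE_CMD_GETSTATS:
			return (0);	/* the command would be run here */
		default:
			return (EINVAL);
		}
	case OTHER_COOKIE:		/* messages from a second header file */
		switch (cmd) {
		case OTHER_CMD_PING:
			return (0);
		default:
			return (EINVAL);
		}
	default:			/* unknown type cookie */
		return (EINVAL);
	}
}
.Ed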
.Sh Control Message ASCII Form
Control messages are in binary format for efficiency.
However, for
debugging and human interface purposes, and if the node type supports
it, control messages may be converted to and from an equivalent
.Tn ASCII
form.
The
.Tn ASCII
form is similar to the binary form, with two exceptions:
.Pp
.Bl -tag -compact -width xxx
.It o
The
.Dv cmdstr
header field must contain the
.Tn ASCII
name of the command, corresponding to the
.Dv cmd
header field.
.It o
The
.Dv data
field contains a NUL-terminated
.Tn ASCII
string version of the message arguments.
.El
.Pp
In general, the arguments field of a control message can be any
arbitrary C data type.
Netgraph includes parsing routines to support
some pre-defined datatypes in
.Tn ASCII
with this simple syntax:
.Pp
.Bl -tag -compact -width xxx
.It o
Integer types are represented by base 8, 10, or 16 numbers.
.It o
Strings are enclosed in double quotes and respect the normal
C language backslash escapes.
.It o
IP addresses have the usual dotted-quad form, for example
.Li 192.168.1.1 .
.It o
Arrays are enclosed in square brackets, with the elements listed
consecutively starting at index zero.
An element may have an optional
index and equals sign preceding it.
Whenever an element
does not have an explicit index, the index is implicitly the previous
element's index plus one.
.It o
Structures are enclosed in curly braces, and each field is specified
in the form
.Dq fieldname=value .
.It o
Any array element or structure field whose value is equal to its
.Dq default value
may be omitted.
For integer types, the default value
is usually zero; for string types, the empty string.
.It o
Array elements and structure fields may be specified in any order.
.El
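.Pp
The array rules above, in particular the implicit index, can be
illustrated with a small parser for arrays of integers.
This is a hypothetical sketch, not the parser used by
.Nm ;
it handles only integer elements, reads them with
.Xr strtol 3
using a base of 0 (which accepts octal, decimal and hexadecimal
notation), and assumes there is no white space around the
.Dq =
sign.
.Bd -literal -offset indent
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

#define MAXELEM	16

/*
 * Hypothetical parser for the ASCII form of an integer array, e.g.
 * "[ 1 4=0x10 20 ]".  Elements not mentioned keep their default
 * value of zero.  Returns the number of slots spanned (highest index
 * plus one), or -1 on a syntax error or an out-of-range index.
 */
static int
parse_int_array(const char *s, long out[], int max)
{
	int idx = 0, span = 0;
	char *end;
	long v;

	memset(out, 0, (size_t)max * sizeof(out[0]));
	while (isspace((unsigned char)*s))
		s++;
	if (*s++ != '[')
		return (-1);
	for (;;) {
		while (isspace((unsigned char)*s))
			s++;
		if (*s == ']')
			return (span);
		v = strtol(s, &end, 0);		/* octal, decimal or hex */
		if (end == s)
			return (-1);
		s = end;
		if (*s == '=') {		/* explicit "index=value" */
			idx = (int)v;
			s++;
			v = strtol(s, &end, 0);
			if (end == s)
				return (-1);
			s = end;
		}
		if (idx < 0 || idx >= max)
			return (-1);
		out[idx] = v;
		if (idx + 1 > span)
			span = idx + 1;
		idx++;				/* implicit index of the next element */
	}
}
.Ed
.Pp
For example, parsing
.Dq "[ 1 4=0x10 20 ]"
yields a span of 6 with elements 1, 0, 0, 0, 16 and 20; elements 1
through 3 keep their default value of zero.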
.Pp
Each node type may define its own arbitrary types by providing
the necessary routines to parse and unparse them.
.Tn ASCII
forms defined
for a specific node type are described in the manual page for
that node type.
.Sh Generic Control Messages
There are a number of standard predefined messages that will work
for any node, as they are supported directly by the framework itself.
These are defined in
.Pa ng_message.h
along with the basic layout of messages and other similar information.
.Bl -tag -width xxx
.It Dv NGM_CONNECT
Connect to another node, using the supplied hook names on either end.
.It Dv NGM_MKPEER
Construct a node of the given type and then connect to it using the
supplied hook names.
.It Dv NGM_SHUTDOWN
The target node must disconnect all of its hooks and shut down.
Persistent nodes, such as those representing physical hardware,
might not disappear from the node namespace, but only reset themselves.
Disconnecting may result in neighbors shutting themselves down, and possibly a
cascading shutdown of the entire connected graph.
.It Dv NGM_NAME
Assign a name to a node.
Nodes can exist without having a name, and this
is the default for nodes created using the
.Dv NGM_MKPEER
method.
Such nodes can only be addressed relatively or by their ID number.
.It Dv NGM_RMHOOK
Ask the node to break a hook connection to one of its neighbors.
Both nodes will have their
.Dq disconnect
method invoked.
Either node may elect to shut down completely as a result.
.It Dv NGM_NODEINFO
Asks the target node to describe itself.
The four returned fields
are the node name (if named), the node type, the node ID and the
number of hooks attached.
The ID is an internal number unique to that node.
.It Dv NGM_LISTHOOKS
This returns the information given by
.Dv NGM_NODEINFO ,
but in addition
includes an array of fields describing each link, and the description for
the node at the far end of that link.
.It Dv NGM_LISTNAMES
This returns an array of node descriptions (as for
.Dv NGM_NODEINFO )
where each entry of the array describes a named node.
All named nodes will be described.
.It Dv NGM_LISTNODES
This is the same as
.Dv NGM_LISTNAMES
except that all nodes are listed regardless of whether they have a name or not.
.It Dv NGM_LISTTYPES
This returns a list of all currently installed netgraph types.
.It Dv NGM_TEXT_STATUS
The node may return a text formatted status message.
The status information is determined entirely by the node type.
It is the only
.Dq generic
message that requires any support within the node itself, and as such
the node may elect not to support this message.
The text response must be less than
.Dv NG_TEXTRESPONSE
bytes in length (presently 1024).
This can be used to return general
status information in human readable form.
.It Dv NGM_BINARY2ASCII
This message converts a binary control message to its
.Tn ASCII
form.
The entire control message to be converted is contained within the
arguments field of the
.Dv NGM_BINARY2ASCII
message itself.
If successful, the reply will contain the same control
message in
.Tn ASCII
form.
A node will typically only know how to translate messages that it
itself understands, so the target node of the
.Dv NGM_BINARY2ASCII
is often the same node that would actually receive that message.
.It Dv NGM_ASCII2BINARY
The opposite of
.Dv NGM_BINARY2ASCII .
The entire control message to be converted, in
.Tn ASCII
form, is contained
in the arguments section of the
.Dv NGM_ASCII2BINARY
and need only have the
.Dv flags ,
.Dv cmdstr ,
and
.Dv arglen
header fields filled in, plus the NUL-terminated string version of
the arguments in the arguments field.
If successful, the reply
contains the binary version of the control message.
.El
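.Pp
The layout that
.Dv NGM_ASCII2BINARY
expects in its arguments can be sketched as a header in which only
.Dv flags ,
.Dv cmdstr
and
.Dv arglen
are set, immediately followed by the NUL-terminated argument string.
The names
.Vt struct msghdr_mirror
and
.Fn pack_ascii_msg
are hypothetical, and the fixed-width field types are an assumption
rather than the layout in
.Pa ng_message.h .
The resulting bytes are what would be placed in the arguments field of
the
.Dv NGM_ASCII2BINARY
message itself.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NG_CMDSTRLEN	15
#define NGF_ORIG	0x0000

/* Hypothetical mirror of the message header; not the kernel's layout. */
struct msghdr_mirror {
	uint8_t		version;
	uint8_t		spare;
	uint16_t	arglen;
	uint32_t	flags;
	uint32_t	token;
	uint32_t	typecookie;
	uint32_t	cmd;
	char		cmdstr[NG_CMDSTRLEN + 1];
};

/*
 * Pack an ASCII-form control message into buf: a header with only
 * flags, cmdstr and arglen filled in, followed by the NUL-terminated
 * argument string.  Returns the total length, or 0 if it does not fit.
 */
static size_t
pack_ascii_msg(void *buf, size_t buflen, const char *cmdstr,
    const char *args)
{
	struct msghdr_mirror hdr;
	size_t alen = strlen(args) + 1;		/* include the NUL */

	if (strlen(cmdstr) > NG_CMDSTRLEN || alen > UINT16_MAX ||
	    sizeof(hdr) + alen > buflen)
		return (0);
	memset(&hdr, 0, sizeof(hdr));		/* other fields stay zero */
	hdr.flags = NGF_ORIG;
	hdr.arglen = (uint16_t)alen;
	strcpy(hdr.cmdstr, cmdstr);		/* length checked above */
	memcpy(buf, &hdr, sizeof(hdr));
	memcpy((char *)buf + sizeof(hdr), args, alen);
	return (sizeof(hdr) + alen);
}
.Ed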
.Sh Flow Control Messages
In addition to the control messages that affect nodes with respect to the
graph, there are also a number of
.Em flow-control
messages defined.
At present these are
.Em NOT
handled automatically by the system, so
nodes that are used in a graph utilizing flow control, and that lie in the
likely path of these messages, need to handle them.
The default action of a node that doesn't understand these messages should
be to pass them on to the next node.
Helper functions to assist with this may be added in the future.
These messages are also defined in
.Pa sys/netgraph/ng_message.h
and have a separate cookie,
.Dv NG_FLOW_COOKIE ,
to help identify them.
They will not be covered in depth here.
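.Pp
The default rule for flow-control messages can be sketched as a
classification step in the message handler of a node that does not
implement flow control.
The cookie values and all names below are placeholders, not the real
definitions; such a node forwards flow-control messages and rejects
other unrecognized cookies with
.Er EINVAL .
.Bd -literal -offset indent
#include <stdint.h>

/*
 * Placeholder cookies for illustration; the real NG_FLOW_COOKIE and
 * the node type's own cookie come from their header files.
 */
#define FLOW_COOKIE_EXAMPLE	0x11111111u
#define MYTYPE_COOKIE_EXAMPLE	0x22222222u

enum msg_action {
	MSG_HANDLED,		/* the node processes the message itself */
	MSG_FORWARD,		/* pass it on to the next node */
	MSG_REJECT		/* reply with EINVAL */
};

/*
 * Classify an incoming control message for a hypothetical node that
 * does not implement flow control.
 */
static enum msg_action
classify_msg(uint32_t typecookie)
{
	if (typecookie == MYTYPE_COOKIE_EXAMPLE)
		return (MSG_HANDLED);
	if (typecookie == FLOW_COOKIE_EXAMPLE)
		return (MSG_FORWARD);	/* default for flow-control messages */
	return (MSG_REJECT);		/* unrecognized type cookie */
}
.Ed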
.Sh Metadata
Data moving through the
.Nm
system can be accompanied by meta-data that describes some
aspect of that data.
The form of the meta-data is a fixed header,
which contains enough information for most uses, and can optionally
be supplemented by trailing
.Em option
structures, which contain a
.Em cookie
(see the section on control messages), an identifier, a length and optional
data.
If a node does not recognize the cookie associated with an option,
it should ignore that option.
.Pp
Meta-data might include such things as priority, discard eligibility,
or special processing requirements.
It might also mark a packet for
debug status, etc.
The use of meta-data is still experimental.
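.Pp
The rule that unrecognized options are ignored can be sketched as a
loop that walks a run of trailing option structures.
The option header
.Vt struct meta_opt_hdr ,
the convention that its
.Va len
field counts the option header plus its data, and the cookie and
option values are all hypothetical; they are not the kernel's
definitions.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical option header; len covers this header plus its data. */
struct meta_opt_hdr {
	uint32_t	cookie;		/* type that defined the option */
	uint16_t	type;		/* option identifier within that cookie */
	uint16_t	len;		/* total length, header included */
};

#define MY_COOKIE	0x5a5a5a5au	/* hypothetical cookie */
#define MY_OPT_PRIO	1		/* hypothetical option identifier */

/*
 * Return the first data byte of the MY_OPT_PRIO option, or -1 if there
 * is none or the options are malformed.  Options carrying any other
 * cookie are ignored.
 */
static int
find_priority(const unsigned char *opts, size_t optslen)
{
	struct meta_opt_hdr h;
	size_t off = 0;

	while (off + sizeof(h) <= optslen) {
		memcpy(&h, opts + off, sizeof(h));	/* avoid unaligned reads */
		if (h.len < sizeof(h) || off + h.len > optslen)
			return (-1);			/* malformed option */
		if (h.cookie == MY_COOKIE && h.type == MY_OPT_PRIO &&
		    h.len > sizeof(h))
			return (opts[off + sizeof(h)]);
		off += h.len;				/* skip this option */
	}
	return (-1);
}
.Ed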
.Sh INITIALIZATION
The base
.Nm
code may either be statically compiled
into the kernel or else loaded dynamically as a KLD via
.Xr kldload 8 .
In the former case, include
.Pp
.Dl options NETGRAPH
.Pp
in your kernel configuration file.
You may also include selected
node types in the kernel compilation, for example:
.Bd -literal -offset indent
options NETGRAPH
options NETGRAPH_SOCKET
options NETGRAPH_ECHO
.Ed
.Pp
Once the
.Nm
subsystem is loaded, individual node types may be loaded at any time
as KLD modules via
.Xr kldload 8 .
Moreover,
.Nm
knows how to do this automatically: when a request is made to create a
new node of an unknown type
.Em type ,
.Nm
will attempt to load the KLD module
.Pa ng_{type}.ko .
.Pp
Types can also be installed at boot time, as certain device drivers
may want to export each instance of the device as a netgraph node.
.Pp
In general, new types can be installed at any time from within the
kernel by calling
.Fn ng_newtype ,
supplying a pointer to the type's
.Vt struct ng_type
structure.
.Pp
The
.Fn NETGRAPH_INIT
macro automates this process by using a linker set.
.Sh EXISTING NODE TYPES
Several node types currently exist.
Each is fully documented
in its own man page:
.Bl -tag -width xxx
.It SOCKET
The socket type implements two new sockets in the new protocol domain
.Dv PF_NETGRAPH .
The two socket protocols are
.Dv NG_DATA
and
.Dv NG_CONTROL ,
both of type
.Dv SOCK_DGRAM .
Typically one of each is associated with a socket node.
When both sockets have closed, the node will shut down.
The
.Dv NG_DATA
socket is used for sending and receiving data, while the
.Dv NG_CONTROL
socket is used for sending and receiving control messages.
Data and control messages are passed using the
.Xr sendto 2
and
.Xr recvfrom 2
calls, using a
.Vt struct sockaddr_ng
socket address.
.Pp
.It HOLE
Responds only to generic messages and is a
.Dq black hole
for data.
Useful for testing.
Always accepts new hooks.
.Pp
.It ECHO
Responds only to generic messages and always echoes data back through the
hook from which it arrived.
Returns any non-generic messages as their
own response.
Useful for testing.
Always accepts new hooks.
.Pp
.It TEE
This node is useful for
.Dq snooping .
It has 4 hooks:
.Dv left ,
.Dv right ,
.Dv left2right ,
and
.Dv right2left .
Data entering from the right is passed to the left and duplicated on
.Dv right2left ,
and data entering from the left is passed to the right and
duplicated on
.Dv left2right .
Data entering from
.Dv left2right
is sent to the right and data from
.Dv right2left
to the left.
.Pp
.It RFC1490 MUX
Encapsulates/de-encapsulates frames encoded according to RFC 1490.
Has a hook for the encapsulated packets
.Pq Dq downstream
and one hook
for each protocol (i.e., IP, PPP, etc.).
.Pp
.It FRAME RELAY MUX
Encapsulates/de-encapsulates Frame Relay frames.
Has a hook for the encapsulated packets
.Pq Dq downstream
and one hook
for each DLCI.
.Pp
.It FRAME RELAY LMI
Automatically handles frame relay
.Dq LMI
(link management interface) operations and packets.
Automatically probes and detects which of several LMI standards
is in use at the exchange.
.Pp
.It TTY
This node is also a line discipline.
It simply converts between mbuf
frames and sequential serial data, allowing a tty to appear as a netgraph
node.
It has a programmable
.Dq hotkey
character.
.Pp
.It ASYNC
This node encapsulates and de-encapsulates asynchronous frames
according to RFC 1662.
This is used in conjunction with the TTY node
type for supporting PPP links over asynchronous serial lines.
.Pp
.It INTERFACE
This node is also a system networking interface.
It has hooks representing
each protocol family (IP, AppleTalk, IPX, etc.) and appears in the output of
.Xr ifconfig 8 .
The interfaces are named
.Em ng0 ,
.Em ng1 ,
etc.
.Pp
.It ONE2MANY
This node implements a simple round-robin multiplexer.
It can be used,
for example, to make several LAN ports act together to get a higher speed
link between two machines.
.Pp
.It Various PPP related nodes
There is a full multilink PPP implementation that runs in netgraph.
The
.Em Mpd
port can use these modules to make a very low latency, high
capacity PPP system.
It also supports
.Em PPTP
VPNs using the
.Em PPTP
node.
.Pp
.It PPPOE
A server and client side implementation of PPPoE.
Used in conjunction with
either
.Xr ppp 8
or the
.Em mpd
port.
.Pp
.It BRIDGE
This node, together with the Ethernet nodes, allows a very flexible
bridging system to be implemented.
.Pp
.It KSOCKET
This intriguing node looks like a socket to the system but diverts
all data to and from the netgraph system for further processing.
This allows
such things as UDP tunnels to be almost trivially implemented from the
command line.
.El
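.Pp
The forwarding rule of the
.Em tee
node described above can be summarized as a small routing function.
This hypothetical model only records where copies of a packet go; it
does not reflect how the node itself is implemented.
.Bd -literal -offset indent
#include <string.h>

/*
 * Hypothetical model of the tee node's forwarding rule: given the hook
 * a packet arrived on, store the hooks it is sent out on in out[] and
 * return how many there are (0 for an unknown hook).
 */
static int
tee_route(const char *in, const char *out[2])
{
	if (strcmp(in, "left") == 0) {
		out[0] = "right";	/* passed through */
		out[1] = "left2right";	/* duplicate copy */
		return (2);
	}
	if (strcmp(in, "right") == 0) {
		out[0] = "left";	/* passed through */
		out[1] = "right2left";	/* duplicate copy */
		return (2);
	}
	if (strcmp(in, "left2right") == 0) {
		out[0] = "right";
		return (1);
	}
	if (strcmp(in, "right2left") == 0) {
		out[0] = "left";
		return (1);
	}
	return (0);
}
.Ed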
.Pp
Refer to the
.Sx SEE ALSO
section at the end of this man page for more node types.
.Sh NOTES
Whether a named node exists can be checked by trying to send a control message
to it (e.g.,
.Dv NGM_NODEINFO ) .
If it does not exist,
.Er ENOENT
will be returned.
.Pp
All data messages are mbuf chains with the
.Dv M_PKTHDR
flag set.
.Pp
Nodes are responsible for freeing what they allocate.
There are four exceptions:
.Bl -tag -width xxxx
.It 1
Mbufs sent across a data link are never to be freed by the sender.
In the case of error, they should be considered freed.
.It 2
Any meta-data information traveling with the data has the same restriction.
It might be freed by any node the data passes through, and a
.Dv NULL
passed onwards, but the caller will never free it.
Two macros
.Fn NG_FREE_META "meta"
and
.Fn NG_FREE_M "m"
should be used if possible to free data and meta-data (see
.Pa netgraph.h ) .
.It 3
Messages sent using
.Fn ng_send_message
are freed by the recipient.
The addresses associated with the message, however, are freed by
whatever allocated them, so the recipient should copy them if it
wants to keep that information.
.It 4
Both control messages and data are delivered and queued with
a netgraph
.Em item .
The item must be freed using
.Fn NG_FREE_ITEM "item"
or passed on to another node.
.El
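.Pp
The ownership rule in item 1 can be sketched in user-space C, with a
hypothetical
.Vt struct buf
standing in for an mbuf chain: the receiver frees the buffer on every
path, including errors, and the sender clears its pointer after handing
the buffer over and never frees it itself.
All names, and the 1500-byte limit used to trigger an error, are made
up for the example.
.Bd -literal -offset indent
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define MAXLEN	1500		/* hypothetical limit used to force an error */

/* Hypothetical stand-in for an mbuf chain. */
struct buf {
	size_t	len;
	char	*data;
};

static int bufs_live;		/* buffers currently allocated */

static struct buf *
buf_alloc(size_t len)
{
	struct buf *b = malloc(sizeof(*b));

	if (b == NULL)
		return (NULL);
	if ((b->data = malloc(len)) == NULL) {
		free(b);
		return (NULL);
	}
	b->len = len;
	bufs_live++;
	return (b);
}

static void
buf_free(struct buf *b)
{
	free(b->data);
	free(b);
	bufs_live--;
}

/* Receiver: owns the buffer once called and frees it on every path. */
static int
recv_data(struct buf *b)
{
	if (b->len > MAXLEN) {
		buf_free(b);		/* error, but still consumed */
		return (EINVAL);
	}
	/* ... process b->data here ... */
	buf_free(b);
	return (0);
}

/* Sender: hands the buffer over and never frees it afterwards. */
static int
send_data(struct buf **bp)
{
	int error = recv_data(*bp);

	*bp = NULL;			/* considered freed, even on error */
	return (error);
}
.Ed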
.Sh FILES
.Bl -tag -width xxxxx -compact
.It Pa /sys/netgraph/netgraph.h
Definitions for use solely within the kernel by
.Nm
nodes.
.It Pa /sys/netgraph/ng_message.h
Definitions needed by any file that needs to deal with
.Nm
messages.
.It Pa /sys/netgraph/ng_socket.h
Definitions needed to use
.Nm
socket type nodes.
.It Pa /sys/netgraph/ng_{type}.h
Definitions needed to use
.Nm
{type}
nodes, including the type cookie definition.
.It Pa /modules/netgraph.ko
Netgraph subsystem loadable KLD module.
.It Pa /modules/ng_{type}.ko
Loadable KLD module for node type {type}.
.It Pa /sys/netgraph/ng_sample.c
Skeleton netgraph node.
Use this as a starting point for new node types.
.El
.Sh USER MODE SUPPORT
There is a library for supporting user-mode programs that wish
to interact with the netgraph system.
See
.Xr netgraph 3
for details.
.Pp
Two user-mode support programs,
.Xr ngctl 8
and
.Xr nghook 8 ,
are available to assist manual configuration and debugging.
.Pp
There are a few useful techniques for debugging new node types.
First, implementing a new node type in user mode before moving it
into the kernel makes debugging easier.
The
.Em tee
node type is also useful for debugging, especially in conjunction with
.Xr ngctl 8
and
.Xr nghook 8 .
.Pp
Also look in
.Pa /usr/share/examples/netgraph
for solutions to several
common networking problems, solved using
.Nm .
.Sh SEE ALSO
.Xr socket 2 ,
.Xr netgraph 3 ,
.Xr ng_async 4 ,
.Xr ng_bridge 4 ,
.Xr ng_bpf 4 ,
.Xr ng_cisco 4 ,
.Xr ng_echo 4 ,
.Xr ng_ether 4 ,
.Xr ng_frame_relay 4 ,
.Xr ng_hole 4 ,
.Xr ng_iface 4 ,
.Xr ng_ksocket 4 ,
.Xr ng_lmi 4 ,
.Xr ng_mppc 4 ,
.Xr ng_ppp 4 ,
.Xr ng_pppoe 4 ,
.Xr ng_pptpgre 4 ,
.Xr ng_rfc1490 4 ,
.Xr ng_socket 4 ,
.Xr ng_tee 4 ,
.Xr ng_tty 4 ,
.Xr ng_UI 4 ,
.Xr ng_vjc 4 ,
.Xr ng_{type} 4 ,
.Xr ngctl 8 ,
.Xr nghook 8
.Sh HISTORY
The
.Nm
system was designed and first implemented at Whistle Communications, Inc.
in a version of
.Fx 2.2
customized for the Whistle InterJet.
It first made its debut in the main tree in
.Fx 3.4 .
.Sh AUTHORS
.An -nosplit
.An Julian Elischer Aq julian@FreeBSD.org ,
with contributions by
.An Archie Cobbs Aq archie@FreeBSD.org .