.\" Copyright (c) 1996-1999 Whistle Communications, Inc.
.\" All rights reserved.
.\"
.\" Subject to the following obligations and disclaimer of warranty, use and
.\" redistribution of this software, in source or object code forms, with or
.\" without modifications are expressly permitted by Whistle Communications;
.\" provided, however, that:
.\" 1. Any and all reproductions of the source or object code must include the
.\"    copyright notice above and the following disclaimer of warranties; and
.\" 2. No rights are granted, in any manner or form, to use Whistle
.\"    Communications, Inc. trademarks, including the mark "WHISTLE
.\"    COMMUNICATIONS" on advertising, endorsements, or otherwise except as
.\"    such appears in the above copyright notice or in the software.
.\"
.\" THIS SOFTWARE IS BEING PROVIDED BY WHISTLE COMMUNICATIONS "AS IS", AND
.\" TO THE MAXIMUM EXTENT PERMITTED BY LAW, WHISTLE COMMUNICATIONS MAKES NO
.\" REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED, REGARDING THIS SOFTWARE,
.\" INCLUDING WITHOUT LIMITATION, ANY AND ALL IMPLIED WARRANTIES OF
.\" MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT.
.\" WHISTLE COMMUNICATIONS DOES NOT WARRANT, GUARANTEE, OR MAKE ANY
.\" REPRESENTATIONS REGARDING THE USE OF, OR THE RESULTS OF THE USE OF THIS
.\" SOFTWARE IN TERMS OF ITS CORRECTNESS, ACCURACY, RELIABILITY OR OTHERWISE.
.\" IN NO EVENT SHALL WHISTLE COMMUNICATIONS BE LIABLE FOR ANY DAMAGES
.\" RESULTING FROM OR ARISING OUT OF ANY USE OF THIS SOFTWARE, INCLUDING
.\" WITHOUT LIMITATION, ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY,
.\" PUNITIVE, OR CONSEQUENTIAL DAMAGES, PROCUREMENT OF SUBSTITUTE GOODS OR
.\" SERVICES, LOSS OF USE, DATA OR PROFITS, HOWEVER CAUSED AND UNDER ANY
.\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
.\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
.\" THIS SOFTWARE, EVEN IF WHISTLE COMMUNICATIONS IS ADVISED OF THE POSSIBILITY
.\" OF SUCH DAMAGE.
.\"
.\" Authors: Julian Elischer <julian@FreeBSD.org>
.\"          Archie Cobbs <archie@FreeBSD.org>
.\"
.\" $FreeBSD$
.\" $Whistle: netgraph.4,v 1.7 1999/01/28 23:54:52 julian Exp $
.\"
.Dd January 19, 1999
.Dt NETGRAPH 4
.Os
.Sh NAME
.Nm netgraph
.Nd graph based kernel networking subsystem
.Sh DESCRIPTION
The
.Nm
system provides a uniform and modular system for the implementation
of kernel objects which perform various networking functions. The objects,
known as
.Em nodes ,
can be arranged into arbitrarily complicated graphs. Nodes have
.Em hooks
which are used to connect two nodes together, forming the edges in the graph.
Nodes communicate along the edges to process data, implement protocols, etc.
.Pp
The aim of
.Nm
is to supplement rather than replace the existing kernel networking
infrastructure. It provides:
.Pp
.Bl -bullet -compact -offset 2n
.It
A flexible way of combining protocol and link level drivers
.It
A modular way to implement new protocols
.It
A common framework for kernel entities to inter-communicate
.It
A reasonably fast, kernel-based implementation
.El
.Sh Nodes and Types
The most fundamental concept in
.Nm
is that of a
.Em node .
All nodes implement a number of predefined methods which allow them
to interact with other nodes in a well-defined manner.
.Pp
Each node has a
.Em type ,
which is a static property of the node determined at node creation time.
A node's type is described by a unique
.Tn ASCII
type name.
The type implies what the node does and how it may be connected
to other nodes.
.Pp
In object-oriented language, types are classes and nodes are instances
of their respective class. All node types are subclasses of the generic node
type, and hence inherit certain common functionality and capabilities
(e.g., the ability to have an
.Tn ASCII
name).
.Pp
Nodes may be assigned a globally unique
.Tn ASCII
name which can be
used to refer to the node.
The name must not contain the characters
.Dq .\&
or
.Dq \&:
and is limited to
.Dv "NG_NODELEN + 1"
characters (including NUL byte).
.Pp
Each node instance has a unique
.Em ID number
which is expressed as a 32-bit hex value. This value may be used to
refer to a node when there is no
.Tn ASCII
name assigned to it.
.Sh Hooks
Nodes are connected to other nodes by connecting a pair of
.Em hooks ,
one from each node. Data flows bidirectionally between nodes along
connected pairs of hooks. A node may have as many hooks as it
needs, and may assign whatever meaning it wants to a hook.
.Pp
Hooks have these properties:
.Pp
.Bl -bullet -compact -offset 2n
.It
A hook has an
.Tn ASCII
name which is unique among all hooks
on that node (other hooks on other nodes may have the same name).
The name must not contain a
.Dq .\&
or a
.Dq \&:
and is
limited to
.Dv "NG_HOOKLEN + 1"
characters (including NUL byte).
.It
A hook is always connected to another hook. That is, hooks are
created at the time they are connected, and breaking an edge by
removing either hook destroys both hooks.
.It
A hook can be set into a state where incoming packets are always queued
by the input queueing system, rather than being delivered directly. This
is used when the two joined nodes need to be decoupled, e.g. if they are
running at different processor priority (spl) levels.
.It
A hook may supply over-riding receive data and receive message functions
which should be used for data and messages received through that hook
in preference to the general node-wide methods.
.El
.Pp
A node may decide to assign special meaning to some hooks.
For example, connecting to the hook named
.Dq debug
might trigger
the node to start sending debugging information to that hook.
.Sh Data Flow
Two types of information flow between nodes: data messages and
control messages. Data messages are passed in mbuf chains along the edges
in the graph, one edge at a time. The first mbuf in a chain must have the
.Dv M_PKTHDR
flag set. Each node decides how to handle data coming in on its hooks.
.Pp
Control messages are type-specific C structures sent from one node
directly to some arbitrary other node. Control messages have a common
header format, followed by type-specific data, and are binary structures
for efficiency. However, node types also may support conversion of the
type-specific data between binary and
.Tn ASCII
for debugging and human interface purposes (see the
.Dv NGM_ASCII2BINARY
and
.Dv NGM_BINARY2ASCII
generic control messages below). Nodes are not required to support
these conversions.
.Pp
There are three ways to address a control message. If
there is a sequence of edges connecting the two nodes, the message
may be
.Dq source routed
by specifying the corresponding sequence
of
.Tn ASCII
hook names as the destination address for the message (relative
addressing).
If the destination is adjacent to the source, then the source
node may simply specify (as a pointer in the code) the hook across which the
message should be sent. Otherwise, the recipient node's global
.Tn ASCII
name
(or equivalent ID based name) is used as the destination address
for the message (absolute addressing). The two types of
.Tn ASCII
addressing
may be combined, by specifying an absolute start node and a sequence
of hooks. Only the
.Tn ASCII
addressing modes are available to control programs outside the kernel,
as direct pointers can only be used by kernel modules.
.Pp
Messages often represent commands that are followed by a reply message
in the reverse direction. To facilitate this, the recipient of a
control message is supplied with a
.Dq return address
that is suitable for addressing a reply.
.Pp
Each control message contains a 32-bit value called a
.Em typecookie
indicating the type of the message, i.e., how to interpret it.
Typically each type defines a unique typecookie for the messages
that it understands. However, a node may choose to recognize and
implement more than one type of message.
.Pp
If a message is delivered to an address that implies that it arrived
at that node through a particular hook (as opposed to having been directly
addressed using its ID or global name), then that hook is identified to the
receiving node. This allows a message to be rerouted or passed on, should
a node decide that this is required, in much the same way that data packets
are passed around between nodes. The base system defines a set of standard
messages for flow control and link management purposes
that are usually
passed around in this manner. Flow control messages usually travel
in the opposite direction to the data to which they pertain.
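.Pp
As a rough illustration of how the two
.Tn ASCII
addressing modes combine, the following userland C sketch splits an address
string into an optional start node and a sequence of hook names.
It assumes the textual syntax given in the Addressing section
(an optional node name or bracketed ID followed by a colon, then hook names
separated by periods).
The names
.Vt ng_addr_sketch ,
.Fn ng_addr_parse ,
.Fn copy_name ,
.Dv SKETCH_MAX_HOOKS
and
.Dv SKETCH_MAX_NAME
are hypothetical and exist only for this example; this is not the parser used by
.Nm
itself, and it does not check names against the kernel limits.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limits for this sketch; not NG_NODELEN or NG_HOOKLEN. */
#define SKETCH_MAX_HOOKS 8
#define SKETCH_MAX_NAME  32

struct ng_addr_sketch {
    int    absolute;                /* 1 if a "node:" prefix was present */
    char   node[SKETCH_MAX_NAME];   /* start node: name, "." or "[id]" */
    char   hooks[SKETCH_MAX_HOOKS][SKETCH_MAX_NAME]; /* hooks, in order */
    size_t nhooks;                  /* number of entries used in hooks */
};

/* Copy a non-empty name of len bytes into dst; fail if it does not fit. */
static int
copy_name(char *dst, const char *src, size_t len)
{
    if (len == 0 || len >= SKETCH_MAX_NAME)
        return (-1);
    memcpy(dst, src, len);
    dst[len] = 0;
    return (0);
}

/* Split addr into start node and hook list; 0 on success, -1 on error. */
int
ng_addr_parse(const char *addr, struct ng_addr_sketch *out)
{
    const char *p, *colon, *dot;
    size_t len;

    assert(addr != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    p = addr;
    colon = strchr(addr, ':');
    if (colon != NULL) {            /* absolute: "node:" or "node:hooks" */
        if (copy_name(out->node, addr, (size_t)(colon - addr)) != 0)
            return (-1);
        out->absolute = 1;
        p = colon + 1;
    }
    while (*p != 0) {               /* hook names separated by periods */
        dot = strchr(p, '.');
        len = (dot != NULL) ? (size_t)(dot - p) : strlen(p);
        if (out->nhooks == SKETCH_MAX_HOOKS ||
            copy_name(out->hooks[out->nhooks], p, len) != 0)
            return (-1);
        out->nhooks++;
        if (dot == NULL)
            break;
        p = dot + 1;
        if (*p == 0)                /* trailing period */
            return (-1);
    }
    /* A relative address must name at least one hook. */
    if (!out->absolute && out->nhooks == 0)
        return (-1);
    return (0);
}
```
.Ed
.Pp
Under these assumptions,
.Dq foo:hook1.hook2
yields an absolute address with start node
.Dq foo
and two hooks, while
.Dq mux.data
yields a relative address with two hooks and no start node.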
.Sh Netgraph is (usually) Functional
In order to minimize latency, most
.Nm
operations are functional.
That is, data and control messages are delivered by making function
calls rather than by using queues and mailboxes. For example, if node
A wishes to send a data mbuf to neighboring node B, it calls the
generic
.Nm
data delivery function. This function in turn locates
node B and calls B's
.Dq receive data
method. There are exceptions to this.
.Pp
Each node has an input queue, and some operations can be considered to
be
.Dq writers
in that they alter the state of the node. In an SMP
world it would be bad if the state of a node were changed while another
data packet were transiting the node. For this purpose, the input queue
implements a
.Em reader/writer
semantic so that when there is a writer in the node, all other requests
are queued, and while there are readers, a writer and any following
packets are queued. In the case where there is no reason to queue the
data, the input method is called directly, as mentioned above.
.Pp
A node may declare that all requests should be considered as writers,
or that requests coming in over a particular hook should be considered to
be a writer, or even that packets leaving or entering across a particular
hook should always be queued, rather than delivered directly (often useful
for interrupt routines that want to get back to the hardware quickly).
By default, all control message packets are considered to be writers
unless specifically declared to be a reader in their definition.
(See
.Dv NGM_READONLY
in
.Pa ng_message.h . )
.Pp
While this mode of operation
results in good performance, it has a few implications for node
developers:
.Pp
.Bl -bullet -compact -offset 2n
.It
Whenever a node delivers a data or control message, the node
may need to allow for the possibility of receiving a returning
message before the original delivery function call returns.
.It
Netgraph nodes and support routines generally run at
.Fn splnet .
However, some nodes may want to send data and control messages
from a different priority level. Netgraph supplies a mechanism which
utilizes the NETISR system to move message and data delivery to
.Fn splnet .
Nodes that run at other priorities (e.g. interfaces) can be directly
linked to other nodes so that the combination runs at the other priority;
however, any interaction with nodes running at
.Fn splnet
MUST be achieved via the
queueing functions (which use the
.Fn netisr
feature of the kernel).
Note that messages are always received at
.Fn splnet .
.It
It's possible for an infinite loop to occur if the graph contains cycles.
.El
.Pp
So far, these issues have not proven problematic in practice.
.Sh Interaction With Other Parts of the Kernel
A node may have a hidden interaction with other components of the
kernel outside of the
.Nm
subsystem, such as device hardware,
kernel protocol stacks, etc. In fact, one of the benefits of
.Nm
is the ability to join disparate kernel networking entities together in a
consistent communication framework.
.Pp
An example is the node type
.Em socket
which is both a netgraph node and a
.Xr socket 2
.Bx
socket in the protocol family
.Dv PF_NETGRAPH .
Socket nodes allow user processes to participate in
.Nm .
Other nodes communicate with socket nodes using the usual methods, and the
node hides the fact that it is also passing information to and from a
cooperating user process.
.Pp
Another example is a device driver that presents
a node interface to the hardware.
.Sh Node Methods
Nodes are notified of the following actions via function calls
to the following node methods (all at
.Fn splnet )
and may accept or reject that action (by returning the appropriate
error code):
.Bl -tag -width xxx
.It Creation of a new node
The constructor for the type is called. If creation of a new node is
allowed, the constructor must call the generic node creation
function (in object-oriented terms, the superclass constructor)
and then allocate any special resources it needs. For nodes that
correspond to hardware, this is typically done during the device
attach routine. Often a global
.Tn ASCII
name corresponding to the
device name is assigned here as well.
.It Creation of a new hook
The hook is created and tentatively
linked to the node, and the node is told about the name that will be
used to describe this hook. The node sets up any special data structures
it needs, or may reject the connection, based on the name of the hook.
.It Successful connection of two hooks
After both ends have accepted their
hooks, and the links have been made, the nodes get a chance to
find out who their peer is across the link and can then decide to reject
the connection. Tear-down is automatic. This is also the time at which
a node may decide whether to set a particular hook (or its peer) into
.Em queueing
mode.
.It Destruction of a hook
The node is notified of a broken connection. The node may consider some hooks
to be critical to operation and others to be expendable: the disconnection
of one hook may be an acceptable event while for another it
may effect a total shutdown for the node.
.It Shutdown of a node
This method allows a node to clean up
and to ensure that any actions that need to be performed
at this time are taken. The method is called by the generic (i.e., superclass)
node destructor which will get rid of the generic components of the node.
Some nodes (usually associated with a piece of hardware) may be
.Em persistent
in that a shutdown breaks all edges and resets the node,
but doesn't remove it. In this case the shutdown method should not
free its resources, but rather, clean up and then clear the
.Em NG_INVALID
flag to signal the generic code that the shutdown is aborted. In
the case where the shutdown is started by the node itself due to hardware
removal or unloading (via
.Fn ng_rmnode_self ) ,
it should set the
.Em NG_REALLY_DIE
flag to signal to its own shutdown method that it is not to persist.
.El
.Sh Sending and Receiving Data
Two other methods are also supported by all nodes:
.Bl -tag -width xxx
.It Receive data message
A
.Em Netgraph queueable request item ,
usually referred to as an
.Em item ,
is received by the function.
The item contains a pointer to an mbuf and metadata about the packet.
.Pp
The node is notified on which hook the item arrived,
and can use this information in its processing decision.
The receiving node must always
.Fn NG_FREE_M
the mbuf chain on completion or error, or pass it on to another node
(or kernel module) which will then be responsible for freeing it.
Similarly the
.Em item
must be freed if it is not to be passed on to another node, by using the
.Fn NG_FREE_ITEM
macro. If the item still holds references to mbufs or metadata at the time of
freeing then they will also be appropriately freed.
Therefore, if there is any chance that the mbuf or metadata will be
changed or freed separately from the item, it is very important
that these fields be retrieved using the
.Fn NGI_GET_M
and
.Fn NGI_GET_META
macros, which also remove the reference within the item (otherwise multiple
frees of the same object will occur).
.Pp
If it is only required to examine the contents of the mbufs or the
metadata, then it is possible to use the
.Fn NGI_M
and
.Fn NGI_META
macros to both read and rewrite these fields.
.Pp
In addition to the mbuf chain itself there may also be a pointer to a
structure describing meta-data about the message
(e.g. priority information). This pointer may be
.Dv NULL
if there is no additional information. The format for this information is
described in
.Pa sys/netgraph/netgraph.h .
The memory for meta-data must be allocated via
.Fn malloc
with type
.Dv M_NETGRAPH_META .
As with the data itself, it is the receiver's responsibility to
.Fn free
the meta-data. If the mbuf chain is freed the meta-data must
be freed at the same time. If the meta-data is freed but the
real data is passed on, then a
.Dv NULL
pointer must be substituted. It is also the duty of the receiver to free
the request item itself, or to use it to pass the message on further.
.Pp
The receiving node may decide to defer the data by queueing it in the
.Nm
NETISR system (see below). It achieves this by setting the
.Dv HK_QUEUE
flag in the flags word of the hook on which that data will arrive.
The infrastructure will respect that bit and queue the data for delivery at
a later time, rather than deliver it directly. A node may decide to set
the bit on the
.Em peer
node, so that its own output packets are queued.
This is used
by device drivers running at different processor priorities to transfer
packet delivery to the
.Fn splnet
level at which the bulk of
.Nm
runs.
.Pp
The structure and use of meta-data is still experimental, but is
presently used in frame-relay to indicate that management packets
should be queued for transmission
at a higher priority than data packets. This is required for
conformance with Frame Relay standards.
.Pp
The node may elect to nominate a different receive data function
for data received on a particular hook, to simplify coding. It uses
the
.Fn NG_HOOK_SET_RCVDATA hook fn
macro to do this. The function receives the same arguments as the
node-wide method, but is called for all (and only) packets from that hook.
.It Receive control message
This method is called when a control message is addressed to the node.
As with the received data, an
.Em item
is received, with a pointer to the control message.
The message can be examined using the
.Fn NGI_MSG
macro, or completely extracted from the item using the
.Fn NGI_GET_MSG
macro, which also removes the reference within the item.
If the item still holds a reference to the message when it is freed
(using the
.Fn NG_FREE_ITEM
macro), then the message will also be freed appropriately. If the
reference has been removed the node must free the message itself using the
.Fn NG_FREE_MSG
macro.
A return address is always supplied, giving the address of the node
that originated the message so a reply message can be sent anytime later.
The return address is retrieved from the
.Em item
using the
.Fn NGI_RETADDR
macro and is of type
.Em ng_ID_t .
All control messages and replies are
allocated with
.Fn malloc
type
.Dv M_NETGRAPH_MSG ,
however it is more usual to use the
.Fn NG_MKMESSAGE
and
.Fn NG_MKRESPONSE
macros to allocate and fill out a message.
Messages must be freed using the
.Fn NG_FREE_MSG
macro.
.Pp
If the message was delivered via a specific hook, that hook will
also be made known, which allows the use of such things as flow-control
messages and status change messages, where the node may want to forward
the message out a different hook from the one on which it arrived.
.Pp
The node may elect to nominate a different receive message function
for messages received on a particular hook, to simplify coding. It uses
the
.Fn NG_HOOK_SET_RCVMSG hook fn
macro to do this. The function receives the same arguments as the
node-wide method, but is called for all (and only) messages from that hook.
.El
.Pp
Much use has been made of reference counts, so that nodes that lose
all their references are automatically freed, and this behaviour
has been tested and debugged to present a consistent and trustworthy
framework for the
.Dq type module
writer to use.
.Sh Addressing
The
.Nm
framework provides an unambiguous and simple to use method of specifically
addressing any single node in the graph. The naming of a node is
independent of its type, in that another node, or external component
need not know anything about the node's type in order to address it so as
to send it a generic message type. Node and hook names should be
chosen so as to make addresses meaningful.
.Pp
Addresses are either absolute or relative. An absolute address begins
with a node name (or ID), followed by a colon, followed by a sequence of hook
names separated by periods. This addresses the node reached by starting
at the named node and following the specified sequence of hooks.
A relative address includes only the sequence of hook names, implicitly
starting hook traversal at the local node.
.Pp
There are a couple of special possibilities for the node name.
The name
.Dq .\&
(referred to as
.Dq \&.: )
always refers to the local node.
Also, nodes that have no global name may be addressed by their ID numbers,
by enclosing the hex representation of the ID number within square brackets.
Here are some examples of valid netgraph addresses:
.Bd -literal -offset 4n -compact

  .:
  [3f]:
  foo:
  .:hook1
  foo:hook1.hook2
  [d80]:hook1
.Ed
.Pp
Consider the following set of nodes that might be created for a site with
a single physical frame relay line having two active logical DLCI channels,
with RFC-1490 frames on DLCI 16 and PPP frames over DLCI 20:
.Pp
.Bd -literal
[type SYNC ]                  [type FRAME]                 [type RFC1490]
[ "Frame1" ](uplink)<-->(data)[<un-named>](dlci16)<-->(mux)[<un-named>  ]
[    A     ]                  [    B     ](dlci20)<---+    [     C      ]
                                                      |
                                                      |      [ type PPP ]
                                                      +>(mux)[<un-named>]
                                                             [    D     ]
.Ed
.Pp
One could always send a control message to node C from anywhere
by using the name
.Em "Frame1:uplink.dlci16" .
In this case, node C would also be notified that the message
reached it via its hook
.Dq mux .
Similarly,
.Em "Frame1:uplink.dlci20"
could reliably be used to reach node D, and node A could refer
to node B as
.Em ".:uplink" ,
or simply
.Em "uplink" .
Conversely, B can refer to A as
.Em "data" .
The address
.Em "mux.data"
could be used by both nodes C and D to address a message to node A.
.Pp
Note that this is only for
.Em control messages .
In each of these cases, where a relative addressing mode is
used, the recipient is notified of the hook on which the
message arrived, as well as
the originating node.
This allows the option of hop-by-hop distribution of messages and
state information.
Data messages are
.Em only
routed one hop at a time, by specifying the departing
hook, with each node making
the next routing decision.
So when B receives a frame on hook
.Dq data
it decodes the frame relay header to determine the DLCI,
and then forwards the unwrapped frame to either C or D.
.Pp
In a similar way, flow control messages may be routed in the reverse
direction to outgoing data. For example a
.Dq buffer nearly full
message from
.Em "Frame1:"
would be passed to node
.Em B
which might decide to send similar messages to both nodes
.Em C
and
.Em D .
The nodes would use
.Em "Direct hook pointer"
addressing to route the messages. The message may have travelled from
.Em "Frame1:"
to
.Em B
as a synchronous reply, saving time and cycles.
.Pp
A similar graph might be used to represent multi-link PPP running
over an ISDN line:
.Pp
.Bd -literal
[ type BRI ](B1)<--->(link1)[ type MPP  ]
[ "ISDN1"  ](B2)<--->(link2)[ (no name) ]
[          ](D) <-+
                  |
 +----------------+
 |
 +->(switch)[ type Q.921 ](term1)<---->(datalink)[ type Q.931 ]
            [ (no name)  ]                       [ (no name)  ]
.Ed
.Sh Netgraph Structures
Structures are defined in
.Pa sys/netgraph/netgraph.h
(for kernel structures only of interest to nodes)
and
.Pa sys/netgraph/ng_message.h
(for message definitions also of interest to user programs).
.Pp
The two basic object types that are of interest to node authors are
.Em nodes
and
.Em hooks .
These two objects have the following
properties that are also of interest to the node writers.
.Bl -tag -width xxx
.It struct ng_node
Node authors should always use the following typedef to declare
their pointers, and should never actually declare the structure.
.Pp
typedef struct ng_node *node_p;
.Pp
The following properties are associated with a node, and can be
accessed in the following manner:
.Bl -bullet -compact -offset 2n
.Pp
.It
Validity
.Pp
A driver or interrupt routine may want to check whether
the node is still valid.
It is assumed that the caller holds a reference
on the node so it will not have been freed, however it may have been
disabled or otherwise shut down. The
.Fn NG_NODE_IS_VALID "node"
macro returns this state. Eventually it should be almost impossible
for code to run in an invalid node but at this time that work has not been
completed.
.Pp
.It
node ID
.Pp
Of type
.Em ng_ID_t ,
this property can be retrieved using the macro
.Fn NG_NODE_ID "node" .
.Pp
.It
node name
.Pp
Optional globally unique name, null terminated string. If there
is a value in here, it is the name of the node.
.Pp
if
.Fn ( NG_NODE_NAME "node"
[0]) ....
.Pp
if (strncmp(
.Fn NG_NODE_NAME "node" ,
"fred", NG_NODELEN)) ...
.Pp
.It
A node dependent opaque cookie
.Pp
You may place anything of type
.Em pointer
here.
Use the macros
.Fn NG_NODE_SET_PRIVATE node value
and
.Fn NG_NODE_PRIVATE "node"
to set and retrieve this property.
.Pp
.It
number of hooks
.Pp
Use
.Fn NG_NODE_NUMHOOKS "node"
to retrieve this value.
.Pp
.It
hooks
.Pp
The node may have a number of hooks.
A traversal method is provided to allow all the hooks to be
tested for some condition:
.Fn NG_NODE_FOREACH_HOOK node fn arg rethook
where fn is a function that will be called for each hook
with the form
.Fn fn hook arg
and returning 0 to terminate the search. If the search is terminated, then
.Em rethook
will be set to the hook at which the search was terminated.
.El
.It struct ng_hook
Node authors should always use the following typedef to declare
their hook pointers.
.Pp
typedef struct ng_hook *hook_p;
.Pp
The following properties are associated with a hook, and can be
accessed in the following manner:
.Bl -bullet -compact -offset 2n
.Pp
.It
A node dependent opaque cookie.
.Pp
You may place anything of type
.Em pointer
here.
Use the macros
.Fn NG_HOOK_SET_PRIVATE hook value
and
.Fn NG_HOOK_PRIVATE "hook"
to set and retrieve this property.
.Pp
.It
An associated node.
.Pp
You may use the macro
.Fn NG_HOOK_NODE "hook"
to find the associated node.
.Pp
.It
A peer hook
.Pp
The other hook in this connected pair. Of type hook_p. You can
use
.Fn NG_HOOK_PEER "hook"
to find the peer.
.Pp
.It
references
.Pp
.Fn NG_HOOK_REF "hook"
and
.Fn NG_HOOK_UNREF "hook"
increment and decrement the hook reference count accordingly.
After decrement you should always assume the hook has been freed
unless you have another reference still valid.
.Pp
.It
Over-ride receive functions.
.Pp
The
.Fn NG_HOOK_SET_RCVDATA hook fn
and
.Fn NG_HOOK_SET_RCVMSG hook fn
macros can be used to set over-ride methods that will be used in preference
to the generic receive data and receive message functions. To unset these
use the macros to set them to NULL. They will only be used for data and
messages received on the hook on which they are set.
.El
.Pp
The maintenance of the names, reference counts, and linked list
of hooks for each node is handled automatically by the
.Nm
subsystem.
Typically a node's private info contains a back-pointer to the node or hook
structure, which counts as a new reference that must be included
in the reference count for the node. When the node constructor is called
there is already a reference for this calculated in, so that
when the node is destroyed, it should remember to do a
.Fn NG_NODE_UNREF
on the node.
.Pp
From a hook you can obtain the corresponding node, and from
a node, it is possible to traverse all the active hooks.
.Pp
A current example of how to define a node can always be seen in
.Pa sys/netgraph/ng_sample.c
and should be used as a starting point for new node writers.
.El
.Sh Netgraph Message Structure
Control messages have the following structure:
.Bd -literal
#define NG_CMDSTRLEN    15      /* Max command string (16 with null) */

struct ng_mesg {
  struct ng_msghdr {
    u_char      version;        /* Must equal NG_VERSION */
    u_char      spare;          /* Pad to 2 bytes */
    u_short     arglen;         /* Length of cmd/resp data */
    u_long      flags;          /* Message status flags */
    u_long      token;          /* Reply should have the same token */
    u_long      typecookie;     /* Node type understanding this message */
    u_long      cmd;            /* Command identifier */
    u_char      cmdstr[NG_CMDSTRLEN+1]; /* Cmd string (for debug) */
  } header;
  char  data[0];                /* Start of cmd/resp data */
};

#define NG_ABI_VERSION  5               /* Netgraph kernel ABI version */
#define NG_VERSION      4               /* Netgraph message version */
#define NGF_ORIG        0x0000          /* Command */
#define NGF_RESP        0x0001          /* Response */
.Ed
.Pp
Control messages have the fixed header shown above, followed by a
variable length data section which depends on the type cookie
and the command. Each field is explained below:
.Bl -tag -width xxx
.It Dv version
Indicates the version of the netgraph message protocol itself. The current version is
.Dv NG_VERSION .
.It Dv arglen
This is the length of any extra arguments, which begin at
.Dv data .
.It Dv flags
Indicates whether this is a command or a response control message.
.It Dv token
The
.Dv token
is a means by which a sender can match a reply message to the
corresponding command message; the reply always has the same token.
.It Dv typecookie
The corresponding node type's unique 32-bit value.
If a node doesn't recognize the type cookie it must reject the message
by returning
.Er EINVAL .
.Pp
Each type should have an include file that defines the commands,
argument format, and cookie for its own messages.
The typecookie
ensures that the same header file was included by both sender and
receiver; when an incompatible change in the header file is made,
the typecookie
.Em must
be changed.
The de facto method for generating unique type cookies is to take the
seconds from the epoch at the time the header file is written
(i.e., the output of
.Dv "date -u +'%s'" ) .
.Pp
There is a predefined typecookie
.Dv NGM_GENERIC_COOKIE
for the
.Dq generic
node type, and
a corresponding set of generic messages which all nodes understand.
The handling of these messages is automatic.
.It Dv cmd
The identifier for the message command. This is type specific,
and is defined in the same header file as the typecookie.
.It Dv cmdstr
Room for a short human readable version of
.Dv cmd
(for debugging purposes only).
.El
.Pp
Some modules may choose to implement messages from more than one
of the header files and thus recognize more than one type cookie.
.Sh Control Message ASCII Form
Control messages are in binary format for efficiency. However, for
debugging and human interface purposes, and if the node type supports
it, control messages may be converted to and from an equivalent
.Tn ASCII
form. The
.Tn ASCII
form is similar to the binary form, with two exceptions:
.Pp
.Bl -tag -compact -width xxx
.It o
The
.Dv cmdstr
header field must contain the
.Tn ASCII
name of the command, corresponding to the
.Dv cmd
header field.
.It o
The
.Dv data
field contains a NUL-terminated
.Tn ASCII
string version of the message arguments.
.El
.Pp
In general, the arguments field of a control message can be any
arbitrary C data type.
Netgraph includes parsing routines to support
some pre-defined datatypes in
.Tn ASCII
with this simple syntax:
.Pp
.Bl -tag -compact -width xxx
.It o
Integer types are represented by base 8, 10, or 16 numbers.
.It o
Strings are enclosed in double quotes and respect the normal
C language backslash escapes.
.It o
IP addresses have the obvious form.
.It o
Arrays are enclosed in square brackets, with the elements listed
consecutively starting at index zero. An element may have an optional
index and equals sign preceding it. Whenever an element
does not have an explicit index, the index is implicitly the previous
element's index plus one.
.It o
Structures are enclosed in curly braces, and each field is specified
in the form
.Dq fieldname=value .
.It o
Any array element or structure field whose value is equal to its
.Dq default value
may be omitted. For integer types, the default value
is usually zero; for string types, the empty string.
.It o
Array elements and structure fields may be specified in any order.
.El
.Pp
Each node type may define its own arbitrary types by providing
the necessary routines to parse and unparse.
.Tn ASCII
forms defined
for a specific node type are documented in the documentation for
that node type.
.Sh Generic Control Messages
There are a number of standard predefined messages that will work
for any node, as they are supported directly by the framework itself.
These are defined in
.Pa ng_message.h
along with the basic layout of messages and other similar information.
.Bl -tag -width xxx
.It Dv NGM_CONNECT
Connect to another node, using the supplied hook names on either end.
.It Dv NGM_MKPEER
Construct a node of the given type and then connect to it using the
supplied hook names.
.It Dv NGM_SHUTDOWN
The target node should disconnect from all its neighbours and shut down.
Persistent nodes such as those representing physical hardware
might not disappear from the node namespace, but only reset themselves.
The node must disconnect all of its hooks.
This may result in neighbors shutting themselves down, and possibly a
cascading shutdown of the entire connected graph.
.It Dv NGM_NAME
Assign a name to a node. Nodes can exist without having a name, and this
is the default for nodes created using the
.Dv NGM_MKPEER
method. Such nodes can only be addressed relatively or by their ID number.
.It Dv NGM_RMHOOK
Ask the node to break a hook connection to one of its neighbours.
Both nodes will have their
.Dq disconnect
method invoked.
Either node may elect to totally shut down as a result.
.It Dv NGM_NODEINFO
Asks the target node to describe itself. The four returned fields
are the node name (if named), the node type, the node ID and the
number of hooks attached. The ID is an internal number unique to that node.
.It Dv NGM_LISTHOOKS
This returns the information given by
.Dv NGM_NODEINFO ,
but in addition
includes an array of fields describing each link, and the description for
the node at the far end of that link.
.It Dv NGM_LISTNAMES
This returns an array of node descriptions (as for
.Dv NGM_NODEINFO ")"
where each entry of the array describes a named node.
All named nodes will be described.
.It Dv NGM_LISTNODES
This is the same as
.Dv NGM_LISTNAMES
except that all nodes are listed regardless of whether they have a name or not.
.It Dv NGM_LISTTYPES
This returns a list of all currently installed netgraph types.
.It Dv NGM_TEXT_STATUS
The node may return a text formatted status message.
The status information is determined entirely by the node type.
It is the only "generic" message
that requires any support within the node itself and as such the node may
elect to not support this message.
The text response must be less than
.Dv NG_TEXTRESPONSE
bytes in length (presently 1024). This can be used to return general
status information in human readable form.
.It Dv NGM_BINARY2ASCII
This message converts a binary control message to its
.Tn ASCII
form.
The entire control message to be converted is contained within the
arguments field of the
.Dv NGM_BINARY2ASCII
message itself. If successful, the reply will contain the same control
message in
.Tn ASCII
form.
A node will typically only know how to translate messages that it
itself understands, so the target node of the
.Dv NGM_BINARY2ASCII
is often the same node that would actually receive that message.
.It Dv NGM_ASCII2BINARY
The opposite of
.Dv NGM_BINARY2ASCII .
The entire control message to be converted, in
.Tn ASCII
form, is contained
in the arguments section of the
.Dv NGM_ASCII2BINARY
and need only have the
.Dv flags ,
.Dv cmdstr ,
and
.Dv arglen
header fields filled in, plus the NUL-terminated string version of
the arguments in the arguments field. If successful, the reply
contains the binary version of the control message.
.El
.Sh Flow Control Messages
In addition to the control messages that affect nodes with respect to the
graph, there are also a number of
.Em Flow-control
messages defined. At present these are
.Em NOT
handled automatically by the system, so
nodes need to handle them if they are going to be used in a graph utilising
flow control, and will be in the likely path of these messages. The
default action of a node that doesn't understand these messages should
be to pass them on to the next node. Hopefully some helper functions
will assist in this eventually. These messages are also defined in
.Pa sys/netgraph/ng_message.h
and have a separate cookie
.Em NG_FLOW_COOKIE
to help identify them.
They will not be covered in depth here.
.Sh Metadata
Data moving through the
.Nm
system can be accompanied by meta-data that describes some
aspect of that data. The form of the meta-data is a fixed header,
which contains enough information for most uses, and can optionally
be supplemented by trailing
.Em option
structures, which contain a
.Em cookie
(see the section on control messages), an identifier, a length and optional
data. If a node does not recognize the cookie associated with an option,
it should ignore that option.
.Pp
Meta-data might include such things as priority, discard eligibility,
or special processing requirements. It might also mark a packet for
debug status, etc. The use of meta-data is still experimental.
.Sh INITIALIZATION
The base
.Nm
code may either be statically compiled
into the kernel or else loaded dynamically as a KLD via
.Xr kldload 8 .
In the former case, include
.Pp
.Dl options NETGRAPH
.Pp
in your kernel configuration file. You may also include selected
node types in the kernel compilation, for example:
.Bd -literal -offset indent
options NETGRAPH
options NETGRAPH_SOCKET
options NETGRAPH_ECHO
.Ed
.Pp
Once the
.Nm
subsystem is loaded, individual node types may be loaded at any time
as KLD modules via
.Xr kldload 8 .
Moreover,
.Nm
knows how to automatically do this; when a request to create a new
node of unknown type
.Em type
is made,
.Nm
will attempt to load the KLD module
.Pa ng_type.ko .
.Pp
Types can also be installed at boot time, as certain device drivers
may want to export each instance of the device as a netgraph node.
.Pp
In general, new types can be installed at any time from within the
kernel by calling
.Fn ng_newtype ,
supplying a pointer to the type's
.Dv struct ng_type
structure.
.Pp
The
.Fn NETGRAPH_INIT
macro automates this process by using a linker set.
.Sh EXISTING NODE TYPES
Several node types currently exist. Each is fully documented
in its own man page:
.Bl -tag -width xxx
.It SOCKET
The socket type implements two new sockets in the new protocol domain
.Dv PF_NETGRAPH .
The new socket protocols are
.Dv NG_DATA
and
.Dv NG_CONTROL ,
both of type
.Dv SOCK_DGRAM .
Typically one of each is associated with a socket node.
When both sockets have closed, the node will shut down. The
.Dv NG_DATA
socket is used for sending and receiving data, while the
.Dv NG_CONTROL
socket is used for sending and receiving control messages.
Data and control messages are passed using the
.Xr sendto 2
and
.Xr recvfrom 2
calls, using a
.Dv struct sockaddr_ng
socket address.
.Pp
.It HOLE
Responds only to generic messages and is a
.Dq black hole
for data. Useful for testing. Always accepts new hooks.
.Pp
.It ECHO
Responds only to generic messages and always echoes data back through the
hook from which it arrived. Returns any non-generic messages as their
own response. Useful for testing. Always accepts new hooks.
.Pp
.It TEE
This node is useful for
.Dq snooping .
It has 4 hooks:
.Dv left ,
.Dv right ,
.Dv left2right ,
and
.Dv right2left .
Data entering from the right is passed to the left and duplicated on
.Dv right2left ,
and data entering from the left is passed to the right and
duplicated on
.Dv left2right .
Data entering from
.Dv left2right
is sent to the right and data from
.Dv right2left
to left.
.Pp
.It RFC1490 MUX
Encapsulates/de-encapsulates frames encoded according to RFC 1490.
Has a hook for the encapsulated packets
.Pq Dq downstream
and one hook
for each protocol (i.e., IP, PPP, etc.).
.Pp
.It FRAME RELAY MUX
Encapsulates/de-encapsulates Frame Relay frames.
Has a hook for the encapsulated packets
.Pq Dq downstream
and one hook
for each DLCI.
.Pp
.It FRAME RELAY LMI
Automatically handles frame relay
.Dq LMI
(link management interface) operations and packets.
Automatically probes and detects which of several LMI standards
is in use at the exchange.
.Pp
.It TTY
This node is also a line discipline. It simply converts between mbuf
frames and sequential serial data, allowing a tty to appear as a netgraph
node. It has a programmable
.Dq hotkey
character.
.Pp
.It ASYNC
This node encapsulates and de-encapsulates asynchronous frames
according to RFC 1662. This is used in conjunction with the TTY node
type for supporting PPP links over asynchronous serial lines.
.Pp
.It INTERFACE
This node is also a system networking interface. It has hooks representing
each protocol family (IP, AppleTalk, IPX, etc.) and appears in the output of
.Xr ifconfig 8 .
The interfaces are named
.Em ng0 ,
.Em ng1 ,
etc.
.It ONE2MANY
This node implements a simple round-robin multiplexer. It can be used
for example to make several LAN ports act together to get a higher speed
link between two machines.
.It Various PPP related nodes.
There is a full multilink PPP implementation that runs in Netgraph.
The
.Em Mpd
port can use these modules to make a very low latency high
capacity PPP system. It also supports
.Em PPTP
VPNs using the
.Em PPTP
node.
.It PPPOE
A server and client side implementation of PPPoE. Used in conjunction with
either
.Xr ppp 8
or the
.Em mpd port .
.It BRIDGE
This node, together with the Ethernet nodes, allows a very flexible
bridging system to be implemented.
.It KSOCKET
This intriguing node looks like a socket to the system but diverts
all data to and from the netgraph system for further processing. This allows
such things as UDP tunnels to be almost trivially implemented from the
command line.
.El
.Pp
Refer to the section at the end of this man page for more node types.
.Sh NOTES
Whether a named node exists can be checked by trying to send a control message
to it (e.g.,
.Dv NGM_NODEINFO ) .
If it does not exist,
.Er ENOENT
will be returned.
.Pp
All data messages are mbuf chains with the M_PKTHDR flag set.
.Pp
Nodes are responsible for freeing what they allocate,
with the following exceptions:
.Bl -tag -width xxxx
.It 1
Mbufs sent across a data link are never to be freed by the sender. In the
case of error, they should be considered freed.
.It 2
Any meta-data information traveling with the data has the same restriction.
It might be freed by any node the data passes through, and a
.Dv NULL
passed onwards, but the caller will never free it.
Two macros
.Fn NG_FREE_META "meta"
and
.Fn NG_FREE_M "m"
should be used if possible to free meta-data and data (see
.Pa netgraph.h ) .
.It 3
Messages sent using
.Fn ng_send_message
are freed by the recipient. As in the case above, the addresses
associated with the message are freed by whatever allocated them so the
recipient should copy them if it wants to keep that information.
.It 4
Both control messages and data are delivered and queued with
a netgraph
.Em item .
The item must be freed using
.Fn NG_FREE_ITEM "item"
or passed on to another node.
.El
.Sh FILES
.Bl -tag -width xxxxx -compact
.It Pa /sys/netgraph/netgraph.h
Definitions for use solely within the kernel by
.Nm
nodes.
.It Pa /sys/netgraph/ng_message.h
Definitions needed by any file that needs to deal with
.Nm
messages.
.It Pa /sys/netgraph/ng_socket.h
Definitions needed to use
.Nm
socket type nodes.
.It Pa /sys/netgraph/ng_{type}.h
Definitions needed to use
.Nm
{type}
nodes, including the type cookie definition.
.It Pa /modules/netgraph.ko
Netgraph subsystem loadable KLD module.
.It Pa /modules/ng_{type}.ko
Loadable KLD module for node type {type}.
.It Pa /sys/netgraph/ng_sample.c
Skeleton netgraph node.
Use this as a starting point for new node types.
.El
.Sh USER MODE SUPPORT
There is a library for supporting user-mode programs that wish
to interact with the netgraph system. See
.Xr netgraph 3
for details.
.Pp
Two user-mode support programs,
.Xr ngctl 8
and
.Xr nghook 8 ,
are available to assist manual configuration and debugging.
.Pp
There are a few useful techniques for debugging new node types.
One is to implement a new node type in user mode first,
which makes debugging easier.
The
.Em tee
node type is also useful for debugging, especially in conjunction with
.Xr ngctl 8
and
.Xr nghook 8 .
.Pp
Also look in
.Pa /usr/share/examples/netgraph
for solutions to several
common networking problems, solved using
.Nm .
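.Pp
As a user-mode illustration of the header layout described under
.Sx Netgraph Message Structure ,
the following C sketch fills in a command header and checks whether a
reply belongs to it.
It is only a sketch under stated assumptions: every type, constant and
function carrying the
.Dv ex_
or
.Dv EX_
prefix is a hypothetical local copy made for this example, not a definition
from
.Pa ng_message.h ,
and programs that talk to the kernel would normally go through the library
described in
.Xr netgraph 3 .
.Bd -literal
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical local copies of the values shown in this page. */
#define EX_NG_CMDSTRLEN 15
#define EX_NG_VERSION   4
#define EX_NGF_ORIG     0x0000
#define EX_NGF_RESP     0x0001

/* Mirrors the documented fixed header; not the kernel's definition. */
struct ex_msghdr {
    unsigned char  version;     /* must equal EX_NG_VERSION */
    unsigned char  spare;       /* padding */
    unsigned short arglen;      /* length of the data that follows */
    unsigned long  flags;       /* command or response */
    unsigned long  token;       /* copied into the reply */
    unsigned long  typecookie;  /* node type that understands cmd */
    unsigned long  cmd;         /* command identifier */
    unsigned char  cmdstr[EX_NG_CMDSTRLEN + 1];
};

struct ex_mesg {
    struct ex_msghdr header;
    char data[];                /* start of the argument data */
};

/*
 * Allocate a command message carrying arglen bytes of arguments.
 * Returns NULL if memory cannot be allocated; the caller frees it.
 */
struct ex_mesg *
ex_mesg_alloc(unsigned long cookie, unsigned long cmd, const char *cmdstr,
    unsigned long token, const void *args, unsigned short arglen)
{
    struct ex_mesg *msg;

    assert(cmdstr != NULL);
    assert(arglen == 0 || args != NULL);
    msg = calloc(1, sizeof(*msg) + arglen);
    if (msg == NULL)
        return (NULL);
    msg->header.version = EX_NG_VERSION;
    msg->header.arglen = arglen;
    msg->header.flags = EX_NGF_ORIG;
    msg->header.token = token;
    msg->header.typecookie = cookie;
    msg->header.cmd = cmd;
    /* calloc() zeroed the buffer, so a truncated name stays terminated. */
    strncpy((char *)msg->header.cmdstr, cmdstr, EX_NG_CMDSTRLEN);
    if (arglen > 0)
        memcpy(msg->data, args, arglen);
    return (msg);
}

/* A reply carries the response flag and the token of its command. */
int
ex_reply_matches(const struct ex_mesg *cmd, const struct ex_mesg *resp)
{
    return ((resp->header.flags & EX_NGF_RESP) != 0 &&
        resp->header.token == cmd->header.token);
}
```
.Ed
.Pp
A caller would pick a fresh token for each command it builds with the
hypothetical
.Fn ex_mesg_alloc
and use
.Fn ex_reply_matches
to pair incoming replies with outstanding commands.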
.Sh SEE ALSO
.Xr socket 2 ,
.Xr netgraph 3 ,
.Xr ng_async 4 ,
.Xr ng_bpf 4 ,
.Xr ng_bridge 4 ,
.Xr ng_cisco 4 ,
.Xr ng_echo 4 ,
.Xr ng_ether 4 ,
.Xr ng_frame_relay 4 ,
.Xr ng_hole 4 ,
.Xr ng_iface 4 ,
.Xr ng_ksocket 4 ,
.Xr ng_lmi 4 ,
.Xr ng_mppc 4 ,
.Xr ng_ppp 4 ,
.Xr ng_pppoe 4 ,
.Xr ng_pptpgre 4 ,
.Xr ng_rfc1490 4 ,
.Xr ng_socket 4 ,
.Xr ng_tee 4 ,
.Xr ng_tty 4 ,
.Xr ng_UI 4 ,
.Xr ng_vjc 4 ,
.Xr ngctl 8 ,
.Xr nghook 8
.Sh HISTORY
The
.Nm
system was designed and first implemented at Whistle Communications, Inc.\&
in a version of
.Fx 2.2
customized for the Whistle InterJet.
It made its debut in the main tree in
.Fx 3.4 .
.Sh AUTHORS
.An -nosplit
.An Julian Elischer Aq julian@FreeBSD.org ,
with contributions by
.An Archie Cobbs Aq archie@FreeBSD.org .