.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD$
.\"
.Dd February 13, 2009
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
It is recommended that either
Protocol Independent Multicast - Sparse Mode (PIM-SM)
or Protocol Independent Multicast - Dense Mode (PIM-DM)
be used, as these are now the generally accepted protocols
in the Internet community.
The
.Sx HISTORY
section discusses previous multicast routing protocols.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
for the kernel configuration options),
and must run a multicast routing capable user-level process.
From the developer's point of view,
the API described in the
.Sx "Programming Guide"
section should be used to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control multicast forwarding in the kernel.
Note that most operations below require appropriate privilege
(i.e., root privilege):
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(for IPv4 and IPv6, respectively)
to send or receive IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
to send and receive IGMP or MLD messages, respectively.
In the case of
.Bx Ns
-derived kernels, it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g.,
.Tn Linux )
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons, the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket is opened, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, block all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
    sizeof(filter));
.Ed
.Pp
After multicast forwarding is enabled, the multicast routing socket
can also be used to enable PIM processing in the kernel if the router
runs PIM-SM or PIM-DM
(see
.Xr pim 4 ) .
.Pp
For each network interface (e.g., a physical interface or a virtual tunnel)
that will be used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = 0;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
    sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique for each vif.
The
.Va vif_flags
variable contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Dv VIFF_TUNNEL
flag is no longer supported by
.Fx .
Users who wish to forward multicast datagrams over a tunnel should consider
configuring a
.Xr gif 4
or
.Xr gre 4
tunnel and using it as a physical interface.
.Pp
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it has a value of 1.
.Pp
The
.Va vifc_rate_limit
field is no longer supported in
.Fx
and should be set to 0.
Users who wish to rate-limit multicast datagrams should consider the use of
.Xr dummynet 4
or
.Xr altq 4 .
.Pp
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
would contain the remote IP address in the case of DVMRP multicast tunnels;
since
.Dv VIFF_TUNNEL
is no longer supported, the example above does not set it.
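.Pp
Because each
.Va vif_index
must be unique, a routing daemon has to track which indices are in use,
including indices released by
.Dv MRT_DEL_VIF .
The following C sketch shows one simple way to do this with a
lowest-free-slot table.
It is not part of any system API: the function names are hypothetical,
and the table size assumes that
.Dv MAXVIFS
is 32, which should be checked against
.In netinet/ip_mroute.h .

```c
#include <stdbool.h>

/* Assumption: mirrors MAXVIFS from <netinet/ip_mroute.h>. */
#define SKETCH_MAXVIFS 32

/*
 * Hypothetical allocator: marks and returns the lowest unused vif
 * index, or -1 if all SKETCH_MAXVIFS slots are taken.
 */
static int
alloc_vif_index(bool in_use[SKETCH_MAXVIFS])
{
    for (int i = 0; i < SKETCH_MAXVIFS; i++) {
        if (!in_use[i]) {
            in_use[i] = true;
            return i;
        }
    }
    return -1;
}

/* Release an index, e.g., after a successful MRT_DEL_VIF. */
static void
free_vif_index(bool in_use[SKETCH_MAXVIFS], int vifi)
{
    if (vifi >= 0 && vifi < SKETCH_MAXVIFS)
        in_use[vifi] = false;
}
```

Reusing the lowest free slot keeps the active indices dense, which suits
arrays sized by
.Dv MAXVIFS
such as
.Va mfcc_ttls .
.Pp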
.Bd -literal
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
    sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique for each mif.
The
.Va mif_flags
variable contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
    sizeof(vifi));
.Ed
.Bd -literal
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
    sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled and the multicast virtual
interfaces are added,
the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was opened
and initialized earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the field
.Va im_mbz
set to zero.
Note that this header follows the layout of
.Vt "struct ip" ,
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the field
.Va im6_mbz
set to zero.
Note that this header follows the layout of
.Vt "struct ip6_hdr" ,
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the field
.Va im_msgtype
(IPv4) or
.Va im6_msgtype
(IPv6), which holds the type of the upcall: one of the
.Dv IGMPMSG_*
or
.Dv MRT6MSG_*
values, respectively.
The values of the remaining upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
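A daemon that reuses
.Va mrouter_s4
for IGMP, as recommended above, receives both IGMP packets and kernel
upcalls on the same socket and has to tell them apart.
The following C sketch does this on a raw receive buffer by relying on
the overlay described above: an upcall has a zero
.Va ip_p ,
which is at byte offset 9 of an IPv4 header.
It additionally assumes that
.Va im_msgtype
immediately precedes
.Va im_mbz
(byte offset 8) and that an upcall header is at least 20 bytes long;
both assumptions should be checked against
.In netinet/ip_mroute.h
on the target system.
The function name
.Fn upcall_type
is hypothetical.

```c
#include <stddef.h>

/* Byte offsets within the received buffer (see the assumptions above). */
#define IPV4_PROTO_OFFSET   9   /* ip_p in struct ip; im_mbz in struct igmpmsg */
#define UPCALL_TYPE_OFFSET  8   /* assumed position of im_msgtype */
#define MIN_UPCALL_LEN      20  /* assumed size of struct igmpmsg */

/*
 * Hypothetical helper: returns the upcall type (an IGMPMSG_* value)
 * if buf holds a kernel upcall, or -1 if it is too short or is an
 * ordinary IP packet such as IGMP, whose ip_p field is nonzero.
 */
static int
upcall_type(const unsigned char *buf, size_t len)
{
    if (len < MIN_UPCALL_LEN)
        return -1;
    if (buf[IPV4_PROTO_OFFSET] != 0)    /* real IP packet, not an upcall */
        return -1;
    return buf[UPCALL_TYPE_OFFSET];
}
```

A caller would then compare the result with the
.Dv IGMPMSG_*
constants, for example installing an MFC entry when it sees
.Dv IGMPMSG_NOCACHE .
.Pp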
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
it indicates that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall signals the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
.Pp
An MFC entry is added by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
mc.mfcc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    mc.mfcc_ttls[i] = oifs_ttl[i];
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
mc.mf6cc_parent = iif_index;
for (i = 0; i < maxvifs; i++)
    if (oifs_ttl[i] > 0)
        IF_SET(i, &mc.mf6cc_ifset);
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Pp
The
.Va source_addr
and
.Va group_addr
are the source and group addresses of the multicast packet (as reported
in the upcall message).
The
.Va iif_index
is the index of the virtual interface on which multicast packets for this
specific source and group address are expected to arrive.
The
.Va oifs_ttl[]
array contains the minimum TTL (per interface) a multicast packet
must have to be forwarded on an outgoing interface.
If the TTL value is zero, the corresponding interface is not included
in the set of outgoing interfaces.
Note that for IPv6 only the set of outgoing interfaces can
be specified, not per-interface TTL thresholds.
.Pp
An MFC entry is deleted by:
.Bd -literal
/* IPv4 */
struct mfcctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
memcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Bd -literal
/* IPv6 */
struct mf6cctl mc;
memset(&mc, 0, sizeof(mc));
memcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
memcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mc.mf6cc_mcastgrp));
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
    (void *)&mc, sizeof(mc));
.Ed
.Pp
The following can be used to obtain statistics for each
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following can be used to obtain statistics for each
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
.Ss Advanced Multicast API Programming Guide
When new features are added to the kernel, it becomes difficult
to preserve backward compatibility (binary and API)
while at the same time allowing user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One mechanism that preserves backward
compatibility is a form of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process asks the kernel to enable the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only the set of features
the kernel has agreed to.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
IPv6 support is expected in the future.
.Pp
The following is a summary of the extensible API.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
negotiate API features with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently a
.Vt uint32_t ;
i.e., a maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
For example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
This call stores in
.Va v
the pre-defined bits for the features that the kernel API supports.
The eight least significant bits of this
.Vt uint32_t
correspond to the
eight possible
.Dv MRT_MFC_FLAGS_*
flags that can be used in
.Va mfcc_flags
as part of the new
.Vt "struct mfcctl2"
(see below for details about those flags), which leaves 24 bits for other
new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
fails.
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);    /* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, its
argument specifies the desired set of features to
be enabled in the API and the kernel.
On return,
.Va v
holds the actual (sub)set of features that were enabled in the kernel.
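.Pp
The negotiation and bit layout described above reduce to plain bitmask
arithmetic.
The following C sketch makes no system calls; it assumes the feature
values listed in this section and copies them into local
.Dv SKETCH_*
constants (hypothetical names) so that it compiles without
.In netinet/ip_mroute.h .
It shows how a daemon might check the value returned through
.Va v
by
.Fn setsockopt MRT_API_CONFIG
and extract the per-vif flag bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the feature bits (assumed values, see the text). */
#define SKETCH_MRT_MFC_FLAGS_DISABLE_WRONGVIF (1u << 0)
#define SKETCH_MRT_MFC_FLAGS_BORDER_VIF       (1u << 1)
#define SKETCH_MRT_MFC_RP                     (1u << 8)
#define SKETCH_MRT_MFC_BW_UPCALL              (1u << 9)

/* The low eight bits mirror the per-vif MRT_MFC_FLAGS_* flags. */
#define SKETCH_PER_VIF_FLAG_MASK 0xffu

/*
 * Features the daemon may use: those it requested AND the kernel
 * granted.  The kernel should only return a subset of the request;
 * the AND is a defensive measure.
 */
static uint32_t
usable_features(uint32_t requested, uint32_t granted)
{
    return requested & granted;
}

/* True only if every requested feature was granted. */
static bool
all_granted(uint32_t requested, uint32_t granted)
{
    return (requested & granted) == requested;
}

/* Extract the per-vif flag subset from a feature bitset. */
static uint8_t
per_vif_flags(uint32_t features)
{
    return (uint8_t)(features & SKETCH_PER_VIF_FLAG_MASK);
}
```

A daemon that finds
.Fn all_granted
false can either fall back to the basic API for the missing features or
give up, in the same way as the
.Dv ERROR
branch of the example above.
.Pp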
To later retrieve the set of features that were enabled:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global;
therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
#define MRT_MFC_RP                     (1 << 8) /* enable RP address */
#define MRT_MFC_BW_UPCALL              (1 << 9) /* enable bw upcalls */
.Ed
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;         /* ip origin of mcasts */
        struct in_addr  mfcc_mcastgrp;       /* multicast group associated */
        vifi_t          mfcc_parent;         /* incoming vif */
        u_char          mfcc_ttls[MAXVIFS];  /* forwarding ttls on vifs */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS]; /* the MRT_MFC_FLAGS_* flags */
        struct in_addr  mfcc_rp;             /* the RP address */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
Usually, this signal is used to
complete the shortest-path switch in the case of PIM-SM multicast routing,
or to trigger a PIM assert message.
However, it should not be delivered for interfaces that are not in
the outgoing interface set and are not expected to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on one of those interfaces for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
.Pp
The
.Dv MRT_MFC_FLAGS_BORDER_VIF
flag is used to specify whether the Border-bit in PIM
Register messages should be set (when the Register encapsulation
is performed inside the kernel).
If it is set for the special PIM Register kernel virtual interface
(see
.Xr pim 4 ) ,
the Border-bit in the Register messages sent to the RP will be set.
.Pp
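The rule above for
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
can be applied mechanically when filling the
.Va mfcc_flags
array of a
.Vt "struct mfcctl2" :
suppress the signal on every vif that is not in the outgoing set
(a zero entry in
.Va mfcc_ttls ) ,
is not the incoming vif, and is not expected to become the incoming vif.
In the C sketch below, the
.Va may_become_iif
array is a hypothetical input that the routing protocol would supply,
.Dv SKETCH_MAXVIFS
assumes that
.Dv MAXVIFS
is 32, and the flag value is copied from the definition above.
The function only fills plain arrays and makes no system calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAXVIFS 32                              /* assumption: MAXVIFS */
#define SKETCH_MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* value from above */

/*
 * Hypothetical helper: fill flags[] (the mfcc_flags array) so that
 * WRONGVIF signals are suppressed on vifs that are not outgoing
 * (ttls[i] == 0), are not the incoming vif, and are not expected to
 * become the incoming vif.  Other flag bits are left untouched.
 */
static void
set_wrongvif_flags(uint8_t flags[], const uint8_t ttls[], int nvifs,
    int parent, const bool may_become_iif[])
{
    assert(nvifs >= 0 && nvifs <= SKETCH_MAXVIFS);
    for (int i = 0; i < nvifs; i++) {
        bool interested = ttls[i] > 0 || i == parent || may_become_iif[i];

        if (interested)
            flags[i] &= (uint8_t)~SKETCH_MRT_MFC_FLAGS_DISABLE_WRONGVIF;
        else
            flags[i] |= SKETCH_MRT_MFC_FLAGS_DISABLE_WRONGVIF;
    }
}
```

Only the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
bit is touched, so a
.Dv MRT_MFC_FLAGS_BORDER_VIF
bit set elsewhere is preserved.
.Pp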
The remaining six bits of each
.Va mfcc_flags
entry are reserved for future use.
.Pp
The
.Va mfcc_rp
field is used to specify the RP address (for PIM-SM multicast routing)
for a multicast
group G if the kernel is to perform the PIM Register encapsulation.
The
.Va mfcc_rp
field is used only if the
.Dv MRT_MFC_RP
advanced API flag/capability has been successfully set by
.Fn setsockopt MRT_API_CONFIG .
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dv MRT_MFC_RP
flag was successfully set by
.Fn setsockopt MRT_API_CONFIG ,
then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside
.Dv IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address is taken from the
.Va mfcc_rp
field
inside the new
.Vt "struct mfcctl2" .
However, even if the
.Dv MRT_MFC_RP
flag was successfully set, if the
.Va mfcc_rp
field was set to
.Dv INADDR_ANY ,
then the
kernel will still deliver an
.Dv IGMPMSG_WHOLEPKT
upcall with the
multicast data packet to the user-level process.
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size is close to the 65535-byte maximum size of an IPv4 packet),
the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally on the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over, for example, Ethernet would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
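.Pp
The choice between kernel-level encapsulation and an
.Dv IGMPMSG_WHOLEPKT
upcall, and the fragmentation condition, can be summarized with the
following user-level sketch.
It is not the kernel implementation.
The names
.Fn register_action
and
.Fn register_needs_fragmentation
are hypothetical, and the 28 bytes of encapsulation overhead are an
assumption: a 20-byte outer IPv4 header without options plus the
8-byte PIM Register header.
.Bd -literal
#include <arpa/inet.h>   /* htonl() */
#include <netinet/in.h>  /* struct in_addr, INADDR_ANY */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the two outcomes described above. */
enum reg_action {
    REG_KERNEL_ENCAP,    /* kernel builds the PIM Register itself */
    REG_WHOLEPKT_UPCALL  /* IGMPMSG_WHOLEPKT upcall to user level */
};

/*
 * mfc_rp_enabled: MRT_MFC_RP was accepted by MRT_API_CONFIG.
 * rp:             the mfcc_rp field of the struct mfcctl2 entry.
 */
enum reg_action
register_action(bool mfc_rp_enabled, struct in_addr rp)
{
    if (mfc_rp_enabled && rp.s_addr != htonl(INADDR_ANY))
        return (REG_KERNEL_ENCAP);
    return (REG_WHOLEPKT_UPCALL);
}

/*
 * Assumption: the encapsulation adds a 20-byte outer IPv4 header
 * (no options) and the 8-byte PIM Register header, and the result
 * must fit within the 65535-byte IPv4 maximum.
 */
#define REG_ENCAP_OVERHEAD  (20 + 8)
#define IPV4_MAXPACKET      65535

bool
register_needs_fragmentation(size_t inner_len)
{
    return (inner_len > IPV4_MAXPACKET - REG_ENCAP_OVERHEAD);
}
.Ed
Under these assumptions, an inner packet of up to 65507 bytes is
encapsulated whole, and anything larger is fragmented first.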
.Pp
Typically, a user-level multicast routing process needs to know the
forwarding bandwidth of some data flows.
For example, the process may want to time out idle MFC
entries, or, in the case of PIM-SM, initiate an (S,G) shortest-path switch
if the bandwidth rate is above a threshold.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution does not scale well, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
The bandwidth monitoring mechanism works as follows:
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., <, <=, ==, !=, >, >=), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Further, the missing operations can be simulated by secondary
user-level filtering of the upcalls generated by those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by setting the
.Dv MRT_MFC_BW_UPCALL
flag with
.Fn setsockopt MRT_API_CONFIG .
.It
The bandwidth-upcall filters are added and deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument).
.El
.Pp
From an application's point of view, a developer needs to know
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS  (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES    (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ           (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ           (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL    (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX                          128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
    if (bw_upcall_oper IS ">=") {
        if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
             (measured_packets >= threshold_packets)) ||
            (((bw_upcall_unit & BYTES) == BYTES) &&
             (measured_bytes >= threshold_bytes)))
            SEND_UPCALL("measured bandwidth is >= threshold");
    }
    if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
        if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
             (measured_packets <= threshold_packets)) ||
            (((bw_upcall_unit & BYTES) == BYTES) &&
             (measured_bytes <= threshold_bytes)))
            SEND_UPCALL("measured bandwidth is <= threshold");
    }
.Ed
.Pp
In a single
.Vt bw_upcall ,
the threshold unit can be BYTES, PACKETS, or both; when both are set,
the upcall is triggered if either threshold condition holds.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
An upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds
.Pq Dv BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC .
If smaller values were allowed, the bandwidth
estimation could be less accurate, or the potentially very high frequency
of the generated upcalls could introduce too much overhead.
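.Pp
The pseudo-algorithm and the rules above can be expressed in C as the
following minimal sketch.
It copies the structure definitions shown earlier so that it compiles
on its own.
The function names
.Fn bw_upcall_valid
and
.Fn bw_upcall_fires
are hypothetical, and
.Fn bw_upcall_valid
is only a user-level pre-check, not a description of the checks the
kernel performs.
Treating a filter with neither unit flag as invalid is an assumption
drawn from the pseudo-algorithm, under which such a filter can never
fire.
The measured
.Va b_time
is taken to be the elapsed measurement interval
.Pq Va measured_interval .
.Bd -literal
#include <netinet/in.h>   /* struct in_addr */
#include <stdint.h>
#include <sys/time.h>     /* struct timeval */

/* Local copies of the definitions above, so the sketch stands alone. */
struct bw_data {
    struct timeval  b_time;
    uint64_t        b_packets;
    uint64_t        b_bytes;
};

struct bw_upcall {
    struct in_addr  bu_src;
    struct in_addr  bu_dst;
    uint32_t        bu_flags;
    struct bw_data  bu_threshold;
    struct bw_data  bu_measured;
};

#define BW_UPCALL_UNIT_PACKETS                  (1 << 0)
#define BW_UPCALL_UNIT_BYTES                    (1 << 1)
#define BW_UPCALL_GEQ                           (1 << 2)
#define BW_UPCALL_LEQ                           (1 << 3)
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0

/* Nonzero if interval a is at least as long as interval b. */
static int
tv_geq(const struct timeval *a, const struct timeval *b)
{
    return (a->tv_sec > b->tv_sec ||
        (a->tv_sec == b->tv_sec && a->tv_usec >= b->tv_usec));
}

/*
 * Hypothetical user-level pre-check: at least one unit flag (assumed;
 * a filter with none can never fire), exactly one of GEQ and LEQ,
 * and a threshold interval no shorter than the minimum.
 */
int
bw_upcall_valid(const struct bw_upcall *bu)
{
    struct timeval min;
    uint32_t f = bu->bu_flags;

    min.tv_sec = BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC;
    min.tv_usec = BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC;
    if ((f & (BW_UPCALL_UNIT_PACKETS | BW_UPCALL_UNIT_BYTES)) == 0)
        return (0);
    if (((f & BW_UPCALL_GEQ) != 0) == ((f & BW_UPCALL_LEQ) != 0))
        return (0);
    return (tv_geq(&bu->bu_threshold.b_time, &min));
}

/*
 * The pseudo-algorithm above.  m->b_time is taken to be the elapsed
 * measurement interval; returns nonzero if an upcall should be sent.
 */
int
bw_upcall_fires(const struct bw_upcall *bu, const struct bw_data *m)
{
    const struct bw_data *t = &bu->bu_threshold;
    int pk = (bu->bu_flags & BW_UPCALL_UNIT_PACKETS) != 0;
    int by = (bu->bu_flags & BW_UPCALL_UNIT_BYTES) != 0;

    if (bu->bu_flags & BW_UPCALL_GEQ)
        return ((pk && m->b_packets >= t->b_packets) ||
            (by && m->b_bytes >= t->b_bytes));
    if ((bu->bu_flags & BW_UPCALL_LEQ) && tv_geq(&m->b_time, &t->b_time))
        return ((pk && m->b_packets <= t->b_packets) ||
            (by && m->b_bytes <= t->b_bytes));
    return (0);
}
.Ed
.Pp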
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
so the upcall may be delivered earlier.
For the <= operation, however, the kernel must wait
until the threshold interval has expired to know the answer.
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;

/*
 * source and group (struct in_addr), threshold_interval (struct
 * timeval), the thresholds, and the is_* flags are set by the caller.
 */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
/* GEQ and LEQ are mutually exclusive; exactly one is required. */
if (is_geq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_GEQ;
else if (is_leq_upcall)
    bw_upcall.bu_flags |= BW_UPCALL_LEQ;
else
    return (ERROR);
if (setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
    (void *)&bw_upcall, sizeof(bw_upcall)) < 0)
    return (ERROR);
.Ed
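.Pp
Two further request shapes follow the same pattern.
The sketch below, again with local copies of the structure definitions
and hypothetical helper names, builds the catch-all
.Dq bw <= 0xffffffff
packet filter used to emulate !=, together with the user-level second
stage that compares the measured packet count of each received
.Vt "struct bw_upcall"
with the expected one.
It also builds a request that removes all filters for an (S,G) through
.Fn setsockopt MRT_DEL_BW_UPCALL ,
following the rules given in the next paragraphs.
Comparing packet counts only is an illustrative choice.
.Bd -literal
#include <netinet/in.h>   /* struct in_addr */
#include <stdint.h>
#include <string.h>       /* memset() */
#include <sys/time.h>     /* struct timeval */

/* Local copies of the definitions above, so the sketch stands alone. */
struct bw_data {
    struct timeval  b_time;
    uint64_t        b_packets;
    uint64_t        b_bytes;
};

struct bw_upcall {
    struct in_addr  bu_src;
    struct in_addr  bu_dst;
    uint32_t        bu_flags;
    struct bw_data  bu_threshold;
    struct bw_data  bu_measured;
};

#define BW_UPCALL_UNIT_PACKETS  (1 << 0)
#define BW_UPCALL_LEQ           (1 << 3)
#define BW_UPCALL_DELETE_ALL    (1 << 4)

/* Hypothetical: catch-all "bw <= 0xffffffff" filter for != emulation. */
void
bw_upcall_fill_catch_all(struct bw_upcall *bu, struct in_addr src,
    struct in_addr grp, const struct timeval *interval)
{
    memset(bu, 0, sizeof(*bu));
    bu->bu_src = src;
    bu->bu_dst = grp;
    bu->bu_flags = BW_UPCALL_UNIT_PACKETS | BW_UPCALL_LEQ;
    bu->bu_threshold.b_time = *interval;
    bu->bu_threshold.b_packets = 0xffffffff;
}

/* Hypothetical second stage: keep upcalls where measured != expected. */
int
bw_upcall_is_ne(const struct bw_upcall *up, uint64_t expected_packets)
{
    return (up->bu_measured.b_packets != expected_packets);
}

/* Hypothetical: MRT_DEL_BW_UPCALL request removing all (S,G) filters. */
void
bw_upcall_fill_delete_all(struct bw_upcall *bu, struct in_addr src,
    struct in_addr grp)
{
    memset(bu, 0, sizeof(*bu));
    bu->bu_src = src;
    bu->bu_dst = grp;
    bu->bu_flags = BW_UPCALL_DELETE_ALL;
}
.Ed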
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL
with the fields of
.Vt "struct bw_upcall"
set exactly as they were when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G) with
.Dv MRT_DEL_BW_UPCALL ,
only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall"
need to be set, and
.Dv BW_UPCALL_DELETE_ALL
must be the only flag set in the
.Va bu_flags
field.
.Pp
The kernel delivers the bandwidth upcalls aggregated in a new upcall
message type:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In each
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user should obtain the
forwarded-packet statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr sourcefilter 3 ,
.Xr altq 4 ,
.Xr dummynet 4 ,
.Xr gif 4 ,
.Xr gre 4 ,
.Xr icmp6 4 ,
.Xr igmp 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr pim 4
.\"
.Sh HISTORY
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF)
and Core Based Trees (CBT) were developed as well.
Routers at autonomous system boundaries may now exchange multicast
routes with peers via the Border Gateway Protocol (BGP).
Many other routing protocols are able to redistribute multicast routes
for use with
.Dv PIM-SM
and
.Dv PIM-DM .
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
.Pq Pa http://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).