.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.Dd May 27, 2009
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
It is recommended that either
Protocol Independent Multicast - Sparse Mode (PIM-SM)
or Protocol Independent Multicast - Dense Mode (PIM-DM)
be used, as these are now the generally accepted protocols
in the Internet community.
The
.Sx HISTORY
section discusses previous multicast routing protocols.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
for the kernel configuration options),
and must run a multicast routing capable user-level process.
From a developer's point of view,
the API described in the
.Sx "Programming Guide"
section should be used to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control multicast forwarding in the kernel.
Note that most of the operations below require appropriate privilege
(i.e., root privilege):
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs an IGMP or ICMPv6 socket
(for IPv4 and IPv6 respectively)
to send or receive IGMP or MLD multicast group membership messages,
then the same
.Va mrouter_s4
or
.Va mrouter_s6
socket should be used
for those messages.
On a
.Bx Ns
-derived kernel, it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g.,
.Tn Linux )
require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons the multicast
routing socket should be reused for IGMP and MLD messages as well.
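.Pp
The socket calls above can be wrapped so that failures are reported
to the caller.
The following is a minimal sketch rather than a definitive implementation;
the helper name
.Fn open_mrouter_socket
is hypothetical and not part of any system API.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <unistd.h>

/*
 * Open the raw socket that serves both as the multicast routing
 * socket and as the IGMP (IPv4) or MLD (IPv6) socket.
 * open_mrouter_socket() is a hypothetical helper name.
 * Returns the descriptor, or -1 with errno set (typically EPERM or
 * EACCES when the caller lacks the required privilege).
 */
int
open_mrouter_socket(int family)
{
	switch (family) {
	case AF_INET:
		return (socket(AF_INET, SOCK_RAW, IPPROTO_IGMP));
	case AF_INET6:
		return (socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6));
	default:
		/* Only IPv4 and IPv6 have a multicast routing socket. */
		errno = EAFNOSUPPORT;
		return (-1);
	}
}
```
.Ed
.Pp
A caller would keep the returned descriptor for all later
.Fn setsockopt
and
.Fn ioctl
requests, and for IGMP or MLD traffic as recommended above.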
.Pp
After the multicast routing socket is open, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
.Pp
After multicast forwarding is enabled, the multicast routing socket
can be used to enable PIM processing in the kernel if we are running PIM-SM or
PIM-DM
(see
.Xr pim 4 ) .
.Pp
For each network interface (e.g., physical or a virtual tunnel)
that is used for multicast forwarding, a corresponding
multicast interface must be added to the kernel:
.Bd -literal
/* IPv4 */
struct vifctl vc;
memset(&vc, 0, sizeof(vc));
/* Assign all vifctl fields as appropriate */
vc.vifc_vifi = vif_index;
vc.vifc_flags = vif_flags;
vc.vifc_threshold = min_ttl_threshold;
vc.vifc_rate_limit = 0;
memcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
           sizeof(vc));
.Ed
.Pp
The
.Va vif_index
must be unique per vif.
The
.Va vif_flags
contains the
.Dv VIFF_*
flags as defined in
.In netinet/ip_mroute.h .
The
.Dv VIFF_TUNNEL
flag is no longer supported by
.Fx .
Users who wish to forward multicast datagrams over a tunnel should consider
configuring a
.Xr gif 4
or
.Xr gre 4
tunnel and using it as a physical interface.
.Pp
The
.Va min_ttl_threshold
contains the minimum TTL a multicast data packet must have to be
forwarded on that vif.
Typically, it has a value of 1.
.Pp
The
.Va vifc_rate_limit
field is no longer supported in
.Fx
and should be set to 0.
Users who wish to rate-limit multicast datagrams should consider the use of
.Xr dummynet 4
or
.Xr altq 4 .
.Pp
The
.Va vif_local_address
contains the local IP address of the corresponding local interface.
The
.Va vif_remote_address
contains the remote IP address in case of DVMRP multicast tunnels.
.Bd -literal
/* IPv6 */
struct mif6ctl mc;
memset(&mc, 0, sizeof(mc));
/* Assign all mif6ctl fields as appropriate */
mc.mif6c_mifi = mif_index;
mc.mif6c_flags = mif_flags;
mc.mif6c_pifi = pif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
           sizeof(mc));
.Ed
.Pp
The
.Va mif_index
must be unique per mif.
The
.Va mif_flags
contains the
.Dv MIFF_*
flags as defined in
.In netinet6/ip6_mroute.h .
The
.Va pif_index
is the physical interface index of the corresponding local interface.
.Pp
A multicast interface is deleted by:
.Bd -literal
/* IPv4 */
vifi_t vifi = vif_index;
setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
           sizeof(vifi));
.Ed
.Bd -literal
/* IPv6 */
mifi_t mifi = mif_index;
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
           sizeof(mifi));
.Ed
.Pp
After multicast forwarding is enabled and the multicast virtual
interfaces are
added, the kernel may deliver upcall messages (also called signals
later in this text) on the multicast routing socket that was
initialized earlier with
.Dv MRT_INIT
or
.Dv MRT6_INIT .
The IPv4 upcalls have a
.Vt "struct igmpmsg"
header (see
.In netinet/ip_mroute.h )
with the field
.Va im_mbz
set to zero.
Note that this header follows the structure of
.Vt "struct ip"
with the protocol field
.Va ip_p
set to zero.
The IPv6 upcalls have a
.Vt "struct mrt6msg"
header (see
.In netinet6/ip6_mroute.h )
with the field
.Va im6_mbz
set to zero.
Note that this header follows the structure of
.Vt "struct ip6_hdr"
with the next header field
.Va ip6_nxt
set to zero.
.Pp
The upcall header contains the field
.Va im_msgtype
or
.Va im6_msgtype
with the type of the upcall,
.Dv IGMPMSG_*
or
.Dv MRT6MSG_* ,
for IPv4 and IPv6 respectively.
The values of the rest of the upcall header fields
and the body of the upcall message depend on the particular upcall type.
.Pp
If the upcall message type is
.Dv IGMPMSG_NOCACHE
or
.Dv MRT6MSG_NOCACHE ,
this is an indication that a multicast packet has reached the multicast
router, but the router has no forwarding state for that packet.
Typically, the upcall would be a signal for the multicast routing
user-level process to install the appropriate Multicast Forwarding
Cache (MFC) entry in the kernel.
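.Pp
Because the multicast routing socket also carries ordinary IGMP packets,
a reader of that socket has to tell upcalls apart from real packets.
The sketch below does so for IPv4 by testing the byte that overlays
.Va ip_p .
It assumes the byte offsets 8 and 9 for
.Va im_msgtype
and
.Va im_mbz ,
which should be checked against
.In netinet/ip_mroute.h ;
all
.Dv EX_*
names and the function name are hypothetical.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result codes; real upcall types are positive. */
enum { EX_TOO_SHORT = -1, EX_NOT_UPCALL = -2 };

/* Assumed byte offsets shared by struct ip and struct igmpmsg. */
#define EX_OFF_MSGTYPE	8	/* ip_ttl overlays im_msgtype */
#define EX_OFF_MBZ	9	/* ip_p overlays im_mbz */
#define EX_MIN_HDRLEN	20	/* IPv4 header without options */

/*
 * Classify a message read from the IPv4 multicast routing socket.
 * Returns the upcall type (to be compared with IGMPMSG_*),
 * EX_NOT_UPCALL for an ordinary packet such as IGMP,
 * or EX_TOO_SHORT if the buffer cannot hold the header.
 */
int
ex_classify_ipv4_msg(const uint8_t *buf, size_t len)
{
	if (len < EX_MIN_HDRLEN)
		return (EX_TOO_SHORT);
	if (buf[EX_OFF_MBZ] != 0)
		return (EX_NOT_UPCALL);	/* nonzero protocol: real packet */
	return (buf[EX_OFF_MSGTYPE]);
}
```
.Ed
.Pp
A routing daemon would typically switch on the returned type and,
for the no-cache case, install an MFC entry.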
303addeef82SBruce A. Mah.Pp
304ef151d78SRuslan ErmilovAn MFC entry is added by:
305addeef82SBruce A. Mah.Bd -literal
306addeef82SBruce A. Mah/* IPv4 */
307addeef82SBruce A. Mahstruct mfcctl mc;
308addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
309addeef82SBruce A. Mahmemcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
310addeef82SBruce A. Mahmemcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
311addeef82SBruce A. Mahmc.mfcc_parent = iif_index;
312addeef82SBruce A. Mahfor (i = 0; i < maxvifs; i++)
313addeef82SBruce A. Mah    mc.mfcc_ttls[i] = oifs_ttl[i];
314addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
315addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
316addeef82SBruce A. Mah.Ed
317addeef82SBruce A. Mah.Bd -literal
318addeef82SBruce A. Mah/* IPv6 */
319addeef82SBruce A. Mahstruct mf6cctl mc;
320addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
321addeef82SBruce A. Mahmemcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
322addeef82SBruce A. Mahmemcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mf6cc_mcastgrp));
323addeef82SBruce A. Mahmc.mf6cc_parent = iif_index;
324addeef82SBruce A. Mahfor (i = 0; i < maxvifs; i++)
325addeef82SBruce A. Mah    if (oifs_ttl[i] > 0)
326addeef82SBruce A. Mah        IF_SET(i, &mc.mf6cc_ifset);
327d531fb5bSChristian Brueffersetsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MFC,
328addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
329addeef82SBruce A. Mah.Ed
330addeef82SBruce A. Mah.Pp
331addeef82SBruce A. MahThe
332ef151d78SRuslan Ermilov.Va source_addr
333addeef82SBruce A. Mahand
334ef151d78SRuslan Ermilov.Va group_addr
335addeef82SBruce A. Mahare the source and group address of the multicast packet (as set
336addeef82SBruce A. Mahin the upcall message).
337addeef82SBruce A. MahThe
338ef151d78SRuslan Ermilov.Va iif_index
339addeef82SBruce A. Mahis the virtual interface index of the multicast interface the multicast
340addeef82SBruce A. Mahpackets for this specific source and group address should be received on.
341addeef82SBruce A. MahThe
342ef151d78SRuslan Ermilov.Va oifs_ttl[]
343addeef82SBruce A. Maharray contains the minimum TTL (per interface) a multicast packet
344addeef82SBruce A. Mahshould have to be forwarded on an outgoing interface.
345addeef82SBruce A. MahIf the TTL value is zero, the corresponding interface is not included
346addeef82SBruce A. Mahin the set of outgoing interfaces.
347addeef82SBruce A. MahNote that in case of IPv6 only the set of outgoing interfaces can
348addeef82SBruce A. Mahbe specified.
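.Pp
The relation between the per-interface TTL array and the IPv6 outgoing
interface set can be illustrated in isolation.
The sketch below uses a local bitset as a stand-in for the kernel's
interface set type and the
.Fn IF_SET
macro; the
.Dv EX_MAXIFS
limit and all
.Li ex_
names are assumptions made for the example.
.Bd -literal
```c
#include <stdint.h>
#include <string.h>

#define EX_MAXIFS 64	/* stand-in for the kernel's interface limit */

/* Local stand-in for the interface set used by struct mf6cctl. */
struct ex_ifset {
	uint32_t bits[EX_MAXIFS / 32];
};

/*
 * Build the outgoing interface set from a TTL array: an interface
 * is included only if its TTL threshold is nonzero.
 * Returns the number of outgoing interfaces selected.
 */
int
ex_ttls_to_ifset(const unsigned char *oifs_ttl, int n, struct ex_ifset *set)
{
	int i, count = 0;

	memset(set, 0, sizeof(*set));
	for (i = 0; i < n && i < EX_MAXIFS; i++) {
		if (oifs_ttl[i] > 0) {
			set->bits[i / 32] |= (uint32_t)1 << (i % 32);
			count++;
		}
	}
	return (count);
}

/* Return 1 if interface i is in the set, 0 otherwise. */
int
ex_ifset_isset(const struct ex_ifset *set, int i)
{
	return ((int)((set->bits[i / 32] >> (i % 32)) & 1));
}
```
.Ed
.Pp
The TTL values themselves are lost in the conversion, which matches the
statement above that IPv6 entries carry only the set of outgoing interfaces.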
349addeef82SBruce A. Mah.Pp
350ef151d78SRuslan ErmilovAn MFC entry is deleted by:
351addeef82SBruce A. Mah.Bd -literal
352addeef82SBruce A. Mah/* IPv4 */
353addeef82SBruce A. Mahstruct mfcctl mc;
354addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
355addeef82SBruce A. Mahmemcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
356addeef82SBruce A. Mahmemcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
357addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
358addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
359addeef82SBruce A. Mah.Ed
360addeef82SBruce A. Mah.Bd -literal
361addeef82SBruce A. Mah/* IPv6 */
362addeef82SBruce A. Mahstruct mf6cctl mc;
363addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
364addeef82SBruce A. Mahmemcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
365addeef82SBruce A. Mahmemcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mf6cc_mcastgrp));
366d531fb5bSChristian Brueffersetsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MFC,
367addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
368addeef82SBruce A. Mah.Ed
.Pp
The following method can be used to get various statistics per
installed MFC entry in the kernel (e.g., the number of forwarded
packets per source and group address):
.Bd -literal
/* IPv4 */
struct sioc_sg_req sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_sg_req6 sgreq;
memset(&sgreq, 0, sizeof(sgreq));
memcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
memcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
ioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
.Ed
.Pp
The following method can be used to get various statistics per
multicast virtual interface in the kernel (e.g., the number of forwarded
packets per interface):
.Bd -literal
/* IPv4 */
struct sioc_vif_req vreq;
memset(&vreq, 0, sizeof(vreq));
vreq.vifi = vif_index;
ioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
.Ed
.Bd -literal
/* IPv6 */
struct sioc_mif_req6 mreq;
memset(&mreq, 0, sizeof(mreq));
mreq.mifi = mif_index;
ioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
.Ed
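.Pp
The counters returned by these requests are cumulative, so a monitoring
process usually works with the difference between two samples.
The sketch below shows one way to compute that difference;
.Vt "struct ex_counters"
is a local stand-in that is only assumed to resemble the packet and byte
counters of the request structures, and it is not a kernel type.
.Bd -literal
```c
/* Local stand-in for the packet and byte counters of one sample. */
struct ex_counters {
	unsigned long pktcnt;	/* packets forwarded so far */
	unsigned long bytecnt;	/* bytes forwarded so far */
};

/*
 * Return the traffic seen between two samples of the same entry.
 * Unsigned subtraction is modular, so the result stays correct if a
 * counter wrapped around once between the samples.
 */
struct ex_counters
ex_counters_delta(struct ex_counters prev, struct ex_counters cur)
{
	struct ex_counters d;

	d.pktcnt = cur.pktcnt - prev.pktcnt;
	d.bytecnt = cur.bytecnt - prev.bytecnt;
	return (d);
}
```
.Ed
.Pp
Dividing the delta by the sampling interval gives an approximate
forwarding rate for the entry or interface.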
.Ss Advanced Multicast API Programming Guide
Adding new features to the kernel makes it difficult
to preserve backward compatibility (binary and API)
while still allowing user-level processes to take advantage of
the new features (if the kernel supports them).
.Pp
One mechanism that preserves backward
compatibility is a form of negotiation
between the user-level process and the kernel:
.Bl -enum
.It
The user-level process tries to enable in the kernel the set of new
features (and the corresponding API) it would like to use.
.It
The kernel returns the (sub)set of features it knows about
and is willing to enable.
.It
The user-level process uses only the set of features
the kernel has agreed on.
.El
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the expandable API solution.
Note that all new options and structures are defined
in
.In netinet/ip_mroute.h
and
.In netinet6/ip6_mroute.h ,
unless stated otherwise.
.Pp
The user-level process uses new
.Fn getsockopt Ns / Ns Fn setsockopt
options to
perform the API features negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened.
The set of desired/allowed features is stored in a bitset
(currently, in
.Vt uint32_t ;
i.e., maximum of 32 new features).
The new
.Fn getsockopt Ns / Ns Fn setsockopt
options are
.Dv MRT_API_SUPPORT
and
.Dv MRT_API_CONFIG .
Example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
would set in
.Va v
the pre-defined bits that the kernel API supports.
The eight least significant bits in
.Vt uint32_t
are the same as the
eight possible flags
.Dv MRT_MFC_FLAGS_*
that can be used in
.Va mfcc_flags
as part of the new definition of
.Vt "struct mfcctl"
(see below about those flags), which leaves 24 flags for other new features.
The value returned by
.Fn getsockopt MRT_API_SUPPORT
is read-only; in other words,
.Fn setsockopt MRT_API_SUPPORT
would fail.
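.Pp
The split of the 32-bit feature word into eight per-entry flag bits and
24 other feature bits can be expressed with a mask.
This is an illustrative sketch only; the
.Dv EX_MFC_FLAGS_MASK
name and both helper functions are hypothetical.
.Bd -literal
```c
#include <stdint.h>

/* Hypothetical mask covering the eight per-entry MRT_MFC_FLAGS_* bits. */
#define EX_MFC_FLAGS_MASK 0x000000ffU

/* Bits that may also appear in the per-interface mfcc_flags[] bytes. */
uint8_t
ex_per_entry_flags(uint32_t features)
{
	return ((uint8_t)(features & EX_MFC_FLAGS_MASK));
}

/* The remaining 24 bits, which describe other API features. */
uint32_t
ex_other_features(uint32_t features)
{
	return (features & ~EX_MFC_FLAGS_MASK);
}
```
.Ed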
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
In other words, when
.Fn setsockopt MRT_API_CONFIG
is called, the
argument to it specifies the desired set of features to
be enabled in the API and the kernel.
The return value in
.Va v
is the actual (sub)set of features that were enabled in the kernel.
To obtain the set of enabled features later:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
The set of enabled features is global.
Therefore,
.Fn setsockopt MRT_API_CONFIG
should be called right after
.Fn setsockopt MRT_INIT .
.Pp
Currently, the following set of new features is defined:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define	MRT_MFC_FLAGS_BORDER_VIF   (1 << 1)  /* border vif              */
#define MRT_MFC_RP                 (1 << 8)  /* enable RP address	*/
#define MRT_MFC_BW_UPCALL          (1 << 9)  /* enable bw upcalls	*/
.Ed
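.Pp
When a process depends on several features at once, the result of the
negotiation can be checked against the full required set rather than a
single bit.
The sketch below assumes the granted set has already been obtained from
.Dv MRT_API_CONFIG ;
the function name is hypothetical, and the fallback definitions simply
repeat the values listed above for systems where the header is not included.
.Bd -literal
```c
#include <stdint.h>

/* Fallback copies of the values listed in this manual page. */
#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF	(1 << 0)
#define MRT_MFC_FLAGS_BORDER_VIF	(1 << 1)
#define MRT_MFC_RP			(1 << 8)
#define MRT_MFC_BW_UPCALL		(1 << 9)
#endif

/*
 * Compare the features the process requires with the set the kernel
 * granted.  Stores the required-but-missing bits in *missing.
 * Returns 0 if every required feature was granted, -1 otherwise.
 */
int
ex_check_granted(uint32_t required, uint32_t granted, uint32_t *missing)
{
	*missing = required & ~granted;
	return (*missing == 0 ? 0 : -1);
}
```
.Ed
.Pp
If the check fails, the process can fall back to the basic API or report
the missing bits, since it must only use features the kernel agreed on.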
.\" .Pp
.\" In the future there might be:
.\" .Bd -literal
.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
.\" .Ed
.\" .Pp
.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
.\" For now this is left-out until it is clear whether
.\" (*,G) MFC support is the preferred solution instead of something more generic
.\" solution for example.
.\"
.\" 2. The newly defined struct mfcctl2.
.\"
.Pp
The advanced multicast API uses a newly defined
.Vt "struct mfcctl2"
instead of the traditional
.Vt "struct mfcctl" .
The original
.Vt "struct mfcctl"
is kept as is.
The new
.Vt "struct mfcctl2"
is:
.Bd -literal
/*
 * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
 * and extends the old struct mfcctl.
 */
struct mfcctl2 {
        /* the mfcctl fields */
        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
        vifi_t          mfcc_parent;       /* incoming vif              */
        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */

        /* extension fields */
        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
        struct in_addr  mfcc_rp;            /* the RP address           */
};
.Ed
.Pp
The new fields are
.Va mfcc_flags[MAXVIFS]
and
.Va mfcc_rp .
Note that for compatibility reasons they are added at the end.
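.Pp
Appending the new fields keeps the leading part of the structure at the
same offsets as in the original one, which is what lets the new structure
overlay the old.
The sketch below checks this with
.Fn offsetof
on local stand-in structures that mirror the listing above; the
.Dv EX_MAXVIFS
value and the
.Vt ex_vifi_t
type are assumptions, not the kernel's definitions.
.Bd -literal
```c
#include <stddef.h>
#include <stdint.h>
#include <netinet/in.h>

#define EX_MAXVIFS 32			/* stand-in for MAXVIFS */
typedef unsigned short ex_vifi_t;	/* stand-in for vifi_t */

struct ex_mfcctl {			/* mirrors struct mfcctl */
	struct in_addr	mfcc_origin;
	struct in_addr	mfcc_mcastgrp;
	ex_vifi_t	mfcc_parent;
	unsigned char	mfcc_ttls[EX_MAXVIFS];
};

struct ex_mfcctl2 {			/* mirrors struct mfcctl2 */
	struct in_addr	mfcc_origin;
	struct in_addr	mfcc_mcastgrp;
	ex_vifi_t	mfcc_parent;
	unsigned char	mfcc_ttls[EX_MAXVIFS];
	uint8_t		mfcc_flags[EX_MAXVIFS];
	struct in_addr	mfcc_rp;
};

/* Return 1 if every old field sits at the same offset in the new one. */
int
ex_prefix_compatible(void)
{
	return (offsetof(struct ex_mfcctl, mfcc_origin) ==
	        offsetof(struct ex_mfcctl2, mfcc_origin) &&
	    offsetof(struct ex_mfcctl, mfcc_mcastgrp) ==
	        offsetof(struct ex_mfcctl2, mfcc_mcastgrp) &&
	    offsetof(struct ex_mfcctl, mfcc_parent) ==
	        offsetof(struct ex_mfcctl2, mfcc_parent) &&
	    offsetof(struct ex_mfcctl, mfcc_ttls) ==
	        offsetof(struct ex_mfcctl2, mfcc_ttls) &&
	    offsetof(struct ex_mfcctl2, mfcc_flags) >=
	        offsetof(struct ex_mfcctl, mfcc_ttls) + EX_MAXVIFS);
}
```
.Ed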
.Pp
The
.Va mfcc_flags[MAXVIFS]
field is used to set various flags per
interface per (S,G) entry.
Currently, the defined flags are:
.Bd -literal
#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
#define	MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif          */
.Ed
.Pp
The
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is used to explicitly disable the
.Dv IGMPMSG_WRONGVIF
kernel signal at the (S,G) granularity if a multicast data packet
arrives on the wrong interface.
Usually, this signal is used to
complete the shortest-path switch in case of PIM-SM multicast routing,
or to trigger a PIM assert message.
However, it should not be delivered for interfaces that are not in
the outgoing interface set, and that are not expecting to
become an incoming interface.
Hence, if the
.Dv MRT_MFC_FLAGS_DISABLE_WRONGVIF
flag is set for some of the
interfaces, then a data packet that arrives on that interface for
that MFC entry will NOT trigger a WRONGVIF signal.
If that flag is not set, then a signal is triggered (the default action).
603addeef82SBruce A. Mah.Pp
604addeef82SBruce A. MahThe
605ef151d78SRuslan Ermilov.Dv MRT_MFC_FLAGS_BORDER_VIF
606addeef82SBruce A. Mahflag is used to specify whether the Border-bit in PIM
607addeef82SBruce A. MahRegister messages should be set (in case when the Register encapsulation
608addeef82SBruce A. Mahis performed inside the kernel).
609addeef82SBruce A. MahIf it is set for the special PIM Register kernel virtual interface
610addeef82SBruce A. Mah(see
611addeef82SBruce A. Mah.Xr pim 4 ) ,
612addeef82SBruce A. Mahthe Border-bit in the Register messages sent to the RP will be set.
613addeef82SBruce A. Mah.Pp
614addeef82SBruce A. MahThe remaining six bits are reserved for future usage.
615addeef82SBruce A. Mah.Pp
616addeef82SBruce A. MahThe
617ef151d78SRuslan Ermilov.Va mfcc_rp
618addeef82SBruce A. Mahfield is used to specify the RP address (in case of PIM-SM multicast routing)
619addeef82SBruce A. Mahfor a multicast
620addeef82SBruce A. Mahgroup G if we want to perform kernel-level PIM Register encapsulation.
621addeef82SBruce A. MahThe
622ef151d78SRuslan Ermilov.Va mfcc_rp
623addeef82SBruce A. Mahfield is used only if the
624ef151d78SRuslan Ermilov.Dv MRT_MFC_RP
625addeef82SBruce A. Mahadvanced API flag/capability has been successfully set by
626ef151d78SRuslan Ermilov.Fn setsockopt MRT_API_CONFIG .
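.Pp
The following is a minimal sketch of how the extension fields might be
filled in before the structure is handed to
.Dv MRT_ADD_MFC .
The helper name
.Fn mfc2_prepare
and its parameters are hypothetical.
The local definitions of
.Dv MAXVIFS
and
.Vt vifi_t
are assumed stand-ins that only let the sketch compile on its own;
a real program would take them from the system multicast routing header.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <netinet/in.h>

/*
 * Assumed stand-ins for the kernel header definitions, so that the
 * sketch is self-contained.  The real values may differ.
 */
#define MAXVIFS 32
typedef unsigned short vifi_t;
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1)

struct mfcctl2 {
        struct in_addr  mfcc_origin;
        struct in_addr  mfcc_mcastgrp;
        vifi_t          mfcc_parent;
        unsigned char   mfcc_ttls[MAXVIFS];
        uint8_t         mfcc_flags[MAXVIFS];
        struct in_addr  mfcc_rp;
};

/* Hypothetical helper: prepare an (S,G) entry with per-vif flags and an RP. */
void
mfc2_prepare(struct mfcctl2 *mc, struct in_addr src, struct in_addr grp,
    vifi_t iif, vifi_t quiet_vif, vifi_t register_vif, struct in_addr rp)
{
        assert(mc != NULL);
        assert(iif < MAXVIFS && quiet_vif < MAXVIFS &&
            register_vif < MAXVIFS);

        memset(mc, 0, sizeof(*mc));
        mc->mfcc_origin = src;
        mc->mfcc_mcastgrp = grp;
        mc->mfcc_parent = iif;
        /* A non-zero TTL threshold makes the Register vif an outgoing vif. */
        mc->mfcc_ttls[register_vif] = 1;
        /* No WRONGVIF signal for packets of this entry on quiet_vif. */
        mc->mfcc_flags[quiet_vif] |= MRT_MFC_FLAGS_DISABLE_WRONGVIF;
        /* Ask for the Border-bit in Registers sent via the Register vif. */
        mc->mfcc_flags[register_vif] |= MRT_MFC_FLAGS_BORDER_VIF;
        /* Consulted only if MRT_MFC_RP was enabled via MRT_API_CONFIG. */
        mc->mfcc_rp = rp;
}
```
.Ed
.Pp
The prepared structure would then be passed to
.Fn setsockopt
with
.Dv MRT_ADD_MFC
on the multicast routing socket.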
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dv MRT_MFC_RP
flag was successfully set by
.Fn setsockopt MRT_API_CONFIG ,
then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside
.Dv IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address would be taken from the
.Va mfcc_rp
field
inside the new
.Vt "struct mfcctl2" .
However, even if the
.Dv MRT_MFC_RP
flag was successfully set, if the
.Va mfcc_rp
field was set to
.Dv INADDR_ANY ,
then the
kernel will still deliver an
.Dv IGMPMSG_WHOLEPKT
upcall with the
multicast data packet to the user-level process.
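.Pp
The rule above can be restated as a small user-level predicate.
This is only an illustration of the documented behavior, not the kernel
logic; the function name
.Fn kernel_encapsulates_register
and the
.Fa rp_api_enabled
parameter (meaning that
.Dv MRT_MFC_RP
was accepted through
.Dv MRT_API_CONFIG )
are hypothetical.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: returns true if, according to the text above,
 * the kernel is expected to perform the Register encapsulation, and
 * false if an IGMPMSG_WHOLEPKT upcall should still be expected.
 */
bool
kernel_encapsulates_register(bool rp_api_enabled, const struct in_addr *rp)
{
        assert(rp != NULL);

        if (!rp_api_enabled)
                return (false);
        /* An RP of INADDR_ANY keeps the user-level encapsulation path. */
        return (rp->s_addr != htonl(INADDR_ANY));
}
```
.Ed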
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size was on the order of 65500 bytes), the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally from the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over Ethernet for example would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
For example, the multicast routing process may want to time out idle MFC
entries, or in case of PIM-SM it can initiate (S,G) shortest-path switch if
the bandwidth rate is above a threshold.
.Pp
The original solution for measuring the bandwidth of a dataflow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution is far from being scalable, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., < <= == != > >= ), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because practically we need only those two.
Further, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, we need to install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, we need to check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by
.Fn setsockopt MRT_API_CONFIG
for the
.Dv MRT_MFC_BW_UPCALL
flag.
.It
The bandwidth-upcall filters are added/deleted by the new
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL
respectively (with the appropriate
.Vt "struct bw_upcall"
argument).
.El
.Pp
From an application point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per_interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurement:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Vt bw_upcall
structure is used as an argument to
.Fn setsockopt MRT_ADD_BW_UPCALL
and
.Fn setsockopt MRT_DEL_BW_UPCALL .
Each
.Fn setsockopt MRT_ADD_BW_UPCALL
installs a filter in the kernel
for the source and destination address in the
.Vt bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets >= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes >= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if ((((bw_upcall_unit & PACKETS) == PACKETS) &&
         (measured_packets <= threshold_packets)) ||
        (((bw_upcall_unit & BYTES) == BYTES) &&
         (measured_bytes <= threshold_bytes)))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
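.Pp
The pseudo-algorithm above could be modelled at user level roughly as
follows, for instance to unit-test a routing daemon's expectations.
This is a sketch, not the kernel implementation: the function name
.Fn bw_filter_matches
is hypothetical, and the structure and flag definitions are local copies
of the ones shown earlier so that the example stands alone.
.Bd -literal
```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <netinet/in.h>

/* Local copies of the definitions shown earlier in this page. */
struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};
struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)

/* True if interval a is strictly shorter than interval b. */
static bool
tv_shorter(const struct timeval *a, const struct timeval *b)
{
        return (a->tv_sec < b->tv_sec ||
            (a->tv_sec == b->tv_sec && a->tv_usec < b->tv_usec));
}

/*
 * Hypothetical model: would filter f trigger an upcall for the
 * measurement m (m->b_time is the measured interval so far)?
 */
bool
bw_filter_matches(const struct bw_upcall *f, const struct bw_data *m)
{
        const struct bw_data *t;
        bool pk, by;

        assert(f != NULL && m != NULL);
        t = &f->bu_threshold;
        pk = (f->bu_flags & BW_UPCALL_UNIT_PACKETS) != 0;
        by = (f->bu_flags & BW_UPCALL_UNIT_BYTES) != 0;

        if (f->bu_flags & BW_UPCALL_GEQ) {
                /* ">=" may be decided before the interval has ended. */
                return ((pk && m->b_packets >= t->b_packets) ||
                    (by && m->b_bytes >= t->b_bytes));
        }
        if (f->bu_flags & BW_UPCALL_LEQ) {
                /* "<=" is decided only once the full interval elapsed. */
                if (tv_shorter(&m->b_time, &t->b_time))
                        return (false);
                return ((pk && m->b_packets <= t->b_packets) ||
                    (by && m->b_bytes <= t->b_bytes));
        }
        return (false);
}
```
.Ed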
.Pp
In the same
.Vt bw_upcall ,
the unit can be specified in both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
.Pp
Basically, an upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, the bandwidth
estimation might be less accurate, or the potentially very high frequency
of the generated upcalls might introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Va threshold_interval ,
therefore the upcall may be delivered earlier.
For the <= operation however, we must wait
until the threshold interval has expired to know the answer.
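.Pp
A routing daemon might check these constraints itself before calling
.Fn setsockopt .
The sketch below is one way to do that; the function name
.Fn bw_upcall_check
and its return convention are hypothetical, the requirement that at
least one unit flag is present is an assumption of this sketch,
and the definitions are local copies of the ones shown earlier.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>
#include <netinet/in.h>

/* Local copies of the definitions shown earlier in this page. */
struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};
struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC    3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC   0

/*
 * Hypothetical pre-check: returns 0 if the filter looks acceptable
 * according to the rules described above, -1 otherwise.
 */
int
bw_upcall_check(const struct bw_upcall *f)
{
        const struct timeval *tv;
        uint32_t op, unit;

        assert(f != NULL);
        tv = &f->bu_threshold.b_time;
        op = f->bu_flags & (BW_UPCALL_GEQ | BW_UPCALL_LEQ);
        unit = f->bu_flags &
            (BW_UPCALL_UNIT_PACKETS | BW_UPCALL_UNIT_BYTES);

        /* GEQ and LEQ are mutually exclusive, and one is needed. */
        if (op != BW_UPCALL_GEQ && op != BW_UPCALL_LEQ)
                return (-1);
        /* Assumed here: a filter without any unit is meaningless. */
        if (unit == 0)
                return (-1);
        /* The measurement interval must be at least 3 seconds. */
        if (tv->tv_sec < BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC ||
            (tv->tv_sec == BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC &&
            tv->tv_usec < BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC))
                return (-1);
        return (0);
}
```
.Ed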
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;
/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
/* threshold_interval is a struct timeval */
bw_upcall.bu_threshold.b_time = threshold_interval;
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
          (void *)&bw_upcall, sizeof(bw_upcall));
.Ed
.Pp
To delete a single filter, use
.Dv MRT_DEL_BW_UPCALL ,
and the fields of bw_upcall must be set
exactly the same as when
.Dv MRT_ADD_BW_UPCALL
was called.
.Pp
To delete all bandwidth filters for a given (S,G),
set only the
.Va bu_src
and
.Va bu_dst
fields in
.Vt "struct bw_upcall" ,
and set only the
.Dv BW_UPCALL_DELETE_ALL
flag inside field
.Va bw_upcall.bu_flags .
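.Pp
As an illustration, a delete-all request could be prepared like this.
The helper name
.Fn bw_upcall_prepare_delete_all
is hypothetical, and the definitions are local copies of the ones shown
earlier so that the sketch compiles on its own.
.Bd -literal
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <netinet/in.h>

/* Local copies of the definitions shown earlier in this page. */
struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};
struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};
#define BW_UPCALL_DELETE_ALL   (1 << 4)

/* Hypothetical helper: build a request that removes every filter of (S,G). */
void
bw_upcall_prepare_delete_all(struct bw_upcall *req, struct in_addr source,
    struct in_addr group)
{
        assert(req != NULL);

        /* Everything except the addresses and the flag stays zero. */
        memset(req, 0, sizeof(*req));
        req->bu_src = source;
        req->bu_dst = group;
        req->bu_flags = BW_UPCALL_DELETE_ALL;
}
```
.Ed
.Pp
The request would then be passed to
.Fn setsockopt
with
.Dv MRT_DEL_BW_UPCALL ,
in the same way as the
.Dv MRT_ADD_BW_UPCALL
call in the usage example.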
.Pp
The bandwidth upcalls are received by aggregating them in the new upcall
message:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Vt "struct bw_upcall"
elements (up to
.Dv BW_UPCALLS_MAX
= 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In a
.Vt "struct bw_upcall"
element, the
.Va bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
.Va bu_measured.b_time
is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Va bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packets statistics with the
.Fn ioctl SIOCGETSGCNT
mechanism
(see the
.Sx Programming Guide
section).
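.Pp
A receiver might unpack the aggregated elements along these lines.
This is a sketch under stated assumptions: the caller is assumed to have
already located the start of the
.Vt "struct bw_upcall"
array inside the received message and to know its length in bytes;
the function name
.Fn bw_upcalls_copy
is hypothetical, and the definitions are local copies of the ones shown
earlier (a real program must use the system header so that the layout
matches the kernel).
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/time.h>
#include <netinet/in.h>

/* Local copies of the definitions shown earlier in this page. */
struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};
struct bw_upcall {
        struct in_addr  bu_src;
        struct in_addr  bu_dst;
        uint32_t        bu_flags;
        struct bw_data  bu_threshold;
        struct bw_data  bu_measured;
};
#define BW_UPCALLS_MAX 128

/*
 * Hypothetical helper: copy the elements found in payload (len bytes)
 * into out, which has room for out_max elements.  Returns the number
 * of elements copied, or 0 if len is not a whole number of elements.
 */
size_t
bw_upcalls_copy(const void *payload, size_t len, struct bw_upcall *out,
    size_t out_max)
{
        size_t n;

        assert(payload != NULL && out != NULL);
        if (len % sizeof(struct bw_upcall) != 0)
                return (0);
        n = len / sizeof(struct bw_upcall);
        /* One message carries at most BW_UPCALLS_MAX elements. */
        if (n > BW_UPCALLS_MAX)
                n = BW_UPCALLS_MAX;
        if (n > out_max)
                n = out_max;
        /* memcpy avoids assuming that payload is suitably aligned. */
        memcpy(out, payload, n * sizeof(struct bw_upcall));
        return (n);
}
```
.Ed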
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Va bu_threshold.b_time .
For example, if the filter is specified to
deliver a signal if bw >= 1 packet, the first packet will trigger a
signal, but the next upcall will be triggered no earlier than
.Va bu_threshold.b_time
after the previous upcall.
.\"
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recvfrom 2 ,
.Xr recvmsg 2 ,
.Xr setsockopt 2 ,
.Xr socket 2 ,
.Xr sourcefilter 3 ,
.Xr altq 4 ,
.Xr dummynet 4 ,
.Xr gif 4 ,
.Xr gre 4 ,
.Xr icmp6 4 ,
.Xr igmp 4 ,
.Xr inet 4 ,
.Xr inet6 4 ,
.Xr intro 4 ,
.Xr ip 4 ,
.Xr ip6 4 ,
.Xr mld 4 ,
.Xr pim 4
.\"
.Sh HISTORY
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF)
and Core Based Trees (CBT) were developed as well.
Routers at autonomous system boundaries may now exchange multicast
routes with peers via the Border Gateway Protocol (BGP).
Many other routing protocols are able to redistribute multicast routes
for use with
.Dv PIM-SM
and
.Dv PIM-DM .
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
.Pq Pa https://www.kame.net ,
and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
The IGMPv3 and MLDv2 multicast support was implemented by
.An Bruce Simpson .
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).