.\" Copyright (c) 2001-2003 International Computer Science Institute
.\"
.\" Permission is hereby granted, free of charge, to any person obtaining a
.\" copy of this software and associated documentation files (the "Software"),
.\" to deal in the Software without restriction, including without limitation
.\" the rights to use, copy, modify, merge, publish, distribute, sublicense,
.\" and/or sell copies of the Software, and to permit persons to whom the
.\" Software is furnished to do so, subject to the following conditions:
.\"
.\" The above copyright notice and this permission notice shall be included in
.\" all copies or substantial portions of the Software.
.\"
.\" The names and trademarks of copyright holders may not be used in
.\" advertising or publicity pertaining to the software without specific
.\" prior permission. Title to copyright in this software and any associated
.\" documentation will at all times remain with the copyright holders.
.\"
.\" THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
.\" IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
.\" FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
.\" AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
.\" LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
.\" FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
.\" DEALINGS IN THE SOFTWARE.
.\"
.\" $FreeBSD$
.\"
.Dd September 4, 2003
.Dt MULTICAST 4
.Os
.\"
.Sh NAME
.Nm multicast
.Nd Multicast Routing
.\"
.Sh SYNOPSIS
.Cd "options MROUTING"
.Pp
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.In netinet/ip_mroute.h
.In netinet6/ip6_mroute.h
.Ft int
.Fn getsockopt "int s" IPPROTO_IP MRT_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IP MRT_INIT "const void *optval" "socklen_t optlen"
.Ft int
.Fn getsockopt "int s" IPPROTO_IPV6 MRT6_INIT "void *optval" "socklen_t *optlen"
.Ft int
.Fn setsockopt "int s" IPPROTO_IPV6 MRT6_INIT "const void *optval" "socklen_t optlen"
.Sh DESCRIPTION
.Tn "Multicast routing"
is used to efficiently propagate data
packets to a set of multicast listeners in multipoint networks.
If unicast is used to replicate the data to all listeners,
then some of the network links may carry multiple copies of the same
data packets.
With multicast routing, the overhead is reduced to one copy
(at most) per network link.
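.Pp
The copy-count argument above can be illustrated with a small
counting sketch.
It assumes a hypothetical topology in which each link is described only
by the number of listeners reachable through it; the functions
.Dq unicast_copies ,
.Dq multicast_copies ,
and
.Dq total_copies
are illustrative names and are not part of any kernel API.
.Bd -literal
```c
#include <stddef.h>

/*
 * Hypothetical model: each link is described only by the number of
 * listeners reachable through it.
 */

/* Unicast replication: one copy per listener behind the link. */
int
unicast_copies(int listeners_behind_link)
{
    return listeners_behind_link;
}

/* Multicast routing: at most one copy, and only if someone listens. */
int
multicast_copies(int listeners_behind_link)
{
    return listeners_behind_link > 0 ? 1 : 0;
}

/* Sum the copies carried by all links under one replication strategy. */
int
total_copies(const int *listeners, size_t nlinks, int (*copies)(int))
{
    int total = 0;

    for (size_t i = 0; i < nlinks; i++)
        total += copies(listeners[i]);
    return total;
}
```
.Ed
.Pp
For example, a shared link leading to three listeners, three leaf links
with one listener each, and one idle link (listener counts 3, 1, 1, 1, 0)
carry 6 copies in total with unicast replication but only 4 with
multicast routing.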
.Pp
All multicast-capable routers must run a common multicast routing
protocol.
The Distance Vector Multicast Routing Protocol (DVMRP)
was the first multicast routing protocol to be developed.
Later, other protocols such as Multicast Extensions to OSPF (MOSPF),
Core Based Trees (CBT),
Protocol Independent Multicast - Sparse Mode (PIM-SM),
and Protocol Independent Multicast - Dense Mode (PIM-DM)
were developed as well.
.Pp
To start multicast routing,
the user must enable multicast forwarding in the kernel
(see
.Sx SYNOPSIS
for the kernel configuration options),
and must run a user-level process capable of multicast routing.
From a developer's point of view,
the interface described in the
.Sx "Programming Guide"
section should be used to control multicast forwarding in the kernel.
.\"
.Ss Programming Guide
This section provides information about the basic multicast routing API.
The so-called
.Dq advanced multicast API
is described in the
.Sx "Advanced Multicast API Programming Guide"
section.
.Pp
First, a multicast routing socket must be opened.
That socket is used
to control multicast forwarding in the kernel.
Note that most operations below require root privilege:
.Pp
.Bd -literal
/* IPv4 */
int mrouter_s4;
mrouter_s4 = socket(AF_INET, SOCK_RAW, IPPROTO_IGMP);
.Ed
.Pp
.Bd -literal
/* IPv6 */
int mrouter_s6;
mrouter_s6 = socket(AF_INET6, SOCK_RAW, IPPROTO_ICMPV6);
.Ed
.Pp
Note that if the router needs to open an IGMP or ICMPv6 socket
(for IPv4 and IPv6, respectively)
to send or receive IGMP or MLD multicast group membership messages,
then the same mrouter_s4 or mrouter_s6 socket should be used
for sending and receiving IGMP or MLD messages, respectively.
With BSD-derived kernels, it may be possible to open separate sockets
for IGMP or MLD messages only.
However, some other kernels (e.g., Linux) require that the multicast
routing socket be used for sending and receiving IGMP or MLD
messages.
Therefore, for portability reasons the multicast
routing socket should be reused for IGMP and MLD messages as well.
.Pp
After the multicast routing socket is opened, it can be used to enable
or disable multicast forwarding in the kernel:
.Bd -literal
/* IPv4 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s4, IPPROTO_IP, MRT_INIT, (void *)&v, sizeof(v));
.Ed
.Pp
.Bd -literal
/* IPv6 */
int v = 1;        /* 1 to enable, or 0 to disable */
setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_INIT, (void *)&v, sizeof(v));
\&...
/* If necessary, filter all ICMPv6 messages */
struct icmp6_filter filter;
ICMP6_FILTER_SETBLOCKALL(&filter);
setsockopt(mrouter_s6, IPPROTO_ICMPV6, ICMP6_FILTER, (void *)&filter,
           sizeof(filter));
.Ed
139addeef82SBruce A. Mah.Ed
140addeef82SBruce A. Mah.Pp
141addeef82SBruce A. MahAfter multicast forwarding is enabled, the multicast routing socket
142addeef82SBruce A. Mahcan be used to enable PIM processing in the kernel if we are running PIM-SM or
143addeef82SBruce A. MahPIM-DM
144addeef82SBruce A. Mah(see
145addeef82SBruce A. Mah.Xr pim 4 ) .
146addeef82SBruce A. Mah.Pp
147addeef82SBruce A. MahFor each network interface (e.g., physical or a virtual tunnel)
148addeef82SBruce A. Mahthat would be used for multicast forwarding, a corresponding
149addeef82SBruce A. Mahmulticast interface must be added to the kernel:
150addeef82SBruce A. Mah.Bd -literal
151addeef82SBruce A. Mah/* IPv4 */
152addeef82SBruce A. Mahstruct vifctl vc;
153addeef82SBruce A. Mahmemset(&vc, 0, sizeof(vc));
154addeef82SBruce A. Mah/* Assign all vifctl fields as appropriate */
155addeef82SBruce A. Mahvc.vifc_vifi = vif_index;
156addeef82SBruce A. Mahvc.vifc_flags = vif_flags;
157addeef82SBruce A. Mahvc.vifc_threshold = min_ttl_threshold;
158addeef82SBruce A. Mahvc.vifc_rate_limit = max_rate_limit;
159addeef82SBruce A. Mahmemcpy(&vc.vifc_lcl_addr, &vif_local_address, sizeof(vc.vifc_lcl_addr));
160addeef82SBruce A. Mahif (vc.vifc_flags & VIFF_TUNNEL)
161addeef82SBruce A. Mah    memcpy(&vc.vifc_rmt_addr, &vif_remote_address,
162addeef82SBruce A. Mah           sizeof(vc.vifc_rmt_addr));
163addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *)&vc,
164addeef82SBruce A. Mah           sizeof(vc));
165addeef82SBruce A. Mah.Ed
166addeef82SBruce A. Mah.Pp
167addeef82SBruce A. MahThe
168addeef82SBruce A. Mah.Dq vif_index
169addeef82SBruce A. Mahmust be unique per vif.
170addeef82SBruce A. MahThe
171addeef82SBruce A. Mah.Dq vif_flags
172addeef82SBruce A. Mahcontains the
173addeef82SBruce A. Mah.Dq VIFF_*
174addeef82SBruce A. Mahflags as defined in <netinet/ip_mroute.h>.
175addeef82SBruce A. MahThe
176addeef82SBruce A. Mah.Dq min_ttl_threshold
177addeef82SBruce A. Mahcontains the minimum TTL a multicast data packet must have to be
178addeef82SBruce A. Mahforwarded on that vif.
179addeef82SBruce A. MahTypically, it would have value of 1.
180addeef82SBruce A. MahThe
181addeef82SBruce A. Mah.Dq max_rate_limit
182addeef82SBruce A. Mahcontains the maximum rate (in bits/s) of the multicast data packets forwarded
183addeef82SBruce A. Mahon that vif.
184addeef82SBruce A. MahValue of 0 means no limit.
185addeef82SBruce A. MahThe
186addeef82SBruce A. Mah.Dq vif_local_address
187addeef82SBruce A. Mahcontains the local IP address of the corresponding local interface.
188addeef82SBruce A. MahThe
189addeef82SBruce A. Mah.Dq vif_remote_address
190addeef82SBruce A. Mahcontains the remote IP address in case of DVMRP multicast tunnels.
191addeef82SBruce A. Mah.Pp
192addeef82SBruce A. Mah.Bd -literal
193addeef82SBruce A. Mah/* IPv6 */
194addeef82SBruce A. Mahstruct mif6ctl mc;
195addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
196addeef82SBruce A. Mah/* Assign all mif6ctl fields as appropriate */
197addeef82SBruce A. Mahmc.mif6c_mifi = mif_index;
198addeef82SBruce A. Mahmc.mif6c_flags = mif_flags;
199addeef82SBruce A. Mahmc.mif6c_pifi = pif_index;
200addeef82SBruce A. Mahsetsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *)&mc,
201addeef82SBruce A. Mah           sizeof(mc));
202addeef82SBruce A. Mah.Ed
203addeef82SBruce A. Mah.Pp
204addeef82SBruce A. MahThe
205addeef82SBruce A. Mah.Dq mif_index
206addeef82SBruce A. Mahmust be unique per vif.
207addeef82SBruce A. MahThe
208addeef82SBruce A. Mah.Dq mif_flags
209addeef82SBruce A. Mahcontains the
210addeef82SBruce A. Mah.Dq MIFF_*
211addeef82SBruce A. Mahflags as defined in <netinet6/ip6_mroute.h>.
212addeef82SBruce A. MahThe
213addeef82SBruce A. Mah.Dq pif_index
214addeef82SBruce A. Mahis the physical interface index of the corresponding local interface.
215addeef82SBruce A. Mah.Pp
216addeef82SBruce A. MahA multicast interface is deleted by:
217addeef82SBruce A. Mah.Bd -literal
218addeef82SBruce A. Mah/* IPv4 */
219addeef82SBruce A. Mahvifi_t vifi = vif_index;
220addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_VIF, (void *)&vifi,
221addeef82SBruce A. Mah           sizeof(vifi));
222addeef82SBruce A. Mah.Ed
223addeef82SBruce A. Mah.Pp
224addeef82SBruce A. Mah.Bd -literal
225addeef82SBruce A. Mah/* IPv6 */
226addeef82SBruce A. Mahmifi_t mifi = mif_index;
227addeef82SBruce A. Mahsetsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_DEL_MIF, (void *)&mifi,
228addeef82SBruce A. Mah           sizeof(mifi));
229addeef82SBruce A. Mah.Ed
230addeef82SBruce A. Mah.Pp
231addeef82SBruce A. MahAfter the multicast forwarding is enabled, and the multicast virtual
232addeef82SBruce A. Mahinterfaces are
233addeef82SBruce A. Mahadded, the kernel may deliver upcall messages (also called signals
234addeef82SBruce A. Mahlater in this text) on the multicast routing socket that was open
235addeef82SBruce A. Mahearlier with
236addeef82SBruce A. Mah.Dq MRT_INIT
237addeef82SBruce A. Mahor
238addeef82SBruce A. Mah.Dq MRT6_INIT .
239addeef82SBruce A. MahThe IPv4 upcalls have
240addeef82SBruce A. Mah.Dq struct igmpmsg
241addeef82SBruce A. Mahheader (see <netinet/ip_mroute.h>) with field
242addeef82SBruce A. Mah.Dq im_mbz
243addeef82SBruce A. Mahset to zero.
244addeef82SBruce A. MahNote that this header follows the structure of
245addeef82SBruce A. Mah.Dq struct ip
246addeef82SBruce A. Mahwith the protocol field
247addeef82SBruce A. Mah.Dq ip_p
248addeef82SBruce A. Mahset to zero.
249addeef82SBruce A. MahThe IPv6 upcalls have
250addeef82SBruce A. Mah.Dq struct mrt6msg
251addeef82SBruce A. Mahheader (see <netinet6/ip6_mroute.h>) with field
252addeef82SBruce A. Mah.Dq im6_mbz
253addeef82SBruce A. Mahset to zero.
254addeef82SBruce A. MahNote that this header follows the structure of
255addeef82SBruce A. Mah.Dq struct ip6_hdr
256addeef82SBruce A. Mahwith the next header field
257addeef82SBruce A. Mah.Dq ip6_nxt
258addeef82SBruce A. Mahset to zero.
259addeef82SBruce A. Mah.Pp
260addeef82SBruce A. MahThe upcall header contains field
261addeef82SBruce A. Mah.Dq im_msgtype
262addeef82SBruce A. Mahand
263addeef82SBruce A. Mah.Dq im6_msgtype
264addeef82SBruce A. Mahwith the type of the upcall
265addeef82SBruce A. Mah.Dq IGMPMSG_*
266addeef82SBruce A. Mahand
267addeef82SBruce A. Mah.Dq MRT6MSG_*
268addeef82SBruce A. Mahfor IPv4 and IPv6 respectively.
269addeef82SBruce A. MahThe values of the rest of the upcall header fields
270addeef82SBruce A. Mahand the body of the upcall message depend on the particular upcall type.
271addeef82SBruce A. Mah.Pp
272addeef82SBruce A. MahIf the upcall message type is
273addeef82SBruce A. Mah.Dq IGMPMSG_NOCACHE
274addeef82SBruce A. Mahor
275addeef82SBruce A. Mah.Dq MRT6MSG_NOCACHE ,
276addeef82SBruce A. Mahthis is an indication that a multicast packet has reached the multicast
277addeef82SBruce A. Mahrouter, but the router has no forwarding state for that packet.
278addeef82SBruce A. MahTypically, the upcall would be a signal for the multicast routing
279addeef82SBruce A. Mahuser-level process to install the appropriate Multicast Forwarding
280addeef82SBruce A. MahCache (MFC) entry in the kernel.
281addeef82SBruce A. Mah.Pp
282addeef82SBruce A. MahA MFC entry is added by:
283addeef82SBruce A. Mah.Bd -literal
284addeef82SBruce A. Mah/* IPv4 */
285addeef82SBruce A. Mahstruct mfcctl mc;
286addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
287addeef82SBruce A. Mahmemcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
288addeef82SBruce A. Mahmemcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
289addeef82SBruce A. Mahmc.mfcc_parent = iif_index;
290addeef82SBruce A. Mahfor (i = 0; i < maxvifs; i++)
291addeef82SBruce A. Mah    mc.mfcc_ttls[i] = oifs_ttl[i];
292addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
293addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
294addeef82SBruce A. Mah.Ed
295addeef82SBruce A. Mah.Pp
296addeef82SBruce A. Mah.Bd -literal
297addeef82SBruce A. Mah/* IPv6 */
298addeef82SBruce A. Mahstruct mf6cctl mc;
299addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
300addeef82SBruce A. Mahmemcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
301addeef82SBruce A. Mahmemcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mf6cc_mcastgrp));
302addeef82SBruce A. Mahmc.mf6cc_parent = iif_index;
303addeef82SBruce A. Mahfor (i = 0; i < maxvifs; i++)
304addeef82SBruce A. Mah    if (oifs_ttl[i] > 0)
305addeef82SBruce A. Mah        IF_SET(i, &mc.mf6cc_ifset);
306addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IPV6, MRT6_ADD_MFC,
307addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
308addeef82SBruce A. Mah.Ed
309addeef82SBruce A. Mah.Pp
310addeef82SBruce A. MahThe
311addeef82SBruce A. Mah.Dq source_addr
312addeef82SBruce A. Mahand
313addeef82SBruce A. Mah.Dq group_addr
314addeef82SBruce A. Mahare the source and group address of the multicast packet (as set
315addeef82SBruce A. Mahin the upcall message).
316addeef82SBruce A. MahThe
317addeef82SBruce A. Mah.Dq iif_index
318addeef82SBruce A. Mahis the virtual interface index of the multicast interface the multicast
319addeef82SBruce A. Mahpackets for this specific source and group address should be received on.
320addeef82SBruce A. MahThe
321addeef82SBruce A. Mah.Dq oifs_ttl[]
322addeef82SBruce A. Maharray contains the minimum TTL (per interface) a multicast packet
323addeef82SBruce A. Mahshould have to be forwarded on an outgoing interface.
324addeef82SBruce A. MahIf the TTL value is zero, the corresponding interface is not included
325addeef82SBruce A. Mahin the set of outgoing interfaces.
326addeef82SBruce A. MahNote that in case of IPv6 only the set of outgoing interfaces can
327addeef82SBruce A. Mahbe specified.
328addeef82SBruce A. Mah.Pp
329addeef82SBruce A. MahA MFC entry is deleted by:
330addeef82SBruce A. Mah.Bd -literal
331addeef82SBruce A. Mah/* IPv4 */
332addeef82SBruce A. Mahstruct mfcctl mc;
333addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
334addeef82SBruce A. Mahmemcpy(&mc.mfcc_origin, &source_addr, sizeof(mc.mfcc_origin));
335addeef82SBruce A. Mahmemcpy(&mc.mfcc_mcastgrp, &group_addr, sizeof(mc.mfcc_mcastgrp));
336addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
337addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
338addeef82SBruce A. Mah.Ed
339addeef82SBruce A. Mah.Pp
340addeef82SBruce A. Mah.Bd -literal
341addeef82SBruce A. Mah/* IPv6 */
342addeef82SBruce A. Mahstruct mf6cctl mc;
343addeef82SBruce A. Mahmemset(&mc, 0, sizeof(mc));
344addeef82SBruce A. Mahmemcpy(&mc.mf6cc_origin, &source_addr, sizeof(mc.mf6cc_origin));
345addeef82SBruce A. Mahmemcpy(&mc.mf6cc_mcastgrp, &group_addr, sizeof(mf6cc_mcastgrp));
346addeef82SBruce A. Mahsetsockopt(mrouter_s4, IPPROTO_IPV6, MRT6_DEL_MFC,
347addeef82SBruce A. Mah           (void *)&mc, sizeof(mc));
348addeef82SBruce A. Mah.Ed
349addeef82SBruce A. Mah.Pp
350addeef82SBruce A. MahThe following method can be used to get various statistics per
351addeef82SBruce A. Mahinstalled MFC entry in the kernel (e.g., the number of forwarded
352addeef82SBruce A. Mahpackets per source and group address):
353addeef82SBruce A. Mah.Bd -literal
354addeef82SBruce A. Mah/* IPv4 */
355addeef82SBruce A. Mahstruct sioc_sg_req sgreq;
356addeef82SBruce A. Mahmemset(&sgreq, 0, sizeof(sgreq));
357addeef82SBruce A. Mahmemcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
358addeef82SBruce A. Mahmemcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
359addeef82SBruce A. Mahioctl(mrouter_s4, SIOCGETSGCNT, &sgreq);
360addeef82SBruce A. Mah.Ed
361addeef82SBruce A. Mah.Pp
362addeef82SBruce A. Mah.Bd -literal
363addeef82SBruce A. Mah/* IPv6 */
364addeef82SBruce A. Mahstruct sioc_sg_req6 sgreq;
365addeef82SBruce A. Mahmemset(&sgreq, 0, sizeof(sgreq));
366addeef82SBruce A. Mahmemcpy(&sgreq.src, &source_addr, sizeof(sgreq.src));
367addeef82SBruce A. Mahmemcpy(&sgreq.grp, &group_addr, sizeof(sgreq.grp));
368addeef82SBruce A. Mahioctl(mrouter_s6, SIOCGETSGCNT_IN6, &sgreq);
369addeef82SBruce A. Mah.Ed
370addeef82SBruce A. Mah.Pp
371addeef82SBruce A. MahThe following method can be used to get various statistics per
372addeef82SBruce A. Mahmulticast virtual interface in the kernel (e.g., the number of forwarded
373addeef82SBruce A. Mahpackets per interface):
374addeef82SBruce A. Mah.Bd -literal
375addeef82SBruce A. Mah/* IPv4 */
376addeef82SBruce A. Mahstruct sioc_vif_req vreq;
377addeef82SBruce A. Mahmemset(&vreq, 0, sizeof(vreq));
378addeef82SBruce A. Mahvreq.vifi = vif_index;
379addeef82SBruce A. Mahioctl(mrouter_s4, SIOCGETVIFCNT, &vreq);
380addeef82SBruce A. Mah.Ed
381addeef82SBruce A. Mah.Pp
382addeef82SBruce A. Mah.Bd -literal
383addeef82SBruce A. Mah/* IPv6 */
384addeef82SBruce A. Mahstruct sioc_mif_req6 mreq;
385addeef82SBruce A. Mahmemset(&mreq, 0, sizeof(mreq));
386addeef82SBruce A. Mahmreq.mifi = vif_index;
387addeef82SBruce A. Mahioctl(mrouter_s6, SIOCGETMIFCNT_IN6, &mreq);
388addeef82SBruce A. Mah.Ed
389addeef82SBruce A. Mah.Pp
390addeef82SBruce A. Mah.Ss Advanced Multicast API Programming Guide
391addeef82SBruce A. MahIf we want to add new features in the kernel, it becomes difficult
392addeef82SBruce A. Mahto preserve backward compatibility (binary and API),
393addeef82SBruce A. Mahand at the same time to allow user-level processes to take advantage of
394addeef82SBruce A. Mahthe new features (if the kernel supports them).
395addeef82SBruce A. Mah.Pp
396addeef82SBruce A. MahOne of the mechanisms that allows us to preserve the backward
397addeef82SBruce A. Mahcompatibility is a sort of negotiation
398addeef82SBruce A. Mahbetween the user-level process and the kernel:
399addeef82SBruce A. Mah.Bl -enum
400addeef82SBruce A. Mah.It
401addeef82SBruce A. MahThe user-level process tries to enable in the kernel the set of new
402addeef82SBruce A. Mahfeatures (and the corresponding API) it would like to use.
403addeef82SBruce A. Mah.It
404addeef82SBruce A. MahThe kernel returns the (sub)set of features it knows about
405addeef82SBruce A. Mahand is willing to be enabled.
406addeef82SBruce A. Mah.It
407addeef82SBruce A. MahThe user-level process uses only that set of features
408addeef82SBruce A. Mahthe kernel has agreed on.
409addeef82SBruce A. Mah.El
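.Pp
The negotiation steps above amount to a bitwise intersection, which the
following hypothetical sketch models in user space.
It assumes that the kernel enables exactly those requested features it
supports; the function names are illustrative.
.Bd -literal
```c
#include <stdint.h>

/*
 * Hypothetical model of the negotiation: the kernel enables only the
 * requested features it also supports.
 */
uint32_t
negotiate_features(uint32_t requested, uint32_t kernel_supported)
{
    return (requested & kernel_supported);
}

/* Non-zero if every feature in "needed" was granted. */
int
features_granted(uint32_t granted, uint32_t needed)
{
    return ((granted & needed) == needed);
}
```
.Ed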
.\"
.Pp
To support backward compatibility, if the user-level process does not
ask for any new features, the kernel defaults to the basic
multicast API (see the
.Sx "Programming Guide"
section).
.\" XXX: edit as appropriate after the advanced multicast API is
.\" supported under IPv6
Currently, the advanced multicast API exists only for IPv4;
in the future there will be IPv6 support as well.
.Pp
Below is a summary of the extensible API solution.
Note that all new options and structures are defined
in <netinet/ip_mroute.h> and <netinet6/ip6_mroute.h>,
unless stated otherwise.
.Pp
The user-level process uses new get/setsockopt() options to
perform the API feature negotiation with the kernel.
This negotiation must be performed right after the multicast routing
socket is opened and initialized with setsockopt(MRT_INIT).
The set of desired/allowed features is stored in a bitset
(currently a uint32_t, i.e., a maximum of 32 new features).
The new get/setsockopt() options are
.Dq MRT_API_SUPPORT
and
.Dq MRT_API_CONFIG .
Example:
.Bd -literal
uint32_t v;
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_SUPPORT, (void *)&v, &len);
.Ed
.Pp
This call sets
.Dq v
to the pre-defined bits that the kernel API supports.
The eight least significant bits of the uint32_t are the same as the
eight possible flags
.Dq MRT_MFC_FLAGS_*
that can be used in
.Dq mfcc_flags
as part of the new
.Dq struct mfcctl2
(see below about those flags), which leaves 24 flags for other new features.
The value returned by getsockopt(MRT_API_SUPPORT) is read-only; in other
words, setsockopt(MRT_API_SUPPORT) would fail.
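.Pp
A process can split the value returned by
.Dq MRT_API_SUPPORT
into the per-vif MFC flag bits and the remaining feature bits.
The sketch below falls back to the flag values listed in this manual
page when <netinet/ip_mroute.h> has not been included; the helper names
are hypothetical.
.Bd -literal
```c
#include <stdint.h>

/* Values as listed in this page; normally from <netinet/ip_mroute.h>. */
#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF  (1 << 0)
#endif
#ifndef MRT_MFC_FLAGS_BORDER_VIF
#define MRT_MFC_FLAGS_BORDER_VIF        (1 << 1)
#endif
#ifndef MRT_MFC_RP
#define MRT_MFC_RP                      (1 << 8)
#endif
#ifndef MRT_MFC_BW_UPCALL
#define MRT_MFC_BW_UPCALL               (1 << 9)
#endif

/* The eight least significant bits mirror the per-vif mfcc_flags. */
#define API_MFC_FLAGS_MASK  UINT32_C(0x000000ff)

/* Bits 0-7: flags that may also appear in mfcc_flags[]. */
uint32_t
api_mfc_flag_bits(uint32_t v)
{
    return (v & API_MFC_FLAGS_MASK);
}

/* Bits 8-31: other features, such as MRT_MFC_RP. */
uint32_t
api_other_feature_bits(uint32_t v)
{
    return (v & ~API_MFC_FLAGS_MASK);
}
```
.Ed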
.Pp
To modify the API and enable a specific feature in the kernel:
.Bd -literal
uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, sizeof(v))
    != 0) {
    return (ERROR);
}
if (v & MRT_MFC_FLAGS_DISABLE_WRONGVIF)
    return (OK);	/* Success */
else
    return (ERROR);
.Ed
.Pp
When setsockopt(MRT_API_CONFIG) is called, its
argument specifies the desired set of features to
be enabled in the API and the kernel.
On return,
.Dq v
contains the actual (sub)set of features that were enabled in the kernel.
To retrieve the set of enabled features later:
.Bd -literal
socklen_t len = sizeof(v);
getsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *)&v, &len);
.Ed
.Pp
Because the set of enabled features is global,
setsockopt(MRT_API_CONFIG)
should be called right after setsockopt(MRT_INIT).
483addeef82SBruce A. Mahshould be called right after setsockopt(MRT_INIT).
484addeef82SBruce A. Mah.Pp
485addeef82SBruce A. MahCurrently, the following set of new features is defined:
486addeef82SBruce A. Mah.Bd -literal
487addeef82SBruce A. Mah#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
488addeef82SBruce A. Mah#define	MRT_MFC_FLAGS_BORDER_VIF   (1 << 1)  /* border vif              */
489addeef82SBruce A. Mah#define MRT_MFC_RP                 (1 << 8)  /* enable RP address	*/
490addeef82SBruce A. Mah#define MRT_MFC_BW_UPCALL          (1 << 9)  /* enable bw upcalls	*/
491addeef82SBruce A. Mah.Ed
492addeef82SBruce A. Mah.\" .Pp
493addeef82SBruce A. Mah.\" In the future there might be:
494addeef82SBruce A. Mah.\" .Bd -literal
495addeef82SBruce A. Mah.\" #define MRT_MFC_GROUP_SPECIFIC     (1 << 10) /* allow (*,G) MFC entries */
496addeef82SBruce A. Mah.\" .Ed
497addeef82SBruce A. Mah.\" .Pp
498addeef82SBruce A. Mah.\" to allow (*,G) MFC entries (i.e., group-specific entries) in the kernel.
499addeef82SBruce A. Mah.\" For now this is left-out until it is clear whether
500addeef82SBruce A. Mah.\" (*,G) MFC support is the preferred solution instead of something more generic
501addeef82SBruce A. Mah.\" solution for example.
502addeef82SBruce A. Mah.\"
503addeef82SBruce A. Mah.\" 2. The newly defined struct mfcctl2.
504addeef82SBruce A. Mah.\"
505addeef82SBruce A. Mah.Pp
506addeef82SBruce A. MahThe advanced multicast API uses a newly defined
507addeef82SBruce A. Mah.Dq struct mfcctl2
508addeef82SBruce A. Mahinstead of the traditional
509addeef82SBruce A. Mah.Dq struct mfcctl .
510addeef82SBruce A. MahThe original
511addeef82SBruce A. Mah.Dq struct mfcctl
512addeef82SBruce A. Mahis kept as is.
513addeef82SBruce A. MahThe new
514addeef82SBruce A. Mah.Dq struct mfcctl2
515addeef82SBruce A. Mahis:
516addeef82SBruce A. Mah.Bd -literal
517addeef82SBruce A. Mah/*
518addeef82SBruce A. Mah * The new argument structure for MRT_ADD_MFC and MRT_DEL_MFC overlays
519addeef82SBruce A. Mah * and extends the old struct mfcctl.
520addeef82SBruce A. Mah */
521addeef82SBruce A. Mahstruct mfcctl2 {
522addeef82SBruce A. Mah        /* the mfcctl fields */
523addeef82SBruce A. Mah        struct in_addr  mfcc_origin;       /* ip origin of mcasts       */
524addeef82SBruce A. Mah        struct in_addr  mfcc_mcastgrp;     /* multicast group associated*/
525addeef82SBruce A. Mah        vifi_t          mfcc_parent;       /* incoming vif              */
526addeef82SBruce A. Mah        u_char          mfcc_ttls[MAXVIFS];/* forwarding ttls on vifs   */
527addeef82SBruce A. Mah
528addeef82SBruce A. Mah        /* extension fields */
529addeef82SBruce A. Mah        uint8_t         mfcc_flags[MAXVIFS];/* the MRT_MFC_FLAGS_* flags*/
530addeef82SBruce A. Mah        struct in_addr  mfcc_rp;            /* the RP address           */
531addeef82SBruce A. Mah};
532addeef82SBruce A. Mah.Ed
533addeef82SBruce A. Mah.Pp
534addeef82SBruce A. MahThe new fields are
535addeef82SBruce A. Mah.Dq mfcc_flags[MAXVIFS]
536addeef82SBruce A. Mahand
537addeef82SBruce A. Mah.Dq mfcc_rp .
538addeef82SBruce A. MahNote that for compatibility reasons they are added at the end.
539addeef82SBruce A. Mah.Pp
540addeef82SBruce A. MahThe
541addeef82SBruce A. Mah.Dq mfcc_flags[MAXVIFS]
542addeef82SBruce A. Mahfield is used to set various flags per
543addeef82SBruce A. Mahinterface per (S,G) entry.
544addeef82SBruce A. MahCurrently, the defined flags are:
545addeef82SBruce A. Mah.Bd -literal
546addeef82SBruce A. Mah#define	MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0) /* disable WRONGVIF signals */
547addeef82SBruce A. Mah#define	MRT_MFC_FLAGS_BORDER_VIF       (1 << 1) /* border vif          */
548addeef82SBruce A. Mah.Ed
549addeef82SBruce A. Mah.Pp
550addeef82SBruce A. MahThe
551addeef82SBruce A. Mah.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
552addeef82SBruce A. Mahflag is used to explicitly disable the
553addeef82SBruce A. Mah.Dq IGMPMSG_WRONGVIF
554addeef82SBruce A. Mahkernel signal at the (S,G) granularity if a multicast data packet
555addeef82SBruce A. Maharrives on the wrong interface.
556addeef82SBruce A. MahUsually, this signal is used to
557addeef82SBruce A. Mahcomplete the shortest-path switch in case of PIM-SM multicast routing,
558addeef82SBruce A. Mahor to trigger a PIM assert message.
559addeef82SBruce A. MahHowever, it should not be delivered for interfaces that are not in
560addeef82SBruce A. Mahthe outgoing interface set, and that are not expecting to
561addeef82SBruce A. Mahbecome an incoming interface.
562addeef82SBruce A. MahHence, if the
563addeef82SBruce A. Mah.Dq MRT_MFC_FLAGS_DISABLE_WRONGVIF
564addeef82SBruce A. Mahflag is set for some of the
565addeef82SBruce A. Mahinterfaces, then a data packet that arrives on that interface for
566addeef82SBruce A. Mahthat MFC entry will NOT trigger a WRONGVIF signal.
567addeef82SBruce A. MahIf that flag is not set, then a signal is triggered (the default action).
568addeef82SBruce A. Mah.Pp
569addeef82SBruce A. MahThe
570addeef82SBruce A. Mah.Dq MRT_MFC_FLAGS_BORDER_VIF
571addeef82SBruce A. Mahflag is used to specify whether the Border-bit in PIM
572addeef82SBruce A. MahRegister messages should be set (in case when the Register encapsulation
573addeef82SBruce A. Mahis performed inside the kernel).
574addeef82SBruce A. MahIf it is set for the special PIM Register kernel virtual interface
575addeef82SBruce A. Mah(see
576addeef82SBruce A. Mah.Xr pim 4 ) ,
577addeef82SBruce A. Mahthe Border-bit in the Register messages sent to the RP will be set.
578addeef82SBruce A. Mah.Pp
579addeef82SBruce A. MahThe remaining six bits are reserved for future usage.
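.Pp
As an illustration, a routing daemon might derive the per-interface flags
of one (S,G) entry roughly as follows.
This is a minimal sketch, not part of any FreeBSD API:
the helper
.Dq set_mfc_vif_flags
and its bitmask arguments are hypothetical, the vif sets are assumed to
fit in 32 bits, and the flag values are the ones listed above.
.Bd -literal
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed above; FreeBSD defines them in <netinet/ip_mroute.h>. */
#ifndef MRT_MFC_FLAGS_DISABLE_WRONGVIF
#define MRT_MFC_FLAGS_DISABLE_WRONGVIF (1 << 0)
#endif
#ifndef MRT_MFC_FLAGS_BORDER_VIF
#define MRT_MFC_FLAGS_BORDER_VIF       (1 << 1)
#endif

/*
 * Hypothetical helper: fill the per-vif flags of one (S,G) entry.
 * flags:         the mfcc_flags array, nvifs entries (at most 32 here)
 * oifs_mask:     bit i set if vif i is in the outgoing interface set
 * expected_iifs: bit i set if vif i is, or may become, an incoming vif
 * register_vif:  index of the PIM Register vif, or -1 if there is none
 * is_border:     true if the Border-bit should be set in Registers
 */
static void
set_mfc_vif_flags(unsigned char *flags, int nvifs, uint32_t oifs_mask,
    uint32_t expected_iifs, int register_vif, bool is_border)
{
    for (int vifi = 0; vifi < nvifs && vifi < 32; vifi++) {
        uint32_t bit = (uint32_t)1 << vifi;

        flags[vifi] = 0;
        /* No WRONGVIF signal for vifs that are neither outgoing
         * nor expected to become incoming. */
        if ((oifs_mask & bit) == 0 && (expected_iifs & bit) == 0)
            flags[vifi] |= MRT_MFC_FLAGS_DISABLE_WRONGVIF;
    }
    if (is_border && register_vif >= 0 && register_vif < nvifs)
        flags[register_vif] |= MRT_MFC_FLAGS_BORDER_VIF;
}
.Ed
.Pp
The resulting array would then be copied into the
.Dq mfcc_flags
field of the
.Dq struct mfcctl2
that describes the entry.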
.Pp
The
.Dq mfcc_rp
field is used to specify the RP address (in the case of PIM-SM multicast routing)
for a multicast
group G when the kernel is to perform the PIM Register encapsulation.
The
.Dq mfcc_rp
field is used only if the
.Dq MRT_MFC_RP
advanced API flag/capability has been successfully set by
setsockopt(MRT_API_CONFIG).
.Pp
.\"
.\" 3. Kernel-level PIM Register encapsulation
.\"
If the
.Dq MRT_MFC_RP
flag was successfully set by
setsockopt(MRT_API_CONFIG), then the kernel will attempt to perform
the PIM Register encapsulation itself instead of sending the
multicast data packets to user level (inside IGMPMSG_WHOLEPKT
upcalls) for user-level encapsulation.
The RP address is taken from the
.Dq mfcc_rp
field
inside the new
.Dq struct mfcctl2 .
However, even if the
.Dq MRT_MFC_RP
flag was successfully set, if the
.Dq mfcc_rp
field was set to
.Dq INADDR_ANY ,
then the
kernel will still deliver an IGMPMSG_WHOLEPKT upcall with the
multicast data packet to the user-level process.
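.Pp
The decision described above can be summarized by the following sketch.
It is a user-level model written for clarity, not the kernel code; the
function
.Dq register_path_for
and its
.Dq mfc_rp_enabled
argument (true when
.Dq MRT_MFC_RP
was accepted by setsockopt(MRT_API_CONFIG)) are hypothetical.
.Bd -literal
#include <stdbool.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Where the Register encapsulation of a data packet happens. */
enum register_path {
    REGISTER_IN_KERNEL,     /* kernel encapsulates and sends to the RP */
    REGISTER_WHOLEPKT       /* IGMPMSG_WHOLEPKT upcall to user level */
};

/*
 * Hypothetical model: mfc_rp_enabled is true when the MRT_MFC_RP
 * capability was accepted by setsockopt(MRT_API_CONFIG); mfcc_rp is
 * the RP address from the struct mfcctl2 entry.
 */
static enum register_path
register_path_for(bool mfc_rp_enabled, struct in_addr mfcc_rp)
{
    if (mfc_rp_enabled && mfcc_rp.s_addr != htonl(INADDR_ANY))
        return (REGISTER_IN_KERNEL);
    return (REGISTER_WHOLEPKT);
}
.Ed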
.Pp
In addition, if the multicast data packet is too large to fit within
a single IP packet after the PIM Register encapsulation (e.g., if
its size was on the order of 65500 bytes), the data packet will be
fragmented, and then each of the fragments will be encapsulated
separately.
Note that typically a multicast data packet can be that
large only if it was originated locally by the same host that
performs the encapsulation; otherwise the transmission of the
multicast data packet over Ethernet, for example, would have
fragmented it into much smaller pieces.
.\"
.\" Note that if this code is ported to IPv6, we may need the kernel to
.\" perform MTU discovery to the RP, and keep those discoveries inside
.\" the kernel so the encapsulating router may send back ICMP
.\" Fragmentation Required if the size of the multicast data packet is
.\" too large (see "Encapsulating data packets in the Register Tunnel"
.\" in Section 4.4.1 in the PIM-SM spec
.\" draft-ietf-pim-sm-v2-new-05.{txt,ps}).
.\" For IPv4 we may be able to get away without it, but for IPv6 we need
.\" that.
.\"
.\" 4. Mechanism for "multicast bandwidth monitoring and upcalls".
.\"
.Pp
Typically, a multicast routing user-level process would need to know the
forwarding bandwidth for some data flow.
For example, the multicast routing process may want to time out idle MFC
entries, or, in the case of PIM-SM, it may initiate an (S,G) shortest-path
switch if the bandwidth rate is above a threshold.
.Pp
The original solution for measuring the bandwidth of a data flow was
that a user-level process would periodically
query the kernel about the number of forwarded packets/bytes per
(S,G), and then based on those numbers it would estimate whether a source
has been idle, or whether the source's transmission bandwidth is above a
threshold.
That solution does not scale well, hence the need for a new
mechanism for bandwidth monitoring.
.Pp
Below is a description of the bandwidth monitoring mechanism.
.Bl -bullet
.It
If the bandwidth of a data flow satisfies some pre-defined filter,
the kernel delivers an upcall on the multicast routing socket
to the multicast routing process that has installed that filter.
.It
The bandwidth-upcall filters are installed per (S,G).
There can be
more than one filter per (S,G).
.It
Instead of supporting all possible comparison operations
(i.e., <, <=, ==, !=, >, >=), there is support only for the
<= and >= operations,
because this makes the kernel-level implementation simpler,
and because in practice only those two are needed.
Further, the missing operations can be simulated by secondary
user-level filtering of those <= and >= filters.
For example, to simulate !=, install the filter
.Dq bw <= 0xffffffff ,
and after an
upcall is received, check whether
.Dq measured_bw != expected_bw .
.It
The bandwidth-upcall mechanism is enabled by setting the
MRT_MFC_BW_UPCALL flag with setsockopt(MRT_API_CONFIG).
.It
The bandwidth-upcall filters are added and deleted by the new
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL)
respectively (with the appropriate
.Dq struct bw_upcall
argument).
.El
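.Pp
The following sketch shows one way a daemon could map the six comparison
operations onto the two kernel filters plus a secondary user-level check.
Only the != case comes from the text above; the other mappings, and the
names
.Dq map_bw_op ,
.Dq upcall_matches ,
and
.Dq struct kernel_filter ,
are hypothetical and assume integer packet or byte counts.
.Bd -literal
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-level comparison operators: "bw OP value". */
enum bw_op { BW_OP_LT, BW_OP_LEQ, BW_OP_EQ, BW_OP_NEQ, BW_OP_GT, BW_OP_GEQ };

/* Filter to install in the kernel: geq selects >=, otherwise <=. */
struct kernel_filter {
    bool     geq;
    uint64_t threshold;
};

/* Choose the kernel filter; returns false if none can ever match. */
static bool
map_bw_op(enum bw_op op, uint64_t value, struct kernel_filter *kf)
{
    switch (op) {
    case BW_OP_LT:          /* <= value, then user checks < value */
    case BW_OP_LEQ:
    case BW_OP_EQ:          /* <= value, then user checks == value */
        kf->geq = false;
        kf->threshold = value;
        return (true);
    case BW_OP_NEQ:         /* <= 0xffffffff matches every interval */
        kf->geq = false;
        kf->threshold = 0xffffffff;
        return (true);
    case BW_OP_GEQ:
        kf->geq = true;
        kf->threshold = value;
        return (true);
    case BW_OP_GT:
        /*
         * A >= filter fires as soon as the count reaches the
         * threshold, so a later "> value" check would usually see a
         * count equal to the threshold; ask for >= value + 1 instead.
         */
        if (value == UINT64_MAX)
            return (false);
        kf->geq = true;
        kf->threshold = value + 1;
        return (true);
    }
    return (false);
}

/* Secondary user-level check applied to each received upcall. */
static bool
upcall_matches(enum bw_op op, uint64_t value, uint64_t measured)
{
    switch (op) {
    case BW_OP_LT:
        return (measured < value);
    case BW_OP_EQ:
        return (measured == value);
    case BW_OP_NEQ:
        return (measured != value);
    default:
        return (true);      /* the kernel filter was already exact */
    }
}
.Ed
.Pp
For example, a daemon that wants to know when a flow carries fewer than
10 packets per interval would install a <= 10 filter and discard the
upcalls whose measured count is exactly 10.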
.Pp
From the application's point of view, a developer needs to know about
the following:
.Bd -literal
/*
 * Structure for installing or delivering an upcall if the
 * measured bandwidth is above or below a threshold.
 *
 * User programs (e.g. daemons) may have a need to know when the
 * bandwidth used by some data flow is above or below some threshold.
 * This interface allows the userland to specify the threshold (in
 * bytes and/or packets) and the measurement interval. Flows are
 * all packets with the same source and destination IP address.
 * At the moment the code is only used for multicast destinations
 * but there is nothing that prevents its use for unicast.
 *
 * The measurement interval cannot be shorter than some Tmin (currently, 3s).
 * The threshold is set in packets and/or bytes per interval.
 *
 * Measurement works as follows:
 *
 * For >= measurements:
 * The first packet marks the start of a measurement interval.
 * During an interval we count packets and bytes, and when we
 * pass the threshold we deliver an upcall and we are done.
 * The first packet after the end of the interval resets the
 * count and restarts the measurement.
 *
 * For <= measurements:
 * We start a timer to fire at the end of the interval, and
 * then for each incoming packet we count packets and bytes.
 * When the timer fires, we compare the value with the threshold,
 * schedule an upcall if we are below, and restart the measurement
 * (reschedule timer and zero counters).
 */

struct bw_data {
        struct timeval  b_time;
        uint64_t        b_packets;
        uint64_t        b_bytes;
};

struct bw_upcall {
        struct in_addr  bu_src;         /* source address            */
        struct in_addr  bu_dst;         /* destination address       */
        uint32_t        bu_flags;       /* misc flags (see below)    */
#define BW_UPCALL_UNIT_PACKETS (1 << 0) /* threshold (in packets)    */
#define BW_UPCALL_UNIT_BYTES   (1 << 1) /* threshold (in bytes)      */
#define BW_UPCALL_GEQ          (1 << 2) /* upcall if bw >= threshold */
#define BW_UPCALL_LEQ          (1 << 3) /* upcall if bw <= threshold */
#define BW_UPCALL_DELETE_ALL   (1 << 4) /* delete all upcalls for s,d*/
        struct bw_data  bu_threshold;   /* the bw threshold          */
        struct bw_data  bu_measured;    /* the measured bw           */
};

/* max. number of upcalls to deliver together */
#define BW_UPCALLS_MAX				128
/* min. threshold time interval for bandwidth measurement */
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC	3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC	0
.Ed
.Pp
The
.Dq bw_upcall
structure is used as an argument to
setsockopt(MRT_ADD_BW_UPCALL) and setsockopt(MRT_DEL_BW_UPCALL).
Each setsockopt(MRT_ADD_BW_UPCALL) installs a filter in the kernel
for the source and destination addresses in the
.Dq bw_upcall
argument,
and that filter will trigger an upcall according to the following
pseudo-algorithm:
.Bd -literal
if (bw_upcall_oper IS ">=") {
    if (((bw_upcall_unit & PACKETS) == PACKETS &&
         measured_packets >= threshold_packets) ||
        ((bw_upcall_unit & BYTES) == BYTES &&
         measured_bytes >= threshold_bytes))
        SEND_UPCALL("measured bandwidth is >= threshold");
}
if (bw_upcall_oper IS "<=" && measured_interval >= threshold_interval) {
    if (((bw_upcall_unit & PACKETS) == PACKETS &&
         measured_packets <= threshold_packets) ||
        ((bw_upcall_unit & BYTES) == BYTES &&
         measured_bytes <= threshold_bytes))
        SEND_UPCALL("measured bandwidth is <= threshold");
}
.Ed
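.Pp
A compilable C rendering of the pseudo-algorithm above might look like
this.
It is a sketch only: the function
.Dq bw_filter_fires
is hypothetical, it takes the threshold and measured counts as plain
integers rather than a
.Dq struct bw_upcall ,
and the flag values are those listed in the structure declaration.
.Bd -literal
#include <stdbool.h>
#include <stdint.h>

/* Flag values as listed in struct bw_upcall. */
#ifndef BW_UPCALL_UNIT_PACKETS
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)
#endif

/*
 * Evaluate one filter against the counts measured so far.
 * interval_done is true once the measurement interval has expired;
 * a <= filter can only be decided at that point.
 */
static bool
bw_filter_fires(uint32_t flags, uint64_t thr_packets, uint64_t thr_bytes,
    uint64_t packets, uint64_t bytes, bool interval_done)
{
    bool by_pkts = (flags & BW_UPCALL_UNIT_PACKETS) != 0;
    bool by_bytes = (flags & BW_UPCALL_UNIT_BYTES) != 0;

    if (flags & BW_UPCALL_GEQ)
        return ((by_pkts && packets >= thr_packets) ||
            (by_bytes && bytes >= thr_bytes));
    if ((flags & BW_UPCALL_LEQ) && interval_done)
        return ((by_pkts && packets <= thr_packets) ||
            (by_bytes && bytes <= thr_bytes));
    return (false);
}
.Ed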
.Pp
In a single
.Dq bw_upcall ,
the unit can be specified as both BYTES and PACKETS.
However, the GEQ and LEQ flags are mutually exclusive.
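.Pp
A daemon could check a request against these constraints before calling
setsockopt(MRT_ADD_BW_UPCALL), as in the following sketch.
The helper
.Dq bw_upcall_request_ok
is hypothetical, and rejecting a request with no unit flag is an
assumption: the text only states that both units may be given and that
GEQ and LEQ are mutually exclusive.
The minimum interval is taken from
.Dq BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC
and
.Dq BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC .
.Bd -literal
#include <stdbool.h>
#include <stdint.h>
#include <sys/time.h>

#ifndef BW_UPCALL_UNIT_PACKETS
#define BW_UPCALL_UNIT_PACKETS (1 << 0)
#define BW_UPCALL_UNIT_BYTES   (1 << 1)
#define BW_UPCALL_GEQ          (1 << 2)
#define BW_UPCALL_LEQ          (1 << 3)
#endif
#ifndef BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC  3
#define BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC 0
#endif

/*
 * Hypothetical pre-flight check: at least one unit (assumption),
 * exactly one of GEQ/LEQ, and an interval no shorter than the minimum.
 */
static bool
bw_upcall_request_ok(uint32_t flags, const struct timeval *interval)
{
    uint32_t units = flags & (BW_UPCALL_UNIT_PACKETS | BW_UPCALL_UNIT_BYTES);
    uint32_t ops = flags & (BW_UPCALL_GEQ | BW_UPCALL_LEQ);

    if (units == 0)
        return (false);
    if (ops != BW_UPCALL_GEQ && ops != BW_UPCALL_LEQ)
        return (false);     /* neither or both */
    if (interval->tv_sec != BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC)
        return (interval->tv_sec > BW_UPCALL_THRESHOLD_INTERVAL_MIN_SEC);
    return (interval->tv_usec >= BW_UPCALL_THRESHOLD_INTERVAL_MIN_USEC);
}
.Ed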
.Pp
An upcall is delivered if the measured bandwidth is >= or
<= the threshold bandwidth (within the specified measurement
interval).
For practical reasons, the smallest value for the measurement
interval is 3 seconds.
If smaller values were allowed, the bandwidth
estimation could be less accurate, or the potentially very high frequency
of the generated upcalls could introduce too much overhead.
For the >= operation, the answer may be known before the end of
.Dq threshold_interval ,
so the upcall may be delivered earlier.
For the <= operation, however, we must wait
until the threshold interval has expired to know the answer.
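.Pp
The timer-driven behavior of a <= filter can be modeled as follows.
This simplified sketch is not the kernel implementation: the function
.Dq leq_upcalls
is hypothetical, time is counted in whole seconds starting at 0, and the
packet arrival times are assumed to be sorted.
With a threshold of 0 packets, the filter reports idle intervals, which is
one way a daemon might detect idle MFC entries.
.Bd -literal
#include <stddef.h>

/*
 * Simplified model of one <= filter, times in whole seconds:
 * a timer expires every interval seconds starting at time 0; at each
 * expiry the packets counted since the previous expiry are compared
 * with the threshold, an upcall is counted if the total is at or
 * below it, and the counter restarts.  pkt_times must be sorted and
 * non-negative.  Returns the number of upcalls up to time horizon.
 */
static int
leq_upcalls(const long *pkt_times, size_t npkts, long interval,
    long horizon, unsigned long threshold)
{
    int upcalls = 0;
    size_t i = 0;

    if (interval <= 0)
        return (0);
    for (long end = interval; end <= horizon; end += interval) {
        unsigned long count = 0;

        /* Count the packets that arrived during [end - interval, end). */
        while (i < npkts && pkt_times[i] < end) {
            count++;
            i++;
        }
        if (count <= threshold)
            upcalls++;      /* decided only when the timer expires */
    }
    return (upcalls);
}
.Ed
.Pp
For example, with packets at t=1 and t=7, a 3-second interval and a
threshold of 0, the intervals ending at 6 and 12 seconds are idle, so
.Dq leq_upcalls
returns 2 for a 12-second horizon.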
.Pp
Example of usage:
.Bd -literal
struct bw_upcall bw_upcall;

/* Assign all bw_upcall fields as appropriate */
memset(&bw_upcall, 0, sizeof(bw_upcall));
memcpy(&bw_upcall.bu_src, &source, sizeof(bw_upcall.bu_src));
memcpy(&bw_upcall.bu_dst, &group, sizeof(bw_upcall.bu_dst));
bw_upcall.bu_threshold.b_time = threshold_interval; /* a struct timeval */
bw_upcall.bu_threshold.b_packets = threshold_packets;
bw_upcall.bu_threshold.b_bytes = threshold_bytes;
if (is_threshold_in_packets)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_PACKETS;
if (is_threshold_in_bytes)
    bw_upcall.bu_flags |= BW_UPCALL_UNIT_BYTES;
do {
    if (is_geq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_GEQ;
        break;
    }
    if (is_leq_upcall) {
        bw_upcall.bu_flags |= BW_UPCALL_LEQ;
        break;
    }
    return (ERROR);
} while (0);
if (setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_BW_UPCALL,
          (void *)&bw_upcall, sizeof(bw_upcall)) < 0)
    return (ERROR);
.Ed
.Pp
To delete a single filter, use MRT_DEL_BW_UPCALL;
the fields of bw_upcall must be set
exactly as they were when MRT_ADD_BW_UPCALL was called.
.Pp
To delete all bandwidth filters for a given (S,G),
only the
.Dq bu_src
and
.Dq bu_dst
fields in
.Dq struct bw_upcall
need to be set, and
.Dq BW_UPCALL_DELETE_ALL
must be the only flag set in the
.Dq bw_upcall.bu_flags
field.
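.Pp
The following sketch builds such a delete-all request.
The helper
.Dq bw_upcall_delete_all_req
is hypothetical; on
.Fx
the declarations come from
.In netinet/ip_mroute.h ,
while the
.Dq #else
branch merely mirrors the structure listing above so that the sketch
also compiles on other systems.
.Bd -literal
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <netinet/in.h>
#ifdef __FreeBSD__
#include <netinet/ip_mroute.h>
#else
/* Mirror of the declarations listed above, for building elsewhere. */
#include <stdint.h>
struct bw_data {
    struct timeval  b_time;
    uint64_t        b_packets;
    uint64_t        b_bytes;
};
struct bw_upcall {
    struct in_addr  bu_src;
    struct in_addr  bu_dst;
    uint32_t        bu_flags;
    struct bw_data  bu_threshold;
    struct bw_data  bu_measured;
};
#define BW_UPCALL_DELETE_ALL (1 << 4)
#endif

/* Hypothetical helper: request deletion of all filters for (S,G). */
static void
bw_upcall_delete_all_req(struct bw_upcall *bu, struct in_addr src,
    struct in_addr grp)
{
    memset(bu, 0, sizeof(*bu));     /* every other field stays zero */
    bu->bu_src = src;
    bu->bu_dst = grp;
    bu->bu_flags = BW_UPCALL_DELETE_ALL;
}
.Ed
.Pp
The filled-in structure would then be passed to
setsockopt(MRT_DEL_BW_UPCALL) on the multicast routing socket, in the
same way as the MRT_ADD_BW_UPCALL example above.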
.Pp
The bandwidth upcalls are delivered in aggregated form in the new upcall
message:
.Bd -literal
#define IGMPMSG_BW_UPCALL  4  /* BW monitoring upcall */
.Ed
.Pp
This message is an array of
.Dq struct bw_upcall
elements (up to BW_UPCALLS_MAX = 128).
The upcalls are
delivered when there are 128 pending upcalls, or when 1 second has
expired since the previous upcall (whichever comes first).
In each
.Dq struct bw_upcall
element, the
.Dq bu_measured
field is filled in to
indicate the particular measured values.
However, because of the way
the particular intervals are measured, the user should be careful how
bu_measured.b_time is used.
For example, if the
filter is installed to trigger an upcall if the number of packets
is >= 1, then
.Dq bu_measured
may have a value of zero in the upcalls after the
first one, because the measured interval for >= filters is
.Dq clocked
by the forwarded packets.
Hence, this upcall mechanism should not be used for measuring
the exact value of the bandwidth of the forwarded data.
To measure the exact bandwidth, the user would need to
get the forwarded packet statistics with the ioctl(SIOCGETSGCNT)
mechanism
(see the
.Sx Programming Guide
section).
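.Pp
The delivery rule for aggregated upcalls can be expressed as the
following predicate.
It is a user-level model of the rule as stated above, not the kernel's
timer code; the function
.Dq bw_upcalls_should_flush
is hypothetical, and it assumes that nothing is sent while the queue of
pending upcalls is empty.
.Bd -literal
#include <stdbool.h>
#include <sys/time.h>

#ifndef BW_UPCALLS_MAX
#define BW_UPCALLS_MAX 128
#endif

/*
 * Model of the batching rule: the pending upcalls go out as one
 * IGMPMSG_BW_UPCALL message once BW_UPCALLS_MAX are queued, or once
 * 1 second has passed since the previous delivery, whichever is first.
 */
static bool
bw_upcalls_should_flush(int npending, const struct timeval *now,
    const struct timeval *last_flush)
{
    long long elapsed_usec;

    if (npending <= 0)
        return (false);     /* assumption: nothing to send */
    if (npending >= BW_UPCALLS_MAX)
        return (true);
    elapsed_usec = (long long)(now->tv_sec - last_flush->tv_sec) * 1000000 +
        (now->tv_usec - last_flush->tv_usec);
    return (elapsed_usec >= 1000000);
}
.Ed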
.Pp
Note that the upcalls for a filter are delivered until the specific
filter is deleted, but no more frequently than once per
.Dq bu_threshold.b_time .
For example, if the
filter is specified to
deliver an upcall if bw >= 1 packet, the first packet will trigger an
upcall, but the next upcall will be triggered no earlier than
.Dq bu_threshold.b_time
after the previous upcall.
890addeef82SBruce A. Mah.\"
891addeef82SBruce A. Mah.Sh SEE ALSO
892addeef82SBruce A. Mah.Xr getsockopt 2 ,
893addeef82SBruce A. Mah.Xr recvfrom 2 ,
894addeef82SBruce A. Mah.Xr recvmsg 2 ,
895addeef82SBruce A. Mah.Xr setsockopt 2 ,
896addeef82SBruce A. Mah.Xr socket 2 ,
897addeef82SBruce A. Mah.Xr icmp6 4 ,
898addeef82SBruce A. Mah.Xr inet 4 ,
899addeef82SBruce A. Mah.Xr inet6 4 ,
900addeef82SBruce A. Mah.Xr intro 4 ,
901addeef82SBruce A. Mah.Xr ip 4 ,
902addeef82SBruce A. Mah.Xr ip6 4 ,
903addeef82SBruce A. Mah.Xr pim 4
904addeef82SBruce A. Mah.\"
905addeef82SBruce A. Mah.Pp
.Sh AUTHORS
.An -nosplit
The original multicast code was written by
.An David Waitzman
(BBN Labs),
and later modified by the following individuals:
.An Steve Deering
(Stanford),
.An Mark J. Steiglitz
(Stanford),
.An Van Jacobson
(LBL),
.An Ajit Thyagarajan
(PARC),
.An Bill Fenner
(PARC).
The IPv6 multicast support was implemented by the KAME project
(http://www.kame.net), and was based on the IPv4 multicast code.
The advanced multicast API and the multicast bandwidth
monitoring were implemented by
.An Pavlin Radoslavov
(ICSI)
in collaboration with
.An Chris Brown
(NextHop).
.Pp
This manual page was written by
.An Pavlin Radoslavov
(ICSI).