.\" Copyright (c) 1983, 1991, 1993
.\"	The Regents of the University of California.  All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice, this list of conditions and the following disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\" 3. Neither the name of the University nor the names of its contributors
.\"    may be used to endorse or promote products derived from this software
.\"    without specific prior written permission.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd August 9, 2021
.Dt IP 4
.Os
.Sh NAME
.Nm ip
.Nd Internet Protocol
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In netinet/in.h
.Ft int
.Fn socket AF_INET SOCK_RAW proto
.Sh DESCRIPTION
.Tn IP
is the network layer protocol used
by the Internet protocol family.
Options may be set at the
.Tn IP
level
when using higher-level protocols that are based on
.Tn IP
(such as
.Tn TCP
and
.Tn UDP ) .
It may also be accessed
through a
.Dq raw socket
when developing new protocols or
special-purpose applications.
.Pp
There are several
.Tn IP-level
.Xr setsockopt 2
and
.Xr getsockopt 2
options.
.Dv IP_OPTIONS
may be used to provide
.Tn IP
options to be transmitted in the
.Tn IP
header of each outgoing packet
or to examine the header options on incoming packets.
.Tn IP
options may be used with any socket type in the Internet family.
The format of
.Tn IP
options to be sent is that specified by the
.Tn IP
protocol specification (RFC-791), with one exception:
the list of addresses for Source Route options must include the first-hop
gateway at the beginning of the list of gateways.
The first-hop gateway address will be extracted from the option list
and the size adjusted accordingly before use.
To disable previously specified options,
use a zero-length buffer:
.Bd -literal
setsockopt(s, IPPROTO_IP, IP_OPTIONS, NULL, 0);
.Ed
.Pp
.Dv IP_TOS
may be used to set the differentiated services codepoint (DSCP) and the
explicit congestion notification (ECN) codepoint.
Setting the ECN codepoint (the two least significant bits) on a
socket using a transport protocol implementing ECN has no effect.
.Pp
.Dv IP_TTL
configures the time-to-live (TTL) field in the
.Tn IP
header for
.Dv SOCK_STREAM , SOCK_DGRAM ,
and certain types of
.Dv SOCK_RAW
sockets.
For example,
.Bd -literal
int tos = IPTOS_DSCP_EF;       /* see <netinet/ip.h> */
setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));

int ttl = 60;                   /* max = 255 */
setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
.Ed
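.Pp
To illustrate how the
.Dv IP_TOS
byte is laid out, the following minimal sketch composes it from a 6-bit DSCP
and a 2-bit ECN codepoint and applies it together with a TTL.
The helper names
.Fn make_tos
and
.Fn set_tos_ttl
are hypothetical and are not part of any system interface.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/* Hypothetical helper: DSCP is the upper six bits, ECN the lower two. */
static int
make_tos(int dscp, int ecn)
{
	return (((dscp & 0x3f) << 2) | (ecn & 0x03));
}

/* Hypothetical helper: apply a DSCP and a TTL; 0 on success, -1 on error. */
static int
set_tos_ttl(int s, int dscp, int ttl)
{
	int tos = make_tos(dscp, 0);

	if (setsockopt(s, IPPROTO_IP, IP_TOS, &tos, sizeof(tos)) == -1)
		return (-1);
	return (setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)));
}
```
.Ed
.Pp
With a DSCP of 46 (Expedited Forwarding) and an ECN codepoint of 0,
.Fn make_tos
yields 0xb8.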
.Pp
.Dv IP_IPSEC_POLICY
controls IPsec policy for sockets.
For example,
.Bd -literal
const char *policy = "in ipsec ah/transport//require";
char *buf = ipsec_set_policy(policy, strlen(policy));
setsockopt(s, IPPROTO_IP, IP_IPSEC_POLICY, buf, ipsec_get_policylen(buf));
.Ed
.Pp
.Dv IP_MINTTL
may be used to set the minimum acceptable TTL a packet must have when
received on a socket.
All packets with a lower TTL are silently dropped.
This option is most useful when set to 255, which prevents packets
from outside the directly connected networks from reaching local listeners
on sockets.
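.Pp
The 255 case can be sketched as follows, assuming the peer also transmits
with a TTL of 255 and that
.Dv IP_MINTTL
takes an
.Vt int
argument like
.Dv IP_TTL .
The helper name
.Fn enable_ttl_security
is hypothetical.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

/*
 * Hypothetical helper: send with TTL 255 and drop anything that arrives
 * with a lower TTL, so that only packets that crossed no router are
 * accepted.  Returns 0 on success, -1 on error.
 */
static int
enable_ttl_security(int s)
{
	int ttl = 255;

	if (setsockopt(s, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) == -1)
		return (-1);
	return (setsockopt(s, IPPROTO_IP, IP_MINTTL, &ttl, sizeof(ttl)));
}
```
.Ed
.Pp
Each router decrements the TTL, so a packet that still carries 255 on
arrival cannot have been forwarded.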
.Pp
.Dv IP_DONTFRAG
may be used to set the Don't Fragment flag on IP packets.
Currently this option is respected only on
.Xr udp 4
and raw
.Nm
sockets, unless the
.Dv IP_HDRINCL
option has been set.
On
.Xr tcp 4
sockets, the Don't Fragment flag is controlled by the Path
MTU Discovery option.
Sending a packet larger than the MTU size of the egress interface,
determined by the destination address, returns an
.Er EMSGSIZE
error.
.Pp
If the
.Dv IP_ORIGDSTADDR
option is enabled on a
.Dv SOCK_DGRAM
socket,
the
.Xr recvmsg 2
call will return the destination
.Tn IP
address and destination port for a
.Tn UDP
datagram.
The
.Vt msg_control
field in the
.Vt msghdr
structure points to a buffer
that contains a
.Vt cmsghdr
structure followed by the
.Vt sockaddr_in
structure.
The
.Vt cmsghdr
fields have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(struct sockaddr_in))
cmsg_level = IPPROTO_IP
cmsg_type = IP_ORIGDSTADDR
.Ed
.Pp
If the
.Dv IP_RECVDSTADDR
option is enabled on a
.Dv SOCK_DGRAM
socket,
the
.Xr recvmsg 2
call will return the destination
.Tn IP
address for a
.Tn UDP
datagram.
The
.Vt msg_control
field in the
.Vt msghdr
structure points to a buffer
that contains a
.Vt cmsghdr
structure followed by the
.Tn IP
address.
The
.Vt cmsghdr
fields have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(struct in_addr))
cmsg_level = IPPROTO_IP
cmsg_type = IP_RECVDSTADDR
.Ed
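.Pp
The control messages described above can be located with the standard
.Dv CMSG_FIRSTHDR ,
.Dv CMSG_NXTHDR
and
.Dv CMSG_DATA
macros.
The following is a minimal sketch; the helper name
.Fn find_ip_cmsg
is hypothetical, and the message type is passed as a parameter so that the
same loop serves
.Dv IP_RECVDSTADDR ,
.Dv IP_ORIGDSTADDR
and similar options.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: scan the control messages filled in by recvmsg(2)
 * for an IPPROTO_IP message of the given type whose payload is exactly
 * len bytes, and copy that payload into dst.
 * Returns 0 if found, -1 otherwise.
 */
static int
find_ip_cmsg(struct msghdr *msg, int type, void *dst, size_t len)
{
	struct cmsghdr *cm;

	for (cm = CMSG_FIRSTHDR(msg); cm != NULL; cm = CMSG_NXTHDR(msg, cm)) {
		if (cm->cmsg_level == IPPROTO_IP && cm->cmsg_type == type &&
		    cm->cmsg_len == CMSG_LEN(len)) {
			memcpy(dst, CMSG_DATA(cm), len);
			return (0);
		}
	}
	return (-1);
}
```
.Ed
.Pp
A caller would typically size the control buffer with
.Dv CMSG_SPACE
and pass a
.Vt "struct in_addr"
as the destination when looking for
.Dv IP_RECVDSTADDR .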
.Pp
The source address to be used for outgoing
.Tn UDP
datagrams on a socket can be specified as ancillary data with a type code of
.Dv IP_SENDSRCADDR .
The
.Vt msg_control
field in the
.Vt msghdr
structure should point to a buffer
that contains a
.Vt cmsghdr
structure followed by the
.Tn IP
address.
The
.Vt cmsghdr
fields should have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(struct in_addr))
cmsg_level = IPPROTO_IP
cmsg_type = IP_SENDSRCADDR
.Ed
.Pp
The socket should either be bound to
.Dv INADDR_ANY
and a local port, with the address supplied with
.Dv IP_SENDSRCADDR
not being
.Dv INADDR_ANY ,
or be bound to a local address, with the address supplied with
.Dv IP_SENDSRCADDR
being
.Dv INADDR_ANY .
In the latter case the bound address is overridden by the generic source
address selection logic, which chooses the IP address of the interface
closest to the destination.
.Pp
For convenience,
.Dv IP_SENDSRCADDR
is defined to have the same value as
.Dv IP_RECVDSTADDR ,
so the
.Dv IP_RECVDSTADDR
control message from
.Xr recvmsg 2
can be used directly as a control message for
.Xr sendmsg 2 .
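.Pp
When the control message is built from scratch rather than reused, the
layout above might be produced as in the following sketch.
The helper name
.Fn attach_addr_cmsg
is hypothetical; the message type is a parameter, for which a caller would
pass
.Dv IP_SENDSRCADDR .
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: attach one IPPROTO_IP control message carrying an
 * IPv4 address to msg.  buf is caller-supplied, suitably aligned storage
 * of at least CMSG_SPACE(sizeof(struct in_addr)) bytes.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
attach_addr_cmsg(struct msghdr *msg, void *buf, size_t buflen, int type,
    struct in_addr addr)
{
	struct cmsghdr *cm;

	if (buflen < CMSG_SPACE(sizeof(addr)))
		return (-1);
	memset(buf, 0, buflen);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(addr));
	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_len = CMSG_LEN(sizeof(addr));
	cm->cmsg_level = IPPROTO_IP;
	cm->cmsg_type = type;
	memcpy(CMSG_DATA(cm), &addr, sizeof(addr));
	return (0);
}
```
.Ed
.Pp
The remaining
.Vt msghdr
fields (destination address and I/O vector) are left to the caller.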
.\"
.Pp
If the
.Dv IP_ONESBCAST
option is enabled on a
.Dv SOCK_DGRAM
or a
.Dv SOCK_RAW
socket, the destination address of outgoing
broadcast datagrams on that socket will be forced
to the undirected broadcast address,
.Dv INADDR_BROADCAST ,
before transmission.
This is in contrast to the default behavior of the
system, which is to transmit undirected broadcasts
via the first network interface with the
.Dv IFF_BROADCAST
flag set.
.Pp
This option allows applications to choose which
interface is used to transmit an undirected broadcast
datagram.
For example, the following code would force an
undirected broadcast to be transmitted via the interface
configured with the broadcast address 192.168.2.255:
.Bd -literal
char msg[512];
struct sockaddr_in sin;
int onesbcast = 1;	/* 0 = disable (default), 1 = enable */

setsockopt(s, IPPROTO_IP, IP_ONESBCAST, &onesbcast, sizeof(onesbcast));
memset(&sin, 0, sizeof(sin));
sin.sin_len = sizeof(sin);
sin.sin_family = AF_INET;
sin.sin_addr.s_addr = inet_addr("192.168.2.255");
sin.sin_port = htons(1234);
sendto(s, msg, sizeof(msg), 0, (struct sockaddr *)&sin, sizeof(sin));
.Ed
.Pp
It is the application's responsibility to set the
.Dv IP_TTL
option
to an appropriate value in order to prevent broadcast storms.
The application must have sufficient credentials to set the
.Dv SO_BROADCAST
socket level option, otherwise the
.Dv IP_ONESBCAST
option has no effect.
.Pp
If the
.Dv IP_BINDANY
option is enabled on a
.Dv SOCK_STREAM ,
.Dv SOCK_DGRAM
or a
.Dv SOCK_RAW
socket, one can
.Xr bind 2
to any address, even one not bound to any available network interface in the
system.
This functionality (in conjunction with special firewall rules) can be used for
implementing a transparent proxy.
The
.Dv PRIV_NETINET_BINDANY
privilege is needed to set this option.
.Pp
If the
.Dv IP_RECVTTL
option is enabled on a
.Dv SOCK_DGRAM
socket, the
.Xr recvmsg 2
call will return the
.Tn IP
.Tn TTL
(time to live) field for a
.Tn UDP
datagram.
The msg_control field in the msghdr structure points to a buffer
that contains a cmsghdr structure followed by the
.Tn TTL .
The cmsghdr fields have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(u_char))
cmsg_level = IPPROTO_IP
cmsg_type = IP_RECVTTL
.Ed
.\"
.Pp
If the
.Dv IP_RECVTOS
option is enabled on a
.Dv SOCK_DGRAM
socket, the
.Xr recvmsg 2
call will return the
.Tn IP
.Tn TOS
(type of service) field for a
.Tn UDP
datagram.
The msg_control field in the msghdr structure points to a buffer
that contains a cmsghdr structure followed by the
.Tn TOS .
The cmsghdr fields have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(u_char))
cmsg_level = IPPROTO_IP
cmsg_type = IP_RECVTOS
.Ed
.\"
.Pp
If the
.Dv IP_RECVIF
option is enabled on a
.Dv SOCK_DGRAM
socket, the
.Xr recvmsg 2
call returns a
.Vt "struct sockaddr_dl"
corresponding to the interface on which the
packet was received.
The
.Va msg_control
field in the
.Vt msghdr
structure points to a buffer that contains a
.Vt cmsghdr
structure followed by the
.Vt "struct sockaddr_dl" .
The
.Vt cmsghdr
fields have the following values:
.Bd -literal
cmsg_len = CMSG_LEN(sizeof(struct sockaddr_dl))
cmsg_level = IPPROTO_IP
cmsg_type = IP_RECVIF
.Ed
.Pp
.Dv IP_PORTRANGE
may be used to set the port range used for selecting a local port number
on a socket with an unspecified (zero) port number.
It has the following
possible values:
.Bl -tag -width IP_PORTRANGE_DEFAULT
.It Dv IP_PORTRANGE_DEFAULT
use the default range of values, normally
.Dv IPPORT_HIFIRSTAUTO
through
.Dv IPPORT_HILASTAUTO .
This is adjustable through the sysctl settings
.Va net.inet.ip.portrange.first
and
.Va net.inet.ip.portrange.last .
.It Dv IP_PORTRANGE_HIGH
use a high range of values, normally
.Dv IPPORT_HIFIRSTAUTO
through
.Dv IPPORT_HILASTAUTO .
This is adjustable through the sysctl settings
.Va net.inet.ip.portrange.hifirst
and
.Va net.inet.ip.portrange.hilast .
.It Dv IP_PORTRANGE_LOW
use a low range of ports, which are normally restricted to
privileged processes on
.Ux
systems.
The range is normally from
.Dv IPPORT_RESERVED
\- 1 down to
.Dv IPPORT_RESERVEDSTART
in descending order.
This is adjustable through the sysctl settings
.Va net.inet.ip.portrange.lowfirst
and
.Va net.inet.ip.portrange.lowlast .
.El
.Pp
The range of privileged ports which may be opened only by
root-owned processes may be modified by the
.Va net.inet.ip.portrange.reservedlow
and
.Va net.inet.ip.portrange.reservedhigh
sysctl settings.
The values default to the traditional range,
0 through
.Dv IPPORT_RESERVED
\- 1
(0 through 1023), respectively.
Note that these settings do not affect and are not accounted for in the
use or calculation of the other
.Va net.inet.ip.portrange
values above.
Changing these values departs from
.Ux
tradition and has security
consequences that the administrator should carefully evaluate before
modifying these settings.
.Pp
Ports are allocated at random within the specified port range in order
to increase the difficulty of random spoofing attacks.
In scenarios such as benchmarking, this behavior may be undesirable.
In these cases,
.Va net.inet.ip.portrange.randomized
can be used to toggle randomization off.
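.Pp
As a sketch of how
.Dv IP_PORTRANGE
interacts with an unspecified port, the following hypothetical helper
.Fn bind_auto_port
requests the high range, binds to port zero and reports the port the kernel
selected.
The option is applied only where the headers define it, which is an
assumption made so that the sketch also compiles elsewhere.
.Bd -literal
```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <string.h>

/*
 * Hypothetical helper: bind to an unspecified (zero) port and return the
 * port the kernel selected, in host byte order, or -1 on error.
 */
static int
bind_auto_port(int s)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
#ifdef IP_PORTRANGE
	int range = IP_PORTRANGE_HIGH;

	if (setsockopt(s, IPPROTO_IP, IP_PORTRANGE, &range,
	    sizeof(range)) == -1)
		return (-1);
#endif
	memset(&sin, 0, sizeof(sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_ANY);
	sin.sin_port = 0;		/* let the kernel choose */
	if (bind(s, (struct sockaddr *)&sin, sizeof(sin)) == -1)
		return (-1);
	if (getsockname(s, (struct sockaddr *)&sin, &len) == -1)
		return (-1);
	return (ntohs(sin.sin_port));
}
```
.Ed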
.Ss "Multicast Options"
.Tn IP
multicasting is supported only on
.Dv AF_INET
sockets of type
.Dv SOCK_DGRAM
and
.Dv SOCK_RAW ,
and only on networks where the interface
driver supports multicasting.
.Pp
The
.Dv IP_MULTICAST_TTL
option changes the time-to-live (TTL)
for outgoing multicast datagrams
in order to control the scope of the multicasts:
.Bd -literal
u_char ttl;	/* range: 0 to 255, default = 1 */
setsockopt(s, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl));
.Ed
.Pp
Datagrams with a TTL of 1 are not forwarded beyond the local network.
Multicast datagrams with a TTL of 0 will not be transmitted on any network,
but may be delivered locally if the sending host belongs to the destination
group and if multicast loopback has not been disabled on the sending socket
(see below).
Multicast datagrams with TTL greater than 1 may be forwarded
to other networks if a multicast router is attached to the local network.
.Pp
For hosts with multiple interfaces, where an interface has not
been specified for a multicast group membership,
each multicast transmission is sent from the primary network interface.
The
.Dv IP_MULTICAST_IF
option overrides the default for
subsequent transmissions from a given socket:
.Bd -literal
struct in_addr addr;
setsockopt(s, IPPROTO_IP, IP_MULTICAST_IF, &addr, sizeof(addr));
.Ed
.Pp
where "addr" is the local
.Tn IP
address of the desired interface or
.Dv INADDR_ANY
to specify the default interface.
.Pp
To specify an interface by index, an instance of
.Vt ip_mreqn
may be passed instead.
The
.Vt imr_ifindex
member should be set to the index of the desired interface,
or 0 to specify the default interface.
The kernel differentiates between these two structures by their size.
.Pp
The use of
.Dv IP_MULTICAST_IF
is
.Em not recommended ,
as multicast memberships are scoped to each
individual interface.
It is supported for legacy use only by applications,
such as routing daemons, which expect to
be able to transmit link-local IPv4 multicast datagrams (224.0.0.0/24)
on multiple interfaces,
without requesting an individual membership for each interface.
.Pp
.\"
An interface's local IP address and multicast capability can
be obtained via the
.Dv SIOCGIFCONF
and
.Dv SIOCGIFFLAGS
ioctls.
Normal applications should not need to use this option.
.Pp
If a multicast datagram is sent to a group to which the sending host itself
belongs (on the outgoing interface), a copy of the datagram is, by default,
looped back by the IP layer for local delivery.
The
.Dv IP_MULTICAST_LOOP
option gives the sender explicit control
over whether or not subsequent datagrams are looped back:
.Bd -literal
u_char loop;	/* 0 = disable, 1 = enable (default) */
setsockopt(s, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
.Ed
.Pp
This option
improves performance for applications that may have no more than one
instance on a single host (such as a routing daemon), by eliminating
the overhead of receiving their own transmissions.
It should generally not
be used by applications for which there may be more than one instance on a
single host (such as a conferencing program) or for which the sender does
not belong to the destination group (such as a time querying program).
.Pp
The sysctl setting
.Va net.inet.ip.mcast.loop
controls the default setting of the
.Dv IP_MULTICAST_LOOP
socket option for new sockets.
.Pp
A multicast datagram sent with an initial TTL greater than 1 may be delivered
to the sending host on a different interface from that on which it was sent,
if the host belongs to the destination group on that other interface.
The loopback control option has no effect on such delivery.
.Pp
A host must become a member of a multicast group before it can receive
datagrams sent to the group.
To join a multicast group, use the
.Dv IP_ADD_MEMBERSHIP
option:
.Bd -literal
struct ip_mreqn mreqn;
setsockopt(s, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreqn, sizeof(mreqn));
.Ed
.Pp
where
.Fa mreqn
is the following structure:
.Bd -literal
struct ip_mreqn {
    struct in_addr imr_multiaddr; /* IP multicast address of group */
    struct in_addr imr_interface; /* local IP address of interface */
    int            imr_ifindex;   /* interface index */
};
.Ed
.Pp
.Va imr_ifindex
should be set to the index of a particular multicast-capable interface if
the host is multihomed.
If
.Va imr_ifindex
is non-zero, the value of
.Va imr_interface
is ignored.
Otherwise, if
.Va imr_ifindex
is 0, the kernel will use the IP address from
.Va imr_interface
to look up the interface.
The value of
.Va imr_interface
may be set to
.Dv INADDR_ANY
to choose the default interface, although this is not recommended; this is
considered to be the first interface corresponding to the default route.
Otherwise, the first multicast-capable interface configured in the system
will be used.
.Pp
The legacy
.Vt "struct ip_mreq" ,
which lacks the
.Va imr_ifindex
field, is also supported by the
.Dv IP_ADD_MEMBERSHIP
socket option.
In this case the kernel behaves as if
.Va imr_ifindex
were set to zero:
.Va imr_interface
is used to look up the interface.
.Pp
Prior to
.Fx 7.0 ,
if the
.Va imr_interface
member was within the network range
.Li 0.0.0.0/8 ,
it was treated as an interface index in the system interface MIB,
as per the RIP Version 2 MIB Extension (RFC-1724).
In versions of
.Fx
since 7.0, this behavior is no longer supported.
Developers should
instead use the RFC 3678 multicast source filter APIs; in particular,
.Dv MCAST_JOIN_GROUP .
.Pp
Up to
.Dv IP_MAX_MEMBERSHIPS
memberships may be added on a single socket.
Membership is associated with a single interface;
programs running on multihomed hosts may need to
join the same group on more than one interface.
.Pp
To drop a membership, use:
.Bd -literal
struct ip_mreq mreq;
setsockopt(s, IPPROTO_IP, IP_DROP_MEMBERSHIP, &mreq, sizeof(mreq));
.Ed
.Pp
where
.Fa mreq
contains the same values as used to add the membership.
Memberships are dropped when the socket is closed or the process exits.
.\" TODO: Update this piece when IPv4 source-address selection is implemented.
.Pp
The IGMP protocol uses the primary IP address of the interface
as its identifier for group membership.
This is the first IP address configured on the interface.
If this address is removed or changed, the results are
undefined, as the IGMP membership state will then be inconsistent.
If multiple IP aliases are configured on the same interface,
they will be ignored.
.Pp
This shortcoming was addressed in IPv6; MLDv2 requires
that the unique link-local address for an interface is
used to identify an MLDv2 listener.
.Ss "Source-Specific Multicast Options"
Since
.Fx 8.0 ,
the use of Source-Specific Multicast (SSM) is supported.
These extensions require an IGMPv3 multicast router in order to
make best use of them.
If a legacy multicast router is present on the link,
.Fx
will simply downgrade to the version of IGMP spoken by the router,
and the benefits of source filtering on the upstream link
will not be present, although the kernel will continue to
squelch transmissions from blocked sources.
.Pp
Each group membership on a socket now has a filter mode:
.Bl -tag -width MCAST_EXCLUDE
.It Dv MCAST_EXCLUDE
Datagrams sent to this group are accepted,
unless the source is in a list of blocked source addresses.
.It Dv MCAST_INCLUDE
Datagrams sent to this group are accepted
only if the source is in a list of accepted source addresses.
.El
.Pp
Groups joined using the legacy
.Dv IP_ADD_MEMBERSHIP
option are placed in exclusive-mode,
and are able to request that certain sources are blocked or allowed.
This is known as the
.Em delta-based API .
.Pp
To block a multicast source on an existing group membership:
.Bd -literal
struct ip_mreq_source mreqs;
setsockopt(s, IPPROTO_IP, IP_BLOCK_SOURCE, &mreqs, sizeof(mreqs));
.Ed
.Pp
where
.Fa mreqs
is the following structure:
.Bd -literal
struct ip_mreq_source {
    struct in_addr imr_multiaddr; /* IP multicast address of group */
    struct in_addr imr_sourceaddr; /* IP address of source */
    struct in_addr imr_interface; /* local IP address of interface */
};
.Ed
.Pp
.Va imr_sourceaddr
should be set to the address of the source to be blocked.
.Pp
To unblock a multicast source on an existing group:
.Bd -literal
struct ip_mreq_source mreqs;
setsockopt(s, IPPROTO_IP, IP_UNBLOCK_SOURCE, &mreqs, sizeof(mreqs));
.Ed
.Pp
The
.Dv IP_BLOCK_SOURCE
and
.Dv IP_UNBLOCK_SOURCE
options are
.Em not permitted
for inclusive-mode group memberships.
.Pp
To join a multicast group in
.Dv MCAST_INCLUDE
mode with a single source,
or add another source to an existing inclusive-mode membership:
.Bd -literal
struct ip_mreq_source mreqs;
setsockopt(s, IPPROTO_IP, IP_ADD_SOURCE_MEMBERSHIP, &mreqs, sizeof(mreqs));
.Ed
.Pp
To leave a single source from an existing group in inclusive mode:
.Bd -literal
struct ip_mreq_source mreqs;
setsockopt(s, IPPROTO_IP, IP_DROP_SOURCE_MEMBERSHIP, &mreqs, sizeof(mreqs));
.Ed
If this is the last accepted source for the group, the membership
will be dropped.
.Pp
The
.Dv IP_ADD_SOURCE_MEMBERSHIP
and
.Dv IP_DROP_SOURCE_MEMBERSHIP
options are
.Em not accepted
for exclusive-mode group memberships.
However, both exclusive and inclusive mode memberships
support the use of the
.Em full-state API
documented in RFC 3678.
For management of source filter lists using this API,
please refer to
.Xr sourcefilter 3 .
.Pp
The sysctl settings
.Va net.inet.ip.mcast.maxsocksrc
and
.Va net.inet.ip.mcast.maxgrpsrc
are used to specify an upper limit on the number of per-socket and per-group
source filter entries which the kernel may allocate.
.\"-----------------------
.Ss "Raw IP Sockets"
Raw
.Tn IP
sockets are connectionless,
and are normally used with the
.Xr sendto 2
and
.Xr recvfrom 2
calls, though the
.Xr connect 2
call may also be used to fix the destination for future
packets (in which case the
.Xr read 2
or
.Xr recv 2
and
.Xr write 2
or
.Xr send 2
system calls may be used).
.Pp
If
.Fa proto
is 0, the default protocol
.Dv IPPROTO_RAW
is used for outgoing
packets, and only incoming packets destined for that protocol
are received.
If
.Fa proto
is non-zero, that protocol number will be used on outgoing packets
and to filter incoming packets.
.Pp
Outgoing packets automatically have an
.Tn IP
header prepended to
them (based on the destination address and the protocol
number the socket is created with),
unless the
.Dv IP_HDRINCL
option has been set.
Unlike in previous
.Bx
releases, incoming packets are received with
.Tn IP
header and options intact, leaving all fields in network byte order.
.Pp
.Dv IP_HDRINCL
indicates the complete IP header is included with the data
and may be used only with the
.Dv SOCK_RAW
type.
.Bd -literal
#include <netinet/in_systm.h>
#include <netinet/ip.h>

int hincl = 1;                  /* 1 = on, 0 = off */
setsockopt(s, IPPROTO_IP, IP_HDRINCL, &hincl, sizeof(hincl));
.Ed
.Pp
Unlike in previous
.Bx
releases, the program must set all
the fields of the IP header, including the following:
.Bd -literal
ip->ip_v = IPVERSION;
ip->ip_hl = hlen >> 2;
ip->ip_id = 0;  /* 0 means the kernel sets an appropriate value */
ip->ip_off = htons(offset);
ip->ip_len = htons(len);
.Ed
.Pp
The packet should be provided exactly as it is to be sent over the wire.
This implies that all fields, including
.Va ip_len
and
.Va ip_off ,
must be in network byte order.
See
.Xr byteorder 3
for more information on network byte order.
If the
.Va ip_id
field is set to 0 then the kernel will choose an
appropriate value.
If the header source address is set to
.Dv INADDR_ANY ,
the kernel will choose an appropriate address.
.Sh ERRORS
A socket operation may fail with one of the following errors returned:
.Bl -tag -width Er
.It Bq Er EISCONN
when trying to establish a connection on a socket which
already has one, or when trying to send a datagram with the destination
address specified and the socket is already connected;
.It Bq Er ENOTCONN
when trying to send a datagram, but
no destination address is specified, and the socket has not been
connected;
.It Bq Er ENOBUFS
when the system runs out of memory for
an internal data structure;
.It Bq Er EADDRNOTAVAIL
when an attempt is made to create a
socket with a network address for which no network interface
exists.
.It Bq Er EACCES
when an attempt is made to create
a raw IP socket by a non-privileged process.
.El
.Pp
The following errors specific to
.Tn IP
may occur when setting or getting
.Tn IP
options:
.Bl -tag -width Er
.It Bq Er EINVAL
An unknown socket option name was given.
.It Bq Er EINVAL
The IP option field was improperly formed;
an option field was shorter than the minimum value
or longer than the option buffer provided.
.El
.Pp
The following errors may occur when attempting to send
.Tn IP
datagrams via a
.Dq raw socket
with the
.Dv IP_HDRINCL
option set:
.Bl -tag -width Er
.It Bq Er EINVAL
The user-supplied
.Va ip_len
field was not equal to the length of the datagram written to the socket.
.El
.Sh SEE ALSO
.Xr getsockopt 2 ,
.Xr recv 2 ,
.Xr send 2 ,
.Xr byteorder 3 ,
.Xr CMSG_DATA 3 ,
.Xr sourcefilter 3 ,
.Xr icmp 4 ,
.Xr igmp 4 ,
.Xr inet 4 ,
.Xr intro 4 ,
.Xr multicast 4
.Rs
.%A D. Thaler
.%A B. Fenner
.%A B. Quinn
.%T "Socket Interface Extensions for Multicast Source Filters"
.%N RFC 3678
.%D Jan 2004
.Re
.Sh HISTORY
The
.Nm
protocol appeared in
.Bx 4.2 .
The
.Vt ip_mreqn
structure appeared in
.Tn Linux 2.4 .
.Sh BUGS
Before
.Fx 10.0
packets received on raw IP sockets had the
.Va ip_hl
subtracted from the
.Va ip_len
field.
.Pp
Before
.Fx 11.0
packets received on raw IP sockets had the
.Va ip_len
and
.Va ip_off
fields converted to host byte order.
Packets written to raw IP sockets were expected to have
.Va ip_len
and
.Va ip_off
in host byte order.
952in host byte order.
953