.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd June 24, 2025
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent.
A value of 0 has the special meaning of sending until the end of file
is reached.
.Pp
An optional header and/or trailer can be sent before and after the file data by specifying
a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
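.Pp
As a minimal sketch of the structure described above, the following C
fragment fills a
.Vt "struct sf_hdtr"
with one header iovec and one trailer iovec.
The helper names
.Fn iov_from_str
and
.Fn send_with_banner ,
as well as the banner strings, are hypothetical and chosen only for
illustration.
The call itself is guarded so that it is compiled only on
.Fx .

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>

/* Hypothetical helper: point one iovec at a NUL-terminated string. */
static void
iov_from_str(struct iovec *iov, char *str)
{
	iov->iov_base = str;
	iov->iov_len = strlen(str);	/* the terminating NUL is not sent */
}

#ifdef __FreeBSD__
/*
 * Hypothetical helper: send the whole file fd on socket s, framed by
 * an illustrative header and trailer.  Returns sendfile()'s result.
 */
int
send_with_banner(int fd, int s, off_t *sbytes)
{
	static char head[] = "BEGIN\n";	/* illustrative header text */
	static char tail[] = "END\n";	/* illustrative trailer text */
	struct iovec hdr, trl;
	struct sf_hdtr hdtr;

	iov_from_str(&hdr, head);
	iov_from_str(&trl, tail);
	hdtr.headers = &hdr;
	hdtr.hdr_cnt = 1;
	hdtr.trailers = &trl;
	hdtr.trl_cnt = 1;

	/* offset 0, nbytes 0: send from the start until end of file. */
	return (sendfile(fd, s, 0, 0, &hdtr, sbytes, 0));
}
#endif
```

Either pointer in the structure may instead be left
.Dv NULL
when only a header or only a trailer is wanted.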
.Pp
If
.Fa sbytes
is
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable it points to.
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of the following values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions
.Dv SF_NODISKIO
had a slightly different meaning.
The flag prevented
.Nm
from running I/O operations when an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11 ,
.Nm
does not block on I/O when sending files off the
.Xr ffs 4
filesystem
.Pq see Sx IMPLEMENTATION NOTES ,
so the condition no longer applies.
However, an application may safely keep using
.Dv SF_NODISKIO
and, on
.Er EBUSY ,
perform the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
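.Pp
The partial-send behaviour on a non-blocking socket can be handled with
a resume loop.
The following is only a sketch under stated assumptions: the helper names
.Fn advance_request
and
.Fn send_range
are hypothetical, no header or trailer is passed (so every byte counted in
.Fa *sbytes
is file data), and waiting with
.Xr poll 2
is just one possible way to wait for socket buffer space.
The loop itself is compiled only on
.Fx .

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: account for sbytes already sent.
 * Returns 1 if the request is complete, 0 if more remains.
 * *nbytes == 0 keeps its "send until end of file" meaning.
 */
static int
advance_request(off_t *offset, size_t *nbytes, off_t sbytes)
{
	*offset += sbytes;
	if (*nbytes == 0)
		return (0);	/* still "send until end of file" */
	if ((size_t)sbytes >= *nbytes)
		return (1);	/* done; never let nbytes become 0 */
	*nbytes -= (size_t)sbytes;
	return (0);
}

#ifdef __FreeBSD__
/* Hypothetical helper: retry on a non-blocking socket until done. */
int
send_range(int fd, int s, off_t offset, size_t nbytes)
{
	struct pollfd pfd = { .fd = s, .events = POLLOUT };
	off_t sbytes;

	for (;;) {
		sbytes = 0;
		if (sendfile(fd, s, offset, nbytes, NULL, &sbytes, 0) == 0)
			return (0);
		if (errno != EAGAIN && errno != EINTR)
			return (-1);
		if (advance_request(&offset, &nbytes, sbytes))
			return (0);
		/* Wait until the socket buffer has room again. */
		(void)poll(&pfd, 1, -1);
	}
}
#endif
```

The explicit completion check matters because a remaining count of 0
would otherwise be read as a request to send until the end of file.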
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and flags.
For example, to specify a readahead of 16 pages together with the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application-specified readahead or the internally
calculated one, whichever is larger.
Setting the
.Dv SF_USER_READAHEAD
flag turns off the heuristics and sets the maximum possible readahead length to
the number of pages specified via flags.
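.Pp
To illustrate the bit layout described in this section, the sketch below
packs and unpacks a
.Fa flags
value by hand.
The helpers
.Fn sf_pack ,
.Fn sf_readahead_pages
and
.Fn sf_flag_bits
are hypothetical and not part of the system headers; real code should use
.Fn SF_FLAGS
and the
.Dv SF_*
constants instead.

```c
#include <stdint.h>

/*
 * Hypothetical helpers mirroring the layout described above:
 * readahead page count in the most significant 16 bits,
 * flag bitmap in the least significant 16 bits.
 */
static int
sf_pack(unsigned int pages, unsigned int bits)
{
	return ((int)(((pages & 0xffffu) << 16) | (bits & 0xffffu)));
}

/* Extract the readahead page count from a packed flags value. */
static unsigned int
sf_readahead_pages(int flags)
{
	return (((uint32_t)flags >> 16) & 0xffffu);
}

/* Extract the flag bitmap from a packed flags value. */
static unsigned int
sf_flag_bits(int flags)
{
	return ((uint32_t)flags & 0xffffu);
}
```

For instance, packing 16 pages with an arbitrary flag bit of 0x10 (used here
only as a placeholder value) yields 0x100010, from which both fields can
be recovered unchanged.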
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 4
filesystem.
The syscall returns success before the actual I/O completes, and the data
is put into the socket later, without further action by the caller.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is
.Dq zero-copy ,
meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
.Ss physical paging buffers
.Fn sendfile
uses the vnode pager to read file pages into memory.
The pager uses a pool of physical buffers to run its I/O operations.
When the system runs out of pbufs,
.Fn sendfile
will block and report state
.Dq Li zonelimit .
The size of the pool can be tuned with the
.Va vm.vnode_pbufs
.Xr loader.conf 5
tunable and can be checked at runtime with the
.Xr sysctl 8
OID of the same name.
.Ss sendfile(2) buffers
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf Ns 's
allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage, respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If the
.Xr sysctl 8
OID
.Va kern.ipc.nsfbufs
does not exist, your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
had been specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a
.Dv SOCK_STREAM
type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er EINTEGRITY
Corrupted data was detected while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr loader.conf 5 ,
.Xr tuning 7 ,
.Xr sysctl 8
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
Support for sending shared memory descriptors was introduced in
.Fx 10 .
A non-blocking implementation was introduced in
.Fx 11 .
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt dg@dglawrence.com .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt glebius@FreeBSD.org .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
if provided an invalid address for
.Fa sbytes .
The
.Fn sendfile
system call does not support SCTP sockets;
it will return
.Dv -1
and set
.Va errno
to
.Er EINVAL .
438