.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 30, 2020
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent,
with 0 meaning that data is sent until the end of file is reached.
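.Pp
As a minimal sketch of the arguments described above, the following
hypothetical helper sends a whole file by passing 0 for both
.Fa offset
and
.Fa nbytes .
The name
.Fn send_whole_file
is an assumption made for illustration, and
.Fa s
is assumed to be a connected, blocking stream socket.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: send all of the file at path over the
 * connected, blocking stream socket s.
 * Returns the number of bytes sent, or -1 with errno set.
 */
static off_t
send_whole_file(const char *path, int s)
{
	off_t sbytes = 0;
	int error, fd, serrno;

	fd = open(path, O_RDONLY);
	if (fd == -1)
		return (-1);
	/* offset 0, nbytes 0: send from the start until end of file. */
	error = sendfile(fd, s, 0, 0, NULL, &sbytes, 0);
	/* Preserve errno from sendfile() across close(). */
	serrno = errno;
	close(fd);
	errno = serrno;
	return (error == -1 ? -1 : sbytes);
}
.Ed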
.Pp
An optional header and/or trailer can be sent before and after the file data by specifying
a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
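.Pp
The structure can be filled in as in the following sketch, which sends
one header string, the whole file, and one trailer string.
The helper name
.Fn send_framed
and its parameters are hypothetical; only
.Vt "struct sf_hdtr"
and
.Vt "struct iovec"
come from the system headers.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <string.h>

/*
 * Hypothetical helper: send the string hdr, then the whole file fd,
 * then the string trl over the connected stream socket s.
 * Returns the value of sendfile().
 */
static int
send_framed(int fd, int s, char *hdr, char *trl, off_t *sbytes)
{
	struct iovec hiov, tiov;
	struct sf_hdtr hdtr;

	hiov.iov_base = hdr;
	hiov.iov_len = strlen(hdr);
	tiov.iov_base = trl;
	tiov.iov_len = strlen(trl);

	/* One iovec each for the header and the trailer. */
	hdtr.headers = &hiov;
	hdtr.hdr_cnt = 1;
	hdtr.trailers = &tiov;
	hdtr.trl_cnt = 1;

	/* offset 0, nbytes 0: the file is sent in full. */
	return (sendfile(fd, s, 0, 0, &hdtr, sbytes, 0));
}
.Ed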
.Pp
If
.Fa sbytes
is
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable it points to.
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is currently working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions the
.Dv SF_NODISKIO
flag had a slightly different meaning.
The flag prevented
.Nm
from running I/O operations if an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11 ,
.Nm
sending files off the
.Xr ffs 4
filesystem does not block on I/O
(see
.Sx IMPLEMENTATION NOTES
), so the condition no longer applies.
However, it is safe for an application to use
.Dv SF_NODISKIO
and, on
.Er EBUSY ,
perform the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_SYNC
.Nm
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
been sent.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
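.Pp
The advice to retry after
.Er EBUSY
could be followed as in this sketch.
The helper name
.Fn send_nodiskio ,
the limit of 10 attempts and the 1 millisecond pause are arbitrary
assumptions, as is the assumption that
.Fa *sbytes
reflects any partial data sent before
.Er EBUSY
was returned.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: send len (> 0) bytes of fd starting at off
 * with SF_NODISKIO, retrying a few times when a busy page is hit.
 * Returns 0 on success, or -1 with errno set.
 */
static int
send_nodiskio(int fd, int s, off_t off, size_t len)
{
	off_t sbytes;
	int tries;

	for (tries = 0; tries < 10; tries++) {
		sbytes = 0;
		if (sendfile(fd, s, off, len, NULL, &sbytes,
		    SF_NODISKIO) == 0)
			return (0);
		if (errno != EBUSY)
			return (-1);
		/* Assumption: sbytes counts data sent before EBUSY. */
		off += sbytes;
		len -= (size_t)sbytes;
		/* Never retry with 0, which means "until end of file". */
		if (len == 0)
			return (0);
		usleep(1000);
	}
	errno = EBUSY;
	return (-1);
}
.Ed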
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
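.Pp
A caller can resume a partial send by advancing
.Fa offset
by the value stored in
.Fa *sbytes ,
as in the following sketch.
The helper name
.Fn send_range_nonblock
is hypothetical, and waiting with
.Xr poll 2
is only one possible choice; the sketch sends no headers or trailers,
so every byte counted belongs to the file.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper: send len bytes of fd starting at off over the
 * non-blocking stream socket s, resuming after partial sends.
 * Returns 0 on success, or -1 with errno set.
 */
static int
send_range_nonblock(int fd, int s, off_t off, size_t len)
{
	struct pollfd pfd = { .fd = s, .events = POLLOUT };
	off_t sbytes;

	/* Stop at 0: an nbytes of 0 would mean "until end of file". */
	while (len > 0) {
		sbytes = 0;
		if (sendfile(fd, s, off, len, NULL, &sbytes, 0) == 0)
			return (0);
		if (errno != EAGAIN && errno != EINTR)
			return (-1);
		/* Skip what was already sent before retrying. */
		off += sbytes;
		len -= (size_t)sbytes;
		/* Wait until the socket buffer has room again. */
		if (poll(&pfd, 1, -1) == -1 && errno != EINTR)
			return (-1);
	}
	return (0);
}
.Ed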
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and flags.
The following example specifies a readahead of 16 pages and the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application-specified readahead or the internally
calculated one, whichever is bigger.
Setting the
.Dv SF_USER_READAHEAD
flag turns off any heuristics and sets the maximum possible readahead
length to the number of pages specified via
.Fa flags .
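.Pp
Put together, a call that requests exactly 16 pages of readahead might
look like the following sketch.
The helper name
.Fn send_with_readahead
is hypothetical, and the value 16 is only an example rather than a
recommendation.
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: send the whole file fd over the socket s,
 * using exactly 16 pages of readahead and not caching the data.
 * Returns the value of sendfile().
 */
static int
send_with_readahead(int fd, int s, off_t *sbytes)
{
	int flags;

	/* Readahead goes in the upper 16 bits, flags in the lower. */
	flags = SF_FLAGS(16, SF_NOCACHE | SF_USER_READAHEAD);
	return (sendfile(fd, s, 0, 0, NULL, sbytes, flags));
}
.Ed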
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 4
filesystem.
The syscall returns success before the actual I/O completes, and the data
is put into the socket later without further involvement of the caller.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is
.Dq zero-copy ,
meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
.Ss physical paging buffers
.Fn sendfile
uses the vnode pager to read file pages into memory.
The pager uses a pool of physical buffers (pbufs) to run its I/O operations.
When the system runs out of pbufs,
.Fn sendfile
will block and report state
.Dq Li zonelimit .
The size of the pool can be tuned with the
.Va vm.vnode_pbufs
.Xr loader.conf 5
tunable and can be checked at runtime with the
.Xr sysctl 8
OID of the same name.
.Ss sendfile(2) buffers
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf
structures allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage, respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If the
.Xr sysctl 8
OID
.Va kern.ipc.nsfbufs
does not exist, your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
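.Pp
A program can read these variables with
.Xr sysctlbyname 3 ,
as in the following sketch.
The helper name
.Fn sf_buf_usage
is hypothetical, and the sketch assumes that all three OIDs hold
values of type
.Vt int .
.Bd -literal -offset indent
#include <sys/types.h>
#include <sys/sysctl.h>

#include <errno.h>

/*
 * Hypothetical helper: fetch the total, current and peak number of
 * sendfile buffers.
 * Returns 1 if the values were fetched, 0 if kern.ipc.nsfbufs does
 * not exist on this architecture, or -1 with errno set.
 * Assumption: the OIDs are of integer type.
 */
static int
sf_buf_usage(int *total, int *used, int *peak)
{
	size_t len;

	len = sizeof(*total);
	if (sysctlbyname("kern.ipc.nsfbufs", total, &len, NULL, 0) == -1)
		return (errno == ENOENT ? 0 : -1);
	len = sizeof(*used);
	if (sysctlbyname("kern.ipc.nsfbufsused", used, &len, NULL, 0) == -1)
		return (-1);
	len = sizeof(*peak);
	if (sysctlbyname("kern.ipc.nsfbufspeak", peak, &len, NULL, 0) == -1)
		return (-1);
	return (1);
}
.Ed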
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
had been specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a
.Dv SOCK_STREAM
type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er EINTEGRITY
Corrupted data was detected while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr loader.conf 5 ,
.Xr tuning 7 ,
.Xr sysctl 8
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
In
.Fx 10
support for sending shared memory descriptors was introduced.
In
.Fx 11
a non-blocking implementation was introduced.
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt dg@dglawrence.com .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt glebius@FreeBSD.org .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
when provided an invalid address for
.Fa sbytes .
The
.Fn sendfile
system call does not support SCTP sockets;
it will return
.Dv -1
and set
.Va errno
to
.Er EINVAL .