.\" Copyright (c) 2003, David G. Lawrence
.\" All rights reserved.
.\"
.\" Redistribution and use in source and binary forms, with or without
.\" modification, are permitted provided that the following conditions
.\" are met:
.\" 1. Redistributions of source code must retain the above copyright
.\"    notice unmodified, this list of conditions, and the following
.\"    disclaimer.
.\" 2. Redistributions in binary form must reproduce the above copyright
.\"    notice, this list of conditions and the following disclaimer in the
.\"    documentation and/or other materials provided with the distribution.
.\"
.\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
.\" ARE DISCLAIMED.  IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
.\" SUCH DAMAGE.
.\"
.Dd March 30, 2020
.Dt SENDFILE 2
.Os
.Sh NAME
.Nm sendfile
.Nd send a file to a socket
.Sh LIBRARY
.Lb libc
.Sh SYNOPSIS
.In sys/types.h
.In sys/socket.h
.In sys/uio.h
.Ft int
.Fo sendfile
.Fa "int fd" "int s" "off_t offset" "size_t nbytes"
.Fa "struct sf_hdtr *hdtr" "off_t *sbytes" "int flags"
.Fc
.Sh DESCRIPTION
The
.Fn sendfile
system call
sends a regular file or shared memory object specified by descriptor
.Fa fd
out a stream socket specified by descriptor
.Fa s .
.Pp
The
.Fa offset
argument specifies where to begin in the file.
Should
.Fa offset
fall beyond the end of file, the system will return
success and report 0 bytes sent as described below.
The
.Fa nbytes
argument specifies how many bytes of the file should be sent, with 0 having the special
meaning of send until the end of file has been reached.
.Pp
An optional header and/or trailer can be sent before and after the file data by specifying
a pointer to a
.Vt "struct sf_hdtr" ,
which has the following structure:
.Pp
.Bd -literal -offset indent -compact
struct sf_hdtr {
	struct iovec *headers;	/* pointer to header iovecs */
	int hdr_cnt;		/* number of header iovecs */
	struct iovec *trailers;	/* pointer to trailer iovecs */
	int trl_cnt;		/* number of trailer iovecs */
};
.Ed
.Pp
The
.Fa headers
and
.Fa trailers
pointers, if
.Pf non- Dv NULL ,
point to arrays of
.Vt "struct iovec"
structures.
See the
.Fn writev
system call for information on the iovec structure.
The number of iovecs in these
arrays is specified by
.Fa hdr_cnt
and
.Fa trl_cnt .
.Pp
If
.Pf non- Dv NULL ,
the system will write the total number of bytes sent on the socket to the
variable pointed to by
.Fa sbytes .
.Pp
The least significant 16 bits of the
.Fa flags
argument are a bitmap of these values:
.Bl -tag -offset indent -width "SF_USER_READAHEAD"
.It Dv SF_NODISKIO
This flag causes
.Nm
to return
.Er EBUSY
instead of blocking when a busy page is encountered.
This rare situation can happen if some other process is now working
with the same region of the file.
It is advised to retry the operation after a short period.
.Pp
Note that in older
.Fx
versions the
.Dv SF_NODISKIO
flag had a slightly different meaning.
The flag prevented
.Nm
from running I/O operations when an invalid (not cached) page was encountered,
thus avoiding blocking on I/O.
Starting with
.Fx 11
.Nm
sending files off the
.Xr ffs 4
filesystem does not block on I/O
(see
.Sx IMPLEMENTATION NOTES
), so the condition no longer applies.
However, it is safe if an application utilizes
.Dv SF_NODISKIO
and on
.Er EBUSY
performs the same action as it did in
older
.Fx
versions, e.g.,
.Xr aio_read 2 ,
.Xr read 2
or
.Nm
in a different context.
.It Dv SF_NOCACHE
The data sent to the socket will not be cached by the virtual memory system,
and will be freed directly to the pool of free pages.
.It Dv SF_SYNC
.Nm
sleeps until the network stack no longer references the VM pages
of the file, making subsequent modifications to it safe.
Please note that this is not a guarantee that the data has actually
been sent.
.It Dv SF_USER_READAHEAD
.Nm
has some internal heuristics to do readahead when sending data.
This flag forces
.Nm
to override any heuristically calculated readahead and use exactly the
application-specified readahead.
See
.Sx SETTING READAHEAD
for more details on readahead.
.El
.Pp
When using a socket marked for non-blocking I/O,
.Fn sendfile
may send fewer bytes than requested.
In this case, the number of bytes successfully
written is returned in
.Fa *sbytes
(if specified),
and the error
.Er EAGAIN
is returned.
.Sh SETTING READAHEAD
.Nm
uses internal heuristics based on request size and file system layout
to do readahead.
Additionally, an application may request extra readahead.
The most significant 16 bits of
.Fa flags
specify the number of pages that
.Nm
may read ahead when reading the file.
A macro
.Fn SF_FLAGS
is provided to combine the readahead amount and flags.
An example specifying a readahead of 16 pages and the
.Dv SF_NOCACHE
flag:
.Pp
.Bd -literal -offset indent -compact
	SF_FLAGS(16, SF_NOCACHE)
.Ed
.Pp
.Nm
will use either the application-specified readahead or the internally
calculated one, whichever is bigger.
Setting the
.Dv SF_USER_READAHEAD
flag turns off any heuristics and sets the maximum possible readahead length to
the number of pages specified via flags.
.Sh IMPLEMENTATION NOTES
The
.Fx
implementation of
.Fn sendfile
does not block on disk I/O when it sends a file off the
.Xr ffs 4
filesystem.
The syscall returns success before the actual I/O completes, and data
is put into the socket later unattended.
However, the order of data in the socket is preserved, so it is safe
to do further writes to the socket.
.Pp
The
.Fx
implementation of
.Fn sendfile
is
.Dq zero-copy ,
meaning that it has been optimized so that copying of the file data is avoided.
.Sh TUNING
.Ss physical paging buffers
.Fn sendfile
uses the vnode pager to read file pages into memory.
The pager uses a pool of physical buffers to run its I/O operations.
When the system runs out of pbufs,
.Fn sendfile
will block and report state
.Dq Li zonelimit .
The size of the pool can be tuned with the
.Va vm.vnode_pbufs
.Xr loader.conf 5
tunable and can be checked with the
.Xr sysctl 8
OID of the same name at runtime.
.Ss sendfile(2) buffers
On some architectures, this system call internally uses a special
.Fn sendfile
buffer
.Pq Vt "struct sf_buf"
to handle sending file data to the client.
If the sending socket is
blocking, and there are not enough
.Fn sendfile
buffers available,
.Fn sendfile
will block and report a state of
.Dq Li sfbufa .
If the sending socket is non-blocking and there are not enough
.Fn sendfile
buffers available, the call will block and wait for the
necessary buffers to become available before finishing the call.
.Pp
The number of
.Vt sf_buf Ns 's
allocated should be proportional to the number of nmbclusters used to
send data to a client via
.Fn sendfile .
Tune accordingly to avoid blocking!
Busy installations that make extensive use of
.Fn sendfile
may want to increase these values to be in line with their
.Va kern.ipc.nmbclusters
(see
.Xr tuning 7
for details).
.Pp
The number of
.Fn sendfile
buffers available is determined at boot time by either the
.Va kern.ipc.nsfbufs
.Xr loader.conf 5
variable or the
.Dv NSFBUFS
kernel configuration tunable.
The number of
.Fn sendfile
buffers scales with
.Va kern.maxusers .
The
.Va kern.ipc.nsfbufsused
and
.Va kern.ipc.nsfbufspeak
read-only
.Xr sysctl 8
variables show current and peak
.Fn sendfile
buffer usage respectively.
These values may also be viewed through
.Nm netstat Fl m .
.Pp
If the
.Xr sysctl 8
OID
.Va kern.ipc.nsfbufs
doesn't exist, your architecture does not need to use
.Fn sendfile
buffers because their task can be efficiently performed
by the generic virtual memory structures.
.Sh RETURN VALUES
.Rv -std sendfile
.Sh ERRORS
.Bl -tag -width Er
.It Bq Er EAGAIN
The socket is marked for non-blocking I/O and not all data was sent due to
the socket buffer being filled.
If specified, the number of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EBADF
The
.Fa fd
argument
is not a valid file descriptor.
.It Bq Er EBADF
The
.Fa s
argument
is not a valid socket descriptor.
.It Bq Er EBUSY
A busy page was encountered and
.Dv SF_NODISKIO
had been specified.
Partial data may have been sent.
.It Bq Er EFAULT
An invalid address was specified for an argument.
.It Bq Er EINTR
A signal interrupted
.Fn sendfile
before it could be completed.
If specified, the number
of bytes successfully sent will be returned in
.Fa *sbytes .
.It Bq Er EINVAL
The
.Fa fd
argument
is not a regular file.
.It Bq Er EINVAL
The
.Fa s
argument
is not a
.Dv SOCK_STREAM
type socket.
.It Bq Er EINVAL
The
.Fa offset
argument
is negative.
.It Bq Er EIO
An error occurred while reading from
.Fa fd .
.It Bq Er EINTEGRITY
Corrupted data was detected while reading from
.Fa fd .
.It Bq Er ENOTCAPABLE
The
.Fa fd
or the
.Fa s
argument has insufficient rights.
.It Bq Er ENOBUFS
The system was unable to allocate an internal buffer.
.It Bq Er ENOTCONN
The
.Fa s
argument
points to an unconnected socket.
.It Bq Er ENOTSOCK
The
.Fa s
argument
is not a socket.
.It Bq Er EOPNOTSUPP
The file system for descriptor
.Fa fd
does not support
.Fn sendfile .
.It Bq Er EPIPE
The socket peer has closed the connection.
.El
.Sh SEE ALSO
.Xr netstat 1 ,
.Xr open 2 ,
.Xr send 2 ,
.Xr socket 2 ,
.Xr writev 2 ,
.Xr loader.conf 5 ,
.Xr tuning 7 ,
.Xr sysctl 8
.Rs
.%A K. Elmeleegy
.%A A. Chanda
.%A A. L. Cox
.%A W. Zwaenepoel
.%T A Portable Kernel Abstraction for Low-Overhead Ephemeral Mapping Management
.%J The Proceedings of the 2005 USENIX Annual Technical Conference
.%P pp 223-236
.%D 2005
.Re
.Sh HISTORY
The
.Fn sendfile
system call
first appeared in
.Fx 3.0 .
This manual page first appeared in
.Fx 3.1 .
In
.Fx 10
support for sending shared memory descriptors was introduced.
In
.Fx 11
a non-blocking implementation was introduced.
.Sh AUTHORS
The initial implementation of the
.Fn sendfile
system call
and this manual page were written by
.An David G. Lawrence Aq Mt dg@dglawrence.com .
The
.Fx 11
implementation was written by
.An Gleb Smirnoff Aq Mt glebius@FreeBSD.org .
.Sh BUGS
The
.Fn sendfile
system call will not fail, i.e., return
.Dv -1
and set
.Va errno
to
.Er EFAULT ,
if provided an invalid address for
.Fa sbytes .
.Pp
The
.Fn sendfile
system call does not support SCTP sockets;
it will return
.Dv -1
and set
.Va errno
to
.Er EINVAL .