Revision tags: release/11.3.0 |
|
#
82334850 |
| 29-Jun-2019 |
John Baldwin <jhb@FreeBSD.org> |
Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement
Add an external mbuf buffer type that holds multiple unmapped pages.
Unmapped mbufs allow sendfile to carry multiple pages of data in a single mbuf, without mapping those pages. It is a requirement for Netflix's in-kernel TLS, and provides a 5-10% CPU savings on heavy web serving workloads when used by sendfile, due to effectively compressing socket buffers by an order of magnitude, and hence reducing cache misses.
For this new external mbuf buffer type (EXT_PGS), the ext_buf pointer now points to a struct mbuf_ext_pgs structure instead of a data buffer. This structure contains an array of physical addresses (this reduces cache misses compared to an earlier version that stored an array of vm_page_t pointers). It also stores additional fields needed for in-kernel TLS such as the TLS header and trailer data that are currently unused. To more easily detect these mbufs, the M_NOMAP flag is set in m_flags in addition to M_EXT.
Various functions like m_copydata() have been updated to safely access packet contents (using uiomove_fromphys()), to make things like BPF safe.
NIC drivers advertise support for unmapped mbufs on transmit via a new IFCAP_NOMAP capability. This capability can be toggled via the new 'nomap' and '-nomap' ifconfig(8) commands. For NIC drivers that only transmit packet contents via DMA and use bus_dma, adding the capability to if_capabilities and if_capenable should be all that is required.
If a NIC does not support unmapped mbufs, they are converted to a chain of mapped mbufs (using sf_bufs to provide the mapping) in ip_output or ip6_output. If an unmapped mbuf requires software checksums, it is also converted to a chain of mapped mbufs before computing the checksum.
Submitted by: gallatin (earlier version) Reviewed by: gallatin, hselasky, rrs Discussed with: ae, kp (firewalls) Relnotes: yes Sponsored by: Netflix Differential Revision: https://reviews.freebsd.org/D20616
show more ...
|
#
0269ae4c |
| 06-Jun-2019 |
Alan Somers <asomers@FreeBSD.org> |
MFHead @348740
Sponsored by: The FreeBSD Foundation
|
#
e2e050c8 |
| 20-May-2019 |
Conrad Meyer <cem@FreeBSD.org> |
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces hea
Extract eventfilter declarations to sys/_eventfilter.h
This allows replacing "sys/eventfilter.h" includes with "sys/_eventfilter.h" in other header files (e.g., sys/{bus,conf,cpu}.h) and reduces header pollution substantially.
EVENTHANDLER_DECLARE and EVENTHANDLER_LIST_DECLAREs were moved out of .c files into appropriate headers (e.g., sys/proc.h, powernv/opal.h).
As a side effect of reduced header pollution, many .c files and headers no longer contain needed definitions. The remainder of the patch addresses adding appropriate includes to fix those files.
LOCK_DEBUG and LOCK_FILE_LINE_ARG are moved to sys/_lock.h, as required by sys/mutex.h since r326106 (but silently protected by header pollution prior to this change).
No functional change (intended). Of course, any out of tree modules that relied on header pollution for sys/eventhandler.h, sys/lock.h, or sys/mutex.h inclusion need to be fixed. __FreeBSD_version has been bumped.
show more ...
|
Revision tags: release/12.0.0, release/11.2.0 |
|
#
8a36da99 |
| 27-Nov-2017 |
Pedro F. Giffuni <pfg@FreeBSD.org> |
sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone
sys/kern: adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I was using misidentified many licenses so this was mostly a manual - error prone - task.
The Software Package Data Exchange (SPDX) group provides a specification to make it easier for automated tools to detect and summarize well known opensource licenses. We are gradually adopting the specification, noting that the tags are considered only advisory and do not, in any way, superceed or replace the license texts.
show more ...
|
Revision tags: release/10.4.0 |
|
#
b754c279 |
| 13-Sep-2017 |
Navdeep Parhar <np@FreeBSD.org> |
MFH @ r323558.
|
#
5be4ad9e |
| 09-Sep-2017 |
Enji Cooper <ngie@FreeBSD.org> |
MFhead@r323343
|
#
51977281 |
| 29-Aug-2017 |
Warner Losh <imp@FreeBSD.org> |
Add CAM/NVMe support for CAM_DATA_SG
This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address.
Differential Revisio
Add CAM/NVMe support for CAM_DATA_SG
This adds support in pass(4) for data to be described with a scatter-gather list (sglist) to augment the existing (single) virtual address.
Differential Revision: https://reviews.freebsd.org/D11361 Submitted by: Chuck Tuffli Reviewed by: imp@, scottl@, kenm@
show more ...
|
#
531c2d7a |
| 24-Jul-2017 |
Enji Cooper <ngie@FreeBSD.org> |
MFhead@r320180
|
#
bca9d05f |
| 23-Jul-2017 |
Hans Petter Selasky <hselasky@FreeBSD.org> |
Merge ^/head r319973 through 321382.
|
Revision tags: release/11.1.0 |
|
#
03f072d1 |
| 14-Jul-2017 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r320971 through r320993.
|
#
df424515 |
| 14-Jul-2017 |
Warner Losh <imp@FreeBSD.org> |
This adds CAM pass(4) support for NVMe IO's. Applications indicate the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or XPT_NVME_IO.
Submitted by: Chuck Tuffli <chuck@tuffli.net> Differ
This adds CAM pass(4) support for NVMe IO's. Applications indicate the IO type (Admin or NVM) using XPT op-codes XPT_NVME_ADMIN or XPT_NVME_IO.
Submitted by: Chuck Tuffli <chuck@tuffli.net> Differential Revision: https://reviews.freebsd.org/D10247
show more ...
|
Revision tags: release/11.0.1, release/11.0.0, release/10.3.0 |
|
#
009e81b1 |
| 22-Jan-2016 |
Bjoern A. Zeeb <bz@FreeBSD.org> |
MFH @r294567
|
#
ea2c42d8 |
| 13-Jan-2016 |
Dimitry Andric <dim@FreeBSD.org> |
Merge ^/head r293686 through r293849.
|
#
e6068002 |
| 12-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH
Sponsored by: The FreeBSD Foundation
|
#
c0ada037 |
| 11-Jan-2016 |
Colin Percival <cperciva@FreeBSD.org> |
Fix a bug introduced in r291716:
"The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on
Fix a bug introduced in r291716:
"The problem with the approach taken both in _bus_dmamap_load_pages and bus_dmamap_load_ma_triv is that they split the request buffer into arbitrary chunks based on page boundaries, creating segments that no longer have a size that's a multiple of the sector size. This breaks drivers like blkfront (and probably other stuff)." [1]
This was most easily triggered by running `fsck /` on a system running in Xen (e.g. Amazon EC2) but also showed up via growfs(8) and probably many other userland tools which access the disk directly.
Patch by: royger [1] "Thinks this should be fine" by: ken
show more ...
|
#
b626f5a7 |
| 04-Jan-2016 |
Glen Barber <gjb@FreeBSD.org> |
MFH r289384-r293170
Sponsored by: The FreeBSD Foundation
|
#
6c43c26f |
| 04-Dec-2015 |
Navdeep Parhar <np@FreeBSD.org> |
Catch up with head.
|
#
a9934668 |
| 03-Dec-2015 |
Kenneth D. Merry <ken@FreeBSD.org> |
Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility.
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIO
Add asynchronous command support to the pass(4) driver, and the new camdd(8) utility.
CCBs may be queued to the driver via the new CAMIOQUEUE ioctl, and completed CCBs may be retrieved via the CAMIOGET ioctl. User processes can use poll(2) or kevent(2) to get notification when I/O has completed.
While the existing CAMIOCOMMAND blocking ioctl interface only supports user virtual data pointers in a CCB (generally only one per CCB), the new CAMIOQUEUE ioctl supports user virtual and physical address pointers, as well as user virtual and physical scatter/gather lists. This allows user applications to have more flexibility in their data handling operations.
Kernel memory for data transferred via the queued interface is allocated from the zone allocator in MAXPHYS sized chunks, and user data is copied in and out. This is likely faster than the vmapbuf()/vunmapbuf() method used by the CAMIOCOMMAND ioctl in configurations with many processors (there are more TLB shootdowns caused by the mapping/unmapping operation) but may not be as fast as running with unmapped I/O.
The new memory handling model for user requests also allows applications to send CCBs with request sizes that are larger than MAXPHYS. The pass(4) driver now limits queued requests to the I/O size listed by the SIM driver in the maxio field in the Path Inquiry (XPT_PATH_INQ) CCB.
There are some things things would be good to add:
1. Come up with a way to do unmapped I/O on multiple buffers. Currently the unmapped I/O interface operates on a struct bio, which includes only one address and length. It would be nice to be able to send an unmapped scatter/gather list down to busdma. This would allow eliminating the copy we currently do for data.
2. Add an ioctl to list currently outstanding CCBs in the various queues.
3. Add an ioctl to cancel a request, or use the XPT_ABORT CCB to do that.
4. Test physical address support. Virtual pointers and scatter gather lists have been tested, but I have not yet tested physical addresses or scatter/gather lists.
5. Investigate multiple queue support. At the moment there is one queue of commands per pass(4) device. If multiple processes open the device, they will submit I/O into the same queue and get events for the same completions. This is probably the right model for most applications, but it is something that could be changed later on.
Also, add a new utility, camdd(8) that uses the asynchronous pass(4) driver interface.
This utility is intended to be a basic data transfer/copy utility, a simple benchmark utility, and an example of how to use the asynchronous pass(4) interface.
It can copy data to and from pass(4) devices using any target queue depth, starting offset and blocksize for the input and ouptut devices. It currently only supports SCSI devices, but could be easily extended to support ATA devices.
It can also copy data to and from regular files, block devices, tape devices, pipes, stdin, and stdout. It does not support queueing multiple commands to any of those targets, since it uses the standard read(2)/write(2)/writev(2)/readv(2) system calls.
The I/O is done by two threads, one for the reader and one for the writer. The reader thread sends completed read requests to the writer thread in strictly sequential order, even if they complete out of order. That could be modified later on for random I/O patterns or slightly out of order I/O.
camdd(8) uses kqueue(2)/kevent(2) to get I/O completion events from the pass(4) driver and also to send request notifications internally.
For pass(4) devcies, camdd(8) uses a single buffer (CAM_DATA_VADDR) per CAM CCB on the reading side, and a scatter/gather list (CAM_DATA_SG) on the writing side. In addition to testing both interfaces, this makes any potential reblocking of I/O easier. No data is copied between the reader and the writer, but rather the reader's buffers are split into multiple I/O requests or combined into a single I/O request depending on the input and output blocksize.
For the file I/O path, camdd(8) also uses a single buffer (read(2), write(2), pread(2) or pwrite(2)) on reads, and a scatter/gather list (readv(2), writev(2), preadv(2), pwritev(2)) on writes.
Things that would be nice to do for camdd(8) eventually:
1. Add support for I/O pattern generation. Patterns like all zeros, all ones, LBA-based patterns, random patterns, etc. Right Now you can always use /dev/zero, /dev/random, etc.
2. Add support for a "sink" mode, so we do only reads with no writes. Right now, you can use /dev/null.
3. Add support for automatic queue depth probing, so that we can figure out the right queue depth on the input and output side for maximum throughput. At the moment it defaults to 6.
4. Add support for SATA device passthrough I/O.
5. Add support for random LBAs and/or lengths on the input and output sides.
6. Track average per-I/O latency and busy time. The busy time and latency could also feed in to the automatic queue depth determination.
sys/cam/scsi/scsi_pass.h: Define two new ioctls, CAMIOQUEUE and CAMIOGET, that queue and fetch asynchronous CAM CCBs respectively.
Although these ioctls do not have a declared argument, they both take a union ccb pointer. If we declare a size here, the ioctl code in sys/kern/sys_generic.c will malloc and free a buffer for either the CCB or the CCB pointer (depending on how it is declared). Since we have to keep a copy of the CCB (which is fairly large) anyway, having the ioctl malloc and free a CCB for each call is wasteful.
sys/cam/scsi/scsi_pass.c: Add asynchronous CCB support.
Add two new ioctls, CAMIOQUEUE and CAMIOGET.
CAMIOQUEUE adds a CCB to the incoming queue. The CCB is executed immediately (and moved to the active queue) if it is an immediate CCB, but otherwise it will be executed in passstart() when a CCB is available from the transport layer.
When CCBs are completed (because they are immediate or passdone() if they are queued), they are put on the done queue.
If we get the final close on the device before all pending I/O is complete, all active I/O is moved to the abandoned queue and we increment the peripheral reference count so that the peripheral driver instance doesn't go away before all pending I/O is done.
The new passcreatezone() function is called on the first call to the CAMIOQUEUE ioctl on a given device to allocate the UMA zones for I/O requests and S/G list buffers. This may be good to move off to a taskqueue at some point. The new passmemsetup() function allocates memory and scatter/gather lists to hold the user's data, and copies in any data that needs to be written. For virtual pointers (CAM_DATA_VADDR), the kernel buffer is malloced from the new pass(4) driver malloc bucket. For virtual scatter/gather lists (CAM_DATA_SG), buffers are allocated from a new per-pass(9) UMA zone in MAXPHYS-sized chunks. Physical pointers are passed in unchanged. We have support for up to 16 scatter/gather segments (for the user and kernel S/G lists) in the default struct pass_io_req, so requests with longer S/G lists require an extra kernel malloc.
The new passcopysglist() function copies a user scatter/gather list to a kernel scatter/gather list. The number of elements in each list may be different, but (obviously) the amount of data stored has to be identical.
The new passmemdone() function copies data out for the CAM_DATA_VADDR and CAM_DATA_SG cases.
The new passiocleanup() function restores data pointers in user CCBs and frees memory.
Add new functions to support kqueue(2)/kevent(2):
passreadfilt() tells kevent whether or not the done queue is empty.
passkqfilter() adds a knote to our list.
passreadfiltdetach() removes a knote from our list.
Add a new function, passpoll(), for poll(2)/select(2) to use.
Add devstat(9) support for the queued CCB path.
sys/cam/ata/ata_da.c: Add support for the BIO_VLIST bio type.
sys/cam/cam_ccb.h: Add a new enumeration for the xflags field in the CCB header. (This doesn't change the CCB header, just adds an enumeration to use.)
sys/cam/cam_xpt.c: Add a new function, xpt_setup_ccb_flags(), that allows specifying CCB flags.
sys/cam/cam_xpt.h: Add a prototype for xpt_setup_ccb_flags().
sys/cam/scsi/scsi_da.c: Add support for BIO_VLIST.
sys/dev/md/md.c: Add BIO_VLIST support to md(4).
sys/geom/geom_disk.c: Add BIO_VLIST support to the GEOM disk class. Re-factor the I/O size limiting code in g_disk_start() a bit.
sys/kern/subr_bus_dma.c: Change _bus_dmamap_load_vlist() to take a starting offset and length.
Add a new function, _bus_dmamap_load_pages(), that will load a list of physical pages starting at an offset.
Update _bus_dmamap_load_bio() to allow loading BIO_VLIST bios. Allow unmapped I/O to start at an offset.
sys/kern/subr_uio.c: Add two new functions, physcopyin_vlist() and physcopyout_vlist().
sys/pc98/include/bus.h: Guard kernel-only parts of the pc98 machine/bus.h header with #ifdef _KERNEL.
This allows userland programs to include <machine/bus.h> to get the definition of bus_addr_t and bus_size_t.
sys/sys/bio.h: Add a new bio flag, BIO_VLIST.
sys/sys/uio.h: Add prototypes for physcopyin_vlist() and physcopyout_vlist().
share/man/man4/pass.4: Document the CAMIOQUEUE and CAMIOGET ioctls.
usr.sbin/Makefile: Add camdd.
usr.sbin/camdd/Makefile: Add a makefile for camdd(8).
usr.sbin/camdd/camdd.8: Man page for camdd(8).
usr.sbin/camdd/camdd.c: The new camdd(8) utility.
Sponsored by: Spectra Logic MFC after: 1 week
show more ...
|
Revision tags: release/10.2.0, release/10.1.0, release/9.3.0 |
|
#
3b8f0845 |
| 28-Apr-2014 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge head
|
#
84e51a1b |
| 23-Apr-2014 |
Alan Somers <asomers@FreeBSD.org> |
IFC @264767
|
#
485ac45a |
| 04-Feb-2014 |
Peter Grehan <grehan@FreeBSD.org> |
MFC @ r259205 in preparation for some SVM updates. (for real this time)
|
Revision tags: release/10.0.0 |
|
#
f9b2a21c |
| 31-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r232040 through r257457. M usr.sbin/portsnap/portsnap/portsnap.8 M usr.sbin/portsnap/portsnap/portsnap.sh M usr.sbin/tcpdump/tcpdump/Makefile
|
#
80938e75 |
| 27-Oct-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
Add bus_dmamap_load_ma() function to load map with the array of vm_pages. Provide trivial implementation which forwards the load to _bus_dmamap_load_phys() page by page. Right now all architectures
Add bus_dmamap_load_ma() function to load map with the array of vm_pages. Provide trivial implementation which forwards the load to _bus_dmamap_load_phys() page by page. Right now all architectures use bus_dmamap_load_ma_triv().
Tested by: pho (as part of the functional patch) Sponsored by: The FreeBSD Foundation MFC after: 1 month
show more ...
|
Revision tags: release/9.2.0 |
|
#
d1d01586 |
| 05-Sep-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head
|
#
40f65a4d |
| 07-Aug-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r254014
|