#
c28078e9 |
| 23-Oct-2013 |
Steven Hartland <smh@FreeBSD.org> |
Improve ZFS N-way mirror read performance by using load and locality information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members
Improve ZFS N-way mirror read performance by using load and locality information.
The existing algorithm selects a preferred leaf vdev based on offset of the zio request modulo the number of members in the mirror. It assumes the devices are of equal performance and that spreading the requests randomly over both drives will be sufficient to saturate them. In practice this results in the leaf vdevs being under utilized.
The new algorithm takes into the following additional factors: * Load of the vdevs (number outstanding I/O requests) * The locality of last queued I/O vs the new I/O request.
Within the locality calculation additional knowledge about the underlying vdev is considered such as; is the device backing the vdev a rotating media device.
This results in performance increases across the board as well as significant increases for predominantly streaming loads and for configurations which don't have evenly performing devices.
The following are results from a setup with 3 Way Mirror with 2 x HD's and 1 x SSD from a basic test running multiple parrallel dd's.
With pre-fetch disabled (vfs.zfs.prefetch_disable=1):
== Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 161 seconds @ 95 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 297 seconds @ 51 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 54 seconds @ 284 MB/s
With pre-fetch enabled (vfs.zfs.prefetch_disable=0):
== Stripe Balanced (default) == Read 15360MB using bs: 1048576, readers: 3, took 91 seconds @ 168 MB/s == Load Balanced (zfslinux) == Read 15360MB using bs: 1048576, readers: 3, took 108 seconds @ 142 MB/s == Load Balanced (locality freebsd) == Read 15360MB using bs: 1048576, readers: 3, took 48 seconds @ 320 MB/s
In addition to the performance changes the code was also restructured, with the help of Justin Gibbs, to provide a more logical flow which also ensures vdevs loads are only calculated from the set of valid candidates.
The following additional sysctls where added to allow the administrator to tune the behaviour of the load algorithm: * vfs.zfs.vdev.mirror.rotating_inc * vfs.zfs.vdev.mirror.rotating_seek_inc * vfs.zfs.vdev.mirror.rotating_seek_offset * vfs.zfs.vdev.mirror.non_rotating_inc * vfs.zfs.vdev.mirror.non_rotating_seek_inc
These changes where based on work started by the zfsonlinux developers: https://github.com/zfsonlinux/zfs/pull/1487
Reviewed by: gibbs, mav, will MFC after: 2 weeks Sponsored by: Multiplay
show more ...
|
#
1a29adad |
| 22-Oct-2013 |
Alexander Motin <mav@FreeBSD.org> |
Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9).
Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have no any fo
Remove Giant-locked drivers support (DISKFLAG_NEEDSGIANT flag) from disk(9).
Since at least FreeBSD 7 we had only four of them in the base tree, and in head branch, thanks to jhb@, we have no any for more then a year.
show more ...
|
#
40ea77a0 |
| 22-Oct-2013 |
Alexander Motin <mav@FreeBSD.org> |
Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in
Merge GEOM direct dispatch changes from the projects/camlock branch.
When safety requirements are met, it allows to avoid passing I/O requests to GEOM g_up/g_down thread, executing them directly in the caller context. That allows to avoid CPU bottlenecks in g_up/g_down threads, plus avoid several context switches per I/O.
The defined now safety requirements are: - caller should not hold any locks and should be reenterable; - callee should not depend on GEOM dual-threaded concurency semantics; - on the way down, if request is unmapped while callee doesn't support it, the context should be sleepable; - kernel thread stack usage should be below 50%.
To keep compatibility with GEOM classes not meeting above requirements new provider and consumer flags added: - G_CF_DIRECT_SEND -- consumer code meets caller requirements (request); - G_CF_DIRECT_RECEIVE -- consumer code meets callee requirements (done); - G_PF_DIRECT_SEND -- provider code meets caller requirements (done); - G_PF_DIRECT_RECEIVE -- provider code meets callee requirements (request). Capable GEOM class can set them, allowing direct dispatch in cases where it is safe. If any of requirements are not met, request is queued to g_up or g_down thread same as before.
Such GEOM classes were reviewed and updated to support direct dispatch: CONCAT, DEV, DISK, GATE, MD, MIRROR, MULTIPATH, NOP, PART, RAID, STRIPE, VFS, ZERO, ZFS::VDEV, ZFS::ZVOL, all classes based on g_slice KPI (LABEL, MAP, FLASHMAP, etc).
To declare direct completion capability disk(9) KPI got new flag equivalent to G_PF_DIRECT_SEND -- DISKFLAG_DIRECT_COMPLETION. da(4) and ada(4) disk drivers got it set now thanks to earlier CAM locking work.
This change more then twice increases peak block storage performance on systems with manu CPUs, together with earlier CAM locking changes reaching more then 1 million IOPS (512 byte raw reads from 16 SATA SSDs on 4 HBAs to 256 user-level threads).
Sponsored by: iXsystems, Inc. MFC after: 2 months
show more ...
|
#
0bfd163f |
| 18-Oct-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r233826 through r256722.
|
Revision tags: release/9.2.0 |
|
#
d1d01586 |
| 05-Sep-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Merge from head
|
#
40f65a4d |
| 07-Aug-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r254014
|
#
552311f4 |
| 17-Jul-2013 |
Xin LI <delphij@FreeBSD.org> |
IFC @253398
|
#
ceae90c2 |
| 05-Jul-2013 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r252763
|
#
8383a92e |
| 04-Jul-2013 |
Steven Hartland <smh@FreeBSD.org> |
Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940.
Ensure that d_delmaxsize is always set, removing init to 0 which could cause future issues if use cases change.
Allow ke
Bump disk(9) ABI version to signify the addition of d_delmaxsize by r249940.
Ensure that d_delmaxsize is always set, removing init to 0 which could cause future issues if use cases change.
Allow kern.cam.da.X.delete_max (which maps to d_delmaxsize) to be increased up to the calculated max after being reduced.
MFC after: 1 day X-MFC-With: r249940
show more ...
|
#
cfe30d02 |
| 19-Jun-2013 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge fresh head.
|
Revision tags: release/8.4.0 |
|
#
9fe9ba5b |
| 26-Apr-2013 |
Steven Hartland <smh@FreeBSD.org> |
Teach GEOM and CAM about the difference between the max "size" of r/w and delete requests.
sys/geom/geom_disk.h: - Added d_delmaxsize which represents the maximum size of individual
Teach GEOM and CAM about the difference between the max "size" of r/w and delete requests.
sys/geom/geom_disk.h: - Added d_delmaxsize which represents the maximum size of individual device delete requests in bytes. This can be used by devices to inform geom of their size limitations regarding delete operations which are generally different from the read / write limits as data is not usually transferred from the host to physical device.
sys/geom/geom_disk.c: - Use new d_delmaxsize to calculate the size of chunks passed through to the underlying strategy during deletes instead of using read / write optimised values. This defaults to d_maxsize if unset (0).
- Moved d_maxsize default up so it can be used to default d_delmaxsize
sys/cam/ata/ata_da.c: - Added d_delmaxsize calculations for TRIM and CFA
sys/cam/scsi/scsi_da.c: - Added re-calculation of d_delmaxsize whenever delete_method is set.
- Added kern.cam.da.X.delete_max sysctl which allows the max size for delete requests to be limited. This is useful in preventing timeouts on devices who's delete methods are slow. It should be noted that this limit is reset then the device delete method is changed and that it can only be lowered not increased from the device max.
Reviewed by: mav Approved by: pjd (mentor)
show more ...
|
#
252c094e |
| 15-Apr-2013 |
Ivan Voras <ivoras@FreeBSD.org> |
Introduce a symbol for the GEOM class name instead of using the ad-hoc string constant.
|
#
69e6d7b7 |
| 12-Apr-2013 |
Simon J. Gerraty <sjg@FreeBSD.org> |
sync from head
|
#
f8c19ba4 |
| 19-Mar-2013 |
Konstantin Belousov <kib@FreeBSD.org> |
A flag for the geom disk driver to indicate that it accepts the unmapped i/o requests.
Sponsored by: The FreeBSD Foundation Tested by: pho
|
Revision tags: release/9.1.0 |
|
#
300675f6 |
| 27-Nov-2012 |
Alexander Motin <mav@FreeBSD.org> |
MFC
|
#
e477abf7 |
| 27-Nov-2012 |
Alexander Motin <mav@FreeBSD.org> |
MFC @ r241285
|
#
a10c6f55 |
| 11-Nov-2012 |
Neel Natu <neel@FreeBSD.org> |
IFC @ r242684
|
#
23090366 |
| 04-Nov-2012 |
Simon J. Gerraty <sjg@FreeBSD.org> |
Sync from head
|
#
1af2d09b |
| 29-Oct-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Fix locking problem in disk_resize(); previously it would run without topology lock, resulting in assertion when running with DIAGNOSTIC.
Reviewed by: mav (earlier version)
|
#
e11b6fa3 |
| 03-Aug-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r233826 through r239010.
|
#
3631c638 |
| 29-Jul-2012 |
Alexander Motin <mav@FreeBSD.org> |
Implement media change notification for DA and CD removable media devices. It includes three parts: 1) Modifications to CAM to detect media media changes and report them to disk(9) layer. For modern
Implement media change notification for DA and CD removable media devices. It includes three parts: 1) Modifications to CAM to detect media media changes and report them to disk(9) layer. For modern SATA (and potentially UAS) devices it utilizes Asynchronous Notification mechanism to receive events from hardware. Active polling with TEST UNIT READY commands with 3 seconds period is used for incapable hardware. After that both CD and DA drivers work the same way, detecting two conditions: "NOT READY: Medium not present" after medium was detected previously, and "UNIT ATTENTION: Not ready to ready change, medium may have changed". First one reported to disk(9) as media removal, second as media insert/change. To reliably receive second event new AC_UNIT_ATTENTION async added to make UAs broadcasted to all periphs by generic error handling code in cam_periph_error(). 2) Modifications to GEOM core to handle media remove and change events. Media removal handled by spoiling all consumers attached to the provider. Media change event also schedules provider retaste after spoiling to probe new media. New flag G_CF_ORPHAN was added to consumers to reflect that consumer is in process of destruction. It allows retaste to create new geom instance of the same class, while previous one is still dying. 3) Modifications to some GEOM classes: DEV -- to report media change events to devd; VFS -- to handle spoiling same as orphan to prevent accessing replaced media. PART class already handles spoiling alike to orphan.
Reviewed by: silence on geom@ and scsi@ Tested by: avg Sponsored by: iXsystems, Inc. / PC-BSD MFC after: 2 months
show more ...
|
#
de720122 |
| 15-Jul-2012 |
Gleb Smirnoff <glebius@FreeBSD.org> |
Merge head r236710 through r238467.
|
#
6cf87ec8 |
| 13-Jul-2012 |
Xin LI <delphij@FreeBSD.org> |
IFC @238412.
|
#
b652778e |
| 11-Jul-2012 |
Peter Grehan <grehan@FreeBSD.org> |
IFC @ r238370
|
#
bc97ce36 |
| 07-Jul-2012 |
Edward Tomasz Napierala <trasz@FreeBSD.org> |
Add disk_resize(), to make it possible for the disk drivers such as da(4) to notify GEOM about LUN size change.
Reviewed by: mav (earlier version) Sponsored by: FreeBSD Foundation
|