.\" SPDX-License-Identifier: CDDL-1.0
.\"
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" Copyright (c) 2023, 2024 Klara, Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License").  You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or https://opensource.org/licenses/CDDL-1.0.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE.  If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" Copyright (c) 2024, Klara, Inc.
.\"
.Dd November 1, 2024
.Dt ZFS 4
.Os
.
.Sh NAME
.Nm zfs
.Nd tuning of the ZFS kernel module
.
.Sh DESCRIPTION
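On Linux, these parameters are exposed as files under
.Pa /sys/module/zfs/parameters ,
and most of them can be read and changed at runtime;
values set in
.Pa /etc/modprobe.d/zfs.conf
are applied at module load.
A minimal sketch (the values shown are purely illustrative):
.Bd -literal -compact
# cat /sys/module/zfs/parameters/dmu_prefetch_max
134217728
# echo 67108864 > /sys/module/zfs/parameters/dmu_prefetch_max
# echo "options zfs dmu_prefetch_max=67108864" >> /etc/modprobe.d/zfs.conf
.Ed
.Pp
On
.Fx ,
the same tunables are generally exposed as
.Xr sysctl 8
variables under the
.Sy vfs.zfs
prefix, though some names differ between platforms.
.Pp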
The ZFS module supports these parameters:
.Bl -tag -width Ds
.It Sy dbuf_cache_max_bytes Ns = Ns Sy UINT64_MAX Ns B Pq u64
Maximum size in bytes of the dbuf cache.
The target size is the smaller of this value and
.No 1/2^ Ns Sy dbuf_cache_shift Pq 1/32nd
of the target ARC size.
The behavior of the dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
.
.It Sy dbuf_metadata_cache_max_bytes Ns = Ns Sy UINT64_MAX Ns B Pq u64
Maximum size in bytes of the metadata dbuf cache.
The target size is the smaller of this value and
.No 1/2^ Ns Sy dbuf_metadata_cache_shift Pq 1/64th
of the target ARC size.
The behavior of the metadata dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
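.Pp
For example, on Linux the dbuf cache counters can be inspected with
(a sketch; the exact set of fields may vary between versions):
.Bd -literal -compact
# grep cache /proc/spl/kstat/zfs/dbufstats
.Ed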
.
.It Sy dbuf_cache_hiwater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage over
.Sy dbuf_cache_max_bytes
when dbufs must be evicted directly.
.
.It Sy dbuf_cache_lowater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage below
.Sy dbuf_cache_max_bytes
when the evict thread stops evicting dbufs.
.
.It Sy dbuf_cache_shift Ns = Ns Sy 5 Pq uint
Set the size of the dbuf cache
.Pq Sy dbuf_cache_max_bytes
to a log2 fraction of the target ARC size.
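.Pp
As a worked example: with the default shift of 5 and a target ARC size
of 16 GiB, the dbuf cache target is 16 GiB / 2^5 = 512 MiB,
unless capped lower by
.Sy dbuf_cache_max_bytes .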
.
.It Sy dbuf_metadata_cache_shift Ns = Ns Sy 6 Pq uint
Set the size of the dbuf metadata cache
.Pq Sy dbuf_metadata_cache_max_bytes
to a log2 fraction of the target ARC size.
.
.It Sy dbuf_mutex_cache_shift Ns = Ns Sy 0 Pq uint
Set the size of the mutex array for the dbuf cache.
When set to
.Sy 0
the array is dynamically sized based on total system memory.
.
.It Sy dmu_object_alloc_chunk_shift Ns = Ns Sy 7 Po 128 Pc Pq uint
Number of dnode slots allocated in a single operation, as a power of 2.
The default value minimizes lock contention for the bulk operation performed.
.
.It Sy dmu_ddt_copies Ns = Ns Sy 3 Pq uint
Controls the number of copies stored for DeDup Table
.Pq DDT
objects.
Reducing the number of copies to 1 from the previous default of 3
can reduce the write inflation caused by deduplication.
This assumes redundancy for this data is provided by the vdev layer.
If the DDT is damaged, space may be leaked
.Pq not freed
when the DDT cannot report the correct reference count.
.
.It Sy dmu_prefetch_max Ns = Ns Sy 134217728 Ns B Po 128 MiB Pc Pq uint
Limit the amount we can prefetch with one call to this amount in bytes.
This helps to limit the amount of memory that can be used by prefetching.
.
.It Sy ignore_hole_birth Pq int
Alias for
.Sy send_holes_without_birth_time .
.
.It Sy l2arc_feed_again Ns = Ns Sy 1 Ns | Ns 0 Pq int
Turbo L2ARC warm-up.
When the L2ARC is cold the fill interval will be set as fast as possible.
.
.It Sy l2arc_feed_min_ms Ns = Ns Sy 200 Pq u64
Min feed interval in milliseconds.
Requires
.Sy l2arc_feed_again Ns = Ns Ar 1
and is only applicable in related situations.
.
.It Sy l2arc_feed_secs Ns = Ns Sy 1 Pq u64
Seconds between L2ARC writing.
.
.It Sy l2arc_headroom Ns = Ns Sy 8 Pq u64
How far through the ARC lists to search for L2ARC cacheable content,
expressed as a multiplier of
.Sy l2arc_write_max .
ARC persistence across reboots can be achieved with persistent L2ARC
by setting this parameter to
.Sy 0 ,
allowing the full length of ARC lists to be searched for cacheable content.
.
.It Sy l2arc_headroom_boost Ns = Ns Sy 200 Ns % Pq u64
Scales
.Sy l2arc_headroom
by this percentage when L2ARC contents are being successfully compressed
before writing.
A value of
.Sy 100
disables this feature.
.
.It Sy l2arc_exclude_special Ns = Ns Sy 0 Ns | Ns 1 Pq int
Controls whether buffers present on special vdevs are eligible for caching
into L2ARC.
If set to 1, exclude dbufs on special vdevs from being cached to L2ARC.
.
.It Sy l2arc_mfuonly Ns = Ns Sy 0 Ns | Ns 1 Ns | Ns 2 Pq int
Controls whether only MFU metadata and data are cached from ARC into L2ARC.
This may be desired to avoid wasting space on L2ARC when reading/writing large
amounts of data that are not expected to be accessed more than once.
.Pp
The default is 0,
meaning both MRU and MFU data and metadata are cached.
When turning off this feature (setting it to 0), some MRU buffers will
still be present in ARC and eventually cached on L2ARC.
.No If Sy l2arc_noprefetch Ns = Ns Sy 0 ,
some prefetched buffers will be cached to L2ARC, and those might later
transition to MRU, in which case the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
Setting it to 1 means to cache only MFU data and metadata in L2ARC.
.Pp
Setting it to 2 means to cache all metadata (MRU+MFU) in L2ARC,
but only MFU data (i.e. MRU data are not cached).
This can be the right setting to cache as much metadata as possible
even with a high data turnover.
.Pp
Regardless of
.Sy l2arc_noprefetch ,
some MFU buffers might be evicted from ARC,
accessed later on as prefetches and transition to MRU as prefetches.
If accessed again they are counted as MRU and the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
The ARC status of L2ARC buffers when they were first cached in
L2ARC can be seen in the
.Sy l2arc_mru_asize , Sy l2arc_mfu_asize , No and Sy l2arc_prefetch_asize
arcstats when importing the pool or onlining a cache
device if persistent L2ARC is enabled.
.Pp
The
.Sy evict_l2_eligible_mru
arcstat does not take into account if this option is enabled as the information
provided by the
.Sy evict_l2_eligible_m[rf]u
arcstats can be used to decide if toggling this option is appropriate
for the current workload.
.
.It Sy l2arc_meta_percent Ns = Ns Sy 33 Ns % Pq uint
Percent of ARC size allowed for L2ARC-only headers.
Since L2ARC buffers are not evicted on memory pressure,
too many headers on a system with an irrationally large L2ARC
can render it slow or unusable.
This parameter limits L2ARC writes and rebuilds to achieve the target.
.
.It Sy l2arc_trim_ahead Ns = Ns Sy 0 Ns % Pq u64
Trims ahead of the current write size
.Pq Sy l2arc_write_max
on L2ARC devices by this percentage of write size if we have filled the device.
If set to
.Sy 100
we TRIM twice the space required to accommodate upcoming writes.
A minimum of
.Sy 64 MiB
will be trimmed.
It also enables TRIM of the whole L2ARC device upon creation
or addition to an existing pool or if the header of the device is
invalid upon importing a pool or onlining a cache device.
A value of
.Sy 0
disables TRIM on L2ARC altogether and is the default as it can put significant
stress on the underlying storage devices.
This will vary depending on how well the specific device handles these commands.
.
.It Sy l2arc_noprefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
Do not write buffers to L2ARC if they were prefetched but not used by
applications.
In case there are prefetched buffers in L2ARC and this option
is later set, we do not read the prefetched buffers from L2ARC.
Unsetting this option is useful for caching sequential reads from the
disks to L2ARC and serving those reads from L2ARC later on.
This may be beneficial in case the L2ARC device is significantly faster
in sequential reads than the disks of the pool.
.Pp
Use
.Sy 1
to disable and
.Sy 0
to enable caching/reading prefetches to/from L2ARC.
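.Pp
For example, to let prefetched streaming reads be cached on a fast
cache device (a sketch for Linux; the default of 1 leaves prefetches
uncached):
.Bd -literal -compact
# echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
.Ed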
.
.It Sy l2arc_norw Ns = Ns Sy 0 Ns | Ns 1 Pq int
No reads during writes.
.
.It Sy l2arc_write_boost Ns = Ns Sy 33554432 Ns B Po 32 MiB Pc Pq u64
Cold L2ARC devices will have
.Sy l2arc_write_max
increased by this amount while they remain cold.
.
.It Sy l2arc_write_max Ns = Ns Sy 33554432 Ns B Po 32 MiB Pc Pq u64
Max write bytes per interval.
.
.It Sy l2arc_rebuild_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Rebuild the L2ARC when importing a pool (persistent L2ARC).
This can be disabled if there are problems importing a pool
or attaching an L2ARC device (e.g. the L2ARC device is slow
in reading stored log metadata, or the metadata
has become somehow fragmented/unusable).
.
.It Sy l2arc_rebuild_blocks_min_l2size Ns = Ns Sy 1073741824 Ns B Po 1 GiB Pc Pq u64
Minimum size of an L2ARC device required in order to write log blocks in it.
The log blocks are used upon importing the pool to rebuild the persistent L2ARC.
.Pp
For L2ARC devices less than 1 GiB, the amount of data
.Fn l2arc_evict
evicts is significant compared to the amount of restored L2ARC data.
In this case, do not write log blocks in L2ARC in order not to waste space.
.
.It Sy metaslab_aliquot Ns = Ns Sy 2097152 Ns B Po 2 MiB Pc Pq u64
Metaslab group's per child vdev allocation granularity, in bytes.
This is roughly similar to what would be referred to as the "stripe size"
in traditional RAID arrays.
In normal operation, ZFS will try to write this amount of data to each child
of a top-level vdev before moving on to the next top-level vdev.
.
.It Sy metaslab_bias_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab groups biasing based on their over- or under-utilization
relative to the metaslab class average.
If disabled, each metaslab group will receive allocations proportional to its
capacity.
.
.It Sy metaslab_perf_bias Ns = Ns Sy 1 Ns | Ns 0 Ns | Ns 2 Pq int
Controls metaslab groups biasing based on their write performance.
Setting to 0 makes all metaslab groups receive fixed amounts of allocations.
Setting to 2 allows faster metaslab groups to allocate more.
Setting to 1 behaves like 2 if the pool is write-bound, and like 0 otherwise.
That is, if the pool is limited by write throughput, then allocate more from
faster metaslab groups, but if not, try to evenly distribute the allocations.
.
.It Sy metaslab_force_ganging Ns = Ns Sy 16777217 Ns B Po 16 MiB + 1 B Pc Pq u64
Make some blocks above a certain size be gang blocks.
This option is used by the test suite to facilitate testing.
.
.It Sy metaslab_force_ganging_pct Ns = Ns Sy 3 Ns % Pq uint
For blocks that could be forced to be a gang block (due to
.Sy metaslab_force_ganging ) ,
force this percentage of them to be gang blocks.
.
.It Sy brt_zap_prefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
Controls prefetching BRT records for blocks which are going to be cloned.
.
.It Sy brt_zap_default_bs Ns = Ns Sy 12 Po 4 KiB Pc Pq int
Default BRT ZAP data block size as a power of 2.
Note that changing this after creating a BRT on the pool will not affect
existing BRTs, only newly created ones.
.
.It Sy brt_zap_default_ibs Ns = Ns Sy 12 Po 4 KiB Pc Pq int
Default BRT ZAP indirect block size as a power of 2.
Note that changing this after creating a BRT on the pool will not affect
existing BRTs, only newly created ones.
.
.It Sy ddt_zap_default_bs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP data block size as a power of 2.
Note that changing this after creating a DDT on the pool will not affect
existing DDTs, only newly created ones.
.
.It Sy ddt_zap_default_ibs Ns = Ns Sy 15 Po 32 KiB Pc Pq int
Default DDT ZAP indirect block size as a power of 2.
Note that changing this after creating a DDT on the pool will not affect
existing DDTs, only newly created ones.
.
.It Sy zfs_default_bs Ns = Ns Sy 9 Po 512 B Pc Pq int
Default dnode block size as a power of 2.
.
.It Sy zfs_default_ibs Ns = Ns Sy 17 Po 128 KiB Pc Pq int
Default dnode indirect block size as a power of 2.
.
.It Sy zfs_dio_enabled Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enable Direct I/O.
If this setting is 0, then all I/O requests will be directed through the ARC
acting as though the dataset property
.Sy direct
was set to
.Sy disabled .
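.Pp
For example, to keep Direct I/O available module-wide while opting a
single dataset out of it (a sketch;
.Sy pool/fs
is a placeholder dataset name):
.Bd -literal -compact
# echo 1 > /sys/module/zfs/parameters/zfs_dio_enabled
# zfs set direct=disabled pool/fs
.Ed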
.
.It Sy zfs_history_output_max Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
When attempting to log an output nvlist of an ioctl in the on-disk history,
the output will not be stored if it is larger than this size (in bytes).
This must be less than
.Sy DMU_MAX_ACCESS Pq 64 MiB .
This applies primarily to
.Fn zfs_ioc_channel_program Pq cf. Xr zfs-program 8 .
.
.It Sy zfs_keep_log_spacemaps_at_export Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent log spacemaps from being destroyed during pool exports and destroys.
.
.It Sy zfs_metaslab_segment_weight_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable/disable segment-based metaslab selection.
.
.It Sy zfs_metaslab_switch_threshold Ns = Ns Sy 2 Pq int
When using segment-based metaslab selection, continue allocating
from the active metaslab until this option's
worth of buckets have been exhausted.
.
.It Sy metaslab_debug_load Ns = Ns Sy 0 Ns | Ns 1 Pq int
Load all metaslabs during pool import.
.
.It Sy metaslab_debug_unload Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent metaslabs from being unloaded.
.
.It Sy metaslab_fragmentation_factor_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable use of the fragmentation metric in computing metaslab weights.
.
.It Sy metaslab_df_max_search Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
Maximum distance to search forward from the last offset.
Without this limit, fragmented pools can see
.Em >100`000
iterations and
.Fn metaslab_block_picker
becomes the performance limiting factor on high-performance storage.
.Pp
With the default setting of
.Sy 16 MiB ,
we typically see less than
.Em 500
iterations, even with very fragmented
.Sy ashift Ns = Ns Sy 9
pools.
The maximum number of iterations possible is
.Sy metaslab_df_max_search / 2^(ashift+1) .
With the default setting of
.Sy 16 MiB
this is
.Em 16*1024 Pq with Sy ashift Ns = Ns Sy 9
or
.Em 2*1024 Pq with Sy ashift Ns = Ns Sy 12 .
.
.It Sy metaslab_df_use_largest_segment Ns = Ns Sy 0 Ns | Ns 1 Pq int
If not searching forward (due to
.Sy metaslab_df_max_search , metaslab_df_free_pct ,
.No or Sy metaslab_df_alloc_threshold ) ,
this tunable controls which segment is used.
If set, we will use the largest free segment.
If unset, we will use a segment of at least the requested size.
.
.It Sy zfs_metaslab_max_size_cache_sec Ns = Ns Sy 3600 Ns s Po 1 hour Pc Pq u64
When we unload a metaslab, we cache the size of the largest free chunk.
We use that cached size to determine whether or not to load a metaslab
for a given allocation.
As more frees accumulate in that metaslab while it's unloaded,
the cached max size becomes less and less accurate.
After a number of seconds controlled by this tunable,
we stop considering the cached max size and start
considering only the histogram instead.
.
.It Sy zfs_metaslab_mem_limit Ns = Ns Sy 25 Ns % Pq uint
When we are loading a new metaslab, we check the amount of memory being used
to store metaslab range trees.
If it is over a threshold, we attempt to unload the least recently used metaslab
to prevent the system from clogging all of its memory with range trees.
This tunable sets the percentage of total system memory that is the threshold.
.
.It Sy zfs_metaslab_try_hard_before_gang Ns = Ns Sy 0 Ns | Ns 1 Pq int
.Bl -item -compact
.It
If unset, we will first try normal allocation.
.It
If that fails then we will do a gang allocation.
.It
If that fails then we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.Pp
.Bl -item -compact
.It
If set, we will first try normal allocation.
.It
If that fails then we will do a "try hard" allocation.
.It
If that fails we will do a gang allocation.
.It
If that fails we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.
.It Sy zfs_metaslab_find_max_tries Ns = Ns Sy 100 Pq uint
When not trying hard, we only consider this number of the best metaslabs.
This improves performance, especially when there are many metaslabs per vdev
and the allocation can't actually be satisfied
(so we would otherwise iterate all metaslabs).
.
.It Sy zfs_vdev_default_ms_count Ns = Ns Sy 200 Pq uint
When a vdev is added, target this number of metaslabs per top-level vdev.
.
.It Sy zfs_vdev_default_ms_shift Ns = Ns Sy 29 Po 512 MiB Pc Pq uint
Default lower limit for metaslab size.
.
.It Sy zfs_vdev_max_ms_shift Ns = Ns Sy 34 Po 16 GiB Pc Pq uint
Default upper limit for metaslab size.
.
.It Sy zfs_vdev_max_auto_ashift Ns = Ns Sy 14 Pq uint
Maximum ashift used when optimizing for logical \[->] physical sector size on
new top-level vdevs.
May be increased up to
.Sy ASHIFT_MAX Po 16 Pc ,
but this may negatively impact pool space efficiency.
.
.It Sy zfs_vdev_direct_write_verify Ns = Ns Sy Linux 1 | FreeBSD 0 Pq uint
If non-zero, then a Direct I/O write's checksum will be verified every
time the write is issued and before it is committed to the block pointer.
In the event the checksum is not valid then the I/O operation will return EIO.
This module parameter can be used to detect if the
contents of the user's buffer have changed in the process of doing a Direct I/O
write.
It can also help to identify if reported checksum errors are tied to Direct I/O
writes.
Each verify error causes a
.Sy dio_verify_wr
zevent.
Direct Write I/O checksum verify errors can be seen with
.Nm zpool Cm status Fl d .
The default value for this is 1 on Linux, but is 0 for
.Fx
because user pages can be placed under write protection in
.Fx
before the Direct I/O write is issued.
.
.It Sy zfs_vdev_min_auto_ashift Ns = Ns Sy ASHIFT_MIN Po 9 Pc Pq uint
Minimum ashift used when creating new top-level vdevs.
.
.It Sy zfs_vdev_min_ms_count Ns = Ns Sy 16 Pq uint
Minimum number of metaslabs to create in a top-level vdev.
.
.It Sy vdev_validate_skip Ns = Ns Sy 0 Ns | Ns 1 Pq int
Skip label validation steps during pool import.
Changing is not recommended unless you know what you're doing
and are recovering a damaged label.
.
.It Sy zfs_vdev_ms_count_limit Ns = Ns Sy 131072 Po 128k Pc Pq uint
Practical upper limit of total metaslabs per top-level vdev.
.
.It Sy metaslab_preload_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group preloading.
.
.It Sy metaslab_preload_limit Ns = Ns Sy 10 Pq uint
Maximum number of metaslabs per group to preload.
.
.It Sy metaslab_preload_pct Ns = Ns Sy 50 Pq uint
Percentage of CPUs to run a metaslab preload taskq.
.
.It Sy metaslab_lba_weighting_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Give more weight to metaslabs with lower LBAs,
assuming they have greater bandwidth,
as is typically the case on a modern constant angular velocity disk drive.
.
.It Sy metaslab_unload_delay Ns = Ns Sy 32 Pq uint
After a metaslab is used, we keep it loaded for this many TXGs, to attempt to
reduce unnecessary reloading.
Note that both this many TXGs and
.Sy metaslab_unload_delay_ms
milliseconds must pass before unloading will occur.
.
.It Sy metaslab_unload_delay_ms Ns = Ns Sy 600000 Ns ms Po 10 min Pc Pq uint
After a metaslab is used, we keep it loaded for this many milliseconds,
to attempt to reduce unnecessary reloading.
Note that both this many milliseconds and
.Sy metaslab_unload_delay
TXGs must pass before unloading will occur.
.
.It Sy raidz_expand_max_copy_bytes Ns = Ns Sy 167772160 Ns B Po 160 MiB Pc Pq ulong
Max amount of memory to use for RAID-Z expansion I/O.
This limits how much I/O can be outstanding at once.
.
.It Sy raidz_expand_max_reflow_bytes Ns = Ns Sy 0 Pq ulong
For testing, pause RAID-Z expansion when reflow amount reaches this value.
.
.It Sy raidz_io_aggregate_rows Ns = Ns Sy 4 Pq ulong
For expanded RAID-Z, aggregate reads that have more rows than this.
.
.It Sy reference_history Ns = Ns Sy 3 Pq int
Maximum reference holders being tracked when
.Sy reference_tracking_enable
is active.
.
.It Sy reference_tracking_enable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Track reference holders to
.Sy refcount_t
objects (debug builds only).
.
.It Sy send_holes_without_birth_time Ns = Ns Sy 1 Ns | Ns 0 Pq int
When set, the
.Sy hole_birth
optimization will not be used, and all holes will always be sent during a
.Nm zfs Cm send .
This is useful if you suspect your datasets are affected by a bug in
.Sy hole_birth .
.
.It Sy spa_config_path Ns = Ns Pa /etc/zfs/zpool.cache Pq charp
SPA config file.
.
.It Sy spa_asize_inflation Ns = Ns Sy 24 Pq uint
Multiplication factor used to estimate actual disk consumption from the
size of data being written.
The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its configuration.
Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor,
particularly if they operate close to quota or capacity limits.
.
.It Sy spa_load_print_vdev_tree Ns = Ns Sy 0 Ns | Ns 1 Pq int
Whether to print the vdev tree in the debugging message buffer during pool
import.
.
.It Sy spa_load_verify_data Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse data blocks during an "extreme rewind"
.Pq Fl X
import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal skips non-metadata blocks.
It can be toggled once the
import has started to stop or start the traversal of non-metadata blocks.
.
.It Sy spa_load_verify_metadata Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse blocks during an "extreme rewind"
.Pq Fl X
pool import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal is not performed.
It can be toggled once the import has started to stop or start the traversal.
.
.It Sy spa_load_verify_shift Ns = Ns Sy 4 Po 1/16th Pc Pq uint
Sets the maximum number of bytes to consume during pool import to the log2
fraction of the target ARC size.
.
.It Sy spa_slop_shift Ns = Ns Sy 5 Po 1/32nd Pc Pq int
Normally, we don't allow the last
.Sy 3.2% Pq Sy 1/2^spa_slop_shift
of space in the pool to be consumed.
This ensures that we don't run the pool completely out of space,
due to unaccounted changes (e.g. to the MOS).
It also limits the worst-case time to allocate space.
If we have less than this amount of free space,
most ZPL operations (e.g. write, create) will return
.Sy ENOSPC .
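.Pp
As a worked example: with the default shift of 5, a 1 TiB pool reserves
1 TiB / 2^5 = 32 GiB as slop space; lowering the shift to 4 (1/16th)
would double that reservation to 64 GiB.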
.
.It Sy spa_num_allocators Ns = Ns Sy 4 Pq int
Determines the number of block allocators to use per spa instance.
Capped by the number of actual CPUs in the system via
.Sy spa_cpus_per_allocator .
.Pp
Note that setting this value too high could result in performance
degradation and/or excess fragmentation.
The set value only applies to pools imported or created after the change.
.
.It Sy spa_cpus_per_allocator Ns = Ns Sy 4 Pq int
Determines the minimum number of CPUs in the system required per
block allocator in a spa instance.
The set value only applies to pools imported or created after the change.
.
.It Sy spa_upgrade_errlog_limit Ns = Ns Sy 0 Pq uint
Limits the number of on-disk error log entries that will be converted to the
new format when enabling the
.Sy head_errlog
feature.
The default is to convert all log entries.
.
.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq uint
During top-level vdev removal, chunks of data are copied from the vdev
which may include free space in order to trade bandwidth for IOPS.
This parameter determines the maximum span of free space, in bytes,
which will be included as "unnecessary" data in a chunk of copied data.
.Pp
The default value here was chosen to align with
.Sy zfs_vdev_read_gap_limit ,
which is a similar concept when doing
regular reads (but there's no reason it has to be the same).
.
.It Sy vdev_file_logical_ashift Ns = Ns Sy 9 Po 512 B Pc Pq u64
Logical ashift for file-based devices.
.
.It Sy vdev_file_physical_ashift Ns = Ns Sy 9 Po 512 B Pc Pq u64
Physical ashift for file-based devices.
.
.It Sy zap_iterate_prefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, when we start iterating over a ZAP object,
prefetch the entire object (all leaf blocks).
However, this is limited by
.Sy dmu_prefetch_max .
.
.It Sy zap_micro_max_size Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq int
Maximum micro ZAP size.
A "micro" ZAP is upgraded to a "fat" ZAP once it grows beyond the specified
size.
Sizes higher than 128 KiB will be clamped to 128 KiB unless the
.Sy large_microzap
feature is enabled.
.
.It Sy zap_shrink_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, adjacent empty ZAP blocks will be collapsed, reducing disk space.
.
.It Sy zfetch_min_distance Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
Min bytes to prefetch per stream.
Prefetch distance starts from the demand access size and quickly grows to
this value, doubling on each hit.
After that it may grow further by 1/8 per hit, but only if some prefetches
since the last time have not completed in time to satisfy the demand request,
i.e. the prefetch depth did not cover the read latency or the pool got
saturated.
.
.It Sy zfetch_max_distance Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq uint
Max bytes to prefetch per stream.
.
.It Sy zfetch_max_idistance Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq uint
Max bytes to prefetch indirects for per stream.
.
.It Sy zfetch_max_reorder Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
Requests within this byte distance from the current prefetch stream position
are considered parts of the stream, reordered due to parallel processing.
Such requests do not advance the stream position immediately unless the
.Sy zfetch_hole_shift
fill threshold is reached, but are saved to fill holes in the stream later.
.
.It Sy zfetch_max_streams Ns = Ns Sy 8 Pq uint
Max number of streams per zfetch (prefetch streams per file).
.
.It Sy zfetch_min_sec_reap Ns = Ns Sy 1 Pq uint
Minimum time before an inactive prefetch stream can be reclaimed.
.
.It Sy zfetch_max_sec_reap Ns = Ns Sy 2 Pq uint
Maximum time before an inactive prefetch stream can be deleted.
.
.It Sy zfs_abd_scatter_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Controls whether the ARC may use scatter/gather lists.
If disabled, all allocations are forced to be linear in kernel memory.
Disabling can improve performance in some code paths
at the expense of fragmented kernel memory.
.
.It Sy zfs_abd_scatter_max_order Ns = Ns Sy MAX_ORDER\-1 Pq uint
Maximum number of consecutive memory pages allocated in a single block for
scatter/gather lists.
.Pp
The value of
.Sy MAX_ORDER
depends on kernel configuration.
.
.It Sy zfs_abd_scatter_min_size Ns = Ns Sy 1536 Ns B Po 1.5 KiB Pc Pq uint
This is the minimum allocation size that will use scatter (page-based) ABDs.
Smaller allocations will use linear ABDs.
.
.It Sy zfs_arc_dnode_limit Ns = Ns Sy 0 Ns B Pq u64
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata.
This value acts as a ceiling to the amount of dnode metadata, and defaults to
.Sy 0 ,
which indicates that a percentage based on
.Sy zfs_arc_dnode_limit_percent
of the ARC meta buffers may be used for dnodes.
.
.It Sy zfs_arc_dnode_limit_percent Ns = Ns Sy 10 Ns % Pq u64
Percentage that can be consumed by dnodes of ARC meta buffers.
.Pp
See also
.Sy zfs_arc_dnode_limit ,
which serves a similar purpose but has a higher priority if nonzero.
.
.It Sy zfs_arc_dnode_reduce_percent Ns = Ns Sy 10 Ns % Pq u64
Percentage of ARC dnodes to try to scan in response to demand for non-metadata
when the number of bytes consumed by dnodes exceeds
.Sy zfs_arc_dnode_limit .
.
.It Sy zfs_arc_average_blocksize Ns = Ns Sy 8192 Ns B Po 8 KiB Pc Pq uint
The ARC's buffer hash table is sized based on the assumption of an average
block size of this value.
This works out to roughly 1 MiB of hash table per 1 GiB of physical memory
with 8-byte pointers.
For configurations with a known larger average block size,
this value can be increased to reduce the memory footprint.
.
.It Sy zfs_arc_eviction_pct Ns = Ns Sy 200 Ns % Pq uint
When
.Fn arc_is_overflowing ,
.Fn arc_get_data_impl
waits for this percent of the requested amount of data to be evicted.
For example, by default, for every
.Em 2 KiB
that's evicted,
.Em 1 KiB
of it may be "reused" by a new allocation.
Since this is above
.Sy 100 Ns % ,
it ensures that progress is made towards getting
.Sy arc_size No under Sy arc_c .
Since this is finite, it ensures that allocations can still happen,
even during the potentially long time that
.Sy arc_size No is more than Sy arc_c .
.
.It Sy zfs_arc_evict_batch_limit Ns = Ns Sy 10 Pq uint
Number of ARC headers to evict per sub-list before proceeding to another
sub-list.
This batch-style operation prevents entire sub-lists from being evicted at once
but comes at a cost of additional unlocking and locking.
.
.It Sy zfs_arc_grow_retry Ns = Ns Sy 0 Ns s Pq uint
If set to a nonzero value, it will replace the
.Sy arc_grow_retry
value with this value.
The
.Sy arc_grow_retry
.No value Pq default Sy 5 Ns s
is the number of seconds the ARC will wait before
trying to resume growth after a memory pressure event.
.
.It Sy zfs_arc_lotsfree_percent Ns = Ns Sy 10 Ns % Pq int
Throttle I/O when free system memory drops below this percentage of total
system memory.
Setting this value to
.Sy 0
will disable the throttle.
.
.It Sy zfs_arc_max Ns = Ns Sy 0 Ns B Pq u64
Max size of ARC in bytes.
If
.Sy 0 ,
then the max size of ARC is determined by the amount of system memory installed.
The larger of
.Sy all_system_memory No \- Sy 1 GiB
and
.Sy 5/8 No \(mu Sy all_system_memory
will be used as the limit.
This value must be at least
.Sy 67108864 Ns B Pq 64 MiB .
.Pp
This value can be changed dynamically, with some caveats.
It cannot be set back to
.Sy 0
while running, and reducing it below the current ARC size will not cause
the ARC to shrink without memory pressure to induce shrinking.
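.Pp
For example, to cap the ARC at 8 GiB at runtime on Linux (a sketch;
subject to the caveats above):
.Bd -literal -compact
# echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
.Ed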
.
.It Sy zfs_arc_meta_balance Ns = Ns Sy 500 Pq uint
Balance between metadata and data on ghost hits.
Values above 100 increase metadata caching by proportionally reducing effect
of ghost data hits on target data/metadata rate.
.
.It Sy zfs_arc_min Ns = Ns Sy 0 Ns B Pq u64
Min size of ARC in bytes.
.No If set to Sy 0 , arc_c_min
will default to consuming the larger of
.Sy 32 MiB
and
.Sy all_system_memory No / Sy 32 .
.
.It Sy zfs_arc_min_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 1s Pc Pq uint
Minimum time prefetched blocks are locked in the ARC.
.
.It Sy zfs_arc_min_prescient_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 6s Pc Pq uint
Minimum time "prescient prefetched" blocks are locked in the ARC.
These blocks are meant to be prefetched fairly aggressively ahead of
the code that may use them.
.
.It Sy zfs_arc_prune_task_threads Ns = Ns Sy 1 Pq int
Number of arc_prune threads.
.Fx
does not need more than one.
Linux may theoretically use one per mount point up to the number of CPUs,
but that was not proven to be useful.
.
.It Sy zfs_max_missing_tvds Ns = Ns Sy 0 Pq int
Number of missing top-level vdevs which will be allowed during
pool import (only in read-only mode).
.
.It Sy zfs_max_nvlist_src_size Ns = Sy 0 Pq u64
Maximum size in bytes allowed to be passed as
.Sy zc_nvlist_src_size
for ioctls on
.Pa /dev/zfs .
This prevents a user from causing the kernel to allocate
an excessive amount of memory.
When the limit is exceeded, the ioctl fails with
.Sy EINVAL
and a description of the error is sent to the
.Pa zfs-dbgmsg
log.
This parameter should not need to be touched under normal circumstances.
If
.Sy 0 ,
equivalent to a quarter of the user-wired memory limit under
.Fx
and to
.Sy 134217728 Ns B Pq 128 MiB
under Linux.
.
.It Sy zfs_multilist_num_sublists Ns = Ns Sy 0 Pq uint
To allow more fine-grained locking, each ARC state contains a series
of lists for both data and metadata objects.
Locking is performed at the level of these "sub-lists".
This parameter controls the number of sub-lists per ARC state,
and also applies to other uses of the multilist data structure.
.Pp
If
.Sy 0 ,
equivalent to the greater of the number of online CPUs and
.Sy 4 .
.
.It Sy zfs_arc_overflow_shift Ns = Ns Sy 8 Pq int
The ARC size is considered to be overflowing if it exceeds the current
ARC target size
.Pq Sy arc_c
by thresholds determined by this parameter.
Exceeding by
.Sy ( arc_c No >> Sy zfs_arc_overflow_shift ) No / Sy 2
starts the ARC reclamation process.
If that appears insufficient, exceeding by
.Sy ( arc_c No >> Sy zfs_arc_overflow_shift ) No \(mu Sy 1.5
blocks new buffer allocation until the reclaim thread catches up.
A started reclamation process continues till the ARC size returns below the
target size.
.Pp
The default value of
.Sy 8
causes the ARC to start reclamation if it exceeds the target size by
.Em 0.2%
of the target size, and block allocations by
.Em 0.6% .
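.Pp
As a worked example: with the default shift of 8 and an
.Sy arc_c
of 4 GiB, reclamation starts once the ARC exceeds the target by
(4 GiB >> 8) / 2 = 8 MiB,
and new allocations block once it exceeds the target by
(4 GiB >> 8) \(mu 1.5 = 24 MiB.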
8593ff01b23SMartin Matuska.
860be181ee2SMartin Matuska.It Sy zfs_arc_shrink_shift Ns = Ns Sy 0 Pq uint
8613ff01b23SMartin MatuskaIf nonzero, this will update
8623ff01b23SMartin Matuska.Sy arc_shrink_shift Pq default Sy 7
8633ff01b23SMartin Matuskawith the new value.
8643ff01b23SMartin Matuska.
8653ff01b23SMartin Matuska.It Sy zfs_arc_pc_percent Ns = Ns Sy 0 Ns % Po off Pc Pq uint
8663ff01b23SMartin MatuskaPercent of pagecache to reclaim ARC to.
8673ff01b23SMartin Matuska.Pp
8683ff01b23SMartin MatuskaThis tunable allows the ZFS ARC to play more nicely
8693ff01b23SMartin Matuskawith the kernel's LRU pagecache.
8703ff01b23SMartin MatuskaIt can guarantee that the ARC size won't collapse under scanning
8713ff01b23SMartin Matuskapressure on the pagecache, yet still allows the ARC to be reclaimed down to
8723ff01b23SMartin Matuska.Sy zfs_arc_min
8733ff01b23SMartin Matuskaif necessary.
8743ff01b23SMartin MatuskaThis value is specified as percent of pagecache size (as measured by
8753ff01b23SMartin Matuska.Sy NR_FILE_PAGES ) ,
8763ff01b23SMartin Matuskawhere that percent may exceed
8773ff01b23SMartin Matuska.Sy 100 .
8783ff01b23SMartin MatuskaThis
8793ff01b23SMartin Matuskaonly operates during memory pressure/reclaim.
8803ff01b23SMartin Matuska.
881dd215568SMartin Matuska.It Sy zfs_arc_shrinker_limit Ns = Ns Sy 0 Pq int
8823ff01b23SMartin MatuskaThis is a limit on how many pages the ARC shrinker makes available for
8833ff01b23SMartin Matuskaeviction in response to one page allocation attempt.
8843ff01b23SMartin MatuskaNote that in practice, the kernel's shrinker can ask us to evict
8853ff01b23SMartin Matuskaup to about four times this for one allocation attempt.
886e2df9bb4SMartin MatuskaTo reduce OOM risk, this limit is applied for kswapd reclaims only.
8873ff01b23SMartin Matuska.Pp
888dd215568SMartin MatuskaFor example a value of
889716fd348SMartin Matuska.Sy 10000 Pq in practice, Em 160 MiB No per allocation attempt with 4 KiB pages
8903ff01b23SMartin Matuskalimits the amount of time spent attempting to reclaim ARC memory to
8913ff01b23SMartin Matuskaless than 100 ms per allocation attempt,
892716fd348SMartin Matuskaeven with a small average compressed block size of ~8 KiB.
8933ff01b23SMartin Matuska.Pp
8943ff01b23SMartin MatuskaThe parameter can be set to 0 (zero) to disable the limit,
8953ff01b23SMartin Matuskaand only applies on Linux.
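.Pp
On Linux, this can typically be inspected and changed at runtime through the
module parameter interface; a minimal sketch (paths assumed):
.Bd -literal -compact
# Show the current limit (0 means no limit).
cat /sys/module/zfs/parameters/zfs_arc_shrinker_limit
# Cap each kswapd reclaim at 10000 pages, as in the example above.
echo 10000 > /sys/module/zfs/parameters/zfs_arc_shrinker_limit
.Ed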
8963ff01b23SMartin Matuska.
897ce4dcb97SMartin Matuska.It Sy zfs_arc_shrinker_seeks Ns = Ns Sy 2 Pq int
Relative cost of ARC eviction on Linux, that is, the number of seeks needed
to restore an evicted page.
Bigger values make the ARC more precious and evictions smaller, compared to
other kernel subsystems.
A value of 4 means parity with the page cache.
903ce4dcb97SMartin Matuska.
904dbd5678dSMartin Matuska.It Sy zfs_arc_sys_free Ns = Ns Sy 0 Ns B Pq u64
9053ff01b23SMartin MatuskaThe target number of bytes the ARC should leave as free memory on the system.
9063ff01b23SMartin MatuskaIf zero, equivalent to the bigger of
907716fd348SMartin Matuska.Sy 512 KiB No and Sy all_system_memory/64 .
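.Pp
For example, on a system with 64 GiB of RAM and this parameter left at
.Sy 0 ,
the ARC would aim to keep
max(512 KiB, 64 GiB/64) = 1 GiB
of memory free.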
9083ff01b23SMartin Matuska.
9093ff01b23SMartin Matuska.It Sy zfs_autoimport_disable Ns = Ns Sy 1 Ns | Ns 0 Pq int
9103ff01b23SMartin MatuskaDisable pool import at module load by ignoring the cache file
9113ff01b23SMartin Matuska.Pq Sy spa_config_path .
9123ff01b23SMartin Matuska.
9133ff01b23SMartin Matuska.It Sy zfs_checksum_events_per_second Ns = Ns Sy 20 Ns /s Pq uint
9143ff01b23SMartin MatuskaRate limit checksum events to this many per second.
9153ff01b23SMartin MatuskaNote that this should not be set below the ZED thresholds
9163ff01b23SMartin Matuska(currently 10 checksums over 10 seconds)
9173ff01b23SMartin Matuskaor else the daemon may not trigger any action.
9183ff01b23SMartin Matuska.
9196c1e79dfSMartin Matuska.It Sy zfs_commit_timeout_pct Ns = Ns Sy 10 Ns % Pq uint
9203ff01b23SMartin MatuskaThis controls the amount of time that a ZIL block (lwb) will remain "open"
9213ff01b23SMartin Matuskawhen it isn't "full", and it has a thread waiting for it to be committed to
9223ff01b23SMartin Matuskastable storage.
9233ff01b23SMartin MatuskaThe timeout is scaled based on a percentage of the last lwb
9243ff01b23SMartin Matuskalatency to avoid significantly impacting the latency of each individual
9253ff01b23SMartin Matuskatransaction record (itx).
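.Pp
As a rough illustration of the scaling: with the default of 10% and a last
observed lwb latency of 1 ms, an open lwb would wait on the order of
100 \(mcs before being committed.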
9263ff01b23SMartin Matuska.
9273ff01b23SMartin Matuska.It Sy zfs_condense_indirect_commit_entry_delay_ms Ns = Ns Sy 0 Ns ms Pq int
9283ff01b23SMartin MatuskaVdev indirection layer (used for device removal) sleeps for this many
9293ff01b23SMartin Matuskamilliseconds during mapping generation.
9303ff01b23SMartin MatuskaIntended for use with the test suite to throttle vdev removal speed.
9313ff01b23SMartin Matuska.
932be181ee2SMartin Matuska.It Sy zfs_condense_indirect_obsolete_pct Ns = Ns Sy 25 Ns % Pq uint
933bb2d13b6SMartin MatuskaMinimum percent of obsolete bytes in vdev mapping required to attempt to
934bb2d13b6SMartin Matuskacondense
9353ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable .
9363ff01b23SMartin MatuskaIntended for use with the test suite
9373ff01b23SMartin Matuskato facilitate triggering condensing as needed.
9383ff01b23SMartin Matuska.
9393ff01b23SMartin Matuska.It Sy zfs_condense_indirect_vdevs_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int
9403ff01b23SMartin MatuskaEnable condensing indirect vdev mappings.
9413ff01b23SMartin MatuskaWhen set, attempt to condense indirect vdev mappings
9423ff01b23SMartin Matuskaif the mapping uses more than
9433ff01b23SMartin Matuska.Sy zfs_condense_min_mapping_bytes
9443ff01b23SMartin Matuskabytes of memory and if the obsolete space map object uses more than
9453ff01b23SMartin Matuska.Sy zfs_condense_max_obsolete_bytes
9463ff01b23SMartin Matuskabytes on-disk.
947bb2d13b6SMartin MatuskaThe condensing process is an attempt to save memory by removing obsolete
948bb2d13b6SMartin Matuskamappings.
9493ff01b23SMartin Matuska.
950dbd5678dSMartin Matuska.It Sy zfs_condense_max_obsolete_bytes Ns = Ns Sy 1073741824 Ns B Po 1 GiB Pc Pq u64
9513ff01b23SMartin MatuskaOnly attempt to condense indirect vdev mappings if the on-disk size
9523ff01b23SMartin Matuskaof the obsolete space map object is greater than this number of bytes
9533ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable .
9543ff01b23SMartin Matuska.
955dbd5678dSMartin Matuska.It Sy zfs_condense_min_mapping_bytes Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq u64
9563ff01b23SMartin MatuskaMinimum size vdev mapping to attempt to condense
9573ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable .
9583ff01b23SMartin Matuska.
9593ff01b23SMartin Matuska.It Sy zfs_dbgmsg_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int
9603ff01b23SMartin MatuskaInternally ZFS keeps a small log to facilitate debugging.
9613ff01b23SMartin MatuskaThe log is enabled by default, and can be disabled by unsetting this option.
9623ff01b23SMartin MatuskaThe contents of the log can be accessed by reading
9633ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/dbgmsg .
9643ff01b23SMartin MatuskaWriting
9653ff01b23SMartin Matuska.Sy 0
9663ff01b23SMartin Matuskato the file clears the log.
9673ff01b23SMartin Matuska.Pp
9683ff01b23SMartin MatuskaThis setting does not influence debug prints due to
9693ff01b23SMartin Matuska.Sy zfs_flags .
9703ff01b23SMartin Matuska.
971be181ee2SMartin Matuska.It Sy zfs_dbgmsg_maxsize Ns = Ns Sy 4194304 Ns B Po 4 MiB Pc Pq uint
9723ff01b23SMartin MatuskaMaximum size of the internal ZFS debug log.
9733ff01b23SMartin Matuska.
9743ff01b23SMartin Matuska.It Sy zfs_dbuf_state_index Ns = Ns Sy 0 Pq int
9753ff01b23SMartin MatuskaHistorically used for controlling what reporting was available under
9763ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs .
9773ff01b23SMartin MatuskaNo effect.
9783ff01b23SMartin Matuska.
979aca928a5SMartin Matuska.It Sy zfs_deadman_checktime_ms Ns = Ns Sy 60000 Ns ms Po 1 min Pc Pq u64
980aca928a5SMartin MatuskaCheck time in milliseconds.
981aca928a5SMartin MatuskaThis defines the frequency at which we check for hung I/O requests
982aca928a5SMartin Matuskaand potentially invoke the
983aca928a5SMartin Matuska.Sy zfs_deadman_failmode
984aca928a5SMartin Matuskabehavior.
985aca928a5SMartin Matuska.
9863ff01b23SMartin Matuska.It Sy zfs_deadman_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
9873ff01b23SMartin MatuskaWhen a pool sync operation takes longer than
9883ff01b23SMartin Matuska.Sy zfs_deadman_synctime_ms ,
9893ff01b23SMartin Matuskaor when an individual I/O operation takes longer than
9903ff01b23SMartin Matuska.Sy zfs_deadman_ziotime_ms ,
9913ff01b23SMartin Matuskathen the operation is considered to be "hung".
9923ff01b23SMartin MatuskaIf
9933ff01b23SMartin Matuska.Sy zfs_deadman_enabled
9943ff01b23SMartin Matuskais set, then the deadman behavior is invoked as described by
9953ff01b23SMartin Matuska.Sy zfs_deadman_failmode .
9963ff01b23SMartin MatuskaBy default, the deadman is enabled and set to
9973ff01b23SMartin Matuska.Sy wait
998c03c5b1cSMartin Matuskawhich results in "hung" I/O operations only being logged.
9993ff01b23SMartin MatuskaThe deadman is automatically disabled when a pool gets suspended.
10003ff01b23SMartin Matuska.
1001aca928a5SMartin Matuska.It Sy zfs_deadman_events_per_second Ns = Ns Sy 1 Ns /s Pq int
1002aca928a5SMartin MatuskaRate limit deadman zevents (which report hung I/O operations) to this many per
1003aca928a5SMartin Matuskasecond.
1004aca928a5SMartin Matuska.
10053ff01b23SMartin Matuska.It Sy zfs_deadman_failmode Ns = Ns Sy wait Pq charp
10063ff01b23SMartin MatuskaControls the failure behavior when the deadman detects a "hung" I/O operation.
10073ff01b23SMartin MatuskaValid values are:
10083ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "continue"
10093ff01b23SMartin Matuska.It Sy wait
10103ff01b23SMartin MatuskaWait for a "hung" operation to complete.
10113ff01b23SMartin MatuskaFor each "hung" operation a "deadman" event will be posted
10123ff01b23SMartin Matuskadescribing that operation.
10133ff01b23SMartin Matuska.It Sy continue
10143ff01b23SMartin MatuskaAttempt to recover from a "hung" operation by re-dispatching it
10153ff01b23SMartin Matuskato the I/O pipeline if possible.
10163ff01b23SMartin Matuska.It Sy panic
10173ff01b23SMartin MatuskaPanic the system.
10183ff01b23SMartin MatuskaThis can be used to facilitate automatic fail-over
10193ff01b23SMartin Matuskato a properly configured fail-over partner.
10203ff01b23SMartin Matuska.El
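.Pp
On Linux, the failure mode can typically be changed at runtime through the
module parameter interface; a minimal sketch (path assumed):
.Bd -literal -compact
# Re-dispatch "hung" I/O operations instead of merely logging them.
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode
.Ed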
10213ff01b23SMartin Matuska.
1022dbd5678dSMartin Matuska.It Sy zfs_deadman_synctime_ms Ns = Ns Sy 600000 Ns ms Po 10 min Pc Pq u64
10233ff01b23SMartin MatuskaInterval in milliseconds after which the deadman is triggered and also
10243ff01b23SMartin Matuskathe interval after which a pool sync operation is considered to be "hung".
10253ff01b23SMartin MatuskaOnce this limit is exceeded the deadman will be invoked every
10263ff01b23SMartin Matuska.Sy zfs_deadman_checktime_ms
10273ff01b23SMartin Matuskamilliseconds until the pool sync completes.
10283ff01b23SMartin Matuska.
1029dbd5678dSMartin Matuska.It Sy zfs_deadman_ziotime_ms Ns = Ns Sy 300000 Ns ms Po 5 min Pc Pq u64
10303ff01b23SMartin MatuskaInterval in milliseconds after which the deadman is triggered and an
10313ff01b23SMartin Matuskaindividual I/O operation is considered to be "hung".
10323ff01b23SMartin MatuskaAs long as the operation remains "hung",
10333ff01b23SMartin Matuskathe deadman will be invoked every
10343ff01b23SMartin Matuska.Sy zfs_deadman_checktime_ms
10353ff01b23SMartin Matuskamilliseconds until the operation completes.
10363ff01b23SMartin Matuska.
10373ff01b23SMartin Matuska.It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
10383ff01b23SMartin MatuskaEnable prefetching dedup-ed blocks which are going to be freed.
10393ff01b23SMartin Matuska.
1040e2df9bb4SMartin Matuska.It Sy zfs_dedup_log_flush_min_time_ms Ns = Ns Sy 1000 Ns Pq uint
1041e2df9bb4SMartin MatuskaMinimum time to spend on dedup log flush each transaction.
1042e2df9bb4SMartin Matuska.Pp
1043e2df9bb4SMartin MatuskaAt least this long will be spent flushing dedup log entries each transaction,
1044e2df9bb4SMartin Matuskaup to
1045e2df9bb4SMartin Matuska.Sy zfs_txg_timeout .
This occurs even if doing so would delay the transaction, that is, even if
all other I/O completes within this time.
1048e2df9bb4SMartin Matuska.
1049*61145dc2SMartin Matuska.It Sy zfs_dedup_log_flush_entries_min Ns = Ns Sy 100 Ns Pq uint
1050e2df9bb4SMartin MatuskaFlush at least this many entries each transaction.
1051e2df9bb4SMartin Matuska.Pp
1052*61145dc2SMartin MatuskaOpenZFS will flush a fraction of the log every TXG, to keep the size
1053*61145dc2SMartin Matuskaproportional to the ingest rate (see
1054*61145dc2SMartin Matuska.Sy zfs_dedup_log_flush_txgs ) .
This sets the minimum for that estimate, which ensures that the backlog can
still drain completely if the ingest rate falls.
1057*61145dc2SMartin MatuskaRaising it can force OpenZFS to flush more aggressively, reducing the backlog
1058*61145dc2SMartin Matuskato zero more quickly, but can make it less able to back off if log
1059*61145dc2SMartin Matuskaflushing would compete with other IO too much.
1060e2df9bb4SMartin Matuska.
1061*61145dc2SMartin Matuska.It Sy zfs_dedup_log_flush_entries_max Ns = Ns Sy UINT_MAX Ns Pq uint
1062*61145dc2SMartin MatuskaFlush at most this many entries each transaction.
1063*61145dc2SMartin Matuska.Pp
Mostly used for debugging purposes.
.
1065*61145dc2SMartin Matuska.It Sy zfs_dedup_log_flush_txgs Ns = Ns Sy 100 Ns Pq uint
Target number of TXGs over which to process the whole dedup log.
.Pp
Every TXG, OpenZFS will flush a fraction of the DDT backlog equal to the
inverse of this number, which keeps the backlog at a size roughly equal to
the ingest rate times this value.
This balances a more efficient, better-aggregated DDT log against import
times, which grow with the size of the log.
Increasing this value therefore results in a more efficient DDT log, but
longer import times.
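.Pp
As a worked example with assumed numbers: at a steady ingest rate of 10000
entries per TXG and the default of
.Sy 100 ,
the backlog settles around 10000 \(mu 100 = 1000000 entries,
of which roughly 1/100th, i.e. about 10000 entries, is flushed each TXG.
.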
1077*61145dc2SMartin Matuska.It Sy zfs_dedup_log_cap Ns = Ns Sy UINT_MAX Ns Pq uint
1078*61145dc2SMartin MatuskaSoft cap for the size of the current dedup log.
1079*61145dc2SMartin Matuska.Pp
1080*61145dc2SMartin MatuskaIf the log is larger than this size, we increase the aggressiveness of
1081*61145dc2SMartin Matuskathe flushing to try to bring it back down to the soft cap.
Setting a cap will reduce import times, but will also reduce the efficiency
of the DDT log, increasing the expected number of I/O operations required to
flush the same amount of data.
.
1085*61145dc2SMartin Matuska.It Sy zfs_dedup_log_hard_cap Ns = Ns Sy 0 Ns | Ns 1 Pq uint
Whether to treat the log cap as a hard cap or not.
.Pp
When set to 0 (the default),
.Sy zfs_dedup_log_cap
will increase the maximum number of log entries we flush in a given TXG.
This will bring the backlog size down towards the cap, but not at the expense
of making TXG syncs take longer.
If this is set to 1, the cap acts as a hard rather than a soft cap; it will
also increase the minimum number of log entries we flush per TXG.
Enabling it will reduce worst-case import times, at the cost of increased TXG
sync times.
.
1097e2df9bb4SMartin Matuska.It Sy zfs_dedup_log_flush_flow_rate_txgs Ns = Ns Sy 10 Ns Pq uint
1098e2df9bb4SMartin MatuskaNumber of transactions to use to compute the flow rate.
1099e2df9bb4SMartin Matuska.Pp
OpenZFS will estimate the number of entries changed (ingest rate), the number
of entries flushed (flush rate), and the time spent flushing (flush time
rate), and combine these into an overall "flow rate".
1103e2df9bb4SMartin MatuskaIt will use an exponential weighted moving average over some number of recent
1104e2df9bb4SMartin Matuskatransactions to compute these rates.
1105e2df9bb4SMartin MatuskaThis sets the number of transactions to compute these averages over.
1106e2df9bb4SMartin MatuskaSetting it higher can help to smooth out the flow rate in the face of spiky
1107e2df9bb4SMartin Matuskaworkloads, but will take longer for the flow rate to adjust to a sustained
change in the ingest rate.
1109e2df9bb4SMartin Matuska.
1110e2df9bb4SMartin Matuska.It Sy zfs_dedup_log_txg_max Ns = Ns Sy 8 Ns Pq uint
Maximum number of transactions to accumulate before starting to flush dedup
logs.
1112e2df9bb4SMartin Matuska.Pp
1113e2df9bb4SMartin MatuskaOpenZFS maintains two dedup logs, one receiving new changes, one flushing.
1114e2df9bb4SMartin MatuskaIf there is nothing to flush, it will accumulate changes for no more than this
1115e2df9bb4SMartin Matuskamany transactions before switching the logs and starting to flush entries out.
1116e2df9bb4SMartin Matuska.
1117e2df9bb4SMartin Matuska.It Sy zfs_dedup_log_mem_max Ns = Ns Sy 0 Ns Pq u64
1118e2df9bb4SMartin MatuskaMax memory to use for dedup logs.
1119e2df9bb4SMartin Matuska.Pp
1120e2df9bb4SMartin MatuskaOpenZFS will spend no more than this much memory on maintaining the in-memory
1121e2df9bb4SMartin Matuskadedup log.
1122e2df9bb4SMartin MatuskaFlushing will begin when around half this amount is being spent on logs.
1123e2df9bb4SMartin MatuskaThe default value of
1124e2df9bb4SMartin Matuska.Sy 0
1125e2df9bb4SMartin Matuskawill cause it to be set by
1126e2df9bb4SMartin Matuska.Sy zfs_dedup_log_mem_max_percent
1127e2df9bb4SMartin Matuskainstead.
1128e2df9bb4SMartin Matuska.
1129e2df9bb4SMartin Matuska.It Sy zfs_dedup_log_mem_max_percent Ns = Ns Sy 1 Ns % Pq uint
1130e2df9bb4SMartin MatuskaMax memory to use for dedup logs, as a percentage of total memory.
1131e2df9bb4SMartin Matuska.Pp
1132e2df9bb4SMartin MatuskaIf
1133e2df9bb4SMartin Matuska.Sy zfs_dedup_log_mem_max
1134*61145dc2SMartin Matuskais not set, it will be initialized as a percentage of the total memory in the
1135e2df9bb4SMartin Matuskasystem.
1136e2df9bb4SMartin Matuska.
1137be181ee2SMartin Matuska.It Sy zfs_delay_min_dirty_percent Ns = Ns Sy 60 Ns % Pq uint
11383ff01b23SMartin MatuskaStart to delay each transaction once there is this amount of dirty data,
11393ff01b23SMartin Matuskaexpressed as a percentage of
11403ff01b23SMartin Matuska.Sy zfs_dirty_data_max .
11413ff01b23SMartin MatuskaThis value should be at least
11423ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_max_dirty_percent .
11433ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
11443ff01b23SMartin Matuska.
11453ff01b23SMartin Matuska.It Sy zfs_delay_scale Ns = Ns Sy 500000 Pq int
11463ff01b23SMartin MatuskaThis controls how quickly the transaction delay approaches infinity.
11473ff01b23SMartin MatuskaLarger values cause longer delays for a given amount of dirty data.
11483ff01b23SMartin Matuska.Pp
11493ff01b23SMartin MatuskaFor the smoothest delay, this value should be about 1 billion divided
11503ff01b23SMartin Matuskaby the maximum number of operations per second.
11513ff01b23SMartin MatuskaThis will smoothly handle between ten times and a tenth of this number.
11523ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
11533ff01b23SMartin Matuska.Pp
1154e92ffd9bSMartin Matuska.Sy zfs_delay_scale No \(mu Sy zfs_dirty_data_max Em must No be smaller than Sy 2^64 .
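.Pp
As a worked example: for a pool that can sustain roughly 2000 write operations
per second, the smoothest delay is obtained with
10^9/2000 = 500000
(the default), which then handles roughly 200 to 20000 operations per second
smoothly.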
11553ff01b23SMartin Matuska.
11567a7741afSMartin Matuska.It Sy zfs_dio_write_verify_events_per_second Ns = Ns Sy 20 Ns /s Pq uint
11577a7741afSMartin MatuskaRate limit Direct I/O write verify events to this many per second.
11587a7741afSMartin Matuska.
11593ff01b23SMartin Matuska.It Sy zfs_disable_ivset_guid_check Ns = Ns Sy 0 Ns | Ns 1 Pq int
11603ff01b23SMartin MatuskaDisables requirement for IVset GUIDs to be present and match when doing a raw
11613ff01b23SMartin Matuskareceive of encrypted datasets.
11623ff01b23SMartin MatuskaIntended for users whose pools were created with
11633ff01b23SMartin MatuskaOpenZFS pre-release versions and now have compatibility issues.
11643ff01b23SMartin Matuska.
11653ff01b23SMartin Matuska.It Sy zfs_key_max_salt_uses Ns = Ns Sy 400000000 Po 4*10^8 Pc Pq ulong
11663ff01b23SMartin MatuskaMaximum number of uses of a single salt value before generating a new one for
11673ff01b23SMartin Matuskaencrypted datasets.
11683ff01b23SMartin MatuskaThe default value is also the maximum.
11693ff01b23SMartin Matuska.
11703ff01b23SMartin Matuska.It Sy zfs_object_mutex_size Ns = Ns Sy 64 Pq uint
11713ff01b23SMartin MatuskaSize of the znode hashtable used for holds.
11723ff01b23SMartin Matuska.Pp
Due to the need to hold locks on objects that may not exist yet, kernel
mutexes are not created per-object; instead, a hashtable is used, where
collisions may cause objects to wait on one another even when there is no
actual contention on the same object.
11773ff01b23SMartin Matuska.
11783ff01b23SMartin Matuska.It Sy zfs_slow_io_events_per_second Ns = Ns Sy 20 Ns /s Pq int
1179aca928a5SMartin MatuskaRate limit delay zevents (which report slow I/O operations) to this many per
11803ff01b23SMartin Matuskasecond.
11813ff01b23SMartin Matuska.
1182dbd5678dSMartin Matuska.It Sy zfs_unflushed_max_mem_amt Ns = Ns Sy 1073741824 Ns B Po 1 GiB Pc Pq u64
11833ff01b23SMartin MatuskaUpper-bound limit for unflushed metadata changes to be held by the
11843ff01b23SMartin Matuskalog spacemap in memory, in bytes.
11853ff01b23SMartin Matuska.
1186dbd5678dSMartin Matuska.It Sy zfs_unflushed_max_mem_ppm Ns = Ns Sy 1000 Ns ppm Po 0.1% Pc Pq u64
11873ff01b23SMartin MatuskaPart of overall system memory that ZFS allows to be used
11883ff01b23SMartin Matuskafor unflushed metadata changes by the log spacemap, in millionths.
11893ff01b23SMartin Matuska.
1190dbd5678dSMartin Matuska.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 131072 Po 128k Pc Pq u64
11913ff01b23SMartin MatuskaDescribes the maximum number of log spacemap blocks allowed for each pool.
11923ff01b23SMartin MatuskaThe default value means that the space in all the log spacemaps
11933ff01b23SMartin Matuskacan add up to no more than
1194716fd348SMartin Matuska.Sy 131072
11953ff01b23SMartin Matuskablocks (which means
1196716fd348SMartin Matuska.Em 16 GiB
11973ff01b23SMartin Matuskaof logical space before compression and ditto blocks,
11983ff01b23SMartin Matuskaassuming that blocksize is
1199716fd348SMartin Matuska.Em 128 KiB ) .
12003ff01b23SMartin Matuska.Pp
12013ff01b23SMartin MatuskaThis tunable is important because it involves a trade-off between import
12023ff01b23SMartin Matuskatime after an unclean export and the frequency of flushing metaslabs.
12033ff01b23SMartin MatuskaThe higher this number is, the more log blocks we allow when the pool is
active, which means that we flush metaslabs less often and thus decrease
1205c03c5b1cSMartin Matuskathe number of I/O operations for spacemap updates per TXG.
12063ff01b23SMartin MatuskaAt the same time though, that means that in the event of an unclean export,
12073ff01b23SMartin Matuskathere will be more log spacemap blocks for us to read, inducing overhead
12083ff01b23SMartin Matuskain the import time of the pool.
The lower the number, the more flushing occurs, destroying log blocks
sooner as they become obsolete, and leaving fewer blocks to be read during
import after a crash.
12123ff01b23SMartin Matuska.Pp
12133ff01b23SMartin MatuskaEach log spacemap block existing during pool import leads to approximately
12143ff01b23SMartin Matuskaone extra logical I/O issued.
12153ff01b23SMartin MatuskaThis is the reason why this tunable is exposed in terms of blocks rather
12163ff01b23SMartin Matuskathan space used.
12173ff01b23SMartin Matuska.
1218dbd5678dSMartin Matuska.It Sy zfs_unflushed_log_block_min Ns = Ns Sy 1000 Pq u64
12193ff01b23SMartin MatuskaIf the number of metaslabs is small and our incoming rate is high,
we could get into a situation in which we are flushing all our metaslabs
every TXG.
12213ff01b23SMartin MatuskaThus we always allow at least this many log blocks.
12223ff01b23SMartin Matuska.
1223dbd5678dSMartin Matuska.It Sy zfs_unflushed_log_block_pct Ns = Ns Sy 400 Ns % Pq u64
12243ff01b23SMartin MatuskaTunable used to determine the number of blocks that can be used for
12253ff01b23SMartin Matuskathe spacemap log, expressed as a percentage of the total number of
1226716fd348SMartin Matuskaunflushed metaslabs in the pool.
1227716fd348SMartin Matuska.
1228dbd5678dSMartin Matuska.It Sy zfs_unflushed_log_txg_max Ns = Ns Sy 1000 Pq u64
Tunable limiting the maximum time in TXGs any metaslab may remain unflushed.
It effectively limits the maximum number of unflushed per-TXG spacemap logs
that need to be read after an unclean pool export.
12323ff01b23SMartin Matuska.
12333ff01b23SMartin Matuska.It Sy zfs_unlink_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint
12343ff01b23SMartin MatuskaWhen enabled, files will not be asynchronously removed from the list of pending
12353ff01b23SMartin Matuskaunlinks and the space they consume will be leaked.
12363ff01b23SMartin MatuskaOnce this option has been disabled and the dataset is remounted,
12373ff01b23SMartin Matuskathe pending unlinks will be processed and the freed space returned to the pool.
12383ff01b23SMartin MatuskaThis option is used by the test suite.
12393ff01b23SMartin Matuska.
12403ff01b23SMartin Matuska.It Sy zfs_delete_blocks Ns = Ns Sy 20480 Pq ulong
This is used to define a large file for the purposes of deletion.
Files containing more than
.Sy zfs_delete_blocks
blocks will be deleted asynchronously, while smaller files are deleted
synchronously.
12453ff01b23SMartin MatuskaDecreasing this value will reduce the time spent in an
12463ff01b23SMartin Matuska.Xr unlink 2
1247bb2d13b6SMartin Matuskasystem call, at the expense of a longer delay before the freed space is
1248bb2d13b6SMartin Matuskaavailable.
1249dbd5678dSMartin MatuskaThis only applies on Linux.
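.Pp
Since the threshold is expressed in blocks, the effective file size depends
on the dataset's block size: with the default of
.Sy 20480
blocks and a
.Em 128 KiB
record size, files larger than roughly
.Em 2.5 GiB
are deleted asynchronously.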
12503ff01b23SMartin Matuska.
12513ff01b23SMartin Matuska.It Sy zfs_dirty_data_max Ns = Pq int
12523ff01b23SMartin MatuskaDetermines the dirty space limit in bytes.
12533ff01b23SMartin MatuskaOnce this limit is exceeded, new writes are halted until space frees up.
12543ff01b23SMartin MatuskaThis parameter takes precedence over
12553ff01b23SMartin Matuska.Sy zfs_dirty_data_max_percent .
12563ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
12573ff01b23SMartin Matuska.Pp
12583ff01b23SMartin MatuskaDefaults to
12593ff01b23SMartin Matuska.Sy physical_ram/10 ,
12603ff01b23SMartin Matuskacapped at
12613ff01b23SMartin Matuska.Sy zfs_dirty_data_max_max .
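.Pp
On Linux, the limit can typically be read and adjusted at runtime through the
module parameter interface; a minimal sketch (paths assumed):
.Bd -literal -compact
# Show the current dirty data limit in bytes.
cat /sys/module/zfs/parameters/zfs_dirty_data_max
# Raise it to 4 GiB, subject to zfs_dirty_data_max_max.
echo 4294967296 > /sys/module/zfs/parameters/zfs_dirty_data_max
.Ed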
12623ff01b23SMartin Matuska.
12633ff01b23SMartin Matuska.It Sy zfs_dirty_data_max_max Ns = Pq int
12643ff01b23SMartin MatuskaMaximum allowable value of
12653ff01b23SMartin Matuska.Sy zfs_dirty_data_max ,
12663ff01b23SMartin Matuskaexpressed in bytes.
12673ff01b23SMartin MatuskaThis limit is only enforced at module load time, and will be ignored if
12683ff01b23SMartin Matuska.Sy zfs_dirty_data_max
12693ff01b23SMartin Matuskais later changed.
12703ff01b23SMartin MatuskaThis parameter takes precedence over
12713ff01b23SMartin Matuska.Sy zfs_dirty_data_max_max_percent .
12723ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
12733ff01b23SMartin Matuska.Pp
12743ff01b23SMartin MatuskaDefaults to
127515f0b8c3SMartin Matuska.Sy min(physical_ram/4, 4GiB) ,
127615f0b8c3SMartin Matuskaor
127715f0b8c3SMartin Matuska.Sy min(physical_ram/4, 1GiB)
127815f0b8c3SMartin Matuskafor 32-bit systems.
12793ff01b23SMartin Matuska.
1280be181ee2SMartin Matuska.It Sy zfs_dirty_data_max_max_percent Ns = Ns Sy 25 Ns % Pq uint
12813ff01b23SMartin MatuskaMaximum allowable value of
12823ff01b23SMartin Matuska.Sy zfs_dirty_data_max ,
12833ff01b23SMartin Matuskaexpressed as a percentage of physical RAM.
12843ff01b23SMartin MatuskaThis limit is only enforced at module load time, and will be ignored if
12853ff01b23SMartin Matuska.Sy zfs_dirty_data_max
12863ff01b23SMartin Matuskais later changed.
12873ff01b23SMartin MatuskaThe parameter
12883ff01b23SMartin Matuska.Sy zfs_dirty_data_max_max
12893ff01b23SMartin Matuskatakes precedence over this one.
12903ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
12913ff01b23SMartin Matuska.
1292be181ee2SMartin Matuska.It Sy zfs_dirty_data_max_percent Ns = Ns Sy 10 Ns % Pq uint
12933ff01b23SMartin MatuskaDetermines the dirty space limit, expressed as a percentage of all memory.
12943ff01b23SMartin MatuskaOnce this limit is exceeded, new writes are halted until space frees up.
12953ff01b23SMartin MatuskaThe parameter
12963ff01b23SMartin Matuska.Sy zfs_dirty_data_max
12973ff01b23SMartin Matuskatakes precedence over this one.
12983ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY .
12993ff01b23SMartin Matuska.Pp
13003ff01b23SMartin MatuskaSubject to
13013ff01b23SMartin Matuska.Sy zfs_dirty_data_max_max .
13023ff01b23SMartin Matuska.
1303be181ee2SMartin Matuska.It Sy zfs_dirty_data_sync_percent Ns = Ns Sy 20 Ns % Pq uint
13043ff01b23SMartin MatuskaStart syncing out a transaction group if there's at least this much dirty data
13053ff01b23SMartin Matuska.Pq as a percentage of Sy zfs_dirty_data_max .
13063ff01b23SMartin MatuskaThis should be less than
13073ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_min_dirty_percent .
13083ff01b23SMartin Matuska.
13093f9d360cSMartin Matuska.It Sy zfs_wrlog_data_max Ns = Pq int
1310*61145dc2SMartin MatuskaThe upper limit of write-transaction ZIL log data size in bytes.
1311e3aa18adSMartin MatuskaWrite operations are throttled when approaching the limit until log data is
1312e3aa18adSMartin Matuskacleared out after transaction group sync.
Because of some overhead, it should be set to at least twice the size of
.Sy zfs_dirty_data_max
.No to prevent harming normal write throughput .
It should also be smaller than the size of the slog device, if one is present.
13173f9d360cSMartin Matuska.Pp
13183f9d360cSMartin MatuskaDefaults to
13193f9d360cSMartin Matuska.Sy zfs_dirty_data_max*2
13203f9d360cSMartin Matuska.
13213ff01b23SMartin Matuska.It Sy zfs_fallocate_reserve_percent Ns = Ns Sy 110 Ns % Pq uint
13223ff01b23SMartin MatuskaSince ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
13233ff01b23SMartin Matuskapreallocated for a file in order to guarantee that later writes will not
13243ff01b23SMartin Matuskarun out of space.
13253ff01b23SMartin MatuskaInstead,
13263ff01b23SMartin Matuska.Xr fallocate 2
13273ff01b23SMartin Matuskaspace preallocation only checks that sufficient space is currently available
13283ff01b23SMartin Matuskain the pool or the user's project quota allocation,
13293ff01b23SMartin Matuskaand then creates a sparse file of the requested size.
13303ff01b23SMartin MatuskaThe requested space is multiplied by
13313ff01b23SMartin Matuska.Sy zfs_fallocate_reserve_percent
13323ff01b23SMartin Matuskato allow additional space for indirect blocks and other internal metadata.
13333ff01b23SMartin MatuskaSetting this to
13343ff01b23SMartin Matuska.Sy 0
13353ff01b23SMartin Matuskadisables support for
13363ff01b23SMartin Matuska.Xr fallocate 2
13373ff01b23SMartin Matuskaand causes it to return
13383ff01b23SMartin Matuska.Sy EOPNOTSUPP .
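.Pp
For example, with the default of 110%, an
.Xr fallocate 2
request for 10 GiB succeeds only if at least 11 GiB of pool or project quota
space is currently available; no blocks are actually preallocated.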
13393ff01b23SMartin Matuska.
13403ff01b23SMartin Matuska.It Sy zfs_fletcher_4_impl Ns = Ns Sy fastest Pq string
13413ff01b23SMartin MatuskaSelect a fletcher 4 implementation.
13423ff01b23SMartin Matuska.Pp
13433ff01b23SMartin MatuskaSupported selectors are:
13443ff01b23SMartin Matuska.Sy fastest , scalar , sse2 , ssse3 , avx2 , avx512f , avx512bw ,
13453ff01b23SMartin Matuska.No and Sy aarch64_neon .
13463ff01b23SMartin MatuskaAll except
13473ff01b23SMartin Matuska.Sy fastest No and Sy scalar
13483ff01b23SMartin Matuskarequire instruction set extensions to be available,
13493ff01b23SMartin Matuskaand will only appear if ZFS detects that they are present at runtime.
13503ff01b23SMartin MatuskaIf multiple implementations of fletcher 4 are available, the
13513ff01b23SMartin Matuska.Sy fastest
13523ff01b23SMartin Matuskawill be chosen using a micro benchmark.
13533ff01b23SMartin MatuskaSelecting
13543ff01b23SMartin Matuska.Sy scalar
13553ff01b23SMartin Matuskaresults in the original CPU-based calculation being used.
13563ff01b23SMartin MatuskaSelecting any option other than
13573ff01b23SMartin Matuska.Sy fastest No or Sy scalar
13583ff01b23SMartin Matuskaresults in vector instructions
13593ff01b23SMartin Matuskafrom the respective CPU instruction set being used.
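.Pp
On Linux, the benchmark results and current selection can typically be
inspected, and the implementation changed at runtime (paths assumed):
.Bd -literal -compact
# Show micro benchmark results and the selected implementation.
cat /proc/spl/kstat/zfs/fletcher_4_bench
# Force a specific implementation.
echo sse2 > /sys/module/zfs/parameters/zfs_fletcher_4_impl
.Ed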
13603ff01b23SMartin Matuska.
136147bb16f8SMartin Matuska.It Sy zfs_bclone_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
13625c65a0a9SMartin MatuskaEnables access to the block cloning feature.
If this setting is 0, then even if
.Sy feature@block_cloning
is enabled,
13645c65a0a9SMartin Matuskausing functions and system calls that attempt to clone blocks will act as
13655c65a0a9SMartin Matuskathough the feature is disabled.
136647bb16f8SMartin Matuska.
1367a4e5e010SMartin Matuska.It Sy zfs_bclone_wait_dirty Ns = Ns Sy 0 Ns | Ns 1 Pq int
1368a4e5e010SMartin MatuskaWhen set to 1 the FICLONE and FICLONERANGE ioctls wait for dirty data to be
1369a4e5e010SMartin Matuskawritten to disk.
1370a4e5e010SMartin MatuskaThis allows the clone operation to reliably succeed when a file is
1371a4e5e010SMartin Matuskamodified and then immediately cloned.
1372a4e5e010SMartin MatuskaFor small files this may be slower than making a copy of the file.
1373a4e5e010SMartin MatuskaTherefore, this setting defaults to 0 which causes a clone operation to
1374a4e5e010SMartin Matuskaimmediately fail when encountering a dirty block.
1375a4e5e010SMartin Matuska.
1376c7046f76SMartin Matuska.It Sy zfs_blake3_impl Ns = Ns Sy fastest Pq string
1377c7046f76SMartin MatuskaSelect a BLAKE3 implementation.
1378c7046f76SMartin Matuska.Pp
1379c7046f76SMartin MatuskaSupported selectors are:
1380c7046f76SMartin Matuska.Sy cycle , fastest , generic , sse2 , sse41 , avx2 , avx512 .
1381c7046f76SMartin MatuskaAll except
1382c7046f76SMartin Matuska.Sy cycle , fastest No and Sy generic
1383c7046f76SMartin Matuskarequire instruction set extensions to be available,
1384c7046f76SMartin Matuskaand will only appear if ZFS detects that they are present at runtime.
1385c7046f76SMartin MatuskaIf multiple implementations of BLAKE3 are available, the
.Sy fastest
will be chosen using a micro benchmark.
You can see the benchmark results by reading this kstat file:
1388c7046f76SMartin Matuska.Pa /proc/spl/kstat/zfs/chksum_bench .
1389c7046f76SMartin Matuska.
13903ff01b23SMartin Matuska.It Sy zfs_free_bpobj_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
13913ff01b23SMartin MatuskaEnable/disable the processing of the free_bpobj object.
13923ff01b23SMartin Matuska.
1393dbd5678dSMartin Matuska.It Sy zfs_async_block_max_blocks Ns = Ns Sy UINT64_MAX Po unlimited Pc Pq u64
13943ff01b23SMartin MatuskaMaximum number of blocks freed in a single TXG.
13953ff01b23SMartin Matuska.
1396dbd5678dSMartin Matuska.It Sy zfs_max_async_dedup_frees Ns = Ns Sy 100000 Po 10^5 Pc Pq u64
13973ff01b23SMartin MatuskaMaximum number of dedup blocks freed in a single TXG.
13983ff01b23SMartin Matuska.
1399be181ee2SMartin Matuska.It Sy zfs_vdev_async_read_max_active Ns = Ns Sy 3 Pq uint
14003ff01b23SMartin MatuskaMaximum asynchronous read I/O operations active to each device.
14013ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14023ff01b23SMartin Matuska.
1403be181ee2SMartin Matuska.It Sy zfs_vdev_async_read_min_active Ns = Ns Sy 1 Pq uint
Minimum asynchronous read I/O operations active to each device.
14053ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14063ff01b23SMartin Matuska.
1407be181ee2SMartin Matuska.It Sy zfs_vdev_async_write_active_max_dirty_percent Ns = Ns Sy 60 Ns % Pq uint
14083ff01b23SMartin MatuskaWhen the pool has more than this much dirty data, use
14093ff01b23SMartin Matuska.Sy zfs_vdev_async_write_max_active
14103ff01b23SMartin Matuskato limit active async writes.
14113ff01b23SMartin MatuskaIf the dirty data is between the minimum and maximum,
14123ff01b23SMartin Matuskathe active I/O limit is linearly interpolated.
14133ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14143ff01b23SMartin Matuska.
1415be181ee2SMartin Matuska.It Sy zfs_vdev_async_write_active_min_dirty_percent Ns = Ns Sy 30 Ns % Pq uint
14163ff01b23SMartin MatuskaWhen the pool has less than this much dirty data, use
14173ff01b23SMartin Matuska.Sy zfs_vdev_async_write_min_active
14183ff01b23SMartin Matuskato limit active async writes.
14193ff01b23SMartin MatuskaIf the dirty data is between the minimum and maximum,
14203ff01b23SMartin Matuskathe active I/O limit is linearly
14213ff01b23SMartin Matuskainterpolated.
14223ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
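.Pp
As a worked example of the interpolation with default settings: with dirty
data at 45% of
.Sy zfs_dirty_data_max
(midway between 30% and 60%), the limit is midway between
.Sy zfs_vdev_async_write_min_active No = Sy 2
and
.Sy zfs_vdev_async_write_max_active No = Sy 10 ,
that is, 6 active async writes.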
14233ff01b23SMartin Matuska.
1424bb2d13b6SMartin Matuska.It Sy zfs_vdev_async_write_max_active Ns = Ns Sy 10 Pq uint
14253ff01b23SMartin MatuskaMaximum asynchronous write I/O operations active to each device.
14263ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14273ff01b23SMartin Matuska.
1428be181ee2SMartin Matuska.It Sy zfs_vdev_async_write_min_active Ns = Ns Sy 2 Pq uint
14293ff01b23SMartin MatuskaMinimum asynchronous write I/O operations active to each device.
14303ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14313ff01b23SMartin Matuska.Pp
14323ff01b23SMartin MatuskaLower values are associated with better latency on rotational media but poorer
14333ff01b23SMartin Matuskaresilver performance.
14343ff01b23SMartin MatuskaThe default value of
14353ff01b23SMartin Matuska.Sy 2
14363ff01b23SMartin Matuskawas chosen as a compromise.
14373ff01b23SMartin MatuskaA value of
14383ff01b23SMartin Matuska.Sy 3
has been shown to further improve resilver performance, at the cost of
increased latency.
14413ff01b23SMartin Matuska.
1442be181ee2SMartin Matuska.It Sy zfs_vdev_initializing_max_active Ns = Ns Sy 1 Pq uint
14433ff01b23SMartin MatuskaMaximum initializing I/O operations active to each device.
14443ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14453ff01b23SMartin Matuska.
1446be181ee2SMartin Matuska.It Sy zfs_vdev_initializing_min_active Ns = Ns Sy 1 Pq uint
14473ff01b23SMartin MatuskaMinimum initializing I/O operations active to each device.
14483ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14493ff01b23SMartin Matuska.
1450be181ee2SMartin Matuska.It Sy zfs_vdev_max_active Ns = Ns Sy 1000 Pq uint
14513ff01b23SMartin MatuskaThe maximum number of I/O operations active to each device.
14523ff01b23SMartin MatuskaIdeally, this will be at least the sum of each queue's
14533ff01b23SMartin Matuska.Sy max_active .
14543ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14553ff01b23SMartin Matuska.
1456dbd5678dSMartin Matuska.It Sy zfs_vdev_open_timeout_ms Ns = Ns Sy 1000 Pq uint
1457dbd5678dSMartin MatuskaTimeout value to wait before determining a device is missing
1458dbd5678dSMartin Matuskaduring import.
1459dbd5678dSMartin MatuskaThis is helpful for transient missing paths due
1460dbd5678dSMartin Matuskato links being briefly removed and recreated in response to
1461dbd5678dSMartin Matuskaudev events.
1462dbd5678dSMartin Matuska.
1463be181ee2SMartin Matuska.It Sy zfs_vdev_rebuild_max_active Ns = Ns Sy 3 Pq uint
14643ff01b23SMartin MatuskaMaximum sequential resilver I/O operations active to each device.
14653ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14663ff01b23SMartin Matuska.
1467be181ee2SMartin Matuska.It Sy zfs_vdev_rebuild_min_active Ns = Ns Sy 1 Pq uint
14683ff01b23SMartin MatuskaMinimum sequential resilver I/O operations active to each device.
14693ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14703ff01b23SMartin Matuska.
1471be181ee2SMartin Matuska.It Sy zfs_vdev_removal_max_active Ns = Ns Sy 2 Pq uint
14723ff01b23SMartin MatuskaMaximum removal I/O operations active to each device.
14733ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14743ff01b23SMartin Matuska.
1475be181ee2SMartin Matuska.It Sy zfs_vdev_removal_min_active Ns = Ns Sy 1 Pq uint
14763ff01b23SMartin MatuskaMinimum removal I/O operations active to each device.
14773ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14783ff01b23SMartin Matuska.
1479be181ee2SMartin Matuska.It Sy zfs_vdev_scrub_max_active Ns = Ns Sy 2 Pq uint
14803ff01b23SMartin MatuskaMaximum scrub I/O operations active to each device.
14813ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14823ff01b23SMartin Matuska.
1483be181ee2SMartin Matuska.It Sy zfs_vdev_scrub_min_active Ns = Ns Sy 1 Pq uint
14843ff01b23SMartin MatuskaMinimum scrub I/O operations active to each device.
14853ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14863ff01b23SMartin Matuska.
1487be181ee2SMartin Matuska.It Sy zfs_vdev_sync_read_max_active Ns = Ns Sy 10 Pq uint
14883ff01b23SMartin MatuskaMaximum synchronous read I/O operations active to each device.
14893ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14903ff01b23SMartin Matuska.
1491be181ee2SMartin Matuska.It Sy zfs_vdev_sync_read_min_active Ns = Ns Sy 10 Pq uint
14923ff01b23SMartin MatuskaMinimum synchronous read I/O operations active to each device.
14933ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14943ff01b23SMartin Matuska.
1495be181ee2SMartin Matuska.It Sy zfs_vdev_sync_write_max_active Ns = Ns Sy 10 Pq uint
14963ff01b23SMartin MatuskaMaximum synchronous write I/O operations active to each device.
14973ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
14983ff01b23SMartin Matuska.
1499be181ee2SMartin Matuska.It Sy zfs_vdev_sync_write_min_active Ns = Ns Sy 10 Pq uint
15003ff01b23SMartin MatuskaMinimum synchronous write I/O operations active to each device.
15013ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
15023ff01b23SMartin Matuska.
1503be181ee2SMartin Matuska.It Sy zfs_vdev_trim_max_active Ns = Ns Sy 2 Pq uint
15043ff01b23SMartin MatuskaMaximum trim/discard I/O operations active to each device.
15053ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
15063ff01b23SMartin Matuska.
1507be181ee2SMartin Matuska.It Sy zfs_vdev_trim_min_active Ns = Ns Sy 1 Pq uint
15083ff01b23SMartin MatuskaMinimum trim/discard I/O operations active to each device.
15093ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
15103ff01b23SMartin Matuska.
1511be181ee2SMartin Matuska.It Sy zfs_vdev_nia_delay Ns = Ns Sy 5 Pq uint
15123ff01b23SMartin MatuskaFor non-interactive I/O (scrub, resilver, removal, initialize and rebuild),
15133ff01b23SMartin Matuskathe number of concurrently-active I/O operations is limited to
15143ff01b23SMartin Matuska.Sy zfs_*_min_active ,
15153ff01b23SMartin Matuskaunless the vdev is "idle".
1516e92ffd9bSMartin MatuskaWhen there are no interactive I/O operations active (synchronous or otherwise),
15173ff01b23SMartin Matuskaand
15183ff01b23SMartin Matuska.Sy zfs_vdev_nia_delay
15193ff01b23SMartin Matuskaoperations have completed since the last interactive operation,
15203ff01b23SMartin Matuskathen the vdev is considered to be "idle",
15213ff01b23SMartin Matuskaand the number of concurrently-active non-interactive operations is increased to
15223ff01b23SMartin Matuska.Sy zfs_*_max_active .
15233ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
15243ff01b23SMartin Matuska.
1525be181ee2SMartin Matuska.It Sy zfs_vdev_nia_credit Ns = Ns Sy 5 Pq uint
Some HDDs tend to prioritize sequential I/O so strongly that concurrent
15273ff01b23SMartin Matuskarandom I/O latency reaches several seconds.
15283ff01b23SMartin MatuskaOn some HDDs this happens even if sequential I/O operations
15293ff01b23SMartin Matuskaare submitted one at a time, and so setting
15303ff01b23SMartin Matuska.Sy zfs_*_max_active Ns = Sy 1
15313ff01b23SMartin Matuskadoes not help.
15323ff01b23SMartin MatuskaTo prevent non-interactive I/O, like scrub,
15333ff01b23SMartin Matuskafrom monopolizing the device, no more than
.Sy zfs_vdev_nia_credit
operations can be sent
15353ff01b23SMartin Matuskawhile there are outstanding incomplete interactive operations.
15363ff01b23SMartin MatuskaThis enforced wait ensures the HDD services the interactive I/O
15373ff01b23SMartin Matuskawithin a reasonable amount of time.
15383ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
15393ff01b23SMartin Matuska.
1540dbd5678dSMartin Matuska.It Sy zfs_vdev_failfast_mask Ns = Ns Sy 1 Pq uint
1541dbd5678dSMartin MatuskaDefines if the driver should retire on a given error type.
1542dbd5678dSMartin MatuskaThe following options may be bitwise-ored together:
1543dbd5678dSMartin Matuska.TS
1544dbd5678dSMartin Matuskabox;
1545dbd5678dSMartin Matuskalbz r l l .
1546dbd5678dSMartin Matuska	Value	Name	Description
1547dbd5678dSMartin Matuska_
	1	Device	No driver retries on device errors.
1549dbd5678dSMartin Matuska	2	Transport	No driver retries on transport errors.
1550dbd5678dSMartin Matuska	4	Driver	No driver retries on driver errors.
1551dbd5678dSMartin Matuska.TE
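.Pp
For example, a value of
.Sy 3
(bits 1 and 2) disables driver retries on device and transport errors,
while retaining them for driver errors.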
1552dbd5678dSMartin Matuska.
1553783d3ff6SMartin Matuska.It Sy zfs_vdev_disk_max_segs Ns = Ns Sy 0 Pq uint
1554783d3ff6SMartin MatuskaMaximum number of segments to add to a BIO (min 4).
1555783d3ff6SMartin MatuskaIf this is higher than the maximum allowed by the device queue or the kernel
1556783d3ff6SMartin Matuskaitself, it will be clamped.
1557783d3ff6SMartin MatuskaSetting it to zero will cause the kernel's ideal size to be used.
1558783d3ff6SMartin MatuskaThis parameter only applies on Linux.
1559783d3ff6SMartin MatuskaThis parameter is ignored if
1560783d3ff6SMartin Matuska.Sy zfs_vdev_disk_classic Ns = Ns Sy 1 .
1561783d3ff6SMartin Matuska.
1562783d3ff6SMartin Matuska.It Sy zfs_vdev_disk_classic Ns = Ns Sy 0 Ns | Ns 1 Pq uint
1563783d3ff6SMartin MatuskaIf set to 1, OpenZFS will submit IO to Linux using the method it used in 2.2
1564783d3ff6SMartin Matuskaand earlier.
1565783d3ff6SMartin MatuskaThis "classic" method has known issues with highly fragmented IO requests and
1566783d3ff6SMartin Matuskais slower on many workloads, but it has been in use for many years and is known
1567783d3ff6SMartin Matuskato be very stable.
If you set this parameter, please also open a bug report describing why you
did so,
1569783d3ff6SMartin Matuskaincluding the workload involved and any error messages.
1570783d3ff6SMartin Matuska.Pp
1571783d3ff6SMartin MatuskaThis parameter and the classic submission method will be removed once we have
1572783d3ff6SMartin Matuskatotal confidence in the new method.
1573783d3ff6SMartin Matuska.Pp
1574783d3ff6SMartin MatuskaThis parameter only applies on Linux, and can only be set at module load time.
1575783d3ff6SMartin Matuska.
15763ff01b23SMartin Matuska.It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
15773ff01b23SMartin MatuskaTime before expiring
15783ff01b23SMartin Matuska.Pa .zfs/snapshot .
15793ff01b23SMartin Matuska.
15803ff01b23SMartin Matuska.It Sy zfs_admin_snapshot Ns = Ns Sy 0 Ns | Ns 1 Pq int
15813ff01b23SMartin MatuskaAllow the creation, removal, or renaming of entries in the
15823ff01b23SMartin Matuska.Sy .zfs/snapshot
15833ff01b23SMartin Matuskadirectory to cause the creation, destruction, or renaming of snapshots.
15843ff01b23SMartin MatuskaWhen enabled, this functionality works both locally and over NFS exports
15853ff01b23SMartin Matuskawhich have the
15863ff01b23SMartin Matuska.Em no_root_squash
15873ff01b23SMartin Matuskaoption set.
15883ff01b23SMartin Matuska.
15897a7741afSMartin Matuska.It Sy zfs_snapshot_no_setuid Ns = Ns Sy 0 Ns | Ns 1 Pq int
15907a7741afSMartin MatuskaWhether to disable
15917a7741afSMartin Matuska.Em setuid/setgid
15927a7741afSMartin Matuskasupport for snapshot mounts triggered by access to the
15937a7741afSMartin Matuska.Sy .zfs/snapshot
15947a7741afSMartin Matuskadirectory by setting the
15957a7741afSMartin Matuska.Em nosuid
15967a7741afSMartin Matuskamount option.
15977a7741afSMartin Matuska.
15983ff01b23SMartin Matuska.It Sy zfs_flags Ns = Ns Sy 0 Pq int
15993ff01b23SMartin MatuskaSet additional debugging flags.
16003ff01b23SMartin MatuskaThe following flags may be bitwise-ored together:
16013ff01b23SMartin Matuska.TS
16023ff01b23SMartin Matuskabox;
16033ff01b23SMartin Matuskalbz r l l .
1604dbd5678dSMartin Matuska	Value	Name	Description
16053ff01b23SMartin Matuska_
16063ff01b23SMartin Matuska	1	ZFS_DEBUG_DPRINTF	Enable dprintf entries in the debug log.
16073ff01b23SMartin Matuska*	2	ZFS_DEBUG_DBUF_VERIFY	Enable extra dbuf verifications.
16083ff01b23SMartin Matuska*	4	ZFS_DEBUG_DNODE_VERIFY	Enable extra dnode verifications.
16093ff01b23SMartin Matuska	8	ZFS_DEBUG_SNAPNAMES	Enable snapshot name verification.
161015f0b8c3SMartin Matuska*	16	ZFS_DEBUG_MODIFY	Check for illegally modified ARC buffers.
16113ff01b23SMartin Matuska	64	ZFS_DEBUG_ZIO_FREE	Enable verification of block frees.
16123ff01b23SMartin Matuska	128	ZFS_DEBUG_HISTOGRAM_VERIFY	Enable extra spacemap histogram verifications.
16133ff01b23SMartin Matuska	256	ZFS_DEBUG_METASLAB_VERIFY	Verify space accounting on disk matches in-memory \fBrange_trees\fP.
16143ff01b23SMartin Matuska	512	ZFS_DEBUG_SET_ERROR	Enable \fBSET_ERROR\fP and dprintf entries in the debug log.
16153ff01b23SMartin Matuska	1024	ZFS_DEBUG_INDIRECT_REMAP	Verify split blocks created by device removal.
16163ff01b23SMartin Matuska	2048	ZFS_DEBUG_TRIM	Verify TRIM ranges are always within the allocatable range tree.
16173ff01b23SMartin Matuska	4096	ZFS_DEBUG_LOG_SPACEMAP	Verify that the log summary is consistent with the spacemap log
16183ff01b23SMartin Matuska			       and enable \fBzfs_dbgmsgs\fP for metaslab loading and flushing.
1619*61145dc2SMartin Matuska	8192	ZFS_DEBUG_METASLAB_ALLOC	Enable debugging messages when allocations fail.
1620*61145dc2SMartin Matuska	16384	ZFS_DEBUG_BRT	Enable BRT-related debugging messages.
	32768	ZFS_DEBUG_RAIDZ_RECONSTRUCT	Enable debugging messages for raidz reconstruction.
1622*61145dc2SMartin Matuska	65536	ZFS_DEBUG_DDT	Enable DDT-related debugging messages.
16233ff01b23SMartin Matuska.TE
16243ff01b23SMartin Matuska.Sy \& * No Requires debug build .
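.Pp
Flag values are combined by bitwise OR; a minimal sketch (Linux module
parameter path assumed):
.Bd -literal -compact
# 1 | 512 = 513: enable dprintf and SET_ERROR entries in the debug log.
echo 513 > /sys/module/zfs/parameters/zfs_flags
.Ed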
16253ff01b23SMartin Matuska.
1626c7046f76SMartin Matuska.It Sy zfs_btree_verify_intensity Ns = Ns Sy 0 Pq uint
1627c7046f76SMartin MatuskaEnables btree verification.
1628c6767dc1SMartin MatuskaThe following settings are cumulative:
1629c7046f76SMartin Matuska.TS
1630c7046f76SMartin Matuskabox;
1631c7046f76SMartin Matuskalbz r l l .
1632c7046f76SMartin Matuska	Value	Description
1633c7046f76SMartin Matuska
1634c7046f76SMartin Matuska	1	Verify height.
1635c7046f76SMartin Matuska	2	Verify pointers from children to parent.
1636c7046f76SMartin Matuska	3	Verify element counts.
1637c7046f76SMartin Matuska	4	Verify element order. (expensive)
1638c7046f76SMartin Matuska*	5	Verify unused memory is poisoned. (expensive)
1639c7046f76SMartin Matuska.TE
1640c7046f76SMartin Matuska.Sy \& * No Requires debug build .
1641c7046f76SMartin Matuska.
16423ff01b23SMartin Matuska.It Sy zfs_free_leak_on_eio Ns = Ns Sy 0 Ns | Ns 1 Pq int
16433ff01b23SMartin MatuskaIf destroy encounters an
16443ff01b23SMartin Matuska.Sy EIO
16453ff01b23SMartin Matuskawhile reading metadata (e.g. indirect blocks),
16463ff01b23SMartin Matuskaspace referenced by the missing metadata can not be freed.
16473ff01b23SMartin MatuskaNormally this causes the background destroy to become "stalled",
16483ff01b23SMartin Matuskaas it is unable to make forward progress.
16493ff01b23SMartin MatuskaWhile in this stalled state, all remaining space to free
16503ff01b23SMartin Matuskafrom the error-encountering filesystem is "temporarily leaked".
16513ff01b23SMartin MatuskaSet this flag to cause it to ignore the
16523ff01b23SMartin Matuska.Sy EIO ,
16533ff01b23SMartin Matuskapermanently leak the space from indirect blocks that can not be read,
16543ff01b23SMartin Matuskaand continue to free everything else that it can.
16553ff01b23SMartin Matuska.Pp
16563ff01b23SMartin MatuskaThe default "stalling" behavior is useful if the storage partially
16573ff01b23SMartin Matuskafails (i.e. some but not all I/O operations fail), and then later recovers.
16583ff01b23SMartin MatuskaIn this case, we will be able to continue pool operations while it is
16593ff01b23SMartin Matuskapartially failed, and when it recovers, we can continue to free the
16603ff01b23SMartin Matuskaspace, with no leaks.
16613ff01b23SMartin MatuskaNote, however, that this case is actually fairly rare.
16623ff01b23SMartin Matuska.Pp
16633ff01b23SMartin MatuskaTypically pools either
16643ff01b23SMartin Matuska.Bl -enum -compact -offset 4n -width "1."
16653ff01b23SMartin Matuska.It
16663ff01b23SMartin Matuskafail completely (but perhaps temporarily,
16673ff01b23SMartin Matuskae.g. due to a top-level vdev going offline), or
16683ff01b23SMartin Matuska.It
16693ff01b23SMartin Matuskahave localized, permanent errors (e.g. disk returns the wrong data
16703ff01b23SMartin Matuskadue to bit flip or firmware bug).
16713ff01b23SMartin Matuska.El
16723ff01b23SMartin MatuskaIn the former case, this setting does not matter because the
16733ff01b23SMartin Matuskapool will be suspended and the sync thread will not be able to make
16743ff01b23SMartin Matuskaforward progress regardless.
16753ff01b23SMartin MatuskaIn the latter, because the error is permanent, the best we can do
16763ff01b23SMartin Matuskais leak the minimum amount of space,
16773ff01b23SMartin Matuskawhich is what setting this flag will do.
16783ff01b23SMartin MatuskaIt is therefore reasonable for this flag to normally be set,
16793ff01b23SMartin Matuskabut we chose the more conservative approach of not setting it,
16803ff01b23SMartin Matuskaso that there is no possibility of
16813ff01b23SMartin Matuskaleaking space in the "partial temporary" failure case.
16823ff01b23SMartin Matuska.
1683be181ee2SMartin Matuska.It Sy zfs_free_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq uint
16843ff01b23SMartin MatuskaDuring a
16853ff01b23SMartin Matuska.Nm zfs Cm destroy
16863ff01b23SMartin Matuskaoperation using the
16873ff01b23SMartin Matuska.Sy async_destroy
16883ff01b23SMartin Matuskafeature,
16893ff01b23SMartin Matuskaa minimum of this much time will be spent working on freeing blocks per TXG.
16903ff01b23SMartin Matuska.
1691be181ee2SMartin Matuska.It Sy zfs_obsolete_min_time_ms Ns = Ns Sy 500 Ns ms Pq uint
16923ff01b23SMartin MatuskaSimilar to
16933ff01b23SMartin Matuska.Sy zfs_free_min_time_ms ,
16943ff01b23SMartin Matuskabut for cleanup of old indirection records for removed vdevs.
16953ff01b23SMartin Matuska.
1696dbd5678dSMartin Matuska.It Sy zfs_immediate_write_sz Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq s64
16973ff01b23SMartin MatuskaLargest data block to write to the ZIL.
16983ff01b23SMartin MatuskaLarger blocks will be treated as if the dataset being written to had the
16993ff01b23SMartin Matuska.Sy logbias Ns = Ns Sy throughput
17003ff01b23SMartin Matuskaproperty set.
17013ff01b23SMartin Matuska.
1702dbd5678dSMartin Matuska.It Sy zfs_initialize_value Ns = Ns Sy 16045690984833335022 Po 0xDEADBEEFDEADBEEE Pc Pq u64
17033ff01b23SMartin MatuskaPattern written to vdev free space by
17043ff01b23SMartin Matuska.Xr zpool-initialize 8 .
17053ff01b23SMartin Matuska.
1706dbd5678dSMartin Matuska.It Sy zfs_initialize_chunk_size Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
17073ff01b23SMartin MatuskaSize of writes used by
17083ff01b23SMartin Matuska.Xr zpool-initialize 8 .
17093ff01b23SMartin MatuskaThis option is used by the test suite.
17103ff01b23SMartin Matuska.
1711dbd5678dSMartin Matuska.It Sy zfs_livelist_max_entries Ns = Ns Sy 500000 Po 5*10^5 Pc Pq u64
17123ff01b23SMartin MatuskaThe threshold size (in block pointers) at which we create a new sub-livelist.
17133ff01b23SMartin MatuskaLarger sublists are more costly from a memory perspective but the fewer
17143ff01b23SMartin Matuskasublists there are, the lower the cost of insertion.
17153ff01b23SMartin Matuska.
17163ff01b23SMartin Matuska.It Sy zfs_livelist_min_percent_shared Ns = Ns Sy 75 Ns % Pq int
17173ff01b23SMartin MatuskaIf the amount of shared space between a snapshot and its clone drops below
17183ff01b23SMartin Matuskathis threshold, the clone turns off the livelist and reverts to the old
17193ff01b23SMartin Matuskadeletion method.
This is in place because livelists no longer give us a benefit
once a clone has been overwritten enough.
17223ff01b23SMartin Matuska.
17233ff01b23SMartin Matuska.It Sy zfs_livelist_condense_new_alloc Ns = Ns Sy 0 Pq int
17243ff01b23SMartin MatuskaIncremented each time an extra ALLOC blkptr is added to a livelist entry while
17253ff01b23SMartin Matuskait is being condensed.
17263ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
17273ff01b23SMartin Matuska.
17283ff01b23SMartin Matuska.It Sy zfs_livelist_condense_sync_cancel Ns = Ns Sy 0 Pq int
17293ff01b23SMartin MatuskaIncremented each time livelist condensing is canceled while in
17303ff01b23SMartin Matuska.Fn spa_livelist_condense_sync .
17313ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
17323ff01b23SMartin Matuska.
17333ff01b23SMartin Matuska.It Sy zfs_livelist_condense_sync_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
17343ff01b23SMartin MatuskaWhen set, the livelist condense process pauses indefinitely before
1735e92ffd9bSMartin Matuskaexecuting the synctask \(em
17363ff01b23SMartin Matuska.Fn spa_livelist_condense_sync .
17373ff01b23SMartin MatuskaThis option is used by the test suite to trigger race conditions.
17383ff01b23SMartin Matuska.
17393ff01b23SMartin Matuska.It Sy zfs_livelist_condense_zthr_cancel Ns = Ns Sy 0 Pq int
17403ff01b23SMartin MatuskaIncremented each time livelist condensing is canceled while in
17413ff01b23SMartin Matuska.Fn spa_livelist_condense_cb .
17423ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
17433ff01b23SMartin Matuska.
17443ff01b23SMartin Matuska.It Sy zfs_livelist_condense_zthr_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
17453ff01b23SMartin MatuskaWhen set, the livelist condense process pauses indefinitely before
17463ff01b23SMartin Matuskaexecuting the open context condensing work in
17473ff01b23SMartin Matuska.Fn spa_livelist_condense_cb .
17483ff01b23SMartin MatuskaThis option is used by the test suite to trigger race conditions.
17493ff01b23SMartin Matuska.
1750dbd5678dSMartin Matuska.It Sy zfs_lua_max_instrlimit Ns = Ns Sy 100000000 Po 10^8 Pc Pq u64
17513ff01b23SMartin MatuskaThe maximum execution time limit that can be set for a ZFS channel program,
17523ff01b23SMartin Matuskaspecified as a number of Lua instructions.
17533ff01b23SMartin Matuska.
1754dbd5678dSMartin Matuska.It Sy zfs_lua_max_memlimit Ns = Ns Sy 104857600 Po 100 MiB Pc Pq u64
17553ff01b23SMartin MatuskaThe maximum memory limit that can be set for a ZFS channel program, specified
17563ff01b23SMartin Matuskain bytes.
17573ff01b23SMartin Matuska.
17583ff01b23SMartin Matuska.It Sy zfs_max_dataset_nesting Ns = Ns Sy 50 Pq int
17593ff01b23SMartin MatuskaThe maximum depth of nested datasets.
17603ff01b23SMartin MatuskaThis value can be tuned temporarily to
17613ff01b23SMartin Matuskafix existing datasets that exceed the predefined limit.
17623ff01b23SMartin Matuska.
1763dbd5678dSMartin Matuska.It Sy zfs_max_log_walking Ns = Ns Sy 5 Pq u64
17643ff01b23SMartin MatuskaThe number of past TXGs that the flushing algorithm of the log spacemap
17653ff01b23SMartin Matuskafeature uses to estimate incoming log blocks.
17663ff01b23SMartin Matuska.
1767dbd5678dSMartin Matuska.It Sy zfs_max_logsm_summary_length Ns = Ns Sy 10 Pq u64
17683ff01b23SMartin MatuskaMaximum number of rows allowed in the summary of the spacemap log.
17693ff01b23SMartin Matuska.
1770be181ee2SMartin Matuska.It Sy zfs_max_recordsize Ns = Ns Sy 16777216 Po 16 MiB Pc Pq uint
17713ff01b23SMartin MatuskaWe currently support block sizes from
1772716fd348SMartin Matuska.Em 512 Po 512 B Pc No to Em 16777216 Po 16 MiB Pc .
17733ff01b23SMartin MatuskaThe benefits of larger blocks, and thus larger I/O,
17743ff01b23SMartin Matuskaneed to be weighed against the cost of COWing a giant block to modify one byte.
17753ff01b23SMartin MatuskaAdditionally, very large blocks can have an impact on I/O latency,
17763ff01b23SMartin Matuskaand also potentially on the memory allocator.
1777716fd348SMartin MatuskaTherefore, we formerly forbade creating blocks larger than 1 MiB.
1778716fd348SMartin MatuskaLarger blocks could be created by changing this tunable,
17793ff01b23SMartin Matuskaand pools with larger blocks can always be imported and used,
17803ff01b23SMartin Matuskaregardless of this setting.
17817a7741afSMartin Matuska.Pp
17827a7741afSMartin MatuskaNote that it is still limited by default to
17837a7741afSMartin Matuska.Ar 1 MiB
17847a7741afSMartin Matuskaon x86_32, because Linux's
17857a7741afSMartin Matuska3/1 memory split doesn't leave much room for 16 MiB chunks.
17863ff01b23SMartin Matuska.
17873ff01b23SMartin Matuska.It Sy zfs_allow_redacted_dataset_mount Ns = Ns Sy 0 Ns | Ns 1 Pq int
17883ff01b23SMartin MatuskaAllow datasets received with redacted send/receive to be mounted.
17893ff01b23SMartin MatuskaNormally disabled because these datasets may be missing key data.
17903ff01b23SMartin Matuska.
1791dbd5678dSMartin Matuska.It Sy zfs_min_metaslabs_to_flush Ns = Ns Sy 1 Pq u64
17923ff01b23SMartin MatuskaMinimum number of metaslabs to flush per dirty TXG.
17933ff01b23SMartin Matuska.
1794b59a0cdeSMartin Matuska.It Sy zfs_metaslab_fragmentation_threshold Ns = Ns Sy 77 Ns % Pq uint
17953ff01b23SMartin MatuskaAllow metaslabs to keep their active state as long as their fragmentation
17963ff01b23SMartin Matuskapercentage is no more than this value.
17973ff01b23SMartin MatuskaAn active metaslab that exceeds this threshold
17983ff01b23SMartin Matuskawill no longer keep its active status, allowing better metaslabs to be selected.
17993ff01b23SMartin Matuska.
1800be181ee2SMartin Matuska.It Sy zfs_mg_fragmentation_threshold Ns = Ns Sy 95 Ns % Pq uint
18013ff01b23SMartin MatuskaMetaslab groups are considered eligible for allocations if their
18023ff01b23SMartin Matuskafragmentation metric (measured as a percentage) is less than or equal to
18033ff01b23SMartin Matuskathis value.
18043ff01b23SMartin MatuskaIf a metaslab group exceeds this threshold then it will be
18053ff01b23SMartin Matuskaskipped unless all metaslab groups within the metaslab class have also
18063ff01b23SMartin Matuskacrossed this threshold.
18073ff01b23SMartin Matuska.
1808be181ee2SMartin Matuska.It Sy zfs_mg_noalloc_threshold Ns = Ns Sy 0 Ns % Pq uint
18093ff01b23SMartin MatuskaDefines a threshold at which metaslab groups should be eligible for allocations.
18103ff01b23SMartin MatuskaThe value is expressed as a percentage of free space
18113ff01b23SMartin Matuskabeyond which a metaslab group is always eligible for allocations.
18123ff01b23SMartin MatuskaIf a metaslab group's free space is less than or equal to the
18133ff01b23SMartin Matuskathreshold, the allocator will avoid allocating to that group
18143ff01b23SMartin Matuskaunless all groups in the pool have reached the threshold.
18153ff01b23SMartin MatuskaOnce all groups have reached the threshold, all groups are allowed to accept
18163ff01b23SMartin Matuskaallocations.
18173ff01b23SMartin MatuskaThe default value of
18183ff01b23SMartin Matuska.Sy 0
1819bb2d13b6SMartin Matuskadisables the feature and causes all metaslab groups to be eligible for
1820bb2d13b6SMartin Matuskaallocations.
18213ff01b23SMartin Matuska.Pp
18223ff01b23SMartin MatuskaThis parameter allows one to deal with pools having heavily imbalanced
18233ff01b23SMartin Matuskavdevs such as would be the case when a new vdev has been added.
18243ff01b23SMartin MatuskaSetting the threshold to a non-zero percentage will stop allocations
18253ff01b23SMartin Matuskafrom being made to vdevs that aren't filled to the specified percentage
18263ff01b23SMartin Matuskaand allow lesser filled vdevs to acquire more allocations than they
18273ff01b23SMartin Matuskaotherwise would under the old
18283ff01b23SMartin Matuska.Sy zfs_mg_alloc_failures
18293ff01b23SMartin Matuskafacility.
18303ff01b23SMartin Matuska.
18313ff01b23SMartin Matuska.It Sy zfs_ddt_data_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
18323ff01b23SMartin MatuskaIf enabled, ZFS will place DDT data into the special allocation class.
18333ff01b23SMartin Matuska.
18343ff01b23SMartin Matuska.It Sy zfs_user_indirect_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
18353ff01b23SMartin MatuskaIf enabled, ZFS will place user data indirect blocks
18363ff01b23SMartin Matuskainto the special allocation class.
18373ff01b23SMartin Matuska.
1838be181ee2SMartin Matuska.It Sy zfs_multihost_history Ns = Ns Sy 0 Pq uint
1839bb2d13b6SMartin MatuskaHistorical statistics for this many latest multihost updates will be available
1840bb2d13b6SMartin Matuskain
18413ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /multihost .
18423ff01b23SMartin Matuska.
1843dbd5678dSMartin Matuska.It Sy zfs_multihost_interval Ns = Ns Sy 1000 Ns ms Po 1 s Pc Pq u64
18443ff01b23SMartin MatuskaUsed to control the frequency of multihost writes which are performed when the
18453ff01b23SMartin Matuska.Sy multihost
18463ff01b23SMartin Matuskapool property is on.
18473ff01b23SMartin MatuskaThis is one of the factors used to determine the
18483ff01b23SMartin Matuskalength of the activity check during import.
18493ff01b23SMartin Matuska.Pp
18503ff01b23SMartin MatuskaThe multihost write period is
1851e92ffd9bSMartin Matuska.Sy zfs_multihost_interval No / Sy leaf-vdevs .
18523ff01b23SMartin MatuskaOn average a multihost write will be issued for each leaf vdev
18533ff01b23SMartin Matuskaevery
18543ff01b23SMartin Matuska.Sy zfs_multihost_interval
18553ff01b23SMartin Matuskamilliseconds.
18563ff01b23SMartin MatuskaIn practice, the observed period can vary with the I/O load,
18573ff01b23SMartin Matuskaand this observed value is the delay which is stored in the uberblock.
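.Pp
The arithmetic can be sketched as follows (illustrative only; the helper
name and pool size are hypothetical, not part of OpenZFS):
.Bd -literal -compact
#include <stdint.h>

/* Expected pool-wide MMP write period: with the default interval of
 * 1000 ms and, e.g., 10 leaf vdevs, some leaf is written every 100 ms,
 * while each individual leaf is written every 1000 ms. */
static uint64_t
mmp_write_period_ms(uint64_t multihost_interval_ms, uint64_t leaf_vdevs)
{
    return (multihost_interval_ms / leaf_vdevs);
}
.Ed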
18583ff01b23SMartin Matuska.
18593ff01b23SMartin Matuska.It Sy zfs_multihost_import_intervals Ns = Ns Sy 20 Pq uint
18603ff01b23SMartin MatuskaUsed to control the duration of the activity test on import.
18613ff01b23SMartin MatuskaSmaller values of
18623ff01b23SMartin Matuska.Sy zfs_multihost_import_intervals
18633ff01b23SMartin Matuskawill reduce the import time but increase
18643ff01b23SMartin Matuskathe risk of failing to detect an active pool.
18653ff01b23SMartin MatuskaThe total activity check time is never allowed to drop below one second.
18663ff01b23SMartin Matuska.Pp
18673ff01b23SMartin MatuskaOn import the activity check waits a minimum amount of time determined by
1868e92ffd9bSMartin Matuska.Sy zfs_multihost_interval No \(mu Sy zfs_multihost_import_intervals ,
18693ff01b23SMartin Matuskaor the same product computed on the host which last had the pool imported,
18703ff01b23SMartin Matuskawhichever is greater.
18713ff01b23SMartin MatuskaThe activity check time may be further extended if the value of MMP
18723ff01b23SMartin Matuskadelay found in the best uberblock indicates actual multihost updates happened
18733ff01b23SMartin Matuskaat longer intervals than
18743ff01b23SMartin Matuska.Sy zfs_multihost_interval .
18753ff01b23SMartin MatuskaA minimum of
18763ff01b23SMartin Matuska.Em 100 ms
18773ff01b23SMartin Matuskais enforced.
18783ff01b23SMartin Matuska.Pp
18793ff01b23SMartin Matuska.Sy 0 No is equivalent to Sy 1 .
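.Pp
A minimal sketch of the documented lower bound on the activity check
(the helper name is hypothetical, not part of OpenZFS):
.Bd -literal -compact
#include <stdint.h>

/* Minimum wait on import: the larger of the local product and the
 * same product computed on the host which last imported the pool.
 * With the defaults: 1000 ms * 20 = 20 s. */
static uint64_t
mmp_activity_check_ms(uint64_t interval_ms, uint64_t import_intervals,
    uint64_t remote_product_ms)
{
    uint64_t local_ms = interval_ms * import_intervals;

    return (local_ms > remote_product_ms ? local_ms : remote_product_ms);
}
.Ed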
18803ff01b23SMartin Matuska.
18813ff01b23SMartin Matuska.It Sy zfs_multihost_fail_intervals Ns = Ns Sy 10 Pq uint
18823ff01b23SMartin MatuskaControls the behavior of the pool when multihost write failures or delays are
18833ff01b23SMartin Matuskadetected.
18843ff01b23SMartin Matuska.Pp
18853ff01b23SMartin MatuskaWhen
18863ff01b23SMartin Matuska.Sy 0 ,
18873ff01b23SMartin Matuskamultihost write failures or delays are ignored.
18883ff01b23SMartin MatuskaThe failures will still be reported to the ZED, which, depending on
18893ff01b23SMartin Matuskaits configuration, may take action such as suspending the pool or offlining a
18903ff01b23SMartin Matuskadevice.
18913ff01b23SMartin Matuska.Pp
18923ff01b23SMartin MatuskaOtherwise, the pool will be suspended if
1893e92ffd9bSMartin Matuska.Sy zfs_multihost_fail_intervals No \(mu Sy zfs_multihost_interval
18943ff01b23SMartin Matuskamilliseconds pass without a successful MMP write.
18953ff01b23SMartin MatuskaThis guarantees the activity test will see MMP writes if the pool is imported.
18963ff01b23SMartin Matuska.Sy 1 No is equivalent to Sy 2 ;
18973ff01b23SMartin Matuskathis is necessary to prevent the pool from being suspended
18983ff01b23SMartin Matuskadue to normal, small I/O latency variations.
18993ff01b23SMartin Matuska.
19003ff01b23SMartin Matuska.It Sy zfs_no_scrub_io Ns = Ns Sy 0 Ns | Ns 1 Pq int
19013ff01b23SMartin MatuskaSet to disable scrub I/O.
19023ff01b23SMartin MatuskaThis results in scrubs not actually scrubbing data and
19033ff01b23SMartin Matuskasimply doing a metadata crawl of the pool instead.
19043ff01b23SMartin Matuska.
19053ff01b23SMartin Matuska.It Sy zfs_no_scrub_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
19063ff01b23SMartin MatuskaSet to disable block prefetching for scrubs.
19073ff01b23SMartin Matuska.
19083ff01b23SMartin Matuska.It Sy zfs_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
19093ff01b23SMartin MatuskaDisable cache flush operations on disks when writing.
19103ff01b23SMartin MatuskaSetting this will cause pool corruption on power loss
19113ff01b23SMartin Matuskaif a volatile out-of-order write cache is enabled.
19123ff01b23SMartin Matuska.
19133ff01b23SMartin Matuska.It Sy zfs_nopwrite_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
19143ff01b23SMartin MatuskaAllow no-operation writes.
19153ff01b23SMartin MatuskaThe occurrence of nopwrites will further depend on other pool properties
19163ff01b23SMartin Matuska.Pq among others, the checksumming and compression algorithms .
19173ff01b23SMartin Matuska.
1918681ce946SMartin Matuska.It Sy zfs_dmu_offset_next_sync Ns = Ns Sy 1 Ns | Ns 0 Pq int
19193ff01b23SMartin MatuskaEnable forcing TXG sync to find holes.
1920681ce946SMartin MatuskaWhen enabled, this forces ZFS to sync data when
19213ff01b23SMartin Matuska.Sy SEEK_HOLE No or Sy SEEK_DATA
1922681ce946SMartin Matuskaflags are used, allowing holes in a file to be accurately reported.
1923681ce946SMartin MatuskaWhen disabled, holes will not be reported in recently dirtied files.
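.Pp
Hole reporting is exercised through the standard
.Xr lseek 2
interface; the following sketch (illustrative only, with error handling
omitted) prints the offset of the first hole in a file:
.Bd -literal -compact
#define _GNU_SOURCE     /* SEEK_HOLE/SEEK_DATA on Linux */
#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
    int fd = open(argv[1], O_RDONLY);

    /* With this tunable enabled, dirty data is synced first, so the
     * offset is accurate; when disabled, a recently dirtied file may
     * report no holes before its data reaches disk. */
    off_t hole = lseek(fd, 0, SEEK_HOLE);

    printf("first hole at %lld\en", (long long)hole);
    close(fd);
    return (0);
}
.Ed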
19243ff01b23SMartin Matuska.
1925716fd348SMartin Matuska.It Sy zfs_pd_bytes_max Ns = Ns Sy 52428800 Ns B Po 50 MiB Pc Pq int
19263ff01b23SMartin MatuskaThe number of bytes which should be prefetched during a pool traversal, like
19273ff01b23SMartin Matuska.Nm zfs Cm send
19283ff01b23SMartin Matuskaor other data crawling operations.
19293ff01b23SMartin Matuska.
1930be181ee2SMartin Matuska.It Sy zfs_traverse_indirect_prefetch_limit Ns = Ns Sy 32 Pq uint
19313ff01b23SMartin MatuskaThe number of blocks pointed to by an indirect (non-L0) block which should be
19323ff01b23SMartin Matuskaprefetched during a pool traversal, like
19333ff01b23SMartin Matuska.Nm zfs Cm send
19343ff01b23SMartin Matuskaor other data crawling operations.
19353ff01b23SMartin Matuska.
1936dbd5678dSMartin Matuska.It Sy zfs_per_txg_dirty_frees_percent Ns = Ns Sy 30 Ns % Pq u64
19373ff01b23SMartin MatuskaControl percentage of dirtied indirect blocks from frees allowed into one TXG.
19383ff01b23SMartin MatuskaAfter this threshold is crossed, additional frees will wait until the next TXG.
19393ff01b23SMartin Matuska.Sy 0 No disables this throttle .
19403ff01b23SMartin Matuska.
19413ff01b23SMartin Matuska.It Sy zfs_prefetch_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
19423ff01b23SMartin MatuskaDisable predictive prefetch.
1943c03c5b1cSMartin MatuskaNote that it leaves "prescient" prefetch
1944c03c5b1cSMartin Matuska.Pq for, e.g., Nm zfs Cm send
19453ff01b23SMartin Matuskaintact.
19463ff01b23SMartin MatuskaUnlike predictive prefetch, prescient prefetch never issues I/O
19473ff01b23SMartin Matuskathat ends up not being needed, so it can't hurt performance.
19483ff01b23SMartin Matuska.
19493ff01b23SMartin Matuska.It Sy zfs_qat_checksum_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
19503ff01b23SMartin MatuskaDisable QAT hardware acceleration for SHA256 checksums.
19513ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
19523ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
19533ff01b23SMartin Matuska.
19543ff01b23SMartin Matuska.It Sy zfs_qat_compress_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
19553ff01b23SMartin MatuskaDisable QAT hardware acceleration for gzip compression.
19563ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
19573ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
19583ff01b23SMartin Matuska.
19593ff01b23SMartin Matuska.It Sy zfs_qat_encrypt_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
19603ff01b23SMartin MatuskaDisable QAT hardware acceleration for AES-GCM encryption.
19613ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
19623ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
19633ff01b23SMartin Matuska.
1964dbd5678dSMartin Matuska.It Sy zfs_vnops_read_chunk_size Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
19653ff01b23SMartin MatuskaBytes to read per chunk.
19663ff01b23SMartin Matuska.
1967be181ee2SMartin Matuska.It Sy zfs_read_history Ns = Ns Sy 0 Pq uint
19683ff01b23SMartin MatuskaHistorical statistics for this many latest reads will be available in
19693ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /reads .
19703ff01b23SMartin Matuska.
19713ff01b23SMartin Matuska.It Sy zfs_read_history_hits Ns = Ns Sy 0 Ns | Ns 1 Pq int
19723ff01b23SMartin MatuskaInclude cache hits in read history
19733ff01b23SMartin Matuska.
1974dbd5678dSMartin Matuska.It Sy zfs_rebuild_max_segment Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq u64
19753ff01b23SMartin MatuskaMaximum read segment size to issue when sequentially resilvering a
19763ff01b23SMartin Matuskatop-level vdev.
19773ff01b23SMartin Matuska.
19783ff01b23SMartin Matuska.It Sy zfs_rebuild_scrub_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
19793ff01b23SMartin MatuskaAutomatically start a pool scrub when the last active sequential resilver
19803ff01b23SMartin Matuskacompletes in order to verify the checksums of all blocks which have been
19813ff01b23SMartin Matuskaresilvered.
19823ff01b23SMartin MatuskaThis is enabled by default and strongly recommended.
19833ff01b23SMartin Matuska.
1984c9539b89SMartin Matuska.It Sy zfs_rebuild_vdev_limit Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq u64
19853ff01b23SMartin MatuskaMaximum amount of I/O that can be concurrently issued for a sequential
19863ff01b23SMartin Matuskaresilver per leaf device, given in bytes.
19873ff01b23SMartin Matuska.
19883ff01b23SMartin Matuska.It Sy zfs_reconstruct_indirect_combinations_max Ns = Ns Sy 4096 Pq int
19893ff01b23SMartin MatuskaIf an indirect split block contains more than this many possible unique
19903ff01b23SMartin Matuskacombinations when being reconstructed, consider it too computationally
19913ff01b23SMartin Matuskaexpensive to check them all.
19923ff01b23SMartin MatuskaInstead, try at most this many randomly selected
19933ff01b23SMartin Matuskacombinations each time the block is accessed.
19943ff01b23SMartin MatuskaThis allows all segment copies to participate fairly
19953ff01b23SMartin Matuskain the reconstruction when all combinations
19963ff01b23SMartin Matuskacannot be checked and prevents repeated use of one bad copy.
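.Pp
A sketch of the underlying arithmetic (the helper is hypothetical): the
number of unique combinations is the product of the number of candidate
copies per segment, so, for example, 12 segments with two candidates
each yield 2^12 = 4096 combinations, the default cutoff:
.Bd -literal -compact
#include <stdint.h>

static uint64_t
reconstruct_combinations(const unsigned *copies_per_segment, unsigned nsegs)
{
    uint64_t combinations = 1;

    for (unsigned i = 0; i < nsegs; i++)
        combinations *= copies_per_segment[i];
    return (combinations);
}
.Ed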
19973ff01b23SMartin Matuska.
19983ff01b23SMartin Matuska.It Sy zfs_recover Ns = Ns Sy 0 Ns | Ns 1 Pq int
19993ff01b23SMartin MatuskaSet to attempt to recover from fatal errors.
20003ff01b23SMartin MatuskaThis should only be used as a last resort,
20013ff01b23SMartin Matuskaas it typically results in leaked space, or worse.
20023ff01b23SMartin Matuska.
20033ff01b23SMartin Matuska.It Sy zfs_removal_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
2004c03c5b1cSMartin MatuskaIgnore hard I/O errors during device removal.
2005c03c5b1cSMartin MatuskaWhen set, if a device encounters a hard I/O error during the removal process
2006*61145dc2SMartin Matuskathe removal will not be canceled.
20073ff01b23SMartin MatuskaThis can result in a normally recoverable block becoming permanently damaged
20083ff01b23SMartin Matuskaand is hence not recommended.
20093ff01b23SMartin MatuskaThis should only be used as a last resort when the
20103ff01b23SMartin Matuskapool cannot be returned to a healthy state prior to removing the device.
20113ff01b23SMartin Matuska.
2012be181ee2SMartin Matuska.It Sy zfs_removal_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint
20133ff01b23SMartin MatuskaThis is used by the test suite so that it can ensure that certain actions
20143ff01b23SMartin Matuskahappen while in the middle of a removal.
20153ff01b23SMartin Matuska.
2016be181ee2SMartin Matuska.It Sy zfs_remove_max_segment Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
20173ff01b23SMartin MatuskaThe largest contiguous segment that we will attempt to allocate when removing
20183ff01b23SMartin Matuskaa device.
20193ff01b23SMartin MatuskaIf there is a performance problem with attempting to allocate large blocks,
20203ff01b23SMartin Matuskaconsider decreasing this.
20213ff01b23SMartin MatuskaThe default value is also the maximum.
20223ff01b23SMartin Matuska.
20233ff01b23SMartin Matuska.It Sy zfs_resilver_disable_defer Ns = Ns Sy 0 Ns | Ns 1 Pq int
20243ff01b23SMartin MatuskaIgnore the
20253ff01b23SMartin Matuska.Sy resilver_defer
20263ff01b23SMartin Matuskafeature, causing an operation that would start a resilver to
20273ff01b23SMartin Matuskaimmediately restart the one in progress.
20283ff01b23SMartin Matuska.
20297a7741afSMartin Matuska.It Sy zfs_resilver_defer_percent Ns = Ns Sy 10 Ns % Pq uint
20307a7741afSMartin MatuskaIf the ongoing resilver progress is below this threshold, a new resilver will
20317a7741afSMartin Matuskarestart from scratch instead of being deferred after the current one finishes,
20327a7741afSMartin Matuskaeven if the
20337a7741afSMartin Matuska.Sy resilver_defer
20347a7741afSMartin Matuskafeature is enabled.
20357a7741afSMartin Matuska.
2036be181ee2SMartin Matuska.It Sy zfs_resilver_min_time_ms Ns = Ns Sy 3000 Ns ms Po 3 s Pc Pq uint
20373ff01b23SMartin MatuskaResilvers are processed by the sync thread.
20383ff01b23SMartin MatuskaWhile resilvering, it will spend at least this much time
20393ff01b23SMartin Matuskaworking on a resilver between TXG flushes.
20403ff01b23SMartin Matuska.
20413ff01b23SMartin Matuska.It Sy zfs_scan_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
20423ff01b23SMartin MatuskaIf set, remove the DTL (dirty time list) upon completion of a pool scan (scrub),
20433ff01b23SMartin Matuskaeven if there were unrepairable errors.
20443ff01b23SMartin MatuskaIntended to be used during pool repair or recovery to
20453ff01b23SMartin Matuskastop resilvering when the pool is next imported.
20463ff01b23SMartin Matuska.
2047e716630dSMartin Matuska.It Sy zfs_scrub_after_expand Ns = Ns Sy 1 Ns | Ns 0 Pq int
2048e716630dSMartin MatuskaAutomatically start a pool scrub after a RAIDZ expansion completes
2049e716630dSMartin Matuskain order to verify the checksums of all blocks which have been
2050e716630dSMartin Matuskacopied during the expansion.
2051e716630dSMartin MatuskaThis is enabled by default and strongly recommended.
2052e716630dSMartin Matuska.
2053be181ee2SMartin Matuska.It Sy zfs_scrub_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1 s Pc Pq uint
20543ff01b23SMartin MatuskaScrubs are processed by the sync thread.
20553ff01b23SMartin MatuskaWhile scrubbing, it will spend at least this much time
20563ff01b23SMartin Matuskaworking on a scrub between TXG flushes.
20573ff01b23SMartin Matuska.
2058c0a83fe0SMartin Matuska.It Sy zfs_scrub_error_blocks_per_txg Ns = Ns Sy 4096 Pq uint
2059c0a83fe0SMartin MatuskaError blocks to be scrubbed in one TXG.
2060c0a83fe0SMartin Matuska.
2061be181ee2SMartin Matuska.It Sy zfs_scan_checkpoint_intval Ns = Ns Sy 7200 Ns s Po 2 hours Pc Pq uint
20623ff01b23SMartin MatuskaTo preserve progress across reboots, the sequential scan algorithm periodically
20633ff01b23SMartin Matuskaneeds to stop metadata scanning and issue all the verification I/O to disk.
20643ff01b23SMartin MatuskaThe frequency of this flushing is determined by this tunable.
20653ff01b23SMartin Matuska.
2066be181ee2SMartin Matuska.It Sy zfs_scan_fill_weight Ns = Ns Sy 3 Pq uint
20673ff01b23SMartin MatuskaThis tunable affects how scrub and resilver I/O segments are ordered.
20683ff01b23SMartin MatuskaA higher number indicates that we care more about how filled in a segment is,
20693ff01b23SMartin Matuskawhile a lower number indicates we care more about the size of the extent without
20703ff01b23SMartin Matuskaconsidering the gaps within a segment.
20713ff01b23SMartin MatuskaThis value is only tunable upon module insertion.
2072bb2d13b6SMartin MatuskaChanging the value afterwards will have no effect on scrub or resilver
2073bb2d13b6SMartin Matuskaperformance.
20743ff01b23SMartin Matuska.
2075be181ee2SMartin Matuska.It Sy zfs_scan_issue_strategy Ns = Ns Sy 0 Pq uint
20763ff01b23SMartin MatuskaDetermines the order that data will be verified while scrubbing or resilvering:
20773ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "a"
20783ff01b23SMartin Matuska.It Sy 1
20793ff01b23SMartin MatuskaData will be verified as sequentially as possible, given the
20803ff01b23SMartin Matuskaamount of memory reserved for scrubbing
20813ff01b23SMartin Matuska.Pq see Sy zfs_scan_mem_lim_fact .
20823ff01b23SMartin MatuskaThis may improve scrub performance if the pool's data is very fragmented.
20833ff01b23SMartin Matuska.It Sy 2
20843ff01b23SMartin MatuskaThe largest mostly-contiguous chunk of found data will be verified first.
20853ff01b23SMartin MatuskaBy deferring scrubbing of small segments, we may later find adjacent data
20863ff01b23SMartin Matuskato coalesce and increase the segment size.
20873ff01b23SMartin Matuska.It Sy 0
20883ff01b23SMartin Matuska.No Use strategy Sy 1 No during normal verification
20893ff01b23SMartin Matuska.No and strategy Sy 2 No while taking a checkpoint .
20903ff01b23SMartin Matuska.El
20913ff01b23SMartin Matuska.
20923ff01b23SMartin Matuska.It Sy zfs_scan_legacy Ns = Ns Sy 0 Ns | Ns 1 Pq int
20933ff01b23SMartin MatuskaIf unset, indicates that scrubs and resilvers will gather metadata in
20943ff01b23SMartin Matuskamemory before issuing sequential I/O.
20953ff01b23SMartin MatuskaOtherwise indicates that the legacy algorithm will be used,
20963ff01b23SMartin Matuskawhere I/O is initiated as soon as it is discovered.
20973ff01b23SMartin MatuskaUnsetting will not affect scrubs or resilvers that are already in progress.
20983ff01b23SMartin Matuska.
2099716fd348SMartin Matuska.It Sy zfs_scan_max_ext_gap Ns = Ns Sy 2097152 Ns B Po 2 MiB Pc Pq int
21003ff01b23SMartin MatuskaSets the largest gap in bytes between scrub/resilver I/O operations
21013ff01b23SMartin Matuskathat will still be considered sequential for sorting purposes.
21023ff01b23SMartin MatuskaChanging this value will not
21033ff01b23SMartin Matuskaaffect scrubs or resilvers that are already in progress.
21043ff01b23SMartin Matuska.
2105be181ee2SMartin Matuska.It Sy zfs_scan_mem_lim_fact Ns = Ns Sy 20 Ns ^-1 Pq uint
21063ff01b23SMartin MatuskaMaximum fraction of RAM used for I/O sorting by the sequential scan algorithm.
21073ff01b23SMartin MatuskaThis tunable determines the hard limit for I/O sorting memory usage.
21083ff01b23SMartin MatuskaWhen the hard limit is reached we stop scanning metadata and start issuing
21093ff01b23SMartin Matuskadata verification I/O.
21103ff01b23SMartin MatuskaThis is done until we get below the soft limit.
21113ff01b23SMartin Matuska.
2112be181ee2SMartin Matuska.It Sy zfs_scan_mem_lim_soft_fact Ns = Ns Sy 20 Ns ^-1 Pq uint
21133ff01b23SMartin MatuskaThe fraction of the hard limit used to determine the soft limit for I/O sorting
21143ff01b23SMartin Matuskaby the sequential scan algorithm.
21153ff01b23SMartin MatuskaWhen we cross this limit from below no action is taken.
2116bb2d13b6SMartin MatuskaWhen we cross this limit from above it is because we are issuing verification
2117bb2d13b6SMartin MatuskaI/O.
21183ff01b23SMartin MatuskaIn this case (unless the metadata scan is done) we stop issuing verification I/O
21193ff01b23SMartin Matuskaand start scanning metadata again until we get to the hard limit.
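.Pp
A minimal sketch of how the two fractions combine (hypothetical helper;
the divisors are the documented defaults):
.Bd -literal -compact
#include <stdint.h>

/* Hard limit: physical memory / 20 (zfs_scan_mem_lim_fact).
 * Soft limit: hard limit / 20 (zfs_scan_mem_lim_soft_fact).
 * E.g. with 64 GiB of RAM: hard ~3.2 GiB, soft ~164 MiB. */
static void
scan_mem_limits(uint64_t physmem, uint64_t *hardp, uint64_t *softp)
{
    *hardp = physmem / 20;
    *softp = *hardp / 20;
}
.Ed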
21203ff01b23SMartin Matuska.
2121c9539b89SMartin Matuska.It Sy zfs_scan_report_txgs Ns = Ns Sy 0 Ns | Ns 1 Pq uint
2122c9539b89SMartin MatuskaWhen reporting resilver throughput and estimated completion time, use the
2123c9539b89SMartin Matuskaperformance observed over roughly the last
2124c9539b89SMartin Matuska.Sy zfs_scan_report_txgs
2125c9539b89SMartin MatuskaTXGs.
2126c9539b89SMartin MatuskaWhen set to zero, performance is calculated over the time between checkpoints.
2127c9539b89SMartin Matuska.
21283ff01b23SMartin Matuska.It Sy zfs_scan_strict_mem_lim Ns = Ns Sy 0 Ns | Ns 1 Pq int
21293ff01b23SMartin MatuskaEnforce tight memory limits on pool scans when a sequential scan is in progress.
21303ff01b23SMartin MatuskaWhen disabled, the memory limit may be exceeded by fast disks.
21313ff01b23SMartin Matuska.
21323ff01b23SMartin Matuska.It Sy zfs_scan_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq int
21333ff01b23SMartin MatuskaFreezes a scrub/resilver in progress without actually pausing it.
21343ff01b23SMartin MatuskaIntended for testing/debugging.
21353ff01b23SMartin Matuska.
2136c9539b89SMartin Matuska.It Sy zfs_scan_vdev_limit Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq int
21373ff01b23SMartin MatuskaMaximum amount of data that can be concurrently issued at once for scrubs and
21383ff01b23SMartin Matuskaresilvers per leaf device, given in bytes.
21393ff01b23SMartin Matuska.
21403ff01b23SMartin Matuska.It Sy zfs_send_corrupt_data Ns = Ns Sy 0 Ns | Ns 1 Pq int
21413ff01b23SMartin MatuskaAllow sending of corrupt data (ignore read/checksum errors when sending).
21423ff01b23SMartin Matuska.
21433ff01b23SMartin Matuska.It Sy zfs_send_unmodified_spill_blocks Ns = Ns Sy 1 Ns | Ns 0 Pq int
21443ff01b23SMartin MatuskaInclude unmodified spill blocks in the send stream.
21453ff01b23SMartin MatuskaUnder certain circumstances, previous versions of ZFS could incorrectly
21463ff01b23SMartin Matuskaremove the spill block from an existing object.
21473ff01b23SMartin MatuskaIncluding unmodified copies of the spill blocks creates a backwards-compatible
21483ff01b23SMartin Matuskastream which will recreate a spill block if it was incorrectly removed.
21493ff01b23SMartin Matuska.
2150be181ee2SMartin Matuska.It Sy zfs_send_no_prefetch_queue_ff Ns = Ns Sy 20 Ns ^\-1 Pq uint
21513ff01b23SMartin MatuskaThe fill fraction of the
21523ff01b23SMartin Matuska.Nm zfs Cm send
21533ff01b23SMartin Matuskainternal queues.
21543ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
21553ff01b23SMartin Matuska.
2156be181ee2SMartin Matuska.It Sy zfs_send_no_prefetch_queue_length Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq uint
21573ff01b23SMartin MatuskaThe maximum number of bytes allowed in
21583ff01b23SMartin Matuska.Nm zfs Cm send Ns 's
21593ff01b23SMartin Matuskainternal queues.
21603ff01b23SMartin Matuska.
2161be181ee2SMartin Matuska.It Sy zfs_send_queue_ff Ns = Ns Sy 20 Ns ^\-1 Pq uint
21623ff01b23SMartin MatuskaThe fill fraction of the
21633ff01b23SMartin Matuska.Nm zfs Cm send
21643ff01b23SMartin Matuskaprefetch queue.
21653ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
21663ff01b23SMartin Matuska.
2167be181ee2SMartin Matuska.It Sy zfs_send_queue_length Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
21683ff01b23SMartin MatuskaThe maximum number of bytes allowed that will be prefetched by
21693ff01b23SMartin Matuska.Nm zfs Cm send .
21703ff01b23SMartin MatuskaThis value must be at least twice the maximum block size in use.
21713ff01b23SMartin Matuska.
2172be181ee2SMartin Matuska.It Sy zfs_recv_queue_ff Ns = Ns Sy 20 Ns ^\-1 Pq uint
21733ff01b23SMartin MatuskaThe fill fraction of the
21743ff01b23SMartin Matuska.Nm zfs Cm receive
21753ff01b23SMartin Matuskaqueue.
21763ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
21773ff01b23SMartin Matuska.
2178be181ee2SMartin Matuska.It Sy zfs_recv_queue_length Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq uint
21793ff01b23SMartin MatuskaThe maximum number of bytes allowed in the
21803ff01b23SMartin Matuska.Nm zfs Cm receive
21813ff01b23SMartin Matuskaqueue.
21823ff01b23SMartin MatuskaThis value must be at least twice the maximum block size in use.
21833ff01b23SMartin Matuska.
2184be181ee2SMartin Matuska.It Sy zfs_recv_write_batch_size Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq uint
21853ff01b23SMartin MatuskaThe maximum amount of data, in bytes, that
21863ff01b23SMartin Matuska.Nm zfs Cm receive
21873ff01b23SMartin Matuskawill write in one DMU transaction.
21883ff01b23SMartin MatuskaThis is the uncompressed size, even when receiving a compressed send stream.
21893ff01b23SMartin MatuskaThis setting will not reduce the write size below a single block.
21903ff01b23SMartin MatuskaCapped at a maximum of
2191716fd348SMartin Matuska.Sy 32 MiB .
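.Pp
A sketch of the documented clamping (hypothetical helper; the 32 MiB cap
and single-block floor are as described above):
.Bd -literal -compact
#include <stdint.h>

static uint64_t
recv_batch_bytes(uint64_t tunable, uint64_t block_size)
{
    uint64_t batch = tunable;

    if (batch < block_size)             /* never below one block */
        batch = block_size;
    if (batch > 32ULL * 1024 * 1024)    /* documented 32 MiB cap */
        batch = 32ULL * 1024 * 1024;
    return (batch);
}
.Ed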
21923ff01b23SMartin Matuska.
2193271171e0SMartin Matuska.It Sy zfs_recv_best_effort_corrective Ns = Ns Sy 0 Pq int
2194271171e0SMartin MatuskaWhen this variable is set to non-zero, a corrective receive:
2195271171e0SMartin Matuska.Bl -enum -compact -offset 4n -width "1."
2196271171e0SMartin Matuska.It
2197271171e0SMartin MatuskaDoes not enforce the restriction of source & destination snapshot GUIDs
2198271171e0SMartin Matuskamatching.
2199271171e0SMartin Matuska.It
2200271171e0SMartin MatuskaIf there is an error during healing, the healing receive is not
2201271171e0SMartin Matuskaterminated; instead it moves on to the next record.
2202271171e0SMartin Matuska.El
2203271171e0SMartin Matuska.
2204be181ee2SMartin Matuska.It Sy zfs_override_estimate_recordsize Ns = Ns Sy 0 Ns | Ns 1 Pq uint
22053ff01b23SMartin MatuskaSetting this variable overrides the default logic for estimating block
22063ff01b23SMartin Matuskasizes when doing a
22073ff01b23SMartin Matuska.Nm zfs Cm send .
22083ff01b23SMartin MatuskaThe default heuristic is that the average block size
22093ff01b23SMartin Matuskawill be the current recordsize.
22103ff01b23SMartin MatuskaSet this tunable if most data in your dataset is not of that size
22113ff01b23SMartin Matuskaand you require accurate zfs send size estimates.
22123ff01b23SMartin Matuska.
2213be181ee2SMartin Matuska.It Sy zfs_sync_pass_deferred_free Ns = Ns Sy 2 Pq uint
22143ff01b23SMartin MatuskaFlushing of data to disk is done in passes.
22153ff01b23SMartin MatuskaDefer frees starting in this pass.
22163ff01b23SMartin Matuska.
2217716fd348SMartin Matuska.It Sy zfs_spa_discard_memory_limit Ns = Ns Sy 16777216 Ns B Po 16 MiB Pc Pq int
22183ff01b23SMartin MatuskaMaximum memory used for prefetching a checkpoint's space map on each
22193ff01b23SMartin Matuskavdev while discarding the checkpoint.
22203ff01b23SMartin Matuska.
2221be181ee2SMartin Matuska.It Sy zfs_special_class_metadata_reserve_pct Ns = Ns Sy 25 Ns % Pq uint
22223ff01b23SMartin MatuskaOnly allow small data blocks to be allocated on the special and dedup vdev
2223bb2d13b6SMartin Matuskatypes when the available free space percentage on these vdevs exceeds this
2224bb2d13b6SMartin Matuskavalue.
22253ff01b23SMartin MatuskaThis ensures reserved space is available for pool metadata as the
22263ff01b23SMartin Matuskaspecial vdevs approach capacity.
22273ff01b23SMartin Matuska.
2228be181ee2SMartin Matuska.It Sy zfs_sync_pass_dont_compress Ns = Ns Sy 8 Pq uint
22293ff01b23SMartin MatuskaStarting in this sync pass, disable compression (including of metadata).
22303ff01b23SMartin MatuskaWith the default setting, in practice, we don't have this many sync passes,
22313ff01b23SMartin Matuskaso this has no effect.
22323ff01b23SMartin Matuska.Pp
22333ff01b23SMartin MatuskaThe original intent was that disabling compression would help the sync passes
22343ff01b23SMartin Matuskato converge.
22353ff01b23SMartin MatuskaHowever, in practice, disabling compression increases
22363ff01b23SMartin Matuskathe average number of sync passes; because when we turn compression off,
22373ff01b23SMartin Matuskamany blocks' size will change, and thus we have to re-allocate
22383ff01b23SMartin Matuska(not overwrite) them.
22393ff01b23SMartin MatuskaIt also increases the number of
2240716fd348SMartin Matuska.Em 128 KiB
22413ff01b23SMartin Matuskaallocations (e.g. for indirect blocks and spacemaps)
22423ff01b23SMartin Matuskabecause these will not be compressed.
22433ff01b23SMartin MatuskaThe
2244716fd348SMartin Matuska.Em 128 KiB
22453ff01b23SMartin Matuskaallocations are especially detrimental to performance
2246bb2d13b6SMartin Matuskaon highly fragmented systems, which may have very few free segments of this
2247bb2d13b6SMartin Matuskasize,
22483ff01b23SMartin Matuskaand may need to load new metaslabs to satisfy these allocations.
22493ff01b23SMartin Matuska.
2250be181ee2SMartin Matuska.It Sy zfs_sync_pass_rewrite Ns = Ns Sy 2 Pq uint
22513ff01b23SMartin MatuskaRewrite new block pointers starting in this pass.
22523ff01b23SMartin Matuska.
2253716fd348SMartin Matuska.It Sy zfs_trim_extent_bytes_max Ns = Ns Sy 134217728 Ns B Po 128 MiB Pc Pq uint
22543ff01b23SMartin MatuskaMaximum size of TRIM command.
2255bb2d13b6SMartin MatuskaLarger ranges will be split into chunks no larger than this value before
2256bb2d13b6SMartin Matuskaissuing.
22573ff01b23SMartin Matuska.
2258716fd348SMartin Matuska.It Sy zfs_trim_extent_bytes_min Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq uint
22593ff01b23SMartin MatuskaMinimum size of TRIM commands.
22603ff01b23SMartin MatuskaTRIM ranges smaller than this will be skipped,
22613ff01b23SMartin Matuskaunless they're part of a larger range which was chunked.
22623ff01b23SMartin MatuskaThis is done because it's common for these small TRIMs
22633ff01b23SMartin Matuskato negatively impact overall performance.
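.Pp
A simplified sketch of how this limit interacts with
.Sy zfs_trim_extent_bytes_max
above (hypothetical helper; the real implementation tracks more state):
.Bd -literal -compact
#include <stdint.h>

static void
trim_range(uint64_t start, uint64_t size)
{
    const uint64_t max = 134217728;     /* zfs_trim_extent_bytes_max */
    const uint64_t min = 32768;         /* zfs_trim_extent_bytes_min */

    if (size < min)
        return;             /* small standalone ranges are skipped */
    while (size > 0) {
        uint64_t len = size > max ? max : size;

        /* issue TRIM for [start, start + len); a tail chunk may be
         * smaller than min because it is part of a larger, chunked
         * range */
        start += len;
        size -= len;
    }
}
.Ed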
22643ff01b23SMartin Matuska.
22653ff01b23SMartin Matuska.It Sy zfs_trim_metaslab_skip Ns = Ns Sy 0 Ns | Ns 1 Pq uint
22663ff01b23SMartin MatuskaSkip uninitialized metaslabs during the TRIM process.
2267bb2d13b6SMartin MatuskaThis option is useful for pools constructed from large thinly-provisioned
2268bb2d13b6SMartin Matuskadevices
22693ff01b23SMartin Matuskawhere TRIM operations are slow.
22703ff01b23SMartin MatuskaAs a pool ages, an increasing fraction of the pool's metaslabs
22713ff01b23SMartin Matuskawill be initialized, progressively degrading the usefulness of this option.
22723ff01b23SMartin MatuskaThis setting is stored when starting a manual TRIM and will
22733ff01b23SMartin Matuskapersist for the duration of the requested TRIM.
22743ff01b23SMartin Matuska.
22753ff01b23SMartin Matuska.It Sy zfs_trim_queue_limit Ns = Ns Sy 10 Pq uint
22763ff01b23SMartin MatuskaMaximum number of queued TRIMs outstanding per leaf vdev.
22773ff01b23SMartin MatuskaThe number of concurrent TRIM commands issued to the device is controlled by
22783ff01b23SMartin Matuska.Sy zfs_vdev_trim_min_active No and Sy zfs_vdev_trim_max_active .
22793ff01b23SMartin Matuska.
22803ff01b23SMartin Matuska.It Sy zfs_trim_txg_batch Ns = Ns Sy 32 Pq uint
22813ff01b23SMartin MatuskaThe number of transaction groups' worth of frees which should be aggregated
22823ff01b23SMartin Matuskabefore TRIM operations are issued to the device.
22833ff01b23SMartin MatuskaThis setting represents a trade-off between issuing larger,
22843ff01b23SMartin Matuskamore efficient TRIM operations and the delay
22853ff01b23SMartin Matuskabefore the recently trimmed space is available for use by the device.
22863ff01b23SMartin Matuska.Pp
22873ff01b23SMartin MatuskaIncreasing this value will allow frees to be aggregated for a longer time.
2288bb2d13b6SMartin MatuskaThis will result in larger TRIM operations and potentially increased memory
2289bb2d13b6SMartin Matuskausage.
22903ff01b23SMartin MatuskaDecreasing this value will have the opposite effect.
22913ff01b23SMartin MatuskaThe default of
22923ff01b23SMartin Matuska.Sy 32
22933ff01b23SMartin Matuskawas determined to be a reasonable compromise.
22943ff01b23SMartin Matuska.
229575e1fea6SMartin Matuska.It Sy zfs_txg_history Ns = Ns Sy 100 Pq uint
22963ff01b23SMartin MatuskaHistorical statistics for this many latest TXGs will be available in
22973ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /TXGs .
22983ff01b23SMartin Matuska.
2299be181ee2SMartin Matuska.It Sy zfs_txg_timeout Ns = Ns Sy 5 Ns s Pq uint
2300bb2d13b6SMartin MatuskaFlush dirty data to disk at least this often, in seconds (maximum TXG
2301bb2d13b6SMartin Matuskaduration).
23023ff01b23SMartin Matuska.
2303be181ee2SMartin Matuska.It Sy zfs_vdev_aggregation_limit Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq uint
23043ff01b23SMartin MatuskaMax vdev I/O aggregation size.
23053ff01b23SMartin Matuska.
2306be181ee2SMartin Matuska.It Sy zfs_vdev_aggregation_limit_non_rotating Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq uint
23073ff01b23SMartin MatuskaMax vdev I/O aggregation size for non-rotating media.
23083ff01b23SMartin Matuska.
23093ff01b23SMartin Matuska.It Sy zfs_vdev_mirror_rotating_inc Ns = Ns Sy 0 Pq int
23103ff01b23SMartin MatuskaA number by which the balancing algorithm increments the load calculation
23113ff01b23SMartin Matuskawhen an I/O operation immediately follows its predecessor on rotational vdevs,
23123ff01b23SMartin Matuskafor the purpose of selecting the least busy mirror member.
23143ff01b23SMartin Matuska.
23153ff01b23SMartin Matuska.It Sy zfs_vdev_mirror_rotating_seek_inc Ns = Ns Sy 5 Pq int
23163ff01b23SMartin MatuskaA number by which the balancing algorithm increments the load calculation for
23173ff01b23SMartin Matuskathe purpose of selecting the least busy mirror member when an I/O operation
23183ff01b23SMartin Matuskalacks locality as defined by
23193ff01b23SMartin Matuska.Sy zfs_vdev_mirror_rotating_seek_offset .
23203ff01b23SMartin MatuskaOperations within this window that do not immediately follow the previous
23213ff01b23SMartin Matuskaoperation increment the load calculation by half of this value.
23223ff01b23SMartin Matuska.
2323716fd348SMartin Matuska.It Sy zfs_vdev_mirror_rotating_seek_offset Ns = Ns Sy 1048576 Ns B Po 1 MiB Pc Pq int
23243ff01b23SMartin MatuskaThe maximum distance for the last queued I/O operation in which
23253ff01b23SMartin Matuskathe balancing algorithm considers an operation to have locality.
23263ff01b23SMartin Matuska.No See Sx ZFS I/O SCHEDULER .
23273ff01b23SMartin Matuska.
23283ff01b23SMartin Matuska.It Sy zfs_vdev_mirror_non_rotating_inc Ns = Ns Sy 0 Pq int
23293ff01b23SMartin MatuskaA number by which the balancing algorithm increments the load calculation for
23303ff01b23SMartin Matuskathe purpose of selecting the least busy mirror member on non-rotational vdevs
23313ff01b23SMartin Matuskawhen I/O operations do not immediately follow one another.
23323ff01b23SMartin Matuska.
23333ff01b23SMartin Matuska.It Sy zfs_vdev_mirror_non_rotating_seek_inc Ns = Ns Sy 1 Pq int
23343ff01b23SMartin MatuskaA number by which the balancing algorithm increments the load calculation for
2335bb2d13b6SMartin Matuskathe purpose of selecting the least busy mirror member when an I/O operation
2336bb2d13b6SMartin Matuskalacks locality as defined by
23383ff01b23SMartin Matuska.Sy zfs_vdev_mirror_rotating_seek_offset .
23393ff01b23SMartin MatuskaOperations within this window that do not immediately follow the previous
23403ff01b23SMartin Matuskaoperation increment the load calculation by half of this value.
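.Pp
A simplified sketch of how these increments might combine (illustrative
only; the in-kernel balancing code tracks more state per mirror member,
and the constants below are the documented defaults):
.Bd -literal -compact
#include <stdint.h>

static int
mirror_load_increment(int rotating, uint64_t gap_bytes)
{
    const uint64_t seek_offset = 1048576;   /* ..._rotating_seek_offset */
    /* defaults: rotating_inc = 0, non_rotating_inc = 0,
     * rotating_seek_inc = 5, non_rotating_seek_inc = 1 */
    int inc = 0;
    int seek_inc = rotating ? 5 : 1;

    if (gap_bytes == 0)             /* immediately follows predecessor */
        return (inc);
    if (gap_bytes >= seek_offset)   /* lacks locality */
        return (seek_inc);
    return (seek_inc / 2);          /* local, but not adjacent */
}
.Ed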
23413ff01b23SMartin Matuska.
2342be181ee2SMartin Matuska.It Sy zfs_vdev_read_gap_limit Ns = Ns Sy 32768 Ns B Po 32 KiB Pc Pq uint
23433ff01b23SMartin MatuskaAggregate read I/O operations if the on-disk gap between them is within this
23443ff01b23SMartin Matuskathreshold.
23453ff01b23SMartin Matuska.
2346be181ee2SMartin Matuska.It Sy zfs_vdev_write_gap_limit Ns = Ns Sy 4096 Ns B Po 4 KiB Pc Pq uint
23473ff01b23SMartin MatuskaAggregate write I/O operations if the on-disk gap between them is within this
23483ff01b23SMartin Matuskathreshold.
23493ff01b23SMartin Matuska.
23503ff01b23SMartin Matuska.It Sy zfs_vdev_raidz_impl Ns = Ns Sy fastest Pq string
23513ff01b23SMartin MatuskaSelect the raidz parity implementation to use.
23523ff01b23SMartin Matuska.Pp
23533ff01b23SMartin MatuskaVariants that don't depend on CPU-specific features
23543ff01b23SMartin Matuskamay be selected on module load, as they are supported on all systems.
23553ff01b23SMartin MatuskaThe remaining options may only be set after the module is loaded,
23563ff01b23SMartin Matuskaas they are available only if the implementations are compiled in
23573ff01b23SMartin Matuskaand supported on the running system.
23583ff01b23SMartin Matuska.Pp
23593ff01b23SMartin MatuskaOnce the module is loaded,
23603ff01b23SMartin Matuska.Pa /sys/module/zfs/parameters/zfs_vdev_raidz_impl
23613ff01b23SMartin Matuskawill show the available options,
23623ff01b23SMartin Matuskawith the currently selected one enclosed in square brackets.
23633ff01b23SMartin Matuska.Pp
23643ff01b23SMartin Matuska.TS
23653ff01b23SMartin Matuskalb l l .
23663ff01b23SMartin Matuskafastest	selected by built-in benchmark
23673ff01b23SMartin Matuskaoriginal	original implementation
23683ff01b23SMartin Matuskascalar	scalar implementation
23693ff01b23SMartin Matuskasse2	SSE2 instruction set	64-bit x86
23703ff01b23SMartin Matuskassse3	SSSE3 instruction set	64-bit x86
23713ff01b23SMartin Matuskaavx2	AVX2 instruction set	64-bit x86
23723ff01b23SMartin Matuskaavx512f	AVX512F instruction set	64-bit x86
23733ff01b23SMartin Matuskaavx512bw	AVX512F & AVX512BW instruction sets	64-bit x86
23743ff01b23SMartin Matuskaaarch64_neon	NEON	Aarch64/64-bit ARMv8
23753ff01b23SMartin Matuskaaarch64_neonx2	NEON with more unrolling	Aarch64/64-bit ARMv8
23763ff01b23SMartin Matuskapowerpc_altivec	Altivec	PowerPC
23773ff01b23SMartin Matuska.TE
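.Pp
For example, the following sketch (illustrative only; Linux path as
documented above) extracts the currently selected implementation from
the bracketed entry in the parameter file:
.Bd -literal -compact
#include <stdio.h>
#include <string.h>

int
main(void)
{
    char buf[256], *l, *r;
    FILE *f = fopen("/sys/module/zfs/parameters/zfs_vdev_raidz_impl", "r");

    if (f == NULL)
        return (1);
    if (fgets(buf, sizeof (buf), f) != NULL &&
        (l = strchr(buf, '[')) != NULL &&
        (r = strchr(l, ']')) != NULL) {
        *r = '\e0';
        printf("selected: %s\en", l + 1);
    }
    fclose(f);
    return (0);
}
.Ed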
23783ff01b23SMartin Matuska.
23793ff01b23SMartin Matuska.It Sy zfs_vdev_scheduler Pq charp
23803ff01b23SMartin Matuska.Sy DEPRECATED .
23812faf504dSMartin MatuskaPrints a warning to the kernel log for compatibility.
23823ff01b23SMartin Matuska.
2383be181ee2SMartin Matuska.It Sy zfs_zevent_len_max Ns = Ns Sy 512 Pq uint
23843ff01b23SMartin MatuskaMax event queue length.
23853ff01b23SMartin MatuskaEvents in the queue can be viewed with
23863ff01b23SMartin Matuska.Xr zpool-events 8 .
23873ff01b23SMartin Matuska.
23883ff01b23SMartin Matuska.It Sy zfs_zevent_retain_max Ns = Ns Sy 2000 Pq int
23893ff01b23SMartin MatuskaMaximum recent zevent records to retain for duplicate checking.
23903ff01b23SMartin MatuskaSetting this to
23913ff01b23SMartin Matuska.Sy 0
23923ff01b23SMartin Matuskadisables duplicate detection.
23933ff01b23SMartin Matuska.
23943ff01b23SMartin Matuska.It Sy zfs_zevent_retain_expire_secs Ns = Ns Sy 900 Ns s Po 15 min Pc Pq int
23953ff01b23SMartin MatuskaLifespan for a recent ereport that was retained for duplicate checking.
23963ff01b23SMartin Matuska.
23973ff01b23SMartin Matuska.It Sy zfs_zil_clean_taskq_maxalloc Ns = Ns Sy 1048576 Pq int
23983ff01b23SMartin MatuskaThe maximum number of taskq entries that are allowed to be cached.
23993ff01b23SMartin MatuskaWhen this limit is exceeded, transaction records (itxs)
24003ff01b23SMartin Matuskawill be cleaned synchronously.
24013ff01b23SMartin Matuska.
24023ff01b23SMartin Matuska.It Sy zfs_zil_clean_taskq_minalloc Ns = Ns Sy 1024 Pq int
24033ff01b23SMartin MatuskaThe number of taskq entries that are pre-populated when the taskq is first
24043ff01b23SMartin Matuskacreated and are immediately available for use.
24053ff01b23SMartin Matuska.
24063ff01b23SMartin Matuska.It Sy zfs_zil_clean_taskq_nthr_pct Ns = Ns Sy 100 Ns % Pq int
24073ff01b23SMartin MatuskaThis controls the number of threads used by
24083ff01b23SMartin Matuska.Sy dp_zil_clean_taskq .
24093ff01b23SMartin MatuskaThe default value of
24103ff01b23SMartin Matuska.Sy 100%
2411*61145dc2SMartin Matuskawill create a maximum of one thread per CPU.
24123ff01b23SMartin Matuska.
2413be181ee2SMartin Matuska.It Sy zil_maxblocksize Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq uint
24143ff01b23SMartin MatuskaThis sets the maximum block size used by the ZIL.
24153ff01b23SMartin MatuskaOn very fragmented pools, lowering this
2416716fd348SMartin Matuska.Pq typically to Sy 36 KiB
24173ff01b23SMartin Matuskacan improve performance.
24183ff01b23SMartin Matuska.
2419b2526e8bSMartin Matuska.It Sy zil_maxcopied Ns = Ns Sy 7680 Ns B Po 7.5 KiB Pc Pq uint
2420b2526e8bSMartin MatuskaThis sets the maximum number of write bytes logged via WR_COPIED.
2421b2526e8bSMartin MatuskaThis tunes a tradeoff between additional memory copying and possibly worse log
2422b2526e8bSMartin Matuskaspace efficiency versus additional range lock/unlock operations.
2423b2526e8bSMartin Matuska.
24243ff01b23SMartin Matuska.It Sy zil_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
24253ff01b23SMartin MatuskaDisable the cache flush commands that are normally sent to disk by
24263ff01b23SMartin Matuskathe ZIL after an LWB write has completed.
24273ff01b23SMartin MatuskaSetting this will cause ZIL corruption on power loss
24283ff01b23SMartin Matuskaif a volatile out-of-order write cache is enabled.
24293ff01b23SMartin Matuska.
24303ff01b23SMartin Matuska.It Sy zil_replay_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
24313ff01b23SMartin MatuskaDisable intent logging replay.
24323ff01b23SMartin MatuskaCan be disabled for recovery from corrupted ZIL.
24333ff01b23SMartin Matuska.
243422b267e8SMartin Matuska.It Sy zil_slog_bulk Ns = Ns Sy 67108864 Ns B Po 64 MiB Pc Pq u64
24353ff01b23SMartin MatuskaLimit SLOG write size per commit executed with synchronous priority.
24363ff01b23SMartin MatuskaAny writes above that will be executed with lower (asynchronous) priority
24373ff01b23SMartin Matuskato limit potential SLOG device abuse by a single active ZIL writer.
24383ff01b23SMartin Matuska.
2439c03c5b1cSMartin Matuska.It Sy zfs_zil_saxattr Ns = Ns Sy 1 Ns | Ns 0 Pq int
2440c03c5b1cSMartin MatuskaSetting this tunable to zero disables ZIL logging of new
2441c03c5b1cSMartin Matuska.Sy xattr Ns = Ns Sy sa
2442c03c5b1cSMartin Matuskarecords if the
2443c03c5b1cSMartin Matuska.Sy org.openzfs:zilsaxattr
2444c03c5b1cSMartin Matuskafeature is enabled on the pool.
2445c03c5b1cSMartin MatuskaThis would only be necessary to work around bugs in the ZIL logging or replay
2446c03c5b1cSMartin Matuskacode for this record type.
2447c03c5b1cSMartin MatuskaThe tunable has no effect if the feature is disabled.
2448c03c5b1cSMartin Matuska.
2449be181ee2SMartin Matuska.It Sy zfs_embedded_slog_min_ms Ns = Ns Sy 64 Pq uint
24503ff01b23SMartin MatuskaUsually, one metaslab from each normal-class vdev is dedicated for use by
24513ff01b23SMartin Matuskathe ZIL to log synchronous writes.
24523ff01b23SMartin MatuskaHowever, if there are fewer than
24533ff01b23SMartin Matuska.Sy zfs_embedded_slog_min_ms
24543ff01b23SMartin Matuskametaslabs in the vdev, this functionality is disabled.
2455bb2d13b6SMartin MatuskaThis ensures that we don't set aside an unreasonable amount of space for the
2456bb2d13b6SMartin MatuskaZIL.
24573ff01b23SMartin Matuska.
2458be181ee2SMartin Matuska.It Sy zstd_earlyabort_pass Ns = Ns Sy 1 Pq uint
2459e3aa18adSMartin MatuskaWhether the heuristic using LZ4 and zstd-1 passes to detect incompressible
2460e3aa18adSMartin Matuskadata at zstd levels >= 3 is enabled.
2461e3aa18adSMartin Matuska.
2462be181ee2SMartin Matuska.It Sy zstd_abort_size Ns = Ns Sy 131072 Pq uint
2463e3aa18adSMartin MatuskaMinimum uncompressed size (inclusive) of a record before the early abort
2464e3aa18adSMartin Matuskaheuristic will be attempted.
2465e3aa18adSMartin Matuska.
24663ff01b23SMartin Matuska.It Sy zio_deadman_log_all Ns = Ns Sy 0 Ns | Ns 1 Pq int
24673ff01b23SMartin MatuskaIf non-zero, the zio deadman will produce debugging messages
24683ff01b23SMartin Matuska.Pq see Sy zfs_dbgmsg_enable
24693ff01b23SMartin Matuskafor all zios, rather than only for leaf zios possessing a vdev.
24703ff01b23SMartin MatuskaThis is meant to be used by developers to gain
24713ff01b23SMartin Matuskadiagnostic information for hang conditions which don't involve a mutex
24723ff01b23SMartin Matuskaor other locking primitive: typically conditions in which a thread in
24733ff01b23SMartin Matuskathe zio pipeline is looping indefinitely.
24743ff01b23SMartin Matuska.
24753ff01b23SMartin Matuska.It Sy zio_slow_io_ms Ns = Ns Sy 30000 Ns ms Po 30 s Pc Pq int
24763ff01b23SMartin MatuskaWhen an I/O operation takes more than this much time to complete,
24773ff01b23SMartin Matuskait's marked as slow.
24783ff01b23SMartin MatuskaEach slow operation causes a delay zevent.
24793ff01b23SMartin MatuskaSlow I/O counters can be seen with
24803ff01b23SMartin Matuska.Nm zpool Cm status Fl s .
24813ff01b23SMartin Matuska.
24823ff01b23SMartin Matuska.It Sy zio_dva_throttle_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
24833ff01b23SMartin MatuskaThrottle block allocations in the I/O pipeline.
2484*61145dc2SMartin MatuskaThis allows for dynamic allocation distribution based on device performance.
24853ff01b23SMartin Matuska.
2486c03c5b1cSMartin Matuska.It Sy zfs_xattr_compat Ns = Ns 0 Ns | Ns 1 Pq int
2487c03c5b1cSMartin MatuskaControl the naming scheme used when setting new xattrs in the user namespace.
2488c03c5b1cSMartin MatuskaIf
2489c03c5b1cSMartin Matuska.Sy 0
2490c03c5b1cSMartin Matuska.Pq the default on Linux ,
2491c03c5b1cSMartin Matuskauser namespace xattr names are prefixed with the namespace, to be backwards
2492c03c5b1cSMartin Matuskacompatible with previous versions of ZFS on Linux.
2493c03c5b1cSMartin MatuskaIf
2494c03c5b1cSMartin Matuska.Sy 1
2495c03c5b1cSMartin Matuska.Pq the default on Fx ,
2496c03c5b1cSMartin Matuskauser namespace xattr names are not prefixed, to be backwards compatible with
2497c03c5b1cSMartin Matuskaprevious versions of ZFS on illumos and
2498c03c5b1cSMartin Matuska.Fx .
2499c03c5b1cSMartin Matuska.Pp
2500c03c5b1cSMartin MatuskaEither naming scheme can be read on this and future versions of ZFS, regardless
2501c03c5b1cSMartin Matuskaof this tunable, but legacy ZFS on illumos or
2502c03c5b1cSMartin Matuska.Fx
2503c03c5b1cSMartin Matuskais unable to read user namespace xattrs written in the Linux format, and
2504c03c5b1cSMartin Matuskalegacy versions of ZFS on Linux are unable to read user namespace xattrs written
2505c03c5b1cSMartin Matuskain the legacy ZFS format.
2506c03c5b1cSMartin Matuska.Pp
2507c03c5b1cSMartin MatuskaAn existing xattr with the alternate naming scheme is removed when overwriting
2508c03c5b1cSMartin Matuskathe xattr so as to not accumulate duplicates.
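.Pp
A minimal sketch of the two naming schemes (hypothetical helper; only
the documented prefixing behavior is shown):
.Bd -literal -compact
#include <string.h>

/* For an xattr requested as "user.foo":
 *   zfs_xattr_compat=0 (Linux style):           stored as "user.foo"
 *   zfs_xattr_compat=1 (illumos/FreeBSD style): stored as "foo" */
static const char *
xattr_ondisk_name(const char *name, int zfs_xattr_compat)
{
    if (zfs_xattr_compat && strncmp(name, "user.", 5) == 0)
        return (name + 5);
    return (name);
}
.Ed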
2509c03c5b1cSMartin Matuska.
25103ff01b23SMartin Matuska.It Sy zio_requeue_io_start_cut_in_line Ns = Ns Sy 0 Ns | Ns 1 Pq int
25113ff01b23SMartin MatuskaPrioritize requeued I/O.
25123ff01b23SMartin Matuska.
25133ff01b23SMartin Matuska.It Sy zio_taskq_batch_pct Ns = Ns Sy 80 Ns % Pq uint
25143ff01b23SMartin MatuskaPercentage of online CPUs which will run a worker thread for I/O.
2515b985c9caSMartin MatuskaThese workers are responsible for I/O work such as compression, encryption,
2516b985c9caSMartin Matuskachecksum and parity calculations.
25173ff01b23SMartin MatuskaA fractional number of CPUs will be rounded down.
25183ff01b23SMartin Matuska.Pp
25193ff01b23SMartin MatuskaThe default value of
25203ff01b23SMartin Matuska.Sy 80%
25213ff01b23SMartin Matuskawas chosen to avoid using all CPUs which can result in
25223ff01b23SMartin Matuskalatency issues and inconsistent application performance,
25233ff01b23SMartin Matuskaespecially when slower compression and/or checksumming is enabled.
2524b985c9caSMartin MatuskaThe set value only applies to pools imported or created afterwards.
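.Pp
A sketch of the rounding behavior (hypothetical helper):
.Bd -literal -compact
/* Integer arithmetic rounds fractional CPU counts down:
 * e.g. 6 online CPUs at 80% is 4.8, so 4 worker threads. */
static unsigned
taskq_batch_threads(unsigned online_cpus, unsigned pct)
{
    return (online_cpus * pct / 100);
}
.Ed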
25253ff01b23SMartin Matuska.
25263ff01b23SMartin Matuska.It Sy zio_taskq_batch_tpq Ns = Ns Sy 0 Pq uint
25273ff01b23SMartin MatuskaNumber of worker threads per taskq.
2528b985c9caSMartin MatuskaHigher values improve I/O ordering and CPU utilization,
2529b985c9caSMartin Matuskawhile lower values reduce lock contention.
2530b985c9caSMartin MatuskaThe set value only applies to pools imported or created afterwards.
25313ff01b23SMartin Matuska.Pp
25323ff01b23SMartin MatuskaIf
25333ff01b23SMartin Matuska.Sy 0 ,
25343ff01b23SMartin Matuskagenerate a system-dependent value close to 6 threads per taskq.
25363ff01b23SMartin Matuska.
2537b985c9caSMartin Matuska.It Sy zio_taskq_write_tpq Ns = Ns Sy 16 Pq uint
2538c6767dc1SMartin MatuskaDetermines the minimum number of threads per write issue taskq.
2539b985c9caSMartin MatuskaHigher values improve CPU utilization under high throughput,
2540b985c9caSMartin Matuskawhile lower values reduce taskq lock contention under high IOPS.
2541b985c9caSMartin MatuskaThe set value only applies to pools imported or created afterwards.
254214c2e0a0SMartin Matuska.
2543b356da80SMartin Matuska.It Sy zio_taskq_read Ns = Ns Sy fixed,1,8 null scale null Pq charp
2544b356da80SMartin MatuskaSet the queue and thread configuration for the I/O read queues.
2545b356da80SMartin MatuskaThis is an advanced debugging parameter.
2546b356da80SMartin MatuskaDon't change this unless you understand what it does.
2547b985c9caSMartin MatuskaSet values only apply to pools imported or created afterwards.
2548b356da80SMartin Matuska.
2549aca928a5SMartin Matuska.It Sy zio_taskq_write Ns = Ns Sy sync null scale null Pq charp
2550b356da80SMartin MatuskaSet the queue and thread configuration for the IO write queues.
2551b356da80SMartin MatuskaThis is an advanced debugging parameter.
2552b356da80SMartin MatuskaDon't change this unless you understand what it does.
2553b985c9caSMartin MatuskaNewly set values only apply to pools imported or created after the change.
2554b356da80SMartin Matuska.
25553ff01b23SMartin Matuska.It Sy zvol_inhibit_dev Ns = Ns Sy 0 Ns | Ns 1 Pq uint
25563ff01b23SMartin MatuskaDo not create zvol device nodes.
25573ff01b23SMartin MatuskaThis may slightly improve startup time on
25583ff01b23SMartin Matuskasystems with a very large number of zvols.
25593ff01b23SMartin Matuska.
25603ff01b23SMartin Matuska.It Sy zvol_major Ns = Ns Sy 230 Pq uint
25613ff01b23SMartin MatuskaMajor number for zvol block devices.
25623ff01b23SMartin Matuska.
2563dbd5678dSMartin Matuska.It Sy zvol_max_discard_blocks Ns = Ns Sy 16384 Pq long
25643ff01b23SMartin MatuskaDiscard (TRIM) operations done on zvols will be done in batches of this
25653ff01b23SMartin Matuskamany blocks, where block size is determined by the
25663ff01b23SMartin Matuska.Sy volblocksize
25673ff01b23SMartin Matuskaproperty of a zvol.
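.Pp
For example, with a
.Sy volblocksize
of 16 KiB, the default of 16384 blocks batches discards into chunks of
16384 \(mu 16 KiB = 256 MiB.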
25683ff01b23SMartin Matuska.
2569716fd348SMartin Matuska.It Sy zvol_prefetch_bytes Ns = Ns Sy 131072 Ns B Po 128 KiB Pc Pq uint
25703ff01b23SMartin MatuskaWhen adding a zvol to the system, prefetch this many bytes
25713ff01b23SMartin Matuskafrom the start and end of the volume.
25723ff01b23SMartin MatuskaPrefetching these regions of the volume is desirable,
25733ff01b23SMartin Matuskabecause they are likely to be accessed immediately by
25743ff01b23SMartin Matuska.Xr blkid 8
25753ff01b23SMartin Matuskaor the kernel partitioner.
25763ff01b23SMartin Matuska.
25773ff01b23SMartin Matuska.It Sy zvol_request_sync Ns = Ns Sy 0 Ns | Ns 1 Pq uint
25783ff01b23SMartin MatuskaWhen processing I/O requests for a zvol, submit them synchronously.
25793ff01b23SMartin MatuskaThis effectively limits the queue depth to
25803ff01b23SMartin Matuska.Em 1
25813ff01b23SMartin Matuskafor each I/O submitter.
25823ff01b23SMartin MatuskaWhen unset, requests are handled asynchronously by a thread pool.
25833ff01b23SMartin MatuskaThe number of requests which can be handled concurrently is controlled by
25843ff01b23SMartin Matuska.Sy zvol_threads .
25851f1e2261SMartin Matuska.Sy zvol_request_sync
25861f1e2261SMartin Matuskais ignored when running on a kernel that supports block multiqueue
25871f1e2261SMartin Matuska.Pq Li blk-mq .
25883ff01b23SMartin Matuska.
25891719886fSMartin Matuska.It Sy zvol_num_taskqs Ns = Ns Sy 0 Pq uint
25901719886fSMartin MatuskaNumber of zvol taskqs.
25911719886fSMartin MatuskaIf
25921719886fSMartin Matuska.Sy 0
25931719886fSMartin Matuska(the default), scaling is done internally to prefer 6 threads per taskq.
25941719886fSMartin MatuskaThis only applies on Linux.
25951719886fSMartin Matuska.
25961f1e2261SMartin Matuska.It Sy zvol_threads Ns = Ns Sy 0 Pq uint
25971f1e2261SMartin MatuskaThe number of system-wide threads to use for processing zvol block I/Os.
25981f1e2261SMartin MatuskaIf
25991f1e2261SMartin Matuska.Sy 0
26001f1e2261SMartin Matuska(the default) then internally set
26011f1e2261SMartin Matuska.Sy zvol_threads
26021f1e2261SMartin Matuskato the number of CPUs present or 32 (whichever is greater).
26031f1e2261SMartin Matuska.
26046c1e79dfSMartin Matuska.It Sy zvol_blk_mq_threads Ns = Ns Sy 0 Pq uint
26056c1e79dfSMartin MatuskaThe number of threads per zvol to use for queuing I/O requests.
26066c1e79dfSMartin MatuskaThis parameter will only appear if your kernel supports
26076c1e79dfSMartin Matuska.Li blk-mq
26086c1e79dfSMartin Matuskaand is only read and assigned to a zvol at zvol load time.
26096c1e79dfSMartin MatuskaIf
26106c1e79dfSMartin Matuska.Sy 0
26116c1e79dfSMartin Matuska(the default) then internally set
26126c1e79dfSMartin Matuska.Sy zvol_blk_mq_threads
26136c1e79dfSMartin Matuskato the number of CPUs present.
26146c1e79dfSMartin Matuska.
26156c1e79dfSMartin Matuska.It Sy zvol_use_blk_mq Ns = Ns Sy 0 Ns | Ns 1 Pq uint
26166c1e79dfSMartin MatuskaSet to
26176c1e79dfSMartin Matuska.Sy 1
26186c1e79dfSMartin Matuskato use the
26196c1e79dfSMartin Matuska.Li blk-mq
26206c1e79dfSMartin MatuskaAPI for zvols.
26216c1e79dfSMartin MatuskaSet to
26226c1e79dfSMartin Matuska.Sy 0
26236c1e79dfSMartin Matuska(the default) to use the legacy zvol APIs.
26246c1e79dfSMartin MatuskaThis setting can give better or worse zvol performance depending on
26256c1e79dfSMartin Matuskathe workload.
26266c1e79dfSMartin MatuskaThis parameter will only appear if your kernel supports
26276c1e79dfSMartin Matuska.Li blk-mq
26286c1e79dfSMartin Matuskaand is only read and assigned to a zvol at zvol load time.
26296c1e79dfSMartin Matuska.
26306c1e79dfSMartin Matuska.It Sy zvol_blk_mq_blocks_per_thread Ns = Ns Sy 8 Pq uint
26316c1e79dfSMartin MatuskaIf
26326c1e79dfSMartin Matuska.Sy zvol_use_blk_mq
26336c1e79dfSMartin Matuskais enabled, then process this number of
26346c1e79dfSMartin Matuska.Sy volblocksize Ns -sized blocks per zvol thread.
26356c1e79dfSMartin MatuskaThis tunable can be used to favor better performance for zvol reads (lower
26366c1e79dfSMartin Matuskavalues) or writes (higher values).
26376c1e79dfSMartin MatuskaIf set to
26386c1e79dfSMartin Matuska.Sy 0 ,
26396c1e79dfSMartin Matuskathen the zvol layer will process the maximum number of blocks
26406c1e79dfSMartin Matuskaper thread that it can.
26416c1e79dfSMartin MatuskaThis parameter will only appear if your kernel supports
26426c1e79dfSMartin Matuska.Li blk-mq
26436c1e79dfSMartin Matuskaand is only applied at each zvol's load time.
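.Pp
As a rough illustration (the variable names here are hypothetical, not the
module's internals), the amount of data a thread takes at a time is simply
the product of the two values:
.Bd -literal
/* Hypothetical sketch of the per-thread chunk size. */
uint64_t chunk_bytes = volblocksize * zvol_blk_mq_blocks_per_thread;
/* e.g. 16 KiB volblocksize * 8 blocks = 128 KiB per thread */
.Ed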
26446c1e79dfSMartin Matuska.
26456c1e79dfSMartin Matuska.It Sy zvol_blk_mq_queue_depth Ns = Ns Sy 0 Pq uint
26466c1e79dfSMartin MatuskaThe queue_depth value for the zvol
26476c1e79dfSMartin Matuska.Li blk-mq
26486c1e79dfSMartin Matuskainterface.
26496c1e79dfSMartin MatuskaThis parameter will only appear if your kernel supports
26506c1e79dfSMartin Matuska.Li blk-mq
26516c1e79dfSMartin Matuskaand is only applied at each zvol's load time.
26526c1e79dfSMartin MatuskaIf
26536c1e79dfSMartin Matuska.Sy 0
26546c1e79dfSMartin Matuska(the default) then use the kernel's default queue depth.
26556c1e79dfSMartin MatuskaValues are clamped to the kernel's
26566c1e79dfSMartin Matuska.Dv BLKDEV_MIN_RQ
26576c1e79dfSMartin Matuskaand
26586c1e79dfSMartin Matuska.Dv BLKDEV_MAX_RQ Ns / Ns Dv BLKDEV_DEFAULT_RQ
26596c1e79dfSMartin Matuskalimits.
26606c1e79dfSMartin Matuska.
26613ff01b23SMartin Matuska.It Sy zvol_volmode Ns = Ns Sy 1 Pq uint
2662*61145dc2SMartin MatuskaDefines the behavior of zvol block devices when
26633ff01b23SMartin Matuska.Sy volmode Ns = Ns Sy default :
26643ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "a"
26653ff01b23SMartin Matuska.It Sy 1
26663ff01b23SMartin Matuska.No equivalent to Sy full
26673ff01b23SMartin Matuska.It Sy 2
26683ff01b23SMartin Matuska.No equivalent to Sy dev
26693ff01b23SMartin Matuska.It Sy 3
26703ff01b23SMartin Matuska.No equivalent to Sy none
26713ff01b23SMartin Matuska.El
2672dbd5678dSMartin Matuska.
2673dbd5678dSMartin Matuska.It Sy zvol_enforce_quotas Ns = Ns Sy 0 Ns | Ns 1 Pq uint
2674dbd5678dSMartin MatuskaEnable strict ZVOL quota enforcement.
2675dbd5678dSMartin MatuskaStrict quota enforcement may have a performance impact.
26763ff01b23SMartin Matuska.El
26773ff01b23SMartin Matuska.
26783ff01b23SMartin Matuska.Sh ZFS I/O SCHEDULER
26793ff01b23SMartin MatuskaZFS issues I/O operations to leaf vdevs to satisfy and complete I/O requests.
26803ff01b23SMartin MatuskaThe scheduler determines when and in what order those operations are issued.
26813ff01b23SMartin MatuskaThe scheduler divides operations into five I/O classes,
26823ff01b23SMartin Matuskaprioritized in the following order: sync read, sync write, async read,
26833ff01b23SMartin Matuskaasync write, and scrub/resilver.
26843ff01b23SMartin MatuskaEach queue defines the minimum and maximum number of concurrent operations
26853ff01b23SMartin Matuskathat may be issued to the device.
26863ff01b23SMartin MatuskaIn addition, the device has an aggregate maximum,
26873ff01b23SMartin Matuska.Sy zfs_vdev_max_active .
26883ff01b23SMartin MatuskaNote that the sum of the per-queue minima must not exceed the aggregate maximum.
26893ff01b23SMartin MatuskaIf the sum of the per-queue maxima exceeds the aggregate maximum,
26903ff01b23SMartin Matuskathen the number of active operations may reach
26913ff01b23SMartin Matuska.Sy zfs_vdev_max_active ,
26923ff01b23SMartin Matuskain which case no further operations will be issued,
26933ff01b23SMartin Matuskaregardless of whether all per-queue minima have been met.
26943ff01b23SMartin Matuska.Pp
26953ff01b23SMartin MatuskaFor many physical devices, throughput increases with the number of
26963ff01b23SMartin Matuskaconcurrent operations, but latency typically suffers.
26973ff01b23SMartin MatuskaFurthermore, physical devices typically have a limit
26983ff01b23SMartin Matuskaat which more concurrent operations have no
26993ff01b23SMartin Matuskaeffect on throughput or can actually cause it to decrease.
27003ff01b23SMartin Matuska.Pp
27013ff01b23SMartin MatuskaThe scheduler selects the next operation to issue by first looking for an
27023ff01b23SMartin MatuskaI/O class whose minimum has not been satisfied.
27033ff01b23SMartin MatuskaOnce all are satisfied and the aggregate maximum has not been hit,
27043ff01b23SMartin Matuskathe scheduler looks for classes whose maximum has not been satisfied.
27053ff01b23SMartin MatuskaIteration through the I/O classes is done in the order specified above.
27063ff01b23SMartin MatuskaNo further operations are issued
27073ff01b23SMartin Matuskaif the aggregate maximum number of concurrent operations has been hit,
2708bb2d13b6SMartin Matuskaor if there are no operations queued for an I/O class that has not hit its
2709bb2d13b6SMartin Matuskamaximum.
27103ff01b23SMartin MatuskaEvery time an I/O operation is queued or an operation completes,
27113ff01b23SMartin Matuskathe scheduler looks for new operations to issue.
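.Pp
In outline, the selection resembles the following sketch in C; the names and
structure are illustrative only, not the actual vdev queue implementation:
.Bd -literal
enum { NCLASSES = 5 };  /* sync r, sync w, async r, async w, scrub */

typedef struct {
	int	pending[NCLASSES];	/* queued, not yet issued */
	int	active[NCLASSES];	/* currently issued */
	int	min_active[NCLASSES];
	int	max_active[NCLASSES];
	int	total_active;
	int	aggregate_max;		/* zfs_vdev_max_active */
} vq_sketch_t;

static int
vq_class_to_issue(const vq_sketch_t *vq)
{
	if (vq->total_active >= vq->aggregate_max)
		return (-1);		/* aggregate limit reached */
	/* First pass: highest-priority class below its minimum. */
	for (int c = 0; c < NCLASSES; c++)
		if (vq->pending[c] > 0 && vq->active[c] < vq->min_active[c])
			return (c);
	/* Second pass: highest-priority class below its maximum. */
	for (int c = 0; c < NCLASSES; c++)
		if (vq->pending[c] > 0 && vq->active[c] < vq->max_active[c])
			return (c);
	return (-1);			/* nothing eligible */
}
.Ed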
27123ff01b23SMartin Matuska.Pp
27133ff01b23SMartin MatuskaIn general, smaller
27143ff01b23SMartin Matuska.Sy max_active Ns s
27153ff01b23SMartin Matuskawill lead to lower latency of synchronous operations.
27163ff01b23SMartin MatuskaLarger
27173ff01b23SMartin Matuska.Sy max_active Ns s
27183ff01b23SMartin Matuskamay lead to higher overall throughput, depending on underlying storage.
27193ff01b23SMartin Matuska.Pp
27203ff01b23SMartin MatuskaThe ratio of the queues'
27213ff01b23SMartin Matuska.Sy max_active Ns s
27223ff01b23SMartin Matuskadetermines the balance of performance between reads, writes, and scrubs.
27233ff01b23SMartin MatuskaFor example, increasing
27243ff01b23SMartin Matuska.Sy zfs_vdev_scrub_max_active
27253ff01b23SMartin Matuskawill cause the scrub or resilver to complete more quickly,
27263ff01b23SMartin Matuskabut will also cause reads and writes to have higher latency and lower throughput.
27273ff01b23SMartin Matuska.Pp
27283ff01b23SMartin MatuskaAll I/O classes have a fixed maximum number of outstanding operations,
27293ff01b23SMartin Matuskaexcept for the async write class.
27303ff01b23SMartin MatuskaAsynchronous writes represent the data that is committed to stable storage
27313ff01b23SMartin Matuskaduring the syncing stage for transaction groups.
27323ff01b23SMartin MatuskaTransaction groups enter the syncing state periodically,
27333ff01b23SMartin Matuskaso the number of queued async writes will quickly burst up
27343ff01b23SMartin Matuskaand then bleed down to zero.
27353ff01b23SMartin MatuskaRather than servicing them as quickly as possible,
27363ff01b23SMartin Matuskathe I/O scheduler changes the maximum number of active async write operations
27373ff01b23SMartin Matuskaaccording to the amount of dirty data in the pool.
27383ff01b23SMartin MatuskaSince both throughput and latency typically increase with the number of
27393ff01b23SMartin Matuskaconcurrent operations issued to physical devices, reducing the
2740bb2d13b6SMartin Matuskaburstiness in the number of simultaneous operations also stabilizes the
2741bb2d13b6SMartin Matuskaresponse time of operations from other queues, in particular synchronous ones.
27423ff01b23SMartin MatuskaIn broad strokes, the I/O scheduler will issue more concurrent operations
2743bb2d13b6SMartin Matuskafrom the async write queue as there is more dirty data in the pool.
27443ff01b23SMartin Matuska.
27453ff01b23SMartin Matuska.Ss Async Writes
27463ff01b23SMartin MatuskaThe number of concurrent operations issued for the async write I/O class
27473ff01b23SMartin Matuskafollows a piece-wise linear function defined by a few adjustable points:
27483ff01b23SMartin Matuska.Bd -literal
27493ff01b23SMartin Matuska       |              o---------| <-- \fBzfs_vdev_async_write_max_active\fP
27503ff01b23SMartin Matuska  ^    |             /^         |
27513ff01b23SMartin Matuska  |    |            / |         |
27523ff01b23SMartin Matuskaactive |           /  |         |
27533ff01b23SMartin Matuska I/O   |          /   |         |
27543ff01b23SMartin Matuskacount  |         /    |         |
27553ff01b23SMartin Matuska       |        /     |         |
27563ff01b23SMartin Matuska       |-------o      |         | <-- \fBzfs_vdev_async_write_min_active\fP
27573ff01b23SMartin Matuska      0|_______^______|_________|
27583ff01b23SMartin Matuska       0%      |      |       100% of \fBzfs_dirty_data_max\fP
27593ff01b23SMartin Matuska               |      |
27603ff01b23SMartin Matuska               |      `-- \fBzfs_vdev_async_write_active_max_dirty_percent\fP
27613ff01b23SMartin Matuska               `--------- \fBzfs_vdev_async_write_active_min_dirty_percent\fP
27623ff01b23SMartin Matuska.Ed
27633ff01b23SMartin Matuska.Pp
27643ff01b23SMartin MatuskaUntil the amount of dirty data exceeds a minimum percentage of the dirty
27653ff01b23SMartin Matuskadata allowed in the pool, the I/O scheduler will limit the number of
27663ff01b23SMartin Matuskaconcurrent operations to the minimum.
27673ff01b23SMartin MatuskaAs that threshold is crossed, the number of concurrent operations issued
27683ff01b23SMartin Matuskaincreases linearly to the maximum at the specified maximum percentage
27693ff01b23SMartin Matuskaof the dirty data allowed in the pool.
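.Pp
A minimal sketch of that ramp in C follows; this is a simplified rendering,
not the module's exact function, and "dirty" stands for the pool's current
dirty data in bytes:
.Bd -literal
uint64_t lo = zfs_dirty_data_max *
    zfs_vdev_async_write_active_min_dirty_percent / 100;
uint64_t hi = zfs_dirty_data_max *
    zfs_vdev_async_write_active_max_dirty_percent / 100;
uint32_t writes;

if (dirty <= lo)
	writes = zfs_vdev_async_write_min_active;
else if (dirty >= hi)
	writes = zfs_vdev_async_write_max_active;
else	/* linear ramp between the two breakpoints */
	writes = zfs_vdev_async_write_min_active +
	    (dirty - lo) * (zfs_vdev_async_write_max_active -
	    zfs_vdev_async_write_min_active) / (hi - lo);
.Ed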
27703ff01b23SMartin Matuska.Pp
27713ff01b23SMartin MatuskaIdeally, the amount of dirty data on a busy pool will stay in the sloped
27723ff01b23SMartin Matuskapart of the function between
27733ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_min_dirty_percent
27743ff01b23SMartin Matuskaand
27753ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_max_dirty_percent .
27763ff01b23SMartin MatuskaIf it exceeds the maximum percentage,
27773ff01b23SMartin Matuskathis indicates that the rate of incoming data is
27783ff01b23SMartin Matuskagreater than the rate that the backend storage can handle.
27793ff01b23SMartin MatuskaIn this case, we must further throttle incoming writes,
27803ff01b23SMartin Matuskaas described in the next section.
27813ff01b23SMartin Matuska.
27823ff01b23SMartin Matuska.Sh ZFS TRANSACTION DELAY
27833ff01b23SMartin MatuskaWe delay transactions when we've determined that the backend storage
27843ff01b23SMartin Matuskaisn't able to accommodate the rate of incoming writes.
27853ff01b23SMartin Matuska.Pp
27863ff01b23SMartin MatuskaIf there is already a transaction waiting, we delay relative to when
27873ff01b23SMartin Matuskathat transaction will finish waiting.
27883ff01b23SMartin MatuskaThis way the calculated delay time
27893ff01b23SMartin Matuskais independent of the number of threads concurrently executing transactions.
27903ff01b23SMartin Matuska.Pp
27913ff01b23SMartin MatuskaIf we are the only waiter, wait relative to when the transaction started,
27923ff01b23SMartin Matuskarather than the current time.
27933ff01b23SMartin MatuskaThis credits the transaction for "time already served",
27943ff01b23SMartin Matuskae.g. reading indirect blocks.
27953ff01b23SMartin Matuska.Pp
27963ff01b23SMartin MatuskaThe minimum time for a transaction to take is calculated as
2797e92ffd9bSMartin Matuska.D1 min_time = min( Ns Sy zfs_delay_scale No \(mu Po Sy dirty No \- Sy min Pc / Po Sy max No \- Sy dirty Pc , 100ms)
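.Pp
Rendered as a C sketch, with illustrative variable names: "dirty" is the
current dirty data, "min" the amount at which delaying begins, and "max" is
.Sy zfs_dirty_data_max ,
all in bytes:
.Bd -literal
uint64_t cap = 100ULL * 1000 * 1000;	/* 100 ms in nanoseconds */
uint64_t min_time = zfs_delay_scale * (dirty - min) / (max - dirty);
if (min_time > cap)
	min_time = cap;
.Ed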
27983ff01b23SMartin Matuska.Pp
27993ff01b23SMartin MatuskaThe delay has two degrees of freedom that can be adjusted via tunables.
28003ff01b23SMartin MatuskaThe percentage of dirty data at which we start to delay is defined by
28013ff01b23SMartin Matuska.Sy zfs_delay_min_dirty_percent .
28023ff01b23SMartin MatuskaThis should typically be at or above
28033ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_max_dirty_percent ,
28043ff01b23SMartin Matuskaso that we only start to delay after writing at full speed
28053ff01b23SMartin Matuskahas failed to keep up with the incoming write rate.
28063ff01b23SMartin MatuskaThe scale of the curve is defined by
28073ff01b23SMartin Matuska.Sy zfs_delay_scale .
2808bb2d13b6SMartin MatuskaRoughly speaking, this variable determines the amount of delay at the midpoint
2809bb2d13b6SMartin Matuskaof the curve.
28103ff01b23SMartin Matuska.Bd -literal
28113ff01b23SMartin Matuskadelay
28123ff01b23SMartin Matuska 10ms +-------------------------------------------------------------*+
28133ff01b23SMartin Matuska      |                                                             *|
28143ff01b23SMartin Matuska  9ms +                                                             *+
28153ff01b23SMartin Matuska      |                                                             *|
28163ff01b23SMartin Matuska  8ms +                                                             *+
28173ff01b23SMartin Matuska      |                                                            * |
28183ff01b23SMartin Matuska  7ms +                                                            * +
28193ff01b23SMartin Matuska      |                                                            * |
28203ff01b23SMartin Matuska  6ms +                                                            * +
28213ff01b23SMartin Matuska      |                                                            * |
28223ff01b23SMartin Matuska  5ms +                                                           *  +
28233ff01b23SMartin Matuska      |                                                           *  |
28243ff01b23SMartin Matuska  4ms +                                                           *  +
28253ff01b23SMartin Matuska      |                                                           *  |
28263ff01b23SMartin Matuska  3ms +                                                          *   +
28273ff01b23SMartin Matuska      |                                                          *   |
28283ff01b23SMartin Matuska  2ms +                                              (midpoint) *    +
28293ff01b23SMartin Matuska      |                                                  |    **     |
28303ff01b23SMartin Matuska  1ms +                                                  v ***       +
28313ff01b23SMartin Matuska      |             \fBzfs_delay_scale\fP ---------->     ********         |
28323ff01b23SMartin Matuska    0 +-------------------------------------*********----------------+
28333ff01b23SMartin Matuska      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
28343ff01b23SMartin Matuska.Ed
28353ff01b23SMartin Matuska.Pp
28363ff01b23SMartin MatuskaNote that, since the delay is added to the outstanding time remaining on the
28373ff01b23SMartin Matuskamost recent transaction, it is effectively the inverse of IOPS.
28383ff01b23SMartin MatuskaHere, the midpoint of
28393ff01b23SMartin Matuska.Em 500 us
28403ff01b23SMartin Matuskatranslates to
28413ff01b23SMartin Matuska.Em 2000 IOPS .
28423ff01b23SMartin MatuskaThe shape of the curve
28433ff01b23SMartin Matuskawas chosen such that small changes in the amount of accumulated dirty data
28443ff01b23SMartin Matuskain the first three quarters of the curve yield relatively small differences
28453ff01b23SMartin Matuskain the amount of delay.
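.Pp
As a worked check of that inverse relationship:
.D1 IOPS = 1 s / delay = 1 s / 500 us = 2000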
28463ff01b23SMartin Matuska.Pp
28473ff01b23SMartin MatuskaThe effects can be easier to understand when the amount of delay is
28483ff01b23SMartin Matuskarepresented on a logarithmic scale:
28493ff01b23SMartin Matuska.Bd -literal
28503ff01b23SMartin Matuskadelay
28513ff01b23SMartin Matuska100ms +-------------------------------------------------------------++
28523ff01b23SMartin Matuska      +                                                              +
28533ff01b23SMartin Matuska      |                                                              |
28543ff01b23SMartin Matuska      +                                                             *+
28553ff01b23SMartin Matuska 10ms +                                                             *+
28563ff01b23SMartin Matuska      +                                                           ** +
28573ff01b23SMartin Matuska      |                                              (midpoint)  **  |
28583ff01b23SMartin Matuska      +                                                  |     **    +
28593ff01b23SMartin Matuska  1ms +                                                  v ****      +
28603ff01b23SMartin Matuska      +             \fBzfs_delay_scale\fP ---------->        *****         +
28613ff01b23SMartin Matuska      |                                             ****             |
28623ff01b23SMartin Matuska      +                                          ****                +
28633ff01b23SMartin Matuska100us +                                        **                    +
28643ff01b23SMartin Matuska      +                                       *                      +
28653ff01b23SMartin Matuska      |                                      *                       |
28663ff01b23SMartin Matuska      +                                     *                        +
28673ff01b23SMartin Matuska 10us +                                     *                        +
28683ff01b23SMartin Matuska      +                                                              +
28693ff01b23SMartin Matuska      |                                                              |
28703ff01b23SMartin Matuska      +                                                              +
28713ff01b23SMartin Matuska      +--------------------------------------------------------------+
28723ff01b23SMartin Matuska      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
28733ff01b23SMartin Matuska.Ed
28743ff01b23SMartin Matuska.Pp
28753ff01b23SMartin MatuskaNote here that only as the amount of dirty data approaches its limit does
28763ff01b23SMartin Matuskathe delay start to increase rapidly.
28773ff01b23SMartin MatuskaThe goal of a properly tuned system should be to keep the amount of dirty data
28783ff01b23SMartin Matuskaout of that range by first ensuring that the appropriate limits are set
28793ff01b23SMartin Matuskafor the I/O scheduler to reach optimal throughput on the back-end storage,
28803ff01b23SMartin Matuskaand then by changing the value of
28813ff01b23SMartin Matuska.Sy zfs_delay_scale
28823ff01b23SMartin Matuskato increase the steepness of the curve.
2883