.\" xref: /freebsd/sys/contrib/openzfs/man/man4/zfs.4 (revision 3ff01b231dfa83d518854c63e7c9cd1debd1139e)
.\"
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License").  You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE.  If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd June 1, 2021
.Dt ZFS 4
.Os
.
.Sh NAME
.Nm zfs
.Nd tuning of the ZFS kernel module
.
.Sh DESCRIPTION
The ZFS module supports these parameters:
.Bl -tag -width Ds
.It Sy dbuf_cache_max_bytes Ns = Ns Sy ULONG_MAX Ns B Pq ulong
Maximum size in bytes of the dbuf cache.
The target size is the smaller of this value and
.No 1/2^ Ns Sy dbuf_cache_shift Pq 1/32nd
of the target ARC size.
The behavior of the dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
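As a sketch (illustrative arithmetic only, not the module's code path), the target size derivation described above can be reproduced in shell; the 8 GiB ARC target is an assumed example value:

```shell
# Illustrative: target = MIN(dbuf_cache_max_bytes, arc_target >> dbuf_cache_shift)
arc_target=$((8 * 1024 * 1024 * 1024))  # assume an 8 GiB ARC target
dbuf_cache_shift=5                      # default: 1/2^5 = 1/32nd of ARC
dbuf_cache_max_bytes=$((1 << 62))       # effectively unlimited default ceiling
frac=$((arc_target >> dbuf_cache_shift))
if [ "$frac" -lt "$dbuf_cache_max_bytes" ]; then
    target=$frac
else
    target=$dbuf_cache_max_bytes
fi
echo "dbuf cache target: $target bytes"  # 268435456 (256 MiB)
```

With the default shift, the ceiling only takes effect when explicitly lowered below 1/32nd of the ARC target.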
.
.It Sy dbuf_metadata_cache_max_bytes Ns = Ns Sy ULONG_MAX Ns B Pq ulong
Maximum size in bytes of the metadata dbuf cache.
The target size is the smaller of this value and
.No 1/2^ Ns Sy dbuf_metadata_cache_shift Pq 1/64th
of the target ARC size.
The behavior of the metadata dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
.
.It Sy dbuf_cache_hiwater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage over
.Sy dbuf_cache_max_bytes
when dbufs must be evicted directly.
.
.It Sy dbuf_cache_lowater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage below
.Sy dbuf_cache_max_bytes
when the evict thread stops evicting dbufs.
.
.It Sy dbuf_cache_shift Ns = Ns Sy 5 Pq int
Set the size of the dbuf cache
.Pq Sy dbuf_cache_max_bytes
to a log2 fraction of the target ARC size.
.
.It Sy dbuf_metadata_cache_shift Ns = Ns Sy 6 Pq int
Set the size of the dbuf metadata cache
.Pq Sy dbuf_metadata_cache_max_bytes
to a log2 fraction of the target ARC size.
.
.It Sy dmu_object_alloc_chunk_shift Ns = Ns Sy 7 Po 128 Pc Pq int
dnode slots allocated in a single operation as a power of 2.
The default value minimizes lock contention for the bulk operation performed.
.
.It Sy dmu_prefetch_max Ns = Ns Sy 134217728 Ns B Po 128MB Pc Pq int
Limit the amount of data that can be prefetched by a single call to this
number of bytes.
This helps to limit the amount of memory that prefetching can consume.
.
.It Sy ignore_hole_birth Pq int
Alias for
.Sy send_holes_without_birth_time .
.
.It Sy l2arc_feed_again Ns = Ns Sy 1 Ns | Ns 0 Pq int
Turbo L2ARC warm-up.
When the L2ARC is cold the fill interval will be set as fast as possible.
.
.It Sy l2arc_feed_min_ms Ns = Ns Sy 200 Pq ulong
Minimum feed interval in milliseconds.
Requires
.Sy l2arc_feed_again Ns = Ns Ar 1
and is only applicable in related situations.
.
.It Sy l2arc_feed_secs Ns = Ns Sy 1 Pq ulong
Seconds between L2ARC writing.
.
.It Sy l2arc_headroom Ns = Ns Sy 2 Pq ulong
How far through the ARC lists to search for L2ARC cacheable content,
expressed as a multiplier of
.Sy l2arc_write_max .
ARC persistence across reboots can be achieved with persistent L2ARC
by setting this parameter to
.Sy 0 ,
allowing the full length of ARC lists to be searched for cacheable content.
.
.It Sy l2arc_headroom_boost Ns = Ns Sy 200 Ns % Pq ulong
Scales
.Sy l2arc_headroom
by this percentage when L2ARC contents are being successfully compressed
before writing.
A value of
.Sy 100
disables this feature.
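As an illustration of how the two headroom tunables combine (assumed default values, plain arithmetic rather than the module's scan logic), the bytes scanned per ARC list per feed interval are roughly:

```shell
# Illustrative: scan depth = l2arc_write_max * l2arc_headroom,
# further scaled by l2arc_headroom_boost when content compresses.
l2arc_write_max=$((8 * 1024 * 1024))  # 8 MiB written per interval
l2arc_headroom=2                      # scan 2x write_max per list
l2arc_headroom_boost=200              # percentage applied when compressing
headroom=$((l2arc_write_max * l2arc_headroom))
boosted=$((headroom * l2arc_headroom_boost / 100))
echo "scan depth: $headroom bytes; boosted: $boosted bytes"
# scan depth: 16777216 bytes; boosted: 33554432 bytes
```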
.
.It Sy l2arc_mfuonly Ns = Ns Sy 0 Ns | Ns 1 Pq int
Controls whether only MFU metadata and data are cached from ARC into L2ARC.
This may be desirable to avoid wasting space on L2ARC when reading/writing large
amounts of data that are not expected to be accessed more than once.
.Pp
The default is off,
meaning both MRU and MFU data and metadata are cached.
When turning this feature off, some MRU buffers will still be present
in ARC and will eventually be cached on L2ARC.
.No If Sy l2arc_noprefetch Ns = Ns Sy 0 ,
some prefetched buffers will be cached to L2ARC, and those might later
transition to MRU, in which case the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
Regardless of
.Sy l2arc_noprefetch ,
some MFU buffers might be evicted from ARC,
accessed later as prefetches, and transition back to MRU as prefetches.
If accessed again, they are counted as MRU and the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
The ARC status of L2ARC buffers when they were first cached in
L2ARC can be seen in the
.Sy l2arc_mru_asize , Sy l2arc_mfu_asize , No and Sy l2arc_prefetch_asize
arcstats when importing the pool or onlining a cache
device if persistent L2ARC is enabled.
.Pp
The
.Sy evict_l2_eligible_mru
arcstat does not take this option into account, so the information
provided by the
.Sy evict_l2_eligible_m[rf]u
arcstats can be used to decide whether toggling this option is appropriate
for the current workload.
.
.It Sy l2arc_meta_percent Ns = Ns Sy 33 Ns % Pq int
Percent of ARC size allowed for L2ARC-only headers.
Since L2ARC buffers are not evicted on memory pressure,
too many headers on a system with an irrationally large L2ARC
can render it slow or unusable.
This parameter limits L2ARC writes and rebuilds to achieve the target.
.
.It Sy l2arc_trim_ahead Ns = Ns Sy 0 Ns % Pq ulong
Trims ahead of the current write size
.Pq Sy l2arc_write_max
on L2ARC devices by this percentage of write size if we have filled the device.
If set to
.Sy 100
we TRIM twice the space required to accommodate upcoming writes.
A minimum of
.Sy 64MB
will be trimmed.
It also enables TRIM of the whole L2ARC device upon creation
or addition to an existing pool, or if the header of the device is
invalid upon importing a pool or onlining a cache device.
A value of
.Sy 0
disables TRIM on L2ARC altogether and is the default, as it can put significant
stress on the underlying storage devices.
This will vary depending on how well the specific device handles these commands.
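For example (a sketch with assumed values, not the exact kernel code path), with the default 8 MiB write size even a 100% trim-ahead setting is dominated by the documented 64 MB floor:

```shell
# Illustrative: bytes trimmed on a filled device, per the description above.
l2arc_write_max=$((8 * 1024 * 1024))   # 8 MiB write size
l2arc_trim_ahead=100                   # trim 100% ahead of the write size
min_trim=$((64 * 1024 * 1024))         # documented 64 MB minimum
trim=$((l2arc_write_max * (100 + l2arc_trim_ahead) / 100))
if [ "$trim" -lt "$min_trim" ]; then
    trim=$min_trim
fi
echo "bytes trimmed: $trim"  # 67108864 (the 64 MB floor applies)
```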
.
.It Sy l2arc_noprefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
Do not write buffers to L2ARC if they were prefetched but not used by
applications.
If there are prefetched buffers in L2ARC and this option
is later set, we do not read the prefetched buffers from L2ARC.
Unsetting this option is useful for caching sequential reads from the
disks to L2ARC and serving those reads from L2ARC later on.
This may be beneficial in case the L2ARC device is significantly faster
at sequential reads than the disks of the pool.
.Pp
Use
.Sy 1
to disable and
.Sy 0
to enable caching/reading prefetches to/from L2ARC.
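On Linux, the parameters in this page can be changed at runtime through sysfs (the path below is the standard module-parameter location for the zfs module; it requires root and a loaded module). For example, to enable caching and reading of prefetches to/from L2ARC:

```shell
# 0 = cache/read prefetched buffers in L2ARC; 1 (default) = do not.
echo 0 > /sys/module/zfs/parameters/l2arc_noprefetch
# Inspect the current value.
cat /sys/module/zfs/parameters/l2arc_noprefetch
```

This is a runtime change only; to persist it across reboots, set it as a module option (e.g. in a modprobe configuration file).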
.
.It Sy l2arc_norw Ns = Ns Sy 0 Ns | Ns 1 Pq int
No reads during writes.
.
.It Sy l2arc_write_boost Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq ulong
Cold L2ARC devices will have
.Sy l2arc_write_max
increased by this amount while they remain cold.
.
.It Sy l2arc_write_max Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq ulong
Max write bytes per interval.
.
.It Sy l2arc_rebuild_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Rebuild the L2ARC when importing a pool (persistent L2ARC).
This can be disabled if there are problems importing a pool
or attaching an L2ARC device (e.g. the L2ARC device is slow
in reading stored log metadata, or the metadata
has become somehow fragmented/unusable).
.
.It Sy l2arc_rebuild_blocks_min_l2size Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong
Minimum size of an L2ARC device required in order to write log blocks in it.
The log blocks are used upon importing the pool to rebuild the persistent L2ARC.
.Pp
For L2ARC devices less than 1GB, the amount of data
.Fn l2arc_evict
evicts is significant compared to the amount of restored L2ARC data.
In this case, do not write log blocks in L2ARC in order not to waste space.
.
.It Sy metaslab_aliquot Ns = Ns Sy 524288 Ns B Po 512kB Pc Pq ulong
Metaslab granularity, in bytes.
This is roughly similar to what would be referred to as the "stripe size"
in traditional RAID arrays.
In normal operation, ZFS will try to write this amount of data
to a top-level vdev before moving on to the next one.
.
.It Sy metaslab_bias_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group biasing based on their vdevs' over- or under-utilization
relative to the pool.
.
.It Sy metaslab_force_ganging Ns = Ns Sy 16777217 Ns B Po 16MB + 1B Pc Pq ulong
Make some blocks above a certain size be gang blocks.
This option is used by the test suite to facilitate testing.
.
.It Sy zfs_history_output_max Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
When attempting to log an output nvlist of an ioctl in the on-disk history,
the output will not be stored if it is larger than this size (in bytes).
This must be less than
.Sy DMU_MAX_ACCESS Pq 64MB .
This applies primarily to
.Fn zfs_ioc_channel_program Pq cf. Xr zfs-program 8 .
.
.It Sy zfs_keep_log_spacemaps_at_export Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent log spacemaps from being destroyed during pool exports and destroys.
.
.It Sy zfs_metaslab_segment_weight_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable/disable segment-based metaslab selection.
.
.It Sy zfs_metaslab_switch_threshold Ns = Ns Sy 2 Pq int
When using segment-based metaslab selection, continue allocating
from the active metaslab until this many buckets have been exhausted.
.
.It Sy metaslab_debug_load Ns = Ns Sy 0 Ns | Ns 1 Pq int
Load all metaslabs during pool import.
.
.It Sy metaslab_debug_unload Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent metaslabs from being unloaded.
.
.It Sy metaslab_fragmentation_factor_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable use of the fragmentation metric in computing metaslab weights.
.
.It Sy metaslab_df_max_search Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
Maximum distance to search forward from the last offset.
Without this limit, fragmented pools can see
.Em >100,000
iterations and
.Fn metaslab_block_picker
becomes the performance limiting factor on high-performance storage.
.Pp
With the default setting of
.Sy 16MB ,
we typically see less than
.Em 500
iterations, even with very fragmented
.Sy ashift Ns = Ns Sy 9
pools.
The maximum number of iterations possible is
.Sy metaslab_df_max_search / 2^(ashift+1) .
With the default setting of
.Sy 16MB
this is
.Em 16*1024 Pq with Sy ashift Ns = Ns Sy 9
or
.Em 2*1024 Pq with Sy ashift Ns = Ns Sy 12 .
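The iteration bound quoted above can be checked directly (illustrative arithmetic only):

```shell
# Max iterations = metaslab_df_max_search / 2^(ashift+1).
metaslab_df_max_search=$((16 * 1024 * 1024))    # 16 MiB default
iters9=$((metaslab_df_max_search >> (9 + 1)))   # ashift=9 (512 B sectors)
iters12=$((metaslab_df_max_search >> (12 + 1))) # ashift=12 (4 KiB sectors)
echo "ashift=9: $iters9 iterations; ashift=12: $iters12 iterations"
# ashift=9: 16384 iterations; ashift=12: 2048 iterations
```

Larger ashift values shrink the bound because each candidate segment covers more bytes.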
.
.It Sy metaslab_df_use_largest_segment Ns = Ns Sy 0 Ns | Ns 1 Pq int
If not searching forward (due to
.Sy metaslab_df_max_search , metaslab_df_free_pct ,
.No or Sy metaslab_df_alloc_threshold ) ,
this tunable controls which segment is used.
If set, we will use the largest free segment.
If unset, we will use a segment of at least the requested size.
.
.It Sy zfs_metaslab_max_size_cache_sec Ns = Ns Sy 3600 Ns s Po 1h Pc Pq ulong
When we unload a metaslab, we cache the size of the largest free chunk.
We use that cached size to determine whether or not to load a metaslab
for a given allocation.
As more frees accumulate in that metaslab while it's unloaded,
the cached max size becomes less and less accurate.
After a number of seconds controlled by this tunable,
we stop considering the cached max size and start
considering only the histogram instead.
.
.It Sy zfs_metaslab_mem_limit Ns = Ns Sy 25 Ns % Pq int
When we are loading a new metaslab, we check the amount of memory being used
to store metaslab range trees.
If it is over a threshold, we attempt to unload the least recently used metaslab
to prevent the system from clogging all of its memory with range trees.
This tunable sets the percentage of total system memory that is the threshold.
.
.It Sy zfs_metaslab_try_hard_before_gang Ns = Ns Sy 0 Ns | Ns 1 Pq int
.Bl -item -compact
.It
If unset, we will first try normal allocation.
.It
If that fails then we will do a gang allocation.
.It
If that fails then we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.Pp
.Bl -item -compact
.It
If set, we will first try normal allocation.
.It
If that fails then we will do a "try hard" allocation.
.It
If that fails we will do a gang allocation.
.It
If that fails we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.
.It Sy zfs_metaslab_find_max_tries Ns = Ns Sy 100 Pq int
When not trying hard, we only consider this number of the best metaslabs.
This improves performance, especially when there are many metaslabs per vdev
and the allocation can't actually be satisfied
(so we would otherwise iterate all metaslabs).
.
.It Sy zfs_vdev_default_ms_count Ns = Ns Sy 200 Pq int
When a vdev is added, target this number of metaslabs per top-level vdev.
.
.It Sy zfs_vdev_default_ms_shift Ns = Ns Sy 29 Po 512MB Pc Pq int
Default limit for metaslab size.
.
.It Sy zfs_vdev_max_auto_ashift Ns = Ns Sy ASHIFT_MAX Po 16 Pc Pq ulong
Maximum ashift used when optimizing for logical -> physical sector size on new
top-level vdevs.
.
.It Sy zfs_vdev_min_auto_ashift Ns = Ns Sy ASHIFT_MIN Po 9 Pc Pq ulong
Minimum ashift used when creating new top-level vdevs.
.
.It Sy zfs_vdev_min_ms_count Ns = Ns Sy 16 Pq int
Minimum number of metaslabs to create in a top-level vdev.
.
.It Sy vdev_validate_skip Ns = Ns Sy 0 Ns | Ns 1 Pq int
Skip label validation steps during pool import.
Changing is not recommended unless you know what you're doing
and are recovering a damaged label.
.
.It Sy zfs_vdev_ms_count_limit Ns = Ns Sy 131072 Po 128k Pc Pq int
Practical upper limit of total metaslabs per top-level vdev.
.
.It Sy metaslab_preload_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group preloading.
.
.It Sy metaslab_lba_weighting_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Give more weight to metaslabs with lower LBAs,
assuming they have greater bandwidth,
as is typically the case on a modern constant angular velocity disk drive.
.
.It Sy metaslab_unload_delay Ns = Ns Sy 32 Pq int
After a metaslab is used, we keep it loaded for this many TXGs, to attempt to
reduce unnecessary reloading.
Note that both this many TXGs and
.Sy metaslab_unload_delay_ms
milliseconds must pass before unloading will occur.
.
.It Sy metaslab_unload_delay_ms Ns = Ns Sy 600000 Ns ms Po 10min Pc Pq int
After a metaslab is used, we keep it loaded for this many milliseconds,
to attempt to reduce unnecessary reloading.
Note that both this many milliseconds and
.Sy metaslab_unload_delay
TXGs must pass before unloading will occur.
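The two delays combine conjunctively. A simplified sketch of the unload decision (the hypothetical counter values below are examples, not defaults, and the real metaslab code tracks more state):

```shell
# A metaslab is unloaded only after BOTH thresholds have passed.
metaslab_unload_delay=32          # TXGs
metaslab_unload_delay_ms=600000   # milliseconds (10 minutes)
txgs_since_selection=40           # hypothetical: TXG threshold passed
ms_since_selection=300000         # hypothetical: only 5 minutes elapsed
if [ "$txgs_since_selection" -gt "$metaslab_unload_delay" ] &&
   [ "$ms_since_selection" -gt "$metaslab_unload_delay_ms" ]; then
    decision="unload"
else
    decision="keep loaded"
fi
echo "$decision"  # keep loaded: 5 min < 10 min, despite enough TXGs
```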
.
.It Sy reference_history Ns = Ns Sy 3 Pq int
Maximum reference holders being tracked when
.Sy reference_tracking_enable
is active.
.
.It Sy reference_tracking_enable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Track reference holders to
.Sy refcount_t
objects (debug builds only).
.
.It Sy send_holes_without_birth_time Ns = Ns Sy 1 Ns | Ns 0 Pq int
When set, the
.Sy hole_birth
optimization will not be used, and all holes will always be sent during a
.Nm zfs Cm send .
This is useful if you suspect your datasets are affected by a bug in
.Sy hole_birth .
.
.It Sy spa_config_path Ns = Ns Pa /etc/zfs/zpool.cache Pq charp
SPA config file.
.
.It Sy spa_asize_inflation Ns = Ns Sy 24 Pq int
Multiplication factor used to estimate actual disk consumption from the
size of data being written.
The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its configuration.
Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor,
particularly if they operate close to quota or capacity limits.
.
.It Sy spa_load_print_vdev_tree Ns = Ns Sy 0 Ns | Ns 1 Pq int
Whether to print the vdev tree in the debugging message buffer during pool import.
.
.It Sy spa_load_verify_data Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse data blocks during an "extreme rewind"
.Pq Fl X
import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal skips non-metadata blocks.
It can be toggled once the
import has started to stop or start the traversal of non-metadata blocks.
.
.It Sy spa_load_verify_metadata Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse blocks during an "extreme rewind"
.Pq Fl X
pool import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal is not performed.
It can be toggled once the import has started to stop or start the traversal.
.
.It Sy spa_load_verify_shift Ns = Ns Sy 4 Po 1/16th Pc Pq int
Sets the maximum number of bytes to consume during pool import to the log2
fraction of the target ARC size.
.
.It Sy spa_slop_shift Ns = Ns Sy 5 Po 1/32nd Pc Pq int
Normally, we don't allow the last
.Sy 3.2% Pq Sy 1/2^spa_slop_shift
of space in the pool to be consumed.
This ensures that we don't run the pool completely out of space
due to unaccounted changes (e.g. to the MOS).
It also limits the worst-case time to allocate space.
If we have less than this amount of free space,
most ZPL operations (e.g. write, create) will return
.Sy ENOSPC .
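For a concrete feel (illustrative arithmetic on a hypothetical pool size; the actual slop computation also applies absolute lower and upper bounds):

```shell
# Illustrative: reserved slop ~= pool_size / 2^spa_slop_shift.
spa_slop_shift=5
pool_size=$((1 << 40))                 # hypothetical 1 TiB pool
slop=$((pool_size >> spa_slop_shift))  # 1/32nd of the pool
echo "reserved slop: $slop bytes"      # 34359738368 (32 GiB, ~3.1%)
```

Raising the shift by one halves the reserve; lowering it doubles the reserve but increases the risk of running the pool completely out of space.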
451*3ff01b23SMartin Matuska.
452*3ff01b23SMartin Matuska.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq int
453*3ff01b23SMartin MatuskaDuring top-level vdev removal, chunks of data are copied from the vdev
454*3ff01b23SMartin Matuskawhich may include free space in order to trade bandwidth for IOPS.
455*3ff01b23SMartin MatuskaThis parameter determines the maximum span of free space, in bytes,
which will be included as "unnecessary" data in a chunk of copied data.
.Pp
The default value here was chosen to align with
.Sy zfs_vdev_read_gap_limit ,
which is a similar concept when doing
regular reads (but there's no reason it has to be the same).
.
.It Sy vdev_file_logical_ashift Ns = Ns Sy 9 Po 512B Pc Pq ulong
Logical ashift for file-based devices.
.
.It Sy vdev_file_physical_ashift Ns = Ns Sy 9 Po 512B Pc Pq ulong
Physical ashift for file-based devices.
.
.It Sy zap_iterate_prefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, when we start iterating over a ZAP object,
prefetch the entire object (all leaf blocks).
However, this is limited by
.Sy dmu_prefetch_max .
.
.It Sy zfetch_array_rd_sz Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
If prefetching is enabled, disable prefetching for reads larger than this size.
.
.It Sy zfetch_max_distance Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq uint
Max bytes to prefetch per stream.
.
.It Sy zfetch_max_idistance Ns = Ns Sy 67108864 Ns B Po 64MB Pc Pq uint
Max bytes to prefetch indirects for, per stream.
.
.It Sy zfetch_max_streams Ns = Ns Sy 8 Pq uint
Max number of streams per zfetch (prefetch streams per file).
.
.It Sy zfetch_min_sec_reap Ns = Ns Sy 2 Pq uint
Min time before an active prefetch stream can be reclaimed.
.
.It Sy zfs_abd_scatter_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable the use of scatter/gather lists by the ARC;
if disabled, all allocations are forced to be linear in kernel memory.
Disabling can improve performance in some code paths
at the expense of fragmented kernel memory.
.
.It Sy zfs_abd_scatter_max_order Ns = Ns Sy MAX_ORDER-1 Pq uint
Maximum number of consecutive memory pages allocated in a single block for
scatter/gather lists.
.Pp
The value of
.Sy MAX_ORDER
depends on kernel configuration.
.
.It Sy zfs_abd_scatter_min_size Ns = Ns Sy 1536 Ns B Po 1.5kB Pc Pq uint
This is the minimum allocation size that will use scatter (page-based) ABDs.
Smaller allocations will use linear ABDs.
.
.It Sy zfs_arc_dnode_limit Ns = Ns Sy 0 Ns B Pq ulong
When the number of bytes consumed by dnodes in the ARC exceeds this number of
bytes, try to unpin some of it in response to demand for non-metadata.
This value acts as a ceiling to the amount of dnode metadata, and defaults to
.Sy 0 ,
which indicates that a percentage based on
.Sy zfs_arc_dnode_limit_percent
of the ARC meta buffers may be used for dnodes.
.Pp
Also see
.Sy zfs_arc_meta_prune
which serves a similar purpose but is used
when the amount of metadata in the ARC exceeds
.Sy zfs_arc_meta_limit
rather than in response to overall demand for non-metadata.
.
.It Sy zfs_arc_dnode_limit_percent Ns = Ns Sy 10 Ns % Pq ulong
Percentage that can be consumed by dnodes of ARC meta buffers.
.Pp
See also
.Sy zfs_arc_dnode_limit ,
which serves a similar purpose but has a higher priority if nonzero.
.
.It Sy zfs_arc_dnode_reduce_percent Ns = Ns Sy 10 Ns % Pq ulong
Percentage of ARC dnodes to try to scan in response to demand for non-metadata
when the number of bytes consumed by dnodes exceeds
.Sy zfs_arc_dnode_limit .
.
.It Sy zfs_arc_average_blocksize Ns = Ns Sy 8192 Ns B Po 8kB Pc Pq int
The ARC's buffer hash table is sized based on the assumption of an average
block size of this value.
This works out to roughly 1MB of hash table per 1GB of physical memory
with 8-byte pointers.
For configurations with a known larger average block size,
this value can be increased to reduce the memory footprint.
.
.It Sy zfs_arc_eviction_pct Ns = Ns Sy 200 Ns % Pq int
When
.Fn arc_is_overflowing ,
.Fn arc_get_data_impl
waits for this percent of the requested amount of data to be evicted.
For example, by default, for every
.Em 2kB
that's evicted,
.Em 1kB
of it may be "reused" by a new allocation.
Since this is above
.Sy 100 Ns % ,
it ensures that progress is made towards getting
.Sy arc_size No under Sy arc_c .
Since this is finite, it ensures that allocations can still happen,
even during the potentially long time that
.Sy arc_size No is more than Sy arc_c .
.
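The 200% default above can be read as a simple rule: an allocation of n bytes
must wait for 2n bytes to be evicted. A short Python sketch of that arithmetic
(illustrative only; not the actual ZFS implementation):

```python
# Illustrative arithmetic for zfs_arc_eviction_pct (not ZFS source code).
# While the ARC is overflowing, an allocation request of `requested` bytes
# waits until requested * zfs_arc_eviction_pct / 100 bytes have been evicted.

def eviction_target(requested, eviction_pct=200):
    """Bytes that must be evicted before `requested` bytes may be allocated."""
    return requested * eviction_pct // 100

# With the default of 200%, each 1kB allocated requires 2kB evicted,
# so at most half of the evicted space is "reused" and the ARC shrinks overall.
print(eviction_target(1024))   # 2048
```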
.It Sy zfs_arc_evict_batch_limit Ns = Ns Sy 10 Pq int
Number of ARC headers to evict per sub-list before proceeding to another sub-list.
This batch-style operation prevents entire sub-lists from being evicted at once
but comes at a cost of additional unlocking and locking.
.
.It Sy zfs_arc_grow_retry Ns = Ns Sy 0 Ns s Pq int
If set to a nonzero value, it will replace the
.Sy arc_grow_retry
value with this value.
The
.Sy arc_grow_retry
.No value Pq default Sy 5 Ns s
is the number of seconds the ARC will wait before
trying to resume growth after a memory pressure event.
.
.It Sy zfs_arc_lotsfree_percent Ns = Ns Sy 10 Ns % Pq int
Throttle I/O when free system memory drops below this percentage of total
system memory.
Setting this value to
.Sy 0
will disable the throttle.
.
.It Sy zfs_arc_max Ns = Ns Sy 0 Ns B Pq ulong
Max size of ARC in bytes.
If
.Sy 0 ,
then the max size of ARC is determined by the amount of system memory installed.
Under Linux, half of system memory will be used as the limit.
Under
.Fx ,
the larger of
.Sy all_system_memory - 1GB No and Sy 5/8 * all_system_memory
will be used as the limit.
This value must be at least
.Sy 67108864 Ns B Pq 64MB .
.Pp
This value can be changed dynamically, with some caveats.
It cannot be set back to
.Sy 0
while running, and reducing it below the current ARC size will not cause
the ARC to shrink without memory pressure to induce shrinking.
.
.It Sy zfs_arc_meta_adjust_restarts Ns = Ns Sy 4096 Pq ulong
The number of restart passes to make while scanning the ARC attempting
to free buffers in order to stay below the
.Sy zfs_arc_meta_limit .
This value should not need to be tuned but is available to facilitate
performance analysis.
.
.It Sy zfs_arc_meta_limit Ns = Ns Sy 0 Ns B Pq ulong
The maximum size in bytes that metadata buffers are allowed to
consume in the ARC.
When this limit is reached, metadata buffers will be reclaimed,
even if the overall
.Sy arc_c_max
has not been reached.
It defaults to
.Sy 0 ,
which indicates that a percentage based on
.Sy zfs_arc_meta_limit_percent
of the ARC may be used for metadata.
.Pp
This value may be changed dynamically, except that it must be set to an explicit value
.Pq cannot be set back to Sy 0 .
.
.It Sy zfs_arc_meta_limit_percent Ns = Ns Sy 75 Ns % Pq ulong
Percentage of ARC buffers that can be used for metadata.
.Pp
See also
.Sy zfs_arc_meta_limit ,
which serves a similar purpose but has a higher priority if nonzero.
.
.It Sy zfs_arc_meta_min Ns = Ns Sy 0 Ns B Pq ulong
The minimum allowed size in bytes that metadata buffers may consume in
the ARC.
.
.It Sy zfs_arc_meta_prune Ns = Ns Sy 10000 Pq int
The number of dentries and inodes to be scanned looking for entries
which can be dropped.
This may be required when the ARC reaches the
.Sy zfs_arc_meta_limit
because dentries and inodes can pin buffers in the ARC.
Increasing this value will cause the dentry and inode caches
to be pruned more aggressively.
Setting this value to
.Sy 0
will disable pruning the inode and dentry caches.
.
.It Sy zfs_arc_meta_strategy Ns = Ns Sy 1 Ns | Ns 0 Pq int
Define the strategy for ARC metadata buffer eviction (meta reclaim strategy):
.Bl -tag -compact -offset 4n -width "0 (META_ONLY)"
.It Sy 0 Pq META_ONLY
evict only the ARC metadata buffers
.It Sy 1 Pq BALANCED
additional data buffers may be evicted if required
to evict the required number of metadata buffers.
.El
.
.It Sy zfs_arc_min Ns = Ns Sy 0 Ns B Pq ulong
Min size of ARC in bytes.
.No If set to Sy 0 , arc_c_min
will default to consuming the larger of
.Sy 32MB No or Sy all_system_memory/32 .
.
.It Sy zfs_arc_min_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 1s Pc Pq int
Minimum time prefetched blocks are locked in the ARC.
.
.It Sy zfs_arc_min_prescient_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 6s Pc Pq int
Minimum time "prescient prefetched" blocks are locked in the ARC.
These blocks are meant to be prefetched fairly aggressively ahead of
the code that may use them.
.
.It Sy zfs_max_missing_tvds Ns = Ns Sy 0 Pq int
Number of missing top-level vdevs which will be allowed during
pool import (only in read-only mode).
.
.It Sy zfs_max_nvlist_src_size Ns = Ns Sy 0 Pq ulong
Maximum size in bytes allowed to be passed as
.Sy zc_nvlist_src_size
for ioctls on
.Pa /dev/zfs .
This prevents a user from causing the kernel to allocate
an excessive amount of memory.
When the limit is exceeded, the ioctl fails with
.Sy EINVAL
and a description of the error is sent to the
.Pa zfs-dbgmsg
log.
This parameter should not need to be touched under normal circumstances.
If
.Sy 0 ,
equivalent to a quarter of the user-wired memory limit under
.Fx
and to
.Sy 134217728 Ns B Pq 128MB
under Linux.
.
.It Sy zfs_multilist_num_sublists Ns = Ns Sy 0 Pq int
To allow more fine-grained locking, each ARC state contains a series
of lists for both data and metadata objects.
Locking is performed at the level of these "sub-lists".
This parameter controls the number of sub-lists per ARC state,
and also applies to other uses of the multilist data structure.
.Pp
If
.Sy 0 ,
equivalent to the greater of the number of online CPUs and
.Sy 4 .
.
.It Sy zfs_arc_overflow_shift Ns = Ns Sy 8 Pq int
The ARC size is considered to be overflowing if it exceeds the current
ARC target size
.Pq Sy arc_c
by a threshold determined by this parameter.
The threshold is calculated as a fraction of
.Sy arc_c
using the formula
.Sy arc_c >> zfs_arc_overflow_shift .
.Pp
The default value of
.Sy 8
causes the ARC to be considered overflowing if it exceeds the target size by
.Em 1/256th Pq Em 0.4%
of the target size.
.Pp
When the ARC is overflowing, new buffer allocations are stalled until
the reclaim thread catches up and the overflow condition no longer exists.
.
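The overflow threshold above is a pure bit shift, so it is easy to check by
hand. A minimal sketch (assuming a strict greater-than comparison; this is not
the ZFS source):

```python
# Illustrative check of the zfs_arc_overflow_shift threshold (not ZFS code).
# The ARC is treated as overflowing when arc_size exceeds
# arc_c + (arc_c >> zfs_arc_overflow_shift).

def arc_is_overflowing(arc_size, arc_c, overflow_shift=8):
    return arc_size > arc_c + (arc_c >> overflow_shift)

arc_c = 1 << 30                      # assume a 1 GiB target size
margin = arc_c >> 8                  # 1/256th of the target = 4 MiB
print(margin)                        # 4194304
print(arc_is_overflowing(arc_c + margin, arc_c))      # False: at threshold
print(arc_is_overflowing(arc_c + margin + 1, arc_c))  # True: past it
```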
.It Sy zfs_arc_p_min_shift Ns = Ns Sy 0 Pq int
If nonzero, this will update
.Sy arc_p_min_shift Pq default Sy 4
with the new value.
.Sy arc_p_min_shift No is used as a shift of Sy arc_c
when calculating the minimum
.Sy arc_p No size.
.
.It Sy zfs_arc_p_dampener_disable Ns = Ns Sy 1 Ns | Ns 0 Pq int
Disable
.Sy arc_p
adapt dampener, which reduces the maximum single adjustment to
.Sy arc_p .
.
.It Sy zfs_arc_shrink_shift Ns = Ns Sy 0 Pq int
If nonzero, this will update
.Sy arc_shrink_shift Pq default Sy 7
with the new value.
.
.It Sy zfs_arc_pc_percent Ns = Ns Sy 0 Ns % Po off Pc Pq uint
Percent of pagecache to reclaim ARC to.
.Pp
This tunable allows the ZFS ARC to play more nicely
with the kernel's LRU pagecache.
It can guarantee that the ARC size won't collapse under scanning
pressure on the pagecache, yet still allows the ARC to be reclaimed down to
.Sy zfs_arc_min
if necessary.
This value is specified as percent of pagecache size (as measured by
.Sy NR_FILE_PAGES ) ,
where that percent may exceed
.Sy 100 .
This
only operates during memory pressure/reclaim.
.
.It Sy zfs_arc_shrinker_limit Ns = Ns Sy 10000 Pq int
This is a limit on how many pages the ARC shrinker makes available for
eviction in response to one page allocation attempt.
Note that in practice, the kernel's shrinker can ask us to evict
up to about four times this for one allocation attempt.
.Pp
The default limit of
.Sy 10000 Pq in practice, Em 160MB No per allocation attempt with 4kB pages
limits the amount of time spent attempting to reclaim ARC memory to
less than 100ms per allocation attempt,
even with a small average compressed block size of ~8kB.
.Pp
The parameter can be set to 0 (zero) to disable the limit,
and only applies on Linux.
.
.It Sy zfs_arc_sys_free Ns = Ns Sy 0 Ns B Pq ulong
The target number of bytes the ARC should leave as free memory on the system.
If zero, equivalent to the bigger of
.Sy 512kB No and Sy all_system_memory/64 .
.
.It Sy zfs_autoimport_disable Ns = Ns Sy 1 Ns | Ns 0 Pq int
Disable pool import at module load by ignoring the cache file
.Pq Sy spa_config_path .
.
.It Sy zfs_checksum_events_per_second Ns = Ns Sy 20 Ns /s Pq uint
Rate limit checksum events to this many per second.
Note that this should not be set below the ZED thresholds
(currently 10 checksums over 10 seconds)
or else the daemon may not trigger any action.
.
.It Sy zfs_commit_timeout_pct Ns = Ns Sy 5 Ns % Pq int
This controls the amount of time that a ZIL block (lwb) will remain "open"
when it isn't "full", and it has a thread waiting for it to be committed to
stable storage.
The timeout is scaled based on a percentage of the last lwb
latency to avoid significantly impacting the latency of each individual
transaction record (itx).
.
.It Sy zfs_condense_indirect_commit_entry_delay_ms Ns = Ns Sy 0 Ns ms Pq int
Vdev indirection layer (used for device removal) sleeps for this many
milliseconds during mapping generation.
Intended for use with the test suite to throttle vdev removal speed.
.
.It Sy zfs_condense_indirect_obsolete_pct Ns = Ns Sy 25 Ns % Pq int
Minimum percent of obsolete bytes in vdev mapping required to attempt to condense
.Pq see Sy zfs_condense_indirect_vdevs_enable .
Intended for use with the test suite
to facilitate triggering condensing as needed.
.
.It Sy zfs_condense_indirect_vdevs_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable condensing indirect vdev mappings.
When set, attempt to condense indirect vdev mappings
if the mapping uses more than
.Sy zfs_condense_min_mapping_bytes
bytes of memory and if the obsolete space map object uses more than
.Sy zfs_condense_max_obsolete_bytes
bytes on-disk.
The condensing process is an attempt to save memory by removing obsolete mappings.
.
.It Sy zfs_condense_max_obsolete_bytes Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong
Only attempt to condense indirect vdev mappings if the on-disk size
of the obsolete space map object is greater than this number of bytes
.Pq see Sy zfs_condense_indirect_vdevs_enable .
.
.It Sy zfs_condense_min_mapping_bytes Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq ulong
Minimum size vdev mapping to attempt to condense
.Pq see Sy zfs_condense_indirect_vdevs_enable .
.
.It Sy zfs_dbgmsg_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int
Internally ZFS keeps a small log to facilitate debugging.
The log is enabled by default, and can be disabled by unsetting this option.
The contents of the log can be accessed by reading
.Pa /proc/spl/kstat/zfs/dbgmsg .
Writing
.Sy 0
to the file clears the log.
.Pp
This setting does not influence debug prints due to
.Sy zfs_flags .
.
.It Sy zfs_dbgmsg_maxsize Ns = Ns Sy 4194304 Ns B Po 4MB Pc Pq int
Maximum size of the internal ZFS debug log.
.
.It Sy zfs_dbuf_state_index Ns = Ns Sy 0 Pq int
Historically used for controlling what reporting was available under
.Pa /proc/spl/kstat/zfs .
No effect.
.
.It Sy zfs_deadman_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
When a pool sync operation takes longer than
.Sy zfs_deadman_synctime_ms ,
or when an individual I/O operation takes longer than
.Sy zfs_deadman_ziotime_ms ,
then the operation is considered to be "hung".
If
.Sy zfs_deadman_enabled
is set, then the deadman behavior is invoked as described by
.Sy zfs_deadman_failmode .
By default, the deadman is enabled and set to
.Sy wait
which results in "hung" I/Os only being logged.
The deadman is automatically disabled when a pool gets suspended.
.
.It Sy zfs_deadman_failmode Ns = Ns Sy wait Pq charp
Controls the failure behavior when the deadman detects a "hung" I/O operation.
Valid values are:
.Bl -tag -compact -offset 4n -width "continue"
.It Sy wait
Wait for a "hung" operation to complete.
For each "hung" operation a "deadman" event will be posted
describing that operation.
.It Sy continue
Attempt to recover from a "hung" operation by re-dispatching it
to the I/O pipeline if possible.
.It Sy panic
Panic the system.
This can be used to facilitate automatic fail-over
to a properly configured fail-over partner.
.El
.
.It Sy zfs_deadman_checktime_ms Ns = Ns Sy 60000 Ns ms Po 1min Pc Pq int
Check time in milliseconds.
This defines the frequency at which we check for hung I/O requests
and potentially invoke the
.Sy zfs_deadman_failmode
behavior.
.
.It Sy zfs_deadman_synctime_ms Ns = Ns Sy 600000 Ns ms Po 10min Pc Pq ulong
Interval in milliseconds after which the deadman is triggered and also
the interval after which a pool sync operation is considered to be "hung".
Once this limit is exceeded the deadman will be invoked every
.Sy zfs_deadman_checktime_ms
milliseconds until the pool sync completes.
.
.It Sy zfs_deadman_ziotime_ms Ns = Ns Sy 300000 Ns ms Po 5min Pc Pq ulong
Interval in milliseconds after which the deadman is triggered and an
individual I/O operation is considered to be "hung".
As long as the operation remains "hung",
the deadman will be invoked every
.Sy zfs_deadman_checktime_ms
milliseconds until the operation completes.
.
.It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enable prefetching dedup-ed blocks which are going to be freed.
.
.It Sy zfs_delay_min_dirty_percent Ns = Ns Sy 60 Ns % Pq int
Start to delay each transaction once there is this amount of dirty data,
expressed as a percentage of
.Sy zfs_dirty_data_max .
This value should be at least
.Sy zfs_vdev_async_write_active_max_dirty_percent .
.No See Sx ZFS TRANSACTION DELAY .
.
.It Sy zfs_delay_scale Ns = Ns Sy 500000 Pq int
This controls how quickly the transaction delay approaches infinity.
Larger values cause longer delays for a given amount of dirty data.
.Pp
For the smoothest delay, this value should be about 1 billion divided
by the maximum number of operations per second.
This will smoothly handle between ten times and a tenth of this number.
.No See Sx ZFS TRANSACTION DELAY .
.Pp
.Sy zfs_delay_scale * zfs_dirty_data_max Em must be smaller than Sy 2^64 .
.
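The sizing rule above (about 1 billion divided by the maximum number of
operations per second) is simple enough to express directly. The helper below
is hypothetical, not part of ZFS, and only restates that arithmetic:

```python
# Hypothetical helper for the zfs_delay_scale sizing rule from the text.
# The suggested value is ~1e9 / max_ops_per_sec; the default of 500000
# corresponds to roughly 2000 operations per second.

def suggested_delay_scale(max_ops_per_sec):
    return 1_000_000_000 // max_ops_per_sec

print(suggested_delay_scale(2000))   # 500000 (the default)

# The product with zfs_dirty_data_max must stay below 2**64:
zfs_dirty_data_max = 4 * 1024**3     # assume a 4 GiB dirty-data cap
assert suggested_delay_scale(2000) * zfs_dirty_data_max < 2**64
```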
.It Sy zfs_disable_ivset_guid_check Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disables requirement for IVset GUIDs to be present and match when doing a raw
receive of encrypted datasets.
Intended for users whose pools were created with
OpenZFS pre-release versions and now have compatibility issues.
.
.It Sy zfs_key_max_salt_uses Ns = Ns Sy 400000000 Po 4*10^8 Pc Pq ulong
Maximum number of uses of a single salt value before generating a new one for
encrypted datasets.
The default value is also the maximum.
.
.It Sy zfs_object_mutex_size Ns = Ns Sy 64 Pq uint
Size of the znode hashtable used for holds.
.Pp
Due to the need to hold locks on objects that may not exist yet, kernel mutexes
are not created per-object and instead a hashtable is used where collisions
will result in objects waiting when there is not actually contention on the
same object.
.
.It Sy zfs_slow_io_events_per_second Ns = Ns Sy 20 Ns /s Pq int
Rate limit delay and deadman zevents (which report slow I/Os) to this many per
second.
.
.It Sy zfs_unflushed_max_mem_amt Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong
Upper-bound limit for unflushed metadata changes to be held by the
log spacemap in memory, in bytes.
.
.It Sy zfs_unflushed_max_mem_ppm Ns = Ns Sy 1000 Ns ppm Po 0.1% Pc Pq ulong
Part of overall system memory that ZFS allows to be used
for unflushed metadata changes by the log spacemap, in millionths.
.
.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 262144 Po 256k Pc Pq ulong
Describes the maximum number of log spacemap blocks allowed for each pool.
The default value means that the space in all the log spacemaps
can add up to no more than
.Sy 262144
blocks (which means
.Em 32GB
of logical space before compression and ditto blocks,
assuming that blocksize is
.Em 128kB ) .
.Pp
This tunable is important because it involves a trade-off between import
time after an unclean export and the frequency of flushing metaslabs.
The higher this number is, the more log blocks we allow when the pool is
active which means that we flush metaslabs less often and thus decrease
the number of I/Os for spacemap updates per TXG.
At the same time though, that means that in the event of an unclean export,
there will be more log spacemap blocks for us to read, inducing overhead
in the import time of the pool.
The lower the number, the more often metaslabs are flushed, destroying log
blocks sooner as they become obsolete, which leaves fewer blocks
to be read during import time after a crash.
982*3ff01b23SMartin Matuska.Pp
983*3ff01b23SMartin MatuskaEach log spacemap block existing during pool import leads to approximately
984*3ff01b23SMartin Matuskaone extra logical I/O issued.
985*3ff01b23SMartin MatuskaThis is the reason why this tunable is exposed in terms of blocks rather
986*3ff01b23SMartin Matuskathan space used.
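The 32GB figure above follows directly from the block budget; this short
sketch reproduces the arithmetic using the documented defaults:

```python
# Logical space addressable by the log spacemap block budget,
# using the defaults documented for this tunable.
log_block_max = 262144      # zfs_unflushed_log_block_max, in blocks
blocksize = 128 * 1024      # assumed 128kB spacemap blocksize

logical_bytes = log_block_max * blocksize
print(logical_bytes // 2**30, "GiB")  # 32 GiB
```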
.
.It Sy zfs_unflushed_log_block_min Ns = Ns Sy 1000 Pq ulong
If the number of metaslabs is small and our incoming rate is high,
we could get into a situation that we are flushing all our metaslabs every TXG.
Thus we always allow at least this many log blocks.
.
.It Sy zfs_unflushed_log_block_pct Ns = Ns Sy 400 Ns % Pq ulong
Tunable used to determine the number of blocks that can be used for
the spacemap log, expressed as a percentage of the total number of
metaslabs in the pool.
.
.It Sy zfs_unlink_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint
When enabled, files will not be asynchronously removed from the list of pending
unlinks and the space they consume will be leaked.
Once this option has been disabled and the dataset is remounted,
the pending unlinks will be processed and the freed space returned to the pool.
This option is used by the test suite.
.
.It Sy zfs_delete_blocks Ns = Ns Sy 20480 Pq ulong
This is used to define a large file for the purposes of deletion.
Files containing more than
.Sy zfs_delete_blocks
blocks will be deleted asynchronously, while smaller files are deleted synchronously.
Decreasing this value will reduce the time spent in an
.Xr unlink 2
system call, at the expense of a longer delay before the freed space is available.
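The sync/async decision above can be sketched as follows; this is an
illustrative model only (the real choice is made inside the kernel, and
.Fn delete_path
is a hypothetical name), not kernel code:

```python
# Hypothetical sketch of the documented deletion-path choice.
ZFS_DELETE_BLOCKS = 20480  # default threshold, in blocks

def delete_path(file_blocks):
    # Files larger than the threshold are freed in the background;
    # unlink(2) returns before their space is reclaimed.
    return "async" if file_blocks > ZFS_DELETE_BLOCKS else "sync"

print(delete_path(100))     # sync
print(delete_path(50000))   # async
```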
.
.It Sy zfs_dirty_data_max Ns = Pq int
Determines the dirty space limit in bytes.
Once this limit is exceeded, new writes are halted until space frees up.
This parameter takes precedence over
.Sy zfs_dirty_data_max_percent .
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Defaults to
.Sy physical_ram/10 ,
capped at
.Sy zfs_dirty_data_max_max .
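A sketch of the default computation described above, assuming the stock
defaults (one tenth of RAM, capped by
.Sy zfs_dirty_data_max_max ,
which itself defaults to one quarter of RAM, so the cap only bites if it
has been lowered):

```python
# Sketch of the documented default for zfs_dirty_data_max.
def dirty_data_max_default(physical_ram):
    cap = physical_ram // 4          # zfs_dirty_data_max_max default
    return min(physical_ram // 10, cap)

ram = 16 * 2**30                     # example: 16 GiB of RAM
print(dirty_data_max_default(ram) // 2**20, "MiB")  # 1638 MiB
```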
.
.It Sy zfs_dirty_data_max_max Ns = Pq int
Maximum allowable value of
.Sy zfs_dirty_data_max ,
expressed in bytes.
This limit is only enforced at module load time, and will be ignored if
.Sy zfs_dirty_data_max
is later changed.
This parameter takes precedence over
.Sy zfs_dirty_data_max_max_percent .
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Defaults to
.Sy physical_ram/4 .
.
.It Sy zfs_dirty_data_max_max_percent Ns = Ns Sy 25 Ns % Pq int
Maximum allowable value of
.Sy zfs_dirty_data_max ,
expressed as a percentage of physical RAM.
This limit is only enforced at module load time, and will be ignored if
.Sy zfs_dirty_data_max
is later changed.
The parameter
.Sy zfs_dirty_data_max_max
takes precedence over this one.
.No See Sx ZFS TRANSACTION DELAY .
.
.It Sy zfs_dirty_data_max_percent Ns = Ns Sy 10 Ns % Pq int
Determines the dirty space limit, expressed as a percentage of all memory.
Once this limit is exceeded, new writes are halted until space frees up.
The parameter
.Sy zfs_dirty_data_max
takes precedence over this one.
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Subject to
.Sy zfs_dirty_data_max_max .
.
.It Sy zfs_dirty_data_sync_percent Ns = Ns Sy 20 Ns % Pq int
Start syncing out a transaction group if there's at least this much dirty data
.Pq as a percentage of Sy zfs_dirty_data_max .
This should be less than
.Sy zfs_vdev_async_write_active_min_dirty_percent .
.
.It Sy zfs_fallocate_reserve_percent Ns = Ns Sy 110 Ns % Pq uint
Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
preallocated for a file in order to guarantee that later writes will not
run out of space.
Instead,
.Xr fallocate 2
space preallocation only checks that sufficient space is currently available
in the pool or the user's project quota allocation,
and then creates a sparse file of the requested size.
The requested space is multiplied by
.Sy zfs_fallocate_reserve_percent
to allow additional space for indirect blocks and other internal metadata.
Setting this to
.Sy 0
disables support for
.Xr fallocate 2
and causes it to return
.Sy EOPNOTSUPP .
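The space check described above amounts to scaling the request before
comparing it against available space; a minimal sketch (the
.Fn fallocate_ok
helper is hypothetical, not a real API):

```python
# Sketch of the documented fallocate(2) space check: the requested
# size is scaled by zfs_fallocate_reserve_percent before being
# compared against the space currently available.
RESERVE_PERCENT = 110  # default zfs_fallocate_reserve_percent

def fallocate_ok(requested_bytes, available_bytes):
    needed = requested_bytes * RESERVE_PERCENT // 100
    return available_bytes >= needed

print(fallocate_ok(1 * 2**30, 2 * 2**30))   # True
print(fallocate_ok(1 * 2**30, 1 * 2**30))   # False: needs ~1.1 GiB
```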
.
.It Sy zfs_fletcher_4_impl Ns = Ns Sy fastest Pq string
Select a fletcher 4 implementation.
.Pp
Supported selectors are:
.Sy fastest , scalar , sse2 , ssse3 , avx2 , avx512f , avx512bw ,
.No and Sy aarch64_neon .
All except
.Sy fastest No and Sy scalar
require instruction set extensions to be available,
and will only appear if ZFS detects that they are present at runtime.
If multiple implementations of fletcher 4 are available, the
.Sy fastest
will be chosen using a micro benchmark.
Selecting
.Sy scalar
results in the original CPU-based calculation being used.
Selecting any option other than
.Sy fastest No or Sy scalar
results in vector instructions
from the respective CPU instruction set being used.
.
.It Sy zfs_free_bpobj_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable/disable the processing of the free_bpobj object.
.
.It Sy zfs_async_block_max_blocks Ns = Ns Sy ULONG_MAX Po unlimited Pc Pq ulong
Maximum number of blocks freed in a single TXG.
.
.It Sy zfs_max_async_dedup_frees Ns = Ns Sy 100000 Po 10^5 Pc Pq ulong
Maximum number of dedup blocks freed in a single TXG.
.
.It Sy zfs_override_estimate_recordsize Ns = Ns Sy 0 Pq ulong
If nonzero, override record size calculation for
.Nm zfs Cm send
estimates.
.
.It Sy zfs_vdev_async_read_max_active Ns = Ns Sy 3 Pq int
Maximum asynchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_read_min_active Ns = Ns Sy 1 Pq int
Minimum asynchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_active_max_dirty_percent Ns = Ns Sy 60 Ns % Pq int
When the pool has more than this much dirty data, use
.Sy zfs_vdev_async_write_max_active
to limit active async writes.
If the dirty data is between the minimum and maximum,
the active I/O limit is linearly interpolated.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_active_min_dirty_percent Ns = Ns Sy 30 Ns % Pq int
When the pool has less than this much dirty data, use
.Sy zfs_vdev_async_write_min_active
to limit active async writes.
If the dirty data is between the minimum and maximum,
the active I/O limit is linearly interpolated.
.No See Sx ZFS I/O SCHEDULER .
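The interpolation described by these two thresholds can be sketched as
follows, assuming the stock defaults; this is an illustrative model of
the documented behavior, not the kernel implementation:

```python
# Below the minimum dirty threshold use the minimum number of active
# async writes, above the maximum threshold use the maximum, and
# interpolate linearly in between.
MIN_DIRTY_PCT, MAX_DIRTY_PCT = 30, 60   # *_active_{min,max}_dirty_percent
MIN_ACTIVE, MAX_ACTIVE = 2, 30          # async_write_{min,max}_active

def async_write_active(dirty_pct):
    if dirty_pct <= MIN_DIRTY_PCT:
        return MIN_ACTIVE
    if dirty_pct >= MAX_DIRTY_PCT:
        return MAX_ACTIVE
    span = MAX_DIRTY_PCT - MIN_DIRTY_PCT
    return MIN_ACTIVE + (dirty_pct - MIN_DIRTY_PCT) * (MAX_ACTIVE - MIN_ACTIVE) // span

print(async_write_active(10))   # 2
print(async_write_active(45))   # 16
print(async_write_active(90))   # 30
```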
.
.It Sy zfs_vdev_async_write_max_active Ns = Ns Sy 30 Pq int
Maximum asynchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_min_active Ns = Ns Sy 2 Pq int
Minimum asynchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.Pp
Lower values are associated with better latency on rotational media but poorer
resilver performance.
The default value of
.Sy 2
was chosen as a compromise.
A value of
.Sy 3
has been shown to improve resilver performance further at a cost of
further increasing latency.
.
.It Sy zfs_vdev_initializing_max_active Ns = Ns Sy 1 Pq int
Maximum initializing I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_initializing_min_active Ns = Ns Sy 1 Pq int
Minimum initializing I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_max_active Ns = Ns Sy 1000 Pq int
The maximum number of I/O operations active to each device.
Ideally, this will be at least the sum of each queue's
.Sy max_active .
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_rebuild_max_active Ns = Ns Sy 3 Pq int
Maximum sequential resilver I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_rebuild_min_active Ns = Ns Sy 1 Pq int
Minimum sequential resilver I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_removal_max_active Ns = Ns Sy 2 Pq int
Maximum removal I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_removal_min_active Ns = Ns Sy 1 Pq int
Minimum removal I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_scrub_max_active Ns = Ns Sy 2 Pq int
Maximum scrub I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_scrub_min_active Ns = Ns Sy 1 Pq int
Minimum scrub I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_read_max_active Ns = Ns Sy 10 Pq int
Maximum synchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_read_min_active Ns = Ns Sy 10 Pq int
Minimum synchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_write_max_active Ns = Ns Sy 10 Pq int
Maximum synchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_write_min_active Ns = Ns Sy 10 Pq int
Minimum synchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_trim_max_active Ns = Ns Sy 2 Pq int
Maximum trim/discard I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_trim_min_active Ns = Ns Sy 1 Pq int
Minimum trim/discard I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_nia_delay Ns = Ns Sy 5 Pq int
For non-interactive I/O (scrub, resilver, removal, initialize and rebuild),
the number of concurrently-active I/O operations is limited to
.Sy zfs_*_min_active ,
unless the vdev is "idle".
When there are no interactive I/O operations active (synchronous or otherwise),
and
.Sy zfs_vdev_nia_delay
operations have completed since the last interactive operation,
then the vdev is considered to be "idle",
and the number of concurrently-active non-interactive operations is increased to
.Sy zfs_*_max_active .
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_nia_credit Ns = Ns Sy 5 Pq int
Some HDDs tend to prioritize sequential I/O so strongly, that concurrent
random I/O latency reaches several seconds.
On some HDDs this happens even if sequential I/O operations
are submitted one at a time, and so setting
.Sy zfs_*_max_active Ns = Ns Sy 1
does not help.
To prevent non-interactive I/O, like scrub,
from monopolizing the device, no more than
.Sy zfs_vdev_nia_credit
operations can be sent
while there are outstanding incomplete interactive operations.
This enforced wait ensures the HDD services the interactive I/O
within a reasonable amount of time.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_queue_depth_pct Ns = Ns Sy 1000 Ns % Pq int
Maximum number of queued allocations per top-level vdev expressed as
a percentage of
.Sy zfs_vdev_async_write_max_active ,
which allows the system to detect devices that are more capable
of handling allocations and to allocate more blocks to those devices.
This allows for dynamic allocation distribution when devices are imbalanced,
as fuller devices will tend to be slower than empty devices.
.Pp
Also see
.Sy zio_dva_throttle_enabled .
.
.It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
Time before expiring
.Pa .zfs/snapshot .
.
.It Sy zfs_admin_snapshot Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow the creation, removal, or renaming of entries in the
.Pa .zfs/snapshot
directory to cause the creation, destruction, or renaming of snapshots.
When enabled, this functionality works both locally and over NFS exports
which have the
.Em no_root_squash
option set.
.
.It Sy zfs_flags Ns = Ns Sy 0 Pq int
Set additional debugging flags.
The following flags may be bitwise-ored together:
.TS
box;
lbz r l l .
	Value	Symbolic Name	Description
_
	1	ZFS_DEBUG_DPRINTF	Enable dprintf entries in the debug log.
*	2	ZFS_DEBUG_DBUF_VERIFY	Enable extra dbuf verifications.
*	4	ZFS_DEBUG_DNODE_VERIFY	Enable extra dnode verifications.
	8	ZFS_DEBUG_SNAPNAMES	Enable snapshot name verification.
	16	ZFS_DEBUG_MODIFY	Check for illegally modified ARC buffers.
	64	ZFS_DEBUG_ZIO_FREE	Enable verification of block frees.
	128	ZFS_DEBUG_HISTOGRAM_VERIFY	Enable extra spacemap histogram verifications.
	256	ZFS_DEBUG_METASLAB_VERIFY	Verify space accounting on disk matches in-memory \fBrange_trees\fP.
	512	ZFS_DEBUG_SET_ERROR	Enable \fBSET_ERROR\fP and dprintf entries in the debug log.
	1024	ZFS_DEBUG_INDIRECT_REMAP	Verify split blocks created by device removal.
	2048	ZFS_DEBUG_TRIM	Verify TRIM ranges are always within the allocatable range tree.
	4096	ZFS_DEBUG_LOG_SPACEMAP	Verify that the log summary is consistent with the spacemap log
			       and enable \fBzfs_dbgmsgs\fP for metaslab loading and flushing.
.TE
.Sy \& * No Requires debug build.
.
.It Sy zfs_free_leak_on_eio Ns = Ns Sy 0 Ns | Ns 1 Pq int
If destroy encounters an
.Sy EIO
while reading metadata (e.g. indirect blocks),
space referenced by the missing metadata can not be freed.
Normally this causes the background destroy to become "stalled",
as it is unable to make forward progress.
While in this stalled state, all remaining space to free
from the error-encountering filesystem is "temporarily leaked".
Set this flag to cause it to ignore the
.Sy EIO ,
permanently leak the space from indirect blocks that can not be read,
and continue to free everything else that it can.
.Pp
The default "stalling" behavior is useful if the storage partially
fails (i.e. some but not all I/O operations fail), and then later recovers.
In this case, we will be able to continue pool operations while it is
partially failed, and when it recovers, we can continue to free the
space, with no leaks.
Note, however, that this case is actually fairly rare.
.Pp
Typically pools either
.Bl -enum -compact -offset 4n -width "1."
.It
fail completely (but perhaps temporarily,
e.g. due to a top-level vdev going offline), or
.It
have localized, permanent errors (e.g. disk returns the wrong data
due to bit flip or firmware bug).
.El
In the former case, this setting does not matter because the
pool will be suspended and the sync thread will not be able to make
forward progress regardless.
In the latter, because the error is permanent, the best we can do
is leak the minimum amount of space,
which is what setting this flag will do.
It is therefore reasonable for this flag to normally be set,
but we chose the more conservative approach of not setting it,
so that there is no possibility of
leaking space in the "partial temporary" failure case.
.
.It Sy zfs_free_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq int
During a
.Nm zfs Cm destroy
operation using the
.Sy async_destroy
feature,
a minimum of this much time will be spent working on freeing blocks per TXG.
.
.It Sy zfs_obsolete_min_time_ms Ns = Ns Sy 500 Ns ms Pq int
Similar to
.Sy zfs_free_min_time_ms ,
but for cleanup of old indirection records for removed vdevs.
.
.It Sy zfs_immediate_write_sz Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq long
Largest data block to write to the ZIL.
Larger blocks will be treated as if the dataset being written to had the
.Sy logbias Ns = Ns Sy throughput
property set.
.
.It Sy zfs_initialize_value Ns = Ns Sy 16045690984833335022 Po 0xDEADBEEFDEADBEEE Pc Pq ulong
Pattern written to vdev free space by
.Xr zpool-initialize 8 .
.
.It Sy zfs_initialize_chunk_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
Size of writes used by
.Xr zpool-initialize 8 .
This option is used by the test suite.
.
.It Sy zfs_livelist_max_entries Ns = Ns Sy 500000 Po 5*10^5 Pc Pq ulong
The threshold size (in block pointers) at which we create a new sub-livelist.
Larger sublists are more costly from a memory perspective but the fewer
sublists there are, the lower the cost of insertion.
.
.It Sy zfs_livelist_min_percent_shared Ns = Ns Sy 75 Ns % Pq int
If the amount of shared space between a snapshot and its clone drops below
this threshold, the clone turns off the livelist and reverts to the old
deletion method.
This is in place because livelists no longer give us a benefit
once a clone has been overwritten enough.
.
1387*3ff01b23SMartin Matuska.It Sy zfs_livelist_condense_new_alloc Ns = Ns Sy 0 Pq int
1388*3ff01b23SMartin MatuskaIncremented each time an extra ALLOC blkptr is added to a livelist entry while
1389*3ff01b23SMartin Matuskait is being condensed.
1390*3ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
1391*3ff01b23SMartin Matuska.
1392*3ff01b23SMartin Matuska.It Sy zfs_livelist_condense_sync_cancel Ns = Ns Sy 0 Pq int
1393*3ff01b23SMartin MatuskaIncremented each time livelist condensing is canceled while in
1394*3ff01b23SMartin Matuska.Fn spa_livelist_condense_sync .
1395*3ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
1396*3ff01b23SMartin Matuska.
1397*3ff01b23SMartin Matuska.It Sy zfs_livelist_condense_sync_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
1398*3ff01b23SMartin MatuskaWhen set, the livelist condense process pauses indefinitely before
1399*3ff01b23SMartin Matuskaexecuting the synctask -
1400*3ff01b23SMartin Matuska.Fn spa_livelist_condense_sync .
1401*3ff01b23SMartin MatuskaThis option is used by the test suite to trigger race conditions.
1402*3ff01b23SMartin Matuska.
1403*3ff01b23SMartin Matuska.It Sy zfs_livelist_condense_zthr_cancel Ns = Ns Sy 0 Pq int
1404*3ff01b23SMartin MatuskaIncremented each time livelist condensing is canceled while in
1405*3ff01b23SMartin Matuska.Fn spa_livelist_condense_cb .
1406*3ff01b23SMartin MatuskaThis option is used by the test suite to track race conditions.
1407*3ff01b23SMartin Matuska.
1408*3ff01b23SMartin Matuska.It Sy zfs_livelist_condense_zthr_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
1409*3ff01b23SMartin MatuskaWhen set, the livelist condense process pauses indefinitely before
1410*3ff01b23SMartin Matuskaexecuting the open context condensing work in
1411*3ff01b23SMartin Matuska.Fn spa_livelist_condense_cb .
1412*3ff01b23SMartin MatuskaThis option is used by the test suite to trigger race conditions.
1413*3ff01b23SMartin Matuska.
1414*3ff01b23SMartin Matuska.It Sy zfs_lua_max_instrlimit Ns = Ns Sy 100000000 Po 10^8 Pc Pq ulong
1415*3ff01b23SMartin MatuskaThe maximum execution time limit that can be set for a ZFS channel program,
1416*3ff01b23SMartin Matuskaspecified as a number of Lua instructions.
1417*3ff01b23SMartin Matuska.
1418*3ff01b23SMartin Matuska.It Sy zfs_lua_max_memlimit Ns = Ns Sy 104857600 Po 100MB Pc Pq ulong
1419*3ff01b23SMartin MatuskaThe maximum memory limit that can be set for a ZFS channel program, specified
1420*3ff01b23SMartin Matuskain bytes.
1421*3ff01b23SMartin Matuska.
1422*3ff01b23SMartin Matuska.It Sy zfs_max_dataset_nesting Ns = Ns Sy 50 Pq int
1423*3ff01b23SMartin MatuskaThe maximum depth of nested datasets.
1424*3ff01b23SMartin MatuskaThis value can be tuned temporarily to
1425*3ff01b23SMartin Matuskafix existing datasets that exceed the predefined limit.
1426*3ff01b23SMartin Matuska.
1427*3ff01b23SMartin Matuska.It Sy zfs_max_log_walking Ns = Ns Sy 5 Pq ulong
1428*3ff01b23SMartin MatuskaThe number of past TXGs that the flushing algorithm of the log spacemap
1429*3ff01b23SMartin Matuskafeature uses to estimate incoming log blocks.
1430*3ff01b23SMartin Matuska.
1431*3ff01b23SMartin Matuska.It Sy zfs_max_logsm_summary_length Ns = Ns Sy 10 Pq ulong
1432*3ff01b23SMartin MatuskaMaximum number of rows allowed in the summary of the spacemap log.
1433*3ff01b23SMartin Matuska.
1434*3ff01b23SMartin Matuska.It Sy zfs_max_recordsize Ns = Ns Sy 1048576 Po 1MB Pc Pq int
1435*3ff01b23SMartin MatuskaWe currently support block sizes from
1436*3ff01b23SMartin Matuska.Em 512B No to Em 16MB .
1437*3ff01b23SMartin MatuskaThe benefits of larger blocks, and thus larger I/O,
1438*3ff01b23SMartin Matuskaneed to be weighed against the cost of COWing a giant block to modify one byte.
1439*3ff01b23SMartin MatuskaAdditionally, very large blocks can have an impact on I/O latency,
1440*3ff01b23SMartin Matuskaand also potentially on the memory allocator.
1441*3ff01b23SMartin MatuskaTherefore, we do not allow the recordsize to be set larger than this tunable.
1442*3ff01b23SMartin MatuskaLarger blocks can be created by changing it,
1443*3ff01b23SMartin Matuskaand pools with larger blocks can always be imported and used,
1444*3ff01b23SMartin Matuskaregardless of this setting.
1445*3ff01b23SMartin Matuska.
1446*3ff01b23SMartin Matuska.It Sy zfs_allow_redacted_dataset_mount Ns = Ns Sy 0 Ns | Ns 1 Pq int
1447*3ff01b23SMartin MatuskaAllow datasets received with redacted send/receive to be mounted.
1448*3ff01b23SMartin MatuskaNormally disabled because these datasets may be missing key data.
1449*3ff01b23SMartin Matuska.
1450*3ff01b23SMartin Matuska.It Sy zfs_min_metaslabs_to_flush Ns = Ns Sy 1 Pq ulong
1451*3ff01b23SMartin MatuskaMinimum number of metaslabs to flush per dirty TXG.
1452*3ff01b23SMartin Matuska.
1453*3ff01b23SMartin Matuska.It Sy zfs_metaslab_fragmentation_threshold Ns = Ns Sy 70 Ns % Pq int
1454*3ff01b23SMartin MatuskaAllow metaslabs to keep their active state as long as their fragmentation
1455*3ff01b23SMartin Matuskapercentage is no more than this value.
1456*3ff01b23SMartin MatuskaAn active metaslab that exceeds this threshold
1457*3ff01b23SMartin Matuskawill no longer keep its active status, allowing better metaslabs to be selected.
1458*3ff01b23SMartin Matuska.
1459*3ff01b23SMartin Matuska.It Sy zfs_mg_fragmentation_threshold Ns = Ns Sy 95 Ns % Pq int
1460*3ff01b23SMartin MatuskaMetaslab groups are considered eligible for allocations if their
1461*3ff01b23SMartin Matuskafragmentation metric (measured as a percentage) is less than or equal to
1462*3ff01b23SMartin Matuskathis value.
1463*3ff01b23SMartin MatuskaIf a metaslab group exceeds this threshold then it will be
1464*3ff01b23SMartin Matuskaskipped unless all metaslab groups within the metaslab class have also
1465*3ff01b23SMartin Matuskacrossed this threshold.
1466*3ff01b23SMartin Matuska.
1467*3ff01b23SMartin Matuska.It Sy zfs_mg_noalloc_threshold Ns = Ns Sy 0 Ns % Pq int
1468*3ff01b23SMartin MatuskaDefines a threshold at which metaslab groups should be eligible for allocations.
1469*3ff01b23SMartin MatuskaThe value is expressed as a percentage of free space
1470*3ff01b23SMartin Matuskabeyond which a metaslab group is always eligible for allocations.
1471*3ff01b23SMartin MatuskaIf a metaslab group's free space is less than or equal to the
1472*3ff01b23SMartin Matuskathreshold, the allocator will avoid allocating to that group
1473*3ff01b23SMartin Matuskaunless all groups in the pool have reached the threshold.
1474*3ff01b23SMartin MatuskaOnce all groups have reached the threshold, all groups are allowed to accept
1475*3ff01b23SMartin Matuskaallocations.
1476*3ff01b23SMartin MatuskaThe default value of
1477*3ff01b23SMartin Matuska.Sy 0
1478*3ff01b23SMartin Matuskadisables the feature and causes all metaslab groups to be eligible for allocations.
1479*3ff01b23SMartin Matuska.Pp
1480*3ff01b23SMartin MatuskaThis parameter allows one to deal with pools having heavily imbalanced
1481*3ff01b23SMartin Matuskavdevs such as would be the case when a new vdev has been added.
1482*3ff01b23SMartin MatuskaSetting the threshold to a non-zero percentage will stop allocations
1483*3ff01b23SMartin Matuskafrom being made to vdevs that aren't filled to the specified percentage
1484*3ff01b23SMartin Matuskaand allow lesser filled vdevs to acquire more allocations than they
1485*3ff01b23SMartin Matuskaotherwise would under the old
1486*3ff01b23SMartin Matuska.Sy zfs_mg_alloc_failures
1487*3ff01b23SMartin Matuskafacility.
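The eligibility rule above can be sketched as follows; the threshold and per-group free-space percentages are hypothetical, not real pool data:

```shell
threshold=30   # hypothetical zfs_mg_noalloc_threshold, in percent
for free_pct in 50 25 40; do
    if [ "$free_pct" -gt "$threshold" ]; then
        echo "group with ${free_pct}% free: eligible"
    else
        echo "group with ${free_pct}% free: avoided (until all groups cross)"
    fi
done
```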
1488*3ff01b23SMartin Matuska.
1489*3ff01b23SMartin Matuska.It Sy zfs_ddt_data_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
1490*3ff01b23SMartin MatuskaIf enabled, ZFS will place DDT data into the special allocation class.
1491*3ff01b23SMartin Matuska.
1492*3ff01b23SMartin Matuska.It Sy zfs_user_indirect_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
1493*3ff01b23SMartin MatuskaIf enabled, ZFS will place user data indirect blocks
1494*3ff01b23SMartin Matuskainto the special allocation class.
1495*3ff01b23SMartin Matuska.
1496*3ff01b23SMartin Matuska.It Sy zfs_multihost_history Ns = Ns Sy 0 Pq int
1497*3ff01b23SMartin MatuskaHistorical statistics for this many latest multihost updates will be available in
1498*3ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /multihost .
1499*3ff01b23SMartin Matuska.
1500*3ff01b23SMartin Matuska.It Sy zfs_multihost_interval Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq ulong
1501*3ff01b23SMartin MatuskaUsed to control the frequency of multihost writes which are performed when the
1502*3ff01b23SMartin Matuska.Sy multihost
1503*3ff01b23SMartin Matuskapool property is on.
1504*3ff01b23SMartin MatuskaThis is one of the factors used to determine the
1505*3ff01b23SMartin Matuskalength of the activity check during import.
1506*3ff01b23SMartin Matuska.Pp
1507*3ff01b23SMartin MatuskaThe multihost write period is
1508*3ff01b23SMartin Matuska.Sy zfs_multihost_interval / leaf-vdevs .
1509*3ff01b23SMartin MatuskaOn average a multihost write will be issued for each leaf vdev
1510*3ff01b23SMartin Matuskaevery
1511*3ff01b23SMartin Matuska.Sy zfs_multihost_interval
1512*3ff01b23SMartin Matuskamilliseconds.
1513*3ff01b23SMartin MatuskaIn practice, the observed period can vary with the I/O load
1514*3ff01b23SMartin Matuskaand this observed value is the delay which is stored in the uberblock.
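A quick sketch of the per-leaf write period formula above, assuming the default interval and a hypothetical pool layout:

```shell
interval_ms=1000   # zfs_multihost_interval default
leaf_vdevs=8       # assumed number of leaf vdevs in the pool
echo "multihost write period: $((interval_ms / leaf_vdevs)) ms per leaf"
```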
1515*3ff01b23SMartin Matuska.
1516*3ff01b23SMartin Matuska.It Sy zfs_multihost_import_intervals Ns = Ns Sy 20 Pq uint
1517*3ff01b23SMartin MatuskaUsed to control the duration of the activity test on import.
1518*3ff01b23SMartin MatuskaSmaller values of
1519*3ff01b23SMartin Matuska.Sy zfs_multihost_import_intervals
1520*3ff01b23SMartin Matuskawill reduce the import time but increase
1521*3ff01b23SMartin Matuskathe risk of failing to detect an active pool.
1522*3ff01b23SMartin MatuskaThe total activity check time is never allowed to drop below one second.
1523*3ff01b23SMartin Matuska.Pp
1524*3ff01b23SMartin MatuskaOn import the activity check waits a minimum amount of time determined by
1525*3ff01b23SMartin Matuska.Sy zfs_multihost_interval * zfs_multihost_import_intervals ,
1526*3ff01b23SMartin Matuskaor the same product computed on the host which last had the pool imported,
1527*3ff01b23SMartin Matuskawhichever is greater.
1528*3ff01b23SMartin MatuskaThe activity check time may be further extended if the value of MMP
1529*3ff01b23SMartin Matuskadelay found in the best uberblock indicates actual multihost updates happened
1530*3ff01b23SMartin Matuskaat longer intervals than
1531*3ff01b23SMartin Matuska.Sy zfs_multihost_interval .
1532*3ff01b23SMartin MatuskaA minimum of
1533*3ff01b23SMartin Matuska.Em 100ms
1534*3ff01b23SMartin Matuskais enforced.
1535*3ff01b23SMartin Matuska.Pp
1536*3ff01b23SMartin Matuska.Sy 0 No is equivalent to Sy 1 .
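With both tunables at their defaults, the minimum activity-check wait works out as:

```shell
interval_ms=1000        # zfs_multihost_interval default
import_intervals=20     # zfs_multihost_import_intervals default
echo "minimum activity check: $((interval_ms * import_intervals)) ms"
```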
1537*3ff01b23SMartin Matuska.
1538*3ff01b23SMartin Matuska.It Sy zfs_multihost_fail_intervals Ns = Ns Sy 10 Pq uint
1539*3ff01b23SMartin MatuskaControls the behavior of the pool when multihost write failures or delays are
1540*3ff01b23SMartin Matuskadetected.
1541*3ff01b23SMartin Matuska.Pp
1542*3ff01b23SMartin MatuskaWhen
1543*3ff01b23SMartin Matuska.Sy 0 ,
1544*3ff01b23SMartin Matuskamultihost write failures or delays are ignored.
1545*3ff01b23SMartin MatuskaThe failures will still be reported to the ZED which, depending on
1546*3ff01b23SMartin Matuskaits configuration, may take action such as suspending the pool or offlining a
1547*3ff01b23SMartin Matuskadevice.
1548*3ff01b23SMartin Matuska.Pp
1549*3ff01b23SMartin MatuskaOtherwise, the pool will be suspended if
1550*3ff01b23SMartin Matuska.Sy zfs_multihost_fail_intervals * zfs_multihost_interval
1551*3ff01b23SMartin Matuskamilliseconds pass without a successful MMP write.
1552*3ff01b23SMartin MatuskaThis guarantees the activity test will see MMP writes if the pool is imported.
1553*3ff01b23SMartin Matuska.Sy 1 No is equivalent to Sy 2 ;
1554*3ff01b23SMartin Matuskathis is necessary to prevent the pool from being suspended
1555*3ff01b23SMartin Matuskadue to normal, small I/O latency variations.
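The suspension window implied by the defaults is the product of the two tunables:

```shell
fail_intervals=10   # zfs_multihost_fail_intervals default
interval_ms=1000    # zfs_multihost_interval default
echo "suspend after $((fail_intervals * interval_ms)) ms without an MMP write"
```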
1556*3ff01b23SMartin Matuska.
1557*3ff01b23SMartin Matuska.It Sy zfs_no_scrub_io Ns = Ns Sy 0 Ns | Ns 1 Pq int
1558*3ff01b23SMartin MatuskaSet to disable scrub I/O.
1559*3ff01b23SMartin MatuskaThis results in scrubs not actually scrubbing data and
1560*3ff01b23SMartin Matuskasimply doing a metadata crawl of the pool instead.
1561*3ff01b23SMartin Matuska.
1562*3ff01b23SMartin Matuska.It Sy zfs_no_scrub_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
1563*3ff01b23SMartin MatuskaSet to disable block prefetching for scrubs.
1564*3ff01b23SMartin Matuska.
1565*3ff01b23SMartin Matuska.It Sy zfs_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
1566*3ff01b23SMartin MatuskaDisable cache flush operations on disks when writing.
1567*3ff01b23SMartin MatuskaSetting this will cause pool corruption on power loss
1568*3ff01b23SMartin Matuskaif a volatile out-of-order write cache is enabled.
1569*3ff01b23SMartin Matuska.
1570*3ff01b23SMartin Matuska.It Sy zfs_nopwrite_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
1571*3ff01b23SMartin MatuskaAllow no-operation writes.
1572*3ff01b23SMartin MatuskaThe occurrence of nopwrites will further depend on other pool properties
1573*3ff01b23SMartin Matuska.Pq i.a. the checksumming and compression algorithms .
1574*3ff01b23SMartin Matuska.
1575*3ff01b23SMartin Matuska.It Sy zfs_dmu_offset_next_sync Ns = Ns Sy 0 Ns | Ns 1 Pq int
1576*3ff01b23SMartin MatuskaEnable forcing TXG sync to find holes.
1577*3ff01b23SMartin MatuskaWhen enabled, this forces ZFS to act like prior versions when
1578*3ff01b23SMartin Matuska.Sy SEEK_HOLE No or Sy SEEK_DATA
1579*3ff01b23SMartin Matuskaflags are used, which, when a dnode is dirty,
1580*3ff01b23SMartin Matuskacauses TXGs to be synced so that this data can be found.
1581*3ff01b23SMartin Matuska.
1582*3ff01b23SMartin Matuska.It Sy zfs_pd_bytes_max Ns = Ns Sy 52428800 Ns B Po 50MB Pc Pq int
1583*3ff01b23SMartin MatuskaThe number of bytes which should be prefetched during a pool traversal, like
1584*3ff01b23SMartin Matuska.Nm zfs Cm send
1585*3ff01b23SMartin Matuskaor other data crawling operations.
1586*3ff01b23SMartin Matuska.
1587*3ff01b23SMartin Matuska.It Sy zfs_traverse_indirect_prefetch_limit Ns = Ns Sy 32 Pq int
1588*3ff01b23SMartin MatuskaThe number of blocks pointed to by an indirect (non-L0) block which should be
1589*3ff01b23SMartin Matuskaprefetched during a pool traversal, like
1590*3ff01b23SMartin Matuska.Nm zfs Cm send
1591*3ff01b23SMartin Matuskaor other data crawling operations.
1592*3ff01b23SMartin Matuska.
1593*3ff01b23SMartin Matuska.It Sy zfs_per_txg_dirty_frees_percent Ns = Ns Sy 5 Ns % Pq ulong
1594*3ff01b23SMartin MatuskaControl percentage of dirtied indirect blocks from frees allowed into one TXG.
1595*3ff01b23SMartin MatuskaAfter this threshold is crossed, additional frees will wait until the next TXG.
1596*3ff01b23SMartin Matuska.Sy 0 No disables this throttle.
1597*3ff01b23SMartin Matuska.
1598*3ff01b23SMartin Matuska.It Sy zfs_prefetch_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
1599*3ff01b23SMartin MatuskaDisable predictive prefetch.
1600*3ff01b23SMartin MatuskaNote that it leaves "prescient" prefetch (e.g.\& for
1601*3ff01b23SMartin Matuska.Nm zfs Cm send )
1602*3ff01b23SMartin Matuskaintact.
1603*3ff01b23SMartin MatuskaUnlike predictive prefetch, prescient prefetch never issues I/O
1604*3ff01b23SMartin Matuskathat ends up not being needed, so it can't hurt performance.
1605*3ff01b23SMartin Matuska.
1606*3ff01b23SMartin Matuska.It Sy zfs_qat_checksum_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
1607*3ff01b23SMartin MatuskaDisable QAT hardware acceleration for SHA256 checksums.
1608*3ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
1609*3ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
1610*3ff01b23SMartin Matuska.
1611*3ff01b23SMartin Matuska.It Sy zfs_qat_compress_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
1612*3ff01b23SMartin MatuskaDisable QAT hardware acceleration for gzip compression.
1613*3ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
1614*3ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
1615*3ff01b23SMartin Matuska.
1616*3ff01b23SMartin Matuska.It Sy zfs_qat_encrypt_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
1617*3ff01b23SMartin MatuskaDisable QAT hardware acceleration for AES-GCM encryption.
1618*3ff01b23SMartin MatuskaMay be unset after the ZFS modules have been loaded to initialize the QAT
1619*3ff01b23SMartin Matuskahardware as long as support is compiled in and the QAT driver is present.
1620*3ff01b23SMartin Matuska.
1621*3ff01b23SMartin Matuska.It Sy zfs_vnops_read_chunk_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq long
1622*3ff01b23SMartin MatuskaBytes to read per chunk.
1623*3ff01b23SMartin Matuska.
1624*3ff01b23SMartin Matuska.It Sy zfs_read_history Ns = Ns Sy 0 Pq int
1625*3ff01b23SMartin MatuskaHistorical statistics for this many latest reads will be available in
1626*3ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /reads .
1627*3ff01b23SMartin Matuska.
1628*3ff01b23SMartin Matuska.It Sy zfs_read_history_hits Ns = Ns Sy 0 Ns | Ns 1 Pq int
1629*3ff01b23SMartin MatuskaInclude cache hits in read history.
1630*3ff01b23SMartin Matuska.
1631*3ff01b23SMartin Matuska.It Sy zfs_rebuild_max_segment Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
1632*3ff01b23SMartin MatuskaMaximum read segment size to issue when sequentially resilvering a
1633*3ff01b23SMartin Matuskatop-level vdev.
1634*3ff01b23SMartin Matuska.
1635*3ff01b23SMartin Matuska.It Sy zfs_rebuild_scrub_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
1636*3ff01b23SMartin MatuskaAutomatically start a pool scrub when the last active sequential resilver
1637*3ff01b23SMartin Matuskacompletes in order to verify the checksums of all blocks which have been
1638*3ff01b23SMartin Matuskaresilvered.
1639*3ff01b23SMartin MatuskaThis is enabled by default and strongly recommended.
1640*3ff01b23SMartin Matuska.
1641*3ff01b23SMartin Matuska.It Sy zfs_rebuild_vdev_limit Ns = Ns Sy 33554432 Ns B Po 32MB Pc Pq ulong
1642*3ff01b23SMartin MatuskaMaximum amount of I/O that can be concurrently issued for a sequential
1643*3ff01b23SMartin Matuskaresilver per leaf device, given in bytes.
1644*3ff01b23SMartin Matuska.
1645*3ff01b23SMartin Matuska.It Sy zfs_reconstruct_indirect_combinations_max Ns = Ns Sy 4096 Pq int
1646*3ff01b23SMartin MatuskaIf an indirect split block contains more than this many possible unique
1647*3ff01b23SMartin Matuskacombinations when being reconstructed, consider it too computationally
1648*3ff01b23SMartin Matuskaexpensive to check them all.
1649*3ff01b23SMartin MatuskaInstead, try at most this many randomly selected
1650*3ff01b23SMartin Matuskacombinations each time the block is accessed.
1651*3ff01b23SMartin MatuskaThis allows all segment copies to participate fairly
1652*3ff01b23SMartin Matuskain the reconstruction when all combinations
1653*3ff01b23SMartin Matuskacannot be checked and prevents repeated use of one bad copy.
1654*3ff01b23SMartin Matuska.
1655*3ff01b23SMartin Matuska.It Sy zfs_recover Ns = Ns Sy 0 Ns | Ns 1 Pq int
1656*3ff01b23SMartin MatuskaSet to attempt to recover from fatal errors.
1657*3ff01b23SMartin MatuskaThis should only be used as a last resort,
1658*3ff01b23SMartin Matuskaas it typically results in leaked space, or worse.
1659*3ff01b23SMartin Matuska.
1660*3ff01b23SMartin Matuska.It Sy zfs_removal_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
1661*3ff01b23SMartin MatuskaIgnore hard I/O errors during device removal.
1662*3ff01b23SMartin MatuskaWhen set, if a device encounters a hard I/O error during the removal process,
1663*3ff01b23SMartin Matuskathe removal will not be cancelled.
1664*3ff01b23SMartin MatuskaThis can result in a normally recoverable block becoming permanently damaged
1665*3ff01b23SMartin Matuskaand is hence not recommended.
1666*3ff01b23SMartin MatuskaThis should only be used as a last resort when the
1667*3ff01b23SMartin Matuskapool cannot be returned to a healthy state prior to removing the device.
1668*3ff01b23SMartin Matuska.
1669*3ff01b23SMartin Matuska.It Sy zfs_removal_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq int
1670*3ff01b23SMartin MatuskaThis is used by the test suite so that it can ensure that certain actions
1671*3ff01b23SMartin Matuskahappen while in the middle of a removal.
1672*3ff01b23SMartin Matuska.
1673*3ff01b23SMartin Matuska.It Sy zfs_remove_max_segment Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
1674*3ff01b23SMartin MatuskaThe largest contiguous segment that we will attempt to allocate when removing
1675*3ff01b23SMartin Matuskaa device.
1676*3ff01b23SMartin MatuskaIf there is a performance problem with attempting to allocate large blocks,
1677*3ff01b23SMartin Matuskaconsider decreasing this.
1678*3ff01b23SMartin MatuskaThe default value is also the maximum.
1679*3ff01b23SMartin Matuska.
1680*3ff01b23SMartin Matuska.It Sy zfs_resilver_disable_defer Ns = Ns Sy 0 Ns | Ns 1 Pq int
1681*3ff01b23SMartin MatuskaIgnore the
1682*3ff01b23SMartin Matuska.Sy resilver_defer
1683*3ff01b23SMartin Matuskafeature, causing an operation that would start a resilver to
1684*3ff01b23SMartin Matuskaimmediately restart the one in progress.
1685*3ff01b23SMartin Matuska.
1686*3ff01b23SMartin Matuska.It Sy zfs_resilver_min_time_ms Ns = Ns Sy 3000 Ns ms Po 3s Pc Pq int
1687*3ff01b23SMartin MatuskaResilvers are processed by the sync thread.
1688*3ff01b23SMartin MatuskaWhile resilvering, it will spend at least this much time
1689*3ff01b23SMartin Matuskaworking on a resilver between TXG flushes.
1690*3ff01b23SMartin Matuska.
1691*3ff01b23SMartin Matuska.It Sy zfs_scan_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
1692*3ff01b23SMartin MatuskaIf set, remove the DTL (dirty time list) upon completion of a pool scan (scrub),
1693*3ff01b23SMartin Matuskaeven if there were unrepairable errors.
1694*3ff01b23SMartin MatuskaIntended to be used during pool repair or recovery to
1695*3ff01b23SMartin Matuskastop resilvering when the pool is next imported.
1696*3ff01b23SMartin Matuska.
1697*3ff01b23SMartin Matuska.It Sy zfs_scrub_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq int
1698*3ff01b23SMartin MatuskaScrubs are processed by the sync thread.
1699*3ff01b23SMartin MatuskaWhile scrubbing, it will spend at least this much time
1700*3ff01b23SMartin Matuskaworking on a scrub between TXG flushes.
1701*3ff01b23SMartin Matuska.
1702*3ff01b23SMartin Matuska.It Sy zfs_scan_checkpoint_intval Ns = Ns Sy 7200 Ns s Po 2h Pc Pq int
1703*3ff01b23SMartin MatuskaTo preserve progress across reboots, the sequential scan algorithm periodically
1704*3ff01b23SMartin Matuskaneeds to stop metadata scanning and issue all the verification I/O to disk.
1705*3ff01b23SMartin MatuskaThe frequency of this flushing is determined by this tunable.
1706*3ff01b23SMartin Matuska.
1707*3ff01b23SMartin Matuska.It Sy zfs_scan_fill_weight Ns = Ns Sy 3 Pq int
1708*3ff01b23SMartin MatuskaThis tunable affects how scrub and resilver I/O segments are ordered.
1709*3ff01b23SMartin MatuskaA higher number indicates that we care more about how filled in a segment is,
1710*3ff01b23SMartin Matuskawhile a lower number indicates we care more about the size of the extent without
1711*3ff01b23SMartin Matuskaconsidering the gaps within a segment.
1712*3ff01b23SMartin MatuskaThis value is only tunable upon module insertion.
1713*3ff01b23SMartin MatuskaChanging the value afterwards will have no effect on scrub or resilver performance.
1714*3ff01b23SMartin Matuska.
1715*3ff01b23SMartin Matuska.It Sy zfs_scan_issue_strategy Ns = Ns Sy 0 Pq int
1716*3ff01b23SMartin MatuskaDetermines the order that data will be verified while scrubbing or resilvering:
1717*3ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "a"
1718*3ff01b23SMartin Matuska.It Sy 1
1719*3ff01b23SMartin MatuskaData will be verified as sequentially as possible, given the
1720*3ff01b23SMartin Matuskaamount of memory reserved for scrubbing
1721*3ff01b23SMartin Matuska.Pq see Sy zfs_scan_mem_lim_fact .
1722*3ff01b23SMartin MatuskaThis may improve scrub performance if the pool's data is very fragmented.
1723*3ff01b23SMartin Matuska.It Sy 2
1724*3ff01b23SMartin MatuskaThe largest mostly-contiguous chunk of found data will be verified first.
1725*3ff01b23SMartin MatuskaBy deferring scrubbing of small segments, we may later find adjacent data
1726*3ff01b23SMartin Matuskato coalesce and increase the segment size.
1727*3ff01b23SMartin Matuska.It Sy 0
1728*3ff01b23SMartin Matuska.No Use strategy Sy 1 No during normal verification
1729*3ff01b23SMartin Matuska.No and strategy Sy 2 No while taking a checkpoint.
1730*3ff01b23SMartin Matuska.El
1731*3ff01b23SMartin Matuska.
1732*3ff01b23SMartin Matuska.It Sy zfs_scan_legacy Ns = Ns Sy 0 Ns | Ns 1 Pq int
1733*3ff01b23SMartin MatuskaIf unset, indicates that scrubs and resilvers will gather metadata in
1734*3ff01b23SMartin Matuskamemory before issuing sequential I/O.
1735*3ff01b23SMartin MatuskaOtherwise indicates that the legacy algorithm will be used,
1736*3ff01b23SMartin Matuskawhere I/O is initiated as soon as it is discovered.
1737*3ff01b23SMartin MatuskaUnsetting will not affect scrubs or resilvers that are already in progress.
1738*3ff01b23SMartin Matuska.
1739*3ff01b23SMartin Matuska.It Sy zfs_scan_max_ext_gap Ns = Ns Sy 2097152 Ns B Po 2MB Pc Pq int
1740*3ff01b23SMartin MatuskaSets the largest gap in bytes between scrub/resilver I/O operations
1741*3ff01b23SMartin Matuskathat will still be considered sequential for sorting purposes.
1742*3ff01b23SMartin MatuskaChanging this value will not
1743*3ff01b23SMartin Matuskaaffect scrubs or resilvers that are already in progress.
1744*3ff01b23SMartin Matuska.
1745*3ff01b23SMartin Matuska.It Sy zfs_scan_mem_lim_fact Ns = Ns Sy 20 Ns ^-1 Pq int
1746*3ff01b23SMartin MatuskaMaximum fraction of RAM used for I/O sorting by sequential scan algorithm.
1747*3ff01b23SMartin MatuskaThis tunable determines the hard limit for I/O sorting memory usage.
1748*3ff01b23SMartin MatuskaWhen the hard limit is reached we stop scanning metadata and start issuing
1749*3ff01b23SMartin Matuskadata verification I/O.
1750*3ff01b23SMartin MatuskaThis is done until we get below the soft limit.
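As a sketch of the hard and soft limits, assuming a hypothetical amount of system RAM and the default fractions (the soft fraction is governed by the related tunable zfs_scan_mem_lim_soft_fact):

```shell
ram_bytes=$((20 * 1024 * 1024 * 1024))   # assumed 20 GiB of system RAM
mem_lim_fact=20                          # zfs_scan_mem_lim_fact default
soft_fact=20                             # zfs_scan_mem_lim_soft_fact default
hard_limit=$((ram_bytes / mem_lim_fact))
echo "sort hard limit: $hard_limit bytes, soft limit: $((hard_limit / soft_fact)) bytes"
```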
1751*3ff01b23SMartin Matuska.
1752*3ff01b23SMartin Matuska.It Sy zfs_scan_mem_lim_soft_fact Ns = Ns Sy 20 Ns ^-1 Pq int
1753*3ff01b23SMartin MatuskaThe fraction of the hard limit used to determined the soft limit for I/O sorting
1754*3ff01b23SMartin Matuskaby the sequential scan algorithm.
1755*3ff01b23SMartin MatuskaWhen we cross this limit from below, no action is taken.
1756*3ff01b23SMartin MatuskaWhen we cross this limit from above, it is because we are issuing verification I/O.
1757*3ff01b23SMartin MatuskaIn this case (unless the metadata scan is done) we stop issuing verification I/O
1758*3ff01b23SMartin Matuskaand start scanning metadata again until we get to the hard limit.
1759*3ff01b23SMartin Matuska.
1760*3ff01b23SMartin Matuska.It Sy zfs_scan_strict_mem_lim Ns = Ns Sy 0 Ns | Ns 1 Pq int
1761*3ff01b23SMartin MatuskaEnforce tight memory limits on pool scans when a sequential scan is in progress.
1762*3ff01b23SMartin MatuskaWhen disabled, the memory limit may be exceeded by fast disks.
1763*3ff01b23SMartin Matuska.
1764*3ff01b23SMartin Matuska.It Sy zfs_scan_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq int
1765*3ff01b23SMartin MatuskaFreezes a scrub/resilver in progress without actually pausing it.
1766*3ff01b23SMartin MatuskaIntended for testing/debugging.
1767*3ff01b23SMartin Matuska.
1768*3ff01b23SMartin Matuska.It Sy zfs_scan_vdev_limit Ns = Ns Sy 4194304 Ns B Po 4MB Pc Pq int
1769*3ff01b23SMartin MatuskaMaximum amount of data that can be concurrently issued for scrubs and
1770*3ff01b23SMartin Matuskaresilvers per leaf device, given in bytes.
1771*3ff01b23SMartin Matuska.
1772*3ff01b23SMartin Matuska.It Sy zfs_send_corrupt_data Ns = Ns Sy 0 Ns | Ns 1 Pq int
1773*3ff01b23SMartin MatuskaAllow sending of corrupt data (ignore read/checksum errors when sending).
1774*3ff01b23SMartin Matuska.
1775*3ff01b23SMartin Matuska.It Sy zfs_send_unmodified_spill_blocks Ns = Ns Sy 1 Ns | Ns 0 Pq int
1776*3ff01b23SMartin MatuskaInclude unmodified spill blocks in the send stream.
1777*3ff01b23SMartin MatuskaUnder certain circumstances, previous versions of ZFS could incorrectly
1778*3ff01b23SMartin Matuskaremove the spill block from an existing object.
1779*3ff01b23SMartin MatuskaIncluding unmodified copies of the spill blocks creates a backwards-compatible
1780*3ff01b23SMartin Matuskastream which will recreate a spill block if it was incorrectly removed.
1781*3ff01b23SMartin Matuska.
1782*3ff01b23SMartin Matuska.It Sy zfs_send_no_prefetch_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
1783*3ff01b23SMartin MatuskaThe fill fraction of the
1784*3ff01b23SMartin Matuska.Nm zfs Cm send
1785*3ff01b23SMartin Matuskainternal queues.
1786*3ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
1787*3ff01b23SMartin Matuska.
1788*3ff01b23SMartin Matuska.It Sy zfs_send_no_prefetch_queue_length Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
1789*3ff01b23SMartin MatuskaThe maximum number of bytes allowed in
1790*3ff01b23SMartin Matuska.Nm zfs Cm send Ns 's
1791*3ff01b23SMartin Matuskainternal queues.
1792*3ff01b23SMartin Matuska.
1793*3ff01b23SMartin Matuska.It Sy zfs_send_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
1794*3ff01b23SMartin MatuskaThe fill fraction of the
1795*3ff01b23SMartin Matuska.Nm zfs Cm send
1796*3ff01b23SMartin Matuskaprefetch queue.
1797*3ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
1798*3ff01b23SMartin Matuska.
1799*3ff01b23SMartin Matuska.It Sy zfs_send_queue_length Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
1800*3ff01b23SMartin MatuskaThe maximum number of bytes that will be prefetched by
1801*3ff01b23SMartin Matuska.Nm zfs Cm send .
1802*3ff01b23SMartin MatuskaThis value must be at least twice the maximum block size in use.
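The "at least twice the maximum block size" constraint can be checked with a small sketch; the block size is an assumed value for illustration:

```shell
queue_length=$((16 * 1024 * 1024))   # zfs_send_queue_length default
max_blocksize=$((1 * 1024 * 1024))   # assumed largest block size in use
if [ "$queue_length" -ge $((2 * max_blocksize)) ]; then
    echo "queue length satisfies the 2x block size minimum"
fi
```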
1803*3ff01b23SMartin Matuska.
1804*3ff01b23SMartin Matuska.It Sy zfs_recv_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
1805*3ff01b23SMartin MatuskaThe fill fraction of the
1806*3ff01b23SMartin Matuska.Nm zfs Cm receive
1807*3ff01b23SMartin Matuskaqueue.
1808*3ff01b23SMartin MatuskaThe fill fraction controls the timing with which internal threads are woken up.
1809*3ff01b23SMartin Matuska.
1810*3ff01b23SMartin Matuska.It Sy zfs_recv_queue_length Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
1811*3ff01b23SMartin MatuskaThe maximum number of bytes allowed in the
1812*3ff01b23SMartin Matuska.Nm zfs Cm receive
1813*3ff01b23SMartin Matuskaqueue.
1814*3ff01b23SMartin MatuskaThis value must be at least twice the maximum block size in use.
1815*3ff01b23SMartin Matuska.
1816*3ff01b23SMartin Matuska.It Sy zfs_recv_write_batch_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
1817*3ff01b23SMartin MatuskaThe maximum amount of data, in bytes, that
1818*3ff01b23SMartin Matuska.Nm zfs Cm receive
1819*3ff01b23SMartin Matuskawill write in one DMU transaction.
1820*3ff01b23SMartin MatuskaThis is the uncompressed size, even when receiving a compressed send stream.
1821*3ff01b23SMartin MatuskaThis setting will not reduce the write size below a single block.
1822*3ff01b23SMartin MatuskaCapped at a maximum of
1823*3ff01b23SMartin Matuska.Sy 32MB .
1824*3ff01b23SMartin Matuska.
1825*3ff01b23SMartin Matuska.It Sy zfs_override_estimate_recordsize Ns = Ns Sy 0 Ns | Ns 1 Pq ulong
1826*3ff01b23SMartin MatuskaSetting this variable overrides the default logic for estimating block
1827*3ff01b23SMartin Matuskasizes when doing a
1828*3ff01b23SMartin Matuska.Nm zfs Cm send .
1829*3ff01b23SMartin MatuskaThe default heuristic is that the average block size
1830*3ff01b23SMartin Matuskawill be the current recordsize.
1831*3ff01b23SMartin MatuskaOverride this value if most data in your dataset is not of that size
1832*3ff01b23SMartin Matuskaand you require accurate zfs send size estimates.
1833*3ff01b23SMartin Matuska.
1834*3ff01b23SMartin Matuska.It Sy zfs_sync_pass_deferred_free Ns = Ns Sy 2 Pq int
1835*3ff01b23SMartin MatuskaFlushing of data to disk is done in passes.
1836*3ff01b23SMartin MatuskaDefer frees starting in this pass.
1837*3ff01b23SMartin Matuska.
1838*3ff01b23SMartin Matuska.It Sy zfs_spa_discard_memory_limit Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
1839*3ff01b23SMartin MatuskaMaximum memory used for prefetching a checkpoint's space map on each
1840*3ff01b23SMartin Matuskavdev while discarding the checkpoint.
1841*3ff01b23SMartin Matuska.
1842*3ff01b23SMartin Matuska.It Sy zfs_special_class_metadata_reserve_pct Ns = Ns Sy 25 Ns % Pq int
1843*3ff01b23SMartin MatuskaOnly allow small data blocks to be allocated on the special and dedup vdev
1844*3ff01b23SMartin Matuskatypes when the available free space percentage on these vdevs exceeds this value.
1845*3ff01b23SMartin MatuskaThis ensures reserved space is available for pool metadata as the
1846*3ff01b23SMartin Matuskaspecial vdevs approach capacity.
1847*3ff01b23SMartin Matuska.
1848*3ff01b23SMartin Matuska.It Sy zfs_sync_pass_dont_compress Ns = Ns Sy 8 Pq int
1849*3ff01b23SMartin MatuskaStarting in this sync pass, disable compression (including of metadata).
1850*3ff01b23SMartin MatuskaWith the default setting, in practice, we don't have this many sync passes,
1851*3ff01b23SMartin Matuskaso this has no effect.
1852*3ff01b23SMartin Matuska.Pp
1853*3ff01b23SMartin MatuskaThe original intent was that disabling compression would help the sync passes
1854*3ff01b23SMartin Matuskato converge.
1855*3ff01b23SMartin MatuskaHowever, in practice, disabling compression increases
1856*3ff01b23SMartin Matuskathe average number of sync passes, because when we turn compression off,
1857*3ff01b23SMartin Matuskamany blocks' size will change, and thus we have to re-allocate
1858*3ff01b23SMartin Matuska(not overwrite) them.
1859*3ff01b23SMartin MatuskaIt also increases the number of
1860*3ff01b23SMartin Matuska.Em 128kB
1861*3ff01b23SMartin Matuskaallocations (e.g. for indirect blocks and spacemaps)
1862*3ff01b23SMartin Matuskabecause these will not be compressed.
1863*3ff01b23SMartin MatuskaThe
1864*3ff01b23SMartin Matuska.Em 128kB
1865*3ff01b23SMartin Matuskaallocations are especially detrimental to performance
1866*3ff01b23SMartin Matuskaon highly fragmented systems, which may have very few free segments of this size,
1867*3ff01b23SMartin Matuskaand may need to load new metaslabs to satisfy these allocations.
1868*3ff01b23SMartin Matuska.
1869*3ff01b23SMartin Matuska.It Sy zfs_sync_pass_rewrite Ns = Ns Sy 2 Pq int
1870*3ff01b23SMartin MatuskaRewrite new block pointers starting in this pass.
1871*3ff01b23SMartin Matuska.
1872*3ff01b23SMartin Matuska.It Sy zfs_sync_taskq_batch_pct Ns = Ns Sy 75 Ns % Pq int
1873*3ff01b23SMartin MatuskaThis controls the number of threads used by
1874*3ff01b23SMartin Matuska.Sy dp_sync_taskq .
1875*3ff01b23SMartin MatuskaThe default value of
1876*3ff01b23SMartin Matuska.Sy 75%
1877*3ff01b23SMartin Matuskawill create a maximum of one thread per CPU.
1878*3ff01b23SMartin Matuska.
1879*3ff01b23SMartin Matuska.It Sy zfs_trim_extent_bytes_max Ns = Ns Sy 134217728 Ns B Po 128MB Pc Pq uint
1880*3ff01b23SMartin MatuskaMaximum size of TRIM command.
1881*3ff01b23SMartin MatuskaLarger ranges will be split into chunks no larger than this value before issuing.
1882*3ff01b23SMartin Matuska.
1883*3ff01b23SMartin Matuska.It Sy zfs_trim_extent_bytes_min Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq uint
1884*3ff01b23SMartin MatuskaMinimum size of TRIM commands.
1885*3ff01b23SMartin MatuskaTRIM ranges smaller than this will be skipped,
1886*3ff01b23SMartin Matuskaunless they're part of a larger range which was chunked.
1887*3ff01b23SMartin MatuskaThis is done because it's common for these small TRIMs
1888*3ff01b23SMartin Matuskato negatively impact overall performance.
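The chunking behaviour of zfs_trim_extent_bytes_max can be sketched with an assumed extent size:

```shell
extent_max=$((128 * 1024 * 1024))   # zfs_trim_extent_bytes_max default
range=$((1024 * 1024 * 1024))       # hypothetical 1GB range to TRIM
chunks=$(((range + extent_max - 1) / extent_max))
echo "issued as $chunks TRIM commands"
```

Extents below zfs_trim_extent_bytes_min are instead dropped entirely, unless they are leftovers of such a split.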
1889*3ff01b23SMartin Matuska.
1890*3ff01b23SMartin Matuska.It Sy zfs_trim_metaslab_skip Ns = Ns Sy 0 Ns | Ns 1 Pq uint
1891*3ff01b23SMartin MatuskaSkip uninitialized metaslabs during the TRIM process.
1892*3ff01b23SMartin MatuskaThis option is useful for pools constructed from large thinly-provisioned devices
1893*3ff01b23SMartin Matuskawhere TRIM operations are slow.
1894*3ff01b23SMartin MatuskaAs a pool ages, an increasing fraction of the pool's metaslabs
1895*3ff01b23SMartin Matuskawill be initialized, progressively degrading the usefulness of this option.
1896*3ff01b23SMartin MatuskaThis setting is stored when starting a manual TRIM and will
1897*3ff01b23SMartin Matuskapersist for the duration of the requested TRIM.
1898*3ff01b23SMartin Matuska.
1899*3ff01b23SMartin Matuska.It Sy zfs_trim_queue_limit Ns = Ns Sy 10 Pq uint
1900*3ff01b23SMartin MatuskaMaximum number of queued TRIMs outstanding per leaf vdev.
1901*3ff01b23SMartin MatuskaThe number of concurrent TRIM commands issued to the device is controlled by
1902*3ff01b23SMartin Matuska.Sy zfs_vdev_trim_min_active No and Sy zfs_vdev_trim_max_active .
1903*3ff01b23SMartin Matuska.
1904*3ff01b23SMartin Matuska.It Sy zfs_trim_txg_batch Ns = Ns Sy 32 Pq uint
1905*3ff01b23SMartin MatuskaThe number of transaction groups' worth of frees which should be aggregated
1906*3ff01b23SMartin Matuskabefore TRIM operations are issued to the device.
1907*3ff01b23SMartin MatuskaThis setting represents a trade-off between issuing larger,
1908*3ff01b23SMartin Matuskamore efficient TRIM operations and the delay
1909*3ff01b23SMartin Matuskabefore the recently trimmed space is available for use by the device.
1910*3ff01b23SMartin Matuska.Pp
1911*3ff01b23SMartin MatuskaIncreasing this value will allow frees to be aggregated for a longer time.
1912*3ff01b23SMartin MatuskaThis will result is larger TRIM operations and potentially increased memory usage.
1913*3ff01b23SMartin MatuskaDecreasing this value will have the opposite effect.
1914*3ff01b23SMartin MatuskaThe default of
1915*3ff01b23SMartin Matuska.Sy 32
1916*3ff01b23SMartin Matuskawas determined to be a reasonable compromise.
.
.It Sy zfs_txg_history Ns = Ns Sy 0 Pq int
Historical statistics for this many latest TXGs will be available in
.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /TXGs .
.
.It Sy zfs_txg_timeout Ns = Ns Sy 5 Ns s Pq int
Flush dirty data to disk at least this often, i.e. the maximum TXG duration.
.
.It Sy zfs_vdev_aggregate_trim Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow TRIM I/Os to be aggregated.
This is normally not helpful because the extents to be trimmed
will already have been aggregated by the metaslab.
This option is provided for debugging and performance analysis.
.
.It Sy zfs_vdev_aggregation_limit Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
Max vdev I/O aggregation size.
.
.It Sy zfs_vdev_aggregation_limit_non_rotating Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq int
Max vdev I/O aggregation size for non-rotating media.
.
.It Sy zfs_vdev_cache_bshift Ns = Ns Sy 16 Po 64kB Pc Pq int
Shift size to inflate reads to.
.
.It Sy zfs_vdev_cache_max Ns = Ns Sy 16384 Ns B Po 16kB Pc Pq int
Inflate reads smaller than this value to meet the
.Sy zfs_vdev_cache_bshift
size
.Pq default Sy 64kB .
.
.It Sy zfs_vdev_cache_size Ns = Ns Sy 0 Pq int
Total size of the per-disk cache in bytes.
.Pp
Currently this feature is disabled, as it has been found to not be helpful
for performance and in some cases harmful.
.
.It Sy zfs_vdev_mirror_rotating_inc Ns = Ns Sy 0 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation
immediately follows its predecessor on rotational vdevs.
.
.It Sy zfs_vdev_mirror_rotating_seek_inc Ns = Ns Sy 5 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation
lacks locality as defined by
.Sy zfs_vdev_mirror_rotating_seek_offset .
Operations within this window that do not immediately follow the previous
operation incur half of this increment.
.
.It Sy zfs_vdev_mirror_rotating_seek_offset Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
The maximum distance for the last queued I/O operation in which
the balancing algorithm considers an operation to have locality.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_mirror_non_rotating_inc Ns = Ns Sy 0 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member on non-rotational vdevs
when I/O operations do not immediately follow one another.
.
.It Sy zfs_vdev_mirror_non_rotating_seek_inc Ns = Ns Sy 1 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation lacks
locality as defined by
.Sy zfs_vdev_mirror_rotating_seek_offset .
Operations within this window that do not immediately follow the previous
operation incur half of this increment.
.
.It Sy zfs_vdev_read_gap_limit Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq int
Aggregate read I/O operations if the on-disk gap between them is within this
threshold.
.
.It Sy zfs_vdev_write_gap_limit Ns = Ns Sy 4096 Ns B Po 4kB Pc Pq int
Aggregate write I/O operations if the on-disk gap between them is within this
threshold.
.
.It Sy zfs_vdev_raidz_impl Ns = Ns Sy fastest Pq string
Select the raidz parity implementation to use.
.Pp
Variants that don't depend on CPU-specific features
may be selected on module load, as they are supported on all systems.
The remaining options may only be set after the module is loaded,
as they are available only if the implementations are compiled in
and supported on the running system.
.Pp
Once the module is loaded,
.Pa /sys/module/zfs/parameters/zfs_vdev_raidz_impl
will show the available options,
with the currently selected one enclosed in square brackets.
.Pp
.TS
lb l l .
fastest	selected by built-in benchmark
original	original implementation
scalar	scalar implementation
sse2	SSE2 instruction set	64-bit x86
ssse3	SSSE3 instruction set	64-bit x86
avx2	AVX2 instruction set	64-bit x86
avx512f	AVX512F instruction set	64-bit x86
avx512bw	AVX512F & AVX512BW instruction sets	64-bit x86
aarch64_neon	NEON	Aarch64/64-bit ARMv8
aarch64_neonx2	NEON with more unrolling	Aarch64/64-bit ARMv8
powerpc_altivec	Altivec	PowerPC
.TE
.
.It Sy zfs_vdev_scheduler Pq charp
.Sy DEPRECATED .
Prints a warning to the kernel log for compatibility.
.
.It Sy zfs_zevent_len_max Ns = Ns Sy 512 Pq int
Max event queue length.
Events in the queue can be viewed with
.Xr zpool-events 8 .
.
.It Sy zfs_zevent_retain_max Ns = Ns Sy 2000 Pq int
Maximum recent zevent records to retain for duplicate checking.
Setting this to
.Sy 0
disables duplicate detection.
.
.It Sy zfs_zevent_retain_expire_secs Ns = Ns Sy 900 Ns s Po 15min Pc Pq int
Lifespan for a recent ereport that was retained for duplicate checking.
.
.It Sy zfs_zil_clean_taskq_maxalloc Ns = Ns Sy 1048576 Pq int
The maximum number of taskq entries that are allowed to be cached.
When this limit is exceeded, transaction records (itxs)
will be cleaned synchronously.
.
.It Sy zfs_zil_clean_taskq_minalloc Ns = Ns Sy 1024 Pq int
The number of taskq entries that are pre-populated when the taskq is first
created and are immediately available for use.
.
.It Sy zfs_zil_clean_taskq_nthr_pct Ns = Ns Sy 100 Ns % Pq int
This controls the number of threads used by
.Sy dp_zil_clean_taskq .
The default value of
.Sy 100%
will create a maximum of one thread per CPU.
.
.It Sy zil_maxblocksize Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq int
This sets the maximum block size used by the ZIL.
On very fragmented pools, lowering this
.Pq typically to Sy 36kB
can improve performance.
.
.It Sy zil_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable the cache flush commands that are normally sent to disk by
the ZIL after an LWB write has completed.
Setting this will cause ZIL corruption on power loss
if a volatile out-of-order write cache is enabled.
.
.It Sy zil_replay_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable intent logging replay.
Replay can be disabled in order to recover from a corrupted ZIL.
.
.It Sy zil_slog_bulk Ns = Ns Sy 786432 Ns B Po 768kB Pc Pq ulong
Limit SLOG write size per commit executed with synchronous priority.
Any writes above that will be executed with lower (asynchronous) priority
to limit potential SLOG device abuse by a single active ZIL writer.
.
.It Sy zfs_embedded_slog_min_ms Ns = Ns Sy 64 Pq int
Usually, one metaslab from each normal-class vdev is dedicated for use by
the ZIL to log synchronous writes.
However, if there are fewer than
.Sy zfs_embedded_slog_min_ms
metaslabs in the vdev, this functionality is disabled.
This ensures that we don't set aside an unreasonable amount of space for the ZIL.
.
.It Sy zio_deadman_log_all Ns = Ns Sy 0 Ns | Ns 1 Pq int
If non-zero, the zio deadman will produce debugging messages
.Pq see Sy zfs_dbgmsg_enable
for all zios, rather than only for leaf zios possessing a vdev.
This is meant to be used by developers to gain
diagnostic information for hang conditions which don't involve a mutex
or other locking primitive: typically conditions in which a thread in
the zio pipeline is looping indefinitely.
.
.It Sy zio_slow_io_ms Ns = Ns Sy 30000 Ns ms Po 30s Pc Pq int
When an I/O operation takes more than this much time to complete,
it's marked as slow.
Each slow operation causes a delay zevent.
Slow I/O counters can be seen with
.Nm zpool Cm status Fl s .
.
.It Sy zio_dva_throttle_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Throttle block allocations in the I/O pipeline.
This allows for dynamic allocation distribution when devices are imbalanced.
When enabled, the maximum number of pending allocations per top-level vdev
is limited by
.Sy zfs_vdev_queue_depth_pct .
.
.It Sy zio_requeue_io_start_cut_in_line Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prioritize requeued I/O.
.
.It Sy zio_taskq_batch_pct Ns = Ns Sy 80 Ns % Pq uint
Percentage of online CPUs which will run a worker thread for I/O.
These workers are responsible for I/O work such as compression and
checksum calculations.
Fractional numbers of CPUs will be rounded down.
.Pp
The default value of
.Sy 80%
was chosen to avoid using all CPUs, which can result in
latency issues and inconsistent application performance,
especially when slower compression and/or checksumming is enabled.
.
.It Sy zio_taskq_batch_tpq Ns = Ns Sy 0 Pq uint
Number of worker threads per taskq.
Lower values improve I/O ordering and CPU utilization,
while higher values reduce lock contention.
.Pp
If
.Sy 0 ,
generate a system-dependent value close to 6 threads per taskq.
.
.It Sy zvol_inhibit_dev Ns = Ns Sy 0 Ns | Ns 1 Pq uint
Do not create zvol device nodes.
This may slightly improve startup time on
systems with a very large number of zvols.
.
.It Sy zvol_major Ns = Ns Sy 230 Pq uint
Major number for zvol block devices.
.
.It Sy zvol_max_discard_blocks Ns = Ns Sy 16384 Pq ulong
Discard (TRIM) operations done on zvols will be done in batches of this
many blocks, where block size is determined by the
.Sy volblocksize
property of a zvol.
.
.It Sy zvol_prefetch_bytes Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq uint
When adding a zvol to the system, prefetch this many bytes
from the start and end of the volume.
Prefetching these regions of the volume is desirable,
because they are likely to be accessed immediately by
.Xr blkid 8
or the kernel partitioner.
.
.It Sy zvol_request_sync Ns = Ns Sy 0 Ns | Ns 1 Pq uint
When processing I/O requests for a zvol, submit them synchronously.
This effectively limits the queue depth to
.Em 1
for each I/O submitter.
When unset, requests are handled asynchronously by a thread pool.
The number of requests which can be handled concurrently is controlled by
.Sy zvol_threads .
.
.It Sy zvol_threads Ns = Ns Sy 32 Pq uint
Max number of threads which can handle zvol I/O requests concurrently.
.
.It Sy zvol_volmode Ns = Ns Sy 1 Pq uint
Defines zvol block device behaviour when
.Sy volmode Ns = Ns Sy default :
.Bl -tag -compact -offset 4n -width "a"
.It Sy 1
.No equivalent to Sy full
.It Sy 2
.No equivalent to Sy dev
.It Sy 3
.No equivalent to Sy none
.El
.El
.
.Sh ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/O operations.
The scheduler determines when and in what order those operations are issued.
The scheduler divides operations into five I/O classes,
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver.
Each queue defines the minimum and maximum number of concurrent operations
that may be issued to the device.
In addition, the device has an aggregate maximum,
.Sy zfs_vdev_max_active .
Note that the sum of the per-queue minima must not exceed the aggregate maximum.
If the sum of the per-queue maxima exceeds the aggregate maximum,
then the number of active operations may reach
.Sy zfs_vdev_max_active ,
in which case no further operations will be issued,
regardless of whether all per-queue minima have been met.
.Pp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers.
Furthermore, physical devices typically have a limit
at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.Pp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied.
Once all are satisfied and the aggregate maximum has not been hit,
the scheduler looks for classes whose maximum has not been satisfied.
Iteration through the I/O classes is done in the order specified above.
No further operations are issued
if the aggregate maximum number of concurrent operations has been hit,
or if there are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O operation is queued or an operation completes,
the scheduler looks for new operations to issue.
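.Pp
As an informal illustration (this is not the OpenZFS implementation, and all
function and variable names below are invented for the example), the two-pass
class selection just described can be sketched as:

```python
# Informal model of the scheduler's class selection; names are illustrative,
# not the actual OpenZFS identifiers.
CLASSES = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]

def pick_class(queued, active, min_active, max_active,
               total_active, aggregate_max):
    """Return the next I/O class to issue from, or None if nothing may issue."""
    # Nothing may issue once the device-wide aggregate maximum
    # (zfs_vdev_max_active) has been reached.
    if total_active >= aggregate_max:
        return None
    # First pass: in priority order, find a class with queued work that has
    # not yet satisfied its minimum.
    for c in CLASSES:
        if queued[c] and active[c] < min_active[c]:
            return c
    # Second pass: all minima satisfied; find a class still below its maximum.
    for c in CLASSES:
        if queued[c] and active[c] < max_active[c]:
            return c
    return None
```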
.Pp
In general, smaller
.Sy max_active Ns s
will lead to lower latency of synchronous operations.
Larger
.Sy max_active Ns s
may lead to higher overall throughput, depending on underlying storage.
.Pp
The ratio of the queues'
.Sy max_active Ns s
determines the balance of performance between reads, writes, and scrubs.
For example, increasing
.Sy zfs_vdev_scrub_max_active
will cause the scrub or resilver to complete more quickly,
but reads and writes to have higher latency and lower throughput.
.Pp
All I/O classes have a fixed maximum number of outstanding operations,
except for the async write class.
Asynchronous writes represent the data that is committed to stable storage
during the syncing stage for transaction groups.
Transaction groups enter the syncing state periodically,
so the number of queued async writes will quickly burst up
and then bleed down to zero.
Rather than servicing them as quickly as possible,
the I/O scheduler changes the maximum number of active async write operations
according to the amount of dirty data in the pool.
Since both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other – and in particular synchronous – queues.
In broad strokes, the I/O scheduler will issue more concurrent operations
from the async write queue as there's more dirty data in the pool.
.
.Ss Async Writes
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points:
.Bd -literal
       |              o---------| <-- \fBzfs_vdev_async_write_max_active\fP
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- \fBzfs_vdev_async_write_min_active\fP
      0|_______^______|_________|
       0%      |      |       100% of \fBzfs_dirty_data_max\fP
               |      |
               |      `-- \fBzfs_vdev_async_write_active_max_dirty_percent\fP
               `--------- \fBzfs_vdev_async_write_active_min_dirty_percent\fP
.Ed
.Pp
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum.
As that threshold is crossed, the number of concurrent operations issued
increases linearly to the maximum at the specified maximum percentage
of the dirty data allowed in the pool.
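.Pp
The piece-wise linear function pictured above can be sketched as follows.
The tunable names are real; the helper itself and its default breakpoint
values are illustrative only:

```python
# Sketch of the piece-wise linear async-write function.  The breakpoints
# correspond to zfs_vdev_async_write_active_{min,max}_dirty_percent and the
# flat segments to zfs_vdev_async_write_{min,max}_active; the default values
# here are placeholders for illustration.
def async_write_max_active(dirty, dirty_max,
                           min_active=1, max_active=10,
                           min_dirty_pct=30, max_dirty_pct=60):
    pct = 100 * dirty / dirty_max
    if pct <= min_dirty_pct:          # flat segment at the minimum
        return min_active
    if pct >= max_dirty_pct:          # flat segment at the maximum
        return max_active
    # Sloped segment: linear interpolation between the two breakpoints.
    span = max_dirty_pct - min_dirty_pct
    return min_active + (max_active - min_active) * (pct - min_dirty_pct) / span
```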
.Pp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between
.Sy zfs_vdev_async_write_active_min_dirty_percent
and
.Sy zfs_vdev_async_write_active_max_dirty_percent .
If it exceeds the maximum percentage,
this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle.
In this case, we must further throttle incoming writes,
as described in the next section.
.
.Sh ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.Pp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting.
This way the calculated delay time
is independent of the number of threads concurrently executing transactions.
.Pp
If we are the only waiter, wait relative to when the transaction started,
rather than the current time.
This credits the transaction for "time already served",
e.g. reading indirect blocks.
.Pp
The minimum time for a transaction to take is calculated as
.Dl min_time = min( Ns Sy zfs_delay_scale No * (dirty - min) / (max - dirty), 100ms)
.Pp
The delay has two degrees of freedom that can be adjusted via tunables.
The percentage of dirty data at which we start to delay is defined by
.Sy zfs_delay_min_dirty_percent .
This should typically be at or above
.Sy zfs_vdev_async_write_active_max_dirty_percent ,
so that we only start to delay after writing at full speed
has failed to keep up with the incoming write rate.
The scale of the curve is defined by
.Sy zfs_delay_scale .
Roughly speaking, this variable determines the amount of delay at the midpoint
of the curve.
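.Pp
A worked example of the formula, assuming
.Sy zfs_delay_scale
is expressed in nanoseconds (the helper itself is illustrative, not the
kernel code; "lo" stands for the "min" term, i.e.
.Sy zfs_delay_min_dirty_percent
of
.Sy zfs_dirty_data_max ) :

```python
# Illustrative worked example of
#   min_time = min(zfs_delay_scale * (dirty - min) / (max - dirty), 100ms)
# with dirty amounts in bytes and the result in nanoseconds.
def tx_delay_ns(dirty, dirty_max,
                delay_min_dirty_percent=60, delay_scale=500000):
    lo = dirty_max * delay_min_dirty_percent // 100
    if dirty <= lo:
        return 0                      # below the threshold: no delay
    if dirty >= dirty_max:
        return 100_000_000            # at or past the limit: 100ms ceiling
    return min(delay_scale * (dirty - lo) // (dirty_max - dirty),
               100_000_000)
```

Halfway between the threshold and the limit, (dirty - lo) equals
(dirty_max - dirty), so the delay equals zfs_delay_scale itself; that is the
midpoint marked on the curves.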
.Bd -literal
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                           *  +
      |                                                           *  |
  4ms +                                                           *  +
      |                                                           *  |
  3ms +                                                          *   +
      |                                                          *   |
  2ms +                                              (midpoint) *    +
      |                                                  |    **     |
  1ms +                                                  v ***       +
      |             \fBzfs_delay_scale\fP ---------->     ********         |
    0 +-------------------------------------*********----------------+
      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
.Ed
.Pp
Note that, since the delay is added to the outstanding time remaining on the
most recent transaction, it's effectively the inverse of IOPS.
Here, the midpoint of
.Em 500us
translates to
.Em 2000 IOPS .
The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first three quarters of the curve yield relatively small differences
in the amount of delay.
.Pp
The effects can be easier to understand when the amount of delay is
represented on a logarithmic scale:
.Bd -literal
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                           ** +
      |                                              (midpoint)  **  |
      +                                                  |     **    +
  1ms +                                                  v ****      +
      +             \fBzfs_delay_scale\fP ---------->        *****         +
      |                                             ****             |
      +                                          ****                +
100us +                                        **                    +
      +                                       *                      +
      |                                      *                       |
      +                                     *                        +
 10us +                                     *                        +
      +                                                              +
      |                                                              |
      +                                                              +
      +--------------------------------------------------------------+
      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
.Ed
2372*3ff01b23SMartin Matuska.Pp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly.
The goal of a properly tuned system should be to keep the amount of dirty data
out of that range by first ensuring that the appropriate limits are set
for the I/O scheduler to reach optimal throughput on the back-end storage,
and then by changing the value of
.Sy zfs_delay_scale
to increase the steepness of the curve.
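As a hedged example of that last step, the tunable can be changed at runtime
through the usual module-parameter interfaces; the paths below assume the
Linux sysfs interface, and the value shown (double the 500000 ns default,
making the curve steeper) is illustrative only:

```shell
# Illustrative only: steepen the delay curve by raising zfs_delay_scale
# (value in nanoseconds; 500000 is the default).
echo 1000000 > /sys/module/zfs/parameters/zfs_delay_scale
cat /sys/module/zfs/parameters/zfs_delay_scale

# On FreeBSD the same tunable is typically exposed as a sysctl:
#   sysctl vfs.zfs.delay_scale=1000000
```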