.\"
.\" Copyright (c) 2013 by Turbo Fredriksson <turbo@bayour.com>. All rights reserved.
.\" Copyright (c) 2019, 2021 by Delphix. All rights reserved.
.\" Copyright (c) 2019 Datto Inc.
.\" The contents of this file are subject to the terms of the Common Development
.\" and Distribution License (the "License"). You may not use this file except
.\" in compliance with the License. You can obtain a copy of the license at
.\" usr/src/OPENSOLARIS.LICENSE or http://www.opensolaris.org/os/licensing.
.\"
.\" See the License for the specific language governing permissions and
.\" limitations under the License. When distributing Covered Code, include this
.\" CDDL HEADER in each file and include the License file at
.\" usr/src/OPENSOLARIS.LICENSE. If applicable, add the following below this
.\" CDDL HEADER, with the fields enclosed by brackets "[]" replaced with your
.\" own identifying information:
.\" Portions Copyright [yyyy] [name of copyright owner]
.\"
.Dd June 1, 2021
.Dt ZFS 4
.Os
.
.Sh NAME
.Nm zfs
.Nd tuning of the ZFS kernel module
.
.Sh DESCRIPTION
The ZFS module supports these parameters:
.Bl -tag -width Ds
.It Sy dbuf_cache_max_bytes Ns = Ns Sy ULONG_MAX Ns B Pq ulong
Maximum size in bytes of the dbuf cache.
The target size is determined by the MIN versus
.No 1/2^ Ns Sy dbuf_cache_shift Pq 1/32nd
of the target ARC size.
The behavior of the dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
.
.It Sy dbuf_metadata_cache_max_bytes Ns = Ns Sy ULONG_MAX Ns B Pq ulong
Maximum size in bytes of the metadata dbuf cache.
The target size is determined by the MIN versus
.No 1/2^ Ns Sy dbuf_metadata_cache_shift Pq 1/64th
of the target ARC size.
The behavior of the metadata dbuf cache and its associated settings
can be observed via the
.Pa /proc/spl/kstat/zfs/dbufstats
kstat.
.
.It Sy dbuf_cache_hiwater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage over
.Sy dbuf_cache_max_bytes
when dbufs must be evicted directly.
.
.It Sy dbuf_cache_lowater_pct Ns = Ns Sy 10 Ns % Pq uint
The percentage below
.Sy dbuf_cache_max_bytes
when the evict thread stops evicting dbufs.
.
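.Pp
On Linux, the dbuf cache tunables above (like most parameters in this page)
are typically exposed under
.Pa /sys/module/zfs/parameters
and can be inspected or, where writable, adjusted at runtime.
A minimal sketch, assuming the zfs module is loaded:
.Bd -literal -compact
# cat /sys/module/zfs/parameters/dbuf_cache_hiwater_pct
10
# echo 20 > /sys/module/zfs/parameters/dbuf_cache_hiwater_pct
.Ed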
.It Sy dbuf_cache_shift Ns = Ns Sy 5 Pq int
Set the size of the dbuf cache
.Pq Sy dbuf_cache_max_bytes
to a log2 fraction of the target ARC size.
.
.It Sy dbuf_metadata_cache_shift Ns = Ns Sy 6 Pq int
Set the size of the dbuf metadata cache
.Pq Sy dbuf_metadata_cache_max_bytes
to a log2 fraction of the target ARC size.
.
.It Sy dmu_object_alloc_chunk_shift Ns = Ns Sy 7 Po 128 Pc Pq int
Dnode slots allocated in a single operation as a power of 2.
The default value minimizes lock contention for the bulk operation performed.
.
.It Sy dmu_prefetch_max Ns = Ns Sy 134217728 Ns B Po 128MB Pc Pq int
Limit the amount we can prefetch with one call to this amount in bytes.
This helps to limit the amount of memory that can be used by prefetching.
.
.It Sy ignore_hole_birth Pq int
Alias for
.Sy send_holes_without_birth_time .
.
.It Sy l2arc_feed_again Ns = Ns Sy 1 Ns | Ns 0 Pq int
Turbo L2ARC warm-up.
When the L2ARC is cold the fill interval will be set as fast as possible.
.
.It Sy l2arc_feed_min_ms Ns = Ns Sy 200 Pq ulong
Minimum feed interval in milliseconds.
Requires
.Sy l2arc_feed_again Ns = Ns Ar 1
and only applicable in related situations.
.
.It Sy l2arc_feed_secs Ns = Ns Sy 1 Pq ulong
Seconds between L2ARC writing.
.
.It Sy l2arc_headroom Ns = Ns Sy 2 Pq ulong
How far through the ARC lists to search for L2ARC cacheable content,
expressed as a multiplier of
.Sy l2arc_write_max .
ARC persistence across reboots can be achieved with persistent L2ARC
by setting this parameter to
.Sy 0 ,
allowing the full length of ARC lists to be searched for cacheable content.
.
.It Sy l2arc_headroom_boost Ns = Ns Sy 200 Ns % Pq ulong
Scales
.Sy l2arc_headroom
by this percentage when L2ARC contents are being successfully compressed
before writing.
A value of
.Sy 100
disables this feature.
.
.It Sy l2arc_mfuonly Ns = Ns Sy 0 Ns | Ns 1 Pq int
Controls whether only MFU metadata and data are cached from ARC into L2ARC.
This may be desired to avoid wasting space on L2ARC when reading/writing large
amounts of data that are not expected to be accessed more than once.
.Pp
The default is off,
meaning both MRU and MFU data and metadata are cached.
When turning on this feature, some MRU buffers will still be present
in ARC and eventually cached on L2ARC.
.No If Sy l2arc_noprefetch Ns = Ns Sy 0 ,
some prefetched buffers will be cached to L2ARC, and those might later
transition to MRU, in which case the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
Regardless of
.Sy l2arc_noprefetch ,
some MFU buffers might be evicted from ARC,
accessed later on as prefetches and transition to MRU as prefetches.
If accessed again they are counted as MRU and the
.Sy l2arc_mru_asize No arcstat will not be Sy 0 .
.Pp
The ARC status of L2ARC buffers when they were first cached in
L2ARC can be seen in the
.Sy l2arc_mru_asize , Sy l2arc_mfu_asize , No and Sy l2arc_prefetch_asize
arcstats when importing the pool or onlining a cache
device if persistent L2ARC is enabled.
.Pp
The
.Sy evict_l2_eligible_mru
arcstat does not take into account whether this option is enabled, so the
information provided by the
.Sy evict_l2_eligible_m[rf]u
arcstats can be used to decide if toggling this option is appropriate
for the current workload.
.
.It Sy l2arc_meta_percent Ns = Ns Sy 33 Ns % Pq int
Percent of ARC size allowed for L2ARC-only headers.
Since L2ARC buffers are not evicted on memory pressure,
too many headers on a system with an irrationally large L2ARC
can render it slow or unusable.
This parameter limits L2ARC writes and rebuilds to achieve the target.
.
.It Sy l2arc_trim_ahead Ns = Ns Sy 0 Ns % Pq ulong
Trims ahead of the current write size
.Pq Sy l2arc_write_max
on L2ARC devices by this percentage of write size if we have filled the device.
If set to
.Sy 100
we TRIM twice the space required to accommodate upcoming writes.
A minimum of
.Sy 64MB
will be trimmed.
It also enables TRIM of the whole L2ARC device upon creation
or addition to an existing pool or if the header of the device is
invalid upon importing a pool or onlining a cache device.
A value of
.Sy 0
disables TRIM on L2ARC altogether and is the default as it can put significant
stress on the underlying storage devices.
This will vary depending on how well the specific device handles these commands.
.
.It Sy l2arc_noprefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
Do not write buffers to L2ARC if they were prefetched but not used by
applications.
In case there are prefetched buffers in L2ARC and this option
is later set, we do not read the prefetched buffers from L2ARC.
Unsetting this option is useful for caching sequential reads from the
disks to L2ARC and serving those reads from L2ARC later on.
This may be beneficial in case the L2ARC device is significantly faster
in sequential reads than the disks of the pool.
.Pp
Use
.Sy 1
to disable and
.Sy 0
to enable caching/reading prefetches to/from L2ARC.
.
.It Sy l2arc_norw Ns = Ns Sy 0 Ns | Ns 1 Pq int
No reads during writes.
.
.It Sy l2arc_write_boost Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq ulong
Cold L2ARC devices will have
.Sy l2arc_write_max
increased by this amount while they remain cold.
.
.It Sy l2arc_write_max Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq ulong
Max write bytes per interval.
.
.It Sy l2arc_rebuild_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Rebuild the L2ARC when importing a pool (persistent L2ARC).
This can be disabled if there are problems importing a pool
or attaching an L2ARC device (e.g. the L2ARC device is slow
in reading stored log metadata, or the metadata
has become somehow fragmented/unusable).
.
.It Sy l2arc_rebuild_blocks_min_l2size Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong
Minimum size of an L2ARC device required in order to write log blocks in it.
The log blocks are used upon importing the pool to rebuild the persistent L2ARC.
.Pp
For L2ARC devices less than 1GB, the amount of data
.Fn l2arc_evict
evicts is significant compared to the amount of restored L2ARC data.
In this case, do not write log blocks in L2ARC in order not to waste space.
.
.It Sy metaslab_aliquot Ns = Ns Sy 524288 Ns B Po 512kB Pc Pq ulong
Metaslab granularity, in bytes.
This is roughly similar to what would be referred to as the "stripe size"
in traditional RAID arrays.
In normal operation, ZFS will try to write this amount of data
to a top-level vdev before moving on to the next one.
.
.It Sy metaslab_bias_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group biasing based on their vdevs' over- or under-utilization
relative to the pool.
.
.It Sy metaslab_force_ganging Ns = Ns Sy 16777217 Ns B Po 16MB + 1B Pc Pq ulong
Make some blocks above a certain size be gang blocks.
This option is used by the test suite to facilitate testing.
.
.It Sy zfs_history_output_max Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
When attempting to log an output nvlist of an ioctl in the on-disk history,
the output will not be stored if it is larger than this size (in bytes).
This must be less than
.Sy DMU_MAX_ACCESS Pq 64MB .
This applies primarily to
.Fn zfs_ioc_channel_program Pq cf. Xr zfs-program 8 .
.
.It Sy zfs_keep_log_spacemaps_at_export Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent log spacemaps from being destroyed during pool exports and destroys.
.
.It Sy zfs_metaslab_segment_weight_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable/disable segment-based metaslab selection.
.
.It Sy zfs_metaslab_switch_threshold Ns = Ns Sy 2 Pq int
When using segment-based metaslab selection, continue allocating
from the active metaslab until this option's
worth of buckets have been exhausted.
.
.It Sy metaslab_debug_load Ns = Ns Sy 0 Ns | Ns 1 Pq int
Load all metaslabs during pool import.
.
.It Sy metaslab_debug_unload Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prevent metaslabs from being unloaded.
.
.It Sy metaslab_fragmentation_factor_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable use of the fragmentation metric in computing metaslab weights.
.
.It Sy metaslab_df_max_search Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
Maximum distance to search forward from the last offset.
Without this limit, fragmented pools can see
.Em >100`000
iterations and
.Fn metaslab_block_picker
becomes the performance limiting factor on high-performance storage.
.Pp
With the default setting of
.Sy 16MB ,
we typically see less than
.Em 500
iterations, even with very fragmented
.Sy ashift Ns = Ns Sy 9
pools.
The maximum number of iterations possible is
.Sy metaslab_df_max_search / 2^(ashift+1) .
With the default setting of
.Sy 16MB
this is
.Em 16*1024 Pq with Sy ashift Ns = Ns Sy 9
or
.Em 2*1024 Pq with Sy ashift Ns = Ns Sy 12 .
.
.It Sy metaslab_df_use_largest_segment Ns = Ns Sy 0 Ns | Ns 1 Pq int
If not searching forward (due to
.Sy metaslab_df_max_search , metaslab_df_free_pct ,
.No or Sy metaslab_df_alloc_threshold ) ,
this tunable controls which segment is used.
If set, we will use the largest free segment.
If unset, we will use a segment of at least the requested size.
.
.It Sy zfs_metaslab_max_size_cache_sec Ns = Ns Sy 3600 Ns s Po 1h Pc Pq ulong
When we unload a metaslab, we cache the size of the largest free chunk.
We use that cached size to determine whether or not to load a metaslab
for a given allocation.
As more frees accumulate in that metaslab while it's unloaded,
the cached max size becomes less and less accurate.
After a number of seconds controlled by this tunable,
we stop considering the cached max size and start
considering only the histogram instead.
.
.It Sy zfs_metaslab_mem_limit Ns = Ns Sy 25 Ns % Pq int
When we are loading a new metaslab, we check the amount of memory being used
to store metaslab range trees.
If it is over a threshold, we attempt to unload the least recently used metaslab
to prevent the system from clogging all of its memory with range trees.
This tunable sets the percentage of total system memory that is the threshold.
.
.It Sy zfs_metaslab_try_hard_before_gang Ns = Ns Sy 0 Ns | Ns 1 Pq int
.Bl -item -compact
.It
If unset, we will first try normal allocation.
.It
If that fails then we will do a gang allocation.
.It
If that fails then we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.Pp
.Bl -item -compact
.It
If set, we will first try normal allocation.
.It
If that fails then we will do a "try hard" allocation.
.It
If that fails we will do a gang allocation.
.It
If that fails we will do a "try hard" gang allocation.
.It
If that fails then we will have a multi-layer gang block.
.El
.
.It Sy zfs_metaslab_find_max_tries Ns = Ns Sy 100 Pq int
When not trying hard, we only consider this number of the best metaslabs.
This improves performance, especially when there are many metaslabs per vdev
and the allocation can't actually be satisfied
(so we would otherwise iterate all metaslabs).
.
.It Sy zfs_vdev_default_ms_count Ns = Ns Sy 200 Pq int
When a vdev is added, target this number of metaslabs per top-level vdev.
.
.It Sy zfs_vdev_default_ms_shift Ns = Ns Sy 29 Po 512MB Pc Pq int
Default limit for metaslab size.
.
.It Sy zfs_vdev_max_auto_ashift Ns = Ns Sy ASHIFT_MAX Po 16 Pc Pq ulong
Maximum ashift used when optimizing for logical -> physical sector size on new
top-level vdevs.
.
.It Sy zfs_vdev_min_auto_ashift Ns = Ns Sy ASHIFT_MIN Po 9 Pc Pq ulong
Minimum ashift used when creating new top-level vdevs.
.
.It Sy zfs_vdev_min_ms_count Ns = Ns Sy 16 Pq int
Minimum number of metaslabs to create in a top-level vdev.
.
.It Sy vdev_validate_skip Ns = Ns Sy 0 Ns | Ns 1 Pq int
Skip label validation steps during pool import.
Changing this is not recommended unless you know what you're doing
and are recovering a damaged label.
.
.It Sy zfs_vdev_ms_count_limit Ns = Ns Sy 131072 Po 128k Pc Pq int
Practical upper limit of total metaslabs per top-level vdev.
.
.It Sy metaslab_preload_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable metaslab group preloading.
.
.It Sy metaslab_lba_weighting_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Give more weight to metaslabs with lower LBAs,
assuming they have greater bandwidth,
as is typically the case on a modern constant angular velocity disk drive.
.
.It Sy metaslab_unload_delay Ns = Ns Sy 32 Pq int
After a metaslab is used, we keep it loaded for this many TXGs, to attempt to
reduce unnecessary reloading.
Note that both this many TXGs and
.Sy metaslab_unload_delay_ms
milliseconds must pass before unloading will occur.
.
.It Sy metaslab_unload_delay_ms Ns = Ns Sy 600000 Ns ms Po 10min Pc Pq int
After a metaslab is used, we keep it loaded for this many milliseconds,
to attempt to reduce unnecessary reloading.
Note that both this many milliseconds and
.Sy metaslab_unload_delay
TXGs must pass before unloading will occur.
.
.It Sy reference_history Ns = Ns Sy 3 Pq int
Maximum number of reference holders tracked when
.Sy reference_tracking_enable
is active.
.
.It Sy reference_tracking_enable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Track reference holders to
.Sy refcount_t
objects (debug builds only).
.
.It Sy send_holes_without_birth_time Ns = Ns Sy 1 Ns | Ns 0 Pq int
When set, the
.Sy hole_birth
optimization will not be used, and all holes will always be sent during a
.Nm zfs Cm send .
This is useful if you suspect your datasets are affected by a bug in
.Sy hole_birth .
.
.It Sy spa_config_path Ns = Ns Pa /etc/zfs/zpool.cache Pq charp
SPA config file.
.
.It Sy spa_asize_inflation Ns = Ns Sy 24 Pq int
Multiplication factor used to estimate actual disk consumption from the
size of data being written.
The default value is a worst case estimate,
but lower values may be valid for a given pool depending on its configuration.
Pool administrators who understand the factors involved
may wish to specify a more realistic inflation factor,
particularly if they operate close to quota or capacity limits.
.
.It Sy spa_load_print_vdev_tree Ns = Ns Sy 0 Ns | Ns 1 Pq int
Whether to print the vdev tree in the debugging message buffer during pool import.
.
.It Sy spa_load_verify_data Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse data blocks during an "extreme rewind"
.Pq Fl X
import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal skips non-metadata blocks.
It can be toggled once the
import has started to stop or start the traversal of non-metadata blocks.
.
.It Sy spa_load_verify_metadata Ns = Ns Sy 1 Ns | Ns 0 Pq int
Whether to traverse blocks during an "extreme rewind"
.Pq Fl X
pool import.
.Pp
An extreme rewind import normally performs a full traversal of all
blocks in the pool for verification.
If this parameter is unset, the traversal is not performed.
It can be toggled once the import has started to stop or start the traversal.
.
.It Sy spa_load_verify_shift Ns = Ns Sy 4 Po 1/16th Pc Pq int
Sets the maximum number of bytes to consume during pool import to the log2
fraction of the target ARC size.
.
.It Sy spa_slop_shift Ns = Ns Sy 5 Po 1/32nd Pc Pq int
Normally, we don't allow the last
.Sy 3.2% Pq Sy 1/2^spa_slop_shift
of space in the pool to be consumed.
This ensures that we don't run the pool completely out of space,
due to unaccounted changes (e.g. to the MOS).
It also limits the worst-case time to allocate space.
If we have less than this amount of free space,
most ZPL operations (e.g. write, create) will return
.Sy ENOSPC .
.
.It Sy vdev_removal_max_span Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq int
During top-level vdev removal, chunks of data are copied from the vdev
which may include free space in order to trade bandwidth for IOPS.
This parameter determines the maximum span of free space, in bytes,
which will be included as "unnecessary" data in a chunk of copied data.
.Pp
The default value here was chosen to align with
.Sy zfs_vdev_read_gap_limit ,
which is a similar concept when doing
regular reads (but there's no reason it has to be the same).
.
.It Sy vdev_file_logical_ashift Ns = Ns Sy 9 Po 512B Pc Pq ulong
Logical ashift for file-based devices.
.
.It Sy vdev_file_physical_ashift Ns = Ns Sy 9 Po 512B Pc Pq ulong
Physical ashift for file-based devices.
.
.It Sy zap_iterate_prefetch Ns = Ns Sy 1 Ns | Ns 0 Pq int
If set, when we start iterating over a ZAP object,
prefetch the entire object (all leaf blocks).
However, this is limited by
.Sy dmu_prefetch_max .
.
.It Sy zfetch_array_rd_sz Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
If prefetching is enabled, disable prefetching for reads larger than this size.
.
.It Sy zfetch_max_distance Ns = Ns Sy 8388608 Ns B Po 8MB Pc Pq uint
Max bytes to prefetch per stream.
.
.It Sy zfetch_max_idistance Ns = Ns Sy 67108864 Ns B Po 64MB Pc Pq uint
Max bytes to prefetch indirects for per stream.
.
.It Sy zfetch_max_streams Ns = Ns Sy 8 Pq uint
Max number of streams per zfetch (prefetch streams per file).
.
.It Sy zfetch_min_sec_reap Ns = Ns Sy 2 Pq uint
Min time before an active prefetch stream can be reclaimed.
.
.It Sy zfs_abd_scatter_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable ARC use of scatter/gather lists.
When disabled, all allocations are forced to be linear in kernel memory.
493*3ff01b23SMartin MatuskaDisabling can improve performance in some code paths 494*3ff01b23SMartin Matuskaat the expense of fragmented kernel memory. 495*3ff01b23SMartin Matuska. 496*3ff01b23SMartin Matuska.It Sy zfs_abd_scatter_max_order Ns = Ns Sy MAX_ORDER-1 Pq uint 497*3ff01b23SMartin MatuskaMaximum number of consecutive memory pages allocated in a single block for 498*3ff01b23SMartin Matuskascatter/gather lists. 499*3ff01b23SMartin Matuska.Pp 500*3ff01b23SMartin MatuskaThe value of 501*3ff01b23SMartin Matuska.Sy MAX_ORDER 502*3ff01b23SMartin Matuskadepends on kernel configuration. 503*3ff01b23SMartin Matuska. 504*3ff01b23SMartin Matuska.It Sy zfs_abd_scatter_min_size Ns = Ns Sy 1536 Ns B Po 1.5kB Pc Pq uint 505*3ff01b23SMartin MatuskaThis is the minimum allocation size that will use scatter (page-based) ABDs. 506*3ff01b23SMartin MatuskaSmaller allocations will use linear ABDs. 507*3ff01b23SMartin Matuska. 508*3ff01b23SMartin Matuska.It Sy zfs_arc_dnode_limit Ns = Ns Sy 0 Ns B Pq ulong 509*3ff01b23SMartin MatuskaWhen the number of bytes consumed by dnodes in the ARC exceeds this number of 510*3ff01b23SMartin Matuskabytes, try to unpin some of it in response to demand for non-metadata. 511*3ff01b23SMartin MatuskaThis value acts as a ceiling to the amount of dnode metadata, and defaults to 512*3ff01b23SMartin Matuska.Sy 0 , 513*3ff01b23SMartin Matuskawhich indicates that a percentage based on 514*3ff01b23SMartin Matuska.Sy zfs_arc_dnode_limit_percent 515*3ff01b23SMartin Matuskaof the ARC meta buffers may be used for dnodes. 516*3ff01b23SMartin Matuska.Pp 517*3ff01b23SMartin MatuskaAlso see 518*3ff01b23SMartin Matuska.Sy zfs_arc_meta_prune , 519*3ff01b23SMartin Matuskawhich serves a similar purpose but is used 520*3ff01b23SMartin Matuskawhen the amount of metadata in the ARC exceeds 521*3ff01b23SMartin Matuska.Sy zfs_arc_meta_limit 522*3ff01b23SMartin Matuskarather than in response to overall demand for non-metadata. 523*3ff01b23SMartin Matuska.
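When zfs_arc_dnode_limit is left at its default of 0, the effective ceiling described above is derived from zfs_arc_dnode_limit_percent. A minimal sketch of that derivation follows; the 3 GiB ARC metadata limit is an assumed example value, not a measured one.

```shell
# With zfs_arc_dnode_limit=0, the dnode ceiling is
# zfs_arc_dnode_limit_percent (default 10%) of the ARC metadata limit.
# The metadata limit here is an assumed example value.
arc_meta_limit=$((3 * 1024 * 1024 * 1024))   # assume a 3 GiB metadata limit
zfs_arc_dnode_limit_percent=10
dnode_ceiling=$((arc_meta_limit * zfs_arc_dnode_limit_percent / 100))
echo "$dnode_ceiling"
```

The kernel uses integer arithmetic, so the result here is truncated rather than rounded.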
524*3ff01b23SMartin Matuska.It Sy zfs_arc_dnode_limit_percent Ns = Ns Sy 10 Ns % Pq ulong 525*3ff01b23SMartin MatuskaPercentage of ARC meta buffers that may be consumed by dnodes. 526*3ff01b23SMartin Matuska.Pp 527*3ff01b23SMartin MatuskaSee also 528*3ff01b23SMartin Matuska.Sy zfs_arc_dnode_limit , 529*3ff01b23SMartin Matuskawhich serves a similar purpose but has a higher priority if nonzero. 530*3ff01b23SMartin Matuska. 531*3ff01b23SMartin Matuska.It Sy zfs_arc_dnode_reduce_percent Ns = Ns Sy 10 Ns % Pq ulong 532*3ff01b23SMartin MatuskaPercentage of ARC dnodes to try to scan in response to demand for non-metadata 533*3ff01b23SMartin Matuskawhen the number of bytes consumed by dnodes exceeds 534*3ff01b23SMartin Matuska.Sy zfs_arc_dnode_limit . 535*3ff01b23SMartin Matuska. 536*3ff01b23SMartin Matuska.It Sy zfs_arc_average_blocksize Ns = Ns Sy 8192 Ns B Po 8kB Pc Pq int 537*3ff01b23SMartin MatuskaThe ARC's buffer hash table is sized based on the assumption of an average 538*3ff01b23SMartin Matuskablock size of this value. 539*3ff01b23SMartin MatuskaThis works out to roughly 1MB of hash table per 1GB of physical memory 540*3ff01b23SMartin Matuskawith 8-byte pointers. 541*3ff01b23SMartin MatuskaFor configurations with a known larger average block size, 542*3ff01b23SMartin Matuskathis value can be increased to reduce the memory footprint. 543*3ff01b23SMartin Matuska. 544*3ff01b23SMartin Matuska.It Sy zfs_arc_eviction_pct Ns = Ns Sy 200 Ns % Pq int 545*3ff01b23SMartin MatuskaWhen 546*3ff01b23SMartin Matuska.Fn arc_is_overflowing , 547*3ff01b23SMartin Matuska.Fn arc_get_data_impl 548*3ff01b23SMartin Matuskawaits for this percent of the requested amount of data to be evicted. 549*3ff01b23SMartin MatuskaFor example, by default, for every 550*3ff01b23SMartin Matuska.Em 2kB 551*3ff01b23SMartin Matuskathat's evicted, 552*3ff01b23SMartin Matuska.Em 1kB 553*3ff01b23SMartin Matuskaof it may be "reused" by a new allocation.
554*3ff01b23SMartin MatuskaSince this is above 555*3ff01b23SMartin Matuska.Sy 100 Ns % , 556*3ff01b23SMartin Matuskait ensures that progress is made towards getting 557*3ff01b23SMartin Matuska.Sy arc_size No under Sy arc_c . 558*3ff01b23SMartin MatuskaSince this is finite, it ensures that allocations can still happen, 559*3ff01b23SMartin Matuskaeven during the potentially long time that 560*3ff01b23SMartin Matuska.Sy arc_size No is more than Sy arc_c . 561*3ff01b23SMartin Matuska. 562*3ff01b23SMartin Matuska.It Sy zfs_arc_evict_batch_limit Ns = Ns Sy 10 Pq int 563*3ff01b23SMartin MatuskaNumber of ARC headers to evict per sub-list before proceeding to another sub-list. 564*3ff01b23SMartin MatuskaThis batch-style operation prevents entire sub-lists from being evicted at once 565*3ff01b23SMartin Matuskabut comes at a cost of additional unlocking and locking. 566*3ff01b23SMartin Matuska. 567*3ff01b23SMartin Matuska.It Sy zfs_arc_grow_retry Ns = Ns Sy 0 Ns s Pq int 568*3ff01b23SMartin MatuskaIf set to a nonzero value, it will replace the 569*3ff01b23SMartin Matuska.Sy arc_grow_retry 570*3ff01b23SMartin Matuskavalue with this value. 571*3ff01b23SMartin MatuskaThe 572*3ff01b23SMartin Matuska.Sy arc_grow_retry 573*3ff01b23SMartin Matuska.No value Pq default Sy 5 Ns s 574*3ff01b23SMartin Matuskais the number of seconds the ARC will wait before 575*3ff01b23SMartin Matuskatrying to resume growth after a memory pressure event. 576*3ff01b23SMartin Matuska. 577*3ff01b23SMartin Matuska.It Sy zfs_arc_lotsfree_percent Ns = Ns Sy 10 Ns % Pq int 578*3ff01b23SMartin MatuskaThrottle I/O when free system memory drops below this percentage of total 579*3ff01b23SMartin Matuskasystem memory. 580*3ff01b23SMartin MatuskaSetting this value to 581*3ff01b23SMartin Matuska.Sy 0 582*3ff01b23SMartin Matuskawill disable the throttle. 583*3ff01b23SMartin Matuska. 584*3ff01b23SMartin Matuska.It Sy zfs_arc_max Ns = Ns Sy 0 Ns B Pq ulong 585*3ff01b23SMartin MatuskaMax size of ARC in bytes.
586*3ff01b23SMartin MatuskaIf 587*3ff01b23SMartin Matuska.Sy 0 , 588*3ff01b23SMartin Matuskathen the max size of ARC is determined by the amount of system memory installed. 589*3ff01b23SMartin MatuskaUnder Linux, half of system memory will be used as the limit. 590*3ff01b23SMartin MatuskaUnder 591*3ff01b23SMartin Matuska.Fx , 592*3ff01b23SMartin Matuskathe larger of 593*3ff01b23SMartin Matuska.Sy all_system_memory - 1GB No and Sy 5/8 * all_system_memory 594*3ff01b23SMartin Matuskawill be used as the limit. 595*3ff01b23SMartin MatuskaThis value must be at least 596*3ff01b23SMartin Matuska.Sy 67108864 Ns B Pq 64MB . 597*3ff01b23SMartin Matuska.Pp 598*3ff01b23SMartin MatuskaThis value can be changed dynamically, with some caveats. 599*3ff01b23SMartin MatuskaIt cannot be set back to 600*3ff01b23SMartin Matuska.Sy 0 601*3ff01b23SMartin Matuskawhile running, and reducing it below the current ARC size will not cause 602*3ff01b23SMartin Matuskathe ARC to shrink without memory pressure to induce shrinking. 603*3ff01b23SMartin Matuska. 604*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_adjust_restarts Ns = Ns Sy 4096 Pq ulong 605*3ff01b23SMartin MatuskaThe number of restart passes to make while scanning the ARC, attempting 606*3ff01b23SMartin Matuskato free buffers in order to stay below the 607*3ff01b23SMartin Matuska.Sy zfs_arc_meta_limit . 608*3ff01b23SMartin MatuskaThis value should not need to be tuned but is available to facilitate 609*3ff01b23SMartin Matuskaperformance analysis. 610*3ff01b23SMartin Matuska. 611*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_limit Ns = Ns Sy 0 Ns B Pq ulong 612*3ff01b23SMartin MatuskaThe maximum allowed size in bytes that metadata buffers may 613*3ff01b23SMartin Matuskaconsume in the ARC. 614*3ff01b23SMartin MatuskaWhen this limit is reached, metadata buffers will be reclaimed, 615*3ff01b23SMartin Matuskaeven if the overall 616*3ff01b23SMartin Matuska.Sy arc_c_max 617*3ff01b23SMartin Matuskahas not been reached.
618*3ff01b23SMartin MatuskaIt defaults to 619*3ff01b23SMartin Matuska.Sy 0 , 620*3ff01b23SMartin Matuskawhich indicates that a percentage based on 621*3ff01b23SMartin Matuska.Sy zfs_arc_meta_limit_percent 622*3ff01b23SMartin Matuskaof the ARC may be used for metadata. 623*3ff01b23SMartin Matuska.Pp 624*3ff01b23SMartin MatuskaThis value may be changed dynamically, except that it must be set to an explicit value 625*3ff01b23SMartin Matuska.Pq cannot be set back to Sy 0 . 626*3ff01b23SMartin Matuska. 627*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_limit_percent Ns = Ns Sy 75 Ns % Pq ulong 628*3ff01b23SMartin MatuskaPercentage of ARC buffers that can be used for metadata. 629*3ff01b23SMartin Matuska.Pp 630*3ff01b23SMartin MatuskaSee also 631*3ff01b23SMartin Matuska.Sy zfs_arc_meta_limit , 632*3ff01b23SMartin Matuskawhich serves a similar purpose but has a higher priority if nonzero. 633*3ff01b23SMartin Matuska. 634*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_min Ns = Ns Sy 0 Ns B Pq ulong 635*3ff01b23SMartin MatuskaThe minimum allowed size in bytes that metadata buffers may consume in 636*3ff01b23SMartin Matuskathe ARC. 637*3ff01b23SMartin Matuska. 638*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_prune Ns = Ns Sy 10000 Pq int 639*3ff01b23SMartin MatuskaThe number of dentries and inodes to be scanned looking for entries 640*3ff01b23SMartin Matuskawhich can be dropped. 641*3ff01b23SMartin MatuskaThis may be required when the ARC reaches the 642*3ff01b23SMartin Matuska.Sy zfs_arc_meta_limit 643*3ff01b23SMartin Matuskabecause dentries and inodes can pin buffers in the ARC. 644*3ff01b23SMartin MatuskaIncreasing this value will cause the dentry and inode caches 645*3ff01b23SMartin Matuskato be pruned more aggressively. 646*3ff01b23SMartin MatuskaSetting this value to 647*3ff01b23SMartin Matuska.Sy 0 648*3ff01b23SMartin Matuskawill disable pruning the inode and dentry caches. 649*3ff01b23SMartin Matuska.
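The default interaction between zfs_arc_meta_limit and zfs_arc_meta_limit_percent above can be sketched with shell arithmetic; the 4 GiB maximum ARC size is an assumed example value.

```shell
# With zfs_arc_meta_limit=0, the effective metadata cap is
# zfs_arc_meta_limit_percent (default 75%) of the maximum ARC size.
# The 4 GiB arc_c_max is an assumed example value.
arc_c_max=$((4 * 1024 * 1024 * 1024))
zfs_arc_meta_limit_percent=75
meta_limit=$((arc_c_max * zfs_arc_meta_limit_percent / 100))
echo "$meta_limit"
```

Setting zfs_arc_meta_limit to a nonzero byte count bypasses this percentage entirely, since the explicit limit has the higher priority.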
650*3ff01b23SMartin Matuska.It Sy zfs_arc_meta_strategy Ns = Ns Sy 1 Ns | Ns 0 Pq int 651*3ff01b23SMartin MatuskaDefine the strategy for ARC metadata buffer eviction (meta reclaim strategy): 652*3ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "0 (META_ONLY)" 653*3ff01b23SMartin Matuska.It Sy 0 Pq META_ONLY 654*3ff01b23SMartin Matuskaevict only the ARC metadata buffers 655*3ff01b23SMartin Matuska.It Sy 1 Pq BALANCED 656*3ff01b23SMartin Matuskaadditional data buffers may be evicted if required 657*3ff01b23SMartin Matuskato evict the required number of metadata buffers. 658*3ff01b23SMartin Matuska.El 659*3ff01b23SMartin Matuska. 660*3ff01b23SMartin Matuska.It Sy zfs_arc_min Ns = Ns Sy 0 Ns B Pq ulong 661*3ff01b23SMartin MatuskaMin size of ARC in bytes. 662*3ff01b23SMartin Matuska.No If set to Sy 0 , arc_c_min 663*3ff01b23SMartin Matuskawill default to consuming the larger of 664*3ff01b23SMartin Matuska.Sy 32MB No or Sy all_system_memory/32 . 665*3ff01b23SMartin Matuska. 666*3ff01b23SMartin Matuska.It Sy zfs_arc_min_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 1s Pc Pq int 667*3ff01b23SMartin MatuskaMinimum time prefetched blocks are locked in the ARC. 668*3ff01b23SMartin Matuska. 669*3ff01b23SMartin Matuska.It Sy zfs_arc_min_prescient_prefetch_ms Ns = Ns Sy 0 Ns ms Ns Po Ns ≡ Ns 6s Pc Pq int 670*3ff01b23SMartin MatuskaMinimum time "prescient prefetched" blocks are locked in the ARC. 671*3ff01b23SMartin MatuskaThese blocks are meant to be prefetched fairly aggressively ahead of 672*3ff01b23SMartin Matuskathe code that may use them. 673*3ff01b23SMartin Matuska. 674*3ff01b23SMartin Matuska.It Sy zfs_max_missing_tvds Ns = Ns Sy 0 Pq int 675*3ff01b23SMartin MatuskaNumber of missing top-level vdevs which will be allowed during 676*3ff01b23SMartin Matuskapool import (only in read-only mode). 677*3ff01b23SMartin Matuska. 
678*3ff01b23SMartin Matuska.It Sy zfs_max_nvlist_src_size Ns = Ns Sy 0 Pq ulong 679*3ff01b23SMartin MatuskaMaximum size in bytes allowed to be passed as 680*3ff01b23SMartin Matuska.Sy zc_nvlist_src_size 681*3ff01b23SMartin Matuskafor ioctls on 682*3ff01b23SMartin Matuska.Pa /dev/zfs . 683*3ff01b23SMartin MatuskaThis prevents a user from causing the kernel to allocate 684*3ff01b23SMartin Matuskaan excessive amount of memory. 685*3ff01b23SMartin MatuskaWhen the limit is exceeded, the ioctl fails with 686*3ff01b23SMartin Matuska.Sy EINVAL 687*3ff01b23SMartin Matuskaand a description of the error is sent to the 688*3ff01b23SMartin Matuska.Pa zfs-dbgmsg 689*3ff01b23SMartin Matuskalog. 690*3ff01b23SMartin MatuskaThis parameter should not need to be touched under normal circumstances. 691*3ff01b23SMartin MatuskaIf 692*3ff01b23SMartin Matuska.Sy 0 , 693*3ff01b23SMartin Matuskaequivalent to a quarter of the user-wired memory limit under 694*3ff01b23SMartin Matuska.Fx 695*3ff01b23SMartin Matuskaand to 696*3ff01b23SMartin Matuska.Sy 134217728 Ns B Pq 128MB 697*3ff01b23SMartin Matuskaunder Linux. 698*3ff01b23SMartin Matuska. 699*3ff01b23SMartin Matuska.It Sy zfs_multilist_num_sublists Ns = Ns Sy 0 Pq int 700*3ff01b23SMartin MatuskaTo allow more fine-grained locking, each ARC state contains a series 701*3ff01b23SMartin Matuskaof lists for both data and metadata objects. 702*3ff01b23SMartin MatuskaLocking is performed at the level of these "sub-lists". 703*3ff01b23SMartin MatuskaThis parameter controls the number of sub-lists per ARC state, 704*3ff01b23SMartin Matuskaand also applies to other uses of the multilist data structure. 705*3ff01b23SMartin Matuska.Pp 706*3ff01b23SMartin MatuskaIf 707*3ff01b23SMartin Matuska.Sy 0 , 708*3ff01b23SMartin Matuskaequivalent to the greater of the number of online CPUs and 709*3ff01b23SMartin Matuska.Sy 4 . 710*3ff01b23SMartin Matuska.
711*3ff01b23SMartin Matuska.It Sy zfs_arc_overflow_shift Ns = Ns Sy 8 Pq int 712*3ff01b23SMartin MatuskaThe ARC size is considered to be overflowing if it exceeds the current 713*3ff01b23SMartin MatuskaARC target size 714*3ff01b23SMartin Matuska.Pq Sy arc_c 715*3ff01b23SMartin Matuskaby a threshold determined by this parameter. 716*3ff01b23SMartin MatuskaThe threshold is calculated as a fraction of 717*3ff01b23SMartin Matuska.Sy arc_c 718*3ff01b23SMartin Matuskausing the formula 719*3ff01b23SMartin Matuska.Sy arc_c >> zfs_arc_overflow_shift . 720*3ff01b23SMartin Matuska.Pp 721*3ff01b23SMartin MatuskaThe default value of 722*3ff01b23SMartin Matuska.Sy 8 723*3ff01b23SMartin Matuskacauses the ARC to be considered overflowing if it exceeds the target size by 724*3ff01b23SMartin Matuska.Em 1/256th Pq Em 0.3% 725*3ff01b23SMartin Matuskaof the target size. 726*3ff01b23SMartin Matuska.Pp 727*3ff01b23SMartin MatuskaWhen the ARC is overflowing, new buffer allocations are stalled until 728*3ff01b23SMartin Matuskathe reclaim thread catches up and the overflow condition no longer exists. 729*3ff01b23SMartin Matuska. 730*3ff01b23SMartin Matuska.It Sy zfs_arc_p_min_shift Ns = Ns Sy 0 Pq int 731*3ff01b23SMartin MatuskaIf nonzero, this will update 732*3ff01b23SMartin Matuska.Sy arc_p_min_shift Pq default Sy 4 733*3ff01b23SMartin Matuskawith the new value. 734*3ff01b23SMartin Matuska.Sy arc_p_min_shift No is used as a shift of Sy arc_c 735*3ff01b23SMartin Matuskawhen calculating the minimum 736*3ff01b23SMartin Matuska.Sy arc_p No size. 737*3ff01b23SMartin Matuska. 738*3ff01b23SMartin Matuska.It Sy zfs_arc_p_dampener_disable Ns = Ns Sy 1 Ns | Ns 0 Pq int 739*3ff01b23SMartin MatuskaDisable 740*3ff01b23SMartin Matuska.Sy arc_p 741*3ff01b23SMartin Matuskaadapt dampener, which reduces the maximum single adjustment to 742*3ff01b23SMartin Matuska.Sy arc_p . 743*3ff01b23SMartin Matuska.
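The overflow threshold described under zfs_arc_overflow_shift above follows directly from its formula; a minimal sketch, in which the 4 GiB ARC target size is an assumed example:

```shell
# Overflow threshold = arc_c >> zfs_arc_overflow_shift; the default
# shift of 8 yields 1/256th of the target size. arc_c is a made-up example.
arc_c=$((4 * 1024 * 1024 * 1024))   # assume a 4 GiB ARC target
zfs_arc_overflow_shift=8
threshold=$((arc_c >> zfs_arc_overflow_shift))
echo "$threshold"
```

With these example numbers the ARC is considered overflowing once it exceeds the target by 16 MiB.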
744*3ff01b23SMartin Matuska.It Sy zfs_arc_shrink_shift Ns = Ns Sy 0 Pq int 745*3ff01b23SMartin MatuskaIf nonzero, this will update 746*3ff01b23SMartin Matuska.Sy arc_shrink_shift Pq default Sy 7 747*3ff01b23SMartin Matuskawith the new value. 748*3ff01b23SMartin Matuska. 749*3ff01b23SMartin Matuska.It Sy zfs_arc_pc_percent Ns = Ns Sy 0 Ns % Po off Pc Pq uint 750*3ff01b23SMartin MatuskaPercent of pagecache to reclaim ARC to. 751*3ff01b23SMartin Matuska.Pp 752*3ff01b23SMartin MatuskaThis tunable allows the ZFS ARC to play more nicely 753*3ff01b23SMartin Matuskawith the kernel's LRU pagecache. 754*3ff01b23SMartin MatuskaIt can guarantee that the ARC size won't collapse under scanning 755*3ff01b23SMartin Matuskapressure on the pagecache, yet still allows the ARC to be reclaimed down to 756*3ff01b23SMartin Matuska.Sy zfs_arc_min 757*3ff01b23SMartin Matuskaif necessary. 758*3ff01b23SMartin MatuskaThis value is specified as percent of pagecache size (as measured by 759*3ff01b23SMartin Matuska.Sy NR_FILE_PAGES ) , 760*3ff01b23SMartin Matuskawhere that percent may exceed 761*3ff01b23SMartin Matuska.Sy 100 . 762*3ff01b23SMartin MatuskaThis 763*3ff01b23SMartin Matuskaonly operates during memory pressure/reclaim. 764*3ff01b23SMartin Matuska. 765*3ff01b23SMartin Matuska.It Sy zfs_arc_shrinker_limit Ns = Ns Sy 10000 Pq int 766*3ff01b23SMartin MatuskaThis is a limit on how many pages the ARC shrinker makes available for 767*3ff01b23SMartin Matuskaeviction in response to one page allocation attempt. 768*3ff01b23SMartin MatuskaNote that in practice, the kernel's shrinker can ask us to evict 769*3ff01b23SMartin Matuskaup to about four times this for one allocation attempt. 
770*3ff01b23SMartin Matuska.Pp 771*3ff01b23SMartin MatuskaThe default limit of 772*3ff01b23SMartin Matuska.Sy 10000 Pq in practice, Em 160MB No per allocation attempt with 4kB pages 773*3ff01b23SMartin Matuskalimits the amount of time spent attempting to reclaim ARC memory to 774*3ff01b23SMartin Matuskaless than 100ms per allocation attempt, 775*3ff01b23SMartin Matuskaeven with a small average compressed block size of ~8kB. 776*3ff01b23SMartin Matuska.Pp 777*3ff01b23SMartin MatuskaThe parameter can be set to 0 (zero) to disable the limit, 778*3ff01b23SMartin Matuskaand only applies on Linux. 779*3ff01b23SMartin Matuska. 780*3ff01b23SMartin Matuska.It Sy zfs_arc_sys_free Ns = Ns Sy 0 Ns B Pq ulong 781*3ff01b23SMartin MatuskaThe target number of bytes the ARC should leave as free memory on the system. 782*3ff01b23SMartin MatuskaIf zero, equivalent to the bigger of 783*3ff01b23SMartin Matuska.Sy 512kB No and Sy all_system_memory/64 . 784*3ff01b23SMartin Matuska. 785*3ff01b23SMartin Matuska.It Sy zfs_autoimport_disable Ns = Ns Sy 1 Ns | Ns 0 Pq int 786*3ff01b23SMartin MatuskaDisable pool import at module load by ignoring the cache file 787*3ff01b23SMartin Matuska.Pq Sy spa_config_path . 788*3ff01b23SMartin Matuska. 789*3ff01b23SMartin Matuska.It Sy zfs_checksum_events_per_second Ns = Ns Sy 20 Ns /s Pq uint 790*3ff01b23SMartin MatuskaRate limit checksum events to this many per second. 791*3ff01b23SMartin MatuskaNote that this should not be set below the ZED thresholds 792*3ff01b23SMartin Matuska(currently 10 checksums over 10 seconds) 793*3ff01b23SMartin Matuskaor else the daemon may not trigger any action. 794*3ff01b23SMartin Matuska. 795*3ff01b23SMartin Matuska.It Sy zfs_commit_timeout_pct Ns = Ns Sy 5 Ns % Pq int 796*3ff01b23SMartin MatuskaThis controls the amount of time that a ZIL block (lwb) will remain "open" 797*3ff01b23SMartin Matuskawhen it isn't "full", and it has a thread waiting for it to be committed to 798*3ff01b23SMartin Matuskastable storage. 
799*3ff01b23SMartin MatuskaThe timeout is scaled based on a percentage of the last lwb 800*3ff01b23SMartin Matuskalatency to avoid significantly impacting the latency of each individual 801*3ff01b23SMartin Matuskatransaction record (itx). 802*3ff01b23SMartin Matuska. 803*3ff01b23SMartin Matuska.It Sy zfs_condense_indirect_commit_entry_delay_ms Ns = Ns Sy 0 Ns ms Pq int 804*3ff01b23SMartin MatuskaVdev indirection layer (used for device removal) sleeps for this many 805*3ff01b23SMartin Matuskamilliseconds during mapping generation. 806*3ff01b23SMartin MatuskaIntended for use with the test suite to throttle vdev removal speed. 807*3ff01b23SMartin Matuska. 808*3ff01b23SMartin Matuska.It Sy zfs_condense_indirect_obsolete_pct Ns = Ns Sy 25 Ns % Pq int 809*3ff01b23SMartin MatuskaMinimum percent of obsolete bytes in vdev mapping required to attempt to condense 810*3ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable . 811*3ff01b23SMartin MatuskaIntended for use with the test suite 812*3ff01b23SMartin Matuskato facilitate triggering condensing as needed. 813*3ff01b23SMartin Matuska. 814*3ff01b23SMartin Matuska.It Sy zfs_condense_indirect_vdevs_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int 815*3ff01b23SMartin MatuskaEnable condensing indirect vdev mappings. 816*3ff01b23SMartin MatuskaWhen set, attempt to condense indirect vdev mappings 817*3ff01b23SMartin Matuskaif the mapping uses more than 818*3ff01b23SMartin Matuska.Sy zfs_condense_min_mapping_bytes 819*3ff01b23SMartin Matuskabytes of memory and if the obsolete space map object uses more than 820*3ff01b23SMartin Matuska.Sy zfs_condense_max_obsolete_bytes 821*3ff01b23SMartin Matuskabytes on-disk. 822*3ff01b23SMartin MatuskaThe condensing process is an attempt to save memory by removing obsolete mappings. 823*3ff01b23SMartin Matuska. 
824*3ff01b23SMartin Matuska.It Sy zfs_condense_max_obsolete_bytes Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong 825*3ff01b23SMartin MatuskaOnly attempt to condense indirect vdev mappings if the on-disk size 826*3ff01b23SMartin Matuskaof the obsolete space map object is greater than this number of bytes 827*3ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable . 828*3ff01b23SMartin Matuska. 829*3ff01b23SMartin Matuska.It Sy zfs_condense_min_mapping_bytes Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq ulong 830*3ff01b23SMartin MatuskaMinimum size vdev mapping to attempt to condense 831*3ff01b23SMartin Matuska.Pq see Sy zfs_condense_indirect_vdevs_enable . 832*3ff01b23SMartin Matuska. 833*3ff01b23SMartin Matuska.It Sy zfs_dbgmsg_enable Ns = Ns Sy 1 Ns | Ns 0 Pq int 834*3ff01b23SMartin MatuskaInternally ZFS keeps a small log to facilitate debugging. 835*3ff01b23SMartin MatuskaThe log is enabled by default, and can be disabled by unsetting this option. 836*3ff01b23SMartin MatuskaThe contents of the log can be accessed by reading 837*3ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs/dbgmsg . 838*3ff01b23SMartin MatuskaWriting 839*3ff01b23SMartin Matuska.Sy 0 840*3ff01b23SMartin Matuskato the file clears the log. 841*3ff01b23SMartin Matuska.Pp 842*3ff01b23SMartin MatuskaThis setting does not influence debug prints due to 843*3ff01b23SMartin Matuska.Sy zfs_flags . 844*3ff01b23SMartin Matuska. 845*3ff01b23SMartin Matuska.It Sy zfs_dbgmsg_maxsize Ns = Ns Sy 4194304 Ns B Po 4MB Pc Pq int 846*3ff01b23SMartin MatuskaMaximum size of the internal ZFS debug log. 847*3ff01b23SMartin Matuska. 848*3ff01b23SMartin Matuska.It Sy zfs_dbuf_state_index Ns = Ns Sy 0 Pq int 849*3ff01b23SMartin MatuskaHistorically used for controlling what reporting was available under 850*3ff01b23SMartin Matuska.Pa /proc/spl/kstat/zfs . 851*3ff01b23SMartin MatuskaNo effect. 852*3ff01b23SMartin Matuska. 
853*3ff01b23SMartin Matuska.It Sy zfs_deadman_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int 854*3ff01b23SMartin MatuskaWhen a pool sync operation takes longer than 855*3ff01b23SMartin Matuska.Sy zfs_deadman_synctime_ms , 856*3ff01b23SMartin Matuskaor when an individual I/O operation takes longer than 857*3ff01b23SMartin Matuska.Sy zfs_deadman_ziotime_ms , 858*3ff01b23SMartin Matuskathen the operation is considered to be "hung". 859*3ff01b23SMartin MatuskaIf 860*3ff01b23SMartin Matuska.Sy zfs_deadman_enabled 861*3ff01b23SMartin Matuskais set, then the deadman behavior is invoked as described by 862*3ff01b23SMartin Matuska.Sy zfs_deadman_failmode . 863*3ff01b23SMartin MatuskaBy default, the deadman is enabled and set to 864*3ff01b23SMartin Matuska.Sy wait 865*3ff01b23SMartin Matuskawhich results in "hung" I/Os only being logged. 866*3ff01b23SMartin MatuskaThe deadman is automatically disabled when a pool gets suspended. 867*3ff01b23SMartin Matuska. 868*3ff01b23SMartin Matuska.It Sy zfs_deadman_failmode Ns = Ns Sy wait Pq charp 869*3ff01b23SMartin MatuskaControls the failure behavior when the deadman detects a "hung" I/O operation. 870*3ff01b23SMartin MatuskaValid values are: 871*3ff01b23SMartin Matuska.Bl -tag -compact -offset 4n -width "continue" 872*3ff01b23SMartin Matuska.It Sy wait 873*3ff01b23SMartin MatuskaWait for a "hung" operation to complete. 874*3ff01b23SMartin MatuskaFor each "hung" operation a "deadman" event will be posted 875*3ff01b23SMartin Matuskadescribing that operation. 876*3ff01b23SMartin Matuska.It Sy continue 877*3ff01b23SMartin MatuskaAttempt to recover from a "hung" operation by re-dispatching it 878*3ff01b23SMartin Matuskato the I/O pipeline if possible. 879*3ff01b23SMartin Matuska.It Sy panic 880*3ff01b23SMartin MatuskaPanic the system. 881*3ff01b23SMartin MatuskaThis can be used to facilitate automatic fail-over 882*3ff01b23SMartin Matuskato a properly configured fail-over partner. 883*3ff01b23SMartin Matuska.El 884*3ff01b23SMartin Matuska. 
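On Linux, these deadman tunables are exposed under /sys/module/zfs/parameters. A minimal, illustrative sketch of switching the failure mode at runtime follows (requires root and a loaded zfs module); it is not a recommendation for any particular mode.

```shell
# Illustrative only: select the deadman failure behavior on a live system.
echo continue > /sys/module/zfs/parameters/zfs_deadman_failmode

# Confirm the active mode.
cat /sys/module/zfs/parameters/zfs_deadman_failmode
```

The same mechanism applies to the other writable parameters in this page; persistent settings belong in module options (e.g. modprobe configuration) rather than ad-hoc writes.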
885*3ff01b23SMartin Matuska.It Sy zfs_deadman_checktime_ms Ns = Ns Sy 60000 Ns ms Po 1min Pc Pq int 886*3ff01b23SMartin MatuskaCheck time in milliseconds. 887*3ff01b23SMartin MatuskaThis defines the frequency at which we check for hung I/O requests 888*3ff01b23SMartin Matuskaand potentially invoke the 889*3ff01b23SMartin Matuska.Sy zfs_deadman_failmode 890*3ff01b23SMartin Matuskabehavior. 891*3ff01b23SMartin Matuska. 892*3ff01b23SMartin Matuska.It Sy zfs_deadman_synctime_ms Ns = Ns Sy 600000 Ns ms Po 10min Pc Pq ulong 893*3ff01b23SMartin MatuskaInterval in milliseconds after which the deadman is triggered and also 894*3ff01b23SMartin Matuskathe interval after which a pool sync operation is considered to be "hung". 895*3ff01b23SMartin MatuskaOnce this limit is exceeded the deadman will be invoked every 896*3ff01b23SMartin Matuska.Sy zfs_deadman_checktime_ms 897*3ff01b23SMartin Matuskamilliseconds until the pool sync completes. 898*3ff01b23SMartin Matuska. 899*3ff01b23SMartin Matuska.It Sy zfs_deadman_ziotime_ms Ns = Ns Sy 300000 Ns ms Po 5min Pc Pq ulong 900*3ff01b23SMartin MatuskaInterval in milliseconds after which the deadman is triggered and an 901*3ff01b23SMartin Matuskaindividual I/O operation is considered to be "hung". 902*3ff01b23SMartin MatuskaAs long as the operation remains "hung", 903*3ff01b23SMartin Matuskathe deadman will be invoked every 904*3ff01b23SMartin Matuska.Sy zfs_deadman_checktime_ms 905*3ff01b23SMartin Matuskamilliseconds until the operation completes. 906*3ff01b23SMartin Matuska. 907*3ff01b23SMartin Matuska.It Sy zfs_dedup_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int 908*3ff01b23SMartin MatuskaEnable prefetching dedup-ed blocks which are going to be freed. 909*3ff01b23SMartin Matuska. 
910*3ff01b23SMartin Matuska.It Sy zfs_delay_min_dirty_percent Ns = Ns Sy 60 Ns % Pq int 911*3ff01b23SMartin MatuskaStart to delay each transaction once there is this amount of dirty data, 912*3ff01b23SMartin Matuskaexpressed as a percentage of 913*3ff01b23SMartin Matuska.Sy zfs_dirty_data_max . 914*3ff01b23SMartin MatuskaThis value should be at least 915*3ff01b23SMartin Matuska.Sy zfs_vdev_async_write_active_max_dirty_percent . 916*3ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY . 917*3ff01b23SMartin Matuska. 918*3ff01b23SMartin Matuska.It Sy zfs_delay_scale Ns = Ns Sy 500000 Pq int 919*3ff01b23SMartin MatuskaThis controls how quickly the transaction delay approaches infinity. 920*3ff01b23SMartin MatuskaLarger values cause longer delays for a given amount of dirty data. 921*3ff01b23SMartin Matuska.Pp 922*3ff01b23SMartin MatuskaFor the smoothest delay, this value should be about 1 billion divided 923*3ff01b23SMartin Matuskaby the maximum number of operations per second. 924*3ff01b23SMartin MatuskaThis will smoothly handle between ten times and a tenth of this number. 925*3ff01b23SMartin Matuska.No See Sx ZFS TRANSACTION DELAY . 926*3ff01b23SMartin Matuska.Pp 927*3ff01b23SMartin Matuska.Sy zfs_delay_scale * zfs_dirty_data_max Em must be smaller than Sy 2^64 . 928*3ff01b23SMartin Matuska. 929*3ff01b23SMartin Matuska.It Sy zfs_disable_ivset_guid_check Ns = Ns Sy 0 Ns | Ns 1 Pq int 930*3ff01b23SMartin MatuskaDisables requirement for IVset GUIDs to be present and match when doing a raw 931*3ff01b23SMartin Matuskareceive of encrypted datasets. 932*3ff01b23SMartin MatuskaIntended for users whose pools were created with 933*3ff01b23SMartin MatuskaOpenZFS pre-release versions and now have compatibility issues. 934*3ff01b23SMartin Matuska. 
935*3ff01b23SMartin Matuska.It Sy zfs_key_max_salt_uses Ns = Ns Sy 400000000 Po 4*10^8 Pc Pq ulong 936*3ff01b23SMartin MatuskaMaximum number of uses of a single salt value before generating a new one for 937*3ff01b23SMartin Matuskaencrypted datasets. 938*3ff01b23SMartin MatuskaThe default value is also the maximum. 939*3ff01b23SMartin Matuska. 940*3ff01b23SMartin Matuska.It Sy zfs_object_mutex_size Ns = Ns Sy 64 Pq uint 941*3ff01b23SMartin MatuskaSize of the znode hashtable used for holds. 942*3ff01b23SMartin Matuska.Pp 943*3ff01b23SMartin MatuskaDue to the need to hold locks on objects that may not exist yet, kernel mutexes 944*3ff01b23SMartin Matuskaare not created per-object and instead a hashtable is used where collisions 945*3ff01b23SMartin Matuskawill result in objects waiting when there is not actually contention on the 946*3ff01b23SMartin Matuskasame object. 947*3ff01b23SMartin Matuska. 948*3ff01b23SMartin Matuska.It Sy zfs_slow_io_events_per_second Ns = Ns Sy 20 Ns /s Pq int 949*3ff01b23SMartin MatuskaRate limit delay and deadman zevents (which report slow I/Os) to this many per 950*3ff01b23SMartin Matuskasecond. 951*3ff01b23SMartin Matuska. 952*3ff01b23SMartin Matuska.It Sy zfs_unflushed_max_mem_amt Ns = Ns Sy 1073741824 Ns B Po 1GB Pc Pq ulong 953*3ff01b23SMartin MatuskaUpper-bound limit for unflushed metadata changes to be held by the 954*3ff01b23SMartin Matuskalog spacemap in memory, in bytes. 955*3ff01b23SMartin Matuska. 956*3ff01b23SMartin Matuska.It Sy zfs_unflushed_max_mem_ppm Ns = Ns Sy 1000 Ns ppm Po 0.1% Pc Pq ulong 957*3ff01b23SMartin MatuskaPart of overall system memory that ZFS allows to be used 958*3ff01b23SMartin Matuskafor unflushed metadata changes by the log spacemap, in millionths. 959*3ff01b23SMartin Matuska. 960*3ff01b23SMartin Matuska.It Sy zfs_unflushed_log_block_max Ns = Ns Sy 262144 Po 256k Pc Pq ulong 961*3ff01b23SMartin MatuskaDescribes the maximum number of log spacemap blocks allowed for each pool. 
962*3ff01b23SMartin MatuskaThe default value means that the space in all the log spacemaps 963*3ff01b23SMartin Matuskacan add up to no more than 964*3ff01b23SMartin Matuska.Sy 262144 965*3ff01b23SMartin Matuskablocks (which means 966*3ff01b23SMartin Matuska.Em 32GB 967*3ff01b23SMartin Matuskaof logical space before compression and ditto blocks, 968*3ff01b23SMartin Matuskaassuming that blocksize is 969*3ff01b23SMartin Matuska.Em 128kB ) . 970*3ff01b23SMartin Matuska.Pp 971*3ff01b23SMartin MatuskaThis tunable is important because it involves a trade-off between import 972*3ff01b23SMartin Matuskatime after an unclean export and the frequency of flushing metaslabs. 973*3ff01b23SMartin MatuskaThe higher this number is, the more log blocks we allow when the pool is 974*3ff01b23SMartin Matuskaactive, which means that we flush metaslabs less often and thus decrease 975*3ff01b23SMartin Matuskathe number of I/Os for spacemap updates per TXG. 976*3ff01b23SMartin MatuskaAt the same time though, that means that in the event of an unclean export, 977*3ff01b23SMartin Matuskathere will be more log spacemap blocks for us to read, inducing overhead 978*3ff01b23SMartin Matuskain the import time of the pool. 979*3ff01b23SMartin MatuskaThe lower the number, the more often we flush, destroying log 980*3ff01b23SMartin Matuskablocks quicker as they become obsolete faster, which leaves fewer blocks 981*3ff01b23SMartin Matuskato be read during import time after a crash. 982*3ff01b23SMartin Matuska.Pp 983*3ff01b23SMartin MatuskaEach log spacemap block existing during pool import leads to approximately 984*3ff01b23SMartin Matuskaone extra logical I/O issued. 985*3ff01b23SMartin MatuskaThis is the reason why this tunable is exposed in terms of blocks rather 986*3ff01b23SMartin Matuskathan space used. 987*3ff01b23SMartin Matuska.
.It Sy zfs_unflushed_log_block_min Ns = Ns Sy 1000 Pq ulong
If the number of metaslabs is small and our incoming rate is high,
we could get into a situation in which we are flushing all our metaslabs every TXG.
Thus we always allow at least this many log blocks.
.
.It Sy zfs_unflushed_log_block_pct Ns = Ns Sy 400 Ns % Pq ulong
Tunable used to determine the number of blocks that can be used for
the spacemap log, expressed as a percentage of the total number of
metaslabs in the pool.
.
.It Sy zfs_unlink_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq uint
When enabled, files will not be asynchronously removed from the list of pending
unlinks and the space they consume will be leaked.
Once this option has been disabled and the dataset is remounted,
the pending unlinks will be processed and the freed space returned to the pool.
This option is used by the test suite.
.
.It Sy zfs_delete_blocks Ns = Ns Sy 20480 Pq ulong
This is used to define a large file for the purposes of deletion.
Files containing more than
.Sy zfs_delete_blocks
will be deleted asynchronously, while smaller files are deleted synchronously.
Decreasing this value will reduce the time spent in an
.Xr unlink 2
system call, at the expense of a longer delay before the freed space is available.
.
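Because
.Sy zfs_delete_blocks
counts blocks rather than bytes, the byte size at which a file crosses
into asynchronous deletion depends on the dataset's recordsize.
A rough sketch, assuming a hypothetical 128kB recordsize:

```shell
# Approximate byte threshold above which unlinks are handled
# asynchronously, for a hypothetical 128 kB recordsize.
blocks=20480                   # default zfs_delete_blocks
recordsize=$((128 * 1024))     # assumed recordsize in bytes
bytes=$((blocks * recordsize))
echo $((bytes / 1024 / 1024))  # threshold in MiB
```

Under that assumption the threshold works out to 2560 MiB (2.5 GiB);
smaller recordsizes lower it proportionally.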
.It Sy zfs_dirty_data_max Ns = Pq int
Determines the dirty space limit in bytes.
Once this limit is exceeded, new writes are halted until space frees up.
This parameter takes precedence over
.Sy zfs_dirty_data_max_percent .
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Defaults to
.Sy physical_ram/10 ,
capped at
.Sy zfs_dirty_data_max_max .
.
.It Sy zfs_dirty_data_max_max Ns = Pq int
Maximum allowable value of
.Sy zfs_dirty_data_max ,
expressed in bytes.
This limit is only enforced at module load time, and will be ignored if
.Sy zfs_dirty_data_max
is later changed.
This parameter takes precedence over
.Sy zfs_dirty_data_max_max_percent .
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Defaults to
.Sy physical_ram/4 .
.
.It Sy zfs_dirty_data_max_max_percent Ns = Ns Sy 25 Ns % Pq int
Maximum allowable value of
.Sy zfs_dirty_data_max ,
expressed as a percentage of physical RAM.
This limit is only enforced at module load time, and will be ignored if
.Sy zfs_dirty_data_max
is later changed.
The parameter
.Sy zfs_dirty_data_max_max
takes precedence over this one.
.No See Sx ZFS TRANSACTION DELAY .
.
.It Sy zfs_dirty_data_max_percent Ns = Ns Sy 10 Ns % Pq int
Determines the dirty space limit, expressed as a percentage of all memory.
Once this limit is exceeded, new writes are halted until space frees up.
The parameter
.Sy zfs_dirty_data_max
takes precedence over this one.
.No See Sx ZFS TRANSACTION DELAY .
.Pp
Subject to
.Sy zfs_dirty_data_max_max .
.
.It Sy zfs_dirty_data_sync_percent Ns = Ns Sy 20 Ns % Pq int
Start syncing out a transaction group if there's at least this much dirty data
.Pq as a percentage of Sy zfs_dirty_data_max .
This should be less than
.Sy zfs_vdev_async_write_active_min_dirty_percent .
.
.It Sy zfs_fallocate_reserve_percent Ns = Ns Sy 110 Ns % Pq uint
Since ZFS is a copy-on-write filesystem with snapshots, blocks cannot be
preallocated for a file in order to guarantee that later writes will not
run out of space.
Instead,
.Xr fallocate 2
space preallocation only checks that sufficient space is currently available
in the pool or the user's project quota allocation,
and then creates a sparse file of the requested size.
The requested space is multiplied by
.Sy zfs_fallocate_reserve_percent
to allow additional space for indirect blocks and other internal metadata.
Setting this to
.Sy 0
disables support for
.Xr fallocate 2
and causes it to return
.Sy EOPNOTSUPP .
.
.It Sy zfs_fletcher_4_impl Ns = Ns Sy fastest Pq string
Select a fletcher 4 implementation.
.Pp
Supported selectors are:
.Sy fastest , scalar , sse2 , ssse3 , avx2 , avx512f , avx512bw ,
.No and Sy aarch64_neon .
All except
.Sy fastest No and Sy scalar
require instruction set extensions to be available,
and will only appear if ZFS detects that they are present at runtime.
If multiple implementations of fletcher 4 are available, the
.Sy fastest
will be chosen using a micro benchmark.
Selecting
.Sy scalar
results in the original CPU-based calculation being used.
Selecting any option other than
.Sy fastest No or Sy scalar
results in vector instructions
from the respective CPU instruction set being used.
.
.It Sy zfs_free_bpobj_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Enable/disable the processing of the free_bpobj object.
.
.It Sy zfs_async_block_max_blocks Ns = Ns Sy ULONG_MAX Po unlimited Pc Pq ulong
Maximum number of blocks freed in a single TXG.
.
.It Sy zfs_max_async_dedup_frees Ns = Ns Sy 100000 Po 10^5 Pc Pq ulong
Maximum number of dedup blocks freed in a single TXG.
.
.It Sy zfs_override_estimate_recordsize Ns = Ns Sy 0 Pq ulong
If nonzero, override record size calculation for
.Nm zfs Cm send
estimates.
.
.It Sy zfs_vdev_async_read_max_active Ns = Ns Sy 3 Pq int
Maximum asynchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_read_min_active Ns = Ns Sy 1 Pq int
Minimum asynchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_active_max_dirty_percent Ns = Ns Sy 60 Ns % Pq int
When the pool has more than this much dirty data, use
.Sy zfs_vdev_async_write_max_active
to limit active async writes.
If the dirty data is between the minimum and maximum,
the active I/O limit is linearly interpolated.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_active_min_dirty_percent Ns = Ns Sy 30 Ns % Pq int
When the pool has less than this much dirty data, use
.Sy zfs_vdev_async_write_min_active
to limit active async writes.
If the dirty data is between the minimum and maximum,
the active I/O limit is linearly interpolated.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_max_active Ns = Ns Sy 30 Pq int
Maximum asynchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_async_write_min_active Ns = Ns Sy 2 Pq int
Minimum asynchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.Pp
Lower values are associated with better latency on rotational media but poorer
resilver performance.
The default value of
.Sy 2
was chosen as a compromise.
A value of
.Sy 3
has been shown to improve resilver performance further at a cost of
further increasing latency.
.
.It Sy zfs_vdev_initializing_max_active Ns = Ns Sy 1 Pq int
Maximum initializing I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_initializing_min_active Ns = Ns Sy 1 Pq int
Minimum initializing I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_max_active Ns = Ns Sy 1000 Pq int
The maximum number of I/O operations active to each device.
Ideally, this will be at least the sum of each queue's
.Sy max_active .
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_rebuild_max_active Ns = Ns Sy 3 Pq int
Maximum sequential resilver I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_rebuild_min_active Ns = Ns Sy 1 Pq int
Minimum sequential resilver I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_removal_max_active Ns = Ns Sy 2 Pq int
Maximum removal I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
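The linear interpolation between the async write queue's minimum and
maximum active operations, described under
.Sy zfs_vdev_async_write_active_min_dirty_percent
and
.Sy zfs_vdev_async_write_active_max_dirty_percent
above, can be sketched with the default values; the 45% dirty figure
here is a hypothetical example, and this is only a rough sketch of the
scaling, not the exact kernel code:

```shell
# Interpolate active async writers between min and max, using the
# defaults (min_active=2, max_active=30, 30%..60% dirty) and a
# hypothetical pool that is 45% dirty.
min_active=2; max_active=30
min_dirty=30; max_dirty=60
dirty=45
writers=$((min_active + (max_active - min_active) * (dirty - min_dirty) / (max_dirty - min_dirty)))
echo "$writers"
```

Halfway between the two dirty-data thresholds, the limit lands halfway
between the two queue depths: 16 active writers.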
.It Sy zfs_vdev_removal_min_active Ns = Ns Sy 1 Pq int
Minimum removal I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_scrub_max_active Ns = Ns Sy 2 Pq int
Maximum scrub I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_scrub_min_active Ns = Ns Sy 1 Pq int
Minimum scrub I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_read_max_active Ns = Ns Sy 10 Pq int
Maximum synchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_read_min_active Ns = Ns Sy 10 Pq int
Minimum synchronous read I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_write_max_active Ns = Ns Sy 10 Pq int
Maximum synchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_sync_write_min_active Ns = Ns Sy 10 Pq int
Minimum synchronous write I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_trim_max_active Ns = Ns Sy 2 Pq int
Maximum trim/discard I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_trim_min_active Ns = Ns Sy 1 Pq int
Minimum trim/discard I/O operations active to each device.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_nia_delay Ns = Ns Sy 5 Pq int
For non-interactive I/O (scrub, resilver, removal, initialize and rebuild),
the number of concurrently-active I/O operations is limited to
.Sy zfs_*_min_active ,
unless the vdev is "idle".
When there are no interactive I/O operations active (synchronous or otherwise),
and
.Sy zfs_vdev_nia_delay
operations have completed since the last interactive operation,
then the vdev is considered to be "idle",
and the number of concurrently-active non-interactive operations is increased to
.Sy zfs_*_max_active .
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_nia_credit Ns = Ns Sy 5 Pq int
Some HDDs tend to prioritize sequential I/O so strongly that concurrent
random I/O latency reaches several seconds.
On some HDDs this happens even if sequential I/O operations
are submitted one at a time, and so setting
.Sy zfs_*_max_active Ns = Ns Sy 1
does not help.
To prevent non-interactive I/O, like scrub,
from monopolizing the device, no more than
.Sy zfs_vdev_nia_credit
operations can be sent
while there are outstanding incomplete interactive operations.
This enforced wait ensures the HDD services the interactive I/O
within a reasonable amount of time.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_queue_depth_pct Ns = Ns Sy 1000 Ns % Pq int
Maximum number of queued allocations per top-level vdev expressed as
a percentage of
.Sy zfs_vdev_async_write_max_active ,
which allows the system to detect devices that are more capable
of handling allocations and to allocate more blocks to those devices.
This allows for dynamic allocation distribution when devices are imbalanced,
as fuller devices will tend to be slower than empty devices.
.Pp
Also see
.Sy zio_dva_throttle_enabled .
.
.It Sy zfs_expire_snapshot Ns = Ns Sy 300 Ns s Pq int
Time before expiring
.Pa .zfs/snapshot .
.
.It Sy zfs_admin_snapshot Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow the creation, removal, or renaming of entries in the
.Pa .zfs/snapshot
directory to cause the creation, destruction, or renaming of snapshots.
When enabled, this functionality works both locally and over NFS exports
which have the
.Em no_root_squash
option set.
.
.It Sy zfs_flags Ns = Ns Sy 0 Pq int
Set additional debugging flags.
The following flags may be bitwise-ored together:
.TS
box;
lbz r l l .
	Value	Symbolic Name	Description
_
	1	ZFS_DEBUG_DPRINTF	Enable dprintf entries in the debug log.
*	2	ZFS_DEBUG_DBUF_VERIFY	Enable extra dbuf verifications.
*	4	ZFS_DEBUG_DNODE_VERIFY	Enable extra dnode verifications.
	8	ZFS_DEBUG_SNAPNAMES	Enable snapshot name verification.
	16	ZFS_DEBUG_MODIFY	Check for illegally modified ARC buffers.
	64	ZFS_DEBUG_ZIO_FREE	Enable verification of block frees.
	128	ZFS_DEBUG_HISTOGRAM_VERIFY	Enable extra spacemap histogram verifications.
	256	ZFS_DEBUG_METASLAB_VERIFY	Verify space accounting on disk matches in-memory \fBrange_trees\fP.
	512	ZFS_DEBUG_SET_ERROR	Enable \fBSET_ERROR\fP and dprintf entries in the debug log.
	1024	ZFS_DEBUG_INDIRECT_REMAP	Verify split blocks created by device removal.
	2048	ZFS_DEBUG_TRIM	Verify TRIM ranges are always within the allocatable range tree.
	4096	ZFS_DEBUG_LOG_SPACEMAP	Verify that the log summary is consistent with the spacemap log
			and enable \fBzfs_dbgmsgs\fP for metaslab loading and flushing.
.TE
.Sy \& * No Requires debug build.
.
.It Sy zfs_free_leak_on_eio Ns = Ns Sy 0 Ns | Ns 1 Pq int
If destroy encounters an
.Sy EIO
while reading metadata (e.g. indirect blocks),
space referenced by the missing metadata can not be freed.
Normally this causes the background destroy to become "stalled",
as it is unable to make forward progress.
While in this stalled state, all remaining space to free
from the error-encountering filesystem is "temporarily leaked".
Set this flag to cause it to ignore the
.Sy EIO ,
permanently leak the space from indirect blocks that can not be read,
and continue to free everything else that it can.
.Pp
The default "stalling" behavior is useful if the storage partially
fails (i.e. some but not all I/O operations fail), and then later recovers.
In this case, we will be able to continue pool operations while it is
partially failed, and when it recovers, we can continue to free the
space, with no leaks.
Note, however, that this case is actually fairly rare.
.Pp
Typically pools either
.Bl -enum -compact -offset 4n -width "1."
.It
fail completely (but perhaps temporarily,
e.g.
due to a top-level vdev going offline), or
.It
have localized, permanent errors (e.g. disk returns the wrong data
due to bit flip or firmware bug).
.El
In the former case, this setting does not matter because the
pool will be suspended and the sync thread will not be able to make
forward progress regardless.
In the latter, because the error is permanent, the best we can do
is leak the minimum amount of space,
which is what setting this flag will do.
It is therefore reasonable for this flag to normally be set,
but we chose the more conservative approach of not setting it,
so that there is no possibility of
leaking space in the "partial temporary" failure case.
.
.It Sy zfs_free_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq int
During a
.Nm zfs Cm destroy
operation using the
.Sy async_destroy
feature,
a minimum of this much time will be spent working on freeing blocks per TXG.
.
.It Sy zfs_obsolete_min_time_ms Ns = Ns Sy 500 Ns ms Pq int
Similar to
.Sy zfs_free_min_time_ms ,
but for cleanup of old indirection records for removed vdevs.
.
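The debugging flags listed in the
.Sy zfs_flags
table above combine by bitwise OR, as the text states.
For example, a hypothetical combination of dprintf logging, ARC
modification checks, and SET_ERROR logging:

```shell
# Bitwise-OR of zfs_flags values from the table:
# ZFS_DEBUG_DPRINTF (1) | ZFS_DEBUG_MODIFY (16) | ZFS_DEBUG_SET_ERROR (512)
echo $((1 | 16 | 512))
```

The resulting value, 529, is what would be written to the tunable to
enable all three checks at once.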
.It Sy zfs_immediate_write_sz Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq long
Largest data block to write to the ZIL.
Larger blocks will be treated as if the dataset being written to had the
.Sy logbias Ns = Ns Sy throughput
property set.
.
.It Sy zfs_initialize_value Ns = Ns Sy 16045690984833335022 Po 0xDEADBEEFDEADBEEE Pc Pq ulong
Pattern written to vdev free space by
.Xr zpool-initialize 8 .
.
.It Sy zfs_initialize_chunk_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
Size of writes used by
.Xr zpool-initialize 8 .
This option is used by the test suite.
.
.It Sy zfs_livelist_max_entries Ns = Ns Sy 500000 Po 5*10^5 Pc Pq ulong
The threshold size (in block pointers) at which we create a new sub-livelist.
Larger sublists are more costly from a memory perspective but the fewer
sublists there are, the lower the cost of insertion.
.
.It Sy zfs_livelist_min_percent_shared Ns = Ns Sy 75 Ns % Pq int
If the amount of shared space between a snapshot and its clone drops below
this threshold, the clone turns off the livelist and reverts to the old
deletion method.
This is in place because livelists no longer give us a benefit
once a clone has been overwritten enough.
.
.It Sy zfs_livelist_condense_new_alloc Ns = Ns Sy 0 Pq int
Incremented each time an extra ALLOC blkptr is added to a livelist entry while
it is being condensed.
This option is used by the test suite to track race conditions.
.
.It Sy zfs_livelist_condense_sync_cancel Ns = Ns Sy 0 Pq int
Incremented each time livelist condensing is canceled while in
.Fn spa_livelist_condense_sync .
This option is used by the test suite to track race conditions.
.
.It Sy zfs_livelist_condense_sync_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set, the livelist condense process pauses indefinitely before
executing the synctask -
.Fn spa_livelist_condense_sync .
This option is used by the test suite to trigger race conditions.
.
.It Sy zfs_livelist_condense_zthr_cancel Ns = Ns Sy 0 Pq int
Incremented each time livelist condensing is canceled while in
.Fn spa_livelist_condense_cb .
This option is used by the test suite to track race conditions.
.
.It Sy zfs_livelist_condense_zthr_pause Ns = Ns Sy 0 Ns | Ns 1 Pq int
When set, the livelist condense process pauses indefinitely before
executing the open context condensing work in
.Fn spa_livelist_condense_cb .
This option is used by the test suite to trigger race conditions.
.
.It Sy zfs_lua_max_instrlimit Ns = Ns Sy 100000000 Po 10^8 Pc Pq ulong
The maximum execution time limit that can be set for a ZFS channel program,
specified as a number of Lua instructions.
.
.It Sy zfs_lua_max_memlimit Ns = Ns Sy 104857600 Po 100MB Pc Pq ulong
The maximum memory limit that can be set for a ZFS channel program, specified
in bytes.
.
.It Sy zfs_max_dataset_nesting Ns = Ns Sy 50 Pq int
The maximum depth of nested datasets.
This value can be tuned temporarily to
fix existing datasets that exceed the predefined limit.
.
.It Sy zfs_max_log_walking Ns = Ns Sy 5 Pq ulong
The number of past TXGs that the flushing algorithm of the log spacemap
feature uses to estimate incoming log blocks.
.
.It Sy zfs_max_logsm_summary_length Ns = Ns Sy 10 Pq ulong
Maximum number of rows allowed in the summary of the spacemap log.
.
.It Sy zfs_max_recordsize Ns = Ns Sy 1048576 Po 1MB Pc Pq int
We currently support block sizes from
.Em 512B No to Em 16MB .
The benefits of larger blocks, and thus larger I/O,
need to be weighed against the cost of COWing a giant block to modify one byte.
Additionally, very large blocks can have an impact on I/O latency,
and also potentially on the memory allocator.
Therefore, we do not allow the recordsize to be set larger than this tunable.
Larger blocks can be created by changing it,
and pools with larger blocks can always be imported and used,
regardless of this setting.
.
.It Sy zfs_allow_redacted_dataset_mount Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow datasets received with redacted send/receive to be mounted.
Normally disabled because these datasets may be missing key data.
.
.It Sy zfs_min_metaslabs_to_flush Ns = Ns Sy 1 Pq ulong
Minimum number of metaslabs to flush per dirty TXG.
.
.It Sy zfs_metaslab_fragmentation_threshold Ns = Ns Sy 70 Ns % Pq int
Allow metaslabs to keep their active state as long as their fragmentation
percentage is no more than this value.
An active metaslab that exceeds this threshold
will no longer keep its active status, allowing better metaslabs to be selected.
.
.It Sy zfs_mg_fragmentation_threshold Ns = Ns Sy 95 Ns % Pq int
Metaslab groups are considered eligible for allocations if their
fragmentation metric (measured as a percentage) is less than or equal to
this value.
If a metaslab group exceeds this threshold then it will be
skipped unless all metaslab groups within the metaslab class have also
crossed this threshold.
.
.It Sy zfs_mg_noalloc_threshold Ns = Ns Sy 0 Ns % Pq int
Defines a threshold at which metaslab groups should be eligible for allocations.
The value is expressed as a percentage of free space
beyond which a metaslab group is always eligible for allocations.
If a metaslab group's free space is less than or equal to the
threshold, the allocator will avoid allocating to that group
unless all groups in the pool have reached the threshold.
Once all groups have reached the threshold, all groups are allowed to accept
allocations.
The default value of
.Sy 0
disables the feature and causes all metaslab groups to be eligible for allocations.
.Pp
This parameter allows one to deal with pools having heavily imbalanced
vdevs such as would be the case when a new vdev has been added.
Setting the threshold to a non-zero percentage will stop allocations
from being made to vdevs that aren't filled to the specified percentage
and allow lesser filled vdevs to acquire more allocations than they
otherwise would under the old
.Sy zfs_mg_alloc_failures
facility.
.
.It Sy zfs_ddt_data_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
If enabled, ZFS will place DDT data into the special allocation class.
.
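The interaction of the
.Sy zfs_mg_noalloc_threshold
and
.Sy zfs_mg_fragmentation_threshold
rules above can be sketched as follows.
This is a hypothetical simplification for illustration only; the in-kernel
allocator also weighs class-wide state and per-metaslab allocation weights.

```python
# Sketch of metaslab-group eligibility under zfs_mg_noalloc_threshold
# (free-space floor) and zfs_mg_fragmentation_threshold (fragmentation
# ceiling).  Simplified illustration, not the kernel implementation.

def mg_eligible(free_pct, frag_pct,
                noalloc_threshold=0, frag_threshold=95,
                all_groups_crossed=False):
    """Return True if a metaslab group may accept allocations."""
    # A group at or below the free-space threshold is skipped unless
    # every group in the pool has also fallen below it.
    if noalloc_threshold and free_pct <= noalloc_threshold \
            and not all_groups_crossed:
        return False
    # A group whose fragmentation exceeds the threshold is skipped
    # unless all groups in the class have also crossed it.
    if frag_pct > frag_threshold and not all_groups_crossed:
        return False
    return True

# With the defaults, only fragmentation above 95% disqualifies a group.
print(mg_eligible(free_pct=40, frag_pct=50))   # True: eligible
print(mg_eligible(free_pct=40, frag_pct=97))   # False: too fragmented
```

Setting a non-zero free-space threshold steers allocations toward emptier
vdevs, which is why the text above recommends it for pools with a freshly
added vdev.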
.It Sy zfs_user_indirect_is_special Ns = Ns Sy 1 Ns | Ns 0 Pq int
If enabled, ZFS will place user data indirect blocks
into the special allocation class.
.
.It Sy zfs_multihost_history Ns = Ns Sy 0 Pq int
Historical statistics for this many latest multihost updates will be available in
.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /multihost .
.
.It Sy zfs_multihost_interval Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq ulong
Used to control the frequency of multihost writes which are performed when the
.Sy multihost
pool property is on.
This is one of the factors used to determine the
length of the activity check during import.
.Pp
The multihost write period is
.Sy zfs_multihost_interval / leaf-vdevs .
On average a multihost write will be issued for each leaf vdev
every
.Sy zfs_multihost_interval
milliseconds.
In practice, the observed period can vary with the I/O load
and this observed value is the delay which is stored in the uberblock.
.
.It Sy zfs_multihost_import_intervals Ns = Ns Sy 20 Pq uint
Used to control the duration of the activity test on import.
Smaller values of
.Sy zfs_multihost_import_intervals
will reduce the import time but increase
the risk of failing to detect an active pool.
The total activity check time is never allowed to drop below one second.
.Pp
On import the activity check waits a minimum amount of time determined by
.Sy zfs_multihost_interval * zfs_multihost_import_intervals ,
or the same product computed on the host which last had the pool imported,
whichever is greater.
The activity check time may be further extended if the value of MMP
delay found in the best uberblock indicates actual multihost updates happened
at longer intervals than
.Sy zfs_multihost_interval .
A minimum of
.Em 100ms
is enforced.
.Pp
.Sy 0 No is equivalent to Sy 1 .
.
.It Sy zfs_multihost_fail_intervals Ns = Ns Sy 10 Pq uint
Controls the behavior of the pool when multihost write failures or delays are
detected.
.Pp
When
.Sy 0 ,
multihost write failures or delays are ignored.
The failures will still be reported to the ZED which, depending on
its configuration, may take action such as suspending the pool or offlining a
device.
.Pp
Otherwise, the pool will be suspended if
.Sy zfs_multihost_fail_intervals * zfs_multihost_interval
milliseconds pass without a successful MMP write.
This guarantees the activity test will see MMP writes if the pool is imported.
.Sy 1 No is equivalent to Sy 2 ;
this is necessary to prevent the pool from being suspended
due to normal, small I/O latency variations.
.
.It Sy zfs_no_scrub_io Ns = Ns Sy 0 Ns | Ns 1 Pq int
Set to disable scrub I/O.
This results in scrubs not actually scrubbing data and
simply doing a metadata crawl of the pool instead.
.
.It Sy zfs_no_scrub_prefetch Ns = Ns Sy 0 Ns | Ns 1 Pq int
Set to disable block prefetching for scrubs.
.
.It Sy zfs_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable cache flush operations on disks when writing.
Setting this will cause pool corruption on power loss
if a volatile out-of-order write cache is enabled.
.
.It Sy zfs_nopwrite_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Allow no-operation writes.
The occurrence of nopwrites will further depend on other pool properties
.Pq i.a. the checksumming and compression algorithms .
.
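The multihost timing rules described above for
.Sy zfs_multihost_import_intervals
and
.Sy zfs_multihost_fail_intervals
can be sketched as follows.
This assumes only the simplified formulas given in this page; the kernel
additionally factors in the observed MMP delay recorded in the uberblock.

```python
# Sketch of the multihost timing formulas documented above.
# Simplified illustration; not the in-kernel implementation.

def activity_check_ms(interval_ms=1000, import_intervals=20,
                      remote_product_ms=0):
    """Minimum activity-check wait on import, in milliseconds."""
    # 0 import intervals is equivalent to 1; take the larger of the
    # local product and the product from the last importing host,
    # and never drop below one second.
    wait = max(interval_ms * max(import_intervals, 1), remote_product_ms)
    return max(wait, 1000)

def suspension_window_ms(fail_intervals=10, interval_ms=1000):
    """Milliseconds without a successful MMP write before suspension."""
    if fail_intervals == 0:
        return None  # failures ignored (still reported to the ZED)
    return max(fail_intervals, 2) * interval_ms  # 1 behaves like 2

print(activity_check_ms())        # 20000: 1000 ms * 20 intervals
print(suspension_window_ms())     # 10000: 10 * 1000 ms
print(suspension_window_ms(1))    # 2000: 1 is equivalent to 2
```

With the defaults, an importing host therefore waits at least 20 seconds,
and an imported pool suspends after 10 seconds without a successful write.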
.It Sy zfs_dmu_offset_next_sync Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enable forcing TXG sync to find holes.
When enabled, it forces ZFS to act like prior versions when
.Sy SEEK_HOLE No or Sy SEEK_DATA
flags are used, which, when a dnode is dirty,
causes TXGs to be synced so that this data can be found.
.
.It Sy zfs_pd_bytes_max Ns = Ns Sy 52428800 Ns B Po 50MB Pc Pq int
The number of bytes which should be prefetched during a pool traversal, like
.Nm zfs Cm send
or other data crawling operations.
.
.It Sy zfs_traverse_indirect_prefetch_limit Ns = Ns Sy 32 Pq int
The number of blocks pointed to by an indirect (non-L0) block which should be
prefetched during a pool traversal, like
.Nm zfs Cm send
or other data crawling operations.
.
.It Sy zfs_per_txg_dirty_frees_percent Ns = Ns Sy 5 Ns % Pq ulong
Control percentage of dirtied indirect blocks from frees allowed into one TXG.
After this threshold is crossed, additional frees will wait until the next TXG.
.Sy 0 No disables this throttle.
.
.It Sy zfs_prefetch_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable predictive prefetch.
Note that it leaves "prescient" prefetch (e.g.\&
.Nm zfs Cm send )
intact.
Unlike predictive prefetch, prescient prefetch never issues I/O
that ends up not being needed, so it can't hurt performance.
.
.It Sy zfs_qat_checksum_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable QAT hardware acceleration for SHA256 checksums.
May be unset after the ZFS modules have been loaded to initialize the QAT
hardware as long as support is compiled in and the QAT driver is present.
.
.It Sy zfs_qat_compress_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable QAT hardware acceleration for gzip compression.
May be unset after the ZFS modules have been loaded to initialize the QAT
hardware as long as support is compiled in and the QAT driver is present.
.
.It Sy zfs_qat_encrypt_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable QAT hardware acceleration for AES-GCM encryption.
May be unset after the ZFS modules have been loaded to initialize the QAT
hardware as long as support is compiled in and the QAT driver is present.
.
.It Sy zfs_vnops_read_chunk_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq long
Bytes to read per chunk.
.
.It Sy zfs_read_history Ns = Ns Sy 0 Pq int
Historical statistics for this many latest reads will be available in
.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /reads .
.
.It Sy zfs_read_history_hits Ns = Ns Sy 0 Ns | Ns 1 Pq int
Include cache hits in read history.
.
.It Sy zfs_rebuild_max_segment Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq ulong
Maximum read segment size to issue when sequentially resilvering a
top-level vdev.
.
.It Sy zfs_rebuild_scrub_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Automatically start a pool scrub when the last active sequential resilver
completes in order to verify the checksums of all blocks which have been
resilvered.
This is enabled by default and strongly recommended.
.
.It Sy zfs_rebuild_vdev_limit Ns = Ns Sy 33554432 Ns B Po 32MB Pc Pq ulong
Maximum amount of I/O that can be concurrently issued for a sequential
resilver per leaf device, given in bytes.
.
.It Sy zfs_reconstruct_indirect_combinations_max Ns = Ns Sy 4096 Pq int
If an indirect split block contains more than this many possible unique
combinations when being reconstructed, consider it too computationally
expensive to check them all.
Instead, try at most this many randomly selected
combinations each time the block is accessed.
This allows all segment copies to participate fairly
in the reconstruction when all combinations
cannot be checked and prevents repeated use of one bad copy.
.
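The bounded-reconstruction behavior of
.Sy zfs_reconstruct_indirect_combinations_max
described above can be sketched as follows.
This is an illustrative simplification: the names and structure are
hypothetical, and the real code retries on block access rather than
enumerating up front.

```python
# Sketch of bounded split-block reconstruction: enumerate all unique
# segment-copy combinations when feasible, otherwise sample at most
# `limit` random combinations.  Illustrative only.
import itertools
import random

def combinations_to_try(copies_per_segment, limit=4096):
    """copies_per_segment: number of candidate copies for each segment."""
    total = 1
    for n in copies_per_segment:
        total *= n
    if total <= limit:
        # Cheap enough: check every combination exhaustively.
        return list(itertools.product(*[range(n) for n in copies_per_segment]))
    # Too expensive: sample random combinations so all copies
    # participate fairly and one bad copy isn't reused repeatedly.
    return [tuple(random.randrange(n) for n in copies_per_segment)
            for _ in range(limit)]

# 3 segments with 2 copies each: only 8 combinations, all are tried.
print(len(combinations_to_try([2, 2, 2])))   # 8
# 16 segments with 2 copies each: 65536 > 4096, so 4096 are sampled.
print(len(combinations_to_try([2] * 16)))    # 4096
```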
.It Sy zfs_recover Ns = Ns Sy 0 Ns | Ns 1 Pq int
Set to attempt to recover from fatal errors.
This should only be used as a last resort,
as it typically results in leaked space, or worse.
.
.It Sy zfs_removal_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
Ignore hard IO errors during device removal.
When set, if a device encounters a hard IO error during the removal process
the removal will not be cancelled.
This can result in a normally recoverable block becoming permanently damaged
and is hence not recommended.
This should only be used as a last resort when the
pool cannot be returned to a healthy state prior to removing the device.
.
.It Sy zfs_removal_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq int
This is used by the test suite so that it can ensure that certain actions
happen while in the middle of a removal.
.
.It Sy zfs_remove_max_segment Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
The largest contiguous segment that we will attempt to allocate when removing
a device.
If there is a performance problem with attempting to allocate large blocks,
consider decreasing this.
The default value is also the maximum.
.
.It Sy zfs_resilver_disable_defer Ns = Ns Sy 0 Ns | Ns 1 Pq int
Ignore the
.Sy resilver_defer
feature, causing an operation that would start a resilver to
immediately restart the one in progress.
.
.It Sy zfs_resilver_min_time_ms Ns = Ns Sy 3000 Ns ms Po 3s Pc Pq int
Resilvers are processed by the sync thread.
While resilvering, it will spend at least this much time
working on a resilver between TXG flushes.
.
.It Sy zfs_scan_ignore_errors Ns = Ns Sy 0 Ns | Ns 1 Pq int
If set, remove the DTL (dirty time list) upon completion of a pool scan (scrub),
even if there were unrepairable errors.
Intended to be used during pool repair or recovery to
stop resilvering when the pool is next imported.
.
.It Sy zfs_scrub_min_time_ms Ns = Ns Sy 1000 Ns ms Po 1s Pc Pq int
Scrubs are processed by the sync thread.
While scrubbing, it will spend at least this much time
working on a scrub between TXG flushes.
.
.It Sy zfs_scan_checkpoint_intval Ns = Ns Sy 7200 Ns s Po 2h Pc Pq int
To preserve progress across reboots, the sequential scan algorithm periodically
needs to stop metadata scanning and issue all the verification I/O to disk.
The frequency of this flushing is determined by this tunable.
.
.It Sy zfs_scan_fill_weight Ns = Ns Sy 3 Pq int
This tunable affects how scrub and resilver I/O segments are ordered.
A higher number indicates that we care more about how filled in a segment is,
while a lower number indicates we care more about the size of the extent without
considering the gaps within a segment.
This value is only tunable upon module insertion.
Changing the value afterwards will have no effect on scrub or resilver performance.
.
.It Sy zfs_scan_issue_strategy Ns = Ns Sy 0 Pq int
Determines the order that data will be verified while scrubbing or resilvering:
.Bl -tag -compact -offset 4n -width "a"
.It Sy 1
Data will be verified as sequentially as possible, given the
amount of memory reserved for scrubbing
.Pq see Sy zfs_scan_mem_lim_fact .
This may improve scrub performance if the pool's data is very fragmented.
.It Sy 2
The largest mostly-contiguous chunk of found data will be verified first.
By deferring scrubbing of small segments, we may later find adjacent data
to coalesce and increase the segment size.
.It Sy 0
.No Use strategy Sy 1 No during normal verification
.No and strategy Sy 2 No while taking a checkpoint.
.El
.
.It Sy zfs_scan_legacy Ns = Ns Sy 0 Ns | Ns 1 Pq int
If unset, indicates that scrubs and resilvers will gather metadata in
memory before issuing sequential I/O.
Otherwise indicates that the legacy algorithm will be used,
where I/O is initiated as soon as it is discovered.
Unsetting will not affect scrubs or resilvers that are already in progress.
.
.It Sy zfs_scan_max_ext_gap Ns = Ns Sy 2097152 Ns B Po 2MB Pc Pq int
Sets the largest gap in bytes between scrub/resilver I/O operations
that will still be considered sequential for sorting purposes.
Changing this value will not
affect scrubs or resilvers that are already in progress.
.
.It Sy zfs_scan_mem_lim_fact Ns = Ns Sy 20 Ns ^-1 Pq int
Maximum fraction of RAM used for I/O sorting by sequential scan algorithm.
This tunable determines the hard limit for I/O sorting memory usage.
When the hard limit is reached we stop scanning metadata and start issuing
data verification I/O.
This is done until we get below the soft limit.
.
.It Sy zfs_scan_mem_lim_soft_fact Ns = Ns Sy 20 Ns ^-1 Pq int
The fraction of the hard limit used to determine the soft limit for I/O sorting
by the sequential scan algorithm.
When we cross this limit from below no action is taken.
When we cross this limit from above it is because we are issuing verification I/O.
In this case (unless the metadata scan is done) we stop issuing verification I/O
and start scanning metadata again until we get to the hard limit.
.
.It Sy zfs_scan_strict_mem_lim Ns = Ns Sy 0 Ns | Ns 1 Pq int
Enforce tight memory limits on pool scans when a sequential scan is in progress.
When disabled, the memory limit may be exceeded by fast disks.
.
.It Sy zfs_scan_suspend_progress Ns = Ns Sy 0 Ns | Ns 1 Pq int
Freezes a scrub/resilver in progress without actually pausing it.
Intended for testing/debugging.
.
.It Sy zfs_scan_vdev_limit Ns = Ns Sy 4194304 Ns B Po 4MB Pc Pq int
Maximum amount of data that can be concurrently issued at once for scrubs and
resilvers per leaf device, given in bytes.
.
.It Sy zfs_send_corrupt_data Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow sending of corrupt data (ignore read/checksum errors when sending).
.
.It Sy zfs_send_unmodified_spill_blocks Ns = Ns Sy 1 Ns | Ns 0 Pq int
Include unmodified spill blocks in the send stream.
Under certain circumstances, previous versions of ZFS could incorrectly
remove the spill block from an existing object.
Including unmodified copies of the spill blocks creates a backwards-compatible
stream which will recreate a spill block if it was incorrectly removed.
.
.It Sy zfs_send_no_prefetch_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
The fill fraction of the
.Nm zfs Cm send
internal queues.
The fill fraction controls the timing with which internal threads are woken up.
.
.It Sy zfs_send_no_prefetch_queue_length Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
The maximum number of bytes allowed in
.Nm zfs Cm send Ns 's
internal queues.
.
.It Sy zfs_send_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
The fill fraction of the
.Nm zfs Cm send
prefetch queue.
The fill fraction controls the timing with which internal threads are woken up.
.
.It Sy zfs_send_queue_length Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
The maximum number of bytes allowed that will be prefetched by
.Nm zfs Cm send .
This value must be at least twice the maximum block size in use.
.
.It Sy zfs_recv_queue_ff Ns = Ns Sy 20 Ns ^-1 Pq int
The fill fraction of the
.Nm zfs Cm receive
queue.
The fill fraction controls the timing with which internal threads are woken up.
.
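The sizing constraint stated above for
.Sy zfs_send_queue_length
(at least twice the maximum block size in use) can be checked as follows.
This is a hypothetical helper for illustration, not part of any ZFS tooling.

```python
# Sketch of the documented constraint: a send (or receive) queue must
# hold at least two of the largest blocks in use.  Illustrative only.

def queue_length_ok(queue_bytes, max_blocksize_bytes):
    """True if the queue satisfies the twice-the-block-size rule."""
    return queue_bytes >= 2 * max_blocksize_bytes

# Default 16MB queue with the default 1MB recordsize: fine.
print(queue_length_ok(16 << 20, 1 << 20))    # True
# 16MB records would need at least a 32MB queue.
print(queue_length_ok(16 << 20, 16 << 20))   # False
```

The receive queue carries the same requirement, so the same check applies
when raising recordsize on a pool that is a send or receive target.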
.It Sy zfs_recv_queue_length Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
The maximum number of bytes allowed in the
.Nm zfs Cm receive
queue.
This value must be at least twice the maximum block size in use.
.
.It Sy zfs_recv_write_batch_size Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
The maximum amount of data, in bytes, that
.Nm zfs Cm receive
will write in one DMU transaction.
This is the uncompressed size, even when receiving a compressed send stream.
This setting will not reduce the write size below a single block.
Capped at a maximum of
.Sy 32MB .
.
.It Sy zfs_override_estimate_recordsize Ns = Ns Sy 0 Ns | Ns 1 Pq ulong
Setting this variable overrides the default logic for estimating block
sizes when doing a
.Nm zfs Cm send .
The default heuristic is that the average block size
will be the current recordsize.
Override this value if most data in your dataset is not of that size
and you require accurate zfs send size estimates.
.
.It Sy zfs_sync_pass_deferred_free Ns = Ns Sy 2 Pq int
Flushing of data to disk is done in passes.
Defer frees starting in this pass.
.
.It Sy zfs_spa_discard_memory_limit Ns = Ns Sy 16777216 Ns B Po 16MB Pc Pq int
Maximum memory used for prefetching a checkpoint's space map on each
vdev while discarding the checkpoint.
.
.It Sy zfs_special_class_metadata_reserve_pct Ns = Ns Sy 25 Ns % Pq int
Only allow small data blocks to be allocated on the special and dedup vdev
types when the available free space percentage on these vdevs exceeds this value.
This ensures reserved space is available for pool metadata as the
special vdevs approach capacity.
.
.It Sy zfs_sync_pass_dont_compress Ns = Ns Sy 8 Pq int
Starting in this sync pass, disable compression (including of metadata).
With the default setting, in practice, we don't have this many sync passes,
so this has no effect.
.Pp
The original intent was that disabling compression would help the sync passes
to converge.
However, in practice, disabling compression increases
the average number of sync passes, because when we turn compression off,
many blocks' size will change, and thus we have to re-allocate
(not overwrite) them.
It also increases the number of
.Em 128kB
allocations (e.g. for indirect blocks and spacemaps)
because these will not be compressed.
The
.Em 128kB
allocations are especially detrimental to performance
on highly fragmented systems, which may have very few free segments of this size,
and may need to load new metaslabs to satisfy these allocations.
.
.It Sy zfs_sync_pass_rewrite Ns = Ns Sy 2 Pq int
Rewrite new block pointers starting in this pass.
.
.It Sy zfs_sync_taskq_batch_pct Ns = Ns Sy 75 Ns % Pq int
This controls the number of threads used by
.Sy dp_sync_taskq .
The default value of
.Sy 75%
will create a maximum of one thread per CPU.
.
.It Sy zfs_trim_extent_bytes_max Ns = Ns Sy 134217728 Ns B Po 128MB Pc Pq uint
Maximum size of TRIM command.
Larger ranges will be split into chunks no larger than this value before issuing.
.
.It Sy zfs_trim_extent_bytes_min Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq uint
Minimum size of TRIM commands.
TRIM ranges smaller than this will be skipped,
unless they're part of a larger range which was chunked.
This is done because it's common for these small TRIMs
to negatively impact overall performance.
.
.It Sy zfs_trim_metaslab_skip Ns = Ns Sy 0 Ns | Ns 1 Pq uint
Skip uninitialized metaslabs during the TRIM process.
This option is useful for pools constructed from large thinly-provisioned devices
where TRIM operations are slow.
As a pool ages, an increasing fraction of the pool's metaslabs
will be initialized, progressively degrading the usefulness of this option.
This setting is stored when starting a manual TRIM and will
persist for the duration of the requested TRIM.
.
.It Sy zfs_trim_queue_limit Ns = Ns Sy 10 Pq uint
Maximum number of queued TRIMs outstanding per leaf vdev.
The number of concurrent TRIM commands issued to the device is controlled by
.Sy zfs_vdev_trim_min_active No and Sy zfs_vdev_trim_max_active .
.
.It Sy zfs_trim_txg_batch Ns = Ns Sy 32 Pq uint
The number of transaction groups' worth of frees which should be aggregated
before TRIM operations are issued to the device.
This setting represents a trade-off between issuing larger,
more efficient TRIM operations and the delay
before the recently trimmed space is available for use by the device.
.Pp
Increasing this value will allow frees to be aggregated for a longer time.
This will result in larger TRIM operations and potentially increased memory usage.
Decreasing this value will have the opposite effect.
The default of
.Sy 32
was determined to be a reasonable compromise.
.
.It Sy zfs_txg_history Ns = Ns Sy 0 Pq int
Historical statistics for this many latest TXGs will be available in
.Pa /proc/spl/kstat/zfs/ Ns Ao Ar pool Ac Ns Pa /TXGs .
.
.It Sy zfs_txg_timeout Ns = Ns Sy 5 Ns s Pq int
Flush dirty data to disk at least every this many seconds (maximum TXG duration).
.
.It Sy zfs_vdev_aggregate_trim Ns = Ns Sy 0 Ns | Ns 1 Pq int
Allow TRIM I/Os to be aggregated.
This is normally not helpful because the extents to be trimmed
will have already been aggregated by the metaslab.
This option is provided for debugging and performance analysis.
.
.It Sy zfs_vdev_aggregation_limit Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
Max vdev I/O aggregation size.
.
.It Sy zfs_vdev_aggregation_limit_non_rotating Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq int
Max vdev I/O aggregation size for non-rotating media.
.
.It Sy zfs_vdev_cache_bshift Ns = Ns Sy 16 Po 64kB Pc Pq int
Shift size to inflate reads to.
.
.It Sy zfs_vdev_cache_max Ns = Ns Sy 16384 Ns B Po 16kB Pc Pq int
Inflate reads smaller than this value to meet the
.Sy zfs_vdev_cache_bshift
size
.Pq default Sy 64kB .
.
.It Sy zfs_vdev_cache_size Ns = Ns Sy 0 Pq int
Total size of the per-disk cache in bytes.
.Pp
Currently this feature is disabled, as it has been found to not be helpful
for performance and in some cases harmful.
.
.It Sy zfs_vdev_mirror_rotating_inc Ns = Ns Sy 0 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation
immediately follows its predecessor on rotational vdevs.
.
.It Sy zfs_vdev_mirror_rotating_seek_inc Ns = Ns Sy 5 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation
lacks locality as defined by
.Sy zfs_vdev_mirror_rotating_seek_offset .
Operations within this that are not immediately following the previous operation
are incremented by half.
.
.It Sy zfs_vdev_mirror_rotating_seek_offset Ns = Ns Sy 1048576 Ns B Po 1MB Pc Pq int
The maximum distance for the last queued I/O operation in which
the balancing algorithm considers an operation to have locality.
.No See Sx ZFS I/O SCHEDULER .
.
.It Sy zfs_vdev_mirror_non_rotating_inc Ns = Ns Sy 0 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member on non-rotational vdevs
when I/O operations do not immediately follow one another.
.
.It Sy zfs_vdev_mirror_non_rotating_seek_inc Ns = Ns Sy 1 Pq int
A number by which the balancing algorithm increments the load calculation for
the purpose of selecting the least busy mirror member when an I/O operation lacks
locality as defined by
.Sy zfs_vdev_mirror_rotating_seek_offset .
Operations within this that are not immediately following the previous operation
are incremented by half.
.
.It Sy zfs_vdev_read_gap_limit Ns = Ns Sy 32768 Ns B Po 32kB Pc Pq int
Aggregate read I/O operations if the on-disk gap between them is within this
threshold.
.
.It Sy zfs_vdev_write_gap_limit Ns = Ns Sy 4096 Ns B Po 4kB Pc Pq int
Aggregate write I/O operations if the on-disk gap between them is within this
threshold.
.
.It Sy zfs_vdev_raidz_impl Ns = Ns Sy fastest Pq string
Select the raidz parity implementation to use.
.Pp
Variants that don't depend on CPU-specific features
may be selected on module load, as they are supported on all systems.
The remaining options may only be set after the module is loaded,
as they are available only if the implementations are compiled in
and supported on the running system.
.Pp
Once the module is loaded,
.Pa /sys/module/zfs/parameters/zfs_vdev_raidz_impl
will show the available options,
with the currently selected one enclosed in square brackets.
.Pp
.TS
lb l l .
fastest	selected by built-in benchmark
original	original implementation
scalar	scalar implementation
sse2	SSE2 instruction set	64-bit x86
ssse3	SSSE3 instruction set	64-bit x86
avx2	AVX2 instruction set	64-bit x86
avx512f	AVX512F instruction set	64-bit x86
avx512bw	AVX512F & AVX512BW instruction sets	64-bit x86
aarch64_neon	NEON	Aarch64/64-bit ARMv8
aarch64_neonx2	NEON with more unrolling	Aarch64/64-bit ARMv8
powerpc_altivec	Altivec	PowerPC
.TE
.
.It Sy zfs_vdev_scheduler Pq charp
.Sy DEPRECATED .
Prints a warning to the kernel log for compatibility.
.
.It Sy zfs_zevent_len_max Ns = Ns Sy 512 Pq int
Max event queue length.
Events in the queue can be viewed with
.Xr zpool-events 8 .
.
.It Sy zfs_zevent_retain_max Ns = Ns Sy 2000 Pq int
Maximum recent zevent records to retain for duplicate checking.
Setting this to
.Sy 0
disables duplicate detection.
.
.It Sy zfs_zevent_retain_expire_secs Ns = Ns Sy 900 Ns s Po 15min Pc Pq int
Lifespan for a recent ereport that was retained for duplicate checking.
.
.It Sy zfs_zil_clean_taskq_maxalloc Ns = Ns Sy 1048576 Pq int
The maximum number of taskq entries that are allowed to be cached.
When this limit is exceeded, transaction records (itxs)
will be cleaned synchronously.
.
.It Sy zfs_zil_clean_taskq_minalloc Ns = Ns Sy 1024 Pq int
The number of taskq entries that are pre-populated when the taskq is first
created and are immediately available for use.
.
.It Sy zfs_zil_clean_taskq_nthr_pct Ns = Ns Sy 100 Ns % Pq int
This controls the number of threads used by
.Sy dp_zil_clean_taskq .
The default value of
.Sy 100%
will create a maximum of one thread per CPU.
.
.It Sy zil_maxblocksize Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq int
This sets the maximum block size used by the ZIL.
On very fragmented pools, lowering this
.Pq typically to Sy 36kB
can improve performance.
.
.It Sy zil_nocacheflush Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable the cache flush commands that are normally sent to disk by
the ZIL after an LWB write has completed.
Setting this will cause ZIL corruption on power loss
if a volatile out-of-order write cache is enabled.
.
.It Sy zil_replay_disable Ns = Ns Sy 0 Ns | Ns 1 Pq int
Disable intent logging replay.
Can be disabled for recovery from corrupted ZIL.
.
.It Sy zil_slog_bulk Ns = Ns Sy 786432 Ns B Po 768kB Pc Pq ulong
Limit SLOG write size per commit executed with synchronous priority.
Any writes above that will be executed with lower (asynchronous) priority
to limit potential SLOG device abuse by a single active ZIL writer.
.
.It Sy zfs_embedded_slog_min_ms Ns = Ns Sy 64 Pq int
Usually, one metaslab from each normal-class vdev is dedicated for use by
the ZIL to log synchronous writes.
However, if there are fewer than
.Sy zfs_embedded_slog_min_ms
metaslabs in the vdev, this functionality is disabled.
This ensures that we don't set aside an unreasonable amount of space for the ZIL.
.
.It Sy zio_deadman_log_all Ns = Ns Sy 0 Ns | Ns 1 Pq int
If non-zero, the zio deadman will produce debugging messages
.Pq see Sy zfs_dbgmsg_enable
for all zios, rather than only for leaf zios possessing a vdev.
This is meant to be used by developers to gain
diagnostic information for hang conditions which don't involve a mutex
or other locking primitive: typically conditions in which a thread in
the zio pipeline is looping indefinitely.
.
.It Sy zio_slow_io_ms Ns = Ns Sy 30000 Ns ms Po 30s Pc Pq int
When an I/O operation takes more than this much time to complete,
it's marked as slow.
Each slow operation causes a delay zevent.
Slow I/O counters can be seen with
.Nm zpool Cm status Fl s .
.
.It Sy zio_dva_throttle_enabled Ns = Ns Sy 1 Ns | Ns 0 Pq int
Throttle block allocations in the I/O pipeline.
This allows for dynamic allocation distribution when devices are imbalanced.
When enabled, the maximum number of pending allocations per top-level vdev
is limited by
.Sy zfs_vdev_queue_depth_pct .
.
.It Sy zio_requeue_io_start_cut_in_line Ns = Ns Sy 0 Ns | Ns 1 Pq int
Prioritize requeued I/O.
.
.It Sy zio_taskq_batch_pct Ns = Ns Sy 80 Ns % Pq uint
Percentage of online CPUs which will run a worker thread for I/O.
These workers are responsible for I/O work such as compression and
checksum calculations.
Fractional numbers of CPUs will be rounded down.
.Pp
The default value of
.Sy 80%
was chosen to avoid using all CPUs, which can result in
latency issues and inconsistent application performance,
especially when slower compression and/or checksumming is enabled.
.
.It Sy zio_taskq_batch_tpq Ns = Ns Sy 0 Pq uint
Number of worker threads per taskq.
Lower values improve I/O ordering and CPU utilization,
while higher values reduce lock contention.
.Pp
If
.Sy 0 ,
generate a system-dependent value close to 6 threads per taskq.
.
.It Sy zvol_inhibit_dev Ns = Ns Sy 0 Ns | Ns 1 Pq uint
Do not create zvol device nodes.
This may slightly improve startup time on
systems with a very large number of zvols.
.
.It Sy zvol_major Ns = Ns Sy 230 Pq uint
Major number for zvol block devices.
.
.It Sy zvol_max_discard_blocks Ns = Ns Sy 16384 Pq ulong
Discard (TRIM) operations done on zvols will be done in batches of this
many blocks, where block size is determined by the
.Sy volblocksize
property of a zvol.
.
.It Sy zvol_prefetch_bytes Ns = Ns Sy 131072 Ns B Po 128kB Pc Pq uint
When adding a zvol to the system, prefetch this many bytes
from the start and end of the volume.
Prefetching these regions of the volume is desirable,
because they are likely to be accessed immediately by
.Xr blkid 8
or the kernel partitioner.
.
.It Sy zvol_request_sync Ns = Ns Sy 0 Ns | Ns 1 Pq uint
When processing I/O requests for a zvol, submit them synchronously.
This effectively limits the queue depth to
.Em 1
for each I/O submitter.
When unset, requests are handled asynchronously by a thread pool.
The number of requests which can be handled concurrently is controlled by
.Sy zvol_threads .
.
.It Sy zvol_threads Ns = Ns Sy 32 Pq uint
Max number of threads which can handle zvol I/O requests concurrently.
.
.It Sy zvol_volmode Ns = Ns Sy 1 Pq uint
Defines the behaviour of zvol block devices when
.Sy volmode Ns = Ns Sy default :
.Bl -tag -compact -offset 4n -width "a"
.It Sy 1
.No equivalent to Sy full
.It Sy 2
.No equivalent to Sy dev
.It Sy 3
.No equivalent to Sy none
.El
.El
.
.Sh ZFS I/O SCHEDULER
ZFS issues I/O operations to leaf vdevs to satisfy and complete I/O operations.
The scheduler determines when and in what order those operations are issued.
The scheduler divides operations into five I/O classes,
prioritized in the following order: sync read, sync write, async read,
async write, and scrub/resilver.
Each queue defines the minimum and maximum number of concurrent operations
that may be issued to the device.
In addition, the device has an aggregate maximum,
.Sy zfs_vdev_max_active .
Note that the sum of the per-queue minima must not exceed the aggregate maximum.
If the sum of the per-queue maxima exceeds the aggregate maximum,
then the number of active operations may reach
.Sy zfs_vdev_max_active ,
in which case no further operations will be issued,
regardless of whether all per-queue minima have been met.
.Pp
For many physical devices, throughput increases with the number of
concurrent operations, but latency typically suffers.
Furthermore, physical devices typically have a limit
at which more concurrent operations have no
effect on throughput or can actually cause it to decrease.
.Pp
The scheduler selects the next operation to issue by first looking for an
I/O class whose minimum has not been satisfied.
Once all are satisfied and the aggregate maximum has not been hit,
the scheduler looks for classes whose maximum has not been satisfied.
Iteration through the I/O classes is done in the order specified above.
No further operations are issued
if the aggregate maximum number of concurrent operations has been hit,
or if there are no operations queued for an I/O class that has not hit its maximum.
Every time an I/O operation is queued or an operation completes,
the scheduler looks for new operations to issue.
.Pp
In general, smaller
.Sy max_active Ns s
will lead to lower latency of synchronous operations.
Larger
.Sy max_active Ns s
may lead to higher overall throughput, depending on underlying storage.
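The two-pass selection described above can be sketched as a small model.
This is an illustrative simplification, not the in-kernel implementation;
the numeric limits below are invented example values, not the module defaults.

```python
# Illustrative model of the scheduler's two-pass class selection.
# Class names mirror the five I/O classes in priority order; the
# per-queue minima/maxima here are made-up example values.
PRIORITY = ["sync_read", "sync_write", "async_read", "async_write", "scrub"]

def next_class(active, minimum, maximum, aggregate_max):
    """Return the I/O class to issue from next, or None if nothing may issue."""
    if sum(active.values()) >= aggregate_max:
        return None                      # aggregate (zfs_vdev_max_active) hit
    # First pass: a class whose per-queue minimum is not yet satisfied.
    for c in PRIORITY:
        if active[c] < minimum[c]:
            return c
    # Second pass: a class whose per-queue maximum is not yet reached.
    for c in PRIORITY:
        if active[c] < maximum[c]:
            return c
    return None

active  = {"sync_read": 10, "sync_write": 2, "async_read": 3,
           "async_write": 5, "scrub": 1}
minimum = {c: 1 for c in PRIORITY}
maximum = {"sync_read": 10, "sync_write": 10, "async_read": 3,
           "async_write": 10, "scrub": 2}
# All minima are met, so the highest-priority class below its maximum wins.
print(next_class(active, minimum, maximum, aggregate_max=1000))  # sync_write
```

Note how sync read, despite being highest priority, is skipped here because
it has already reached its per-queue maximum.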
.Pp
The ratio of the queues'
.Sy max_active Ns s
determines the balance of performance between reads, writes, and scrubs.
For example, increasing
.Sy zfs_vdev_scrub_max_active
will cause the scrub or resilver to complete more quickly,
but reads and writes to have higher latency and lower throughput.
.Pp
All I/O classes have a fixed maximum number of outstanding operations,
except for the async write class.
Asynchronous writes represent the data that is committed to stable storage
during the syncing stage for transaction groups.
Transaction groups enter the syncing state periodically,
so the number of queued async writes will quickly burst up
and then bleed down to zero.
Rather than servicing them as quickly as possible,
the I/O scheduler changes the maximum number of active async write operations
according to the amount of dirty data in the pool.
Since both throughput and latency typically increase with the number of
concurrent operations issued to physical devices, reducing the
burstiness in the number of concurrent operations also stabilizes the
response time of operations from other queues, in particular synchronous ones.
In broad strokes, the I/O scheduler will issue more concurrent operations
from the async write queue as there's more dirty data in the pool.
.
.Ss Async Writes
The number of concurrent operations issued for the async write I/O class
follows a piece-wise linear function defined by a few adjustable points:
.Bd -literal
       |              o---------| <-- \fBzfs_vdev_async_write_max_active\fP
  ^    |             /^         |
  |    |            / |         |
active |           /  |         |
 I/O   |          /   |         |
count  |         /    |         |
       |        /     |         |
       |-------o      |         | <-- \fBzfs_vdev_async_write_min_active\fP
      0|_______^______|_________|
       0%      |      |           100% of \fBzfs_dirty_data_max\fP
               |      |
               |      `-- \fBzfs_vdev_async_write_active_max_dirty_percent\fP
               `--------- \fBzfs_vdev_async_write_active_min_dirty_percent\fP
.Ed
.Pp
Until the amount of dirty data exceeds a minimum percentage of the dirty
data allowed in the pool, the I/O scheduler will limit the number of
concurrent operations to the minimum.
As that threshold is crossed, the number of concurrent operations issued
increases linearly to the maximum at the specified maximum percentage
of the dirty data allowed in the pool.
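The piece-wise linear function above can be sketched numerically.
The threshold and limit values used here (2/10 active operations, 30%/60%
dirty thresholds) are illustrative, not necessarily the module's defaults.

```python
# Sketch of the piece-wise linear async-write scaling described above.
# Parameter values are examples only.
def async_write_max(dirty_pct, min_active=2, max_active=10,
                    min_dirty_pct=30, max_dirty_pct=60):
    """Concurrent async-write limit for dirty_pct % of zfs_dirty_data_max."""
    if dirty_pct <= min_dirty_pct:
        return min_active                # flat minimum below the low threshold
    if dirty_pct >= max_dirty_pct:
        return max_active                # flat maximum above the high threshold
    # Linear ramp between the two thresholds.
    slope = (max_active - min_active) / (max_dirty_pct - min_dirty_pct)
    return round(min_active + slope * (dirty_pct - min_dirty_pct))

for pct in (10, 45, 90):
    print(pct, async_write_max(pct))     # 10 -> 2, 45 -> 6, 90 -> 10
```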
.Pp
Ideally, the amount of dirty data on a busy pool will stay in the sloped
part of the function between
.Sy zfs_vdev_async_write_active_min_dirty_percent
and
.Sy zfs_vdev_async_write_active_max_dirty_percent .
If it exceeds the maximum percentage,
this indicates that the rate of incoming data is
greater than the rate that the backend storage can handle.
In this case, we must further throttle incoming writes,
as described in the next section.
.
.Sh ZFS TRANSACTION DELAY
We delay transactions when we've determined that the backend storage
isn't able to accommodate the rate of incoming writes.
.Pp
If there is already a transaction waiting, we delay relative to when
that transaction will finish waiting.
This way the calculated delay time
is independent of the number of threads concurrently executing transactions.
.Pp
If we are the only waiter, wait relative to when the transaction started,
rather than the current time.
This credits the transaction for "time already served",
e.g. reading indirect blocks.
.Pp
The minimum time for a transaction to take is calculated as
.Dl min_time = min( Ns Sy zfs_delay_scale No * (dirty - min) / (max - dirty), 100ms)
.Pp
The delay has two degrees of freedom that can be adjusted via tunables.
The percentage of dirty data at which we start to delay is defined by
.Sy zfs_delay_min_dirty_percent .
This should typically be at or above
.Sy zfs_vdev_async_write_active_max_dirty_percent ,
so that we only start to delay after writing at full speed
has failed to keep up with the incoming write rate.
The scale of the curve is defined by
.Sy zfs_delay_scale .
Roughly speaking, this variable determines the amount of delay
at the midpoint of the curve.
.Bd -literal
delay
 10ms +-------------------------------------------------------------*+
      |                                                             *|
  9ms +                                                             *+
      |                                                             *|
  8ms +                                                             *+
      |                                                            * |
  7ms +                                                            * +
      |                                                            * |
  6ms +                                                            * +
      |                                                            * |
  5ms +                                                            * +
      |                                                            * |
  4ms +                                                            * +
      |                                                            * |
  3ms +                                                            * +
      |                                                            * |
  2ms +                                                (midpoint)  * +
      |                                                    |      **  |
  1ms +                                                    v  ***    +
      |             \fBzfs_delay_scale\fP ---------->     ********         |
    0 +-------------------------------------*********----------------+
      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
.Ed
.Pp
Note that, since the delay is added to the outstanding time remaining on the
most recent transaction, it's effectively the inverse of IOPS.
Here, the midpoint of
.Em 500us
translates to
.Em 2000 IOPS .
The shape of the curve
was chosen such that small changes in the amount of accumulated dirty data
in the first three quarters of the curve yield relatively small differences
in the amount of delay.
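The delay formula above can be sketched numerically.
This is a simplified model: units (nanoseconds), the 500us default scale,
and the hard 100ms cap follow the formula as stated, but the exact kernel
arithmetic may differ in detail.

```python
# Numeric sketch of:
#   min_time = min(zfs_delay_scale * (dirty - min) / (max - dirty), 100ms)
# delay_scale_ns and min_dirty_percent mirror zfs_delay_scale and
# zfs_delay_min_dirty_percent; values here are illustrative.
def tx_delay_ns(dirty, dirty_max, delay_scale_ns=500_000,
                min_dirty_percent=60):
    """Per-transaction delay in nanoseconds, capped at 100ms."""
    dirty_min = dirty_max * min_dirty_percent // 100
    if dirty <= dirty_min:
        return 0                         # below zfs_delay_min_dirty_percent
    if dirty >= dirty_max:
        return 100_000_000               # clamp at the 100ms cap
    delay = delay_scale_ns * (dirty - dirty_min) / (dirty_max - dirty)
    return min(int(delay), 100_000_000)

# At the midpoint between dirty_min and dirty_max, (dirty - min) equals
# (max - dirty), so the delay equals zfs_delay_scale itself:
# 500us per transaction, i.e. roughly 2000 IOPS as noted above.
print(tx_delay_ns(80, 100))              # midpoint of 60%..100% -> 500000
```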
.Pp
The effects can be easier to understand when the amount of delay is
represented on a logarithmic scale:
.Bd -literal
delay
100ms +-------------------------------------------------------------++
      +                                                              +
      |                                                              |
      +                                                             *+
 10ms +                                                             *+
      +                                                            ** +
      |                                           (midpoint)      **  |
      +                                               |          **   +
  1ms +                                               v        ****   +
      +             \fBzfs_delay_scale\fP ---------->      *****      +
      |                                              ****             |
      +                                          ****                 +
100us +                                        **                     +
      +                                       *                       +
      |                                      *                        |
      +                                     *                         +
 10us +                                     *                         +
      +                                                               +
      |                                                               |
      +                                                               +
      +--------------------------------------------------------------+
      0%                    <- \fBzfs_dirty_data_max\fP ->               100%
.Ed
.Pp
Note here that only as the amount of dirty data approaches its limit does
the delay start to increase rapidly.
The goal of a properly tuned system should be to keep the amount of dirty data
out of that range by first ensuring that the appropriate limits are set
for the I/O scheduler to reach optimal throughput on the back-end storage,
and then by changing the value of
.Sy zfs_delay_scale
to increase the steepness of the curve.