.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or http://www.opensolaris.org/os/licensing.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd June 2, 2021
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A variation on RAID-5 that allows for better distribution of parity and
eliminates the RAID-5
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data.
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
See the examples at the end of this subsection.
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, which
allow for faster resilvering while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices.
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4kB No disk sectors, the minimum allocation size is Em 32kB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In regards to I/O, performance is similar to raidz since, for any read, all
.Em D No data disks must be accessed.
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidz, a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group, Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely for deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
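.Pp
As a further illustration, and with purely hypothetical device names, a
double-parity raidz group of six disks, or a single-parity dRAID vdev with one
distributed spare, could be created with:
.Dl # Nm zpool Cm create Ar mypool Sy raidz2 Ar sda sdb sdc sdd sde sdf
.Dl # Nm zpool Cm create Ar mypool Sy draid1:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj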
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data is checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs,
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs are in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices are in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs are in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices are in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported when a device was unavailable, then the device will be
identified by a unique identifier instead of its path, since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block cannot be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
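.Pp
For example, per-device read, write, and checksum error counters, along with
any files affected by unrecoverable errors, can be reviewed with:
.Dl # Nm zpool Cm status Fl v Ar pool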
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to online the device automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool, but when an active device
fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
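.Pp
For example, assuming the hypothetical device
.Ar sdb
has faulted and
.Ar sdd
is a configured hot spare, the replacement can be started, and later cancelled,
with:
.Dl # Nm zpool Cm replace Ar pool sdb sdd
.Dl # Nm zpool Cm detach Ar pool sdd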
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single-parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
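.Pp
For example, a mirrored log using two hypothetical devices could be added to an
existing pool with:
.Dl # Nm zpool Cm add Ar pool Sy log mirror Ar sdc sdd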
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low-latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1GB ,
ZFS does not write the metadata structures
required for rebuilding the L2ARC, in order not to waste space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent L2ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful in case of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory
pressure, in order to fully restore its contents to L2ARC.
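.Pp
For example, assuming
.Ar sdc
is a cache device whose contents were only partially restored, it can be
offlined and onlined once memory pressure has subsided:
.Dl # Nm zpool Cm offline Ar pool sdc
.Dl # Nm zpool Cm online Ar pool sdc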
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint:
specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.
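.Pp
For example, a pool with a mirrored special vdev could be created, and a
hypothetical dataset opted into storing small file blocks in the special class,
with:
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy special mirror Ar sdc sdd
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K pool/dataset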