.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or https://opensource.org/licenses/CDDL-1.0.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd April 7, 2023
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices,
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
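For example, the following two commands are equivalent:
.Dl # Nm zpool Cm create Ar pool Ar sda
.Dl # Nm zpool Cm create Ar pool Pa /dev/sda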
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
It is designed primarily for experimental purposes, as the fault tolerance of a
file is only as good as the file system on which it resides.
A file must be specified by a full path.
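For example, a disposable pool for experimentation might be backed by a sparse
file:
.Dl # Nm truncate Fl s Ar 1G Pa /tmp/vdev0.img
.Dl # Nm zpool Cm create Ar testpool Pa /tmp/vdev0.img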
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
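For example, a three-way mirror that can withstand two disks failing:
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb sdc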
.It Sy raidz , raidz1 , raidz2 , raidz3
A distributed-parity layout, similar to RAID-5/6, with improved distribution of
parity, and which does not suffer from the RAID-5/6
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group, though not
necessarily in a consistent stripe width.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data .
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
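For example, a double-parity raidz group of six disks provides approximately
the capacity of four disks and can withstand two disks failing:
.Dl # Nm zpool Cm create Ar pool Sy raidz2 Ar sda sdb sdc sdd sde sdf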
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering, while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices .
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors, the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes (zvols) and dRAID, the default value of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In terms of I/O, performance is similar to raidz, since for any read all
.Em D No data disks must be accessed .
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
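For example, a 24-disk
.Sy draid2
vdev with
.Em D=8 No and two distributed spares delivers roughly
.Em floor((24-2)/(8+2)) No = Em 2 No times the random IOPS of a single drive .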
.Pp
Like raidz, a dRAID can have single, double, or triple parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group , Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio ,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
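.Pp
For example, a double-parity dRAID of twelve disks, with four data disks per
redundancy group and one distributed spare, might be created with:
.Dl # Nm zpool Cm create Ar pool Sy draid2:4d:12c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl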
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely to deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely to allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested, so a mirror or raidz virtual device can only
contain files or disks.
Mirrors of mirrors
.Pq or other combinations
are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices to balance data
among devices.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data are checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs are in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices are in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors exceeds acceptable levels and the device is
degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs are in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices are in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported while a device is unavailable, then the device is
identified by a unique identifier instead of its path, since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block is unable to be reconstructed (e.g. due to 3 disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
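The accumulated error counters, along with any files affected by unrecoverable
errors, can be inspected with:
.Dl # Nm zpool Cm status Fl v Ar pool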
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to bring the device online automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool; when an active device
fails, however, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
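For example:
.Dl # Nm zpool Cm add Ar pool Sy spare Ar sde
.Dl # Nm zpool Cm remove Ar pool Ar sde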
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which could lead to
data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
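For example, assuming hot spare
.Ar sdd
is currently replacing a faulted disk, the replacement can be cancelled with:
.Dl # Nm zpool Cm detach Ar pool sdd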
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
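For instance, a pool with a mirrored pair of log devices might be created with:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log mirror Ar sdc sdd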
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
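For example, assuming the log mirror appears as
.Ar mirror-1
in the
.Nm zpool Cm status
output:
.Dl # Nm zpool Cm remove Ar pool mirror-1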
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low-latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots, and is restored
asynchronously to L2ARC when the pool is imported (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
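On Linux, for example, this module parameter can be changed at runtime with:
.Dl # echo 0 > /sys/module/zfs/parameters/l2arc_rebuild_enabled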
For cache devices smaller than
.Em 1 GiB ,
ZFS does not write the metadata structures
required for rebuilding the L2ARC, in order to conserve space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written to L2ARC (persistent L2ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored to L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored to L2ARC.
This is useful if memory pressure prevented the contents of the cache device
from being fully restored to L2ARC.
The user can offline and online the cache device when there is less memory
pressure, to fully restore its contents to L2ARC.
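For example:
.Dl # Nm zpool Cm offline Ar pool sdc
.Dl # Nm zpool Cm online Ar pool sdc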
.
.Ss Pool Checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint:
specifically, vdev removal, attachment, and detachment; mirror splitting; and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, first export the pool and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to a nonzero value.
See
.Xr zfsprops 7
for more info on this property.
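For example, to direct blocks of 32 KiB and smaller in a given dataset to the
special class:
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K Ar pool/dataset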