.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or https://opensource.org/licenses/CDDL-1.0.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd April 7, 2023
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices,
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
File-backed vdevs are intended primarily for experimental purposes, as the
fault tolerance of a file is only as good as the file system on which it
resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A distributed-parity layout, similar to RAID-5/6, with improved distribution
of parity, and which does not suffer from the RAID-5/6
.Qq write hole
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group, though not
necessarily in a consistent stripe width.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data .
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
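.Pp
For example, the following creates a pool with a single raidz2 group of six
disks (the device names here are illustrative):
.Dl # Nm zpool Cm create Ar pool Sy raidz2 Ar sda sdb sdc sdd sde sdf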
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering, while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices .
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors, the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes (zvols) and dRAID, the default value of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
In terms of I/O, performance is similar to raidz since, for any read, all
.Em D No data disks must be accessed .
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
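.Pp
As a worked example (with illustrative numbers), a dRAID2 vdev with
.Em N=11 No children , Em S=1 No distributed spare , and Em D=4
delivers approximately floor((11-1)/(4+2)) = 1 times the random IOPS of a
single drive.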
1263ff01b23SMartin Matuska.Pp
127da5137abSMartin MatuskaLike raidz, a dRAID can have single-, double-, or triple-parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group , Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio ,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
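.Pp
For example, the following creates an 11-disk dRAID2 vdev with 4 data devices
per redundancy group and a single distributed spare
(the device names here are illustrative):
.Dl # Nm zpool Cm create Ar pool Sy draid2:4d:11c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk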
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely to deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
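.Pp
For example, the following adds a mirrored dedup vdev to an existing pool
(the device names here are illustrative):
.Dl # Nm zpool Cm add Ar pool Sy dedup mirror Ar sde sdf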
.It Sy special
A device dedicated solely for allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
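.Pp
For example, the following creates a pool with a mirrored special vdev
alongside a raidz data vdev (the device names here are illustrative):
.Dl # Nm zpool Cm create Ar pool Sy raidz Ar sda sdb sdc Sy special mirror Ar sdd sde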
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested arbitrarily.
A mirror, raidz, or draid virtual device can only be created with files or
disks.
Mirrors of mirrors or other such combinations are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices,
balancing data among them.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data are checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data is
still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors or slow I/Os exceeds acceptable levels and the
device is degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported while a device is unavailable, then the device will be
identified by a unique identifier instead of its path, since the path was
never correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block cannot be reconstructed (e.g. due to three disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to bring the device online automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
.
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool; however, when an active
device fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
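.Pp
For example, the following adds a spare to an existing pool and later removes
it again (the device name is illustrative):
.Dl # Nm zpool Cm add Ar pool Sy spare Ar sde
.Dl # Nm zpool Cm remove Ar pool Ar sde
.Pp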
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which may lead to
potential data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
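.Pp
For example, to cancel an in-progress replacement by detaching the spare
(the device name is illustrative):
.Dl # Nm zpool Cm detach Ar pool Ar sde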
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable storage
devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
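.Pp
One such configuration, with illustrative device names, is:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log mirror Ar sdc sdd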
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
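.Pp
For example, assuming the log mirror shows up in
.Nm zpool Cm status
as the top-level vdev
.Ar mirror-1 :
.Dl # Nm zpool Cm remove Ar pool mirror-1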
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and is restored
asynchronously to L2ARC when importing the pool (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
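.Pp
For example, on Linux this parameter is typically exposed through the module
parameter interface:
.Dl # echo 0 > /sys/module/zfs/parameters/l2arc_rebuild_enabled
.Pp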
For cache devices smaller than
.Em 1 GiB ,
ZFS does not write the metadata structures
required for rebuilding the L2ARC, to conserve space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent ARC).
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful in case of memory pressure,
where the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory
pressure, to fully restore its contents to L2ARC.
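.Pp
For example, to manually cycle a cache device (the device name is
illustrative):
.Dl # Nm zpool Cm offline Ar pool sdc
.Dl # Nm zpool Cm online Ar pool sdc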
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care as it contains every part of the pool's state, from properties to vdev
configuration.
Thus, certain operations are not allowed while a pool has a checkpoint.
Specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export it and
then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to nonzero.
See
.Xr zfsprops 7
for more info on this property.
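.Pp
For example, to route blocks of up to 32 KiB from a dataset into the special
class (the dataset name is illustrative):
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K Ar pool/fs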