.\"
.\" CDDL HEADER START
.\"
.\" The contents of this file are subject to the terms of the
.\" Common Development and Distribution License (the "License").
.\" You may not use this file except in compliance with the License.
.\"
.\" You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
.\" or https://opensource.org/licenses/CDDL-1.0.
.\" See the License for the specific language governing permissions
.\" and limitations under the License.
.\"
.\" When distributing Covered Code, include this CDDL HEADER in each
.\" file and include the License file at usr/src/OPENSOLARIS.LICENSE.
.\" If applicable, add the following below this CDDL HEADER, with the
.\" fields enclosed by brackets "[]" replaced with your own identifying
.\" information: Portions Copyright [yyyy] [name of copyright owner]
.\"
.\" CDDL HEADER END
.\"
.\" Copyright (c) 2007, Sun Microsystems, Inc. All Rights Reserved.
.\" Copyright (c) 2012, 2018 by Delphix. All rights reserved.
.\" Copyright (c) 2012 Cyril Plisko. All Rights Reserved.
.\" Copyright (c) 2017 Datto Inc.
.\" Copyright (c) 2018 George Melikov. All Rights Reserved.
.\" Copyright 2017 Nexenta Systems, Inc.
.\" Copyright (c) 2017 Open-E, Inc. All Rights Reserved.
.\"
.Dd April 7, 2023
.Dt ZPOOLCONCEPTS 7
.Os
.
.Sh NAME
.Nm zpoolconcepts
.Nd overview of ZFS storage pools
.
.Sh DESCRIPTION
.Ss Virtual Devices (vdevs)
A "virtual device" describes a single device or a collection of devices,
organized according to certain performance and fault characteristics.
The following virtual devices are supported:
.Bl -tag -width "special"
.It Sy disk
A block device, typically located under
.Pa /dev .
ZFS can use individual slices or partitions, though the recommended mode of
operation is to use whole disks.
A disk can be specified by a full path, or it can be a shorthand name
.Po the relative portion of the path under
.Pa /dev
.Pc .
A whole disk can be specified by omitting the slice or partition designation.
For example,
.Pa sda
is equivalent to
.Pa /dev/sda .
When given a whole disk, ZFS automatically labels the disk, if necessary.
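For example, either of the following creates a single-disk pool, assuming a
pool named
.Ar tank
and a disk named
.Pa sda :
.Dl # Nm zpool Cm create Ar tank Pa /dev/sda
.Dl # Nm zpool Cm create Ar tank Pa sda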
.It Sy file
A regular file.
The use of files as a backing store is strongly discouraged.
File vdevs are intended primarily for experimental purposes, as the fault
tolerance of a file is only as good as the file system on which it resides.
A file must be specified by a full path.
.It Sy mirror
A mirror of two or more devices.
Data is replicated in an identical fashion across all components of a mirror.
A mirror with
.Em N No disks of size Em X No can hold Em X No bytes and can withstand Em N-1
devices failing without losing data.
.It Sy raidz , raidz1 , raidz2 , raidz3
A distributed-parity layout, similar to RAID-5/6, with improved distribution of
parity, and which does not suffer from the RAID-5/6
.Qq write hole ,
.Pq in which data and parity become inconsistent after a power loss .
Data and parity are striped across all disks within a raidz group, though not
necessarily in a consistent stripe width.
.Pp
A raidz group can have single, double, or triple parity, meaning that the
raidz group can sustain one, two, or three failures, respectively, without
losing any data.
The
.Sy raidz1
vdev type specifies a single-parity raidz group; the
.Sy raidz2
vdev type specifies a double-parity raidz group; and the
.Sy raidz3
vdev type specifies a triple-parity raidz group.
The
.Sy raidz
vdev type is an alias for
.Sy raidz1 .
.Pp
A raidz group with
.Em N No disks of size Em X No with Em P No parity disks can hold approximately
.Em (N-P)*X No bytes and can withstand Em P No devices failing without losing data .
The minimum number of devices in a raidz group is one more than the number of
parity disks.
The recommended number is between 3 and 9 to help increase performance.
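.Pp
For example, with six disks of size
.Em X ,
the following raidz2 group holds approximately
.Em 4*X
bytes and can withstand any two disks failing
.Pq pool and device names are illustrative :
.Dl # Nm zpool Cm create Ar tank Sy raidz2 Ar sda sdb sdc sdd sde sdf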
.It Sy draid , draid1 , draid2 , draid3
A variant of raidz that provides integrated distributed hot spares, allowing
for faster resilvering, while retaining the benefits of raidz.
A dRAID vdev is constructed from multiple internal raidz groups, each with
.Em D No data devices and Em P No parity devices .
These groups are distributed over all of the children in order to fully
utilize the available disk performance.
.Pp
Unlike raidz, dRAID uses a fixed stripe width (padding as necessary with
zeros) to allow fully sequential resilvering.
This fixed stripe width significantly affects both usable capacity and IOPS.
For example, with the default
.Em D=8 No and Em 4 KiB No disk sectors, the minimum allocation size is Em 32 KiB .
If using compression, this relatively large allocation size can reduce the
effective compression ratio.
When using ZFS volumes (zvols) and dRAID, the default of the
.Sy volblocksize
property is increased to account for the allocation size.
If a dRAID pool will hold a significant amount of small blocks, it is
recommended to also add a mirrored
.Sy special
vdev to store those blocks.
.Pp
With regard to I/O, performance is similar to raidz since, for any read, all
.Em D No data disks must be accessed .
Delivered random IOPS can be reasonably approximated as
.Sy floor((N-S)/(D+P))*single_drive_IOPS .
.Pp
Like raidz, a dRAID can have single, double, or triple parity.
The
.Sy draid1 ,
.Sy draid2 ,
and
.Sy draid3
types can be used to specify the parity level.
The
.Sy draid
vdev type is an alias for
.Sy draid1 .
.Pp
A dRAID with
.Em N No disks of size Em X , D No data disks per redundancy group , Em P
.No parity level, and Em S No distributed hot spares can hold approximately
.Em (N-S)*(D/(D+P))*X No bytes and can withstand Em P
devices failing without losing data.
.It Sy draid Ns Oo Ar parity Oc Ns Oo Sy \&: Ns Ar data Ns Sy d Oc Ns Oo Sy \&: Ns Ar children Ns Sy c Oc Ns Oo Sy \&: Ns Ar spares Ns Sy s Oc
A non-default dRAID configuration can be specified by appending one or more
of the following optional arguments to the
.Sy draid
keyword, as shown in the example after this list:
.Bl -tag -compact -width "children"
.It Ar parity
The parity level (1-3).
.It Ar data
The number of data devices per redundancy group.
In general, a smaller value of
.Em D No will increase IOPS, improve the compression ratio ,
and speed up resilvering at the expense of total usable capacity.
Defaults to
.Em 8 , No unless Em N-P-S No is less than Em 8 .
.It Ar children
The expected number of children.
Useful as a cross-check when listing a large number of devices.
An error is returned when the provided number of children differs.
.It Ar spares
The number of distributed hot spares.
Defaults to zero.
.El
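.Pp
For example, the following creates a dRAID vdev with double parity, four data
disks per redundancy group, eleven children, and one distributed hot spare
.Pq pool and device names are illustrative :
.Dl # Nm zpool Cm create Ar tank Sy draid2:4d:11c:1s Ar sda sdb sdc sdd sde sdf sdg sdh sdi sdj sdk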
.It Sy spare
A pseudo-vdev which keeps track of available hot spares for a pool.
For more information, see the
.Sx Hot Spares
section.
.It Sy log
A separate intent log device.
If more than one log device is specified, then writes are load-balanced between
devices.
Log devices can be mirrored.
However, raidz vdev types are not supported for the intent log.
For more information, see the
.Sx Intent Log
section.
.It Sy dedup
A device dedicated solely to deduplication tables.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one dedup device is specified, then
allocations are load-balanced between those devices.
.It Sy special
A device dedicated solely to allocating various kinds of internal metadata,
and optionally small file blocks.
The redundancy of this device should match the redundancy of the other normal
devices in the pool.
If more than one special device is specified, then
allocations are load-balanced between those devices.
.Pp
For more information on special allocations, see the
.Sx Special Allocation Class
section.
.It Sy cache
A device used to cache storage pool data.
A cache device cannot be configured as a mirror or raidz group.
For more information, see the
.Sx Cache Devices
section.
.El
.Pp
Virtual devices cannot be nested arbitrarily.
A mirror, raidz, or draid virtual device can only be created with files or
disks.
Mirrors of mirrors or other such combinations are not allowed.
.Pp
A pool can have any number of virtual devices at the top of the configuration
.Po known as
.Qq root vdevs
.Pc .
Data is dynamically distributed across all top-level devices, balancing load
among them.
As new virtual devices are added, ZFS automatically places data on the newly
available devices.
.Pp
Virtual devices are specified one at a time on the command line,
separated by whitespace.
Keywords like
.Sy mirror No and Sy raidz
are used to distinguish where a group ends and another begins.
For example, the following creates a pool with two root vdevs,
each a mirror of two disks:
.Dl # Nm zpool Cm create Ar mypool Sy mirror Ar sda sdb Sy mirror Ar sdc sdd
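.Pp
The same grouping syntax applies across vdev classes; for example, the
following sketch creates a pool with a raidz data vdev, a mirrored log, and a
cache device
.Pq device names are assumptions :
.Dl # Nm zpool Cm create Ar tank Sy raidz Ar sda sdb sdc Sy log mirror Ar sdd sde Sy cache Ar sdf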
.
.Ss Device Failure and Recovery
ZFS supports a rich set of mechanisms for handling device failure and data
corruption.
All metadata and data are checksummed, and ZFS automatically repairs bad data
from a good copy when corruption is detected.
.Pp
In order to take advantage of these features, a pool must make use of some form
of redundancy, using either mirrored or raidz groups.
While ZFS supports running in a non-redundant configuration, where each root
vdev is simply a disk or file, this is strongly discouraged.
A single case of bit corruption can render some or all of your data
unavailable.
.Pp
A pool's health status is described by one of three states:
.Sy online , degraded , No or Sy faulted .
An online pool has all devices operating normally.
A degraded pool is one in which one or more devices have failed, but the data
is still available due to a redundant configuration.
A faulted pool has corrupted metadata, or one or more faulted devices, and
insufficient replicas to continue functioning.
.Pp
The health of the top-level vdev, such as a mirror or raidz device,
is potentially impacted by the state of its associated vdevs
or component devices.
A top-level vdev or component device is in one of the following states:
.Bl -tag -width "DEGRADED"
.It Sy DEGRADED
One or more top-level vdevs is in the degraded state because one or more
component devices are offline.
Sufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the degraded or faulted state, but
sufficient replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The number of checksum errors or slow I/Os exceeds acceptable levels and the
device is degraded as an indication that something may be wrong.
ZFS continues to use the device as necessary.
.It
The number of I/O errors exceeds acceptable levels.
The device could not be marked as faulted because there are insufficient
replicas to continue functioning.
.El
.It Sy FAULTED
One or more top-level vdevs is in the faulted state because one or more
component devices are offline.
Insufficient replicas exist to continue functioning.
.Pp
One or more component devices is in the faulted state, and insufficient
replicas exist to continue functioning.
The underlying conditions are as follows:
.Bl -bullet -compact
.It
The device could be opened, but the contents did not match expected values.
.It
The number of I/O errors exceeds acceptable levels and the device is faulted to
prevent further use of the device.
.El
.It Sy OFFLINE
The device was explicitly taken offline by the
.Nm zpool Cm offline
command.
.It Sy ONLINE
The device is online and functioning.
.It Sy REMOVED
The device was physically removed while the system was running.
Device removal detection is hardware-dependent and may not be supported on all
platforms.
.It Sy UNAVAIL
The device could not be opened.
If a pool is imported while a device is unavailable, then the device is
identified by a unique identifier instead of its path, since the path was never
correct in the first place.
.El
.Pp
Checksum errors represent events where a disk returned data that was expected
to be correct, but was not.
In other words, these are instances of silent data corruption.
The checksum errors are reported in
.Nm zpool Cm status
and
.Nm zpool Cm events .
When a block is stored redundantly, a damaged block may be reconstructed
(e.g. from raidz parity or a mirrored copy).
In this case, ZFS reports the checksum error against the disks that contained
damaged data.
If a block cannot be reconstructed (e.g. due to three disks being damaged
in a raidz2 group), it is not possible to determine which disks were silently
corrupted.
In this case, checksum errors are reported for all disks on which the block
is stored.
.Pp
If a device is removed and later re-attached to the system,
ZFS attempts to bring the device online automatically.
Device attachment detection is hardware-dependent
and might not be supported on all platforms.
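.Pp
Device states and per-device error counters can be inspected, and devices can
be taken offline and brought back online, with the following commands
.Pq pool and device names are illustrative :
.Dl # Nm zpool Cm status Fl v Ar pool
.Dl # Nm zpool Cm offline Ar pool sda
.Dl # Nm zpool Cm online Ar pool sda
.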
.Ss Hot Spares
ZFS allows devices to be associated with pools as
.Qq hot spares .
These devices are not actively used in the pool.
But when an active device fails, it is automatically replaced by a hot spare.
To create a pool with hot spares, specify a
.Sy spare
vdev with any number of devices.
For example,
.Dl # Nm zpool Cm create Ar pool Sy mirror Ar sda sdb Sy spare Ar sdc sdd
.Pp
Spares can be shared across multiple pools, and can be added with the
.Nm zpool Cm add
command and removed with the
.Nm zpool Cm remove
command.
Once a spare replacement is initiated, a new
.Sy spare
vdev is created within the configuration that will remain there until the
original device is replaced.
At this point, the hot spare becomes available again if another device fails.
.Pp
If a pool has a shared spare that is currently being used, the pool cannot be
exported, since other pools may use this shared spare, which could lead to
data corruption.
.Pp
Shared spares add some risk.
If the pools are imported on different hosts,
and both pools suffer a device failure at the same time,
both could attempt to use the spare at the same time.
This may not be detected, resulting in data corruption.
.Pp
An in-progress spare replacement can be cancelled by detaching the hot spare.
If the original faulted device is detached, then the hot spare assumes its
place in the configuration, and is removed from the spare list of all active
pools.
.Pp
The
.Sy draid
vdev type provides distributed hot spares.
These hot spares are named after the dRAID vdev they're a part of
.Po Sy draid1 Ns - Ns Ar 2 Ns - Ns Ar 3 No specifies spare Ar 3 No of vdev Ar 2 ,
.No which is a single-parity dRAID Pc
and may only be used by that dRAID vdev.
Otherwise, they behave the same as normal hot spares.
.Pp
Spares cannot replace log devices.
.
.Ss Intent Log
The ZFS Intent Log (ZIL) satisfies POSIX requirements for synchronous
transactions.
For instance, databases often require their transactions to be on stable
storage devices when returning from a system call.
NFS and other applications can also use
.Xr fsync 2
to ensure data stability.
By default, the intent log is allocated from blocks within the main pool.
However, it might be possible to get better performance using separate intent
log devices such as NVRAM or a dedicated disk.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy log Ar sdc
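.Pp
A separate log can also be added to an existing pool; a minimal sketch,
assuming a mirrored log on devices
.Ar sdc No and Ar sdd :
.Dl # Nm zpool Cm add Ar pool Sy log mirror Ar sdc sdd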
.Pp
Multiple log devices can also be specified, and they can be mirrored.
See the
.Sx EXAMPLES
section for an example of mirroring multiple log devices.
.Pp
Log devices can be added, replaced, attached, detached, and removed.
In addition, log devices are imported and exported as part of the pool
that contains them.
Mirrored devices can be removed by specifying the top-level mirror vdev.
.
.Ss Cache Devices
Devices can be added to a storage pool as
.Qq cache devices .
These devices provide an additional layer of caching between main memory and
disk.
For read-heavy workloads, where the working set size is much larger than what
can be cached in main memory, using cache devices allows much more of this
working set to be served from low-latency media.
Using cache devices provides the greatest performance improvement for random
read workloads of mostly static content.
.Pp
To create a pool with cache devices, specify a
.Sy cache
vdev with any number of devices.
For example:
.Dl # Nm zpool Cm create Ar pool sda sdb Sy cache Ar sdc sdd
.Pp
Cache devices cannot be mirrored or part of a raidz configuration.
If a read error is encountered on a cache device, that read I/O is reissued to
the original storage pool device, which might be part of a mirrored or raidz
configuration.
.Pp
The content of the cache devices is persistent across reboots and restored
asynchronously when importing the pool in L2ARC (persistent L2ARC).
This can be disabled by setting
.Sy l2arc_rebuild_enabled Ns = Ns Sy 0 .
For cache devices smaller than
.Em 1 GiB ,
ZFS does not write the metadata structures
required for rebuilding the L2ARC, in order to conserve space.
This can be changed with
.Sy l2arc_rebuild_blocks_min_l2size .
The cache device header
.Pq Em 512 B
is updated even if no metadata structures are written.
Setting
.Sy l2arc_headroom Ns = Ns Sy 0
will result in scanning the full-length ARC lists for cacheable content to be
written in L2ARC (persistent L2ARC).
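.Pp
On Linux, these module parameters can be set at runtime through sysfs; a
minimal sketch:
.Dl # echo 0 > /sys/module/zfs/parameters/l2arc_rebuild_enabled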
If a cache device is added with
.Nm zpool Cm add ,
its label and header will be overwritten and its contents will not be
restored in L2ARC, even if the device was previously part of the pool.
If a cache device is onlined with
.Nm zpool Cm online ,
its contents will be restored in L2ARC.
This is useful under memory pressure,
when the contents of the cache device are not fully restored in L2ARC.
The user can offline and online the cache device when there is less memory
pressure, to fully restore its contents to L2ARC.
.
.Ss Pool checkpoint
Before starting critical procedures that include destructive actions
.Pq like Nm zfs Cm destroy ,
an administrator can checkpoint the pool's state and, in the case of a
mistake or failure, rewind the entire pool back to the checkpoint.
Otherwise, the checkpoint can be discarded when the procedure has completed
successfully.
.Pp
A pool checkpoint can be thought of as a pool-wide snapshot and should be used
with care, as it contains every part of the pool's state, from properties to
vdev configuration.
Thus, certain operations are not allowed while a pool has a checkpoint:
specifically, vdev removal/attach/detach, mirror splitting, and
changing the pool's GUID.
Adding a new vdev is supported, but in the case of a rewind it will have to be
added again.
Finally, users of this feature should keep in mind that scrubs in a pool that
has a checkpoint do not repair checkpointed data.
.Pp
To create a checkpoint for a pool:
.Dl # Nm zpool Cm checkpoint Ar pool
.Pp
To later rewind to its checkpointed state, you need to first export the pool
and then rewind it during import:
.Dl # Nm zpool Cm export Ar pool
.Dl # Nm zpool Cm import Fl -rewind-to-checkpoint Ar pool
.Pp
To discard the checkpoint from a pool:
.Dl # Nm zpool Cm checkpoint Fl d Ar pool
.Pp
Dataset reservations (controlled by the
.Sy reservation No and Sy refreservation
properties) may be unenforceable while a checkpoint exists, because the
checkpoint is allowed to consume the dataset's reservation.
Finally, data that is part of the checkpoint but has been freed in the
current state of the pool won't be scanned during a scrub.
.
.Ss Special Allocation Class
Allocations in the special class are dedicated to specific block types.
By default, this includes all metadata, the indirect blocks of user data, and
any deduplication tables.
The class can also be provisioned to accept small file blocks.
.Pp
A pool must always have at least one normal
.Pq non- Ns Sy dedup Ns /- Ns Sy special
vdev before
other devices can be assigned to the special class.
If the
.Sy special
class becomes full, then allocations intended for it
will spill back into the normal class.
.Pp
Deduplication tables can be excluded from the special class by unsetting the
.Sy zfs_ddt_data_is_special
ZFS module parameter.
.Pp
Inclusion of small file blocks in the special class is opt-in.
Each dataset can control the size of small file blocks allowed
in the special class by setting the
.Sy special_small_blocks
property to a nonzero value.
See
.Xr zfsprops 7
for more info on this property.
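.Pp
For example, the following sketch creates a pool with a mirrored special vdev
and then opts a dataset into storing blocks of
.Em 32 KiB
or smaller in the special class
.Pq pool, dataset, and device names are assumptions :
.Dl # Nm zpool Cm create Ar pool Sy raidz Ar sda sdb sdc Sy special mirror Ar sdd sde
.Dl # Nm zfs Cm set Sy special_small_blocks Ns = Ns Ar 32K pool/dataset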