Lines Matching +full:sub +full:- +full:block
1 .. SPDX-License-Identifier: GPL-2.0
4 ZoneFS - Zone filesystem for Zoned block devices
10 zonefs is a very simple file system exposing each zone of a zoned block device
11 as a file. Unlike a regular POSIX-compliant file system with native zoned block
13 constraint of zoned block devices to the user. Files representing sequential
17 As such, zonefs is in essence closer to a raw block device access interface
18 than to a full-featured POSIX file system. The goal of zonefs is to simplify
19 the implementation of zoned block device support in applications by replacing
20 raw block device file accesses with a richer file API, avoiding reliance on
21 direct block device file ioctls, which may be more obscure to developers. One
22 example of this approach is the implementation of LSM (log-structured merge)
23 tree structures (such as those used in RocksDB and LevelDB) on zoned block devices
30 Zoned block devices
31 -------------------
39 regular block device.
49 Block Commands (ZBC) and Zoned ATA Commands (ZAC) interfaces on Shingled
60 Zonefs exposes the zones of a zoned block device as files. The files
62 by sub-directories. This file structure is built entirely using zone information
63 provided by the device and so does not require any complex on-disk metadata
66 On-disk metadata
67 ----------------
69 zonefs on-disk metadata is reduced to an immutable super block which
76 The super block is always written on disk at sector 0. The first zone of the
77 device storing the super block is never exposed as a zone file by zonefs. If
78 the zone containing the super block is a sequential zone, the mkzonefs format
80 state to make it read-only, preventing any data write.
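Since the super block always lives at sector 0, a quick sanity check of a formatted device can be sketched as below. This is a minimal sketch, not the kernel's validation logic: it assumes the magic number is a little-endian 32-bit value at the very start of the super block and that its value is 0x5a4f4653 ("ZOFS"). The helper names `has_zonefs_magic` and `read_sector0` are hypothetical.

```python
import struct

# Assumption: value of the kernel's ZONEFS_MAGIC constant ("ZOFS" in ASCII).
ZONEFS_MAGIC = 0x5A4F4653
SECTOR_SIZE = 512


def has_zonefs_magic(sector0: bytes) -> bool:
    """Return True if the first 4 bytes of sector 0 hold the zonefs magic.

    Assumes the magic is stored as a little-endian 32-bit integer at
    offset 0 of the super block.
    """
    if len(sector0) < 4:
        return False
    (magic,) = struct.unpack_from("<I", sector0, 0)
    return magic == ZONEFS_MAGIC


def read_sector0(dev_path: str) -> bytes:
    """Read the first sector of a block device (usually requires root)."""
    with open(dev_path, "rb") as dev:
        return dev.read(SECTOR_SIZE)
```

A caller would pass the result of `read_sector0("/dev/sdX")` to `has_zonefs_magic()`. The check only inspects the magic and does not verify the super block checksum or any other field.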
82 Zone type sub-directories
83 -------------------------
86 sub-directory automatically created on mount.
88 For conventional zones, the sub-directory "cnv" is used. This directory is
91 be exposed as a file as it will be used to store the zonefs super block. For
92 such devices, the "cnv" sub-directory will not be created.
94 For sequential write zones, the sub-directory "seq" is used.
98 "seq" sub-directories.
105 ----------
114 capacity fails with the -EFBIG error.
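An application writing to a zone file can treat EFBIG as the signal that the file is full. The following is a minimal sketch of that pattern. The helper name `write_until_full` is hypothetical, and `write` stands for any callable that behaves like `os.write` bound to an open zone file descriptor.

```python
import errno


def write_until_full(write, chunks):
    """Feed chunks to write() and stop at the first EFBIG error.

    `write` is any callable taking a bytes object and returning the
    number of bytes written (for example, functools.partial(os.write, fd)).
    Returns the total number of bytes written before the file was full.
    """
    total = 0
    for chunk in chunks:
        try:
            total += write(chunk)
        except OSError as err:
            if err.errno == errno.EFBIG:
                # The access exceeded the file capacity: stop writing.
                break
            raise
    return total
```

Other errors are re-raised unchanged, so I/O failures are not confused with the file simply being full.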
117 sub-directories is not allowed.
123 -----------------------
133 ---------------------
135 The size of sequential zone files grouped in the "seq" sub-directory represents
142 write issued and still in-flight (for asynchronous I/O operations).
148 implemented by the block layer elevator. An elevator implementing the sequential
149 write feature for zoned block devices (ELEVATOR_F_ZBD_SEQ_WRITE elevator feature)
150 must be used. This type of elevator (e.g. mq-deadline) is set by default
151 for zoned block devices on device initialization.
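To confirm which elevator is active for a device, the block layer's `queue/scheduler` sysfs attribute can be inspected. That attribute lists the available schedulers with the active one in square brackets. The sketch below parses it. The helper names are hypothetical, and `dev` is assumed to be a kernel device name such as `sdX`.

```python
import re


def active_scheduler(scheduler_attr: str):
    """Return the scheduler name shown in [brackets], or None if absent.

    `scheduler_attr` is the text of /sys/block/<dev>/queue/scheduler,
    for example "[mq-deadline] kyber bfq none".
    """
    match = re.search(r"\[([^\]]+)\]", scheduler_attr)
    return match.group(1) if match else None


def read_scheduler_attr(dev: str) -> str:
    """Read the raw scheduler attribute for a block device name."""
    with open(f"/sys/block/{dev}/queue/scheduler") as attr:
        return attr.read()
```

For example, `active_scheduler(read_scheduler_attr("sdX"))` would be expected to give `mq-deadline` on a device using the default described above.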
163 --------------
174 -----------------
176 Zoned block devices may fail I/O requests for reasons similar to regular block
178 failure pattern, the standards governing zoned block device behavior define
181 * A zone may transition to the read-only condition (BLK_ZONE_COND_READONLY):
185 state. While the reasons for the device to transition a zone to read-only
188 changed to read-only).
192 offline zone back to an operational good state. Similarly to zone read-only
194 condition are undefined. A typical cause would be a defective read-write head
208 * Delayed write errors: similarly to regular block devices, if the device side
228 * A zone condition change to read-only or offline also always triggers zonefs
237 the file zone. For instance, the partial failure of a multi-BIO large write
250 A zone condition change to read-only is indicated with a change in the file
251 access permissions to render the file read-only. This disables changes to the
260 +--------------+-----------+-----------------------------------------+
265 +--------------+-----------+-----------------------------------------+
267 | remount-ro | read-only | as is yes no yes no |
269 +--------------+-----------+-----------------------------------------+
271 | zone-ro | read-only | as is yes no yes no |
273 +--------------+-----------+-----------------------------------------+
275 | zone-offline | read-only | 0 no no yes no |
277 +--------------+-----------+-----------------------------------------+
279 | repair | read-only | as is yes no yes no |
281 +--------------+-----------+-----------------------------------------+
285 * The "errors=remount-ro" mount option is the default behavior of zonefs I/O
287 * With the "errors=remount-ro" mount option, the change of the file access
288 permissions to read-only applies to all files. The file system is remounted
289 read-only.
294 * File access permission changes to read-only due to the device transitioning
295 zones to the read-only condition are permanent. Remounting or reformatting
296 the device will not re-enable file write access.
297 * File access permission changes implied by the remount-ro, zone-ro and
298 zone-offline mount options are temporary for zones in a good condition.
303 indicated as being read-only or offline by the device still imply changes to
307 -------------
311 * explicit-open
320 * remount-ro (default)
321 * zone-ro
322 * zone-offline
325 The run-time I/O error actions defined for each behavior are detailed in the
327 The handling of read-only zones also differs between mount-time and run-time.
328 If a read-only zone is found at mount time, the zone is always treated in the
330 file size set to 0. This is necessary as the write pointer of read-only zones
333 read-only zone discovered at run-time, as indicated in the previous section.
336 "explicit-open" option
339 A zoned block device (e.g. an NVMe Zoned Namespace device) may have limits on
346 To avoid these potential errors, the "explicit-open" mount option forces zones
350 "explicit-open" mount option will result in a zone close command being issued
355 ------------------------
359 where <dev> is the name of the mounted zoned block device.
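The attributes can be read like any other sysfs file. The sketch below collects them into a dictionary. It assumes the attributes live under `/sys/fs/zonefs/<dev>/` and that the full set is `max_wro_seq_files`, `nr_wro_seq_files`, `max_active_seq_files` and `nr_active_seq_files`. The helper name `read_zonefs_attrs` is hypothetical.

```python
from pathlib import Path

# Assumption: the set of integer attributes exported by zonefs.
ZONEFS_ATTRS = (
    "max_wro_seq_files",
    "nr_wro_seq_files",
    "max_active_seq_files",
    "nr_active_seq_files",
)


def read_zonefs_attrs(dev: str, sysfs_root: str = "/sys/fs/zonefs") -> dict:
    """Return the zonefs sysfs attributes found for `dev` as integers.

    Attributes that are missing from the directory are skipped rather
    than reported as errors.
    """
    base = Path(sysfs_root) / dev
    attrs = {}
    for name in ZONEFS_ATTRS:
        attr_file = base / name
        if attr_file.is_file():
            attrs[name] = int(attr_file.read_text().strip())
    return attrs
```

The `sysfs_root` parameter exists only so the function can be exercised against a scratch directory.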
368 state of other zones. When the *explicit-open* mount option is used, zonefs
373 zone files open for writing. When the "explicit-open" mount option is used,
374 this number can never exceed *max_wro_seq_files*. If the *explicit-open*
382 is explicitly open (which happens only if the *explicit-open* mount option is
389 *nr_active_seq_files*, regardless of the use of the *explicit-open* mount
395 The mkzonefs tool is used to format zoned block devices for use with zonefs.
398 https://github.com/damien-lemoal/zonefs-tools
400 zonefs-tools also includes a test suite which can be run against any zoned
401 block device, including null_blk block devices created in zoned mode.
404 --------
406 The following formats a 15 TB host-managed SMR HDD with 256 MB zones
409 # mkzonefs -o aggr_cnv /dev/sdX
410 # mount -t zonefs /dev/sdX /mnt
411 # ls -l /mnt/
413 dr-xr-xr-x 2 root root 1 Nov 25 13:23 cnv
414 dr-xr-xr-x 2 root root 55356 Nov 25 13:23 seq
416 The size of each zone file sub-directory indicates the number of files
421 # ls -l /mnt/cnv
423 -rw-r----- 1 root root 140391743488 Nov 25 13:23 0
428 # mount -o loop /mnt/cnv/0 /data
430 The "seq" sub-directory grouping files for sequential write zones has in this
433 # ls -lv /mnt/seq
435 -rw-r----- 1 root root 0 Nov 25 13:23 0
436 -rw-r----- 1 root root 0 Nov 25 13:23 1
437 -rw-r----- 1 root root 0 Nov 25 13:23 2
439 -rw-r----- 1 root root 0 Nov 25 13:23 55354
440 -rw-r----- 1 root root 0 Nov 25 13:23 55355
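The `-v` flag in the listing above sorts the numeric file names naturally. A program enumerating zone files needs the same care, since a plain string sort would place "10" before "2". A minimal sketch follows, with the hypothetical helper name `list_zone_files`.

```python
import os


def list_zone_files(subdir: str) -> list:
    """Return the zone file names in `subdir` sorted by numeric value.

    Only purely numeric names are kept, matching the naming seen in the
    "cnv" and "seq" sub-directories.
    """
    names = [name for name in os.listdir(subdir) if name.isdigit()]
    return sorted(names, key=int)
```

For the example device, `list_zone_files("/mnt/seq")` would be expected to start with "0" and end with "55355".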
450 # ls -l /mnt/seq/0
451 -rw-r----- 1 root root 4096 Nov 25 13:23 /mnt/seq/0
456 # truncate -s 268435456 /mnt/seq/0
457 # ls -l /mnt/seq/0
458 -rw-r----- 1 root root 268435456 Nov 25 13:49 /mnt/seq/0
461 append-writes to the file::
463 # truncate -s 0 /mnt/seq/0
464 # ls -l /mnt/seq/0
465 -rw-r----- 1 root root 0 Nov 25 13:49 /mnt/seq/0
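The two truncate operations shown above can also be issued from a program. The sketch below wraps them with `os.truncate`. The helper names are hypothetical, and `zone_size` must be supplied by the caller (268435456 bytes in this example).

```python
import os


def finish_zone_file(path: str, zone_size: int) -> None:
    """Truncate the file to the zone size, as `truncate -s 268435456` does above."""
    os.truncate(path, zone_size)


def reset_zone_file(path: str) -> None:
    """Truncate the file to 0 so that append-writes can start over."""
    os.truncate(path, 0)
```

On a regular file system these calls only change the file size. The zone-level effects apply only to files on a mounted zonefs volume.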
473 Size: 0 Blocks: 524288 IO Block: 4096 regular empty file
475 Access: (0640/-rw-r-----) Uid: ( 0/ root) Gid: ( 0/ root)
476 Access: 2019-11-25 13:23:57.048971997 +0900
477 Modify: 2019-11-25 13:52:25.553805765 +0900
478 Change: 2019-11-25 13:52:25.553805765 +0900
479 Birth: -
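The "Blocks" value in the stat output above is counted in 512-byte units, so the maximum file size can be derived from it directly. The sketch below does the arithmetic and shows how the same value could be obtained through `os.stat`. The helper names are hypothetical.

```python
import os

STAT_BLOCK_SIZE = 512  # st_blocks is reported in 512-byte units


def max_size_from_blocks(st_blocks: int) -> int:
    """Convert a stat block count into a size in bytes."""
    return st_blocks * STAT_BLOCK_SIZE


def zone_file_max_size(path: str) -> int:
    """Return the maximum size of a zone file, derived from its block count."""
    return max_size_from_blocks(os.stat(path).st_blocks)
```

With the value shown above, `max_size_from_blocks(524288)` gives 268435456 bytes, that is 256 MB.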
483 capacity in this example. Of note is that the "IO block" field always