.. SPDX-License-Identifier: GPL-2.0-only

======
dm-vdo
======

The dm-vdo (virtual data optimizer) device mapper target provides
block-level deduplication, compression, and thin provisioning. As a device
mapper target, it can add these features to the storage stack, compatible
with any file system. The vdo target does not protect against data
corruption, relying instead on integrity protection of the storage below
it. It is strongly recommended that lvm be used to manage vdo volumes. See
lvmvdo(7).

Userspace component
===================

Formatting a vdo volume requires the use of the 'vdoformat' tool, available
at:

https://github.com/dm-vdo/vdo/

In most cases, a vdo target will recover from a crash automatically the
next time it is started. In cases where it encountered an unrecoverable
error (either during normal operation or crash recovery) the target will
enter or come up in read-only mode. Because read-only mode is indicative of
data loss, a positive action must be taken to bring vdo out of read-only
mode. The 'vdoforcerebuild' tool, available from the same repo, is used to
prepare a read-only vdo to exit read-only mode. After running this tool,
the vdo target will rebuild its metadata the next time it is
started. Although some data may be lost, the rebuilt vdo's metadata will be
internally consistent and the target will be writable again.

The repo also contains additional userspace tools which can be used to
inspect a vdo target's on-disk metadata. Fortunately, these tools are
rarely needed except by dm-vdo developers.

Metadata requirements
=====================

Each vdo volume reserves 3 GB of space for metadata, or more depending on
its configuration. It is helpful to check that the space saved by
deduplication and compression is not cancelled out by the metadata
requirements.
An estimation of the space saved for a specific dataset can
be computed with the vdo estimator tool, which is available at:

https://github.com/dm-vdo/vdoestimator/

Target interface
================

Table line
----------

::

    <offset> <logical device size> vdo V4 <storage device>
    <storage device size> <minimum I/O size> <block map cache size>
    <block map era length> [optional arguments]


Required parameters:

    offset:
        The offset, in sectors, at which the vdo volume's logical
        space begins.

    logical device size:
        The size of the device which the vdo volume will service,
        in sectors. Must match the current logical size of the vdo
        volume.

    storage device:
        The device holding the vdo volume's data and metadata.

    storage device size:
        The size of the device holding the vdo volume, as a number
        of 4096-byte blocks. Must match the current size of the vdo
        volume.

    minimum I/O size:
        The minimum I/O size for this vdo volume to accept, in
        bytes. Valid values are 512 or 4096. The recommended value
        is 4096.

    block map cache size:
        The size of the block map cache, as a number of 4096-byte
        blocks. The minimum and recommended value is 32768 blocks.
        If the logical thread count is non-zero, the cache size
        must be at least 4096 blocks per logical thread.

    block map era length:
        The speed with which the block map cache writes out
        modified block map pages. A smaller era length is likely to
        reduce the amount of time spent rebuilding, at the cost of
        increased block map writes during normal operation. The
        maximum and recommended value is 16380; the minimum value
        is 1.

Optional parameters:
--------------------
Some or all of these parameters may be specified as <key> <value> pairs.

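
As a rough sketch of how the required parameters and trailing <key> <value>
pairs combine into one table line, the helper below converts sizes given in
GB (taken as 2^30 bytes) into the sectors and 4096-byte blocks the table line
expects. The function name ``vdo_table_line`` is hypothetical and not part of
dm-vdo or dmsetup; it fills in the recommended values for minimum I/O size,
block map cache size, and block map era length.

```shell
#!/bin/sh
# Hypothetical helper (not part of dm-vdo): print a vdo table line.
# Usage: vdo_table_line <logical GB> <physical GB> <device> [<key> <value>]...
vdo_table_line() {
    logical_gb=$1
    physical_gb=$2
    device=$3
    shift 3

    # 1 GB (2^30 bytes) = 2097152 512-byte sectors = 262144 4096-byte blocks.
    logical_sectors=$((logical_gb * 2097152))
    physical_blocks=$((physical_gb * 262144))

    # Recommended values from the required-parameter list:
    # minimum I/O size 4096, block map cache 32768 blocks, era length 16380.
    line="0 $logical_sectors vdo V4 $device $physical_blocks 4096 32768 16380"

    # Remaining arguments are appended unchanged as <key> <value> pairs.
    while [ "$#" -ge 2 ]; do
        line="$line $1 $2"
        shift 2
    done
    printf '%s\n' "$line"
}

vdo_table_line 1 1 /dev/dm-1
vdo_table_line 4 2 /dev/dm-1 maxDiscard 8
```

The printed string can be passed to ``dmsetup create`` or ``dmsetup reload``
through ``--table``. The helper does not validate the optional keys.
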

Thread related parameters:

Different categories of work are assigned to separate thread groups, and
the number of threads in each group can be configured separately.

If <hash>, <logical>, and <physical> are all set to 0, the work handled by
all three thread types will be handled by a single thread. If any of these
values are non-zero, all of them must be non-zero.

    ack:
        The number of threads used to complete bios. Since
        completing a bio calls an arbitrary completion function
        outside the vdo volume, threads of this type allow the vdo
        volume to continue processing requests even when bio
        completion is slow. The default is 1.

    bio:
        The number of threads used to issue bios to the underlying
        storage. Threads of this type allow the vdo volume to
        continue processing requests even when bio submission is
        slow. The default is 4.

    bioRotationInterval:
        The number of bios to enqueue on each bio thread before
        switching to the next thread. The value must be greater
        than 0 and not more than 1024; the default is 64.

    cpu:
        The number of threads used to do CPU-intensive work, such
        as hashing and compression. The default is 1.

    hash:
        The number of threads used to manage data comparisons for
        deduplication based on the hash value of data blocks. The
        default is 0.

    logical:
        The number of threads used to manage caching and locking
        based on the logical address of incoming bios. The default
        is 0; the maximum is 60.

    physical:
        The number of threads used to manage administration of the
        underlying storage device. At format time, a slab size for
        the vdo is chosen; the vdo storage device must be large
        enough to have at least 1 slab per physical thread. The
        default is 0; the maximum is 16.

Miscellaneous parameters:

    maxDiscard:
        The maximum size of discard bio accepted, in 4096-byte
        blocks. I/O requests to a vdo volume are normally split
        into 4096-byte blocks, and processed up to 2048 at a time.
        However, discard requests to a vdo volume can be
        automatically split to a larger size, up to <maxDiscard>
        4096-byte blocks in a single bio, and are limited to 1500
        at a time. Increasing this value may provide better overall
        performance, at the cost of increased latency for the
        individual discard requests. The default and minimum is 1;
        the maximum is UINT_MAX / 4096.

    deduplication:
        Whether deduplication is enabled. The default is 'on'; the
        acceptable values are 'on' and 'off'.

    compression:
        Whether compression is enabled. The default is 'off'; the
        acceptable values are 'on' and 'off'.

Device modification
-------------------

A modified table may be loaded into a running, non-suspended vdo volume.
The modifications will take effect when the device is next resumed. The
modifiable parameters are <logical device size>, <physical device size>,
<maxDiscard>, <compression>, and <deduplication>.

If the logical device size or physical device size are changed, upon
successful resume vdo will store the new values and require them on future
startups. These two parameters may not be decreased. The logical device
size may not exceed 4 PB. The physical device size must increase by at
least 32832 4096-byte blocks if at all, and must not exceed the size of the
underlying storage device. Additionally, when formatting the vdo device, a
slab size is chosen: the physical device size may never increase above the
size which provides 8192 slabs, and each increase must be large enough to
add at least one new slab.

Examples:

Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
physical space, storing to /dev/dm-1 which has more than 1 GB of space.


::

    dmsetup create vdo0 --table \
    "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"

Grow the logical size to 4 GB.

::

    dmsetup reload vdo0 --table \
    "0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
    dmsetup resume vdo0

Grow the physical size to 2 GB.

::

    dmsetup reload vdo0 --table \
    "0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
    dmsetup resume vdo0

Grow the physical size by 1 GB more, grow the logical size to 5 GB, and
increase the maximum discard size.

::

    dmsetup reload vdo0 --table \
    "0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
    dmsetup resume vdo0

Stop the vdo volume.

::

    dmsetup remove vdo0

Start the vdo volume again. Note that the logical and physical device sizes
must still match, but other parameters can change.

::

    dmsetup create vdo1 --table \
    "0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"

Messages
--------
All vdo devices accept messages in the form:

::

    dmsetup message <target-name> 0 <message-name> <message-parameters>

The messages are:

    stats:
        Outputs the current view of the vdo statistics. Mostly used
        by the vdostats userspace program to interpret the output
        buffer.

    config:
        Outputs useful vdo configuration information. Mostly used
        by users who want to recreate a similar VDO volume and
        want to know the creation configuration used.

    dump:
        Dumps many internal structures to the system log. This is
        not always safe to run, so it should only be used to debug
        a hung vdo. Optional parameters to specify structures to
        dump are:

            viopool: The pool of I/O requests for incoming bios
            pools: A synonym of 'viopool'
            vdo: Most of the structures managing on-disk data
            queues: Basic information about each vdo thread
            threads: A synonym of 'queues'
            default: Equivalent to 'queues vdo'
            all: All of the above.

    dump-on-shutdown:
        Perform a default dump next time vdo shuts down.


Status
------

::

    <device> <operating mode> <in recovery> <index state>
    <compression state> <used physical blocks> <total physical blocks>

    device:
        The name of the vdo volume.

    operating mode:
        The current operating mode of the vdo volume; values may be
        'normal', 'recovering' (the volume has detected an issue
        with its metadata and is attempting to repair itself), and
        'read-only' (an error has occurred that forces the vdo
        volume to only support read operations and not writes).

    in recovery:
        Whether the vdo volume is currently in recovery mode;
        values may be 'recovering' or '-' which indicates not
        recovering.

    index state:
        The current state of the deduplication index in the vdo
        volume; values may be 'closed', 'closing', 'error',
        'offline', 'online', 'opening', and 'unknown'.

    compression state:
        The current state of compression in the vdo volume; values
        may be 'offline' and 'online'.

    used physical blocks:
        The number of physical blocks in use by the vdo volume.

    total physical blocks:
        The total number of physical blocks the vdo volume may use;
        the difference between this value and the
        <used physical blocks> is the number of blocks the vdo
        volume has left before being full.

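
The free-space calculation described for <total physical blocks> can be
sketched in shell. The function name ``vdo_free_blocks`` is hypothetical, the
sample status values are invented for illustration, and the sketch assumes the
line has already been reduced to the seven fields listed above (the generic
fields that dmsetup places in front of the target status are not handled).

```shell
#!/bin/sh
# Hypothetical helper: print the number of free physical blocks given a
# vdo status line consisting of the seven documented fields.
vdo_free_blocks() {
    # Split the line on whitespace into positional parameters.
    set -- $1
    if [ "$#" -ne 7 ]; then
        echo "expected 7 status fields, got $#" >&2
        return 1
    fi
    used=$6     # <used physical blocks>
    total=$7    # <total physical blocks>
    echo $((total - used))
}

# Sample status line; every value here is made up.
sample="/dev/dm-1 normal - online online 100000 262144"
vdo_free_blocks "$sample"
```

Multiplying the result by 4096 gives the remaining physical space in bytes.
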

Memory Requirements
===================

A vdo target requires a fixed 38 MB of RAM along with the following amounts
that scale with the target:

- 1.15 MB of RAM for each 1 MB of configured block map cache size. The
  block map cache requires a minimum of 150 MB.
- 1.6 MB of RAM for each 1 TB of logical space.
- 268 MB of RAM for each 1 TB of physical storage managed by the volume.

The deduplication index requires additional memory which scales with the
size of the deduplication window. For dense indexes, the index requires 1
GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB
of RAM per 10 TB of window. The index configuration is set when the target
is formatted and may not be modified.

Module Parameters
=================

The vdo driver has a numeric parameter 'log_level' which controls the
verbosity of logging from the driver. The default setting is 6
(LOGLEVEL_INFO and more severe messages).

Run-time Usage
==============

When using dm-vdo, it is important to be aware of the ways in which its
behavior differs from other storage targets.

- There is no guarantee that over-writes of existing blocks will succeed.
  Because the underlying storage may be multiply referenced, over-writing
  an existing block generally requires a vdo to have a free block
  available.

- When blocks are no longer in use, sending a discard request for those
  blocks lets the vdo release references for those blocks. If the vdo is
  thinly provisioned, discarding unused blocks is essential to prevent the
  target from running out of space. However, due to the sharing of
  duplicate blocks, no discard request for any given logical block is
  guaranteed to reclaim space.

- Assuming the underlying storage properly implements flush requests, vdo
  is resilient against crashes; however, unflushed writes may or may not
  persist after a crash.

- Each write to a vdo target entails a significant amount of processing.
  However, much of the work is parallelizable. Therefore, vdo targets
  achieve better throughput at higher I/O depths, and can support up to
  2048 requests in parallel.

Tuning
======

The vdo device has many options, and it can be difficult to make optimal
choices without perfect knowledge of the workload. Additionally, most
configuration options must be set when a vdo target is started and cannot
be changed while the target is active; changing them requires shutting the
target down completely. Ideally, tuning with simulated workloads should be
performed before deploying vdo in production environments.

The most important value to adjust is the block map cache size. In order to
service a request for any logical address, a vdo must load the portion of
the block map which holds the relevant mapping. These mappings are cached.
Performance will suffer when the working set does not fit in the cache. By
default, a vdo allocates 128 MB of metadata cache in RAM to support
efficient access to 100 GB of logical space at a time. It should be scaled
up proportionally for larger working sets.

The logical and physical thread counts should also be adjusted. A logical
thread controls a disjoint section of the block map, so additional logical
threads increase parallelism and can increase throughput. Physical threads
control a disjoint section of the data blocks, so additional physical
threads can also increase throughput. However, excess threads can waste
resources and increase contention.

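
The proportional scaling of the block map cache described in this section can
be sketched as follows. The function name ``vdo_cache_blocks`` is
hypothetical. It assumes the ratio of 128 MB (32768 4096-byte blocks) per
100 GB of working set stated above, rounds up, and then applies the two lower
bounds given for <block map cache size>: 32768 blocks overall and 4096 blocks
per logical thread.

```shell
#!/bin/sh
# Hypothetical helper: suggest a <block map cache size> in 4096-byte blocks.
# Usage: vdo_cache_blocks <working set in GB> [<logical thread count>]
vdo_cache_blocks() {
    working_set_gb=$1
    logical_threads=${2:-0}

    # 32768 blocks (128 MB) per 100 GB of working set, rounded up.
    blocks=$(( (working_set_gb * 32768 + 99) / 100 ))

    # Never go below the documented minimum of 32768 blocks.
    if [ "$blocks" -lt 32768 ]; then
        blocks=32768
    fi

    # The cache must hold at least 4096 blocks per logical thread.
    per_thread=$((logical_threads * 4096))
    if [ "$blocks" -lt "$per_thread" ]; then
        blocks=$per_thread
    fi

    echo "$blocks"
}

vdo_cache_blocks 100      # default-sized working set
vdo_cache_blocks 250 4    # larger working set, 4 logical threads
```

The result is only a starting point; the working set of a real workload is
best measured with simulated load as recommended above.
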

Bio submission threads control the parallelism involved in sending I/O to
the underlying storage; fewer threads mean there is more opportunity to
reorder I/O requests for performance benefit, but also that each I/O
request has to wait longer before being submitted.

Bio acknowledgment threads are used for finishing I/O requests. This is
done on dedicated threads since the amount of work required to execute a
bio's callback cannot be controlled by the vdo itself. Usually one thread
is sufficient but additional threads may be beneficial, particularly when
bios have CPU-heavy callbacks.

CPU threads are used for hashing and for compression; in workloads with
compression enabled, more threads may result in higher throughput.

Hash threads are used to sort active requests by hash and determine whether
they should deduplicate; the most CPU-intensive action performed by these
threads is the comparison of 4096-byte data blocks. In most cases, a single
hash thread is sufficient.

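
Before loading a table with tuned thread counts, it can help to check them
against the documented constraints: <hash>, <logical>, and <physical> must be
all zero or all non-zero, <logical> may not exceed 60, and <physical> may not
exceed 16. The sketch below is one way to do that; the function name
``vdo_check_threads`` is hypothetical and the check is not performed by any
dm-vdo tool in this form.

```shell
#!/bin/sh
# Hypothetical helper: validate hash/logical/physical thread counts and
# print them as <key> <value> pairs suitable for a vdo table line.
vdo_check_threads() {
    hash_threads=$1
    logical_threads=$2
    physical_threads=$3

    # Count how many of the three values are non-zero.
    nonzero=0
    for n in "$hash_threads" "$logical_threads" "$physical_threads"; do
        if [ "$n" -gt 0 ]; then
            nonzero=$((nonzero + 1))
        fi
    done

    if [ "$nonzero" -ne 0 ] && [ "$nonzero" -ne 3 ]; then
        echo "hash, logical, and physical must be all zero or all non-zero" >&2
        return 1
    fi
    if [ "$logical_threads" -gt 60 ]; then
        echo "logical thread count exceeds the maximum of 60" >&2
        return 1
    fi
    if [ "$physical_threads" -gt 16 ]; then
        echo "physical thread count exceeds the maximum of 16" >&2
        return 1
    fi

    echo "hash $hash_threads logical $logical_threads physical $physical_threads"
}

vdo_check_threads 1 3 2
```

The sketch does not check the slab-related limit on physical threads or the
cache-size requirement per logical thread, both of which depend on how the
volume was formatted and configured.
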