.. SPDX-License-Identifier: GPL-2.0-only

dm-vdo
======

The dm-vdo (virtual data optimizer) device mapper target provides
block-level deduplication, compression, and thin provisioning. As a device
mapper target, it can add these features to the storage stack, compatible
with any file system. The vdo target does not protect against data
corruption, relying instead on integrity protection of the storage below
it. It is strongly recommended that lvm be used to manage vdo volumes. See
lvmvdo(7).

Userspace component
===================

Formatting a vdo volume requires the use of the 'vdoformat' tool, available
at:

https://github.com/dm-vdo/vdo/

In most cases, a vdo target will recover from a crash automatically the
next time it is started. In cases where it encountered an unrecoverable
error (either during normal operation or crash recovery) the target will
enter or come up in read-only mode. Because read-only mode is indicative of
data loss, a positive action must be taken to bring vdo out of read-only
mode. The 'vdoforcerebuild' tool, available from the same repo, is used to
prepare a read-only vdo to exit read-only mode. After running this tool,
the vdo target will rebuild its metadata the next time it is
started. Although some data may be lost, the rebuilt vdo's metadata will be
internally consistent and the target will be writable again.

The repo also contains additional userspace tools which can be used to
inspect a vdo target's on-disk metadata. Fortunately, these tools are
rarely needed except by dm-vdo developers.

Metadata requirements
=====================

Each vdo volume reserves 3 GB of space for metadata, or more depending on
its configuration. It is helpful to check that the space saved by
deduplication and compression is not cancelled out by the metadata
requirements.
An estimation of the space saved for a specific dataset can
be computed with the vdo estimator tool, which is available at:

https://github.com/dm-vdo/vdoestimator/

Target interface
================

Table line
----------

::

        <offset> <logical device size> vdo V4 <storage device>
        <storage device size> <minimum I/O size> <block map cache size>
        <block map era length> [optional arguments]


Required parameters:

        offset:
                The offset, in sectors, at which the vdo volume's logical
                space begins.

        logical device size:
                The size of the device which the vdo volume will service,
                in sectors. Must match the current logical size of the vdo
                volume.

        storage device:
                The device holding the vdo volume's data and metadata.

        storage device size:
                The size of the device holding the vdo volume, as a number
                of 4096-byte blocks. Must match the current size of the vdo
                volume.

        minimum I/O size:
                The minimum I/O size for this vdo volume to accept, in
                bytes. Valid values are 512 or 4096. The recommended value
                is 4096.

        block map cache size:
                The size of the block map cache, as a number of 4096-byte
                blocks. The minimum and recommended value is 32768 blocks.
                If the logical thread count is non-zero, the cache size
                must be at least 4096 blocks per logical thread.

        block map era length:
                The speed with which the block map cache writes out
                modified block map pages. A smaller era length is likely to
                reduce the amount of time spent rebuilding, at the cost of
                increased block map writes during normal operation. The
                maximum and recommended value is 16380; the minimum value
                is 1.

Optional parameters:
--------------------
Some or all of these parameters may be specified as <key> <value> pairs.

Thread-related parameters:

Different categories of work are assigned to separate thread groups, and
the number of threads in each group can be configured separately.

If <hash>, <logical>, and <physical> are all set to 0, the work handled by
all three thread types will be handled by a single thread. If any of these
values are non-zero, all of them must be non-zero.

        ack:
                The number of threads used to complete bios. Since
                completing a bio calls an arbitrary completion function
                outside the vdo volume, threads of this type allow the vdo
                volume to continue processing requests even when bio
                completion is slow. The default is 1.

        bio:
                The number of threads used to issue bios to the underlying
                storage. Threads of this type allow the vdo volume to
                continue processing requests even when bio submission is
                slow. The default is 4.

        bioRotationInterval:
                The number of bios to enqueue on each bio thread before
                switching to the next thread. The value must be greater
                than 0 and not more than 1024; the default is 64.

        cpu:
                The number of threads used to do CPU-intensive work, such
                as hashing and compression. The default is 1.

        hash:
                The number of threads used to manage data comparisons for
                deduplication based on the hash value of data blocks. The
                default is 0.

        logical:
                The number of threads used to manage caching and locking
                based on the logical address of incoming bios. The default
                is 0; the maximum is 60.

        physical:
                The number of threads used to manage administration of the
                underlying storage device. At format time, a slab size for
                the vdo is chosen; the vdo storage device must be large
                enough to have at least 1 slab per physical thread. The
                default is 0; the maximum is 16.

Miscellaneous parameters:

        maxDiscard:
                The maximum size of a discard bio accepted, in 4096-byte
                blocks. I/O requests to a vdo volume are normally split
                into 4096-byte blocks, and processed up to 2048 at a time.
                However, discard requests to a vdo volume can be
                automatically split to a larger size, up to <maxDiscard>
                4096-byte blocks in a single bio, and are limited to 1500
                at a time. Increasing this value may provide better overall
                performance, at the cost of increased latency for the
                individual discard requests. The default and minimum is 1;
                the maximum is UINT_MAX / 4096.

        deduplication:
                Whether deduplication is enabled. The default is 'on'; the
                acceptable values are 'on' and 'off'.

        compression:
                Whether compression is enabled. The default is 'off'; the
                acceptable values are 'on' and 'off'.

Device modification
-------------------

A modified table may be loaded into a running, non-suspended vdo volume.
The modifications will take effect when the device is next resumed. The
modifiable parameters are <logical device size>, <storage device size>
(the physical device size), <maxDiscard>, <compression>, and
<deduplication>.

If the logical device size or physical device size is changed, upon
successful resume vdo will store the new values and require them on future
startups. These two parameters may not be decreased. The logical device
size may not exceed 4 PB. The physical device size must increase by at
least 32832 4096-byte blocks if at all, and must not exceed the size of the
underlying storage device. Additionally, when formatting the vdo device, a
slab size is chosen: the physical device size may never increase above the
size which provides 8192 slabs, and each increase must be large enough to
add at least one new slab.

Examples:

Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
physical space, storing to /dev/dm-1 which has more than 1 GB of space.

::

        dmsetup create vdo0 --table \
        "0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"

Grow the logical size to 4 GB.

::

        dmsetup reload vdo0 --table \
        "0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
        dmsetup resume vdo0

Grow the physical size to 2 GB.

::

        dmsetup reload vdo0 --table \
        "0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
        dmsetup resume vdo0

Grow the physical size by 1 GB more (to 3 GB), grow the logical size to
5 GB, and increase the maximum discard size to 8 blocks.

::

        dmsetup reload vdo0 --table \
        "0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
        dmsetup resume vdo0

Stop the vdo volume.

::

        dmsetup remove vdo0

Start the vdo volume again. Note that the logical and physical device sizes
must still match, but other parameters can change.

::

        dmsetup create vdo1 --table \
        "0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"

Messages
--------
All vdo devices accept messages in the form:

::

        dmsetup message <target-name> 0 <message-name> <message-parameters>

The messages are:

        stats:
                Outputs the current view of the vdo statistics. Mostly used
                by the vdostats userspace program to interpret the output
                buffer.

        dump:
                Dumps many internal structures to the system log. This is
                not always safe to run, so it should only be used to debug
                a hung vdo. Optional parameters to specify structures to
                dump are:

                        viopool: The pool of I/O requests for incoming bios
                        pools: A synonym of 'viopool'
                        vdo: Most of the structures managing on-disk data
                        queues: Basic information about each vdo thread
                        threads: A synonym of 'queues'
                        default: Equivalent to 'queues vdo'
                        all: All of the above.

        dump-on-shutdown:
                Perform a default dump next time vdo shuts down.
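
The message form above can be wrapped in a small shell helper. The
following is a minimal sketch, not part of dm-vdo or its userspace tools:
the function name 'vdo_message_cmd' is hypothetical, and it only prints the
'dmsetup message' command line for one of the message names listed above
rather than running it, so the command can be reviewed first.

```shell
# Hypothetical helper (not shipped with dm-vdo): print the dmsetup command
# that would send a message to a vdo target, without executing it.
vdo_message_cmd() {
    if [ "$#" -lt 2 ]; then
        echo "usage: vdo_message_cmd <target-name> <message-name> [parameters]" >&2
        return 1
    fi
    target=$1
    message=$2
    shift 2
    # Accept only the message names listed in this document.
    case "$message" in
        stats|dump|dump-on-shutdown) ;;
        *)
            echo "unknown vdo message: $message" >&2
            return 1
            ;;
    esac
    # The literal 0 matches the documented message form.
    if [ "$#" -gt 0 ]; then
        echo "dmsetup message $target 0 $message $*"
    else
        echo "dmsetup message $target 0 $message"
    fi
}

# Example: show the command that would dump the queue and vdo structures.
vdo_message_cmd vdo0 dump queues vdo
```

Printing the command instead of executing it is a deliberate choice here,
since the 'dump' message is described above as not always safe to run.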


Status
------

::

        <device> <operating mode> <in recovery> <index state>
        <compression state> <used physical blocks> <total physical blocks>

        device:
                The name of the vdo volume.

        operating mode:
                The current operating mode of the vdo volume; values may be
                'normal', 'recovering' (the volume has detected an issue
                with its metadata and is attempting to repair itself), and
                'read-only' (an error has occurred that forces the vdo
                volume to only support read operations and not writes).

        in recovery:
                Whether the vdo volume is currently in recovery mode;
                values may be 'recovering' or '-' which indicates not
                recovering.

        index state:
                The current state of the deduplication index in the vdo
                volume; values may be 'closed', 'closing', 'error',
                'offline', 'online', 'opening', and 'unknown'.

        compression state:
                The current state of compression in the vdo volume; values
                may be 'offline' and 'online'.

        used physical blocks:
                The number of physical blocks in use by the vdo volume.

        total physical blocks:
                The total number of physical blocks the vdo volume may use;
                the difference between this value and the
                <used physical blocks> is the number of blocks the vdo
                volume has left before being full.

Memory Requirements
===================

A vdo target requires a fixed 38 MB of RAM along with the following amounts
that scale with the target:

- 1.15 MB of RAM for each 1 MB of configured block map cache size. The
  block map cache requires a minimum of 150 MB.
- 1.6 MB of RAM for each 1 TB of logical space.
- 268 MB of RAM for each 1 TB of physical storage managed by the volume.

The deduplication index requires additional memory which scales with the
size of the deduplication window. For dense indexes, the index requires 1
GB of RAM per 1 TB of window.
For sparse indexes, the index requires 1 GB
of RAM per 10 TB of window. The index configuration is set when the target
is formatted and may not be modified.

Module Parameters
=================

The vdo driver has a numeric parameter 'log_level' which controls the
verbosity of logging from the driver. The default setting is 6
(LOGLEVEL_INFO and more severe messages).

Run-time Usage
==============

When using dm-vdo, it is important to be aware of the ways in which its
behavior differs from other storage targets.

- There is no guarantee that over-writes of existing blocks will succeed.
  Because the underlying storage may be multiply referenced, over-writing
  an existing block generally requires a vdo to have a free block
  available.

- When blocks are no longer in use, sending a discard request for those
  blocks lets the vdo release references for those blocks. If the vdo is
  thinly provisioned, discarding unused blocks is essential to prevent the
  target from running out of space. However, due to the sharing of
  duplicate blocks, no discard request for any given logical block is
  guaranteed to reclaim space.

- Assuming the underlying storage properly implements flush requests, vdo
  is resilient against crashes. However, unflushed writes may or may not
  persist after a crash.

- Each write to a vdo target entails a significant amount of processing.
  However, much of the work is parallelizable. Therefore, vdo targets
  achieve better throughput at higher I/O depths, and can support up to
  2048 requests in parallel.

Tuning
======

The vdo device has many options, and it can be difficult to make optimal
choices without perfect knowledge of the workload.
Additionally, most
configuration options must be set when a vdo target is started and cannot
be changed while the target is active; changing them requires shutting the
target down completely. Ideally, tuning with simulated workloads should be
performed before deploying vdo in production environments.

The most important value to adjust is the block map cache size. In order to
service a request for any logical address, a vdo must load the portion of
the block map which holds the relevant mapping. These mappings are cached.
Performance will suffer when the working set does not fit in the cache. By
default, a vdo allocates 128 MB of metadata cache in RAM to support
efficient access to 100 GB of logical space at a time. It should be scaled
up proportionally for larger working sets.

The logical and physical thread counts should also be adjusted. A logical
thread controls a disjoint section of the block map, so additional logical
threads increase parallelism and can increase throughput. Physical threads
control a disjoint section of the data blocks, so additional physical
threads can also increase throughput. However, excess threads can waste
resources and increase contention.

Bio submission threads control the parallelism involved in sending I/O to
the underlying storage; fewer threads mean there is more opportunity to
reorder I/O requests for performance benefit, but also that each I/O
request has to wait longer before being submitted.

Bio acknowledgment threads are used for finishing I/O requests. This is
done on dedicated threads since the amount of work required to execute a
bio's callback cannot be controlled by the vdo itself. Usually one thread
is sufficient but additional threads may be beneficial, particularly when
bios have CPU-heavy callbacks.

CPU threads are used for hashing and for compression; in workloads with
compression enabled, more threads may result in higher throughput.

Hash threads are used to sort active requests by hash and determine whether
they should deduplicate; the most CPU-intensive action performed by these
threads is the comparison of 4096-byte data blocks. In most cases, a single
hash thread is sufficient.
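
The block map cache guidance earlier in this section can be turned into a
rough calculation. The following is a minimal sketch under stated
assumptions, not a tool provided by dm-vdo: the function name
'vdo_block_map_cache_blocks' is hypothetical, "scaled up proportionally" is
read as linear scaling from 128 MB (32768 4096-byte blocks) per 100 GB of
working set, and the result is raised to the documented minimum of 32768
blocks and to at least 4096 blocks per logical thread.

```shell
# Hypothetical sizing helper (not part of dm-vdo).
# Usage: vdo_block_map_cache_blocks <working-set-in-GB> [logical-threads]
# Prints a candidate <block map cache size> in 4096-byte blocks.
vdo_block_map_cache_blocks() {
    working_set_gb=$1
    logical_threads=${2:-0}

    # Assumption: linear scaling, 32768 blocks (128 MB) per 100 GB of
    # working set, rounded up to a whole block.
    blocks=$(( (working_set_gb * 32768 + 99) / 100 ))

    # Documented minimum cache size.
    if [ "$blocks" -lt 32768 ]; then
        blocks=32768
    fi

    # Documented constraint: at least 4096 blocks per logical thread.
    per_thread_min=$(( logical_threads * 4096 ))
    if [ "$blocks" -lt "$per_thread_min" ]; then
        blocks=$per_thread_min
    fi

    echo "$blocks"
}

# Example: a 250 GB working set with 3 logical threads.
vdo_block_map_cache_blocks 250 3
```

The value printed is only a starting point for the <block map cache size>
field of the table line; testing with a simulated workload remains the
better guide.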