.. SPDX-License-Identifier: GPL-2.0-only

======
dm-vdo
======

The dm-vdo (virtual data optimizer) device mapper target provides
block-level deduplication, compression, and thin provisioning. As a device
mapper target, it can add these features to the storage stack, compatible
with any file system. The vdo target does not protect against data
corruption, relying instead on integrity protection of the storage below
it. It is strongly recommended that lvm be used to manage vdo volumes. See
lvmvdo(7).

Userspace component
===================

Formatting a vdo volume requires the use of the 'vdoformat' tool, available
at:

https://github.com/dm-vdo/vdo/

In most cases, a vdo target will recover from a crash automatically the
next time it is started. In cases where it encountered an unrecoverable
error (either during normal operation or crash recovery) the target will
enter or come up in read-only mode. Because read-only mode is indicative of
data loss, an explicit action must be taken to bring vdo out of read-only
mode. The 'vdoforcerebuild' tool, available from the same repo, is used to
prepare a read-only vdo to exit read-only mode. After running this tool,
the vdo target will rebuild its metadata the next time it is
started. Although some data may be lost, the rebuilt vdo's metadata will be
internally consistent and the target will be writable again.

The repo also contains additional userspace tools which can be used to
inspect a vdo target's on-disk metadata. Fortunately, these tools are
rarely needed except by dm-vdo developers.

Metadata requirements
=====================

Each vdo volume reserves 3 GB of space for metadata, or more depending on
its configuration. It is helpful to check that the space saved by
deduplication and compression is not cancelled out by the metadata
requirements. An estimate of the space saved for a specific dataset can
be computed with the vdo estimator tool, which is available at:

https://github.com/dm-vdo/vdoestimator/

Target interface
================

Table line
----------

::

	<offset> <logical device size> vdo V4 <storage device>
	<storage device size> <minimum I/O size> <block map cache size>
	<block map era length> [optional arguments]


Required parameters:

	offset:
		The offset, in sectors, at which the vdo volume's logical
		space begins.

	logical device size:
		The size of the device which the vdo volume will service,
		in sectors. Must match the current logical size of the vdo
		volume.

	storage device:
		The device holding the vdo volume's data and metadata.

	storage device size:
		The size of the device holding the vdo volume, as a number
		of 4096-byte blocks. Must match the current size of the vdo
		volume.

	minimum I/O size:
		The minimum I/O size for this vdo volume to accept, in
		bytes. Valid values are 512 or 4096. The recommended value
		is 4096.

	block map cache size:
		The size of the block map cache, as a number of 4096-byte
		blocks. The minimum and recommended value is 32768 blocks.
		If the logical thread count is non-zero, the cache size
		must be at least 4096 blocks per logical thread.

	block map era length:
		The speed with which the block map cache writes out
		modified block map pages. A smaller era length is likely to
		reduce the amount of time spent rebuilding, at the cost of
		increased block map writes during normal operation. The
		maximum and recommended value is 16380; the minimum value
		is 1.

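Because the table line mixes units (512-byte sectors for the logical size,
4096-byte blocks for the storage and cache sizes), a small shell sketch of
the conversions may help. The helper names below are hypothetical and are
not part of any vdo tool; sizes are passed in bytes.

::

	# Hypothetical helpers; they are not provided by dm-vdo.
	bytes_to_sectors() { echo $(( $1 / 512 )); }	# <logical device size>
	bytes_to_blocks() { echo $(( $1 / 4096 )); }	# <storage device size>

	# Smallest <block map cache size> for a given logical thread count:
	# 32768 blocks, or 4096 blocks per logical thread if that is larger.
	min_cache_blocks() {
		need=$(( $1 * 4096 ))
		if [ "$need" -lt 32768 ]; then
			need=32768
		fi
		echo "$need"
	}

	GIB=$(( 1024 * 1024 * 1024 ))
	bytes_to_sectors "$GIB"		# prints 2097152
	bytes_to_blocks "$GIB"		# prints 262144
	min_cache_blocks 60		# prints 245760

The first two results are the values used for a 1 GB volume in the
examples later in this document.
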
Optional parameters:
--------------------

Some or all of these parameters may be specified as <key> <value> pairs.

Thread related parameters:

Different categories of work are assigned to separate thread groups, and
the number of threads in each group can be configured separately.

If <hash>, <logical>, and <physical> are all set to 0, the work handled by
all three thread types will be handled by a single thread. If any of these
values is non-zero, all of them must be non-zero.

	ack:
		The number of threads used to complete bios. Since
		completing a bio calls an arbitrary completion function
		outside the vdo volume, threads of this type allow the vdo
		volume to continue processing requests even when bio
		completion is slow. The default is 1.

	bio:
		The number of threads used to issue bios to the underlying
		storage. Threads of this type allow the vdo volume to
		continue processing requests even when bio submission is
		slow. The default is 4.

	bioRotationInterval:
		The number of bios to enqueue on each bio thread before
		switching to the next thread. The value must be greater
		than 0 and not more than 1024; the default is 64.

	cpu:
		The number of threads used to do CPU-intensive work, such
		as hashing and compression. The default is 1.

	hash:
		The number of threads used to manage data comparisons for
		deduplication based on the hash value of data blocks. The
		default is 0.

	logical:
		The number of threads used to manage caching and locking
		based on the logical address of incoming bios. The default
		is 0; the maximum is 60.

	physical:
		The number of threads used to manage administration of the
		underlying storage device. At format time, a slab size for
		the vdo is chosen; the vdo storage device must be large
		enough to have at least 1 slab per physical thread. The
		default is 0; the maximum is 16.

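The all-zero-or-all-non-zero rule and the documented maxima can be checked
before a table is loaded. The following is a minimal sketch; the function
name 'check_threads' is hypothetical, and the check covers only the rules
stated above (it does not verify the slab requirement for physical
threads, which depends on the slab size chosen at format time).

::

	# check_threads <hash> <logical> <physical>
	# Returns 0 if the combination follows the rules described above.
	check_threads() {
		h=$1
		l=$2
		p=$3
		# All three zero: a single thread handles all three roles.
		if [ "$h" -eq 0 ] && [ "$l" -eq 0 ] && [ "$p" -eq 0 ]; then
			return 0
		fi
		# Otherwise every one of them must be non-zero.
		if [ "$h" -le 0 ] || [ "$l" -le 0 ] || [ "$p" -le 0 ]; then
			return 1
		fi
		# Documented maxima: 60 logical threads, 16 physical threads.
		[ "$l" -le 60 ] && [ "$p" -le 16 ]
	}

	check_threads 1 3 2 && echo "ok"		# prints ok
	check_threads 1 0 2 || echo "rejected"	# prints rejected
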
Miscellaneous parameters:

	maxDiscard:
		The maximum size of a discard bio accepted, in 4096-byte
		blocks. I/O requests to a vdo volume are normally split
		into 4096-byte blocks, and processed up to 2048 at a time.
		However, discard requests to a vdo volume can be
		automatically split to a larger size, up to <maxDiscard>
		4096-byte blocks in a single bio, and are limited to 1500
		at a time. Increasing this value may provide better overall
		performance, at the cost of increased latency for the
		individual discard requests. The default and minimum is 1;
		the maximum is UINT_MAX / 4096.

	deduplication:
		Whether deduplication is enabled. The default is 'on'; the
		acceptable values are 'on' and 'off'.

	compression:
		Whether compression is enabled. The default is 'off'; the
		acceptable values are 'on' and 'off'.

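As a worked example of the <maxDiscard> upper bound, the sketch below
assumes a 32-bit unsigned int (UINT_MAX = 4294967295) and a shell whose
arithmetic is at least 64 bits wide. The function names are hypothetical.

::

	# Assumption: unsigned int is 32 bits wide.
	UINT_MAX=4294967295

	# Largest accepted <maxDiscard>, in 4096-byte blocks.
	max_discard_limit() { echo $(( UINT_MAX / 4096 )); }

	# valid_max_discard <blocks>: returns 0 if within the documented range.
	valid_max_discard() {
		[ "$1" -ge 1 ] && [ "$1" -le "$(max_discard_limit)" ]
	}

	max_discard_limit	# prints 1048575

Under those assumptions a single discard bio can cover at most 1048575
blocks, which is just under 4 GB.
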
Device modification
-------------------

A modified table may be loaded into a running, non-suspended vdo volume.
The modifications will take effect when the device is next resumed. The
modifiable parameters are <logical device size>, <storage device size>
(referred to below as the physical device size), <maxDiscard>,
<compression>, and <deduplication>.

If the logical device size or physical device size is changed, upon
successful resume vdo will store the new values and require them on future
startups. These two parameters may not be decreased. The logical device
size may not exceed 4 PB. The physical device size must increase by at
least 32832 4096-byte blocks if at all, and must not exceed the size of the
underlying storage device. Additionally, when formatting the vdo device, a
slab size is chosen: the physical device size may never increase above the
size which provides 8192 slabs, and each increase must be large enough to
add at least one new slab.

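The block-count rules for growing the physical size can be sketched as a
pre-check. The function name 'can_grow_physical' is hypothetical, and the
sketch deliberately ignores the slab-related limits because the slab size
is only known from the format-time configuration.

::

	# can_grow_physical <old blocks> <new blocks> <underlying device blocks>
	# All sizes are counts of 4096-byte blocks.
	can_grow_physical() {
		old=$1
		new=$2
		dev=$3
		# Must grow by at least 32832 blocks and still fit the device.
		[ "$new" -ge $(( old + 32832 )) ] && [ "$new" -le "$dev" ]
	}

	# Growing from 1 GB to 2 GB on a 4 GB device is acceptable.
	can_grow_physical 262144 524288 1048576 && echo "ok"	# prints ok
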
Examples:

Start a previously-formatted vdo volume with 1 GB logical space and 1 GB
physical space, storing to /dev/dm-1 which has more than 1 GB of space.

::

	dmsetup create vdo0 --table \
	"0 2097152 vdo V4 /dev/dm-1 262144 4096 32768 16380"

Grow the logical size to 4 GB.

::

	dmsetup reload vdo0 --table \
	"0 8388608 vdo V4 /dev/dm-1 262144 4096 32768 16380"
	dmsetup resume vdo0

Grow the physical size to 2 GB.

::

	dmsetup reload vdo0 --table \
	"0 8388608 vdo V4 /dev/dm-1 524288 4096 32768 16380"
	dmsetup resume vdo0

Grow the physical size by 1 GB more (to 3 GB), grow the logical size to
5 GB, and raise maxDiscard to 8 blocks.

::

	dmsetup reload vdo0 --table \
	"0 10485760 vdo V4 /dev/dm-1 786432 4096 32768 16380 maxDiscard 8"
	dmsetup resume vdo0

Stop the vdo volume.

::

	dmsetup remove vdo0

Start the vdo volume again. Note that the logical and physical device sizes
must still match, but other parameters can change.

::

	dmsetup create vdo1 --table \
	"0 10485760 vdo V4 /dev/dm-1 786432 512 65550 5000 hash 1 logical 3 physical 2"

Messages
--------

All vdo devices accept messages in the form:

::

	dmsetup message <target-name> 0 <message-name> <message-parameters>

The messages are:

	stats:
		Outputs the current view of the vdo statistics. Mostly used
		by the vdostats userspace program to interpret the output
		buffer.

	config:
		Outputs useful vdo configuration information. Mostly used
		by users who want to recreate a similar VDO volume and
		want to know the creation configuration used.

	dump:
		Dumps many internal structures to the system log. This is
		not always safe to run, so it should only be used to debug
		a hung vdo. Optional parameters to specify structures to
		dump are:

			viopool: The pool of I/O requests for incoming bios
			pools: A synonym of 'viopool'
			vdo: Most of the structures managing on-disk data
			queues: Basic information about each vdo thread
			threads: A synonym of 'queues'
			default: Equivalent to 'queues vdo'
			all: All of the above.

	dump-on-shutdown:
		Perform a default dump next time vdo shuts down.


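A thin wrapper can keep the fixed '0' sector argument out of the way when
sending messages. This is only a sketch: 'vdo_message' is a hypothetical
helper name, and 'vdo0' is the volume name used in the examples above.

::

	# vdo_message <target-name> <message-name> [message-parameters...]
	vdo_message() {
		dev=$1
		shift
		dmsetup message "$dev" 0 "$@"
	}

	# Example invocations (they need a running vdo volume named vdo0):
	#   vdo_message vdo0 stats
	#   vdo_message vdo0 config
	#   vdo_message vdo0 dump queues vdo
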
Status
------

::

	<device> <operating mode> <in recovery> <index state>
	<compression state> <used physical blocks> <total physical blocks>

	device:
		The name of the vdo volume.

	operating mode:
		The current operating mode of the vdo volume; values may be
		'normal', 'recovering' (the volume has detected an issue
		with its metadata and is attempting to repair itself), or
		'read-only' (an error has occurred that forces the vdo
		volume to only support read operations and not writes).

	in recovery:
		Whether the vdo volume is currently in recovery mode;
		values may be 'recovering' or '-' which indicates not
		recovering.

	index state:
		The current state of the deduplication index in the vdo
		volume; values may be 'closed', 'closing', 'error',
		'offline', 'online', 'opening', and 'unknown'.

	compression state:
		The current state of compression in the vdo volume; values
		may be 'offline' and 'online'.

	used physical blocks:
		The number of physical blocks in use by the vdo volume.

	total physical blocks:
		The total number of physical blocks the vdo volume may use;
		the difference between this value and the
		<used physical blocks> is the number of blocks the vdo
		volume has left before being full.

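The remaining space can be derived from the last two status fields. The
sketch below takes the seven fields in the order documented above as a
single string; 'vdo_free_blocks' is a hypothetical name and the sample
values are made up rather than captured from a real device. Note that
'dmsetup status' typically prints the start sector, length, and target
type ahead of the target-specific fields, so those would need to be
removed first.

::

	# vdo_free_blocks "<the seven status fields>"
	vdo_free_blocks() {
		# Split the string into positional parameters.
		set -- $1
		[ $# -eq 7 ] || return 1
		# total physical blocks minus used physical blocks
		echo $(( $7 - $6 ))
	}

	# Made-up sample values for illustration only.
	vdo_free_blocks "vdo0 normal - online online 100000 786432"
	# prints 686432
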
Memory Requirements
===================

A vdo target requires a fixed 38 MB of RAM along with the following amounts
that scale with the target:

- 1.15 MB of RAM for each 1 MB of configured block map cache size. The
  block map cache requires a minimum of 150 MB.
- 1.6 MB of RAM for each 1 TB of logical space.
- 268 MB of RAM for each 1 TB of physical storage managed by the volume.

The deduplication index requires additional memory which scales with the
size of the deduplication window. For dense indexes, the index requires 1
GB of RAM per 1 TB of window. For sparse indexes, the index requires 1 GB
of RAM per 10 TB of window. The index configuration is set when the target
is formatted and may not be modified.

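The figures above can be combined into a rough estimate. The sketch below
uses awk for the fractional arithmetic; 'vdo_ram_mb' is a hypothetical
name, the 150 MB figure is treated as a floor on the block map cache term
(one reading of the text above), and the deduplication index is left out
because its size depends on the format-time index configuration.

::

	# vdo_ram_mb <block map cache MB> <logical TB> <physical TB>
	# Prints the estimated RAM in MB, excluding the deduplication index.
	vdo_ram_mb() {
		awk -v c="$1" -v l="$2" -v p="$3" 'BEGIN {
			cache = 1.15 * c
			# Assumed floor for the block map cache term.
			if (cache < 150)
				cache = 150
			printf "%.1f\n", 38 + cache + 1.6 * l + 268 * p
		}'
	}

	vdo_ram_mb 256 10 2	# prints 884.4
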
Module Parameters
=================

The vdo driver has a numeric parameter 'log_level' which controls the
verbosity of logging from the driver. The default setting is 6
(LOGLEVEL_INFO and more severe messages).

Run-time Usage
==============

When using dm-vdo, it is important to be aware of the ways in which its
behavior differs from other storage targets.

- There is no guarantee that over-writes of existing blocks will succeed.
  Because the underlying storage may be multiply referenced, over-writing
  an existing block generally requires a vdo to have a free block
  available.

- When blocks are no longer in use, sending a discard request for those
  blocks lets the vdo release references for those blocks. If the vdo is
  thinly provisioned, discarding unused blocks is essential to prevent the
  target from running out of space. However, due to the sharing of
  duplicate blocks, no discard request for any given logical block is
  guaranteed to reclaim space.

- Assuming the underlying storage properly implements flush requests, vdo
  is resilient against crashes; however, unflushed writes may or may not
  persist after a crash.

- Each write to a vdo target entails a significant amount of processing.
  However, much of the work is parallelizable. Therefore, vdo targets
  achieve better throughput at higher I/O depths, and can support up to
  2048 requests in parallel.

Tuning
======

The vdo device has many options, and it can be difficult to make optimal
choices without perfect knowledge of the workload. Additionally, most
configuration options must be set when a vdo target is started and cannot
be changed while the target is active; changing them requires shutting the
target down completely. Ideally, tuning with simulated workloads should be
performed before deploying vdo in production environments.

The most important value to adjust is the block map cache size. In order to
service a request for any logical address, a vdo must load the portion of
the block map which holds the relevant mapping. These mappings are cached.
Performance will suffer when the working set does not fit in the cache. By
default, a vdo allocates 128 MB of metadata cache in RAM to support
efficient access to 100 GB of logical space at a time. It should be scaled
up proportionally for larger working sets.

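Proportional scaling of the cache can be sketched as follows, using the
ratio above (128 MB, which is 32768 4096-byte blocks, per 100 GB of
working set) and the 32768-block minimum. The function name
'cache_blocks_for' is hypothetical, and the result is only a starting
point for tuning, not a guaranteed optimum.

::

	# cache_blocks_for <working set in GB>
	# Prints a <block map cache size> in 4096-byte blocks.
	cache_blocks_for() {
		# 32768 blocks per 100 GB, rounded up.
		blocks=$(( ($1 * 32768 + 99) / 100 ))
		# Never go below the documented minimum.
		if [ "$blocks" -lt 32768 ]; then
			blocks=32768
		fi
		echo "$blocks"
	}

	cache_blocks_for 500	# prints 163840
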
The logical and physical thread counts should also be adjusted. A logical
thread controls a disjoint section of the block map, so additional logical
threads increase parallelism and can increase throughput. Physical threads
control a disjoint section of the data blocks, so additional physical
threads can also increase throughput. However, excess threads can waste
resources and increase contention.

Bio submission threads control the parallelism involved in sending I/O to
the underlying storage; fewer threads mean there is more opportunity to
reorder I/O requests for performance benefit, but also that each I/O
request has to wait longer before being submitted.

Bio acknowledgment threads are used for finishing I/O requests. This is
done on dedicated threads since the amount of work required to execute a
bio's callback cannot be controlled by the vdo itself. Usually one thread
is sufficient but additional threads may be beneficial, particularly when
bios have CPU-heavy callbacks.

CPU threads are used for hashing and for compression; in workloads with
compression enabled, more threads may result in higher throughput.

Hash threads are used to sort active requests by hash and determine whether
they should deduplicate; the most CPU-intensive action performed by these
threads is the comparison of 4096-byte data blocks. In most cases, a single
hash thread is sufficient.
