=====
Cache
=====

Introduction
============

dm-cache is a device mapper target written by Joe Thornber, Heinz
Mauelshagen, and Mike Snitzer.

It aims to improve performance of a block device (e.g. a spindle) by
dynamically migrating some of its data to a faster, smaller device
(e.g. an SSD).

This device-mapper solution allows us to insert this caching at
different levels of the dm stack, for instance above the data device for
a thin-provisioning pool.  Caching solutions that are integrated more
closely with the virtual memory system should give better performance.

The target reuses the metadata library used in the thin-provisioning
target.

The decision as to what data to migrate and when is left to a plug-in
policy module.  Several of these have been written as we experiment,
and we hope other people will contribute others for specific I/O
scenarios (e.g. a VM image server).

Glossary
========

  Migration
               Movement of the primary copy of a logical block from one
               device to the other.
  Promotion
               Migration from slow device to fast device.
  Demotion
               Migration from fast device to slow device.

The origin device always contains a copy of the logical block, which
may be out of date or kept in sync with the copy on the cache device
(depending on policy).

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with
other parameters detailed later):

1. An origin device - the big, slow one.

2. A cache device - the small, fast one.

3. A small metadata device - records which blocks are in the cache,
   which are dirty, and extra hints for use by the policy object.
   This information could be put on the cache device, but having it
   separate allows the volume manager to configure it differently,
   e.g. as a mirror for extra robustness.  This metadata device may only
   be used by a single cache device.

Fixed block size
----------------

The origin is divided up into blocks of a fixed size.  This block size
is configurable when you first create the cache.  Typically we've been
using block sizes of 256KB - 1024KB.  The block size must be between 64
sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).
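
The size constraints above can be checked before a table is built.  The
following is a minimal sketch in POSIX shell; the function name
``valid_cache_block_size`` is made up for illustration and is not part of
dmsetup or the kernel.

```shell
# Hypothetical helper: succeed if $1 (in 512-byte sectors) is an acceptable
# cache block size: 64..2097152 sectors and a multiple of 64 sectors.
valid_cache_block_size() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # not a plain decimal number
    esac
    [ "$1" -ge 64 ] && [ "$1" -le 2097152 ] && [ $(( $1 % 64 )) -eq 0 ]
}

# 512 sectors = 256KB and 2048 sectors = 1024KB: the typical range quoted above.
for size in 512 2048 100 32; do
    if valid_cache_block_size "$size"; then
        echo "$size sectors: ok"
    else
        echo "$size sectors: rejected"
    fi
done
```

Here 100 is rejected because it is not a multiple of 64, and 32 because it
is below the minimum.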

Having a fixed block size simplifies the target a lot.  But it is
something of a compromise.  For instance, a small part of a block may be
getting hit a lot, yet the whole block will be promoted to the cache.
So large block sizes are bad because they waste cache space.  And small
block sizes are bad because they increase the amount of metadata (both
in core and on disk).
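
To make the metadata side of this trade-off concrete, the sketch below
counts how many cache blocks have to be tracked for a given cache device
size.  The helper name and the 10GiB device size are assumptions chosen
for illustration; the actual per-block metadata cost is not specified
here.

```shell
# Hypothetical helper: number of whole cache blocks that fit on a cache
# device of $1 sectors when the block size is $2 sectors.
cache_block_count() {
    echo $(( $1 / $2 ))
}

# An assumed 10GiB cache device is 20971520 sectors of 512 bytes.
cache_sectors=20971520
for block_size in 64 512 2048; do
    echo "block size $block_size sectors:" \
         "$(cache_block_count "$cache_sectors" "$block_size") blocks to track"
done
```

Going from 2048-sector to 64-sector blocks multiplies the number of
tracked blocks by 32, which is the metadata growth referred to above.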

Cache operating modes
---------------------

The cache has three operating modes: writeback, writethrough and
passthrough.

If writeback, the default, is selected then a write to a block that is
cached will go only to the cache and the block will be marked dirty in
the metadata.

If writethrough is selected then a write to a cached block will not
complete until it has hit both the origin and cache devices.  Clean
blocks should remain clean.

If passthrough is selected, useful when the cache contents are not known
to be coherent with the origin device, then all reads are served from
the origin device (all reads miss the cache) and all writes are
forwarded to the origin device; additionally, write hits invalidate the
corresponding cache block.  To enable passthrough mode the cache must be
clean.  Passthrough mode allows a cache device to be activated without
having to worry about coherency.  Coherency that exists is maintained,
although the cache will gradually cool as writes take place.  If the
coherency of the cache can later be verified, or established through use
of the "invalidate_cblocks" message, the cache device can be transitioned
to writethrough or writeback mode while still warm.  Otherwise, the cache
contents can be discarded prior to transitioning to the desired
operating mode.
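
The passthrough workflow described above might look like the following
dry run, which only prints the dmsetup commands instead of executing
them.  This is a sketch under assumptions: the helper names are made up,
the device names and sizes are borrowed from the examples in this
document, and the suspend/reload/resume sequence is one common way to
swap a table, not the only one.

```shell
# Hypothetical helper: print a cache table line using feature argument $1.
# "default 0" selects the default policy with no policy arguments.
cache_table() {
    echo "0 41943040 cache /dev/mapper/metadata /dev/mapper/ssd /dev/mapper/origin 512 1 $1 default 0"
}

# Print (rather than run) the commands to: activate in passthrough mode,
# drop the cache blocks given as arguments, then switch to writethrough.
passthrough_then_writethrough() {
    echo "dmsetup create my_cache --table '$(cache_table passthrough)'"
    echo "dmsetup message my_cache 0 invalidate_cblocks $*"
    echo "dmsetup suspend my_cache"
    echo "dmsetup reload my_cache --table '$(cache_table writethrough)'"
    echo "dmsetup resume my_cache"
}

passthrough_then_writethrough 2345 3456-4567
```

Printing the commands keeps the sketch safe to run without root; the
output can be reviewed before any of it is executed.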

A simple cleaner policy is provided, which will clean (write back) all
dirty blocks in a cache.  This is useful when decommissioning or
shrinking a cache.  Shrinking the cache's fast device requires all cache
blocks in the area of the cache being removed to be clean.  If the
area being removed from the cache still contains dirty blocks the resize
will fail.  Care must be taken to never reduce the volume used for the
cache's fast device until the cache is clean.  This is of particular
importance if writeback mode is used.  Writethrough and passthrough
modes already maintain a clean cache.  Future support to partially clean
the cache, above a specified threshold, will allow for keeping the cache
warm and in writeback mode during resize.
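
Before shrinking, the number of dirty blocks can be read from the status
line (see the Status section).  The sketch below assumes the line comes
from ``dmsetup status``, which prefixes the target status with
``<start> <length> cache``; the helper name and the sample numbers are
made up, and the policy name ``cleaner`` in the sample is an assumption.

```shell
# Hypothetical helper: extract <#dirty> from one "dmsetup status" line.
# With the "<start> <length> cache" prefix, <#dirty> is the 14th field.
dirty_blocks() {
    echo "$1" | awk '{ print $14 }'
}

# Sample status line (made-up numbers) standing in for:
#   status=$(dmsetup status my_cache)
status='0 41943040 cache 8 72/4096 512 1024/40960 900 100 800 200 5 1029 37 1 writeback 2 migration_threshold 2048 cleaner 0 rw -'

if [ "$(dirty_blocks "$status")" -eq 0 ]; then
    echo "cache is clean"
else
    echo "$(dirty_blocks "$status") dirty blocks still to write back"
fi
```

A script that resizes the fast device could poll this value and proceed
only once it reaches zero.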

Migration throttling
--------------------

Migrating data between the origin and cache device uses bandwidth.
The user can set a throttle to prevent more than a certain amount of
migration occurring at any one time.  Currently we're not taking any
account of normal I/O traffic going to the devices.  More work is
needed here to avoid migrating during peak I/O moments.

For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 2048 sectors (1MB).
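
Because the threshold is given in sectors, a small conversion helper can
avoid arithmetic slips.  This is a sketch: ``mb_to_sectors`` is a made-up
name, and ``my_cache`` is the device name used in the examples in this
document.  The command is printed, not executed.

```shell
# Hypothetical helper: convert megabytes (1MB = 1024KB here) to
# 512-byte sectors.
mb_to_sectors() {
    echo $(( $1 * 2048 ))
}

# Print the message that would raise the limit from the default 1MB to 4MB.
echo "dmsetup message my_cache 0 migration_threshold $(mb_to_sectors 4)"
```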

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written.
If no such requests are made then commits will occur every second.  This
means the cache behaves like a physical disk that has a volatile write
cache.  If power is lost you may lose some recent writes.  The metadata
should always be consistent in spite of any crash.

The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly.  So we treat it as a hint.  In normal
operation it will be written when the dm device is suspended.  If the
system crashes all cache blocks will be assumed dirty when restarted.

Per-block policy hints
----------------------

Policy plug-ins can store a chunk of data per cache block.  It's up to
the policy how big this chunk is, but it should be kept small.  Like the
dirty flags, this data is lost if there's a crash, so a safe fallback
value should always be possible.

Policy hints affect performance, not correctness.

Policy messaging
----------------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  Refer to cache-policies.rst.

Discard bitset resolution
-------------------------

We can avoid copying data during migration if we know the block has
been discarded.  A prime example of this is when mkfs discards the
whole block device.  We store a bitset tracking the discard state of
blocks.  However, we allow this bitset to have a different block size
from the cache blocks.  This is because we need to track the discard
state for all of the origin device (compare with the dirty bitset
which is just for the smaller cache device).

Target interface
================

Constructor
-----------

  ::

   cache <metadata dev> <cache dev> <origin dev> <block size>
         <#feature args> [<feature arg>]*
         <policy> <#policy args> [<policy arg>]*

 ================ =======================================================
 metadata dev     fast device holding the persistent metadata
 cache dev        fast device holding cached data blocks
 origin dev       slow device holding original data blocks
 block size       cache unit size in sectors

 #feature args    number of feature arguments passed
 feature args     writethrough or passthrough (The default is writeback.)

 policy           the replacement policy to use
 #policy args     an even number of arguments corresponding to
                  key/value pairs passed to the policy
 policy args      key/value pairs passed to the policy
                  E.g. 'sequential_threshold 1024'
                  See cache-policies.rst for details.
 ================ =======================================================

Optional feature arguments are:


   ==================== ========================================================
   writethrough         write through caching that prohibits cache block
                        content from being different from origin block content.
                        Without this argument, the default behaviour is to write
                        back cache block contents later for performance reasons,
                        so they may differ from the corresponding origin blocks.

   passthrough          a degraded mode useful for various cache coherency
                        situations (e.g., rolling back snapshots of
                        underlying storage).  Reads and writes always go to
                        the origin.  If a write goes to a cached origin
                        block, then the cache block is invalidated.
                        To enable passthrough mode the cache must be clean.

   metadata2            use version 2 of the metadata.  This stores the dirty
                        bits in a separate btree, which improves speed of
                        shutting down the cache.

   no_discard_passdown  disable passing down discards from the cache
                        to the origin's data device.
   ==================== ========================================================

A policy called 'default' is always registered.  This is an alias for
the policy we currently think is giving best all round performance.

As the default policy could vary between kernels, if you are relying on
the characteristics of a specific policy, always request it by name.
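
The two counts in the constructor line (``<#feature args>`` and
``<#policy args>``) are easy to get wrong by hand.  The sketch below
derives them from the argument lists; both helper names are made up for
illustration, and the function only prints a table line that could then
be passed to ``dmsetup create --table``.

```shell
# Hypothetical helper: count the whitespace-separated words in $1.
count_words() (
    set -f          # no globbing while the string is split
    set -- $1
    echo $#
)

# Hypothetical helper: print a cache target table line.
#   $1 length in sectors   $2 metadata dev   $3 cache dev   $4 origin dev
#   $5 block size          $6 feature args (may be empty)
#   $7 policy name         $8 policy key/value pairs (may be empty)
build_cache_table() {
    nr_features=$(count_words "$6")
    nr_policy_args=$(count_words "$8")
    if [ $(( nr_policy_args % 2 )) -ne 0 ]; then
        echo "policy args must be key/value pairs" >&2
        return 1
    fi
    echo "0 $1 cache $2 $3 $4 $5 $nr_features${6:+ $6} $7 $nr_policy_args${8:+ $8}"
}

build_cache_table 41943040 /dev/mapper/metadata /dev/mapper/ssd \
    /dev/mapper/origin 1024 "writethrough metadata2" \
    mq "sequential_threshold 1024 random_threshold 8"
```

Passing empty strings for the feature and policy arguments yields
``0`` for both counts, which corresponds to writeback mode with the
chosen policy's defaults.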

Status
------

::

  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
  <cache block size> <#used cache blocks>/<#total cache blocks>
  <#read hits> <#read misses> <#write hits> <#write misses>
  <#demotions> <#promotions> <#dirty> <#feature args> <feature args>*
  <#core args> <core args>* <policy name> <#policy args> <policy args>*
  <cache metadata mode> <needs_check>


========================= =====================================================
metadata block size       Fixed block size for each metadata block in
                          sectors
#used metadata blocks     Number of metadata blocks used
#total metadata blocks    Total number of metadata blocks
cache block size          Configurable block size for the cache device
                          in sectors
#used cache blocks        Number of blocks resident in the cache
#total cache blocks       Total number of cache blocks
#read hits                Number of times a READ bio has been mapped
                          to the cache
#read misses              Number of times a READ bio has been mapped
                          to the origin
#write hits               Number of times a WRITE bio has been mapped
                          to the cache
#write misses             Number of times a WRITE bio has been
                          mapped to the origin
#demotions                Number of times a block has been removed
                          from the cache
#promotions               Number of times a block has been moved to
                          the cache
#dirty                    Number of blocks in the cache that differ
                          from the origin
#feature args             Number of feature args to follow
feature args              'writethrough' (optional)
#core args                Number of core arguments (must be even)
core args                 Key/value pairs for tuning the core
                          e.g. migration_threshold
policy name               Name of the policy
#policy args              Number of policy arguments to follow (must be even)
policy args               Key/value pairs e.g. sequential_threshold
cache metadata mode       ro if read-only, rw if read-write

                          In serious cases where even a read-only mode is
                          deemed unsafe no further I/O will be permitted and
                          the status will just contain the string 'Fail'.
                          The userspace recovery tools should then be used.
needs_check               'needs_check' if set, '-' if not set
                          A metadata operation has failed, resulting in the
                          needs_check flag being set in the metadata's
                          superblock.  The metadata device must be
                          deactivated and checked/repaired before the
                          cache can be made fully operational again.
                          '-' indicates needs_check is not set.
========================= =====================================================
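
As an illustration of how these fields can be consumed, the sketch below
summarises one line of ``dmsetup status`` output.  It assumes dmsetup's
usual ``<start> <length> cache`` prefix before the target status and that
the line ends with the metadata mode and needs_check fields; the helper
name and the sample numbers are made up.

```shell
# Hypothetical helper: summarise one "dmsetup status" line for a cache target.
# Fields 1-3 are "<start> <length> cache"; the target status follows.
cache_summary() {
    echo "$1" | awk '
        $4 == "Fail" { print "metadata failed: run the userspace recovery tools"; exit }
        {
            split($7, used, "/")          # <#used cache blocks>/<#total cache blocks>
            reads = $8 + $9               # read hits + read misses
            ratio = (reads > 0) ? 100 * $8 / reads : 0
            printf "resident=%d/%d read_hit=%.1f%% dirty=%d mode=%s needs_check=%s\n",
                   used[1], used[2], ratio, $14, $(NF - 1), $NF
        }'
}

# Sample line with made-up numbers, standing in for $(dmsetup status my_cache).
cache_summary '0 41943040 cache 8 72/4096 512 1024/40960 900 100 800 200 5 1029 37 1 writeback 2 migration_threshold 2048 smq 0 rw -'
```

Reading the last two fields relative to the end of the line avoids having
to step over the variable-length feature, core and policy argument lists.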

Messages
--------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  (A sysfs interface would also be possible.)

The message format is::

   <key> <value>

E.g.::

   dmsetup message my_cache 0 sequential_threshold 1024


Invalidation is removing an entry from the cache without writing it
back.  Cache blocks can be invalidated via the invalidate_cblocks
message, which takes an arbitrary number of cblock ranges.  Each cblock
range's end value is "one past the end", meaning 5-10 expresses a range
of values from 5 to 9.  Each cblock must be expressed as a decimal
value.  In the future a variant message that takes cblock ranges
expressed in hexadecimal may be needed to better support efficient
invalidation of larger caches.  The cache must be in passthrough mode
when invalidate_cblocks is used::

   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*

E.g.::

   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789
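
The "one past the end" convention is easy to misread, so the sketch
below counts how many cache blocks a set of invalidate_cblocks arguments
covers.  The helper name is made up, and it assumes well-formed decimal
arguments.

```shell
# Hypothetical helper: count the cache blocks named by invalidate_cblocks
# arguments.  A range "B-E" is half open: it covers B up to E-1.
count_cblocks() {
    total=0
    for arg in "$@"; do
        case "$arg" in
            *-*) begin=${arg%-*}; end=${arg#*-}
                 total=$(( total + end - begin )) ;;
            *)   total=$(( total + 1 )) ;;
        esac
    done
    echo "$total"
}

count_cblocks 5-10                      # prints 5 (cblocks 5, 6, 7, 8, 9)
count_cblocks 2345 3456-4567 5678-6789  # prints 2223
```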

Examples
========

The test suite can be found here:

https://github.com/jthornber/device-mapper-test-suite

::

  # Writeback cache with 512-sector (256KB) blocks and the default policy.
  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
          /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0"

  # Alternative: 1024-sector (512KB) blocks and the mq policy with two
  # key/value policy arguments.
  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
          /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
          mq 4 sequential_threshold 1024 random_threshold 8"
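
The length 41943040 in these tables is a count of 512-byte sectors and
corresponds to 20GiB.  The sketch below shows the conversion; the helper
name is made up, and it is an assumption here that the table length
would normally be taken from the size of the origin device.

```shell
# Hypothetical helper: convert GiB to 512-byte sectors.
gib_to_sectors() {
    echo $(( $1 * 1024 * 1024 * 2 ))
}

gib_to_sectors 20    # prints 41943040

# On a live system the length could instead be read from the origin
# device (needs root, so it is left as a comment in this sketch):
#   blockdev --getsz /dev/mapper/origin
```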