=====
Cache
=====

Introduction
============

dm-cache is a device mapper target written by Joe Thornber, Heinz
Mauelshagen, and Mike Snitzer.

It aims to improve performance of a block device (e.g., a spindle) by
dynamically migrating some of its data to a faster, smaller device
(e.g., an SSD).

This device-mapper solution allows us to insert this caching at
different levels of the dm stack, for instance above the data device for
a thin-provisioning pool.  Caching solutions that are integrated more
closely with the virtual memory system should give better performance.

The target reuses the metadata library used in the thin-provisioning
library.

The decision as to what data to migrate and when is left to a plug-in
policy module.  Several of these have been written as we experiment,
and we hope other people will contribute others for specific I/O
scenarios (e.g., a VM image server).

Glossary
========

  Migration
               Movement of the primary copy of a logical block from one
               device to the other.
  Promotion
               Migration from slow device to fast device.
  Demotion
               Migration from fast device to slow device.

The origin device always contains a copy of the logical block, which
may be out of date or kept in sync with the copy on the cache device
(depending on policy).

Design
======

Sub-devices
-----------

The target is constructed by passing three devices to it (along with
other parameters detailed later):

1. An origin device - the big, slow one.

2. A cache device - the small, fast one.

3. A small metadata device - records which blocks are in the cache,
   which are dirty, and extra hints for use by the policy object.
   This information could be put on the cache device, but having it
   separate allows the volume manager to configure it differently,
   e.g. as a mirror for extra robustness.  This metadata device may only
   be used by a single cache device.

Fixed block size
----------------

The origin is divided up into blocks of a fixed size.  This block size
is configurable when you first create the cache.  Typically we've been
using block sizes of 256KB - 1024KB.  The block size must be between 64
sectors (32KB) and 2097152 sectors (1GB) and a multiple of 64 sectors (32KB).

Having a fixed block size simplifies the target a lot.  But it is
something of a compromise.  For instance, a small part of a block may be
getting hit a lot, yet the whole block will be promoted to the cache.
So large block sizes are bad because they waste cache space.  And small
block sizes are bad because they increase the amount of metadata (both
in core and on disk).

Cache operating modes
---------------------

The cache has three operating modes: writeback, writethrough and
passthrough.

If writeback, the default, is selected then a write to a block that is
cached will go only to the cache and the block will be marked dirty in
the metadata.

If writethrough is selected then a write to a cached block will not
complete until it has hit both the origin and cache devices.  Clean
blocks should remain clean.

If passthrough is selected, useful when the cache contents are not known
to be coherent with the origin device, then all reads are served from
the origin device (all reads miss the cache) and all writes are
forwarded to the origin device; additionally, write hits cause cache
block invalidates.  To enable passthrough mode the cache must be clean.
Passthrough mode allows a cache device to be activated without having to
worry about coherency.  Coherency that exists is maintained, although
the cache will gradually cool as writes take place.  If the coherency of
the cache can later be verified, or established through use of the
"invalidate_cblocks" message, the cache device can be transitioned to
writethrough or writeback mode while still warm.  Otherwise, the cache
contents can be discarded prior to transitioning to the desired
operating mode.

A simple cleaner policy is provided, which will clean (write back) all
dirty blocks in a cache.  This is useful when decommissioning or
shrinking a cache.  Shrinking the cache's fast device requires all cache
blocks in the area of the cache being removed to be clean.  If the
area being removed from the cache still contains dirty blocks the resize
will fail.  Care must be taken to never reduce the volume used for the
cache's fast device until the cache is clean.  This is of particular
importance if writeback mode is used.  Writethrough and passthrough
modes already maintain a clean cache.  Future support to partially clean
the cache, above a specified threshold, will allow for keeping the cache
warm and in writeback mode during resize.

Migration throttling
--------------------

Migrating data between the origin and cache device uses bandwidth.
The user can set a throttle to prevent more than a certain amount of
migration occurring at any one time.  Currently we're not taking any
account of normal I/O traffic going to the devices.  More work is needed
here to avoid migrating during those peak I/O moments.

For the time being, a message "migration_threshold <#sectors>"
can be used to set the maximum number of sectors being migrated,
the default being 2048 sectors (1MB).

Updating on-disk metadata
-------------------------

On-disk metadata is committed every time a FLUSH or FUA bio is written.
If no such requests are made then commits will occur every second.  This
means the cache behaves like a physical disk that has a volatile write
cache.  If power is lost you may lose some recent writes.  The metadata
should always be consistent in spite of any crash.

The 'dirty' state for a cache block changes far too frequently for us
to keep updating it on the fly.  So we treat it as a hint.  In normal
operation it will be written when the dm device is suspended.  If the
system crashes all cache blocks will be assumed dirty when restarted.

Per-block policy hints
----------------------

Policy plug-ins can store a chunk of data per cache block.  It's up to
the policy how big this chunk is, but it should be kept small.  Like the
dirty flags, this data is lost if there's a crash, so a safe fallback
value should always be possible.

Policy hints affect performance, not correctness.

Policy messaging
----------------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  Refer to cache-policies.txt.

Discard bitset resolution
-------------------------

We can avoid copying data during migration if we know the block has
been discarded.  A prime example of this is when mkfs discards the
whole block device.  We store a bitset tracking the discard state of
blocks.  However, we allow this bitset to have a different block size
from the cache blocks.  This is because we need to track the discard
state for all of the origin device (compare with the dirty bitset
which is just for the smaller cache device).
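
To get a feel for why the discard bitset may use a different block size,
the sketch below counts how many bits each bitset would need.  It is only
an illustration: the device sizes, the discard block size and the helper
name ``blocks_for`` are assumptions made for this example, not values or
interfaces taken from the target.

```shell
#!/bin/sh
# Sketch only: every size below is an illustrative assumption,
# expressed in 512-byte sectors.

# blocks_for SIZE BLOCK_SIZE
# Number of blocks needed to cover SIZE sectors, rounded up so that a
# partial trailing block is still counted.
blocks_for() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

origin_sectors=41943040   # hypothetical 20GB origin device
cache_sectors=2097152     # hypothetical 1GB cache device
cache_block=512           # 256KB cache blocks
discard_block=8192        # hypothetical coarser discard block (4MB)

echo "dirty bits (cache device only):   $(blocks_for "$cache_sectors" "$cache_block")"
echo "discard bits at cache block size: $(blocks_for "$origin_sectors" "$cache_block")"
echo "discard bits at coarser size:     $(blocks_for "$origin_sectors" "$discard_block")"
```

With these assumed sizes the dirty bitset covers 4096 blocks, while a
discard bitset at the same granularity would need 81920 bits for the
origin; the coarser discard block brings that down to 5120.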

Target interface
================

Constructor
-----------

::

   cache <metadata dev> <cache dev> <origin dev> <block size>
         <#feature args> [<feature arg>]*
         <policy> <#policy args> [policy args]*

================ =======================================================
metadata dev     fast device holding the persistent metadata
cache dev        fast device holding cached data blocks
origin dev       slow device holding original data blocks
block size       cache unit size in sectors

#feature args    number of feature arguments passed
feature args     writethrough or passthrough (The default is writeback.)

policy           the replacement policy to use
#policy args     an even number of arguments corresponding to
                 key/value pairs passed to the policy
policy args      key/value pairs passed to the policy
                 E.g. 'sequential_threshold 1024'
                 See cache-policies.txt for details.
================ =======================================================

Optional feature arguments are:


==================== ========================================================
writethrough         write through caching that prohibits cache block
                     content from being different from origin block content.
                     Without this argument, the default behaviour is to write
                     back cache block contents later for performance reasons,
                     so they may differ from the corresponding origin blocks.

passthrough          a degraded mode useful for various cache coherency
                     situations (e.g., rolling back snapshots of
                     underlying storage).  Reads and writes always go to
                     the origin.  If a write goes to a cached origin
                     block, then the cache block is invalidated.
                     To enable passthrough mode the cache must be clean.

metadata2            use version 2 of the metadata.  This stores the dirty
                     bits in a separate btree, which improves speed of
                     shutting down the cache.

no_discard_passdown  disable passing down discards from the cache
                     to the origin's data device.
==================== ========================================================

A policy called 'default' is always registered.  This is an alias for
the policy we currently think is giving best all round performance.

As the default policy could vary between kernels, if you are relying on
the characteristics of a specific policy, always request it by name.

Status
------

::

  <metadata block size> <#used metadata blocks>/<#total metadata blocks>
  <cache block size> <#used cache blocks>/<#total cache blocks>
  <#read hits> <#read misses> <#write hits> <#write misses>
  <#demotions> <#promotions> <#dirty> <#features> <features>*
  <#core args> <core args>* <policy name> <#policy args> <policy args>*
  <cache metadata mode> <needs_check>


========================= =====================================================
metadata block size       Fixed block size for each metadata block in
                          sectors
#used metadata blocks     Number of metadata blocks used
#total metadata blocks    Total number of metadata blocks
cache block size          Configurable block size for the cache device
                          in sectors
#used cache blocks        Number of blocks resident in the cache
#total cache blocks       Total number of cache blocks
#read hits                Number of times a READ bio has been mapped
                          to the cache
#read misses              Number of times a READ bio has been mapped
                          to the origin
#write hits               Number of times a WRITE bio has been mapped
                          to the cache
#write misses             Number of times a WRITE bio has been
                          mapped to the origin
#demotions                Number of times a block has been removed
                          from the cache
#promotions               Number of times a block has been moved to
                          the cache
#dirty                    Number of blocks in the cache that differ
                          from the origin
#features                 Number of feature args to follow
features                  'writethrough' (optional)
#core args                Number of core arguments (must be even)
core args                 Key/value pairs for tuning the core
                          e.g. migration_threshold
policy name               Name of the policy
#policy args              Number of policy arguments to follow (must be even)
policy args               Key/value pairs e.g. sequential_threshold
cache metadata mode       ro if read-only, rw if read-write

                          In serious cases where even a read-only mode is
                          deemed unsafe no further I/O will be permitted and
                          the status will just contain the string 'Fail'.
                          The userspace recovery tools should then be used.
needs_check               'needs_check' if set, '-' if not set
                          A metadata operation has failed, resulting in the
                          needs_check flag being set in the metadata's
                          superblock.  The metadata device must be
                          deactivated and checked/repaired before the
                          cache can be made fully operational again.
                          '-' indicates needs_check is not set.
========================= =====================================================

Messages
--------

Policies will have different tunables, specific to each one, so we
need a generic way of getting and setting these.  Device-mapper
messages are used.  (A sysfs interface would also be possible.)

The message format is::

   <key> <value>

E.g.::

   dmsetup message my_cache 0 sequential_threshold 1024


Invalidation is removing an entry from the cache without writing it
back.  Cache blocks can be invalidated via the invalidate_cblocks
message, which takes an arbitrary number of cblock ranges.  Each cblock
range's end value is "one past the end", meaning 5-10 expresses a range
of values from 5 to 9.  Each cblock must be expressed as a decimal
value.  In the future, a variant message that takes cblock ranges
expressed in hexadecimal may be needed to better support efficient
invalidation of larger caches.  The cache must be in passthrough mode
when invalidate_cblocks is used::

   invalidate_cblocks [<cblock>|<cblock begin>-<cblock end>]*

E.g.::

   dmsetup message my_cache 0 invalidate_cblocks 2345 3456-4567 5678-6789

Examples
========

The test suite can be found here:

https://github.com/jthornber/device-mapper-test-suite

::

  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
          /dev/mapper/ssd /dev/mapper/origin 512 1 writeback default 0"
  dmsetup create my_cache --table "0 41943040 cache /dev/mapper/metadata \
          /dev/mapper/ssd /dev/mapper/origin 1024 1 writeback \
          mq 4 sequential_threshold 1024 random_threshold 8"
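
When table lines like the ones above are generated by a script, it can
help to check the block size against the limits given under "Fixed block
size" before calling dmsetup.  The following is a minimal sketch of that
idea.  The helper names ``valid_block_size`` and ``cache_table`` are
hypothetical (they are not part of dmsetup), and the script only prints
the command rather than running it.

```shell
#!/bin/sh
# Sketch only: the helper names are hypothetical, not part of dmsetup.

# valid_block_size SECTORS
# Succeed if SECTORS is within the documented limits: between 64 and
# 2097152 sectors, and a multiple of 64 sectors.
valid_block_size() {
    [ "$1" -ge 64 ] && [ "$1" -le 2097152 ] && [ $(( $1 % 64 )) -eq 0 ]
}

# cache_table LENGTH METADATA_DEV CACHE_DEV ORIGIN_DEV BLOCK_SIZE MODE
# Print a cache table line with one feature argument (MODE) and the
# 'default' policy with no policy arguments.
cache_table() {
    if ! valid_block_size "$5"; then
        echo "invalid block size: $5" >&2
        return 1
    fi
    echo "0 $1 cache $2 $3 $4 $5 1 $6 default 0"
}

# Dry run: print the dmsetup command instead of executing it.
table=$(cache_table 41943040 /dev/mapper/metadata /dev/mapper/ssd \
        /dev/mapper/origin 512 writeback) &&
    echo "dmsetup create my_cache --table '$table'"
```

Run as is, this prints a command equivalent to the first example above.
A block size such as 96 sectors is rejected because it is not a multiple
of 64.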