.. SPDX-License-Identifier: GPL-2.0

=================================
dm-pcache — Persistent Cache
=================================

*Author: Dongsheng Yang <dongsheng.yang@linux.dev>*

This document describes *dm-pcache*, a Device-Mapper target that lets a
byte-addressable *DAX* (persistent-memory, “pmem”) region act as a
high-performance, crash-persistent cache in front of a slower block
device. The code lives in ``drivers/md/dm-pcache/``.

Quick feature summary
=====================

* *Write-back* caching (only mode currently supported).
* *16 MiB segments* allocated on the pmem device.
* *Data CRC32* verification (optional, per cache).
* Crash-safe: every metadata structure is duplicated
  (``PCACHE_META_INDEX_MAX == 2``) and protected with CRC+sequence numbers.
* *Multi-tree indexing* (indexing trees sharded by logical address) for high
  pmem parallelism.
* Pure *DAX path* I/O – no extra BIO round-trips.
* *Log-structured write-back* that preserves backend crash-consistency.


Constructor
===========

::

   pcache <cache_dev> <backing_dev> [<number_of_optional_arguments> <cache_mode writeback> <data_crc true|false>]

=========================  ====================================================
``cache_dev``              Any DAX-capable block device (``/dev/pmem0``…).
                           All metadata *and* cached blocks are stored here.

``backing_dev``            The slow block device to be cached.

``cache_mode``             Optional. Only ``writeback`` is accepted at the
                           moment.

``data_crc``               Optional; defaults to ``false``.

                           * ``true``  – store CRC32 for every cached entry
                             and verify on reads
                           * ``false`` – skip CRC (faster)
=========================  ====================================================

Example
-------

.. code-block:: shell

   dmsetup create pcache_sdb --table \
     "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"

The first time a pmem device is used, dm-pcache formats it automatically
(super-block, cache_info, etc.).


Status line
===========

``dmsetup status <device>`` (``STATUSTYPE_INFO``) prints:

::

   <sb_flags> <seg_total> <cache_segs> <segs_used> \
   <gc_percent> <cache_flags> \
   <key_head_seg>:<key_head_off> \
   <dirty_tail_seg>:<dirty_tail_off> \
   <key_tail_seg>:<key_tail_off>

Field meanings
--------------

===============================  =============================================
``sb_flags``                     Super-block flags (e.g. endian marker).

``seg_total``                    Number of physical *pmem* segments.

``cache_segs``                   Number of segments used for cache.

``segs_used``                    Segments currently allocated (bitmap weight).

``gc_percent``                   Current GC high-water mark (0-90).

``cache_flags``                  * Bit 0 – DATA_CRC enabled
                                 * Bit 1 – INIT_DONE (cache initialised)
                                 * Bits 2-5 – cache mode (0 == WB)

``key_head``                     Where new key-sets are being written.

``dirty_tail``                   First dirty key-set that still needs
                                 write-back to the backing device.

``key_tail``                     First key-set that may be reclaimed by GC.
===============================  =============================================


Messages
========

*Change GC trigger*

::

   dmsetup message <dev> 0 gc_percent <0-90>


Theory of operation
===================

Sub-devices
-----------

====================  =========================================================
backing_dev           Any block device (SSD/HDD/loop/LVM, etc.).
cache_dev             DAX device; must expose direct-access memory.
====================  =========================================================

Segments and key-sets
---------------------

* The pmem space is divided into *16 MiB segments*.
* Each write allocates space from a per-CPU *data_head* inside a segment.
* A *cache-key* records a logical range on the backing device and where it
  lives inside pmem (segment + offset + generation).
* 128 keys form a *key-set* (kset); ksets are written sequentially in pmem
  and are themselves crash-safe (CRC).
* The pair *(key_tail, dirty_tail)* delimits clean/dirty and live/dead ksets.

Write-back
----------

Dirty keys are queued into a tree; a background worker copies data
back to the backing_dev and advances *dirty_tail*. A FLUSH/FUA bio from the
upper layers forces an immediate metadata commit.

Garbage collection
------------------

GC starts when ``segs_used >= seg_total * gc_percent / 100``. It walks
from *key_tail*, frees segments whose every key has been invalidated, and
advances *key_tail*.

CRC verification
----------------

If ``data_crc`` is enabled, dm-pcache computes a CRC32 over every cached data
range when it is inserted and stores it in the on-media key. Reads
validate the CRC before copying to the caller.


Failure handling
================

* *pmem media errors* – all metadata copies are read with
  ``copy_mc_to_kernel``; an uncorrectable error is logged and aborts
  initialisation.
* *Cache full* – if no free segment can be found, writes return ``-EBUSY``;
  dm-pcache retries internally (request deferral).
* *System crash* – on attach, the driver replays ksets from *key_tail* to
  rebuild the in-core trees; every segment’s generation guards against
  use-after-free keys.


Limitations & TODO
==================

* Only *write-back* mode is supported; other modes are planned.
* Only FIFO cache invalidation is supported; others (LRU, ARC...) are planned.
* Table reload is not currently supported.
* Discard support is planned.


Example workflow
================

.. code-block:: shell

   # 1. Create devices
   dmsetup create pcache_sdb --table \
     "0 $(blockdev --getsz /dev/sdb) pcache /dev/pmem0 /dev/sdb 4 cache_mode writeback data_crc true"

   # 2. Put a filesystem on top
   mkfs.ext4 /dev/mapper/pcache_sdb
   mount /dev/mapper/pcache_sdb /mnt

   # 3. Tune GC threshold to 80 %
   dmsetup message pcache_sdb 0 gc_percent 80

   # 4. Observe status
   watch -n1 'dmsetup status pcache_sdb'

   # 5. Shutdown
   umount /mnt
   dmsetup remove pcache_sdb


``dm-pcache`` is under active development; feedback, bug reports and patches
are very welcome!
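
As a rough illustration of the status fields and the GC trigger described
above, the following sketch decodes one status line in shell. The helper name
``pcache_decode_status`` is hypothetical, and the sample values are made up
rather than captured from a real device. It assumes the flag fields can be
read as shell integers (prefix them with ``0x`` if the kernel prints bare
hex) and that the threshold uses integer division; neither detail is
confirmed by this document.

.. code-block:: shell

   # Hypothetical helper: decode the pcache-specific part of a status line,
   # i.e. the nine fields that follow "<start> <length> pcache".
   pcache_decode_status() {
       # Split the single argument into positional fields on whitespace.
       set -- $1
       sb_flags=$1 seg_total=$2 cache_segs=$3 segs_used=$4
       gc_percent=$5 cache_flags=$6 key_head=$7 dirty_tail=$8 key_tail=$9

       # cache_flags layout as listed under "Field meanings".
       data_crc=$(( cache_flags & 1 ))            # bit 0
       init_done=$(( (cache_flags >> 1) & 1 ))    # bit 1
       cache_mode=$(( (cache_flags >> 2) & 15 ))  # bits 2-5, 0 == WB

       # GC trigger from "Garbage collection"; integer division is an assumption.
       gc_threshold=$(( seg_total * gc_percent / 100 ))
       if [ "$segs_used" -ge "$gc_threshold" ]; then
           gc_active=yes
       else
           gc_active=no
       fi

       echo "data_crc=$data_crc init_done=$init_done cache_mode=$cache_mode"
       echo "gc_threshold=$gc_threshold segs_used=$segs_used gc_active=$gc_active"
       echo "key_head=$key_head dirty_tail=$dirty_tail key_tail=$key_tail"
   }

   # Made-up sample: 64 segments, 50 in use, gc_percent 70, flags 3.
   pcache_decode_status "0 64 64 50 70 3 5:4096 3:0 2:0"

With a live device, the same function could be fed the tail of
``dmsetup status pcache_sdb`` after stripping the leading
``<start> <length> pcache`` columns.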