.. _zsmalloc:

========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``.
Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage
    ...
    ...
     9   176           0            1           186        129          8                4
    10   192           1            0          2880       2872        135                3
    11   208           0            1           819        795         42                2
    12   224           0            1           219        159         12                4
    ...
    ...


class
    index
size
    object size zspage stores
almost_empty
    the number of ZS_ALMOST_EMPTY zspages (see below)
almost_full
    the number of ZS_ALMOST_FULL zspages (see below)
obj_allocated
    the number of objects allocated
obj_used
    the number of objects allocated to the user
pages_used
    the number of pages allocated for the class
pages_per_zspage
    the number of 0-order pages to make a zspage

We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where

* n = number of allocated objects
* N = total number of objects zspage can store
* f = fullness_threshold_frac (i.e., 4 at the moment)

Similarly, we assign zspage to:

* ZS_ALMOST_FULL  when n > N / f
* ZS_EMPTY        when n == 0
* ZS_FULL         when n == N


Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    94  1536           0            0             0          0          0                3        0
   100  1632           0            0             0          0          0                2        0
    ...


Size classes #95-99 are merged with size class #100.
This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace ``calculate_zspage_chain_size()``, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

As the zspage chain size for class #96 increases, its key characteristics
such as pages per-zspage and objects per-zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
   202  3264           0            0             0          0          0                4        0
   254  4096           0            0             0          0          0                1        0
    ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).
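
The class #96 versus class #100 arithmetic earlier in this section can be
checked with a short sketch. It is not the kernel implementation: the function
names are hypothetical, a 4096-byte PAGE_SIZE is assumed, and the selection
rule (pick the chain length with the fewest wasted bytes) is an assumption
inferred from the table of wasted bytes shown for class #96.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, matching the 4096-byte huge class


def chain_stats(obj_size, max_chain):
    """Return (pages, wasted_bytes, used_percent) for chain lengths 1..max_chain."""
    rows = []
    for pages in range(1, max_chain + 1):
        zspage_size = pages * PAGE_SIZE
        # Bytes left over after packing as many whole objects as fit.
        wasted = zspage_size % obj_size
        used_pct = (zspage_size - wasted) * 100 // zspage_size
        rows.append((pages, wasted, used_pct))
    return rows


def best_chain(obj_size, max_chain):
    """Chain length with the fewest wasted bytes (assumed selection rule)."""
    return min(chain_stats(obj_size, max_chain), key=lambda row: row[1])[0]


def pages_needed(obj_size, pages_per_zspage, count):
    """Physical pages consumed when storing `count` objects in one size class."""
    objs_per_zspage = pages_per_zspage * PAGE_SIZE // obj_size
    zspages = -(-count // objs_per_zspage)  # ceiling division
    return zspages * pages_per_zspage
```

Under these assumptions, ``chain_stats(1568, 5)`` reproduces the five rows of
the class #96 table, ``pages_needed(1632, 2, 13)`` gives the 6 pages used by
class #100, and ``pages_needed(1568, 5, 13)`` gives the 5 pages used by a
5-page class #96 zspage.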

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For zspage chain size of 8, huge class watermark becomes 3632 bytes::

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
   202  3264           0            0             0          0          0                4        0
   211  3408           0            0             0          0          0                5        0
   217  3504           0            0             0          0          0                6        0
   222  3584           0            0             0          0          0                7        0
   225  3632           0            0             0          0          0                8        0
   254  4096           0            0             0          0          0                1        0
    ...

For zspage chain size of 16, huge class watermark becomes 3840 bytes::

 class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
   202  3264           0            0             0          0          0                4        0
   206  3328           0            0             0          0          0               13        0
   207  3344           0            0             0          0          0                9        0
   208  3360           0            0             0          0          0               14        0
   211  3408           0            0             0          0          0                5        0
   212  3424           0            0             0          0          0               16        0
   214  3456           0            0             0          0          0               11        0
   217  3504           0            0             0          0          0                6        0
   219  3536           0            0             0          0          0               13        0
   222  3584           0            0             0          0          0                7        0
   223  3600           0            0             0          0          0               15        0
   225  3632           0            0             0          0          0                8        0
   228  3680           0            0             0          0          0                9        0
   230  3712           0            0             0          0          0               10        0
   232  3744           0            0             0          0          0               11        0
   234  3776           0            0             0          0          0               12        0
   235  3792           0            0             0          0          0               13        0
   236  3808           0            0             0          0          0               14        0
   238  3840           0            0             0          0          0               15        0
   254  4096           0            0             0          0          0                1        0
    ...

Overall the combined zspage chain size effect on zsmalloc pool configuration::

 pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840


A synthetic test
----------------

zram as a build artifacts storage (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                13           51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680 0 655175680 60 0 34048 34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

    class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    Total                18           87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936 0 641703936 60 0 33591 33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).
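
To put the synthetic test figures in perspective, the savings can be computed
directly from the numbers quoted above. This is only a sketch: the variable
names are hypothetical and a 4096-byte PAGE_SIZE is assumed.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

# pages_used totals from the two "zsmalloc classes stats" outputs
pages_chain4 = 159955
pages_chain8 = 156666

# maximum pool memory usage (bytes) from the two "zram mm_stat" outputs
mem_chain4 = 655175680
mem_chain8 = 641703936

saved_pages = pages_chain4 - pages_chain8
saved_bytes = mem_chain4 - mem_chain8
saved_pct = saved_pages * 100 / pages_chain4

print(f"pages saved: {saved_pages}")
print(f"bytes saved: {saved_bytes}")
print(f"relative saving: {saved_pct:.2f}%")
```

With these inputs the byte difference equals the page difference multiplied by
the assumed page size, so the two statistics agree with each other. The gain
for this particular workload is roughly 2%.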