========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``.
Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes

     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage
    ...
    ...
         9   176           0            1           186        129          8                4
        10   192           1            0          2880       2872        135                3
        11   208           0            1           819        795         42                2
        12   224           0            1           219        159         12                4
    ...
    ...


class
    index
size
    object size zspage stores
almost_empty
    the number of ZS_ALMOST_EMPTY zspages (see below)
almost_full
    the number of ZS_ALMOST_FULL zspages (see below)
obj_allocated
    the number of objects allocated
obj_used
    the number of objects allocated to the user
pages_used
    the number of pages allocated for the class
pages_per_zspage
    the number of 0-order pages to make a zspage

We assign a zspage to ZS_ALMOST_EMPTY fullness group when n <= N / f, where

* n = number of allocated objects
* N = total number of objects zspage can store
* f = fullness_threshold_frac (i.e., 4 at the moment)

Similarly, we assign zspage to:

* ZS_ALMOST_FULL when n > N / f
* ZS_EMPTY when n == 0
* ZS_FULL when n == N


Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
        94  1536           0            0             0          0          0                3        0
       100  1632           0            0             0          0          0                2        0
    ...


Size classes #95-99 are merged with size class #100.
This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace ``calculate_zspage_chain_size()``, we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                  1312          89
           4                  704           95
           5                  96            99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.

As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
       202  3264           0            0             0          0          0                4        0
       254  4096           0            0             0          0          0                1        0
    ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).
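
The wasted-bytes table for class #96 and the page counts for 13 objects of
size 1568 can be reproduced with a few lines of arithmetic. The following
Python sketch is illustrative only: it assumes PAGE_SIZE is 4096 and that the
waste of a chain is simply the remainder of the zspage size divided by the
object size. ``chain_waste()`` and ``pages_needed()`` are hypothetical helper
names, not kernel functions, and the real calculate_zspage_chain_size() may
differ in detail:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, as implied by the 4096-byte class


def chain_waste(obj_size, max_chain, page_size=PAGE_SIZE):
    """Return (pages, wasted_bytes, used_percent) for each chain length.

    Hypothetical helper: waste is modelled as the bytes left over after
    packing as many whole objects as possible into the chained pages.
    """
    rows = []
    for pages in range(1, max_chain + 1):
        zspage_size = pages * page_size
        wasted = zspage_size % obj_size
        used_pct = (zspage_size - wasted) * 100 // zspage_size
        rows.append((pages, wasted, used_pct))
    return rows


def pages_needed(obj_size, pages_per_zspage, nr_objects, page_size=PAGE_SIZE):
    """Physical pages needed to store nr_objects objects of one size class."""
    objs_per_zspage = pages_per_zspage * page_size // obj_size
    nr_zspages = -(-nr_objects // objs_per_zspage)  # ceiling division
    return nr_zspages * pages_per_zspage


if __name__ == "__main__":
    # Class #96 (1568 bytes) with chains of 1..5 pages.
    for pages, wasted, used_pct in chain_waste(1568, 5):
        print(f"{pages:>2} {wasted:>5} {used_pct:>3}")
    # 13 objects stored via class #100 (2 pages) vs class #96 (5 pages).
    print(pages_needed(1632, 2, 13), pages_needed(1568, 5, 13))
```

Under these assumptions the sketch yields the five rows of the class #96
table, and 6 versus 5 physical pages for the class #100 and class #96
configurations respectively.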

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
       202  3264           0            0             0          0          0                4        0
       211  3408           0            0             0          0          0                5        0
       217  3504           0            0             0          0          0                6        0
       222  3584           0            0             0          0          0                7        0
       225  3632           0            0             0          0          0                8        0
       254  4096           0            0             0          0          0                1        0
    ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

     class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
       202  3264           0            0             0          0          0                4        0
       206  3328           0            0             0          0          0               13        0
       207  3344           0            0             0          0          0                9        0
       208  3360           0            0             0          0          0               14        0
       211  3408           0            0             0          0          0                5        0
       212  3424           0            0             0          0          0               16        0
       214  3456           0            0             0          0          0               11        0
       217  3504           0            0             0          0          0                6        0
       219  3536           0            0             0          0          0               13        0
       222  3584           0            0             0          0          0                7        0
       223  3600           0            0             0          0          0               15        0
       225  3632           0            0             0          0          0                8        0
       228  3680           0            0             0          0          0                9        0
       230  3712           0            0             0          0          0               10        0
       232  3744           0            0             0          0          0               11        0
       234  3776           0            0             0          0          0               12        0
       235  3792           0            0             0          0          0               13        0
       236  3808           0            0             0          0          0               14        0
       238  3840           0            0             0          0          0               15        0
       254  4096           0            0             0          0          0                1        0
    ...

Overall, the combined effect of zspage chain size on zsmalloc pool configuration::

    pages per zspage   number of size classes (clusters)   huge size class watermark
           4                        69                               3264
           5                        86                               3408
           6                        93                               3504
           7                       112                               3584
           8                       123                               3632
           9                       140                               3680
          10                       143                               3712
          11                       159                               3744
          12                       164                               3776
          13                       180                               3792
          14                       183                               3808
          15                       188                               3840
          16                       191                               3840


A synthetic test
----------------

zram used as storage for build artifacts (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
       Total                13           51        413836     412973     159955                         3

  zram mm_stat::

      1691783168 628083717 655175680 0 655175680 60 0 34048 34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

       class  size almost_full almost_empty obj_allocated   obj_used pages_used pages_per_zspage freeable
      ...
       Total                18           87        414852     412978     156666                         0

  zram mm_stat::

      1691803648 627793930 641703936 0 641703936 60 0 33591 33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).
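
As a rough cross-check of the synthetic test figures, the reduction in pages
and in pool memory can be computed directly from the numbers quoted above.
This Python sketch assumes a PAGE_SIZE of 4096 bytes; ``savings()`` is a
hypothetical helper introduced only for this illustration:

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


def savings(before, after):
    """Return (absolute reduction, reduction as a percentage of before)."""
    saved = before - after
    return saved, saved * 100 / before


# pages_used totals for chain sizes 4 and 8, taken from the classes stats.
pages_saved, pages_pct = savings(159955, 156666)
# Maximum pool memory usage in bytes, taken from the two mm_stat samples.
bytes_saved, bytes_pct = savings(655175680, 641703936)

if __name__ == "__main__":
    print(f"pages saved: {pages_saved} ({pages_pct:.2f}%)")
    print(f"bytes saved: {bytes_saved} ({bytes_pct:.2f}%)")
    # With 4 KiB pages, the byte saving equals the page saving times PAGE_SIZE.
    print(bytes_saved == pages_saved * PAGE_SIZE)
```

With these inputs the two reductions agree: 3289 pages, or 13471744 bytes,
which is roughly 2% of the pool in this particular workload.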