========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher order page allocation which is very likely to
fail under memory pressure. On the other hand, if we just use single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called zspage.

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is" i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object.
The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

    # cat /sys/kernel/debug/zsmalloc/zram0/classes

     class  size  10%  20%  30%  40%  50%  60%  70%  80%  90%  99% 100% obj_allocated obj_used pages_used pages_per_zspage freeable
    ...
    ...
        30   512    0   12    4    1    0    1    0    0    1    0  414          3464     3346        433                1       14
        31   528    2    7    2    2    1    0    1    0    0    2  117          4154     3793        536                4       44
        32   544    6    3    4    1    2    1    0    0    0    1  260          4170     3965        556                2       26
    ...
    ...

class
    index
size
    the size of the objects a zspage stores
10%
    the number of zspages with usage ratio less than 10% (see below)
20%
    the number of zspages with usage ratio between 10% and 20%
30%
    the number of zspages with usage ratio between 20% and 30%
40%
    the number of zspages with usage ratio between 30% and 40%
50%
    the number of zspages with usage ratio between 40% and 50%
60%
    the number of zspages with usage ratio between 50% and 60%
70%
    the number of zspages with usage ratio between 60% and 70%
80%
    the number of zspages with usage ratio between 70% and 80%
90%
    the number of zspages with usage ratio between 80% and 90%
99%
    the number of zspages with usage ratio between 90% and 99%
100%
    the number of zspages with usage ratio 100%
obj_allocated
    the number of objects allocated
obj_used
    the number of objects allocated to the user
pages_used
    the number of pages allocated for the class
pages_per_zspage
    the number of 0-order pages to make a zspage
freeable
    the approximate number of pages class compaction can free

Each zspage maintains an inuse counter which keeps track of the number of
objects stored in the zspage. The inuse counter determines the zspage's
"fullness group" which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.

Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

     class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable
    ...
        94  1536    0 ....    0             0        0          0                3        0
       100  1632    0 ....    0             0        0          0                2        0
    ...

Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace calculate_zspage_chain_size(), we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.
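
The arithmetic above can be reproduced with a short user-space sketch. This
is not the kernel code: it assumes a 4096-byte PAGE_SIZE and models the chain
size choice as "pick the shortest chain that leaves the fewest wasted bytes",
which is the idea the table illustrates. The helper names (``waste_table``,
``pick_chain_size``, ``pages_needed``) are made up for this example.

```python
# Illustrative user-space sketch, not kernel code.
# Assumption: PAGE_SIZE is 4096 bytes, as in the examples in this document.
PAGE_SIZE = 4096


def waste_table(obj_size, max_chain):
    """Return (pages, wasted_bytes, used_percent) for chains of 1..max_chain pages."""
    rows = []
    for pages in range(1, max_chain + 1):
        total = pages * PAGE_SIZE
        wasted = total % obj_size  # bytes left over after packing whole objects
        used = (total - wasted) * 100 // total
        rows.append((pages, wasted, used))
    return rows


def pick_chain_size(obj_size, max_chain):
    """Hypothetical helper: shortest chain with the least wasted bytes."""
    best_pages, best_waste = 1, None
    for pages, wasted, _ in waste_table(obj_size, max_chain):
        if best_waste is None or wasted < best_waste:
            best_pages, best_waste = pages, wasted
    return best_pages


def pages_needed(nr_objects, obj_size, chain_pages):
    """Physical pages needed to store nr_objects in zspages of chain_pages pages."""
    objs_per_zspage = chain_pages * PAGE_SIZE // obj_size
    zspages = -(-nr_objects // objs_per_zspage)  # ceiling division
    return zspages * chain_pages


if __name__ == "__main__":
    for pages, wasted, used in waste_table(1568, 5):
        print(f"{pages:>2} {wasted:>5} {used:>3}")
    print(pages_needed(13, 1632, 2))  # 13 objects stored with the class #100 layout
    print(pages_needed(13, 1568, 5))  # 13 objects stored in a 5-page class #96 zspage
```

With a chain limit of 4 the sketch settles on 2 pages for 1568-byte objects,
the same layout as class #100, which is consistent with the merge described
above. Raising the limit to 5 selects the 5-page chain.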

As the zspage chain size for class #96 increases, its key characteristics
such as pages per-zspage and objects per-zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

     class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable

    ...
       202  3264    0   ..    0             0        0          0                4        0
       254  4096    0   ..    0             0        0          0                1        0
    ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

     class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable

    ...
       202  3264    0   ..    0             0        0          0                4        0
       211  3408    0   ..    0             0        0          0                5        0
       217  3504    0   ..    0             0        0          0                6        0
       222  3584    0   ..    0             0        0          0                7        0
       225  3632    0   ..    0             0        0          0                8        0
       254  4096    0   ..    0             0        0          0                1        0
    ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

     class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable

    ...
       202  3264    0   ..    0             0        0          0                4        0
       206  3328    0   ..    0             0        0          0               13        0
       207  3344    0   ..    0             0        0          0                9        0
       208  3360    0   ..    0             0        0          0               14        0
       211  3408    0   ..    0             0        0          0                5        0
       212  3424    0   ..    0             0        0          0               16        0
       214  3456    0   ..    0             0        0          0               11        0
       217  3504    0   ..    0             0        0          0                6        0
       219  3536    0   ..    0             0        0          0               13        0
       222  3584    0   ..    0             0        0          0                7        0
       223  3600    0   ..    0             0        0          0               15        0
       225  3632    0   ..    0             0        0          0                8        0
       228  3680    0   ..    0             0        0          0                9        0
       230  3712    0   ..    0             0        0          0               10        0
       232  3744    0   ..    0             0        0          0               11        0
       234  3776    0   ..    0             0        0          0               12        0
       235  3792    0   ..    0             0        0          0               13        0
       236  3808    0   ..    0             0        0          0               14        0
       238  3840    0   ..    0             0        0          0               15        0
       254  4096    0   ..    0             0        0          0                1        0
    ...
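
The watermarks quoted above can be approximated in user space. The sketch
below is not the kernel implementation. It assumes a 4096-byte PAGE_SIZE,
size classes that start at 32 bytes and grow in 16-byte steps (which matches
the class/size pairs shown in this document), a single page for power-of-two
sizes, and a "least wasted bytes" rule for choosing the chain length. A class
is treated as huge when its zspage is one page holding one object. The
function names are illustrative only.

```python
# Illustrative user-space sketch, not kernel code.
# Assumptions (not taken from this document's text): 4096-byte pages,
# 255 classes of size 32 + 16 * index, power-of-two sizes use one page.
PAGE_SIZE = 4096
MIN_SIZE = 32
CLASS_DELTA = 16
NR_CLASSES = 255


def chain_size(obj_size, max_chain):
    """Shortest chain of 1..max_chain pages that wastes the fewest bytes."""
    if (obj_size & (obj_size - 1)) == 0:
        return 1
    best_pages, best_waste = 1, PAGE_SIZE % obj_size
    for pages in range(2, max_chain + 1):
        waste = (pages * PAGE_SIZE) % obj_size
        if waste < best_waste:
            best_pages, best_waste = pages, waste
    return best_pages


def huge_watermark(max_chain):
    """Largest class size whose zspage is not a single page with a single object."""
    watermark = 0
    for index in range(NR_CLASSES):
        size = MIN_SIZE + index * CLASS_DELTA
        pages = chain_size(size, max_chain)
        objs = pages * PAGE_SIZE // size
        is_huge = pages == 1 and objs == 1
        if not is_huge:
            watermark = max(watermark, size)
    return watermark


if __name__ == "__main__":
    for limit in (4, 8, 16):
        print(limit, huge_watermark(limit))
```

Under these assumptions a class stops being huge as soon as some chain of at
most ``max_chain`` pages wastes less than a single page does, which is why a
larger chain limit pushes the watermark up.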

Overall, the combined effect of the zspage chain size on zsmalloc pool configuration::

    pages per zspage   number of size classes (clusters)   huge size class watermark
           4                        69                               3264
           5                        86                               3408
           6                        93                               3504
           7                       112                               3584
           8                       123                               3632
           9                       140                               3680
          10                       143                               3712
          11                       159                               3744
          12                       164                               3776
          13                       180                               3792
          14                       183                               3808
          15                       188                               3840
          16                       191                               3840


A synthetic test
----------------

zram as a build artifacts storage (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

       class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable

      ...
       Total         13   ..   51        413836   412973     159955                         3

  zram mm_stat::

      1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

       class  size  10% .... 100% obj_allocated obj_used pages_used pages_per_zspage freeable

      ...
       Total         18   ..   87        414852   412978     156666                         0

  zram mm_stat::

      1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666. At the same time, the maximum zsmalloc pool memory usage went down
from 655175680 to 641703936 bytes.

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).
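
As a quick cross-check of the synthetic test figures, the snippet below
recomputes the savings from the numbers quoted above. It only does
arithmetic on those values and assumes a 4096-byte PAGE_SIZE. The variable
names are illustrative.

```python
# Figures copied from the synthetic test above; a PAGE_SIZE of 4096 is assumed.
PAGE_SIZE = 4096

# "Total" pages_used, keyed by CONFIG_ZSMALLOC_CHAIN_SIZE
pages_used = {4: 159955, 8: 156666}
# zsmalloc pool memory usage in bytes, from zram mm_stat
mem_used = {4: 655175680, 8: 641703936}

saved_pages = pages_used[4] - pages_used[8]
saved_bytes = mem_used[4] - mem_used[8]
saved_percent = saved_bytes * 100 / mem_used[4]

# The mm_stat byte counts are pages_used multiplied by the page size.
consistent = all(pages_used[k] * PAGE_SIZE == mem_used[k] for k in pages_used)

print(saved_pages, saved_bytes, round(saved_percent, 2), consistent)
```

In this particular run the larger chain limit saved 3289 physical pages,
roughly 2% of the pool.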

Functions
=========

.. kernel-doc:: mm/zsmalloc.c