========
zsmalloc
========

This allocator is designed for use with zram. Thus, the allocator is
supposed to work well under low memory conditions. In particular, it
never attempts higher-order page allocation, which is very likely to
fail under memory pressure. On the other hand, if we just used single
(0-order) pages, it would suffer from very high fragmentation --
any object of size PAGE_SIZE/2 or larger would occupy an entire page.
This was one of the major issues with its predecessor (xvmalloc).

To overcome these issues, zsmalloc allocates a bunch of 0-order pages
and links them together using various 'struct page' fields. These linked
pages act as a single higher-order page, i.e. an object can span 0-order
page boundaries. The code refers to these linked pages as a single entity
called a zspage.
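
The fragmentation argument can be illustrated with a little arithmetic. The
sketch below is a simplified model, not zsmalloc code: it assumes 4096-byte
pages, and ``wasted_bytes()`` is a hypothetical helper name. It compares the
space lost when an object must fit in one page with the space lost when
objects may span a chain of linked pages.

.. code-block:: python

    PAGE_SIZE = 4096  # assumption: 4K pages


    def wasted_bytes(obj_size, nr_pages):
        """Bytes left over when objects are packed across nr_pages pages."""
        return (nr_pages * PAGE_SIZE) % obj_size


    # A 2336-byte object stored one per page leaves 1760 bytes unused per page.
    per_single_page = wasted_bytes(2336, 1)

    # Seven such objects packed across 4 linked pages leave only 32 bytes.
    per_four_page_chain = wasted_bytes(2336, 4)

    print(per_single_page, per_four_page_chain)

In this model the single-page layout loses about 43% of every page, while the
four-page chain loses well under 1% of its space.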

For simplicity, zsmalloc can only allocate objects of size up to PAGE_SIZE
since this satisfies the requirements of all its current users (in the
worst case, a page is incompressible and is thus stored "as-is", i.e. in
uncompressed form). For allocation requests larger than this size, failure
is returned (see zs_malloc()).

Additionally, zs_malloc() does not return a dereferenceable pointer.
Instead, it returns an opaque handle (unsigned long) which encodes the actual
location of the allocated object. The reason for this indirection is that
zsmalloc does not keep zspages permanently mapped since that would cause
issues on 32-bit systems where the VA region for kernel space mappings
is very small. So, before using the allocated memory, the object has to
be mapped using zs_map_object() to get a usable pointer and subsequently
unmapped using zs_unmap_object().

stat
====

With CONFIG_ZSMALLOC_STAT, we can see zsmalloc internal information via
``/sys/kernel/debug/zsmalloc/<user name>``. Here is a sample of stat output::

 # cat /sys/kernel/debug/zsmalloc/zram0/classes

 class  size       10%       20%       30%       40%       50%       60%       70%       80%       90%       99%      100% obj_allocated   obj_used pages_used pages_per_zspage freeable
    ...
    ...
    30   512         0        12         4         1         0         1         0         0         1         0       414          3464       3346        433                1       14
    31   528         2         7         2         2         1         0         1         0         0         2       117          4154       3793        536                4       44
    32   544         6         3         4         1         2         1         0         0         0         1       260          4170       3965        556                2       26
    ...
    ...


class
	index
size
	object size zspage stores
10%
	the number of zspages with usage ratio less than 10% (see below)
20%
	the number of zspages with usage ratio between 10% and 20%
30%
	the number of zspages with usage ratio between 20% and 30%
40%
	the number of zspages with usage ratio between 30% and 40%
50%
	the number of zspages with usage ratio between 40% and 50%
60%
	the number of zspages with usage ratio between 50% and 60%
70%
	the number of zspages with usage ratio between 60% and 70%
80%
	the number of zspages with usage ratio between 70% and 80%
90%
	the number of zspages with usage ratio between 80% and 90%
99%
	the number of zspages with usage ratio between 90% and 99%
100%
	the number of zspages with usage ratio 100%
obj_allocated
	the number of objects allocated
obj_used
	the number of objects allocated to the user
pages_used
	the number of pages allocated for the class
pages_per_zspage
	the number of 0-order pages to make a zspage
freeable
	the approximate number of pages class compaction can free

Each zspage maintains an inuse counter which keeps track of the number of
objects stored in the zspage.  The inuse counter determines the zspage's
"fullness group", which is calculated as the ratio of the "inuse" objects to
the total number of objects the zspage can hold (objs_per_zspage). The
closer the inuse counter is to objs_per_zspage, the better.
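
The mapping from the inuse counter to a stat column can be sketched as
follows. This is a rough model built only from the column descriptions above.
The helper name ``fullness_column()`` is hypothetical, and the handling of
exact boundaries (for example a ratio of exactly 10%) is an assumption; the
kernel code is authoritative.

.. code-block:: python

    def fullness_column(inuse, objs_per_zspage):
        """Return the stat column ("10%" ... "100%") a zspage is counted in."""
        if inuse == objs_per_zspage:
            return "100%"
        ratio = 100 * inuse // objs_per_zspage  # integer usage ratio, 0..99
        if ratio >= 90:
            return "99%"
        # Assumption: a ratio in [N*10, N*10 + 10) is reported under the
        # upper bound of its range, e.g. 15% is counted in the "20%" column.
        return f"{(ratio // 10 + 1) * 10}%"


    # A zspage that can hold 13 objects and currently stores 2 of them.
    print(fullness_column(2, 13))

A zspage holding 2 of 13 objects has a usage ratio of about 15% and is
therefore counted in the 20% column under these assumptions.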

Internals
=========

zsmalloc has 255 size classes, each of which can hold a number of zspages.
Each zspage can contain up to ZSMALLOC_CHAIN_SIZE physical (0-order) pages.
The optimal zspage chain size for each size class is calculated during the
creation of the zsmalloc pool (see calculate_zspage_chain_size()).

As an optimization, zsmalloc merges size classes that have similar
characteristics in terms of the number of pages per zspage and the number
of objects that each zspage can store.

For instance, consider the following size classes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable
  ...
     94  1536        0    ....       0             0          0          0                3        0
    100  1632        0    ....       0             0          0          0                2        0
  ...


Size classes #95-99 are merged with size class #100. This means that when we
need to store an object of size, say, 1568 bytes, we end up using size class
#100 instead of size class #96. Size class #100 is meant for objects of size
1632 bytes, so each object of size 1568 bytes wastes 1632-1568=64 bytes.

Size class #100 consists of zspages with 2 physical pages each, which can
hold a total of 5 objects. If we need to store 13 objects of size 1568, we
end up allocating three zspages, or 6 physical pages.
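
The numbers in this example can be re-derived with a short sketch. It assumes
4096-byte pages, a minimum class size of 32 bytes and a 16-byte step between
classes (values chosen because they match the class/size pairs in the sample
output). The helper names are hypothetical and this is not the kernel
implementation.

.. code-block:: python

    PAGE_SIZE = 4096          # assumption: 4K pages
    ZS_MIN_ALLOC_SIZE = 32    # assumption, matches the sample output
    ZS_SIZE_CLASS_DELTA = 16  # assumption: class sizes are 16 bytes apart


    def nominal_class(size):
        """Index of the smallest class whose size can hold the object."""
        size = max(size, ZS_MIN_ALLOC_SIZE)
        # Integer ceiling division.
        return -(-(size - ZS_MIN_ALLOC_SIZE) // ZS_SIZE_CLASS_DELTA)


    def pages_needed(nr_objects, class_size, pages_per_zspage):
        """Physical pages used to store nr_objects in the given class."""
        objs_per_zspage = pages_per_zspage * PAGE_SIZE // class_size
        zspages = -(-nr_objects // objs_per_zspage)
        return zspages * pages_per_zspage


    print(nominal_class(1568))        # nominal class of a 1568-byte object
    print(1632 - 1568)                # bytes wasted per object in class #100
    print(pages_needed(13, 1632, 2))  # pages for 13 objects in class #100

Under these assumptions a 1568-byte object nominally belongs to class #96,
and storing 13 of them in the merged class #100 takes three 2-page zspages.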

However, if we take a closer look at size class #96 (which is meant for
objects of size 1568 bytes) and trace calculate_zspage_chain_size(), we
find that the optimal zspage configuration for this class is a chain
of 5 physical pages::

    pages per zspage      wasted bytes     used%
           1                  960           76
           2                  352           95
           3                 1312           89
           4                  704           95
           5                   96           99

This means that a class #96 configuration with 5 physical pages can store 13
objects of size 1568 in a single zspage, using a total of 5 physical pages.
This is more efficient than the class #100 configuration, which would use 6
physical pages to store the same number of objects.
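
The table above can be reproduced with a few lines of arithmetic. The sketch
below is a simplified model of the search, assuming 4096-byte pages; the
helper names are hypothetical, and the in-kernel
calculate_zspage_chain_size() remains the authoritative version.

.. code-block:: python

    PAGE_SIZE = 4096  # assumption: 4K pages


    def chain_table(class_size, max_pages):
        """Rows of (pages per zspage, wasted bytes, used%) per chain size."""
        rows = []
        for pages in range(1, max_pages + 1):
            total = pages * PAGE_SIZE
            waste = total % class_size
            used_pct = (total - waste) * 100 // total
            rows.append((pages, waste, used_pct))
        return rows


    def best_chain_size(class_size, max_pages):
        """Chain size with the least waste; ties go to the shorter chain."""
        return min(chain_table(class_size, max_pages), key=lambda r: r[1])[0]


    for row in chain_table(1568, 5):
        print(row)
    print(best_chain_size(1568, 5), 5 * PAGE_SIZE // 1568)

With the chain limited to 4 pages the same model selects a 2-page chain for
this class, which is the configuration it shares with class #100.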

As the zspage chain size for class #96 increases, its key characteristics
such as pages per zspage and objects per zspage also change. This leads to
fewer class mergers, resulting in a more compact grouping of classes, which
reduces memory wastage.
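
The effect on class merging can be modelled roughly as follows. The sketch
assumes 4096-byte pages, 255 classes starting at 32 bytes in 16-byte steps,
and that a class is merged into the next larger class whenever both have the
same pages per zspage and objects per zspage. These are assumptions made for
illustration and the helper names are hypothetical.

.. code-block:: python

    PAGE_SIZE = 4096          # assumption: 4K pages
    ZS_MIN_ALLOC_SIZE = 32    # assumption
    ZS_SIZE_CLASS_DELTA = 16  # assumption
    NR_CLASSES = 255


    def class_config(size, max_chain):
        """(pages_per_zspage, objs_per_zspage) with the least wasted space."""
        pages = min(range(1, max_chain + 1),
                    key=lambda n: (n * PAGE_SIZE) % size)
        return pages, pages * PAGE_SIZE // size


    def merged_sizes(max_chain):
        """Map each nominal class size to the class size that serves it."""
        serving = {}
        prev_size, prev_cfg = None, None
        # Walk from the largest class down, as a smaller class can only be
        # merged into a larger one.
        for idx in reversed(range(NR_CLASSES)):
            size = min(ZS_MIN_ALLOC_SIZE + idx * ZS_SIZE_CLASS_DELTA, PAGE_SIZE)
            cfg = class_config(size, max_chain)
            if cfg != prev_cfg:
                prev_size, prev_cfg = size, cfg
            serving[size] = prev_size
        return serving


    print(merged_sizes(4)[1568])  # served by the 1632-byte class
    print(merged_sizes(5)[1568])  # served by its own 1568-byte class

With a limit of 4 pages the model sends 1568-byte objects to the 1632-byte
class, while a limit of 5 pages lets the 1568-byte class stand on its own.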

Let's take a closer look at the bottom of ``/sys/kernel/debug/zsmalloc/zramX/classes``::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Size class #202 stores objects of size 3264 bytes and has a maximum of 4 pages
per zspage. Any object larger than 3264 bytes is considered huge and belongs
to size class #254, which stores each object in its own physical page (objects
in huge classes do not share pages).

Increasing the size of the chain of zspages also results in a higher watermark
for the huge size class and fewer huge classes overall. This allows for more
efficient storage of large objects.

For a zspage chain size of 8, the huge class watermark becomes 3632 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    211  3408         0   ..         0             0          0          0                5        0
    217  3504         0   ..         0             0          0          0                6        0
    222  3584         0   ..         0             0          0          0                7        0
    225  3632         0   ..         0             0          0          0                8        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

For a zspage chain size of 16, the huge class watermark becomes 3840 bytes::

  class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

  ...
    202  3264         0   ..         0             0          0          0                4        0
    206  3328         0   ..         0             0          0          0               13        0
    207  3344         0   ..         0             0          0          0                9        0
    208  3360         0   ..         0             0          0          0               14        0
    211  3408         0   ..         0             0          0          0                5        0
    212  3424         0   ..         0             0          0          0               16        0
    214  3456         0   ..         0             0          0          0               11        0
    217  3504         0   ..         0             0          0          0                6        0
    219  3536         0   ..         0             0          0          0               13        0
    222  3584         0   ..         0             0          0          0                7        0
    223  3600         0   ..         0             0          0          0               15        0
    225  3632         0   ..         0             0          0          0                8        0
    228  3680         0   ..         0             0          0          0                9        0
    230  3712         0   ..         0             0          0          0               10        0
    232  3744         0   ..         0             0          0          0               11        0
    234  3776         0   ..         0             0          0          0               12        0
    235  3792         0   ..         0             0          0          0               13        0
    236  3808         0   ..         0             0          0          0               14        0
    238  3840         0   ..         0             0          0          0               15        0
    254  4096         0   ..         0             0          0          0                1        0
  ...

Overall, the combined effect of the zspage chain size on zsmalloc pool configuration::

  pages per zspage   number of size classes (clusters)   huge size class watermark
         4                        69                               3264
         5                        86                               3408
         6                        93                               3504
         7                       112                               3584
         8                       123                               3632
         9                       140                               3680
        10                       143                               3712
        11                       159                               3744
        12                       164                               3776
        13                       180                               3792
        14                       183                               3808
        15                       188                               3840
        16                       191                               3840
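
The watermark column can be approximated with a simplified model in which a
class is huge when its least-wasteful configuration is a single page holding
a single object. The sketch assumes 4096-byte pages and class sizes from 32
bytes in 16-byte steps; the helper names are hypothetical and this is not the
kernel's own calculation.

.. code-block:: python

    PAGE_SIZE = 4096          # assumption: 4K pages
    ZS_MIN_ALLOC_SIZE = 32    # assumption
    ZS_SIZE_CLASS_DELTA = 16  # assumption


    def is_huge(size, max_chain):
        """True if the best configuration is one page holding one object."""
        pages = min(range(1, max_chain + 1),
                    key=lambda n: (n * PAGE_SIZE) % size)
        return pages == 1 and PAGE_SIZE // size == 1


    def huge_watermark(max_chain):
        """Largest class size that is still not huge."""
        sizes = range(ZS_MIN_ALLOC_SIZE, PAGE_SIZE + 1, ZS_SIZE_CLASS_DELTA)
        return max(s for s in sizes if not is_huge(s, max_chain))


    for limit in (4, 8, 16):
        print(limit, huge_watermark(limit))

For chain size limits of 4, 8 and 16 this model yields the same watermarks as
the corresponding rows of the table.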


A synthetic test
----------------

zram used as storage for build artifacts (Linux kernel compilation).

* ``CONFIG_ZSMALLOC_CHAIN_SIZE=4``

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              13   ..        51        413836     412973     159955                         3

  zram mm_stat::

    1691783168 628083717 655175680        0 655175680       60        0    34048    34049


* ``CONFIG_ZSMALLOC_CHAIN_SIZE=8``

  zsmalloc classes stats::

    class  size       10%   ....    100% obj_allocated   obj_used pages_used pages_per_zspage freeable

    ...
    Total              18   ..        87        414852     412978     156666                         0

  zram mm_stat::

    1691803648 627793930 641703936        0 641703936       60        0    33591    33591

Using larger zspage chains may result in using fewer physical pages, as seen
in the example where the number of physical pages used decreased from 159955
to 156666, while the maximum zsmalloc pool memory usage went down from
655175680 to 641703936 bytes.
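
The size of the saving follows directly from the figures above. The short
sketch below only redoes that arithmetic; it assumes 4096-byte pages and the
variable names are chosen for illustration.

.. code-block:: python

    PAGE_SIZE = 4096  # assumption: 4K pages

    # pages_used totals and maximum pool memory usage from the two runs above
    pages_chain4, pages_chain8 = 159955, 156666
    mem_max_chain4, mem_max_chain8 = 655175680, 641703936

    saved_pages = pages_chain4 - pages_chain8
    saved_bytes = mem_max_chain4 - mem_max_chain8
    saved_pct = 100 * saved_pages / pages_chain4

    print(saved_pages, saved_bytes, f"{saved_pct:.2f}%")

The byte difference equals the page difference multiplied by the page size,
so the two sets of figures agree with each other; the saving is roughly 2% of
the pool in this particular test.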

However, this advantage may be offset by the potential for increased system
memory pressure (as some zspages have larger chain sizes) in cases where there
is heavy internal fragmentation and zspool compaction is unable to relocate
objects and release zspages. In these cases, it is recommended to decrease
the limit on the size of the zspage chains (as specified by the
CONFIG_ZSMALLOC_CHAIN_SIZE option).

Functions
=========

.. kernel-doc:: mm/zsmalloc.c