.. SPDX-License-Identifier: GPL-2.0

:Author: Chris Li <chrisl@kernel.org>, Kairui Song <kasong@tencent.com>

==========
Swap Table
==========

The swap table implements the swap cache as a per-cluster array of swap cache
values.

Swap Entry
----------

A swap entry contains the information required to serve an anonymous page
fault.

A swap entry is encoded as two parts: swap type and swap offset.

The swap type indicates which swap device to use.
The swap offset is the offset within the swap file from which the page data
is read.

Swap Cache
----------

The swap cache is a map that looks up folios using a swap entry as the key.
The resulting value has one of three possible types, depending on which stage
the swap entry is in.

1. NULL: This swap entry is not used.

2. folio: A folio has been allocated and bound to this swap entry. This is
   the transient state of swap out or swap in. The folio data can be in
   the folio or swap file, or both.

3. shadow: The shadow contains the working set information of the swapped
   out folio. This is the normal state for a swapped out page.

Swap Table Internals
--------------------

The previous swap cache was implemented with an XArray. The XArray is a tree
structure, so each lookup goes through multiple nodes. Can we do better?

Notice that most of the time when we look up the swap cache, we are either
in a swap in or swap out path. We should already have the swap cluster,
which contains the swap entry.

If each cluster has an array that stores its swap cache values, a swap cache
lookup within the cluster becomes a very simple array lookup.

We give such a per-cluster swap cache value array a name: the swap table.

A swap table is an array of pointers. Each pointer is the same size as a
PTE. The size of a swap table for one swap cluster typically matches a PTE
page table, which is one page on modern 64-bit systems.

With the swap table, swap cache lookups have better locality and are simpler
and faster.

Locking
-------

Swap table modification requires taking the cluster lock. If a folio
is being added to or removed from the swap table, the folio must be
locked prior to taking the cluster lock. After the addition or removal is
done, the folio shall be unlocked.

Swap table lookup is protected by RCU and atomic read. If the lookup
returns a folio, the user must lock the folio before use.
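
The entry encoding and the per-cluster array lookup described above can be
illustrated with a small user-space C sketch. This is not the kernel
implementation. Every name here (``sketch_entry_t``, ``sketch_cluster``,
``sketch_lookup`` and so on) is hypothetical, and the 5-bit type field, the
512 slots per cluster (one 4 KiB page of 8-byte pointers) and the use of the
low bit to tell a shadow from a folio pointer are assumptions made only for
the example. Locking and RCU are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: the real bit widths are architecture specific. */
#define SKETCH_TYPE_BITS     5u
#define SKETCH_OFFSET_BITS   (64u - SKETCH_TYPE_BITS)
#define SKETCH_OFFSET_MASK   ((UINT64_C(1) << SKETCH_OFFSET_BITS) - 1)
/* Assumed: 512 pointer-sized slots, i.e. one 4 KiB page on 64-bit. */
#define SKETCH_CLUSTER_SLOTS 512u

/* A swap entry packs the swap type (device) and the swap offset. */
typedef struct { uint64_t val; } sketch_entry_t;

static inline sketch_entry_t sketch_entry(unsigned type, uint64_t offset)
{
	assert(type < (1u << SKETCH_TYPE_BITS));
	assert(offset <= SKETCH_OFFSET_MASK);
	return (sketch_entry_t){ ((uint64_t)type << SKETCH_OFFSET_BITS) | offset };
}

static inline unsigned sketch_type(sketch_entry_t e)
{
	return (unsigned)(e.val >> SKETCH_OFFSET_BITS);
}

static inline uint64_t sketch_offset(sketch_entry_t e)
{
	return e.val & SKETCH_OFFSET_MASK;
}

/* The per-cluster swap table: one pointer-sized slot per swap entry. */
struct sketch_cluster {
	uintptr_t table[SKETCH_CLUSTER_SLOTS];
};

/* The three possible slot states listed in the Swap Cache section. */
enum sketch_kind { SKETCH_NULL, SKETCH_FOLIO, SKETCH_SHADOW };

/* Assumption: a shadow is tagged by setting the low bit of the slot. */
static inline enum sketch_kind sketch_kind_of(uintptr_t slot)
{
	if (slot == 0)
		return SKETCH_NULL;
	return (slot & 1) ? SKETCH_SHADOW : SKETCH_FOLIO;
}

/* With the cluster already in hand, the lookup is a plain array index. */
static inline uintptr_t sketch_lookup(const struct sketch_cluster *ci,
				      sketch_entry_t e)
{
	return ci->table[sketch_offset(e) % SKETCH_CLUSTER_SLOTS];
}
```

A caller that already holds the cluster would decode the slot with
``sketch_kind_of(sketch_lookup(ci, entry))``. In the real swap table, a
folio returned this way must still be locked before use, as the Locking
section requires.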