================
Memory Balancing
================

Started Jan 2000 by Kanoj Sarcar <kanoj@sgi.com>

Memory balancing is needed for !__GFP_ATOMIC and !__GFP_KSWAPD_RECLAIM as
well as for non __GFP_IO allocations.

The first reason why a caller may avoid reclaim is that the caller cannot
sleep because it holds a spinlock or is in interrupt context. The second may
be that the caller is willing to fail the allocation without incurring the
overhead of page reclaim. This may happen for opportunistic high-order
allocation requests that have order-0 fallback options. In such cases,
the caller may also wish to avoid waking kswapd.

Non __GFP_IO allocation requests are made to prevent file system deadlocks.

In the absence of non-sleepable allocation requests, it seems detrimental
to be doing balancing. Page reclamation can be kicked off lazily, that
is, only when needed (aka zone free memory is 0), instead of making it
a proactive process.

That being said, the kernel should try to fulfill requests for direct
mapped pages from the direct mapped pool, instead of falling back on
the dma pool, so as to keep the dma pool filled for dma requests (atomic
or not). A similar argument applies to highmem and direct mapped pages.
OTOH, if there are a lot of free dma pages, it is preferable to satisfy
regular memory requests by allocating one from the dma pool, instead
of incurring the overhead of regular zone balancing.

In 2.2, memory balancing/page reclamation would kick off only when the
_total_ number of free pages fell below 1/64th of total memory. With the
right ratio of dma and regular memory, it is quite possible that balancing
would not be done even when the dma zone was completely empty. 2.2 has
been running production machines of varying memory sizes, and seems to be
doing fine even with the presence of this problem. In 2.3, due to
HIGHMEM, this problem is aggravated.
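
The 2.2 problem described above can be illustrated with a small sketch. This
is not kernel code: struct sketch_zone and global_needs_balance() are
hypothetical names introduced here, and the only thing modelled is the rule
that balancing starts when the total number of free pages drops below 1/64th
of total memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified zone descriptor; not the kernel's struct zone. */
struct sketch_zone {
	const char *name;
	unsigned long total_pages;
	unsigned long free_pages;
};

/*
 * 2.2-style check as described in the text: balancing starts only when
 * the free pages summed over all zones fall below 1/64th of the memory
 * summed over all zones. Individual zones are never examined.
 */
static bool global_needs_balance(const struct sketch_zone *zones, size_t nr)
{
	unsigned long total = 0, free_pages = 0;

	assert(zones != NULL && nr > 0);
	for (size_t i = 0; i < nr; i++) {
		total += zones[i].total_pages;
		free_pages += zones[i].free_pages;
	}
	return free_pages < total / 64;
}
```

With a dma zone of 4096 pages and no free pages, plus a regular zone of 28672
pages with 1024 free, the threshold is 32768 / 64 = 512 pages. The check
reports that no balancing is needed even though the dma zone is empty.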

In 2.3, zone balancing can be done in one of two ways: depending on the
zone size (and possibly on the size of lower class zones), we can decide
at init time how many free pages we should aim for while balancing any
zone. The good part is that, while balancing, we do not need to look at
sizes of lower class zones; the bad part is that we might balance too
frequently because we ignore possibly lower usage in the lower class zones.
Also, with a slight change in the allocation routine, it is possible to
reduce the memclass() macro to be a simple equality.

Another possible solution is that we balance only when the free memory
of a zone _and_ all its lower class zones falls below 1/64th of the
total memory in the zone and its lower class zones. This fixes the 2.2
balancing problem, and stays as close to 2.2 behavior as possible. Also,
the balancing algorithm works the same way on the various architectures,
which have different numbers and types of zones. If we wanted to get
fancy, we could assign different weights to free pages in different
zones in the future.

Note that if the size of the regular zone is huge compared to the dma zone,
it becomes less significant to consider the free dma pages while
deciding whether to balance the regular zone. The first solution
becomes more attractive then.

The appended patch implements the second solution. It also "fixes" two
problems: first, kswapd is woken up as in 2.2 on low memory conditions
for non-sleepable allocations. Second, the HIGHMEM zone is also balanced,
so as to give a fighting chance for replace_with_highmem() to get a
HIGHMEM page, as well as to ensure that HIGHMEM allocations do not
fall back into the regular zone. This also makes sure that HIGHMEM pages
are not leaked (for example, in situations where a HIGHMEM page is in
the swapcache but is not being used by anyone).

kswapd also needs to know about the zones it should balance.
kswapd is primarily needed in a situation where balancing cannot be done,
probably because all allocation requests are coming from intr context
and all process contexts are sleeping. For 2.3, kswapd does not really
need to balance the highmem zone, since intr context does not request
highmem pages. kswapd looks at the zone_wake_kswapd field in the zone
structure to decide whether a zone needs balancing.

Page stealing from process memory and shm is done if stealing the page would
alleviate memory pressure on any zone in the page's node that has fallen below
its watermark.

watermark[WMARK_MIN/WMARK_LOW/WMARK_HIGH]/low_on_memory/zone_wake_kswapd: These
are per-zone fields, used to determine when a zone needs to be balanced. When
the number of free pages falls below watermark[WMARK_MIN], the hysteretic field
low_on_memory gets set. This stays set till the number of free pages reaches
watermark[WMARK_HIGH]. When low_on_memory is set, page allocation requests will
try to free some pages in the zone (providing GFP_WAIT is set in the request).
Orthogonal to this is the decision to poke kswapd to free some zone pages.
That decision is not hysteresis based, and is done when the number of free
pages is below watermark[WMARK_LOW], in which case zone_wake_kswapd is also set.


(Good) Ideas that I have heard:

1. Dynamic experience should influence balancing: number of failed requests
   for a zone can be tracked and fed into the balancing scheme (jalvo@mbay.net)
2. Implement a replace_with_highmem()-like replace_with_regular() to preserve
   dma pages. (lkd@tantalophile.demon.co.uk)
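
As a rough illustration of the per-zone watermark fields described earlier,
the following sketch models how low_on_memory and zone_wake_kswapd could be
updated from the free page count. It is not taken from the kernel: the
sketch_ and SK_ names are hypothetical, and clearing zone_wake_kswapd once
free pages are back at or above the low watermark is an assumption of this
sketch, since the text only says when the field is set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for WMARK_MIN, WMARK_LOW and WMARK_HIGH. */
enum sketch_wmark { SK_WMARK_MIN, SK_WMARK_LOW, SK_WMARK_HIGH, SK_NR_WMARK };

/* Hypothetical per-zone state; not the kernel's struct zone. */
struct sketch_zone_state {
	unsigned long free_pages;
	unsigned long watermark[SK_NR_WMARK];
	bool low_on_memory;	/* hysteretic flag */
	bool zone_wake_kswapd;	/* plain threshold flag */
};

static void sketch_update_zone(struct sketch_zone_state *z)
{
	assert(z->watermark[SK_WMARK_MIN] <= z->watermark[SK_WMARK_LOW]);
	assert(z->watermark[SK_WMARK_LOW] <= z->watermark[SK_WMARK_HIGH]);

	/*
	 * Hysteresis: set below the min watermark, cleared only once the
	 * high watermark is reached. In between, the old value is kept.
	 */
	if (z->free_pages < z->watermark[SK_WMARK_MIN])
		z->low_on_memory = true;
	else if (z->free_pages >= z->watermark[SK_WMARK_HIGH])
		z->low_on_memory = false;

	/*
	 * No hysteresis: the flag follows the current free count.
	 * Clearing it here is an assumption of this sketch.
	 */
	z->zone_wake_kswapd = z->free_pages < z->watermark[SK_WMARK_LOW];
}
```

With watermarks of 10, 20 and 30 pages, a zone that drops to 5 free pages
sets both flags. Recovering to 25 pages clears zone_wake_kswapd but leaves
low_on_memory set, and only reaching 30 pages clears low_on_memory as well.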