/linux/mm/
msync.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved
Remove PageReserved() calls from core code by tightening VM_RESERVED handling in mm/ to cover PageReserved functionality.
PageReserved special casing is removed from get_page and put_page.
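A minimal sketch of what that special-casing looks like (a reduced, compilable user-space stand-in, not the kernel's real struct page or helpers):

/*
 * Reduced model (not the kernel's actual code) of the special case
 * removed from the refcounting helpers; put_page's guard is analogous.
 */
#include <stdatomic.h>
#include <stdbool.h>

struct page {
	atomic_int _count;		/* reference count */
	unsigned long flags;		/* PG_* bits */
};

#define PG_reserved 0

static bool PageReserved(struct page *page)
{
	return page->flags & (1UL << PG_reserved);
}

/* Before: reserved pages were exempted from refcounting. */
static void get_page_old(struct page *page)
{
	if (PageReserved(page))		/* the special case */
		return;
	atomic_fetch_add(&page->_count, 1);
}

/*
 * After: every page is refcounted uniformly; the tightened VM_RESERVED
 * handling guarantees a reserved page's count never reaches zero, so
 * no refcount-based freeing of it can occur.
 */
static void get_page_new(struct page *page)
{
	atomic_fetch_add(&page->_count, 1);
}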
All setting and clearing of PageReserved is retained, and it is now flagged in the page_alloc checks to help ensure we don't introduce any refcount based freeing of Reserved pages.
MAP_PRIVATE, PROT_WRITE of VM_RESERVED regions is tentatively being deprecated. We never completely handled it correctly anyway, and it can be reintroduced in future if required (Hugh has a proof of concept).
Once PageReserved() calls are removed from kernel/power/swsusp.c, and all arch/ and driver code, the Set and Clear calls, and the PG_reserved bit can be trivially removed.
Last real user of PageReserved is swsusp, which uses PageReserved to determine whether a struct page points to valid memory or not. This still needs to be addressed (a generic page_is_ram() should work).
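A sketch of the kind of generic page_is_ram() meant here, deciding from the firmware's "System RAM" resources rather than from PG_reserved. walk_system_ram_range() is a helper from later kernels, so this is an assumption about the eventual shape, not code that existed at the time:

/*
 * Sketch: answer "is this pfn real RAM?" from the resource tree
 * ("System RAM" entries) instead of the PG_reserved bit.
 * Kernel-style code; walk_system_ram_range() is a later-era helper.
 */
#include <linux/ioport.h>

static int __page_is_ram(unsigned long pfn, unsigned long nr_pages,
			 void *arg)
{
	return 1;	/* only reached for pfns inside "System RAM" */
}

int page_is_ram(unsigned long pfn)
{
	return walk_system_ram_range(pfn, 1, NULL, __page_is_ram) == 1;
}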
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and counted towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue.
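A user-space analogue of that bouncing (illustration only, not kernel code): many threads atomically updating one hot counter drag its cacheline from core to core, which is exactly what per-cpu or unrefcounted schemes would avoid.

/*
 * Four threads atomically bump one shared counter -- like every CPU
 * bumping ZERO_PAGE's _count -- so its cacheline migrates on every
 * increment. Run with any argument to use padded per-thread counters
 * instead (analogous to per-cpu ZERO_PAGEs) and compare the runtime.
 * Build: cc -O2 -pthread bounce.c
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NTHREADS 4
#define ITERS (1L << 24)

static atomic_long shared_count;
/* pad each counter to its own cacheline to prevent false sharing */
static struct { atomic_long v; char pad[120]; } per_thread[NTHREADS];

static void *bump(void *arg)
{
	atomic_long *ctr = arg;

	for (long i = 0; i < ITERS; i++)
		atomic_fetch_add(ctr, 1);	/* one cacheline write each */
	return NULL;
}

int main(int argc, char **argv)
{
	int split = argc > 1;		/* any argument: per-thread counters */
	pthread_t t[NTHREADS];
	long total = 0;

	(void)argv;
	for (long i = 0; i < NTHREADS; i++)
		pthread_create(&t[i], NULL, bump,
			       split ? &per_thread[i].v : &shared_count);
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	for (int i = 0; i < NTHREADS; i++)
		total += atomic_load(&per_thread[i].v);
	total += atomic_load(&shared_count);
	printf("total increments: %ld\n", total);
	return 0;
}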
Signed-off-by: Nick Piggin <npiggin@suse.de>
Refcount bug fix for filemap_xip.c
Signed-off-by: Carsten Otte <cotte@de.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
mprotect.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
madvise.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
swap.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
memory.c | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE
The commit b5810039a54e5babf428e9a1e89fc1940fabff11 contains the note:
A last caveat: the ZERO_PAGE is now refcounted and managed with rmap (and thus mapcounted and count towards shared rss). These writes to the struct page could cause excessive cacheline bouncing on big systems. There are a number of ways this could be addressed if it is an issue.
And indeed this cacheline bouncing has shown up on large SGI systems. There was a situation where an Altix system was essentially livelocked tearing down ZERO_PAGE pagetables when an HPC app aborted during startup. This situation can be avoided in userspace, but it does highlight the potential scalability problem with refcounting ZERO_PAGE, and corner cases where it can really hurt (we don't want the system to livelock!).
There are several broad ways to fix this problem:
1. add back some special casing to avoid refcounting ZERO_PAGE
2. per-node or per-cpu ZERO_PAGES
3. remove the ZERO_PAGE completely
I will argue for 3. The others should also fix the problem, but they result in more complex code than does 3, with little or no real benefit that I can see.
Why? Inserting a ZERO_PAGE for anonymous read faults appears to be a false optimisation: if an application is performance critical, it would not be doing many read faults of new memory, or at least it could be expected to write to that memory soon afterwards. If cache or memory use is critical, it should not be working with a significant number of ZERO_PAGEs anyway (a more compact representation of zeroes should be used).
As a sanity check -- measuring on my desktop system, there are never many mappings to the ZERO_PAGE (e.g. 2 or 3), so memory usage here should not increase much without it.
When running a make -j4 kernel compile on my dual core system, there are about 1,000 mappings to the ZERO_PAGE created per second, but about 1,000 ZERO_PAGE COW faults per second (less than 1 ZERO_PAGE mapping per second is torn down without being COWed). So removing ZERO_PAGE will save 1,000 page faults per second when running kbuild, while keeping it only saves less than 1 page clearing operation per second. 1 page clear is cheaper than a thousand faults, presumably, so there isn't an obvious loss.
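A runnable sanity check of that fault arithmetic (an illustration, not part of the commit): count minor faults for read-then-write versus write-only over fresh anonymous memory. With ZERO_PAGE read faults, the first pattern costs two faults per page (zero-page map, then COW); without them, both patterns cost one.

/*
 * Compare minor-fault counts for "read first, then write" vs "write
 * directly" over fresh anonymous memory.
 */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

#define NPAGES 4096
#define PAGE   4096

static long minor_faults(void)
{
	struct rusage ru;

	getrusage(RUSAGE_SELF, &ru);
	return ru.ru_minflt;
}

static unsigned char *fresh(void)
{
	void *p = mmap(NULL, (size_t)NPAGES * PAGE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		_exit(1);
	return p;
}

int main(void)
{
	unsigned char *p;
	volatile unsigned char sink;
	long before;

	p = fresh();
	before = minor_faults();
	for (int i = 0; i < NPAGES; i++)
		sink = p[i * PAGE];		/* read fault per page */
	for (int i = 0; i < NPAGES; i++)
		p[i * PAGE] = 1;		/* write (possibly COW) fault */
	printf("read-then-write: %ld faults\n", minor_faults() - before);

	p = fresh();
	before = minor_faults();
	for (int i = 0; i < NPAGES; i++)
		p[i * PAGE] = 1;		/* write fault only */
	printf("write-only:      %ld faults\n", minor_faults() - before);

	(void)sink;
	return 0;
}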
Neither the logical argument nor these basic tests give a guarantee of no regressions. However, this is a reasonable opportunity to try to remove the ZERO_PAGE from the pagefault path. If it is found to cause regressions, we can reintroduce it and just avoid refcounting it.
The /dev/zero ZERO_PAGE usage and TLB tricks also get nuked. I don't see much use to them except on benchmarks. All other users of ZERO_PAGE are converted just to use ZERO_PAGE(0) for simplicity. We can look at replacing them all and maybe ripping out ZERO_PAGE completely when we are more satisfied with this solution.
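For context (again an illustration, not from the commit): a private mapping of /dev/zero is the classic pre-MAP_ANONYMOUS way to get zero-filled memory and was a beneficiary of those tricks; after this change it simply behaves like a plain anonymous mapping.

/*
 * Classic MAP_PRIVATE mapping of /dev/zero: reads see zeroes, the
 * first write to a page gives the process its own private copy.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/dev/zero", O_RDWR);
	char *p;

	if (fd < 0)
		return 1;
	p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	close(fd);			/* the mapping survives the close */
	if (p == MAP_FAILED)
		return 1;

	printf("first byte: %d\n", p[0]);	/* reads as 0 */
	p[0] = 42;				/* private copy on write */
	printf("after write: %d\n", p[0]);
	munmap(p, 4096);
	return 0;
}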
Signed-off-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus "snif" Torvalds <torvalds@linux-foundation.org>
diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
rmap.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
mempolicy.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
shmem.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
mmap.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
page_alloc.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
/linux/fs/
direct-io.c | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE; diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit messages as under memory.c and msync.c above)
binfmt_elf_fdpic.c | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE (commit message as under memory.c above)
binfmt_elf.c | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE (commit message as under memory.c above)
/linux/drivers/char/
mem.c | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE (commit message as under memory.c above)
/linux/drivers/scsi/
st.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
/linux/sound/core/
pcm_native.c | diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit message as under msync.c above)
/linux/include/linux/
mm.h | diff 557ed1fa2620dc119adb86b34c614e152a629a80 Tue Oct 16 10:24:40 CEST 2007 Nick Piggin <npiggin@suse.de> remove ZERO_PAGE; diff b5810039a54e5babf428e9a1e89fc1940fabff11 Sun Oct 30 02:16:12 CET 2005 Nick Piggin <nickpiggin@yahoo.com.au> [PATCH] core remove PageReserved (commit messages as under memory.c and msync.c above)