==================================
Cache and TLB Flushing Under Linux
==================================

:Author: David S. Miller <davem@redhat.com>

This document describes the cache/tlb flushing interfaces called
by the Linux VM subsystem.  It enumerates each interface,
describes its intended purpose, and what side effect is expected
after the interface is invoked.

The side effects described below are stated for a uniprocessor
implementation, and what is to happen on that single processor.  The
SMP cases are a simple extension, in that you just extend the
definition such that the side effect for a particular interface occurs
on all processors in the system.  Don't let this scare you into
thinking SMP cache/tlb flushing must be so inefficient; this is in
fact an area where many optimizations are possible.  For example,
if it can be proven that a user address space has never executed
on a cpu (see mm_cpumask()), one need not perform a flush
for this address space on that cpu.
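As a concrete illustration of that optimization, here is a minimal
sketch of how an SMP port might implement the flush_tlb_mm() interface
described below; local_flush_tlb_mm() and the IPI handler are
hypothetical port-internal helpers, not interfaces any architecture is
required to provide::

        /* IPI handler; runs on each remote cpu found in mm_cpumask(mm). */
        static void ipi_flush_tlb_mm(void *info)
        {
                local_flush_tlb_mm(info);
        }

        void flush_tlb_mm(struct mm_struct *mm)
        {
                preempt_disable();
                local_flush_tlb_mm(mm);
                /*
                 * Only cpus this address space has ever run on are in
                 * mm_cpumask(); every other cpu is skipped entirely,
                 * so an mm that never ran elsewhere costs no IPIs.
                 */
                smp_call_function_many(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
                preempt_enable();
        }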
First, the TLB flushing interfaces, since they are the simplest.  The
"TLB" is abstracted under Linux as something the cpu uses to cache
virtual-->physical address translations obtained from the software
page tables.  Meaning that if the software page tables change, it is
possible for stale translations to exist in this "TLB" cache.
Therefore when software page table changes occur, the kernel will
invoke one of the following flush methods _after_ the page table
changes occur:

1) ``void flush_tlb_all(void)``

        The most severe flush of all.  After this interface runs,
        any previous page table modification whatsoever will be
        visible to the cpu.

        This is usually invoked when the kernel page tables are
        changed, since such translations are "global" in nature.

2) ``void flush_tlb_mm(struct mm_struct *mm)``

        This interface flushes an entire user address space from
        the TLB.  After running, this interface must make sure that
        any previous page table modifications for the address space
        'mm' will be visible to the cpu.  That is, after running,
        there will be no entries in the TLB for 'mm'.

        This interface is used to handle whole address space
        page table operations such as what happens during
        fork and exec.

3) ``void flush_tlb_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)``

        Here we are flushing a specific range of (user) virtual
        address translations from the TLB.  After running, this
        interface must make sure that any previous page table
        modifications for the address space 'vma->vm_mm' in the range
        'start' to 'end-1' will be visible to the cpu.  That is, after
        running, there will be no entries in the TLB for 'vma->vm_mm'
        for virtual addresses in the range 'start' to 'end-1'.

        The "vma" is the backing store being used for the region.
        Primarily, this is used for munmap() type operations.

        The interface is provided in hopes that the port can find
        a suitably efficient method for removing multiple page
        sized translations from the TLB, instead of having the kernel
        call flush_tlb_page (see below) for each entry which may be
        modified.
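        If a port has no more efficient primitive, a correct (if slow)
        fallback is exactly the page-at-a-time loop this interface
        exists to avoid; a minimal sketch::

                void flush_tlb_range(struct vm_area_struct *vma,
                                     unsigned long start, unsigned long end)
                {
                        unsigned long addr;

                        /* Correct but slow: evict each translation singly. */
                        for (addr = start & PAGE_MASK; addr < end; addr += PAGE_SIZE)
                                flush_tlb_page(vma, addr);
                }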
4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``

        This time we need to remove the PAGE_SIZE sized translation
        from the TLB.  The 'vma' is the backing structure used by
        Linux to keep track of mmap'd regions for a process, the
        address space is available via vma->vm_mm.  Also, one may
        test (vma->vm_flags & VM_EXEC) to see if this region is
        executable (and thus could be in the 'instruction TLB' in
        split-tlb type setups).

        After running, this interface must make sure that any previous
        page table modification for address space 'vma->vm_mm' for
        user virtual address 'addr' will be visible to the cpu.  That
        is, after running, there will be no entries in the TLB for
        'vma->vm_mm' for virtual address 'addr'.

        This is used primarily during fault processing.

5) ``void update_mmu_cache_range(struct vm_fault *vmf,
   struct vm_area_struct *vma, unsigned long address, pte_t *ptep,
   unsigned int nr)``

        At the end of every page fault, this routine is invoked to tell
        the architecture specific code that translations now exist
        in the software page tables for address space "vma->vm_mm"
        at virtual address "address" for "nr" consecutive pages.

        This routine is also invoked in various other places which pass
        a NULL "vmf".

        A port may use this information in any way it so chooses.
        For example, it could use this event to pre-load TLB
        translations for software managed TLB configurations.
        The sparc64 port currently does this.
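        A hedged sketch of such a pre-loading port (a port with a
        hardware page-table walker could instead leave the body empty;
        tlb_preload_entry() is a hypothetical arch primitive)::

                void update_mmu_cache_range(struct vm_fault *vmf,
                                struct vm_area_struct *vma, unsigned long address,
                                pte_t *ptep, unsigned int nr)
                {
                        unsigned int i;

                        /*
                         * Seed the software-managed TLB with the new
                         * translations so the faulting access does not
                         * immediately miss again.  The 'nr' ptes are
                         * consecutive in the page table.
                         */
                        for (i = 0; i < nr; i++)
                                tlb_preload_entry(vma->vm_mm,
                                                  address + i * PAGE_SIZE,
                                                  ptep + i);
                }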
Next, we have the cache flushing interfaces.  In general, when Linux
is changing an existing virtual-->physical mapping to a new value,
the sequence will be in one of the following forms::

        1) flush_cache_mm(mm);
           change_all_page_tables_of(mm);
           flush_tlb_mm(mm);

        2) flush_cache_range(vma, start, end);
           change_range_of_page_tables(mm, start, end);
           flush_tlb_range(vma, start, end);

        3) flush_cache_page(vma, addr, pfn);
           set_pte(pte_pointer, new_pte_val);
           flush_tlb_page(vma, addr);

The cache level flush will always be first, because this allows
us to properly handle systems whose caches are strict and require
a virtual-->physical translation to exist for a virtual address
when that virtual address is flushed from the cache.  The HyperSparc
cpu is one such cpu with this attribute.

The cache flushing routines below need only deal with cache flushing
to the extent that it is necessary for a particular cpu.  Mostly,
these routines must be implemented for cpus which have virtually
indexed caches which must be flushed when virtual-->physical
translations are changed or removed.  So, for example, the physically
indexed physically tagged caches of IA32 processors have no need to
implement these interfaces since the caches are fully synchronized
and have no dependency on translation information.
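Such a fully coherent port can make the whole family of hooks
disappear at compile time; a minimal sketch of what that port's
asm/cacheflush.h might contain::

        /* PIPT, fully coherent caches: nothing to flush, ever. */
        static inline void flush_cache_mm(struct mm_struct *mm) { }
        static inline void flush_cache_dup_mm(struct mm_struct *mm) { }
        static inline void flush_cache_range(struct vm_area_struct *vma,
                                             unsigned long start,
                                             unsigned long end) { }
        static inline void flush_cache_page(struct vm_area_struct *vma,
                                            unsigned long addr,
                                            unsigned long pfn) { }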
Here are the routines, one by one:

1) ``void flush_cache_mm(struct mm_struct *mm)``

        This interface flushes an entire user address space from
        the caches.  That is, after running, there will be no cache
        lines associated with 'mm'.

        This interface is used to handle whole address space
        page table operations such as what happens during exit and exec.

2) ``void flush_cache_dup_mm(struct mm_struct *mm)``

        This interface flushes an entire user address space from
        the caches.  That is, after running, there will be no cache
        lines associated with 'mm'.

        This interface is used to handle whole address space
        page table operations such as what happens during fork.

        This option is separate from flush_cache_mm to allow some
        optimizations for VIPT caches.

3) ``void flush_cache_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)``

        Here we are flushing a specific range of (user) virtual
        addresses from the cache.  After running, there will be no
        entries in the cache for 'vma->vm_mm' for virtual addresses in
        the range 'start' to 'end-1'.

        The "vma" is the backing store being used for the region.
        Primarily, this is used for munmap() type operations.

        The interface is provided in hopes that the port can find
        a suitably efficient method for removing multiple page
        sized regions from the cache, instead of having the kernel
        call flush_cache_page (see below) for each entry which may be
        modified.

4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)``

        This time we need to remove a PAGE_SIZE sized range
        from the cache.  The 'vma' is the backing structure used by
        Linux to keep track of mmap'd regions for a process, the
        address space is available via vma->vm_mm.  Also, one may
        test (vma->vm_flags & VM_EXEC) to see if this region is
        executable (and thus could be in the 'instruction cache' in
        "Harvard" type cache layouts).

        The 'pfn' indicates the physical page frame (shift this value
        left by PAGE_SHIFT to get the physical address) that 'addr'
        translates to.  It is this mapping which should be removed from
        the cache.

        After running, there will be no entries in the cache for
        'vma->vm_mm' for virtual address 'addr' which translates
        to 'pfn'.

        This is used primarily during fault processing.
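        A hedged sketch of how a port with an aliasing cache might
        implement this, including the VM_EXEC test mentioned above
        (both __flush_*_page_alias() helpers are hypothetical arch
        primitives that flush one page at a given virtual alias)::

                void flush_cache_page(struct vm_area_struct *vma,
                                      unsigned long addr, unsigned long pfn)
                {
                        /* Evict the user D-cache lines for this mapping... */
                        __flush_dcache_page_alias(addr & PAGE_MASK, pfn);

                        /* ...and the I-cache too, if it could hold code. */
                        if (vma->vm_flags & VM_EXEC)
                                __flush_icache_page_alias(addr & PAGE_MASK, pfn);
                }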
5) ``void flush_cache_kmaps(void)``

        This routine need only be implemented if the platform utilizes
        highmem.  It will be called right before all of the kmaps
        are invalidated.

        After running, there will be no entries in the cache for
        the kernel virtual address range PKMAP_ADDR(0) to
        PKMAP_ADDR(LAST_PKMAP).

        This routine should be implemented in asm/highmem.h

6) ``void flush_cache_vmap(unsigned long start, unsigned long end)``
   ``void flush_cache_vunmap(unsigned long start, unsigned long end)``

        Here in these two interfaces we are flushing a specific range
        of (kernel) virtual addresses from the cache.  After running,
        there will be no entries in the cache for the kernel address
        space for virtual addresses in the range 'start' to 'end-1'.

        The first of these two routines is invoked after vmap_range()
        has installed the page table entries.  The second is invoked
        before vunmap_range() deletes the page table entries.

There exists another whole class of cpu cache issues which currently
require a whole different set of interfaces to handle properly.
The biggest problem is that of virtual aliasing in the data cache
of a processor.

Is your port susceptible to virtual aliasing in its D-cache?
Well, if your D-cache is virtually indexed, is larger in size than
PAGE_SIZE, and does not prevent multiple cache lines for the same
physical address from existing at once, you have this problem.

If your D-cache has this problem, first define asm/shmparam.h SHMLBA
properly; it should essentially be the size of your virtually
addressed D-cache (or if the size is variable, the largest possible
size).  This setting will force the SYSv IPC layer to only allow user
processes to mmap shared memory at addresses which are a multiple of
this value.

.. note::

  This does not fix shared mmaps, check out the sparc64 port for
  one way to solve this (in particular SPARC_FLAG_MMAPSHARED).
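As an illustration (the numbers are hypothetical), a port with a
direct-mapped 16KB virtually indexed D-cache and 4KB pages has four
page colors, and its asm/shmparam.h could simply read::

        /* Direct-mapped 16KB VIPT D-cache, 4KB pages: four colors. */
        #define SHMLBA  (4 * PAGE_SIZE)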
Next, you have to solve the D-cache aliasing issue for all
other cases.  Please keep in mind the fact that, for a given page
mapped into some user address space, there is always at least one more
mapping, that of the kernel in its linear mapping starting at
PAGE_OFFSET.  So immediately, once the first user maps a given
physical page into its address space, by implication the D-cache
aliasing problem has the potential to exist since the kernel already
maps this page at its virtual address.

  ``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)``
  ``void clear_user_page(void *to, unsigned long addr, struct page *page)``

        These two routines store data in user anonymous or COW
        pages.  They allow a port to efficiently avoid D-cache alias
        issues between userspace and the kernel.

        For example, a port may temporarily map 'from' and 'to' to
        kernel virtual addresses during the copy.  The virtual address
        for these two pages is chosen in such a way that the kernel
        load/store instructions happen to virtual addresses which are
        of the same "color" as the user mapping of the page.  Sparc64
        for example, uses this technique.

        The 'addr' parameter tells the virtual address where the
        user will ultimately have this page mapped, and the 'page'
        parameter gives a pointer to the struct page of the target.

        If D-cache aliasing is not an issue, these two routines may
        simply call memcpy/memset directly and do nothing more.
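        On such a non-aliasing port the trivial definitions really are
        enough; a minimal sketch, using the usual page-sized
        clear_page()/copy_page() primitives::

                /* No alias worries: the user address can be ignored. */
                #define clear_user_page(to, vaddr, page)        clear_page(to)
                #define copy_user_page(to, from, vaddr, page)   copy_page(to, from)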
  ``void flush_dcache_folio(struct folio *folio)``

        This routine must be called when:

          a) the kernel did write to a page that is in the page cache
             and / or in high memory
          b) the kernel is about to read from a page cache page and user space
             shared/writable mappings of this page potentially exist.  Note
             that {get,pin}_user_pages{_fast} already call flush_dcache_folio
             on any page found in the user address space and thus driver
             code rarely needs to take this into account.

        .. note::

            This routine need only be called for page cache pages
            which can potentially ever be mapped into the address
            space of a user process.  So for example, VFS layer code
            handling vfs symlinks in the page cache need not call
            this interface at all.

        The phrase "kernel writes to a page cache page" means, specifically,
        that the kernel executes store instructions that dirty data in that
        page at the kernel virtual mapping of that page.  It is important to
        flush here to handle D-cache aliasing, to make sure these kernel stores
        are visible to user space mappings of that page.

        The corollary case is just as important: if there are users which have
        shared+writable mappings of this file, we must make sure that kernel
        reads of these pages will see the most recent stores done by the user.

        If D-cache aliasing is not an issue, this routine may simply be defined
        as a nop on that architecture.

        There is a bit set aside in folio->flags (PG_arch_1) as "architecture
        private".  The kernel guarantees that, for pagecache pages, it will
        clear this bit when such a page first enters the pagecache.

        This allows these interfaces to be implemented much more
        efficiently.  It allows one to "defer" (perhaps indefinitely) the
        actual flush if there are currently no user processes mapping this
        page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
        implementations for an example of how to go about doing this.

        The idea is, first at flush_dcache_folio() time, if
        folio_flush_mapping() returns a mapping, and mapping_mapped() on that
        mapping returns %false, just mark the architecture private page
        flag bit.  Later, in update_mmu_cache_range(), a check is made
        of this flag bit, and if set the flush is done and the flag bit
        is cleared.

        .. important::

                        It is often important, if you defer the flush,
                        that the actual flush occurs on the same CPU
                        as did the cpu stores into the page to make it
                        dirty.  Again, see sparc64 for examples of how
                        to deal with this.
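        A hedged sketch of that deferred scheme, loosely modeled on the
        sparc64 approach described above (arch_flush_dcache_folio() is a
        hypothetical low-level flush primitive; the dirty-cpu and SMP
        details the important note warns about are elided)::

                void flush_dcache_folio(struct folio *folio)
                {
                        struct address_space *mapping = folio_flush_mapping(folio);

                        if (mapping && !mapping_mapped(mapping)) {
                                /* No user mappings yet: just defer. */
                                set_bit(PG_arch_1, &folio->flags);
                                return;
                        }
                        arch_flush_dcache_folio(folio);
                }

                void update_mmu_cache_range(struct vm_fault *vmf,
                                struct vm_area_struct *vma, unsigned long address,
                                pte_t *ptep, unsigned int nr)
                {
                        /* The folio now being mapped into user space. */
                        struct folio *folio = pfn_folio(pte_pfn(*ptep));

                        /* A deferred flush is still pending: do it now. */
                        if (test_and_clear_bit(PG_arch_1, &folio->flags))
                                arch_flush_dcache_folio(folio);
                }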
  ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
  unsigned long user_vaddr, void *dst, void *src, int len)``
  ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
  unsigned long user_vaddr, void *dst, void *src, int len)``

        When the kernel needs to copy arbitrary data in and out
        of arbitrary user pages (e.g. for ptrace()) it will use
        these two routines.

        Any necessary cache flushing or other coherency operations
        that need to occur should happen here.  If the processor's
        instruction cache does not snoop cpu stores, it is very
        likely that you will need to flush the instruction cache
        for copy_to_user_page().

  ``void flush_anon_page(struct vm_area_struct *vma, struct page *page,
  unsigned long vmaddr)``

        When the kernel needs to access the contents of an anonymous
        page, it calls this function (currently only
        get_user_pages()).  Note: flush_dcache_folio() deliberately
        doesn't work for an anonymous page.  The default
        implementation is a nop (and should remain so for all coherent
        architectures).  For incoherent architectures, it should flush
        the cache of the page at vmaddr.

  ``void flush_icache_range(unsigned long start, unsigned long end)``

        When the kernel stores into addresses that it will execute
        out of (e.g. when loading modules), this function is called.

        If the icache does not snoop stores then this routine will need
        to flush it.

  ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``

        All the functionality of flush_icache_page can be implemented in
        flush_dcache_folio and update_mmu_cache_range.  In the future, the hope
        is to remove this interface completely.

The final category of APIs is for I/O to deliberately aliased address
ranges inside the kernel.  Such aliases are set up by use of the
vmap/vmalloc API.  Since kernel I/O goes via physical pages, the I/O
subsystem assumes that the user mapping and kernel offset mapping are
the only aliases.  This isn't true for vmap aliases, so anything in
the kernel trying to do I/O to vmap areas must manually manage
coherency.  It must do this by flushing the vmap range before doing
I/O and invalidating it after the I/O returns.

  ``void flush_kernel_vmap_range(void *vaddr, int size)``

        flushes the kernel cache for a given virtual address range in
        the vmap area.  This is to make sure that any data the kernel
        modified in the vmap range is made visible to the physical
        page.  The design is to make this area safe to perform I/O on.
        Note that this API does *not* also flush the offset map alias
        of the area.

  ``void invalidate_kernel_vmap_range(void *vaddr, int size)``

        invalidates the cache for a given virtual address range in the
        vmap area which prevents the processor from making the cache
        stale by speculatively reading data while the I/O was occurring
        to the physical pages.  This is only necessary for data reads
        into the vmap area.
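Putting the last two interfaces together, a minimal sketch of a driver
doing a device read into vmap'd pages while following the
flush-before/invalidate-after discipline described above
(do_device_read() is a hypothetical stand-in for the actual I/O)::

        static void dma_read_via_vmap(struct page **pages,
                                      unsigned int nr_pages,
                                      void *buf, int len)
        {
                /* Push our own dirty lines in the alias out first... */
                flush_kernel_vmap_range(buf, len);

                do_device_read(pages, nr_pages);        /* hypothetical I/O */

                /* ...then drop anything speculatively read through the
                 * alias while the device was writing the physical pages. */
                invalidate_kernel_vmap_range(buf, len);
        }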