==================================
Cache and TLB Flushing Under Linux
==================================

:Author: David S. Miller <davem@redhat.com>

This document describes the cache/tlb flushing interfaces called
by the Linux VM subsystem.  It enumerates each interface,
describes its intended purpose, and states the side effects expected
after the interface is invoked.

The side effects described below are stated for a uniprocessor
implementation, describing what is to happen on that single
processor.  The SMP cases are a simple extension, in that you just
extend the definition such that the side effect for a particular
interface occurs on all processors in the system.  Don't let this
scare you into thinking SMP cache/tlb flushing must be so
inefficient; this is in fact an area where many optimizations are
possible.  For example, if it can be proven that a user address space
has never executed on a cpu (see mm_cpumask()), one need not perform
a flush for this address space on that cpu.

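As a rough illustration of that optimization, an SMP implementation
might route a cross-call flush only to the CPUs recorded in the mm's
cpumask.  This is a hedged sketch, not any particular port's code;
``local_flush_tlb_mm`` stands in for whatever local flush primitive
the port provides::

	static void ipi_flush_tlb_mm(void *info)
	{
		struct mm_struct *mm = info;

		local_flush_tlb_mm(mm);	/* port-local TLB flush */
	}

	void smp_flush_tlb_mm(struct mm_struct *mm)
	{
		/* Only CPUs that have ever run this mm can hold stale
		 * translations for it, so skip all the others. */
		on_each_cpu_mask(mm_cpumask(mm), ipi_flush_tlb_mm, mm, 1);
	}
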
First, the TLB flushing interfaces, since they are the simplest.  The
"TLB" is abstracted under Linux as something the cpu uses to cache
virtual-->physical address translations obtained from the software
page tables.  This means that if the software page tables change, it
is possible for stale translations to exist in this "TLB" cache.
Therefore when software page table changes occur, the kernel will
invoke one of the following flush methods _after_ the page table
changes occur:

1) ``void flush_tlb_all(void)``

	The most severe flush of all.  After this interface runs,
	any previous page table modification whatsoever will be
	visible to the cpu.

	This is usually invoked when the kernel page tables are
	changed, since such translations are "global" in nature.

2) ``void flush_tlb_mm(struct mm_struct *mm)``

	This interface flushes an entire user address space from
	the TLB.  After running, this interface must make sure that
	any previous page table modifications for the address space
	'mm' will be visible to the cpu.  That is, after running,
	there will be no entries in the TLB for 'mm'.

	This interface is used to handle whole address space
	page table operations such as what happens during
	fork and exec (see the sketch below).

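	For instance, fork copies the parent's page tables and then
	drops any stale translations for the parent's mm, roughly as
	follows (a simplified sketch of how dup_mmap() finishes, not
	the verbatim kernel code)::

		/* ... after copying page tables from oldmm ... */
		flush_tlb_mm(oldmm);
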
3) ``void flush_tlb_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)``

	Here we are flushing a specific range of (user) virtual
	address translations from the TLB.  After running, this
	interface must make sure that any previous page table
	modifications for the address space 'vma->vm_mm' in the range
	'start' to 'end-1' will be visible to the cpu.  That is, after
	running, there will be no entries in the TLB for 'mm' for
	virtual addresses in the range 'start' to 'end-1'.

	The "vma" is the backing store being used for the region.
	Primarily, this is used for munmap() type operations.

	The interface is provided in hopes that the port can find
	a suitably efficient method for removing multiple page
	sized translations from the TLB, instead of having the kernel
	call flush_tlb_page (see below) for each entry which may be
	modified.  One common shape for such an implementation is
	sketched below.

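	For example, a port might flush page by page for small ranges
	but fall back to dropping the whole context once that becomes
	cheaper.  This is a hedged sketch; the threshold name and
	value are made up for illustration::

		#define FLUSH_RANGE_THRESHOLD	(64 * PAGE_SIZE)	/* hypothetical */

		void flush_tlb_range(struct vm_area_struct *vma,
				     unsigned long start, unsigned long end)
		{
			unsigned long addr;

			/* Past some size, one full mm flush is cheaper
			 * than many single-page invalidations. */
			if (end - start > FLUSH_RANGE_THRESHOLD) {
				flush_tlb_mm(vma->vm_mm);
				return;
			}

			for (addr = start; addr < end; addr += PAGE_SIZE)
				flush_tlb_page(vma, addr);
		}
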
4) ``void flush_tlb_page(struct vm_area_struct *vma, unsigned long addr)``

	This time we need to remove the PAGE_SIZE sized translation
	from the TLB.  The 'vma' is the backing structure used by
	Linux to keep track of mmap'd regions for a process; the
	address space is available via vma->vm_mm.  Also, one may
	test (vma->vm_flags & VM_EXEC) to see if this region is
	executable (and thus could be in the 'instruction TLB' in
	split-tlb type setups).

	After running, this interface must make sure that any previous
	page table modification for address space 'vma->vm_mm' for
	user virtual address 'addr' will be visible to the cpu.  That
	is, after running, there will be no entries in the TLB for
	'vma->vm_mm' for virtual address 'addr'.

	This is used primarily during fault processing.

5) ``void update_mmu_cache_range(struct vm_fault *vmf,
   struct vm_area_struct *vma, unsigned long address, pte_t *ptep,
   unsigned int nr)``

	At the end of every page fault, this routine is invoked to tell
	the architecture specific code that translations now exist
	in the software page tables for address space "vma->vm_mm"
	at virtual address "address" for "nr" consecutive pages.

	This routine is also invoked in various other places which pass
	a NULL "vmf".

	A port may use this information in any way it so chooses.
	For example, it could use this event to pre-load TLB
	translations for software managed TLB configurations.
	The sparc64 port currently does this (see the sketch below).

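	For a software-managed TLB, the idea looks roughly like this.
	This is a hedged sketch; ``tlb_preload_entry`` is a
	hypothetical port-private helper that writes one translation
	into the TLB::

		void update_mmu_cache_range(struct vm_fault *vmf,
				struct vm_area_struct *vma, unsigned long address,
				pte_t *ptep, unsigned int nr)
		{
			unsigned int i;

			/* Preload the translations just created so the
			 * return to userspace does not fault again. */
			for (i = 0; i < nr; i++)
				tlb_preload_entry(vma->vm_mm,
						  address + i * PAGE_SIZE,
						  ptep + i);
		}
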
Next, we have the cache flushing interfaces.  In general, when Linux
is changing an existing virtual-->physical mapping to a new value,
the sequence will be in one of the following forms::

	1) flush_cache_mm(mm);
	   change_all_page_tables_of(mm);
	   flush_tlb_mm(mm);

	2) flush_cache_range(vma, start, end);
	   change_range_of_page_tables(mm, start, end);
	   flush_tlb_range(vma, start, end);

	3) flush_cache_page(vma, addr, pfn);
	   set_pte(pte_pointer, new_pte_val);
	   flush_tlb_page(vma, addr);

The cache level flush will always be first, because this allows
us to properly handle systems whose caches are strict and require
a virtual-->physical translation to exist for a virtual address
when that virtual address is flushed from the cache.  The HyperSparc
cpu is one such cpu with this attribute.

The cache flushing routines below need only deal with cache flushing
to the extent that it is necessary for a particular cpu.  Mostly,
these routines must be implemented for cpus which have virtually
indexed caches which must be flushed when virtual-->physical
translations are changed or removed.  So, for example, the physically
indexed physically tagged caches of IA32 processors have no need to
implement these interfaces since the caches are fully synchronized
and have no dependency on translation information.

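On such fully coherent, physically indexed configurations the whole
family can be stubbed out.  As a hedged illustration, mirroring the
style of the asm-generic defaults rather than quoting any particular
port::

	#define flush_cache_mm(mm)			do { } while (0)
	#define flush_cache_dup_mm(mm)			do { } while (0)
	#define flush_cache_range(vma, start, end)	do { } while (0)
	#define flush_cache_page(vma, vmaddr, pfn)	do { } while (0)
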
Here are the routines, one by one:

1) ``void flush_cache_mm(struct mm_struct *mm)``

	This interface flushes an entire user address space from
	the caches.  That is, after running, there will be no cache
	lines associated with 'mm'.

	This interface is used to handle whole address space
	page table operations such as what happens during exit and exec.

2) ``void flush_cache_dup_mm(struct mm_struct *mm)``

	This interface flushes an entire user address space from
	the caches.  That is, after running, there will be no cache
	lines associated with 'mm'.

	This interface is used to handle whole address space
	page table operations such as what happens during fork.

	This option is separate from flush_cache_mm to allow some
	optimizations for VIPT caches.

3) ``void flush_cache_range(struct vm_area_struct *vma,
   unsigned long start, unsigned long end)``

	Here we are flushing a specific range of (user) virtual
	addresses from the cache.  After running, there will be no
	entries in the cache for 'vma->vm_mm' for virtual addresses in
	the range 'start' to 'end-1'.

	The "vma" is the backing store being used for the region.
	Primarily, this is used for munmap() type operations.

	The interface is provided in hopes that the port can find
	a suitably efficient method for removing multiple page
	sized regions from the cache, instead of having the kernel
	call flush_cache_page (see below) for each entry which may be
	modified.

4) ``void flush_cache_page(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn)``

	This time we need to remove a PAGE_SIZE sized range
	from the cache.  The 'vma' is the backing structure used by
	Linux to keep track of mmap'd regions for a process; the
	address space is available via vma->vm_mm.  Also, one may
	test (vma->vm_flags & VM_EXEC) to see if this region is
	executable (and thus could be in the 'instruction cache' in
	"Harvard" type cache layouts).

	The 'pfn' indicates the physical page frame (shift this value
	left by PAGE_SHIFT to get the physical address) that 'addr'
	translates to.  It is this mapping which should be removed from
	the cache.

	After running, there will be no entries in the cache for
	'vma->vm_mm' for virtual address 'addr' which translates
	to 'pfn'.

	This is used primarily during fault processing.

5) ``void flush_cache_kmaps(void)``

	This routine need only be implemented if the platform utilizes
	highmem.  It will be called right before all of the kmaps
	are invalidated.

	After running, there will be no entries in the cache for
	the kernel virtual address range PKMAP_ADDR(0) to
	PKMAP_ADDR(LAST_PKMAP).

	This routine should be implemented in asm/highmem.h.

6) ``void flush_cache_vmap(unsigned long start, unsigned long end)``
   ``void flush_cache_vunmap(unsigned long start, unsigned long end)``

	Here in these two interfaces we are flushing a specific range
	of (kernel) virtual addresses from the cache.  After running,
	there will be no entries in the cache for the kernel address
	space for virtual addresses in the range 'start' to 'end-1'.

	The first of these two routines is invoked after vmap_range()
	has installed the page table entries.  The second is invoked
	before vunmap_range() deletes the page table entries.

There exists another whole class of cpu cache issues which currently
require a whole different set of interfaces to handle properly.
The biggest problem is that of virtual aliasing in the data cache
of a processor.

Is your port susceptible to virtual aliasing in its D-cache?
Well, if your D-cache is virtually indexed, is larger in size than
PAGE_SIZE, and does not prevent multiple cache lines for the same
physical address from existing at once, you have this problem.

If your D-cache has this problem, first define asm/shmparam.h SHMLBA
properly; it should essentially be the size of your virtually
addressed D-cache (or if the size is variable, the largest possible
size).  This setting will force the SYSv IPC layer to only allow user
processes to mmap shared memory at addresses which are a multiple of
this value.

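As a hedged example, a port with 4KB pages and a 16KB virtually
indexed D-cache (so four possible cache colors per physical page)
might define, in its asm/shmparam.h::

	/* Attach addresses must be multiples of the D-cache span so
	 * that all user mappings of a page share one cache color. */
	#define SHMLBA	(4 * PAGE_SIZE)
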
.. note::

  This does not fix shared mmaps; check out the sparc64 port for
  one way to solve this (in particular SPARC_FLAG_MMAPSHARED).

Next, you have to solve the D-cache aliasing issue for all
other cases.  Please keep in mind the fact that, for a given page
mapped into some user address space, there is always at least one more
mapping, that of the kernel in its linear mapping starting at
PAGE_OFFSET.  So immediately, once the first user maps a given
physical page into its address space, by implication the D-cache
aliasing problem has the potential to exist since the kernel already
maps this page at its virtual address.

  ``void copy_user_page(void *to, void *from, unsigned long addr, struct page *page)``
  ``void clear_user_page(void *to, unsigned long addr, struct page *page)``

	These two routines store data in user anonymous or COW
	pages.  They allow a port to efficiently avoid D-cache alias
	issues between userspace and the kernel.

	For example, a port may temporarily map 'from' and 'to' to
	kernel virtual addresses during the copy.  The virtual address
	for these two pages is chosen in such a way that the kernel
	load/store instructions happen to virtual addresses which are
	of the same "color" as the user mapping of the page.  Sparc64,
	for example, uses this technique.

	The 'addr' parameter tells the virtual address where the
	user will ultimately have this page mapped, and the 'page'
	parameter gives a pointer to the struct page of the target.

	If D-cache aliasing is not an issue, these two routines may
	simply call memcpy/memset directly and do nothing more (see
	the sketch below).

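	On a port without aliasing concerns, the trivial definitions
	look like the asm-generic defaults (shown here as a hedged
	sketch rather than a verbatim quote)::

		#define clear_user_page(page, vaddr, pg)	clear_page(page)
		#define copy_user_page(to, from, vaddr, pg)	copy_page(to, from)
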
  ``void flush_dcache_folio(struct folio *folio)``

	This routine must be called when:

	  a) the kernel did write to a page that is in the page cache
	     and / or in high memory
	  b) the kernel is about to read from a page cache page and user space
	     shared/writable mappings of this page potentially exist.  Note
	     that {get,pin}_user_pages{_fast} already call flush_dcache_folio
	     on any page found in the user address space and thus driver
	     code rarely needs to take this into account.

	.. note::

	      This routine need only be called for page cache pages
	      which can potentially ever be mapped into the address
	      space of a user process.  So for example, VFS layer code
	      handling vfs symlinks in the page cache need not call
	      this interface at all.

	The phrase "kernel writes to a page cache page" means, specifically,
	that the kernel executes store instructions that dirty data in that
	page at the kernel virtual mapping of that page.  It is important to
	flush here to handle D-cache aliasing, to make sure these kernel stores
	are visible to user space mappings of that page.

	The corollary case is just as important: if there are users which have
	shared+writable mappings of this file, we must make sure that kernel
	reads of these pages will see the most recent stores done by the user.

	If D-cache aliasing is not an issue, this routine may simply be defined
	as a nop on that architecture.

	There is a bit set aside in folio->flags (PG_arch_1) as "architecture
	private".  The kernel guarantees that, for pagecache pages, it will
	clear this bit when such a page first enters the pagecache.

	This allows these interfaces to be implemented much more
	efficiently.  It allows one to "defer" (perhaps indefinitely) the
	actual flush if there are currently no user processes mapping this
	page.  See sparc64's flush_dcache_folio and update_mmu_cache_range
	implementations for an example of how to go about doing this.

	The idea is, first at flush_dcache_folio() time, if
	folio_flush_mapping() returns a mapping, and mapping_mapped() on that
	mapping returns %false, just mark the architecture private page
	flag bit.  Later, in update_mmu_cache_range(), a check is made
	of this flag bit, and if set the flush is done and the flag bit
	is cleared (see the sketch below).

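	In outline, the deferred-flush pattern looks like this.  This
	is a hedged sketch: ``__flush_dcache_folio`` stands in for the
	port's real flush primitive, and PG_dcache_dirty for however
	the port names its use of PG_arch_1::

		void flush_dcache_folio(struct folio *folio)
		{
			struct address_space *mapping;

			mapping = folio_flush_mapping(folio);
			if (mapping && !mapping_mapped(mapping)) {
				/* No user mappings yet: just record that
				 * the kernel copy is dirty and defer. */
				set_bit(PG_dcache_dirty, &folio->flags);
				return;
			}

			__flush_dcache_folio(folio);
		}

		/* Later, e.g. in update_mmu_cache_range(), when a user
		 * mapping is established: */
		if (test_and_clear_bit(PG_dcache_dirty, &folio->flags))
			__flush_dcache_folio(folio);
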
	.. important::

			It is often important, if you defer the flush,
			that the actual flush occurs on the same CPU
			that performed the stores which dirtied the
			page.  Again, see sparc64 for examples of how
			to deal with this.

  ``void copy_to_user_page(struct vm_area_struct *vma, struct page *page,
  unsigned long user_vaddr, void *dst, void *src, int len)``
  ``void copy_from_user_page(struct vm_area_struct *vma, struct page *page,
  unsigned long user_vaddr, void *dst, void *src, int len)``

	When the kernel needs to copy arbitrary data in and out
	of arbitrary user pages (e.g. for ptrace()) it will use
	these two routines.

	Any necessary cache flushing or other coherency operations
	that need to occur should happen here.  If the processor's
	instruction cache does not snoop cpu stores, it is very
	likely that you will need to flush the instruction cache
	for copy_to_user_page() (see the sketch below).

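	A minimal shape for a port whose I-cache does not snoop stores
	might be the following (a hedged sketch; flush_icache_user_page()
	here is the asm-generic style hook, which a port may implement
	differently)::

		void copy_to_user_page(struct vm_area_struct *vma,
				       struct page *page,
				       unsigned long user_vaddr,
				       void *dst, void *src, int len)
		{
			memcpy(dst, src, len);

			/* New instructions may have been written; make
			 * them visible to instruction fetch. */
			if (vma->vm_flags & VM_EXEC)
				flush_icache_user_page(vma, page,
						       user_vaddr, len);
		}
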
  ``void flush_anon_page(struct vm_area_struct *vma, struct page *page,
  unsigned long vmaddr)``

	When the kernel needs to access the contents of an anonymous
	page, it calls this function (currently only
	get_user_pages()).  Note: flush_dcache_folio() deliberately
	doesn't work for an anonymous page.  The default
	implementation is a nop (and should remain so for all coherent
	architectures).  For incoherent architectures, it should flush
	the cache of the page at vmaddr.

  ``void flush_icache_range(unsigned long start, unsigned long end)``

	When the kernel stores into addresses that it will execute
	out of (e.g. when loading modules), this function is called.

	If the icache does not snoop stores then this routine will need
	to flush it (see the sketch below).

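	For instance, code that patches or generates instructions in a
	kernel buffer brackets the stores like this (a hedged sketch of
	the usual calling convention, not a quote of any one call site;
	'code', 'insns' and 'size' are hypothetical)::

		/* 'code' now holds newly written instructions. */
		memcpy(code, insns, size);
		flush_icache_range((unsigned long)code,
				   (unsigned long)code + size);
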
  ``void flush_icache_page(struct vm_area_struct *vma, struct page *page)``

	All the functionality of flush_icache_page can be implemented in
	flush_dcache_folio and update_mmu_cache_range.  In the future, the
	hope is to remove this interface completely.

The final category of APIs is for I/O to deliberately aliased address
ranges inside the kernel.  Such aliases are set up by use of the
vmap/vmalloc API.  Since kernel I/O goes via physical pages, the I/O
subsystem assumes that the user mapping and kernel offset mapping are
the only aliases.  This isn't true for vmap aliases, so anything in
the kernel trying to do I/O to vmap areas must manually manage
coherency.  It must do this by flushing the vmap range before doing
I/O and invalidating it after the I/O returns.

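Bracketing the I/O looks like this (a hedged sketch of the calling
convention described above; the buffer variables and the I/O step are
hypothetical)::

	/* Make kernel stores through the vmap alias visible to the
	 * physical pages before the device reads them. */
	flush_kernel_vmap_range(vaddr, size);

	do_io_on_physical_pages();	/* hypothetical I/O step */

	/* Discard any lines speculatively filled through the alias
	 * while the device was writing the physical pages. */
	invalidate_kernel_vmap_range(vaddr, size);
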
  ``void flush_kernel_vmap_range(void *vaddr, int size)``

       flushes the kernel cache for a given virtual address range in
       the vmap area.  This is to make sure that any data the kernel
       modified in the vmap range is made visible to the physical
       page.  The design is to make this area safe to perform I/O on.
       Note that this API does *not* also flush the offset map alias
       of the area.

  ``void invalidate_kernel_vmap_range(void *vaddr, int size)``

       invalidates the cache for a given virtual address range in the
       vmap area, which prevents the processor from making the cache
       stale by speculatively reading data while the I/O was occurring
       to the physical pages.  This is only necessary for data reads
       into the vmap area.