xref: /linux/Documentation/filesystems/propagate_umount.txt (revision 794cbac9c053155754d04231b9365f91ea4ce7d2)
1	Notes on propagate_umount()
2
3Umount propagation starts with a set of mounts we are already going to
4take out.  Ideally, we would like to add all downstream cognates to
5that set - anything with the same mountpoint as one of the removed
6mounts and with parent that would receive events from the parent of that
7mount.  However, there are some constraints the resulting set must
8satisfy.
9
10It is convenient to define several properties of sets of mounts:
11
121) A set S of mounts is non-shifting if for any mount X belonging
13to S all subtrees mounted strictly inside of X (i.e. not overmounting
14the root of X) contain only elements of S.
15
162) A set S is non-revealing if all locked mounts that belong to S have
17parents that also belong to S.
18
193) A set S is closed if it contains all children of its elements.
20
21The set of mounts taken out by umount(2) must be non-shifting and
22non-revealing; the first constraint is what allows to reparent
23any remaining mounts and the second is what prevents the exposure
24of any concealed mountpoints.
25
26propagate_umount() takes the original set as an argument and tries to
27extend that set.  The original set is a full subtree and its root is
28unlocked; what matters is that it's closed and non-revealing.
29Resulting set may not be closed; there might still be mounts outside
30of that set, but only on top of stacks of root-overmounting elements
31of set.  They can be reparented to the place where the bottom of
32stack is attached to a mount that will survive.  NOTE: doing that
33will violate a constraint on having no more than one mount with
34the same parent/mountpoint pair; however, the caller (umount_tree())
35will immediately remedy that - it may keep unmounted element attached
36to parent, but only if the parent itself is unmounted.  Since all
37conflicts created by reparenting have common parent *not* in the
38set and one side of the conflict (bottom of the stack of overmounts)
39is in the set, it will be resolved.  However, we rely upon umount_tree()
40doing that pretty much immediately after the call of propagate_umount().
41
42Algorithm is based on two statements:
43	1) for any set S, there is a maximal non-shifting subset of S
44and it can be calculated in O(#S) time.
45	2) for any non-shifting set S, there is a maximal non-revealing
46subset of S.  That subset is also non-shifting and it can be calculated
47in O(#S) time.
48
49		Finding candidates.
50
51We are given a closed set U and we want to find all mounts that have
52the same mountpoint as some mount m in U *and* whose parent receives
53propagation from the parent of the same mount m.  Naive implementation
54would be
55	S = {}
56	for each m in U
57		add m to S
58		p = parent(m)
59		for each q in Propagation(p) - {p}
60			child = look_up(q, mountpoint(m))
61			if child
62				add child to S
63but that can lead to excessive work - there might be propagation among the
64subtrees of U, in which case we'd end up examining the same candidates
65many times.  Since propagation is transitive, the same will happen to
66everything downstream of that candidate and it's not hard to construct
67cases where the approach above leads to the time quadratic by the actual
68number of candidates.
69
70Note that if we run into a candidate we'd already seen, it must've been
71added on an earlier iteration of the outer loop - all additions made
72during one iteration of the outer loop have different parents.  So
73if we find a child already added to the set, we know that everything
74in Propagation(parent(child)) with the same mountpoint has been already
75added.
76	S = {}
77	for each m in U
78		if m in S
79			continue
80		add m to S
81		p = parent(m)
82		q = propagation_next(p, p)
83		while q
84			child = look_up(q, mountpoint(m))
85			if child
86				if child in S
87					q = skip_them(q, p)
88					continue;
89				add child to S
90			q = propagation_next(q, p)
91where
92skip_them(q, p)
93	keep walking Propagation(p) from q until we find something
94	not in Propagation(q)
95
96would get rid of that problem, but we need a sane implementation of
97skip_them().  That's not hard to do - split propagation_next() into
98"down into mnt_slave_list" and "forward-and-up" parts, with the
99skip_them() being "repeat the forward-and-up part until we get NULL
100or something that isn't a peer of the one we are skipping".
101
102Note that there can be no absolute roots among the extra candidates -
103they all come from mount lookups.  Absolute root among the original
104set is _currently_ impossible, but it might be worth protecting
105against.
106
107		Maximal non-shifting subsets.
108
109Let's call a mount m in a set S forbidden in that set if there is a
110subtree mounted strictly inside m and containing mounts that do not
111belong to S.
112
113The set is non-shifting when none of its elements are forbidden in it.
114
115If mount m is forbidden in a set S, it is forbidden in any subset S' it
116belongs to.  In other words, it can't belong to any of the non-shifting
117subsets of S.  If we had a way to find a forbidden mount or show that
118there's none, we could use it to find the maximal non-shifting subset
119simply by finding and removing them until none remain.
120
121Suppose mount m is forbidden in S; then any mounts forbidden in S - {m}
122must have been forbidden in S itself.  Indeed, since m has descendents
123that do not belong to S, any subtree that fits into S will fit into
124S - {m} as well.
125
126So in principle we could go through elements of S, checking if they
127are forbidden in S and removing the ones that are.  Removals will
128not invalidate the checks done for earlier mounts - if they were not
129forbidden at the time we checked, they won't become forbidden later.
130It's too costly to be practical, but there is a similar approach that
131is linear by size of S.
132
133Let's say that mount x in a set S is forbidden by mount y, if
134	* both x and y belong to S.
135	* there is a chain of mounts starting at x and leaving S
136	  immediately after passing through y, with the first
137	  mountpoint strictly inside x.
138Note 1: x may be equal to y - that's the case when something not
139belonging to S is mounted strictly inside x.
140Note 2: if y does not belong to S, it can't forbid anything in S.
141Note 3: if y has no children outside of S, it can't forbid anything in S.
142
143It's easy to show that mount x is forbidden in S if and only if x is
144forbidden in S by some mount y.  And it's easy to find all mounts in S
145forbidden by a given mount.
146
147Consider the following operation:
148	Trim(S, m) = S - {x : x is forbidden by m in S}
149
150Note that if m does not belong to S or has no children outside of S we
151are guaranteed that Trim(S, m) is equal to S.
152
153The following is true: if x is forbidden by y in Trim(S, m), it was
154already forbidden by y in S.
155
156Proof: Suppose x is forbidden by y in Trim(S, m).  Then there is a
157chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1}
158is the first element that doesn't belong to Trim(S, m) and the
159mountpoint of x_1 is strictly inside x.  If mount r belongs to S, it must
160have been removed by Trim(S, m), i.e. it was forbidden in S by m.
161Then there was a mount chain from r to some child of m that stayed in
162S all the way until m, but that's impossible since x belongs to Trim(S, m)
163and prepending (x_0, ..., x_k) to that chain demonstrates that x is also
164forbidden in S by m, and thus can't belong to Trim(S, m).
165Therefore r can not belong to S and our chain demonstrates that
166x is forbidden by y in S.  QED.
167
168Corollary: no mount is forbidden by m in Trim(S, m).  Indeed, any
169such mount would have been forbidden by m in S and thus would have been
170in the part of S removed in Trim(S, m).
171
172Corollary: no mount is forbidden by m in Trim(Trim(S, m), n).  Indeed,
173any such would have to have been forbidden by m in Trim(S, m), which
174is impossible.
175
176Corollary: after
177	S = Trim(S, x_1)
178	S = Trim(S, x_2)
179	...
180	S = Trim(S, x_k)
181no mount remaining in S will be forbidden by either of x_1,...,x_k.
182
183The following will reduce S to its maximal non-shifting subset:
184	visited = {}
185	while S contains elements not belonging to visited
186		let m be an arbitrary such element of S
187		S = Trim(S, m)
188		add m to visited
189
190S never grows, so the number of elements of S not belonging to visited
191decreases at least by one on each iteration.  When the loop terminates,
192all mounts remaining in S belong to visited.  It's easy to see that at
193the beginning of each iteration no mount remaining in S will be forbidden
194by any element of visited.  In other words, no mount remaining in S will
195be forbidden, i.e. final value of S will be non-shifting.  It will be
196the maximal non-shifting subset, since we were removing only forbidden
197elements.
198
199	There are two difficulties in implementing the above in linear
200time, both due to the fact that Trim() might need to remove more than one
201element.  Naive implementation of Trim() is vulnerable to running into a
202long chain of mounts, each mounted on top of parent's root.  Nothing in
203that chain is forbidden, so nothing gets removed from it.  We need to
204recognize such chains and avoid walking them again on subsequent calls of
205Trim(), otherwise we will end up with worst-case time being quadratic by
206the number of elements in S.  Another difficulty is in implementing the
207outer loop - we need to iterate through all elements of a shrinking set.
208That would be trivial if we never removed more than one element at a time
209(linked list, with list_for_each_entry_safe for iterator), but we may
210need to remove more than one entry, possibly including the ones we have
211already visited.
212
213	Let's start with naive algorithm for Trim():
214
215Trim_one(m)
216	found = false
217	for each n in children(m)
218		if n not in S
219			found = true
220			if (mountpoint(n) != root(m))
221				remove m from S
222				break
223	if found
224		Trim_ancestors(m)
225
226Trim_ancestors(m)
227	for (; parent(m) in S; m = parent(m)) {
228		if (mountpoint(m) != root(parent(m)))
229			remove parent(m) from S
230	}
231
232If m belongs to S, Trim_one(m) will replace S with Trim(S, m).
233Proof:
234	Consider the chains excluding elements from Trim(S, m).  The last
235two elements in such chain are m and some child of m that does not belong
236to S.  If m has no such children, Trim(S, m) is equal to S.
237	m itself is removed if and only if the chain has exactly two
238elements, i.e. when the last element does not overmount the root of m.
239In other words, that happens when m has a child not in S that does not
240overmount the root of m.
241	All other elements to remove will be ancestors of m, such that
242the entire descent chain from them to m is contained in S.  Let
243(x_0, x_1, ..., x_k = m) be the longest such chain.  x_i needs to be
244removed if and only if x_{i+1} does not overmount its root.  It's easy
245to see that Trim_ancestors(m) will iterate through that chain from
246x_k to x_1 and that it will remove exactly the elements that need to be
247removed.
248
249	Note that if the loop in Trim_ancestors() walks into an already
250visited element, we are guaranteed that remaining iterations will see
251only elements that had already been visited and remove none of them.
252That's the weakness that makes it vulnerable to long chains of full
253overmounts.
254
255	It's easy to deal with, if we can afford setting marks on
256elements of S; we would mark all elements already visited by
257Trim_ancestors() and have it bail out as soon as it sees an already
258marked element.
259
260	The problems with iterating through the set can be dealt with in
261several ways, depending upon the representation we choose for our set.
262One useful observation is that we are given a closed subset in S - the
263original set passed to propagate_umount().  Its elements can neither
264forbid anything nor be forbidden by anything - all their descendents
265belong to S, so they can not occur anywhere in any excluding chain.
266In other words, the elements of that subset will remain in S until
267the end and Trim_one(S, m) is a no-op for all m from that subset.
268
269	That suggests keeping S as a disjoint union of a closed set U
270('will be unmounted, no matter what') and the set of all elements of
271S that do not belong to U.  That set ('candidates') is all we need
272to iterate through.  Let's represent it as a subset in a cyclic list,
273consisting of all list elements that are marked as candidates (initially -
274all of them).  Then we could have Trim_ancestors() only remove the mark,
275leaving the elements on the list.  Then Trim_one() would never remove
276anything other than its argument from the containing list, allowing to
277use list_for_each_entry_safe() as iterator.
278
279	Assuming that representation we get the following:
280
281	list_for_each_entry_safe(m, ..., Candidates, ...)
282		Trim_one(m)
283where
284Trim_one(m)
285	if (m is not marked as a candidate)
286		strip the "seen by Trim_ancestors" mark from m
287		remove m from the Candidates list
288		return
289
290	remove_this = false
291	found = false
292	for each n in children(m)
293		if n not in S
294			found = true
295			if (mountpoint(n) != root(m))
296				remove_this = true
297				break
298	if found
299		Trim_ancestors(m)
300	if remove_this
301		strip the "seen by Trim_ancestors" mark from m
302		strip the "candidate" mark from m
303		remove m from the Candidate list
304
305Trim_ancestors(m)
306	for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) {
307		if m is marked as seen by Trim_ancestors
308			return
309		mark m as seen by Trim_ancestors
310		if (mountpoint(m) != root(p))
311			strip the "candidate" mark from p
312	}
313
314	Terminating condition in the loop in Trim_ancestors() is correct,
315since that that loop will never run into p belonging to U - p is always
316an ancestor of argument of Trim_one() and since U is closed, the argument
317of Trim_one() would also have to belong to U.  But Trim_one() is never
318called for elements of U.  In other words, p belongs to S if and only
319if it belongs to candidates.
320
321	Time complexity:
322* we get no more than O(#S) calls of Trim_one()
323* the loop over children in Trim_one() never looks at the same child
324twice through all the calls.
325* iterations of that loop for children in S are no more than O(#S)
326in the worst case
327* at most two children that are not elements of S are considered per
328call of Trim_one().
329* the loop in Trim_ancestors() sets its mark once per iteration and
330no element of S has is set more than once.
331
332	In the end we may have some elements excluded from S by
333Trim_ancestors() still stuck on the list.  We could do a separate
334loop removing them from the list (also no worse than O(#S) time),
335but it's easier to leave that until the next phase - there we will
336iterate through the candidates anyway.
337
338	The caller has already removed all elements of U from their parents'
339lists of children, which means that checking if child belongs to S is
340equivalent to checking if it's marked as a candidate; we'll never see
341the elements of U in the loop over children in Trim_one().
342
343	What's more, if we see that children(m) is empty and m is not
344locked, we can immediately move m into the committed subset (remove
345from the parent's list of children, etc.).  That's one fewer mount we'll
346have to look into when we check the list of children of its parent *and*
347when we get to building the non-revealing subset.
348
349		Maximal non-revealing subsets
350
351If S is not a non-revealing subset, there is a locked element x in S
352such that parent of x is not in S.
353
354Obviously, no non-revealing subset of S may contain x.  Removing such
355elements one by one will obviously end with the maximal non-revealing
356subset (possibly empty one).  Note that removal of an element will
357require removal of all its locked children, etc.
358
359If the set had been non-shifting, it will remain non-shifting after
360such removals.
361Proof: suppose S was non-shifting, x is a locked element of S, parent of x
362is not in S and S - {x} is not non-shifting.  Then there is an element m
363in S - {x} and a subtree mounted strictly inside m, such that m contains
364an element not in in S - {x}.  Since S is non-shifting, everything in
365that subtree must belong to S.  But that means that this subtree must
366contain x somewhere *and* that parent of x either belongs that subtree
367or is equal to m.  Either way it must belong to S.  Contradiction.
368
369// same representation as for finding maximal non-shifting subsets:
370// S is a disjoint union of a non-revealing set U (the ones we are committed
371// to unmount) and a set of candidates, represented as a subset of list
372// elements that have "is a candidate" mark on them.
373// Elements of U are removed from their parents' lists of children.
374// In the end candidates becomes empty and maximal non-revealing non-shifting
375// subset of S is now in U
376	while (Candidates list is non-empty)
377		handle_locked(first(Candidates))
378
379handle_locked(m)
380	if m is not marked as a candidate
381		strip the "seen by Trim_ancestors" mark from m
382		remove m from the list
383		return
384	cutoff = m
385	for (p = m; p in candidates; p = parent(p)) {
386		strip the "seen by Trim_ancestors" mark from p
387		strip the "candidate" mark from p
388		remove p from the Candidates list
389		if (!locked(p))
390			cutoff = parent(p)
391	}
392	if p in U
393		cutoff = p
394	while m != cutoff
395		remove m from children(parent(m))
396		add m to U
397		m = parent(m)
398
399Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S.
400* If it contains some elements of U, let x_k be the last one of those.
401Then union of U with {x_{k+1}, ..., x_n} is obviously non-revealing.
402* otherwise if all its elements are locked, then none of {x_0, ..., x_n}
403may be elements of a non-revealing subset of S.
404* otherwise let x_k be the first unlocked element of the chain.  Then none
405of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of
406S and union of U and {x_k, ..., x_n} is non-revealing.
407
408handle_locked(m) finds which of these cases applies and adjusts Candidates
409and U accordingly.  U remains non-revealing, union of Candidates and
410U still contains any non-revealing subset of S and after the call of
411handle_locked(m) m is guaranteed to be not in Candidates list.  So having
412it called for each element of S would suffice to empty Candidates,
413leaving U the maximal non-revealing subset of S.
414
415However, handle_locked(m) is a no-op when m belongs to U, so it's enough
416to have it called for elements of Candidates list until none remain.
417
418Time complexity: number of calls of handle_locked() is limited by
419#Candidates, each iteration of the first loop in handle_locked() removes
420an element from the list, so their total number of executions is also
421limited by #Candidates; number of iterations in the second loop is no
422greater than the number of iterations of the first loop.
423
424
425		Reparenting
426
427After we'd calculated the final set, we still need to deal with
428reparenting - if an element of the final set has a child not in it,
429we need to reparent such child.
430
431Such children can only be root-overmounting (otherwise the set wouldn't
432be non-shifting) and their parents can not belong to the original set,
433since the original is guaranteed to be closed.
434
435
436		Putting all of that together
437
438The plan is to
439	* find all candidates
440	* trim down to maximal non-shifting subset
441	* trim down to maximal non-revealing subset
442	* reparent anything that needs to be reparented
443	* return the resulting set to the caller
444
445For the 2nd and 3rd steps we want to separate the set into growing
446non-revealing subset, initially containing the original set ("U" in
447terms of the pseudocode above) and everything we are still not sure about
448("candidates").  It means that for the output of the 1st step we'd like
449the extra candidates separated from the stuff already in the original set.
450For the 4th step we would like the additions to U separate from the
451original set.
452
453So let's go for
454	* original set ("set").  Linkage via mnt_list
455	* undecided candidates ("candidates").  Subset of a list,
456consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE).
457Initially all elements of the list will be marked that way; in the
458end the list will become empty and no mounts will remain marked with
459that flag.
460	* Reuse T_MARKED for "has been already seen by trim_ancestors()".
461	* anything in U that hadn't been in the original set - elements of
462candidates will gradually be either discarded or moved there.  In other
463words, it's the candidates we have already decided to unmount.	Its role
464is reasonably close to the old "to_umount", so let's use that name.
465Linkage via mnt_list.
466
467For gather_candidates() we'll need to maintain both candidates (S -
468set) and intersection of S with set.  Use T_UMOUNT_CANDIDATE for
469all elements we encounter, putting the ones not already in the original
470set into the list of candidates.  When we are done, strip that flag from
471all elements of the original set.  That gives a cheap way to check
472if element belongs to S (in gather_candidates) and to candidates
473itself (at later stages).  Call that predicate is_candidate(); it would
474be m->mnt_t_flags & T_UMOUNT_CANDIDATE.
475
476All elements of the original set are marked with MNT_UMOUNT and we'll
477need the same for elements added when joining the contents of to_umount
478to set in the end.  Let's set MNT_UMOUNT at the time we add an element
479to to_umount; that's close to what the old 'umount_one' is doing, so
480let's keep that name.  It also gives us another predicate we need -
481"belongs to union of set and to_umount"; will_be_unmounted() for now.
482
483Removals from the candidates list should strip both T_MARKED and
484T_UMOUNT_CANDIDATE; call it remove_from_candidates_list().
485