1 Notes on propagate_umount() 2 3Umount propagation starts with a set of mounts we are already going to 4take out. Ideally, we would like to add all downstream cognates to 5that set - anything with the same mountpoint as one of the removed 6mounts and with parent that would receive events from the parent of that 7mount. However, there are some constraints the resulting set must 8satisfy. 9 10It is convenient to define several properties of sets of mounts: 11 121) A set S of mounts is non-shifting if for any mount X belonging 13to S all subtrees mounted strictly inside of X (i.e. not overmounting 14the root of X) contain only elements of S. 15 162) A set S is non-revealing if all locked mounts that belong to S have 17parents that also belong to S. 18 193) A set S is closed if it contains all children of its elements. 20 21The set of mounts taken out by umount(2) must be non-shifting and 22non-revealing; the first constraint is what allows to reparent 23any remaining mounts and the second is what prevents the exposure 24of any concealed mountpoints. 25 26propagate_umount() takes the original set as an argument and tries to 27extend that set. The original set is a full subtree and its root is 28unlocked; what matters is that it's closed and non-revealing. 29Resulting set may not be closed; there might still be mounts outside 30of that set, but only on top of stacks of root-overmounting elements 31of set. They can be reparented to the place where the bottom of 32stack is attached to a mount that will survive. NOTE: doing that 33will violate a constraint on having no more than one mount with 34the same parent/mountpoint pair; however, the caller (umount_tree()) 35will immediately remedy that - it may keep unmounted element attached 36to parent, but only if the parent itself is unmounted. Since all 37conflicts created by reparenting have common parent *not* in the 38set and one side of the conflict (bottom of the stack of overmounts) 39is in the set, it will be resolved. However, we rely upon umount_tree() 40doing that pretty much immediately after the call of propagate_umount(). 41 42Algorithm is based on two statements: 43 1) for any set S, there is a maximal non-shifting subset of S 44and it can be calculated in O(#S) time. 45 2) for any non-shifting set S, there is a maximal non-revealing 46subset of S. That subset is also non-shifting and it can be calculated 47in O(#S) time. 48 49 Finding candidates. 50 51We are given a closed set U and we want to find all mounts that have 52the same mountpoint as some mount m in U *and* whose parent receives 53propagation from the parent of the same mount m. Naive implementation 54would be 55 S = {} 56 for each m in U 57 add m to S 58 p = parent(m) 59 for each q in Propagation(p) - {p} 60 child = look_up(q, mountpoint(m)) 61 if child 62 add child to S 63but that can lead to excessive work - there might be propagation among the 64subtrees of U, in which case we'd end up examining the same candidates 65many times. Since propagation is transitive, the same will happen to 66everything downstream of that candidate and it's not hard to construct 67cases where the approach above leads to the time quadratic by the actual 68number of candidates. 69 70Note that if we run into a candidate we'd already seen, it must've been 71added on an earlier iteration of the outer loop - all additions made 72during one iteration of the outer loop have different parents. So 73if we find a child already added to the set, we know that everything 74in Propagation(parent(child)) with the same mountpoint has been already 75added. 76 S = {} 77 for each m in U 78 if m in S 79 continue 80 add m to S 81 p = parent(m) 82 q = propagation_next(p, p) 83 while q 84 child = look_up(q, mountpoint(m)) 85 if child 86 if child in S 87 q = skip_them(q, p) 88 continue; 89 add child to S 90 q = propagation_next(q, p) 91where 92skip_them(q, p) 93 keep walking Propagation(p) from q until we find something 94 not in Propagation(q) 95 96would get rid of that problem, but we need a sane implementation of 97skip_them(). That's not hard to do - split propagation_next() into 98"down into mnt_slave_list" and "forward-and-up" parts, with the 99skip_them() being "repeat the forward-and-up part until we get NULL 100or something that isn't a peer of the one we are skipping". 101 102Note that there can be no absolute roots among the extra candidates - 103they all come from mount lookups. Absolute root among the original 104set is _currently_ impossible, but it might be worth protecting 105against. 106 107 Maximal non-shifting subsets. 108 109Let's call a mount m in a set S forbidden in that set if there is a 110subtree mounted strictly inside m and containing mounts that do not 111belong to S. 112 113The set is non-shifting when none of its elements are forbidden in it. 114 115If mount m is forbidden in a set S, it is forbidden in any subset S' it 116belongs to. In other words, it can't belong to any of the non-shifting 117subsets of S. If we had a way to find a forbidden mount or show that 118there's none, we could use it to find the maximal non-shifting subset 119simply by finding and removing them until none remain. 120 121Suppose mount m is forbidden in S; then any mounts forbidden in S - {m} 122must have been forbidden in S itself. Indeed, since m has descendents 123that do not belong to S, any subtree that fits into S will fit into 124S - {m} as well. 125 126So in principle we could go through elements of S, checking if they 127are forbidden in S and removing the ones that are. Removals will 128not invalidate the checks done for earlier mounts - if they were not 129forbidden at the time we checked, they won't become forbidden later. 130It's too costly to be practical, but there is a similar approach that 131is linear by size of S. 132 133Let's say that mount x in a set S is forbidden by mount y, if 134 * both x and y belong to S. 135 * there is a chain of mounts starting at x and leaving S 136 immediately after passing through y, with the first 137 mountpoint strictly inside x. 138Note 1: x may be equal to y - that's the case when something not 139belonging to S is mounted strictly inside x. 140Note 2: if y does not belong to S, it can't forbid anything in S. 141Note 3: if y has no children outside of S, it can't forbid anything in S. 142 143It's easy to show that mount x is forbidden in S if and only if x is 144forbidden in S by some mount y. And it's easy to find all mounts in S 145forbidden by a given mount. 146 147Consider the following operation: 148 Trim(S, m) = S - {x : x is forbidden by m in S} 149 150Note that if m does not belong to S or has no children outside of S we 151are guaranteed that Trim(S, m) is equal to S. 152 153The following is true: if x is forbidden by y in Trim(S, m), it was 154already forbidden by y in S. 155 156Proof: Suppose x is forbidden by y in Trim(S, m). Then there is a 157chain of mounts (x_0 = x, ..., x_k = y, x_{k+1} = r), such that x_{k+1} 158is the first element that doesn't belong to Trim(S, m) and the 159mountpoint of x_1 is strictly inside x. If mount r belongs to S, it must 160have been removed by Trim(S, m), i.e. it was forbidden in S by m. 161Then there was a mount chain from r to some child of m that stayed in 162S all the way until m, but that's impossible since x belongs to Trim(S, m) 163and prepending (x_0, ..., x_k) to that chain demonstrates that x is also 164forbidden in S by m, and thus can't belong to Trim(S, m). 165Therefore r can not belong to S and our chain demonstrates that 166x is forbidden by y in S. QED. 167 168Corollary: no mount is forbidden by m in Trim(S, m). Indeed, any 169such mount would have been forbidden by m in S and thus would have been 170in the part of S removed in Trim(S, m). 171 172Corollary: no mount is forbidden by m in Trim(Trim(S, m), n). Indeed, 173any such would have to have been forbidden by m in Trim(S, m), which 174is impossible. 175 176Corollary: after 177 S = Trim(S, x_1) 178 S = Trim(S, x_2) 179 ... 180 S = Trim(S, x_k) 181no mount remaining in S will be forbidden by either of x_1,...,x_k. 182 183The following will reduce S to its maximal non-shifting subset: 184 visited = {} 185 while S contains elements not belonging to visited 186 let m be an arbitrary such element of S 187 S = Trim(S, m) 188 add m to visited 189 190S never grows, so the number of elements of S not belonging to visited 191decreases at least by one on each iteration. When the loop terminates, 192all mounts remaining in S belong to visited. It's easy to see that at 193the beginning of each iteration no mount remaining in S will be forbidden 194by any element of visited. In other words, no mount remaining in S will 195be forbidden, i.e. final value of S will be non-shifting. It will be 196the maximal non-shifting subset, since we were removing only forbidden 197elements. 198 199 There are two difficulties in implementing the above in linear 200time, both due to the fact that Trim() might need to remove more than one 201element. Naive implementation of Trim() is vulnerable to running into a 202long chain of mounts, each mounted on top of parent's root. Nothing in 203that chain is forbidden, so nothing gets removed from it. We need to 204recognize such chains and avoid walking them again on subsequent calls of 205Trim(), otherwise we will end up with worst-case time being quadratic by 206the number of elements in S. Another difficulty is in implementing the 207outer loop - we need to iterate through all elements of a shrinking set. 208That would be trivial if we never removed more than one element at a time 209(linked list, with list_for_each_entry_safe for iterator), but we may 210need to remove more than one entry, possibly including the ones we have 211already visited. 212 213 Let's start with naive algorithm for Trim(): 214 215Trim_one(m) 216 found = false 217 for each n in children(m) 218 if n not in S 219 found = true 220 if (mountpoint(n) != root(m)) 221 remove m from S 222 break 223 if found 224 Trim_ancestors(m) 225 226Trim_ancestors(m) 227 for (; parent(m) in S; m = parent(m)) { 228 if (mountpoint(m) != root(parent(m))) 229 remove parent(m) from S 230 } 231 232If m belongs to S, Trim_one(m) will replace S with Trim(S, m). 233Proof: 234 Consider the chains excluding elements from Trim(S, m). The last 235two elements in such chain are m and some child of m that does not belong 236to S. If m has no such children, Trim(S, m) is equal to S. 237 m itself is removed if and only if the chain has exactly two 238elements, i.e. when the last element does not overmount the root of m. 239In other words, that happens when m has a child not in S that does not 240overmount the root of m. 241 All other elements to remove will be ancestors of m, such that 242the entire descent chain from them to m is contained in S. Let 243(x_0, x_1, ..., x_k = m) be the longest such chain. x_i needs to be 244removed if and only if x_{i+1} does not overmount its root. It's easy 245to see that Trim_ancestors(m) will iterate through that chain from 246x_k to x_1 and that it will remove exactly the elements that need to be 247removed. 248 249 Note that if the loop in Trim_ancestors() walks into an already 250visited element, we are guaranteed that remaining iterations will see 251only elements that had already been visited and remove none of them. 252That's the weakness that makes it vulnerable to long chains of full 253overmounts. 254 255 It's easy to deal with, if we can afford setting marks on 256elements of S; we would mark all elements already visited by 257Trim_ancestors() and have it bail out as soon as it sees an already 258marked element. 259 260 The problems with iterating through the set can be dealt with in 261several ways, depending upon the representation we choose for our set. 262One useful observation is that we are given a closed subset in S - the 263original set passed to propagate_umount(). Its elements can neither 264forbid anything nor be forbidden by anything - all their descendents 265belong to S, so they can not occur anywhere in any excluding chain. 266In other words, the elements of that subset will remain in S until 267the end and Trim_one(S, m) is a no-op for all m from that subset. 268 269 That suggests keeping S as a disjoint union of a closed set U 270('will be unmounted, no matter what') and the set of all elements of 271S that do not belong to U. That set ('candidates') is all we need 272to iterate through. Let's represent it as a subset in a cyclic list, 273consisting of all list elements that are marked as candidates (initially - 274all of them). Then we could have Trim_ancestors() only remove the mark, 275leaving the elements on the list. Then Trim_one() would never remove 276anything other than its argument from the containing list, allowing to 277use list_for_each_entry_safe() as iterator. 278 279 Assuming that representation we get the following: 280 281 list_for_each_entry_safe(m, ..., Candidates, ...) 282 Trim_one(m) 283where 284Trim_one(m) 285 if (m is not marked as a candidate) 286 strip the "seen by Trim_ancestors" mark from m 287 remove m from the Candidates list 288 return 289 290 remove_this = false 291 found = false 292 for each n in children(m) 293 if n not in S 294 found = true 295 if (mountpoint(n) != root(m)) 296 remove_this = true 297 break 298 if found 299 Trim_ancestors(m) 300 if remove_this 301 strip the "seen by Trim_ancestors" mark from m 302 strip the "candidate" mark from m 303 remove m from the Candidate list 304 305Trim_ancestors(m) 306 for (p = parent(m); p is marked as candidate ; m = p, p = parent(p)) { 307 if m is marked as seen by Trim_ancestors 308 return 309 mark m as seen by Trim_ancestors 310 if (mountpoint(m) != root(p)) 311 strip the "candidate" mark from p 312 } 313 314 Terminating condition in the loop in Trim_ancestors() is correct, 315since that that loop will never run into p belonging to U - p is always 316an ancestor of argument of Trim_one() and since U is closed, the argument 317of Trim_one() would also have to belong to U. But Trim_one() is never 318called for elements of U. In other words, p belongs to S if and only 319if it belongs to candidates. 320 321 Time complexity: 322* we get no more than O(#S) calls of Trim_one() 323* the loop over children in Trim_one() never looks at the same child 324twice through all the calls. 325* iterations of that loop for children in S are no more than O(#S) 326in the worst case 327* at most two children that are not elements of S are considered per 328call of Trim_one(). 329* the loop in Trim_ancestors() sets its mark once per iteration and 330no element of S has is set more than once. 331 332 In the end we may have some elements excluded from S by 333Trim_ancestors() still stuck on the list. We could do a separate 334loop removing them from the list (also no worse than O(#S) time), 335but it's easier to leave that until the next phase - there we will 336iterate through the candidates anyway. 337 338 The caller has already removed all elements of U from their parents' 339lists of children, which means that checking if child belongs to S is 340equivalent to checking if it's marked as a candidate; we'll never see 341the elements of U in the loop over children in Trim_one(). 342 343 What's more, if we see that children(m) is empty and m is not 344locked, we can immediately move m into the committed subset (remove 345from the parent's list of children, etc.). That's one fewer mount we'll 346have to look into when we check the list of children of its parent *and* 347when we get to building the non-revealing subset. 348 349 Maximal non-revealing subsets 350 351If S is not a non-revealing subset, there is a locked element x in S 352such that parent of x is not in S. 353 354Obviously, no non-revealing subset of S may contain x. Removing such 355elements one by one will obviously end with the maximal non-revealing 356subset (possibly empty one). Note that removal of an element will 357require removal of all its locked children, etc. 358 359If the set had been non-shifting, it will remain non-shifting after 360such removals. 361Proof: suppose S was non-shifting, x is a locked element of S, parent of x 362is not in S and S - {x} is not non-shifting. Then there is an element m 363in S - {x} and a subtree mounted strictly inside m, such that m contains 364an element not in in S - {x}. Since S is non-shifting, everything in 365that subtree must belong to S. But that means that this subtree must 366contain x somewhere *and* that parent of x either belongs that subtree 367or is equal to m. Either way it must belong to S. Contradiction. 368 369// same representation as for finding maximal non-shifting subsets: 370// S is a disjoint union of a non-revealing set U (the ones we are committed 371// to unmount) and a set of candidates, represented as a subset of list 372// elements that have "is a candidate" mark on them. 373// Elements of U are removed from their parents' lists of children. 374// In the end candidates becomes empty and maximal non-revealing non-shifting 375// subset of S is now in U 376 while (Candidates list is non-empty) 377 handle_locked(first(Candidates)) 378 379handle_locked(m) 380 if m is not marked as a candidate 381 strip the "seen by Trim_ancestors" mark from m 382 remove m from the list 383 return 384 cutoff = m 385 for (p = m; p in candidates; p = parent(p)) { 386 strip the "seen by Trim_ancestors" mark from p 387 strip the "candidate" mark from p 388 remove p from the Candidates list 389 if (!locked(p)) 390 cutoff = parent(p) 391 } 392 if p in U 393 cutoff = p 394 while m != cutoff 395 remove m from children(parent(m)) 396 add m to U 397 m = parent(m) 398 399Let (x_0, ..., x_n = m) be the maximal chain of descent of m within S. 400* If it contains some elements of U, let x_k be the last one of those. 401Then union of U with {x_{k+1}, ..., x_n} is obviously non-revealing. 402* otherwise if all its elements are locked, then none of {x_0, ..., x_n} 403may be elements of a non-revealing subset of S. 404* otherwise let x_k be the first unlocked element of the chain. Then none 405of {x_0, ..., x_{k-1}} may be an element of a non-revealing subset of 406S and union of U and {x_k, ..., x_n} is non-revealing. 407 408handle_locked(m) finds which of these cases applies and adjusts Candidates 409and U accordingly. U remains non-revealing, union of Candidates and 410U still contains any non-revealing subset of S and after the call of 411handle_locked(m) m is guaranteed to be not in Candidates list. So having 412it called for each element of S would suffice to empty Candidates, 413leaving U the maximal non-revealing subset of S. 414 415However, handle_locked(m) is a no-op when m belongs to U, so it's enough 416to have it called for elements of Candidates list until none remain. 417 418Time complexity: number of calls of handle_locked() is limited by 419#Candidates, each iteration of the first loop in handle_locked() removes 420an element from the list, so their total number of executions is also 421limited by #Candidates; number of iterations in the second loop is no 422greater than the number of iterations of the first loop. 423 424 425 Reparenting 426 427After we'd calculated the final set, we still need to deal with 428reparenting - if an element of the final set has a child not in it, 429we need to reparent such child. 430 431Such children can only be root-overmounting (otherwise the set wouldn't 432be non-shifting) and their parents can not belong to the original set, 433since the original is guaranteed to be closed. 434 435 436 Putting all of that together 437 438The plan is to 439 * find all candidates 440 * trim down to maximal non-shifting subset 441 * trim down to maximal non-revealing subset 442 * reparent anything that needs to be reparented 443 * return the resulting set to the caller 444 445For the 2nd and 3rd steps we want to separate the set into growing 446non-revealing subset, initially containing the original set ("U" in 447terms of the pseudocode above) and everything we are still not sure about 448("candidates"). It means that for the output of the 1st step we'd like 449the extra candidates separated from the stuff already in the original set. 450For the 4th step we would like the additions to U separate from the 451original set. 452 453So let's go for 454 * original set ("set"). Linkage via mnt_list 455 * undecided candidates ("candidates"). Subset of a list, 456consisting of all its elements marked with a new flag (T_UMOUNT_CANDIDATE). 457Initially all elements of the list will be marked that way; in the 458end the list will become empty and no mounts will remain marked with 459that flag. 460 * Reuse T_MARKED for "has been already seen by trim_ancestors()". 461 * anything in U that hadn't been in the original set - elements of 462candidates will gradually be either discarded or moved there. In other 463words, it's the candidates we have already decided to unmount. Its role 464is reasonably close to the old "to_umount", so let's use that name. 465Linkage via mnt_list. 466 467For gather_candidates() we'll need to maintain both candidates (S - 468set) and intersection of S with set. Use T_UMOUNT_CANDIDATE for 469all elements we encounter, putting the ones not already in the original 470set into the list of candidates. When we are done, strip that flag from 471all elements of the original set. That gives a cheap way to check 472if element belongs to S (in gather_candidates) and to candidates 473itself (at later stages). Call that predicate is_candidate(); it would 474be m->mnt_t_flags & T_UMOUNT_CANDIDATE. 475 476All elements of the original set are marked with MNT_UMOUNT and we'll 477need the same for elements added when joining the contents of to_umount 478to set in the end. Let's set MNT_UMOUNT at the time we add an element 479to to_umount; that's close to what the old 'umount_one' is doing, so 480let's keep that name. It also gives us another predicate we need - 481"belongs to union of set and to_umount"; will_be_unmounted() for now. 482 483Removals from the candidates list should strip both T_MARKED and 484T_UMOUNT_CANDIDATE; call it remove_from_candidates_list(). 485