.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
	1) Overview
	2) Features
	3) Setting mount states
	4) Use cases
	5) Detailed semantics
	6) Quiz
	7) FAQ
	8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently. Shared subtree semantics provide the necessary
mechanism to accomplish the above.

These semantics also provide the building blocks for features like
per-user namespaces and versioned filesystems.

2) Features
-----------

Shared subtree provides four different flavors of mounts; struct vfsmount to be
precise:


a) A **shared mount** can be replicated to as many mountpoints as needed, and
   all the replicas remain exactly the same.

   Here is an example:

   Let's say /mnt has a mount that is shared::

     # mount --make-shared /mnt

   .. note::
      The mount(8) command now supports the --make-shared flag,
      so the sample 'smount' program is no longer needed and has been
      removed.

   ::

     # mount --bind /mnt /tmp

   The above command replicates the mount at /mnt to the mountpoint /tmp
   and the contents of both the mounts remain identical.

   ::

     # ls /mnt
     a b c

     # ls /tmp
     a b c

   Now let's say we mount a device at /tmp/a::

     # mount /dev/sd0 /tmp/a

     # ls /tmp/a
     t1 t2 t3

     # ls /mnt/a
     t1 t2 t3

   Note that the mount has propagated to the mount at /mnt as well.

   The same is true when /dev/sd0 is mounted on /mnt/a instead. The
   contents will be visible under /tmp/a too.


b) A **slave mount** is like a shared mount except that mount and umount events
   only propagate towards it.

   All slave mounts have a master mount, which is shared.

   Here is an example:

   Let's say /mnt has a mount which is shared::

     # mount --make-shared /mnt

   Let's bind mount /mnt to /tmp::

     # mount --bind /mnt /tmp

   The new mount at /tmp becomes a shared mount and it is a replica of
   the mount at /mnt.

   Now let's make the mount at /tmp a slave of /mnt::

     # mount --make-slave /tmp

   Let's mount /dev/sd0 on /mnt/a::

     # mount /dev/sd0 /mnt/a

     # ls /mnt/a
     t1 t2 t3

     # ls /tmp/a
     t1 t2 t3

   Note that the mount event has propagated to the mount at /tmp.

   However, let's see what happens if we mount something on the mount at
   /tmp::

     # mount /dev/sd1 /tmp/b

     # ls /tmp/b
     s1 s2 s3

     # ls /mnt/b

   Note how the mount event has not propagated to the mount at
   /mnt.


c) A **private mount** does not forward or receive propagation.

   This is the mount we are familiar with. It's the default type.


d) An **unbindable mount** is, as the name suggests, an unbindable private
   mount.

   Let's say we have a mount at /mnt and we make it unbindable::

     # mount --make-unbindable /mnt

   Let's try to bind mount this mount somewhere else::

     # mount --bind /mnt /tmp
     mount: wrong fs type, bad option, bad
     superblock on /mnt, or too many mounted file systems

   Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

The mount command (util-linux package) can be used to set mount
states::

    mount --make-shared mountpoint
    mount --make-slave mountpoint
    mount --make-private mountpoint
    mount --make-unbindable mountpoint


4) Use cases
------------

A) A process wants to clone its own namespace, but still wants to
   access the CD that got mounted recently.

   Solution:

   The system administrator can make the mount at /cdrom shared::

     mount --bind /cdrom /cdrom
     mount --make-shared /cdrom

   Now any process that clones off a new namespace will have a
   mount at /cdrom which is a replica of the same mount in the
   parent namespace.

   So when a CD is inserted and mounted at /cdrom, that mount gets
   propagated to the other mount at /cdrom in all the other clone
   namespaces.

B) A process wants its mounts invisible to any other process, but
   still wants to be able to see the other system mounts.

   Solution:

   To begin with, the administrator can mark the entire mount tree
   as shareable::

     mount --make-rshared /

   A new process can clone off a new namespace and mark some part
   of its namespace as slave::

     mount --make-rslave /myprivatetree

   Henceforth, any mounts done by the process within /myprivatetree
   will not show up in any other namespace. However, mounts
   done in the parent namespace under /myprivatetree still show
   up in the process's namespace.


Apart from the above semantics this feature provides the
building blocks to solve the following problems:

C) Per-user namespace

   The above semantics allow a way to share mounts across
   namespaces. But namespaces are associated with processes. If
   namespaces are made first class objects with a user API to
   associate/disassociate a namespace with a userid, then each user
   could have his/her own namespace and tailor it to his/her
   requirements. This needs to be supported in PAM.

D) Versioned files

   If the entire mount tree is visible at multiple locations, then
   an underlying versioning file system can return different
   versions of the file depending on the path used to access that
   file.

   An example is::

     mount --make-shared /
     mount --rbind / /view/v1
     mount --rbind / /view/v2
     mount --rbind / /view/v3
     mount --rbind / /view/v4

   and if /usr has a versioning filesystem mounted, then that
   mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
   /view/v4/usr too.

   A user can request the v3 version of the file /usr/fs/namespace.c
   by accessing /view/v3/usr/fs/namespace.c. The underlying
   versioning filesystem can then decipher that the v3 version of the
   filesystem is being requested and return the corresponding
   inode.

5) Detailed semantics
---------------------

The section below explains the detailed semantics of the
bind, rbind, move, mount, umount and clone-namespace operations.

.. Note::
   the word 'vfsmount' and the noun 'mount' have been used
   to mean the same thing, throughout this document.

a) Mount states

   A **propagation event** is defined as an event generated on a vfsmount
   that leads to mount or unmount actions in other vfsmounts.

   A **peer group** is defined as a group of vfsmounts that propagate
   events to each other.

   A given mount can be in one of the following states:

   (1) Shared mounts

       A **shared mount** is defined as a vfsmount that belongs to a
       peer group.

       For example::

         mount --make-shared /mnt
         mount --bind /mnt /tmp

       The mount at /mnt and that at /tmp are both shared and belong
       to the same peer group. Anything mounted or unmounted under
       /mnt or /tmp is reflected in all the other mounts of its peer
       group.


   (2) Slave mounts

       A **slave mount** is defined as a vfsmount that receives
       propagation events and does not forward propagation events.

       A slave mount, as the name implies, has a master mount from which
       mount/unmount events are received. Events do not propagate from
       the slave mount to the master.
       Only a shared mount can be made
       a slave by executing the following command::

         mount --make-slave mount

       A shared mount that is made a slave is no longer shared unless
       it is later modified to become shared again.

   (3) Shared and Slave

       A vfsmount can be both **shared** as well as **slave**. This state
       indicates that the mount is a slave of some vfsmount, and
       has its own peer group too. This vfsmount receives propagation
       events from its master vfsmount, and also forwards propagation
       events to its 'peer group' and to its slave vfsmounts.

       Strictly speaking, the vfsmount is shared, having its own
       peer group, and this peer group is a slave of some other
       peer group.

       Only a slave vfsmount can be made 'shared and slave', either
       by executing the following command::

         mount --make-shared mount

       or by moving the slave vfsmount under a shared vfsmount.

   (4) Private mount

       A **private mount** is defined as a vfsmount that does not
       receive or forward any propagation events.

   (5) Unbindable mount

       An **unbindable mount** is defined as a vfsmount that does not
       receive or forward any propagation events and cannot
       be bind mounted.
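
The five states above can be told apart from user space. The following is a minimal sketch, not part of the kernel sources, that classifies a single line of ``/proc/<pid>/mountinfo`` by its optional fields. It assumes the tags described in the proc documentation (``shared:N`` for membership in a peer group, ``master:N`` for a slave of peer group N, and ``unbindable``); the function name ``propagation_state`` and the sample lines are hypothetical.

```python
def propagation_state(mountinfo_line):
    """Classify one /proc/<pid>/mountinfo line by its optional fields."""
    fields = mountinfo_line.split()
    # Fields 0-5 are: mount ID, parent ID, major:minor, root, mount point,
    # mount options. Optional fields follow, terminated by a lone "-".
    separator = fields.index("-", 6)
    tags = {field.split(":", 1)[0] for field in fields[6:separator]}

    if "unbindable" in tags:
        return "unbindable"
    shared = "shared" in tags    # the mount belongs to a peer group
    slave = "master" in tags     # the mount receives events from a master
    if shared and slave:
        return "shared and slave"
    if shared:
        return "shared"
    if slave:
        return "slave"
    return "private"


# Hypothetical sample lines, written in the mountinfo field layout.
print(propagation_state("24 1 8:1 / / rw,relatime shared:1 - ext4 /dev/sda1 rw"))
print(propagation_state("36 35 98:0 /mnt1 /mnt2 rw,noatime master:1 - ext3 /dev/root rw"))
print(propagation_state("41 24 8:3 / /mnt rw - ext4 /dev/sda3 rw"))
```

A mount with no optional fields is reported as private, which matches the default type described in section 2.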


   State diagram:

   The state diagram below explains the state transition of a mount,
   in response to various commands::

       --------------------------------------------------------------------------
       |             |make-shared |  make-slave  | make-private |make-unbindable|
       |-------------|------------|--------------|--------------|---------------|
       |shared       |shared      |*slave/private|   private    |  unbindable   |
       |             |            |              |              |               |
       |-------------|------------|--------------|--------------|---------------|
       |slave        |shared      | **slave      |   private    |  unbindable   |
       |             |and slave   |              |              |               |
       |-------------|------------|--------------|--------------|---------------|
       |shared       |shared      | slave        |   private    |  unbindable   |
       |and slave    |and slave   |              |              |               |
       |-------------|------------|--------------|--------------|---------------|
       |private      |shared      | **private    |   private    |  unbindable   |
       |-------------|------------|--------------|--------------|---------------|
       |unbindable   |shared      |**unbindable  |   private    |  unbindable   |
       --------------------------------------------------------------------------

   * If the shared mount is the only mount in its peer group, making it
     a slave makes it private automatically, because there is no master
     to which it can be slaved.

   ** Slaving a non-shared mount has no effect on the mount.

   Apart from the commands listed above, the 'move' operation also changes
   the state of a mount depending on the type of the destination mount. It is
   explained in section 5d.

b) Bind semantics

   Consider the following command::

     mount --bind A/a B/b

   where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
   is the destination mount and 'b' is the dentry in the destination mount.

   The outcome depends on the type of mount of 'A' and 'B'.
   The table
   below is a quick reference::

       --------------------------------------------------------------------------
       |                          BIND MOUNT OPERATION                          |
       |************************************************************************|
       |source(A)->| shared       | private        | slave          | unbindable |
       | dest(B)   |              |                |                |            |
       |    |      |              |                |                |            |
       |    v      |              |                |                |            |
       |************************************************************************|
       | shared    | shared       | shared         | shared & slave | invalid    |
       |           |              |                |                |            |
       |non-shared | shared       | private        | slave          | invalid    |
       **************************************************************************

   Details:

   1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
      which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
      are created and mounted at the dentry 'b' on all mounts to which 'B'
      propagates. A new propagation tree containing 'C1',..,'Cn' is
      created. This propagation tree is identical to the propagation tree of
      'B'. And finally the peer group of 'C' is merged with the peer group
      of 'A'.

   2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
      which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
      are created and mounted at the dentry 'b' on all mounts to which 'B'
      propagates. A new propagation tree is set up containing all the new
      mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
      propagation tree for 'B'.

   3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
      mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
      'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
      'C3' ... are created and mounted at the dentry 'b' on all mounts to
      which 'B' propagates.
      A new propagation tree containing the new mounts
      'C', 'C1', .. 'Cn' is created. This propagation tree is identical to the
      propagation tree for 'B'. And finally the mount 'C' and its peer group
      are made the slave of mount 'Z'. In other words, mount 'C' is in the
      state 'slave and shared'.

   4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
      invalid operation.

   5. 'A' is a private mount and 'B' is a non-shared (private or slave or
      unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
      Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

   6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
      which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
      mounted on mount 'B' at dentry 'b'. 'C' is made a member of the
      peer group of 'A'.

   7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
      new mount 'C' which is a clone of 'A' is created. Its root dentry is
      'a'. 'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
      slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
      'Z'. All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
      mount/unmount events on 'A' do not propagate anywhere else. Similarly
      mount/unmount events on 'C' do not propagate anywhere else.

   8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
      invalid operation. An unbindable mount cannot be bind mounted.

c) Rbind semantics

   Rbind follows the same rules as bind. Bind replicates only the specified
   mount; rbind replicates all the mounts in the tree belonging to the
   specified mount. An rbind mount is a bind mount applied to all the mounts
   in the tree.

   If the source tree that is rbound has some unbindable mounts,
   then the subtree under the unbindable mount is pruned in the new
   location.

   For example:

   Let's say we have the following mount tree::

                A
              /   \
             B     C
            / \   / \
           D   E F   G

   Let's say all the mounts in the tree except the mount C are
   of a type other than unbindable (that is, only C is unbindable).

   If this tree is rbound to, say, Z, we will have the following tree at
   the new location::

                Z
                |
                A'
               /
              B'          Note how the tree under C is pruned
             / \          in the new location.
            D'  E'



d) Move semantics

   Consider the following command::

     mount --move A B/b

   where 'A' is the source mount, 'B' is the destination mount and 'b' is
   the dentry in the destination mount.

   The outcome depends on the type of the mount of 'A' and 'B'. The table
   below is a quick reference::

       --------------------------------------------------------------------------
       |                          MOVE MOUNT OPERATION                          |
       |************************************************************************|
       |source(A)->| shared       | private        | slave          | unbindable |
       | dest(B)   |              |                |                |            |
       |    |      |              |                |                |            |
       |    v      |              |                |                |            |
       |************************************************************************|
       | shared    | shared       | shared         |shared and slave| invalid    |
       |           |              |                |                |            |
       |non-shared | shared       | private        | slave          | unbindable |
       **************************************************************************

   .. Note:: moving a mount residing under a shared mount is invalid.

   Details follow:

   1. 'A' is a shared mount and 'B' is a shared mount. The mount 'A' is
      mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
      are created and mounted at dentry 'b' on all mounts that receive
      propagation from mount 'B'. A new propagation tree is created in the
      exact same configuration as that of 'B'. This new propagation tree
      contains all the new mounts 'A1', 'A2'... 'An'. And this new
      propagation tree is appended to the already existing propagation tree
      of 'A'.

   2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
      mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
      are created and mounted at dentry 'b' on all mounts that receive
      propagation from mount 'B'. The mount 'A' becomes a shared mount and a
      propagation tree is created which is identical to that of
      'B'. This new propagation tree contains all the new mounts 'A1',
      'A2'... 'An'.

   3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. The
      mount 'A' is mounted on mount 'B' at dentry 'b'. Also new mounts 'A1',
      'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
      receive propagation from mount 'B'. A new propagation tree is created
      in the exact same configuration as that of 'B'. This new propagation
      tree contains all the new mounts 'A1', 'A2'... 'An'. And this new
      propagation tree is appended to the already existing propagation tree of
      'A'. Mount 'A' continues to be the slave mount of 'Z' but it also
      becomes 'shared'.

   4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
      is invalid, because mounting anything on the shared mount 'B' can
      create new mounts that get mounted on the mounts that receive
      propagation from 'B'. Since the mount 'A' is unbindable, cloning
      it to mount at other mountpoints is not possible.

   5. 'A' is a private mount and 'B' is a non-shared (private or slave or
      unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

   6. 'A' is a shared mount and 'B' is a non-shared mount. The mount 'A'
      is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be a
      shared mount.

   7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
      The mount 'A' is mounted on mount 'B' at dentry 'b'. Mount 'A'
      continues to be a slave mount of mount 'Z'.

   8. 'A' is an unbindable mount and 'B' is a non-shared mount.
      The mount
      'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
      unbindable mount.

e) Mount semantics

   Consider the following command::

     mount device B/b

   'B' is the destination mount and 'b' is the dentry in the destination
   mount.

   The above operation is the same as the bind operation with the exception
   that the source mount is always a private mount.


f) Unmount semantics

   Consider the following command::

     umount A

   where 'A' is a mount mounted on mount 'B' at dentry 'b'.

   If mount 'B' is shared, then all most-recently-mounted mounts at dentry
   'b' on mounts that receive propagation from mount 'B' and do not have
   sub-mounts within them are unmounted.

   Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
   each other.

   Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
   'B1', 'B2' and 'B3' respectively.

   Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
   mount 'B1', 'B2' and 'B3' respectively.

   If 'C1' is unmounted, all the mounts that are most-recently-mounted on
   'B1' and on the mounts that 'B1' propagates to are unmounted.

   'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
   on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

   So all of 'C1', 'C2' and 'C3' should be unmounted.

   If either 'C2' or 'C3' has some child mounts, then that mount is not
   unmounted, but all other mounts are unmounted. However, if 'C1' is told
   to be unmounted and 'C1' has some sub-mounts, the umount operation
   fails entirely.

g) Clone Namespace

   A cloned namespace contains all the mounts of the parent
   namespace.

   Let's say 'A' and 'B' are the corresponding mounts in the parent and the
   child namespace.

   If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
   each other.

   If 'A' is a slave mount of 'Z', then 'B' is also a slave mount of
   'Z'.

   If 'A' is a private mount, then 'B' is a private mount too.

   If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

A. What is the result of the following command sequence?

   ::

       mount --bind /mnt /mnt
       mount --make-shared /mnt
       mount --bind /mnt /tmp
       mount --move /tmp /mnt/1

   What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
   Should they all be identical? Or should only /mnt and /mnt/1 be
   identical?


B. What is the result of the following command sequence?

   ::

       mount --make-rshared /
       mkdir -p /v/1
       mount --rbind / /v/1

   What should the content of /v/1/v/1 be?


C. What is the result of the following command sequence?

   ::

       mount --bind /mnt /mnt
       mount --make-shared /mnt
       mkdir -p /mnt/1/2/3 /mnt/1/test
       mount --bind /mnt/1 /tmp
       mount --make-slave /mnt
       mount --make-shared /mnt
       mount --bind /mnt/1/2 /tmp1
       mount --make-slave /mnt

   At this point we have the first mount at /tmp and
   its root dentry is 1. Let's call this mount 'A'.
   And then we have a second mount at /tmp1 with root
   dentry 2. Let's call this mount 'B'.
   Next we have a third mount at /mnt with root dentry
   mnt. Let's call this mount 'C'.

   'B' is the slave of 'A' and 'C' is a slave of 'B':
   A -> B -> C

   At this point if we execute the following command::

       mount --bind /bin /tmp/test

   The mount is attempted on 'A'.

   Will the mount propagate to 'B' and 'C'?

   What would the contents of
   /mnt/1/test be?

7) FAQ
------

1. Why is bind mount needed? How is it different from symbolic links?

   Symbolic links can get stale if the destination mount gets
   unmounted or moved. Bind mounts continue to exist even if the
   other mount is unmounted or moved.

2. Why can't the shared subtree be implemented using exportfs?

   exportfs is a heavyweight way of accomplishing part of what
   shared subtree can do. I cannot imagine a way to implement the
   semantics of slave mount using exportfs.

3. Why is unbindable mount needed?

   Let's say we want to replicate the mount tree at multiple
   locations within the same subtree.

   If one rbind mounts a tree within the same subtree 'n' times,
   the number of mounts created grows at least exponentially with 'n'.
   Having unbindable mounts can help prune the unneeded bind
   mounts. Here is an example.

   step 1:
      let's say the root tree has just two directories with
      one vfsmount::

                    root
                   /    \
                 tmp    usr

      And we want to replicate the tree at multiple
      mountpoints under /root/tmp

   step 2:
      ::


        mount --make-shared /root

        mkdir -p /tmp/m1

        mount --rbind /root /tmp/m1

      the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                 /
               m1
              /  \
            tmp  usr
            /
          m1

      it has two vfsmounts

   step 3:
      ::

            mkdir -p /tmp/m2
            mount --rbind /root /tmp/m2

      the new tree now looks like this::

                          root
                         /    \
                      tmp      usr
                     /   \
                  m1       m2
                 /  \     /  \
              tmp   usr tmp   usr
              / \        /
            m1   m2    m1
                /  \   /  \
             tmp  usr tmp  usr
             /        /  \
           m1       m1    m2
          /  \
        tmp   usr
        /  \
      m1    m2

      it has 6 vfsmounts

   step 4:
      ::

            mkdir -p /tmp/m3
            mount --rbind /root /tmp/m3

      I won't draw the tree, but it has 24 vfsmounts.


   At step i the number of vfsmounts is V[i] = i*V[i-1], with V[1] = 1,
   that is V[i] = i!. This grows even faster than an exponential function,
   and the tree has far more mounts than we really needed in the first place.
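
The recurrence V[i] = i*V[i-1] can be checked against the counts quoted in the steps above (1, 2, 6 and 24 vfsmounts). The short sketch below simply evaluates the recurrence, taking V[1] = 1 from step 1; the function name ``vfsmount_counts`` is made up for this illustration.

```python
def vfsmount_counts(steps):
    """Return [V[1], ..., V[steps]] for V[i] = i * V[i-1], with V[1] = 1."""
    counts = [1]  # step 1: a single vfsmount
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


print(vfsmount_counts(6))  # [1, 2, 6, 24, 120, 720]
```

After only six rbind steps the tree would already hold 720 vfsmounts, which is why some form of pruning is needed.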

   One could use a series of umounts at each step to prune
   out the unneeded mounts. But there is a better solution.
   Unbindable mounts come in handy here.

   step 1:
      let's say the root tree has just two directories with
      one vfsmount::

                    root
                   /    \
                 tmp    usr

      How do we set up the same tree at multiple locations under
      /root/tmp?

   step 2:
      ::


        mount --bind /root/tmp /root/tmp

        mount --make-rshared /root
        mount --make-unbindable /root/tmp

        mkdir -p /tmp/m1

        mount --rbind /root /tmp/m1

      the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                 /
               m1
              /  \
            tmp  usr

   step 3:
      ::

            mkdir -p /tmp/m2
            mount --rbind /root /tmp/m2

      the new tree now looks like this::

                    root
                   /    \
                 tmp    usr
                /   \
              m1      m2
             /  \    /  \
           tmp  usr tmp  usr

   step 4:
      ::

            mkdir -p /tmp/m3
            mount --rbind /root /tmp/m3

      the new tree now looks like this::

                         root
                     /          \
                   tmp           usr
               /    \     \
             m1      m2      m3
            /  \    /  \    /  \
          tmp  usr tmp  usr tmp  usr

8) Implementation
-----------------

A) Datastructure

   Several new fields are introduced to struct vfsmount:

   ->mnt_share
           Links together all the mounts to/from which this vfsmount
           sends/receives propagation events.

   ->mnt_slave_list
           Links all the mounts to which this vfsmount propagates.

   ->mnt_slave
           Links together all the slaves that its master vfsmount
           propagates to.

   ->mnt_master
           Points to the master vfsmount from which this vfsmount
           receives propagation.

   ->mnt_flags
           Takes two more flags to indicate the propagation status of
           the vfsmount. MNT_SHARE indicates that the vfsmount is a shared
           vfsmount. MNT_UNCLONABLE indicates that the vfsmount cannot be
           replicated.

   All the shared vfsmounts in a peer group form a cyclic list through
   ->mnt_share.
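
As a rough illustration of the cyclic-list idea only, the sketch below models a peer group as a circular doubly linked list in Python. It is not kernel code: the class ``Mount`` and the attributes ``next_peer`` and ``prev_peer`` are hypothetical stand-ins for the list linkage that ->mnt_share provides.

```python
class Mount:
    """Toy model of a vfsmount that can join a cyclic peer list."""

    def __init__(self, name):
        self.name = name
        # A mount on its own forms a one-element cycle.
        self.next_peer = self
        self.prev_peer = self

    def join_peer_group(self, member):
        """Link this mount into the cycle right after `member`."""
        self.prev_peer = member
        self.next_peer = member.next_peer
        member.next_peer.prev_peer = self
        member.next_peer = self

    def peers(self):
        """Walk the cycle once and return the names of all other members."""
        names = []
        mount = self.next_peer
        while mount is not self:
            names.append(mount.name)
            mount = mount.next_peer
        return names


a, b, c = Mount("A"), Mount("B"), Mount("C")
b.join_peer_group(a)
c.join_peer_group(b)
print(a.peers())  # ['B', 'C']
print(c.peers())  # ['A', 'B']
```

Because the list is cyclic, every member can reach all of its peers by walking from itself, with no distinguished head element.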

   All vfsmounts with the same ->mnt_master form a cyclic list anchored
   in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

   ->mnt_master can point to arbitrary (and possibly different) members
   of the master peer group. To find all immediate slaves of a peer group
   you need to go through _all_ ->mnt_slave_list of its members.
   Conceptually it's just a single set - distribution among the
   individual lists does not affect propagation or the way the propagation
   tree is modified by operations.

   All vfsmounts in a peer group have the same ->mnt_master. If it is
   non-NULL, they form a contiguous (ordered) segment of the slave list.

   An example propagation tree looks as shown in the figure below.

   .. note::
      Though it looks like a forest, if we consider all the shared
      mounts as a conceptual entity called 'pnode', it becomes a tree.

   ::


                A <--> B <--> C <---> D
               /|\            /|      |\
              / F G          J K      H I
             /
            E<-->K
                /|\
               M L N

   In the above figure 'A', 'B', 'C' and 'D' are all shared and propagate to
   each other. 'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
   mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
   'E' is also shared with 'K' and they propagate to each other.
   And
   'K' has 3 slaves 'M', 'L' and 'N'.

   A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'

   A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'

   E's ->mnt_share links with ->mnt_share of K

   'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'

   'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'

   K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'

   C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'

   J and K's ->mnt_master points to struct vfsmount of C

   and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'

   'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


   NOTE: The propagation tree is orthogonal to the mount tree.

B) Locking:

   ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
   by namespace_sem (exclusive for modifications, shared for reading).

   Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
   There are two exceptions: do_add_mount() and clone_mnt().
   The former modifies a vfsmount that has not been visible in any shared
   data structures yet.
   The latter holds namespace_sem and the only references to vfsmount
   are in lists that can't be traversed without namespace_sem.

C) Algorithm:

   The crux of the implementation resides in the rbind/move operation.

   The overall algorithm breaks the operation into 3 phases: (look at
   attach_recursive_mnt() and propagate_mnt())

   1. Prepare phase.

      For each mount in the source tree:

      a) Create the necessary number of mount trees to
         be attached to each of the mounts that receive
         propagation from the destination mount.
      b) Do not attach any of the trees to its destination.
         However note down its ->mnt_parent and ->mnt_mountpoint.
      c) Link all the new mounts to form a propagation tree that
         is identical to the propagation tree of the destination
         mount.

      If this phase is successful, there should be 'n' new
      propagation trees, where 'n' is the number of mounts in the
      source tree. Go to the commit phase.

      Also there should be 'm' new mount trees, where 'm' is
      the number of mounts to which the destination mount
      propagates.

      If any memory allocations fail, go to the abort phase.

   2. Commit phase.

      Attach each of the mount trees to their corresponding
      destination mounts.

   3. Abort phase.

      Delete all the newly created trees.

   .. Note::
      all the propagation related functionality resides in the file pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)