.. SPDX-License-Identifier: GPL-2.0

===============
Shared Subtrees
===============

.. Contents:
        1) Overview
        2) Features
        3) Setting mount states
        4) Use cases
        5) Detailed semantics
        6) Quiz
        7) FAQ
        8) Implementation


1) Overview
-----------

Consider the following situation:

A process wants to clone its own namespace, but still wants to access the CD
that got mounted recently.  Shared subtree semantics provide the necessary
mechanism to accomplish the above.

They also provide the necessary building blocks for features like per-user
namespaces and versioned filesystems.

2) Features
-----------

Shared subtrees provide four different flavors of mounts (struct vfsmount, to
be precise):

        a. shared mount
        b. slave mount
        c. private mount
        d. unbindable mount


2a) A shared mount can be replicated to as many mountpoints as needed, and
all the replicas remain exactly the same.

        Here is an example:

        Let's say /mnt has a mount that is shared::

            mount --make-shared /mnt

        Note: mount(8) command now supports the --make-shared flag,
        so the sample 'smount' program is no longer needed and has been
        removed.

        ::

            # mount --bind /mnt /tmp

        The above command replicates the mount at /mnt to the mountpoint /tmp
        and the contents of both the mounts remain identical.

        ::

            # ls /mnt
            a b c

            # ls /tmp
            a b c

        Now let's say we mount a device at /tmp/a::

            # mount /dev/sd0 /tmp/a

            # ls /tmp/a
            t1 t2 t3

            # ls /mnt/a
            t1 t2 t3

        Note that the mount has propagated to the mount at /mnt as well.

        The same is true when /dev/sd0 is mounted on /mnt/a instead. The
        contents will be visible under /tmp/a too.


2b) A slave mount is like a shared mount except that mount and umount events
only propagate towards it.

        All slave mounts have a master mount which is shared.

        Here is an example:

        Let's say /mnt has a mount which is shared::

            # mount --make-shared /mnt

        Let's bind mount /mnt to /tmp::

            # mount --bind /mnt /tmp

        The new mount at /tmp becomes a shared mount and it is a replica of
        the mount at /mnt.

        Now let's make the mount at /tmp a slave of /mnt::

            # mount --make-slave /tmp

        Let's mount /dev/sd0 on /mnt/a::

            # mount /dev/sd0 /mnt/a

            # ls /mnt/a
            t1 t2 t3

            # ls /tmp/a
            t1 t2 t3

        Note that the mount event has propagated to the mount at /tmp.

        However, let's see what happens if we mount something on the mount
        at /tmp::

            # mount /dev/sd1 /tmp/b

            # ls /tmp/b
            s1 s2 s3

            # ls /mnt/b

        Note how the mount event has not propagated to the mount at
        /mnt.


2c) A private mount does not forward or receive propagation.

        This is the mount we are familiar with. It's the default type.


2d) An unbindable mount is a private mount that cannot be bind mounted.

        Let's say we have a mount at /mnt and we make it unbindable::

            # mount --make-unbindable /mnt

        Let's try to bind mount this mount somewhere else::

            # mount --bind /mnt /tmp
            mount: wrong fs type, bad option, bad superblock on /mnt,
                   or too many mounted file systems

        Binding an unbindable mount is an invalid operation.


3) Setting mount states
-----------------------

        The mount command (util-linux package) can be used to set mount
        states::

            mount --make-shared mountpoint
            mount --make-slave mountpoint
            mount --make-private mountpoint
            mount --make-unbindable mountpoint


4) Use cases
------------

        A) A process wants to clone its own namespace, but still wants to
           access the CD that got mounted recently.

           Solution:

                The system administrator can make the mount at /cdrom shared::

                    mount --bind /cdrom /cdrom
                    mount --make-shared /cdrom

                Now any process that clones off a new namespace will have a
                mount at /cdrom which is a replica of the same mount in the
                parent namespace.

                So when a CD is inserted and mounted at /cdrom, that mount gets
                propagated to the other mount at /cdrom in all the other cloned
                namespaces.

        B) A process wants its mounts invisible to any other process, but
           still be able to see the other system mounts.

           Solution:

                To begin with, the administrator can mark the entire mount tree
                as shareable::

                    mount --make-rshared /

                A new process can clone off a new namespace and mark some part
                of its namespace as slave::

                    mount --make-rslave /myprivatetree

                Henceforth any mounts within /myprivatetree done by the
                process will not show up in any other namespace. However, mounts
                done in the parent namespace under /myprivatetree still show
                up in the process's namespace.


        Apart from the above semantics, this feature provides the
        building blocks to solve the following problems:

        C) Per-user namespace

                The above semantics allow a way to share mounts across
                namespaces.  But namespaces are associated with processes. If
                namespaces are made first class objects with a user API to
                associate/disassociate a namespace with a userid, then each user
                could have his/her own namespace and tailor it to his/her
                requirements. This needs to be supported in PAM.

        D) Versioned files

                If the entire mount tree is visible at multiple locations, then
                an underlying versioning file system can return different
                versions of the file depending on the path used to access that
                file.

                An example is::

                    mount --make-shared /
                    mount --rbind / /view/v1
                    mount --rbind / /view/v2
                    mount --rbind / /view/v3
                    mount --rbind / /view/v4

                and if /usr has a versioning filesystem mounted, then that
                mount appears at /view/v1/usr, /view/v2/usr, /view/v3/usr and
                /view/v4/usr too.

                A user can request the v3 version of the file
                /usr/fs/namespace.c by accessing /view/v3/usr/fs/namespace.c.
                The underlying versioning filesystem can then decipher that the
                v3 version of the filesystem is being requested and return the
                corresponding inode.

5) Detailed semantics
---------------------

        The section below explains the detailed semantics of
        bind, rbind, move, mount, umount and clone-namespace operations.

        Note: the word 'vfsmount' and the noun 'mount' have been used
        to mean the same thing, throughout this document.

5a) Mount states

        A given mount can be in one of the following states:

        1) shared
        2) slave
        3) shared and slave
        4) private
        5) unbindable

        A 'propagation event' is defined as an event generated on a vfsmount
        that leads to mount or unmount actions in other vfsmounts.

        A 'peer group' is defined as a group of vfsmounts that propagate
        events to each other.

        (1) Shared mounts

                A 'shared mount' is defined as a vfsmount that belongs to a
                'peer group'.

                For example::

                        mount --make-shared /mnt
                        mount --bind /mnt /tmp

                The mount at /mnt and that at /tmp are both shared and belong
                to the same peer group. Anything mounted or unmounted under
                /mnt or /tmp is reflected in all the other mounts of its peer
                group.


        (2) Slave mounts

                A 'slave mount' is defined as a vfsmount that receives
                propagation events and does not forward propagation events.

                A slave mount, as the name implies, has a master mount from
                which mount/unmount events are received.
                Events do not propagate from
                the slave mount to the master.  Only a shared mount can be made
                a slave by executing the following command::

                        mount --make-slave mount

                A shared mount that is made a slave is no longer shared unless
                it is modified to become shared again.

        (3) Shared and Slave

                A vfsmount can be both shared as well as slave.  This state
                indicates that the mount is a slave of some vfsmount, and
                has its own peer group too.  This vfsmount receives propagation
                events from its master vfsmount, and also forwards propagation
                events to its 'peer group' and to its slave vfsmounts.

                Strictly speaking, the vfsmount is shared, having its own
                peer group, and this peer group is a slave of some other
                peer group.

                Only a slave vfsmount can be made 'shared and slave', either
                by executing the following command::

                        mount --make-shared mount

                or by moving the slave vfsmount under a shared vfsmount.

        (4) Private mount

                A 'private mount' is defined as a vfsmount that does not
                receive or forward any propagation events.

        (5) Unbindable mount

                An 'unbindable mount' is defined as a vfsmount that does not
                receive or forward any propagation events and cannot
                be bind mounted.
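
The five states can be read as combinations of three properties: membership in
a peer group, having a master, and being barred from bind mounting. The Python
sketch below models that reading together with the transitions given in the
state diagram of this section. The names ``MountState`` and ``apply``, and the
``has_peers`` parameter, are hypothetical and exist only for illustration; this
is a toy model under those assumptions, not kernel code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountState:
    """Illustrative model of a mount's propagation state (not kernel code)."""
    shared: bool = False       # belongs to a peer group
    slave: bool = False        # receives events from a master
    unbindable: bool = False   # cannot be bind mounted

    def name(self) -> str:
        if self.unbindable:
            return "unbindable"
        if self.shared and self.slave:
            return "shared and slave"
        if self.shared:
            return "shared"
        if self.slave:
            return "slave"
        return "private"


def apply(state: MountState, command: str, has_peers: bool = True) -> MountState:
    """Return the state reached from `state` after a mount --<command>.

    `has_peers` (an assumption of this sketch) says whether a shared mount
    has other members in its peer group that could act as its master.
    """
    if command == "make-shared":
        # An existing master is kept, so a slave becomes 'shared and slave'.
        return MountState(shared=True, slave=state.slave)
    if command == "make-slave":
        if not state.shared:
            return state                   # slaving a non-shared mount is a no-op
        if state.slave or has_peers:
            return MountState(slave=True)  # leaves its peer group
        return MountState()                # no possible master: becomes private
    if command == "make-private":
        return MountState()
    if command == "make-unbindable":
        return MountState(unbindable=True)
    raise ValueError("unknown command: " + command)


if __name__ == "__main__":
    start = MountState(slave=True)
    print(apply(start, "make-shared").name())
```

Keeping ``shared`` and ``slave`` as independent flags makes 'shared and slave'
fall out naturally as the case where both are set.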


        State diagram:

        The state diagram below explains the state transition of a mount,
        in response to various commands::

            ------------------------------------------------------------------------
            |             |make-shared|make-slave     |make-private|make-unbindable|
            |-------------|-----------|---------------|------------|---------------|
            |shared       |shared     |*slave/private |private     |unbindable     |
            |             |           |               |            |               |
            |-------------|-----------|---------------|------------|---------------|
            |slave        |shared     |**slave        |private     |unbindable     |
            |             |and slave  |               |            |               |
            |-------------|-----------|---------------|------------|---------------|
            |shared       |shared     |slave          |private     |unbindable     |
            |and slave    |and slave  |               |            |               |
            |-------------|-----------|---------------|------------|---------------|
            |private      |shared     |**private      |private     |unbindable     |
            |-------------|-----------|---------------|------------|---------------|
            |unbindable   |shared     |**unbindable   |private     |unbindable     |
            ------------------------------------------------------------------------

            * if the shared mount is the only mount in its peer group, making it
            slave makes it private automatically, because there is no master to
            which it can be slaved.

            ** slaving a non-shared mount has no effect on the mount.

        Apart from the commands listed above, the 'move' operation also changes
        the state of a mount depending on the type of the destination mount.
        This is explained in section 5d.

5b) Bind semantics

        Consider the following command::

            mount --bind A/a  B/b

        where 'A' is the source mount, 'a' is the dentry in the mount 'A', 'B'
        is the destination mount and 'b' is the dentry in the destination mount.

        The outcome depends on the type of mount of 'A' and 'B'.
        The table
        below is a quick reference::

            ---------------------------------------------------------------
            |                    BIND MOUNT OPERATION                     |
            |*************************************************************|
            | source(A)->| shared | private |     slave      | unbindable |
            | dest(B)    |        |         |                |            |
            |   |        |        |         |                |            |
            |   v        |        |         |                |            |
            |*************************************************************|
            | shared     | shared | shared  | shared & slave |  invalid   |
            |            |        |         |                |            |
            | non-shared | shared | private |     slave      |  invalid   |
            ***************************************************************

        Details:

    1. 'A' is a shared mount and 'B' is a shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree containing 'C1',..,'Cn' is
       created. This propagation tree is identical to the propagation tree of
       'B'.  And finally the peer group of 'C' is merged with the peer group
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2', 'C3' ...
       are created and mounted at the dentry 'b' on all mounts where 'B'
       propagates to. A new propagation tree is set up containing all new
       mounts 'C', 'C1', .., 'Cn' with exactly the same configuration as the
       propagation tree for 'B'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount. A new
       mount 'C' which is a clone of 'A' is created. Its root dentry is 'a'.
       'C' is mounted on mount 'B' at dentry 'b'. Also new mounts 'C1', 'C2',
       'C3' ... are created and mounted at the dentry 'b' on all mounts where
       'B' propagates to.
       A new propagation tree containing the new mounts
       'C','C1',.. 'Cn' is created. This propagation tree is identical to the
       propagation tree for 'B'. And finally the mount 'C' and its peer group
       are made the slave of mount 'Z'.  In other words, mount 'C' is in the
       state 'slave and shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. This is an
       invalid operation.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
       unbindable) mount. A new mount 'C' which is a clone of 'A' is created.
       Its root dentry is 'a'. 'C' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount. A new mount 'C'
       which is a clone of 'A' is created. Its root dentry is 'a'. 'C' is
       mounted on mount 'B' at dentry 'b'.  'C' is made a member of the
       peer group of 'A'.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount. A
       new mount 'C' which is a clone of 'A' is created. Its root dentry is
       'a'.  'C' is mounted on mount 'B' at dentry 'b'. Also 'C' is set as a
       slave mount of 'Z'. In other words 'A' and 'C' are both slave mounts of
       'Z'.  All mount/unmount events on 'Z' propagate to 'A' and 'C'. But
       mount/unmount on 'A' do not propagate anywhere else. Similarly
       mount/unmount on 'C' do not propagate anywhere else.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount. This is an
       invalid operation. An unbindable mount cannot be bind mounted.

5c) Rbind semantics

        rbind is the recursive form of bind. Bind replicates the specified
        mount. Rbind replicates all the mounts in the tree belonging to the
        specified mount. An rbind mount is a bind mount applied to all the
        mounts in the tree.

        If the source tree that is rbound contains unbindable mounts,
        then the subtree rooted at each unbindable mount is pruned in the new
        location.
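
The pruning rule can be sketched as a recursive copy that skips every
unbindable mount together with everything below it. The Python model below is
only an illustration: the ``Mount`` class and the ``rbind_copy`` and ``names``
helpers are hypothetical names, and the tree it builds mirrors the example
that follows in this section (mount C unbindable, all others bindable).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Mount:
    """Toy stand-in for a mount in a mount tree (not kernel code)."""
    name: str
    unbindable: bool = False
    children: List["Mount"] = field(default_factory=list)


def rbind_copy(src: Mount) -> Optional[Mount]:
    """Clone the tree rooted at `src`, pruning unbindable mounts.

    Returns None when `src` itself is unbindable, since such a mount
    cannot be bind mounted at all.
    """
    if src.unbindable:
        return None
    clone = Mount(src.name + "'")
    for child in src.children:
        child_clone = rbind_copy(child)
        if child_clone is not None:
            clone.children.append(child_clone)
    return clone


def names(mount: Mount) -> List[str]:
    """List mount names in pre-order, for easy inspection."""
    result = [mount.name]
    for child in mount.children:
        result.extend(names(child))
    return result


tree = Mount("A", children=[
    Mount("B", children=[Mount("D"), Mount("E")]),
    Mount("C", unbindable=True, children=[Mount("F"), Mount("G")]),
])

if __name__ == "__main__":
    print(names(rbind_copy(tree)))
```

In this model the copy contains only A', B', D' and E'; F and G are dropped
along with C even though they are bindable themselves.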

        eg:

          let's say we have the following mount tree::

                  A
                /   \
               B     C
              / \   / \
             D   E F   G

          Let's say all the mounts in the tree except the mount C are
          of a type other than unbindable.

          If this tree is rbound to, say, Z,

          we will have the following tree at the new location::

                  Z
                  |
                  A'
                 /
                B'         Note how the tree under C is pruned
               / \         in the new location.
              D'  E'



5d) Move semantics

        Consider the following command::

            mount --move A  B/b

        where 'A' is the source mount, 'B' is the destination mount and 'b' is
        the dentry in the destination mount.

        The outcome depends on the type of the mount of 'A' and 'B'. The table
        below is a quick reference::

            ---------------------------------------------------------------
            |                    MOVE MOUNT OPERATION                     |
            |*************************************************************|
            | source(A)->| shared | private |     slave      | unbindable |
            | dest(B)    |        |         |                |            |
            |   |        |        |         |                |            |
            |   v        |        |         |                |            |
            |*************************************************************|
            | shared     | shared | shared  |shared and slave|  invalid   |
            |            |        |         |                |            |
            | non-shared | shared | private |     slave      | unbindable |
            ***************************************************************

        .. Note:: moving a mount residing under a shared mount is invalid.

      Details follow:

    1. 'A' is a shared mount and 'B' is a shared mount.  The mount 'A' is
       mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1', 'A2'...'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. A new propagation tree is created in the
       exact same configuration as that of 'B'. This new propagation tree
       contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree
       of 'A'.

    2. 'A' is a private mount and 'B' is a shared mount. The mount 'A' is
       mounted on mount 'B' at dentry 'b'. Also new mounts 'A1', 'A2'... 'An'
       are created and mounted at dentry 'b' on all mounts that receive
       propagation from mount 'B'. The mount 'A' becomes a shared mount and a
       propagation tree is created which is identical to that of
       'B'. This new propagation tree contains all the new mounts 'A1',
       'A2'...  'An'.

    3. 'A' is a slave mount of mount 'Z' and 'B' is a shared mount.  The
       mount 'A' is mounted on mount 'B' at dentry 'b'.  Also new mounts 'A1',
       'A2'... 'An' are created and mounted at dentry 'b' on all mounts that
       receive propagation from mount 'B'. A new propagation tree is created
       in the exact same configuration as that of 'B'. This new propagation
       tree contains all the new mounts 'A1', 'A2'...  'An'.  And this new
       propagation tree is appended to the already existing propagation tree of
       'A'.  Mount 'A' continues to be the slave mount of 'Z' but it also
       becomes 'shared'.

    4. 'A' is an unbindable mount and 'B' is a shared mount. The operation
       is invalid, because mounting anything on the shared mount 'B' can
       create new mounts that get mounted on the mounts that receive
       propagation from 'B'.  And since the mount 'A' is unbindable, cloning
       it to mount at other mountpoints is not possible.

    5. 'A' is a private mount and 'B' is a non-shared (private or slave or
       unbindable) mount. The mount 'A' is mounted on mount 'B' at dentry 'b'.

    6. 'A' is a shared mount and 'B' is a non-shared mount.  The mount 'A'
       is mounted on mount 'B' at dentry 'b'.  Mount 'A' continues to be a
       shared mount.

    7. 'A' is a slave mount of mount 'Z' and 'B' is a non-shared mount.
       The mount 'A' is mounted on mount 'B' at dentry 'b'.  Mount 'A'
       continues to be a slave mount of mount 'Z'.

    8. 'A' is an unbindable mount and 'B' is a non-shared mount.
       The mount
       'A' is mounted on mount 'B' at dentry 'b'. Mount 'A' continues to be an
       unbindable mount.

5e) Mount semantics

        Consider the following command::

            mount device  B/b

        'B' is the destination mount and 'b' is the dentry in the destination
        mount.

        The above operation is the same as the bind operation with the
        exception that the source mount is always a private mount.


5f) Unmount semantics

        Consider the following command::

            umount A

        where 'A' is a mount mounted on mount 'B' at dentry 'b'.

        If mount 'B' is shared, then all most-recently-mounted mounts at dentry
        'b' on mounts that receive propagation from mount 'B' and do not have
        sub-mounts within them are unmounted.

        Example: Let's say 'B1', 'B2', 'B3' are shared mounts that propagate to
        each other.

        Let's say 'A1', 'A2', 'A3' are first mounted at dentry 'b' on mount
        'B1', 'B2' and 'B3' respectively.

        Let's say 'C1', 'C2', 'C3' are next mounted at the same dentry 'b' on
        mount 'B1', 'B2' and 'B3' respectively.

        If 'C1' is unmounted, all the mounts that are most-recently-mounted on
        'B1' and on the mounts that 'B1' propagates to are unmounted.

        'B1' propagates to 'B2' and 'B3'. And the most recently mounted mount
        on 'B2' at dentry 'b' is 'C2', and that of mount 'B3' is 'C3'.

        So all of 'C1', 'C2' and 'C3' should be unmounted.

        If any of 'C2' or 'C3' has some child mounts, then that mount is not
        unmounted, but all other mounts are unmounted. However, if 'C1' is told
        to be unmounted and 'C1' has some sub-mounts, the umount operation
        fails entirely.

5g) Clone Namespace

        A cloned namespace contains all the mounts of the parent
        namespace.

        Let's say 'A' and 'B' are the corresponding mounts in the parent and the
        child namespace.

        If 'A' is shared, then 'B' is also shared and 'A' and 'B' propagate to
        each other.

        If 'A' is a slave mount of 'Z', then 'B' is also the slave mount of
        'Z'.

        If 'A' is a private mount, then 'B' is a private mount too.

        If 'A' is an unbindable mount, then 'B' is an unbindable mount too.


6) Quiz
-------

        A. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt /tmp
                    mount --move /tmp /mnt/1

                What should the contents of /mnt, /mnt/1 and /mnt/1/1 be?
                Should they all be identical, or should only /mnt and /mnt/1
                be identical?


        B. What is the result of the following command sequence?

                ::

                    mount --make-rshared /
                    mkdir -p /v/1
                    mount --rbind / /v/1

                What should the content of /v/1/v/1 be?


        C. What is the result of the following command sequence?

                ::

                    mount --bind /mnt /mnt
                    mount --make-shared /mnt
                    mkdir -p /mnt/1/2/3 /mnt/1/test
                    mount --bind /mnt/1 /tmp
                    mount --make-slave /mnt
                    mount --make-shared /mnt
                    mount --bind /mnt/1/2 /tmp1
                    mount --make-slave /mnt

                At this point we have the first mount at /tmp and
                its root dentry is 1. Let's call this mount 'A'.
                And then we have a second mount at /tmp1 with root
                dentry 2. Let's call this mount 'B'.
                Next we have a third mount at /mnt with root dentry
                mnt. Let's call this mount 'C'.

                'B' is the slave of 'A' and 'C' is a slave of 'B':
                A -> B -> C

                At this point, if we execute the following command::

                    mount --bind /bin /tmp/test

                the mount is attempted on 'A'.

                Will the mount propagate to 'B' and 'C'?

                What would the contents of
                /mnt/1/test be?

7) FAQ
------

        Q1. Why is bind mount needed? How is it different from symbolic links?
                Symbolic links can get stale if the destination mount gets
                unmounted or moved. Bind mounts continue to exist even if the
                other mount is unmounted or moved.

        Q2. Why can't the shared subtree be implemented using exportfs?

                exportfs is a heavyweight way of accomplishing part of what
                shared subtree can do. I cannot imagine a way to implement the
                semantics of slave mount using exportfs.

        Q3. Why is unbindable mount needed?

                Let's say we want to replicate the mount tree at multiple
                locations within the same subtree.

                If one rbind mounts a tree within the same subtree 'n' times,
                the number of mounts created grows explosively with 'n'.
                Having unbindable mounts can help prune the unneeded bind
                mounts. Here is an example.

                step 1:
                   let's say the root tree has just two directories with
                   one vfsmount::

                                    root
                                   /    \
                                 tmp    usr

                    And we want to replicate the tree at multiple
                    mountpoints under /root/tmp

                step 2:
                      ::

                        mount --make-shared /root

                        mkdir -p /tmp/m1

                        mount --rbind /root /tmp/m1

                      the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                /
                               m1
                              /  \
                            tmp  usr
                            /
                           m1

                      it has two vfsmounts

                step 3:
                    ::

                            mkdir -p /tmp/m2
                            mount --rbind /root /tmp/m2

                    the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                /    \
                              m1       m2
                             /  \     /  \
                           tmp  usr  tmp  usr
                           / \       /
                          m1  m2    m1
                              / \     / \
                            tmp usr tmp  usr
                            /       / \
                           m1      m1  m2
                          /  \
                        tmp  usr
                        /  \
                       m1   m2

                    it has 6 vfsmounts

                step 4:
                      ::

                          mkdir -p /tmp/m3
                          mount --rbind /root /tmp/m3

                      I won't draw the tree, but it has 24 vfsmounts.


                At step i the number of vfsmounts is V[i] = i*V[i-1].
                With V[1] = 1 this is i!, which grows even faster than an
                exponential function. And this tree has way more
                mounts than what we really needed in the first place.
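
The recurrence can be evaluated directly to reproduce the counts quoted in the
steps above. The short Python sketch below does so, taking V[1] = 1 from step 1
(a single vfsmount); the function name ``vfsmount_counts`` is hypothetical and
used only for illustration.

```python
import math


def vfsmount_counts(steps: int) -> list:
    """Return [V[1], ..., V[steps]] for the recurrence V[i] = i * V[i-1]."""
    counts = [1]  # V[1] = 1: step 1 starts with a single vfsmount
    for i in range(2, steps + 1):
        counts.append(i * counts[-1])
    return counts


if __name__ == "__main__":
    counts = vfsmount_counts(4)
    print(counts)  # [1, 2, 6, 24]
    # With V[1] = 1 the recurrence is simply the factorial.
    print(all(v == math.factorial(i) for i, v in enumerate(counts, start=1)))
```

The values 2, 6 and 24 match the vfsmount counts given for steps 2, 3 and 4.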

                One could use a series of umounts at each step to prune
                out the unneeded mounts. But there is a better solution.
                Unbindable mounts come in handy here.

                step 1:
                   let's say the root tree has just two directories with
                   one vfsmount::

                                    root
                                   /    \
                                 tmp    usr

                    How do we set up the same tree at multiple locations under
                    /root/tmp?

                step 2:
                      ::

                        mount --bind /root/tmp /root/tmp

                        mount --make-rshared /root
                        mount --make-unbindable /root/tmp

                        mkdir -p /tmp/m1

                        mount --rbind /root /tmp/m1

                      the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                /
                               m1
                              /  \
                            tmp  usr

                step 3:
                      ::

                        mkdir -p /tmp/m2
                        mount --rbind /root /tmp/m2

                      the new tree now looks like this::

                                    root
                                   /    \
                                 tmp    usr
                                /   \
                              m1     m2
                             /  \   /  \
                           tmp usr tmp usr

                step 4:
                      ::

                        mkdir -p /tmp/m3
                        mount --rbind /root /tmp/m3

                      the new tree now looks like this::

                                         root
                                        /    \
                                      tmp    usr
                                  /    |    \
                                m1     m2     m3
                               /  \   /  \   /  \
                             tmp usr tmp usr tmp usr

8) Implementation
-----------------

8A) Data structure

        4 new fields are introduced to struct vfsmount:

        *   ->mnt_share
        *   ->mnt_slave_list
        *   ->mnt_slave
        *   ->mnt_master

        ->mnt_share
                links together all the mounts to/from which this vfsmount
                sends/receives propagation events.

        ->mnt_slave_list
                links all the mounts to which this vfsmount propagates.

        ->mnt_slave
                links together all the slaves that its master vfsmount
                propagates to.

        ->mnt_master
                points to the master vfsmount from which this vfsmount
                receives propagation.

        ->mnt_flags
                takes two more flags to indicate the propagation status of
                the vfsmount.  MNT_SHARE indicates that the vfsmount is a shared
                vfsmount.  MNT_UNCLONABLE indicates that the vfsmount cannot be
                replicated.
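
To make the roles of these fields concrete, here is a small Python model of
them. It is a sketch under stated assumptions, not the kernel's data
structure: the kernel links these fields as lists embedded in the mount,
whereas this model uses plain Python lists, and the helpers ``make_peers``,
``make_slave`` and ``propagation_targets`` as well as the mounts A, B, S, T
and U are hypothetical names chosen for illustration.

```python
class Mount:
    """Toy model of the propagation fields (not kernel code)."""

    def __init__(self, name):
        self.name = name
        self.mnt_share = [self]    # peer group, shared by all its members
        self.mnt_slave_list = []   # slaves that have this mount as master
        self.mnt_master = None     # master this mount receives events from


def make_peers(a, b):
    """Merge the peer groups of a and b into one shared list."""
    group = a.mnt_share
    for m in list(b.mnt_share):
        if m not in group:
            group.append(m)
    for m in group:
        m.mnt_share = group


def make_slave(slave, master):
    """Make the whole peer group of `slave` a slave of `master`."""
    for m in slave.mnt_share:
        m.mnt_master = master
        master.mnt_slave_list.append(m)


def propagation_targets(src):
    """Names of all mounts that receive events generated on `src`."""
    targets, visited = [], set()

    def visit(group):
        if id(group) in visited:
            return
        visited.add(id(group))
        for peer in group:               # events reach every peer ...
            if peer is not src:
                targets.append(peer.name)
        for peer in group:               # ... and flow down to all slaves
            for s in peer.mnt_slave_list:
                visit(s.mnt_share)

    visit(src.mnt_share)
    return targets


A, B, S, T, U = (Mount(n) for n in "ABSTU")
make_peers(A, B)    # A <--> B
make_peers(S, T)    # S <--> T
make_slave(S, A)    # the group {S, T} is a slave of A
make_slave(U, T)    # U is a slave of T

if __name__ == "__main__":
    print(propagation_targets(A))
    print(propagation_targets(S))
```

In this model events on A reach B, S, T and U, while events on S reach only T
and U: nothing flows from a slave back up to its master.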

        All the shared vfsmounts in a peer group form a cyclic list through
        ->mnt_share.

        All vfsmounts with the same ->mnt_master form a cyclic list anchored
        in ->mnt_master->mnt_slave_list and going through ->mnt_slave.

        ->mnt_master can point to arbitrary (and possibly different) members
        of the master peer group.  To find all immediate slaves of a peer group
        you need to go through _all_ ->mnt_slave_list of its members.
        Conceptually it's just a single set - distribution among the
        individual lists does not affect propagation or the way the propagation
        tree is modified by operations.

        All vfsmounts in a peer group have the same ->mnt_master.  If it is
        non-NULL, they form a contiguous (ordered) segment of the slave list.

        An example propagation tree looks as shown in the figure below.
        [ NOTE: Though it looks like a forest, if we consider all the shared
        mounts as a conceptual entity called 'pnode', it becomes a tree]::


                        A <--> B <--> C <---> D
                       /|\            /|      |\
                      / F G          J K      H I
                     /
                    E<-->K
                        /|\
                       M L N

        In the above figure A, B, C and D are all shared and propagate to each
        other.  'A' has 3 slave mounts 'E', 'F' and 'G'; 'C' has 2 slave
        mounts 'J' and 'K'; and 'D' has two slave mounts 'H' and 'I'.
        'E' is also shared with 'K' and they propagate to each other.
        And
        'K' has 3 slaves 'M', 'L' and 'N'.

        A's ->mnt_share links with the ->mnt_share of 'B', 'C' and 'D'

        A's ->mnt_slave_list links with ->mnt_slave of 'E', 'K', 'F' and 'G'

        E's ->mnt_share links with ->mnt_share of K

        'E', 'K', 'F', 'G' have their ->mnt_master point to struct vfsmount of 'A'

        'M', 'L', 'N' have their ->mnt_master point to struct vfsmount of 'K'

        K's ->mnt_slave_list links with ->mnt_slave of 'M', 'L' and 'N'

        C's ->mnt_slave_list links with ->mnt_slave of 'J' and 'K'

        J and K's ->mnt_master points to struct vfsmount of C

        and finally D's ->mnt_slave_list links with ->mnt_slave of 'H' and 'I'

        'H' and 'I' have their ->mnt_master pointing to struct vfsmount of 'D'.


        NOTE: The propagation tree is orthogonal to the mount tree.

8B) Locking

        ->mnt_share, ->mnt_slave, ->mnt_slave_list, ->mnt_master are protected
        by namespace_sem (exclusive for modifications, shared for reading).

        Normally we have ->mnt_flags modifications serialized by vfsmount_lock.
        There are two exceptions: do_add_mount() and clone_mnt().
        The former modifies a vfsmount that has not been visible in any shared
        data structures yet.
        The latter holds namespace_sem and the only references to vfsmount
        are in lists that can't be traversed without namespace_sem.

8C) Algorithm

        The crux of the implementation resides in the rbind/move operation.

        The overall algorithm breaks the operation into 3 phases (look at
        attach_recursive_mnt() and propagate_mnt()):

        1. prepare phase.
        2. commit phase.
        3. abort phase.

        Prepare phase:

        for each mount in the source tree:

                a) Create the necessary number of mount trees to
                   be attached to each of the mounts that receive
                   propagation from the destination mount.
                b) Do not attach any of the trees to its destination.
                   However note down its ->mnt_parent and ->mnt_mountpoint
                c) Link all the new mounts to form a propagation tree that
                   is identical to the propagation tree of the destination
                   mount.

                If this phase is successful, there should be 'n' new
                propagation trees; where 'n' is the number of mounts in the
                source tree.  Go to the commit phase

                Also there should be 'm' new mount trees, where 'm' is
                the number of mounts to which the destination mount
                propagates to.

                if any memory allocations fail, go to the abort phase.

        Commit phase
                attach each of the mount trees to their corresponding
                destination mounts.

        Abort phase
                delete all the newly created trees.

        .. Note::
           all the propagation related functionality resides in the file pnode.c


------------------------------------------------------------------------

version 0.1  (created the initial document, Ram Pai linuxram@us.ibm.com)

version 0.2  (Incorporated comments from Al Viro)