====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i;

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should
free FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that now you need explicit initialization of private data
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

**mandatory**

The foo_inode_info should always be allocated through alloc_inode_sb() rather
than kmem_cache_alloc() or kmalloc(), so that the inode reclaim context is set
up correctly.

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more. Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and negative number in case of error (-EINVAL unless you have more
informative error value to report). Call it foo_fill_super().
Now declare::

	int foo_get_sb(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data, struct vfsmount *mnt)
	{
		return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
				   mnt);
	}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking. Otherwise exclusion warranties remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

Now we have the exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).
If you used to need that exclusion and do
it by internal locking (most of filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If your
method or its parts do not need BKL - better yet, now you can shift
lock_kernel() and unlock_kernel() so that they would protect exactly what
needs to be protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions. If you don't need it, remove it.

---

**informational**

check for ->link() target not being a directory is done by callers. Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to. Some of your
problems might be over...

---

**mandatory**

new file_system_type method - kill_sb(superblock). If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/). Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now. Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS. The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide file-system specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

	struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked. The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate.
There is also a simpler iget_locked function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---

**recommended**

->getattr() finally getting used. See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone. If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore. Read access is safe
if at least one of the following is true:

	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held

Audit your code and add locking if needed. Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups. Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone. If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).
The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have. If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held. BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it. If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev(). NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

**mandatory**

block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers. Take a look at
ext2_write_failed and callers for an example.

**mandatory**

->truncate is gone. The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes. Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally.
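
The ordering required of ->setattr can be sketched as a small userspace
model. Nothing below is kernel API: struct toy_inode, toy_setattr_prepare(),
toy_zero_partial_block() and toy_free_blocks() are hypothetical names made up
for illustration, standing in for setattr_prepare(), block_truncate_page()
and the filesystem's own on-disk truncation. The sketch assumes a tiny
in-memory "disk" and only models the size change.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define TOY_BLOCK_SIZE 8
#define TOY_MAX_BLOCKS 4

/* Hypothetical stand-in for an inode backed by a tiny in-memory "disk". */
struct toy_inode {
	size_t size;		/* plays the role of i_size */
	size_t nblocks;		/* blocks currently allocated */
	unsigned char data[TOY_MAX_BLOCKS * TOY_BLOCK_SIZE];
};

/* Plays the role of setattr_prepare(): the size checks live here. */
static int toy_setattr_prepare(const struct toy_inode *inode, size_t newsize)
{
	(void)inode;
	return newsize > TOY_MAX_BLOCKS * TOY_BLOCK_SIZE ? -EFBIG : 0;
}

/* Plays the role of block_truncate_page(): zero the tail of the last block. */
static void toy_zero_partial_block(struct toy_inode *inode, size_t newsize)
{
	size_t end = (newsize + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE
		     * TOY_BLOCK_SIZE;

	memset(inode->data + newsize, 0, end - newsize);
}

/* Plays the role of the on-disk truncation step; it cannot fail. */
static void toy_free_blocks(struct toy_inode *inode)
{
	size_t needed = (inode->size + TOY_BLOCK_SIZE - 1) / TOY_BLOCK_SIZE;

	if (needed < inode->nblocks)
		inode->nblocks = needed;
}

/* Size-changing part of a ->setattr(), in the order described above. */
static int toy_setattr_size(struct toy_inode *inode, size_t newsize)
{
	int err = toy_setattr_prepare(inode, newsize);	/* 1. checks first */

	if (err)
		return err;
	if (newsize < inode->size)
		toy_zero_partial_block(inode, newsize);	/* 2. zero blocks */
	inode->size = newsize;				/* 3. size update */
	toy_free_blocks(inode);				/* 4. on-disk truncation */
	return 0;
}
```

The only step that can fail runs before any state is modified, so a rejected
request leaves the inode untouched.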

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead. It gets called whenever the inode is evicted, whether it has
remaining links or not. Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped. As before, generic_drop_inode() is still the default and it's been
updated appropriately. generic_delete_inode() is also alive and it consists
simply of return 1. Note that all actual eviction work is done by caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()). Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().
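
A userspace sketch of the call order inside ->evict_inode() follows. It is
an illustration under assumptions, not kernel code: struct toy_inode and all
toy_*() functions are hypothetical stand-ins for
truncate_inode_pages_final(), invalidate_inode_buffers() and clear_inode(),
and the point at which a real filesystem releases its on-disk inode is
filesystem-specific.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the bits of struct inode used by this sketch. */
struct toy_inode {
	unsigned int nr_pages;		/* pagecache pages still attached */
	unsigned int nr_meta_buffers;	/* inode-associated metadata buffers */
	unsigned int nlink;		/* remaining links */
	unsigned int clear_calls;	/* how often "clear_inode()" has run */
	bool on_disk_freed;		/* fs-private: on-disk inode released */
};

/* Stand-in for truncate_inode_pages_final(). */
static void toy_truncate_inode_pages_final(struct toy_inode *inode)
{
	inode->nr_pages = 0;
}

/* Stand-in for invalidate_inode_buffers(). */
static void toy_invalidate_inode_buffers(struct toy_inode *inode)
{
	inode->nr_meta_buffers = 0;
}

/* Stand-in for clear_inode(). */
static void toy_clear_inode(struct toy_inode *inode)
{
	inode->clear_calls++;
}

/* Called for every eviction, whether or not links remain. */
static void toy_evict_inode(struct toy_inode *inode)
{
	/* The caller did not drop the pagecache; do it here. */
	toy_truncate_inode_pages_final(inode);

	/* Only an unlinked inode gives up its on-disk resources. */
	if (inode->nlink == 0)
		inode->on_disk_freed = true;

	/* Metadata buffers go before clear_inode(), which runs exactly once. */
	toy_invalidate_inode_buffers(inode);
	toy_clear_inode(inode);
}
```

Both the linked and the unlinked case take the same path through the method;
only the filesystem-private release step depends on the link count.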

NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough. Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes to
0. Even on 0 refcount transition, it must be able to tolerate being called 0,
1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed.
Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback. It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which requires dropping out of rcu-walk mode. This
may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.
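
The -ECHILD convention can be modelled in userspace as below. This is a
sketch under assumptions: TOY_LOOKUP_RCU is a stand-in for the kernel's
LOOKUP_RCU flag, and struct toy_dentry with its two fields is a hypothetical
type invented for the example. It also assumes the usual d_revalidate return
convention (positive for valid, zero for invalid).

```c
#include <errno.h>

/* Stand-in for LOOKUP_RCU; the real flag comes from kernel headers. */
#define TOY_LOOKUP_RCU 0x0040u

/* Hypothetical dentry: validation may or may not need to block. */
struct toy_dentry {
	int needs_blocking_check;	/* e.g. would have to ask a server */
	int still_valid;		/* result of the (possibly slow) check */
};

/*
 * Returns 1 if the dentry is valid, 0 if it is not, and -ECHILD when the
 * answer cannot be produced without blocking while in rcu-walk mode.
 */
static int toy_d_revalidate(const struct toy_dentry *dentry, unsigned int flags)
{
	if ((flags & TOY_LOOKUP_RCU) && dentry->needs_blocking_check)
		return -ECHILD;		/* caller retries outside rcu-walk */

	return dentry->still_valid ? 1 : 0;
}
```

A filesystem that can never answer in rcu-walk mode would simply return
-ECHILD whenever the flag is set.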

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK). See
Documentation/filesystems/vfs.rst for more details.

---

**mandatory**

In ->fallocate() you must check the mode option passed in. If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.

---

**mandatory**

->get_sb() is gone. Switch to use of ->mount(). Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type. If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer. On errors return
ERR_PTR(...).
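
The change of function type for the manual case can be illustrated with a
userspace model of the ERR_PTR() convention. All names here are hypothetical
(toy_err_ptr(), toy_is_err(), toy_ptr_err(), struct toy_dentry, toy_get_sb(),
toy_mount()); the helpers only mimic the idea of encoding a small negative
errno in a pointer value and are not the kernel's implementation.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *toy_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

static inline long toy_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Hypothetical root dentry of the toy filesystem. */
struct toy_dentry {
	const char *name;
};

static struct toy_dentry toy_root = { "/" };

/* Old style: return an int and store the root through an out-parameter. */
static int toy_get_sb(int have_device, struct toy_dentry **mnt_root)
{
	if (!have_device)
		return -ENODEV;
	*mnt_root = &toy_root;
	return 0;
}

/* New style: return the root itself, or an error encoded as a pointer. */
static struct toy_dentry *toy_mount(int have_device)
{
	if (!have_device)
		return toy_err_ptr(-ENODEV);
	return &toy_root;
}
```

The caller of the new-style function tests the result with the IS_ERR-like
helper instead of checking an int and then reading an out-parameter.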

---

**mandatory**

->permission() and generic_permission() have lost flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL
->i_op->get_inode_acl to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA. You can handle this by returning -EINVAL, but it would be nicer to
support it in some way. The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file. So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file. If the offset is i_size or greater return -ENXIO in either case.

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it. Replacement: d_make_root(inode). On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed in inode.
On failure NULL is returned and the passed in inode is dropped so the reference
to inode is consumed in all cases and failure handling need not do any cleanup
for the inode. If d_make_root(inode) is passed a NULL inode it returns NULL
and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...

---

**mandatory**

The witch is dead! Well, 2/3 of it, anyway. ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.
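
Returning to the ->llseek() entry above: the generic SEEK_HOLE/SEEK_DATA
behaviour it describes (whole file is data, one virtual hole at the end) can
be written down as a small userspace function. This is a sketch, not the
kernel's handler; toy_llseek_hole_data() is a hypothetical name, and
TOY_SEEK_DATA and TOY_SEEK_HOLE are stand-in constants used so that the
example does not depend on which libc headers expose the real ones.

```c
#include <errno.h>

/* Stand-ins for SEEK_DATA and SEEK_HOLE. */
#define TOY_SEEK_DATA 3
#define TOY_SEEK_HOLE 4

/*
 * Treat the whole file as data followed by a virtual hole at i_size.
 * Returns the new offset, or a negative errno value.
 */
static long long toy_llseek_hole_data(long long offset, int whence,
				      long long i_size)
{
	if (whence != TOY_SEEK_DATA && whence != TOY_SEEK_HOLE)
		return -EINVAL;		/* other whence values not modelled */

	if (offset < 0 || offset >= i_size)
		return -ENXIO;		/* at or past EOF, in either case */

	/* Inside the file: data starts right here, the hole is at EOF. */
	return whence == TOY_SEEK_DATA ? offset : i_size;
}
```

A filesystem that tracks real holes would replace the last line with a lookup
in its own extent or block map.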

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument. Note that
local filesystems can ignore this argument - they are guaranteed that the
object doesn't exist. It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate_shared().

---

**mandatory**

vfs_follow_link has been removed. Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did). inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now. Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.
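
As a rough userspace sketch of that last point (this is not kernel code):
``struct model_file``, ``MODEL_FMODE_CAN_READ`` and ``MODEL_FMODE_CAN_WRITE``
are hypothetical stand-ins with illustrative values. The only thing it is
meant to show is a capability test on ``f_mode`` taking the place of a NULL
check on the method pointers.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-ins for the kernel's FMODE_CAN_{READ,WRITE} bits.
 * The numeric values are illustrative and are NOT the kernel's.
 */
#define MODEL_FMODE_CAN_READ	0x1u
#define MODEL_FMODE_CAN_WRITE	0x2u

/* Minimal model of struct file: only the field this example needs. */
struct model_file {
	unsigned int f_mode;
};

/* Decide whether a read may be attempted by looking at f_mode only. */
static bool model_can_read(const struct model_file *file)
{
	return (file->f_mode & MODEL_FMODE_CAN_READ) != 0;
}

/* Same idea for writes; no method pointer is inspected anywhere. */
static bool model_can_write(const struct model_file *file)
{
	return (file->f_mode & MODEL_FMODE_CAN_WRITE) != 0;
}
```

A caller in this model would test ``model_can_write(file)`` before going
through a write wrapper, rather than peeking at the file operations table.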

---

**mandatory**

do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone. Use ->read_iter/->write_iter.

---

**recommended**

for embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

calling conventions for ->follow_link() have changed. Instead of returning
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using explicit void ** argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

calling conventions for ->put_link() have changed.
It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when cookie
is non-NULL. Note that link body isn't available anymore, so if you need it,
store it as cookie.

---

**mandatory**

any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache. No highmem pages should end up in the pagecache of such
symlinks. That includes any preseeding that might be done during symlink
creation. page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.
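
A minimal userspace model of that rule, under stated assumptions: the
``model_inode``, ``model_dentry`` and ``model_xattr_get()`` names are
hypothetical and the signature is much simpler than the real handler's. It
only illustrates an instance that works from the inode argument it is given
and never dereferences the dentry's ->d_inode, which may still be NULL.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily reduced models of the kernel structures. */
struct model_inode {
	const char *label;		/* pretend xattr value */
};

struct model_dentry {
	struct model_inode *d_inode;	/* may not be attached yet */
};

/*
 * Models a get() instance: copies the value into buf and returns its
 * length, or -ERANGE if buf is too small. The inode comes from the
 * separate argument; dentry->d_inode is deliberately left alone.
 */
static long model_xattr_get(const struct model_dentry *dentry,
			    const struct model_inode *inode,
			    char *buf, size_t size)
{
	size_t len = strlen(inode->label);

	(void)dentry;
	if (size < len)
		return -ERANGE;
	memcpy(buf, inode->label, len);
	return (long)len;
}
```

In this model the call succeeds even while ``d_inode`` is NULL, which is the
situation the separate inode argument exists for.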

---

**mandatory**

symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction. As the result, you can't
assume that non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink. Checking ->i_mode is really needed now. In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now. inode_lock() et al. work as
they used to - they just take it exclusive. However, ->lookup() may be
called with parent locked shared. Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* in the unlikely case when (read-only) access to filesystem
	  data structures needs exclusion for some reason, arrange it
	  yourself. None of the in-tree filesystems needed that.
	* rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias(). Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**mandatory**

->iterate_shared() is added.
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

If you have any per-inode or per-dentry in-core data structures modified
by ->iterate_shared(), you might need something to serialize the access
to them. If you do dcache pre-seeding, you'll need to switch to
d_alloc_parallel() for that; look for in-tree examples.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The xattr_handler.set() gets passed the user namespace of the mount the inode
is seen from so filesystems can idmap the i_uid and i_gid accordingly.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances. Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore. If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument. Any flags not handled by the
filesystem should result in EINVAL being returned.

---

**recommended**

->readlink is optional for symlinks. Don't set, unless filesystem needs
to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx. Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed. Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED. In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.
Additionally, return
value for 'called finish_no_open(), open it yourself' case has become
0, not 1. Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when dentry needs to be created; that's the majority of old alloc_file()
users. Calling conventions: on success a reference to new struct file
is returned and the caller's reference to inode is subsumed by that. On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range(). See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode. Moreover, passing NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
also doesn't need separate treatment.

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode(). If ->destroy_inode() becomes empty - all the better,
just get rid of it. Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().
IOW, if it's cleaning up the stuff that
might have accumulated over the life of in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve the compatibility.

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might be already gone. The filesystem driver is guaranteed to be still
there, but that's it. Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default. DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules). Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open() notwithstanding,
failure exits in ->atomic_open() instances should *NOT* fput() the file,
no matter what. Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().

---

**mandatory**

zero-length bvec segments are disallowed, they must be filtered out before
being passed on to an iterator.

---

**mandatory**

For bvec based iterators bio_iov_iter_get_pages() now doesn't copy bvecs but
uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
page references stay until I/O has completed, i.e.
until ->ki_complete() has
been called or returned with non -EIOCBQUEUED code.

---

**mandatory**

mnt_want_write_file() can now only be paired with mnt_drop_write_file(),
whereas previously it could be paired with mnt_drop_write() as well.

---

**mandatory**

iov_iter_copy_from_user_atomic() is gone; use copy_page_from_iter_atomic().
The difference is copy_page_from_iter_atomic() advances the iterator and
you don't need iov_iter_advance() after it. However, if you decide to use
only a part of obtained data, you should do iov_iter_revert().

---

**mandatory**

Calling conventions for file_open_root() changed; now it takes struct path *
instead of passing mount and dentry separately. For callers that used to
pass <mnt, mnt->mnt_root> pair (i.e. the root of given mount), a new helper
is provided - file_open_root_mnt(). In-tree users adjusted.

---

**mandatory**

no_llseek is gone; don't set .llseek to that - just leave it NULL instead.
Checks for "does that file have llseek(2), or should it fail with ESPIPE"
should be done by looking at FMODE_LSEEK in file->f_mode.
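
The FMODE_LSEEK check can be pictured with a small userspace model. This is a
sketch only, not the kernel's implementation: ``struct model_file``,
``MODEL_FMODE_LSEEK`` (with an illustrative value) and
``model_llseek_allowed()`` are hypothetical names introduced for the example.

```c
#include <errno.h>

/* Hypothetical stand-in for FMODE_LSEEK; the value is illustrative. */
#define MODEL_FMODE_LSEEK	0x4u

/* Minimal model of struct file: only the field this example needs. */
struct model_file {
	unsigned int f_mode;
};

/*
 * Returns 0 when seeking is supported and -ESPIPE otherwise. The answer
 * comes from f_mode alone; no .llseek pointer is compared to anything.
 */
static int model_llseek_allowed(const struct model_file *file)
{
	return (file->f_mode & MODEL_FMODE_LSEEK) ? 0 : -ESPIPE;
}
```

In this model a file that left its llseek method unset simply never has the
bit in ``f_mode``, so the check fails with -ESPIPE without a dedicated stub.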

---

**mandatory**

filldir_t (readdir callbacks) calling conventions have changed. Instead of
returning 0 or -E... it returns bool now. false means "no more" (as -E... used
to) and true - "keep going" (as 0 in old calling conventions). Rationale:
callers never looked at specific -E... values anyway. ->iterate_shared()
instances require no changes at all, all filldir_t ones in the tree
converted.

---

**mandatory**

Calling conventions for ->tmpfile() have changed. It now takes a struct
file pointer instead of struct dentry pointer. d_tmpfile() is similarly
changed to simplify callers. The passed file is in a non-open state and on
success must be opened before returning (e.g. by calling
finish_open_simple()).

---

**mandatory**

Calling convention for ->huge_fault has changed. It now takes a page
order instead of an enum page_entry_size, and it may be called without the
mmap_lock held. All in-tree users have been audited and do not seem to
depend on the mmap_lock being held, but out of tree users should verify
for themselves.
If they do need it, they can return VM_FAULT_RETRY to
be called with the mmap_lock held.

---

**mandatory**

The order of opening block devices and matching or creating superblocks has
changed.

The old logic opened block devices first and then tried to find a
suitable superblock to reuse based on the block device pointer.

The new logic tries to find a suitable superblock first based on the device
number, and opens the block device afterwards.

Since opening block devices cannot happen under s_umount because of lock
ordering requirements, s_umount is now dropped while opening block devices and
reacquired before calling fill_super().

In the old logic concurrent mounters would find the superblock on the list of
superblocks for the filesystem type. Since the first opener of the block device
would hold s_umount they would wait until the superblock became either born or
was discarded due to initialization failure.

Since the new logic drops s_umount, concurrent mounters could grab s_umount and
would spin. Instead they are now made to wait using an explicit wait-wake
mechanism without having to hold s_umount.

---

**mandatory**

The holder of a block device is now the superblock.

The holder of a block device used to be the file_system_type which wasn't
particularly useful. It wasn't possible to go from block device to owning
superblock without matching on the device pointer stored in the superblock.
This mechanism would only work for a single device so the block layer couldn't
find the owning superblock of any additional devices.

In the old mechanism reusing or creating a superblock for a racing mount(2) and
umount(2) relied on the file_system_type as the holder. This was severely
underdocumented however:

(1) Any concurrent mounter that managed to grab an active reference on an
    existing superblock was made to wait until the superblock either became
    ready or until the superblock was removed from the list of superblocks of
    the filesystem type. If the superblock is ready the caller would simply
    reuse it.

(2) If the mounter came after deactivate_locked_super() but before
    the superblock had been removed from the list of superblocks of the
    filesystem type the mounter would wait until the superblock was shut down,
    reuse the block device and allocate a new superblock.

(3) If the mounter came after deactivate_locked_super() and after
    the superblock had been removed from the list of superblocks of the
    filesystem type the mounter would reuse the block device and allocate a new
    superblock (the bd_holder pointer may still be set to the filesystem type).

Because the holder of the block device was the file_system_type any concurrent
mounter could open the block devices of any superblock of the same
file_system_type without risking seeing EBUSY because the block device was
still in use by another superblock.

Making the superblock the owner of the block device changes this as the holder
is now a unique superblock and thus block devices associated with it cannot be
reused by concurrent mounters. So a concurrent mounter in (2) could suddenly
see EBUSY when trying to open a block device whose holder was a different
superblock.

The new logic thus waits until the superblock and the devices are shut down in
->kill_sb().
Removal of the superblock from the list of superblocks of the
filesystem type is now moved to a later point when the devices are closed:

(1) Any concurrent mounter managing to grab an active reference on an existing
    superblock is made to wait until the superblock is either ready or until
    the superblock and all devices are shut down in ->kill_sb(). If the
    superblock is ready the caller will simply reuse it.

(2) If the mounter comes after deactivate_locked_super() but before
    the superblock has been removed from the list of superblocks of the
    filesystem type the mounter is made to wait until the superblock and the
    devices are shut down in ->kill_sb() and the superblock is removed from the
    list of superblocks of the filesystem type. The mounter will allocate a new
    superblock and grab ownership of the block device (the bd_holder pointer of
    the block device will be set to the newly allocated superblock).

(3) This case is now collapsed into (2) as the superblock is left on the list
    of superblocks of the filesystem type until all devices are shut down in
    ->kill_sb(). In other words, if the superblock isn't on the list of
    superblocks of the filesystem type anymore then it has given up ownership of
    all associated block devices (the bd_holder pointer is NULL).
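
The EBUSY behaviour described above can be illustrated with a small userspace
model. This is only a sketch: ``struct toy_bdev``, ``toy_claim()``,
``toy_release()`` and ``toy_racing_claim()`` are hypothetical names invented
for this illustration and are not kernel interfaces. The model assumes nothing
more than "a claim fails with -EBUSY when the device already has a different
holder".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy stand-in for a block device; NOT the kernel's struct block_device. */
struct toy_bdev {
	const void *holder;	/* models the bd_holder pointer */
	int holders;		/* number of claims made by that holder */
};

/* A claim succeeds if the device is unclaimed or already held by @holder. */
static int toy_claim(struct toy_bdev *bdev, const void *holder)
{
	if (bdev->holder && bdev->holder != holder)
		return -EBUSY;
	bdev->holder = holder;
	bdev->holders++;
	return 0;
}

/* Drop one claim; the holder pointer is cleared with the last claim. */
static void toy_release(struct toy_bdev *bdev)
{
	if (--bdev->holders == 0)
		bdev->holder = NULL;
}

/* Returns what a second mounter sees when it claims a device still in use. */
static int toy_racing_claim(const void *first_holder, const void *second_holder)
{
	struct toy_bdev bdev = { NULL, 0 };
	int err;

	err = toy_claim(&bdev, first_holder);
	assert(err == 0);	/* an unclaimed device can always be claimed */
	return toy_claim(&bdev, second_holder);
}
```

When both mounters pass the same token (modelling the file_system_type as
holder) ``toy_racing_claim()`` returns 0. When they pass two different tokens
(modelling two distinct superblocks) it returns -EBUSY, and the second claim
only succeeds after ``toy_release()`` has dropped the first one. That is why
the new logic has to wait for ->kill_sb() to finish.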

As this is a VFS level change it has no practical consequences for filesystems
other than that all of them must use one of the provided kill_litter_super(),
kill_anon_super(), or kill_block_super() helpers.

---

**mandatory**

Lock ordering has been changed so that s_umount ranks above open_mutex again.
All places where s_umount was taken under open_mutex have been fixed up.

---

**mandatory**

export_operations ->encode_fh() no longer has a default implementation to
encode FILEID_INO32_GEN* file handles.
Filesystems that used the default implementation may use the generic helper
generic_encode_ino32_fh() explicitly.

---

**recommended**

Block device freezing and thawing have been moved to holder operations.

Before this change, get_active_super() would only be able to find the
superblock of the main block device, i.e., the one stored in sb->s_bdev. Block
device freezing now works for any block device owned by a given superblock, not
just the main block device. The get_active_super() helper and bd_fsfreeze_sb
pointer are gone.
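
The difference between the two lookup strategies can be sketched in userspace
C. Everything below is a toy model under stated assumptions: ``struct
toy_bdev``, ``struct toy_sb`` and the ``toy_find_sb_*()`` functions are
hypothetical names made up for this illustration, and the model says nothing
about how the kernel actually implements freezing. It only shows why matching
on the main device pointer misses additional devices while following a holder
pointer does not.

```c
#include <assert.h>
#include <stddef.h>

struct toy_sb;

/* Toy stand-in for a block device; NOT the kernel's struct block_device. */
struct toy_bdev {
	struct toy_sb *holder;	/* models bd_holder pointing at the superblock */
};

/* Toy stand-in for a superblock; NOT the kernel's struct super_block. */
struct toy_sb {
	struct toy_bdev *s_bdev;	/* main device, models sb->s_bdev */
};

/* Old-style lookup: scan all superblocks and match on the main device. */
static struct toy_sb *toy_find_sb_by_main_dev(struct toy_sb **sbs, size_t n,
					      const struct toy_bdev *bdev)
{
	for (size_t i = 0; i < n; i++)
		if (sbs[i]->s_bdev == bdev)
			return sbs[i];
	return NULL;
}

/* New-style lookup: follow the holder pointer stored in the device itself. */
static struct toy_sb *toy_find_sb_by_holder(const struct toy_bdev *bdev)
{
	return bdev->holder;
}

/* One superblock owning a main device and an additional (e.g. log) device. */
static void toy_lookup_demo(void)
{
	struct toy_bdev main_dev = { NULL }, log_dev = { NULL };
	struct toy_sb sb = { &main_dev };
	struct toy_sb *sbs[] = { &sb };

	main_dev.holder = &sb;
	log_dev.holder = &sb;

	/* Matching on s_bdev finds the main device but not the log device. */
	assert(toy_find_sb_by_main_dev(sbs, 1, &main_dev) == &sb);
	assert(toy_find_sb_by_main_dev(sbs, 1, &log_dev) == NULL);

	/* Following the holder pointer works for both devices. */
	assert(toy_find_sb_by_holder(&main_dev) == &sb);
	assert(toy_find_sb_by_holder(&log_dev) == &sb);
}
```

In this model a freeze request arriving on the additional device can reach its
superblock only through the holder pointer, which mirrors the reason given
above for moving freezing and thawing to holder operations.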