====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i;
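
The embedding trick behind FOO_I() can be tried out in userspace.  What
follows is only a model under stated assumptions: struct inode is reduced to
a hypothetical one-field stand-in, foo_private is an invented field, and
sketch_container_of() is a local re-implementation built on offsetof() rather
than the macro from the kernel headers.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct inode. */
struct inode {
	unsigned long i_ino;
};

/* Local re-implementation of the container_of() idea (not the kernel macro). */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_private;		/* fs-private stuff (invented field) */
	struct inode vfs_inode;		/* embedded VFS inode */
};

/* Map a VFS inode back to the foo_inode_info that contains it. */
static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return sketch_container_of(inode, struct foo_inode_info, vfs_inode);
}
```

Because vfs_inode is embedded rather than pointed to, one allocation covers
both the VFS inode and the fs-private data, and FOO_I() is plain pointer
arithmetic.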

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that now you need explicit initialization of private data
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

**mandatory**

The foo_inode_info should always be allocated through alloc_inode_sb() rather
than kmem_cache_alloc() or the kmalloc() family, so that the inode reclaim
context is set up correctly.

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and a negative number in case of error (-EINVAL unless you have a more
informative error value to report).  Call it foo_fill_super().  Now declare::

  int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
  {
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
  }

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking.  Otherwise exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

Now we have the exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).  If you used to need that exclusion and do
it by internal locking (most filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now.  Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If your
method or its parts do not need BKL - better yet, now you can shift
lock_kernel() and unlock_kernel() so that they would protect exactly what
needs to be protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions.  If you don't need it, remove it.

---

**informational**

The check for ->link() target not being a directory is done by callers.  Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to.  Some of your
problems might be over...

---

**mandatory**

New file_system_type method - kill_sb(superblock).  If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/).  Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now.  Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS.  The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide file-system specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs and fat can be used as examples of very
different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.  The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---

**recommended**

->getattr() is finally getting used.  See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone.  If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore.  Read access is safe
if at least one of the following is true:

	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held

Audit your code and add locking if needed.  Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups.  Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone.  If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held.  BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it.  If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

**mandatory**

Block truncation on error exit from ->write_begin and ->direct_IO has been
moved from the generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at
ext2_write_failed and callers for an example.

**mandatory**

->truncate is gone.  The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes.  Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation, which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally.

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead.  It gets called whenever the inode is evicted, whether it has
remaining links or not.  Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped.  As before, generic_drop_inode() is still the default and it's been
updated appropriately.  generic_delete_inode() is also alive and it consists
simply of return 1.  Note that all actual eviction work is done by caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes to
0. Even on 0 refcount transition, it must be able to tolerate being called 0,
1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback.  It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which used to require dropping out of rcu-walk
mode. It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD
should be returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
Documentation/filesystems/vfs.rst for more details.
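
One way to picture an rcu-walk aware ->permission() is as an early bail-out:
if the check would have to sleep and MAY_NOT_BLOCK is set, give up with
-ECHILD so the lookup can be retried outside rcu-walk mode.  The userspace
model below is a sketch, not kernel code: sketch_permission_precheck() and
its attrs_cached parameter are hypothetical, and SKETCH_MAY_NOT_BLOCK is
assumed to mirror the value of MAY_NOT_BLOCK in include/linux/fs.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Assumed to mirror MAY_NOT_BLOCK from include/linux/fs.h. */
#define SKETCH_MAY_NOT_BLOCK 0x80

/*
 * Hypothetical model of the decision at the top of a ->permission()
 * instance.  attrs_cached stands for "everything needed for the check is
 * already in memory"; when it is false, a real method would have to sleep
 * (e.g. to talk to a server) in order to refresh it.
 */
static int sketch_permission_precheck(int mask, bool attrs_cached)
{
	if ((mask & SKETCH_MAY_NOT_BLOCK) && !attrs_cached)
		return -ECHILD;	/* cannot block here; ask for a retry */
	return 0;		/* safe to carry on with the actual check */
}
```

A method that never blocks can ignore MAY_NOT_BLOCK altogether; the bail-out
only matters on paths that might sleep.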

---

**mandatory**

In ->fallocate() you must check the mode option passed in.  If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.
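
The mode check can be modelled as a small pure function.  This is a userspace
sketch for a hypothetical filesystem that supports plain preallocation but
not hole punching; sketch_fallocate_check_mode() is an invented name, and the
SKETCH_ constants are assumed to mirror FALLOC_FL_KEEP_SIZE and
FALLOC_FL_PUNCH_HOLE from include/uapi/linux/falloc.h.

```c
#include <errno.h>

/* Assumed to mirror the values in include/uapi/linux/falloc.h. */
#define SKETCH_FALLOC_FL_KEEP_SIZE	0x01
#define SKETCH_FALLOC_FL_PUNCH_HOLE	0x02

/*
 * Hypothetical mode check for a filesystem that supports preallocation
 * (with or without KEEP_SIZE) but not hole punching.
 */
static int sketch_fallocate_check_mode(int mode)
{
	/* Reject hole punching and any other flag we do not implement. */
	if (mode & ~SKETCH_FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;
	return 0;
}
```

Masking out the supported flags and rejecting whatever remains also covers
flags added later, which is why the sketch does not test for
SKETCH_FALLOC_FL_PUNCH_HOLE by name.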

---

**mandatory**

->get_sb() is gone.  Switch to use of ->mount().  Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type.  If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer.  On errors return
ERR_PTR(...).
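
For readers who have not met the convention, returning ERR_PTR(...) means
encoding a negative errno in the pointer value itself.  The userspace model
below sketches the idea; the sketch_ helpers are local stand-ins written for
this illustration (the kernel's own versions live in include/linux/err.h),
and sketch_mount() with its fail parameter is purely hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Largest errno that can be encoded; assumed to match the kernel's 4095. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
static inline void *sketch_err_ptr(long error)
{
	return (void *)(intptr_t)error;
}

/* True if the pointer falls in the top SKETCH_MAX_ERRNO addresses. */
static inline int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
static inline long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Hypothetical stand-in for the root dentry a ->mount() would return. */
static int sketch_root;

/* Either return the "root" or an encoded error, never NULL. */
static void *sketch_mount(int fail)
{
	if (fail)
		return sketch_err_ptr(-EINVAL);
	return &sketch_root;
}
```

The caller then needs a single return value: it tests it with the IS_ERR()
equivalent and extracts the errno only on the failure path.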

---

**mandatory**

->permission() and generic_permission() have lost flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL
->i_op->get_inode_acl to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to
support it in some way.  The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file.  So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file.  If the offset is i_size or greater return -ENXIO in either case.
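
The rule used by the generic handler is small enough to write down as a pure
function.  The following is a userspace sketch of that rule only, not the
kernel implementation: sketch_seek_hole_data() is an invented name, the
SKETCH_ constants are assumed to mirror the Linux values of SEEK_DATA and
SEEK_HOLE, and treating a negative offset like one past the end of the file
is an assumption of this sketch.

```c
#include <errno.h>

/* Assumed to mirror the Linux values of SEEK_DATA and SEEK_HOLE. */
#define SKETCH_SEEK_DATA 3
#define SKETCH_SEEK_HOLE 4

/*
 * Model of the "whole file is data, virtual hole at EOF" rule.
 * Returns the new offset, or a negative errno.
 */
static long long sketch_seek_hole_data(long long offset, int whence,
				       long long i_size)
{
	/* Negative offsets are handled like offsets past EOF (assumption). */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;

	switch (whence) {
	case SKETCH_SEEK_DATA:
		return offset;	/* already inside data */
	case SKETCH_SEEK_HOLE:
		return i_size;	/* the only hole starts at EOF */
	default:
		return -EINVAL;	/* other whence values are out of scope here */
	}
}
```

A filesystem that tracks real holes would replace the two cases with lookups
in its extent map while keeping the -ENXIO behaviour at and beyond i_size.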
47925b532ceSMauro Carvalho Chehab
48025b532ceSMauro Carvalho Chehab**mandatory**
48125b532ceSMauro Carvalho Chehab
48225b532ceSMauro Carvalho ChehabIf you have your own ->fsync() you must make sure to call
48325b532ceSMauro Carvalho Chehabfilemap_write_and_wait_range() so that all dirty pages are synced out properly.
48425b532ceSMauro Carvalho ChehabYou must also keep in mind that ->fsync() is not called with i_mutex held
48525b532ceSMauro Carvalho Chehabanymore, so if you require i_mutex locking you must make sure to take it and
48625b532ceSMauro Carvalho Chehabrelease it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed in inode.
On failure NULL is returned and the passed in inode is dropped so the reference
to inode is consumed in all cases and failure handling need not do any cleanup
for the inode.  If d_make_root(inode) is passed a NULL inode it returns NULL
and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...

---

**mandatory**

The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument.  Note that
local filesystems can ignore this argument - they are guaranteed that the
object doesn't exist.  It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead

---

**mandatory**

->readdir() is gone now; switch to ->iterate_shared()

---

**mandatory**

vfs_follow_link has been removed.  Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did).  inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now.  Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.

---

**recommended**

for embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

calling conventions for ->follow_link() have changed.  Instead of returning
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using explicit void ** argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

calling conventions for ->put_link() have changed.  It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when cookie
is non-NULL.  Note that link body isn't available anymore, so if you need it,
store it as cookie.

---

**mandatory**

any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache.  No highmem pages should end up in the pagecache of such
symlinks.  That includes any preseeding that might be done during symlink
creation.  page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().
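
The pattern of handing the destructor to set_delayed_call() can be modelled
in userspace roughly as follows.  This is a sketch, not kernel code: struct
delayed_call, set_delayed_call() and do_delayed_call() are reimplemented here
as simplified stand-ins for the kernel helpers of the same names, while
model_get_link(), model_free_body() and model_bodies_freed are hypothetical
names invented for illustration.  A real ->get_link() takes dentry and inode
arguments and reports failure with ERR_PTR() rather than NULL.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace stand-in for the kernel's struct delayed_call. */
struct delayed_call {
	void (*fn)(void *);
	void *arg;
};

/* Record the destructor and its argument for later. */
static void set_delayed_call(struct delayed_call *call,
			     void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

/* Run the destructor, if one was registered. */
static void do_delayed_call(struct delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

/* Counts destructor runs; exists only to make the sketch observable. */
static int model_bodies_freed;

/* Plays the role the old ->put_link() used to play. */
static void model_free_body(void *arg)
{
	free(arg);
	model_bodies_freed++;
}

/*
 * Hypothetical ->get_link() analogue: returns the link body and registers
 * the cleanup where the old code would have stored a cookie.
 */
static const char *model_get_link(const char *target,
				  struct delayed_call *done)
{
	size_t len = strlen(target) + 1;
	char *body = malloc(len);

	if (!body)
		return NULL;	/* the kernel would return ERR_PTR(-ENOMEM) */
	memcpy(body, target, len);
	set_delayed_call(done, model_free_body, body);
	return body;
}
```

The caller traverses the returned body and then invokes do_delayed_call()
once it is finished with it, which is the point where ->put_link() used to
run.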

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.

---

**mandatory**

symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction.  As a result, you can't
assume that non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink.  Checking ->i_mode is really needed now.  In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as
they used to - they just take it exclusive.  However, ->lookup() may be
called with parent locked shared.  Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias().  None of the in-tree
	  instances relied upon that.

In the unlikely case when (read-only) access to filesystem data structures
needs exclusion for some reason, arrange it yourself.  Again, none of the
in-tree filesystems needed that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**mandatory**

->iterate_shared() is added.
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

If you have any per-inode or per-dentry in-core data structures modified
by ->iterate_shared(), you might need something to serialize the access
to them.  If you do dcache pre-seeding, you'll need to switch to
d_alloc_parallel() for that; look for in-tree examples.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The xattr_handler.set() gets passed the user namespace of the mount the inode
is seen from so filesystems can idmap the i_uid and i_gid accordingly.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore.  If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument.  Any flags not handled by the
filesystem should result in -EINVAL being returned.
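
A flag check of this kind might look like the following userspace sketch.
The names are assumptions made for illustration: foofs_check_rename_flags()
and FOOFS_RENAME_SUPPORTED are hypothetical, and the MODEL_RENAME_* constants
merely mirror the values of the RENAME_* flags from the Linux UAPI headers so
that the sketch builds without them.

```c
#include <errno.h>

/* Assumed to mirror RENAME_NOREPLACE/RENAME_EXCHANGE/RENAME_WHITEOUT. */
#define MODEL_RENAME_NOREPLACE	(1u << 0)
#define MODEL_RENAME_EXCHANGE	(1u << 1)
#define MODEL_RENAME_WHITEOUT	(1u << 2)

/* Hypothetical filesystem that only implements the "no replace" flag. */
#define FOOFS_RENAME_SUPPORTED	MODEL_RENAME_NOREPLACE

/* Returns 0 if every requested flag is supported, -EINVAL otherwise. */
static int foofs_check_rename_flags(unsigned int flags)
{
	if (flags & ~FOOFS_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Masking against the set of supported flags, rather than testing for specific
unsupported ones, means that flags introduced later are rejected as well.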

---

**recommended**

->readlink is optional for symlinks.  Don't set, unless filesystem needs
to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx.  Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed.  Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED.  In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.  Additionally, return
value for 'called finish_no_open(), open it yourself' case has become
0, not 1.  Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when dentry needs to be created; that's the majority of old alloc_file()
users.  Calling conventions: on success a reference to new struct file
is returned and caller's reference to inode is subsumed by that.  On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range have been replaced with
->remap_file_range().  See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode.  Moreover, passing NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode().  If ->destroy_inode() becomes empty - all the better,
just get rid of it.  Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
might have accumulated over the life of in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve the compatibility.
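
The three rules above can be modelled in userspace roughly as follows.  This
is a sketch under loud assumptions, not the VFS implementation: every
identifier here (model_inode, model_super_operations, model_destroy_inode(),
model_run_grace_period(), the foofs_* callbacks and the counters) is
hypothetical, and the RCU grace period is replaced by a stored function
pointer that the caller runs explicitly.

```c
#include <stddef.h>

struct model_inode;

/* Only the two methods discussed above are modelled. */
struct model_super_operations {
	void (*destroy_inode)(struct model_inode *);
	void (*free_inode)(struct model_inode *);
};

struct model_inode {
	const struct model_super_operations *ops;
	/* Stands in for the callback that call_rcu() would run later. */
	void (*deferred)(struct model_inode *);
};

/* Counters exist only to make the dispatch observable. */
static int model_nonrcu_frees;
static int foofs_destroy_calls;
static int foofs_free_calls;

/* Stand-in for the default free routine used when both methods are NULL. */
static void model_free_inode_nonrcu(struct model_inode *inode)
{
	(void)inode;
	model_nonrcu_frees++;
}

/* Example filesystem callbacks. */
static void foofs_destroy(struct model_inode *inode)
{
	(void)inode;
	foofs_destroy_calls++;
}

static void foofs_free(struct model_inode *inode)
{
	(void)inode;
	foofs_free_calls++;
}

static void model_destroy_inode(struct model_inode *inode)
{
	const struct model_super_operations *ops = inode->ops;

	inode->deferred = NULL;
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);	/* synchronous part */
		if (!ops->free_inode)
			return;			/* nothing left to defer */
	}
	/* Defer ->free_inode(), or the default when both methods are NULL. */
	inode->deferred = ops->free_inode ? ops->free_inode
					  : model_free_inode_nonrcu;
}

/* Simulates the end of the grace period. */
static void model_run_grace_period(struct model_inode *inode)
{
	if (inode->deferred)
		inode->deferred(inode);
}
```

With both methods set, the synchronous callback runs immediately and the
deferred one only after the simulated grace period; with only
->destroy_inode() set, nothing is deferred at all.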

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might be already gone.  The filesystem driver is guaranteed to be still
there, but that's it.  Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default.  DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules).  Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open() notwithstanding,
failure exits in ->atomic_open() instances should *NOT* fput() the file,
no matter what.  Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().

---

**mandatory**

zero-length bvec segments are disallowed; they must be filtered out before
being passed on to an iterator.
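
One way to do such filtering is to compact the segment array in place before
building the iterator.  The sketch below is a userspace illustration only:
struct model_seg is a hypothetical stand-in for struct bio_vec (which
describes a page, offset and length rather than a plain pointer), and
model_drop_empty_segs() is an invented helper, not a kernel function.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec; only the length matters here. */
struct model_seg {
	const void *base;
	size_t len;
};

/*
 * Compacts segs[] in place, dropping zero-length entries while keeping the
 * order of the rest.  Returns the number of segments kept.
 */
static size_t model_drop_empty_segs(struct model_seg *segs, size_t nr)
{
	size_t kept = 0;
	size_t i;

	for (i = 0; i < nr; i++) {
		if (segs[i].len == 0)
			continue;	/* skip the empty segment */
		segs[kept++] = segs[i];
	}
	return kept;
}
```

Only the first "kept" entries of the array are meaningful afterwards, and
only those should be handed to whatever sets up the iterator.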

---

**mandatory**

For bvec based iterators bio_iov_iter_get_pages() now doesn't copy bvecs but
uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
page references stay until I/O has completed, i.e. until ->ki_complete() has
been called or returned with non -EIOCBQUEUED code.

---

**mandatory**

mnt_want_write_file() can now only be paired with mnt_drop_write_file(),
whereas previously it could be paired with mnt_drop_write() as well.

---

**mandatory**

iov_iter_copy_from_user_atomic() is gone; use copy_page_from_iter_atomic().
The difference is copy_page_from_iter_atomic() advances the iterator and
you don't need iov_iter_advance() after it.  However, if you decide to use
only a part of obtained data, you should do iov_iter_revert().

---

**mandatory**

Calling conventions for file_open_root() changed; now it takes struct path *
instead of passing mount and dentry separately.  For callers that used to
pass <mnt, mnt->mnt_root> pair (i.e. the root of given mount), a new helper
is provided - file_open_root_mnt().  In-tree users adjusted.

---

**mandatory**

no_llseek is gone; don't set .llseek to that - just leave it NULL instead.
Checks for "does that file have llseek(2), or should it fail with ESPIPE"
should be done by looking at FMODE_LSEEK in file->f_mode.

---

**mandatory**

filldir_t (readdir callbacks) calling conventions have changed.  Instead of
returning 0 or -E... it returns bool now.  false means "no more" (as -E... used
to) and true - "keep going" (as 0 in old calling conventions).  Rationale:
callers never looked at specific -E... values anyway.  ->iterate_shared()
instances require no changes at all, all filldir_t ones in the tree
converted.
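
The bool convention can be illustrated with a small userspace model.  All
names below are hypothetical and exist only for this sketch: the real
filldir_t takes a struct dir_context and several more arguments, while
model_filldir_t keeps just enough to show how the return value is used.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actor type modelled on the new filldir_t convention. */
typedef bool (*model_filldir_t)(void *ctx, const char *name);

struct model_collect {
	size_t room;	/* entries the consumer can still accept */
	size_t taken;	/* entries accepted so far */
};

/* true means "keep going", false means "no more". */
static bool model_collect_actor(void *ctx, const char *name)
{
	struct model_collect *c = ctx;

	(void)name;
	if (c->room == 0)
		return false;
	c->room--;
	c->taken++;
	return true;
}

/*
 * Feeds names to the actor until it returns false.  Returns the number of
 * entries the actor accepted.
 */
static size_t model_emit_all(const char *const *names, size_t nr,
			     model_filldir_t actor, void *ctx)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (!actor(ctx, names[i]))
			break;
	return i;
}
```

The emitting loop only ever tests the result for truth, which matches the
rationale given above: no caller cared which specific error value an actor
used to return.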

---

**mandatory**

Calling conventions for ->tmpfile() have changed.  It now takes a struct
file pointer instead of struct dentry pointer.  d_tmpfile() is similarly
changed to simplify callers.  The passed file is in a non-open state and on
success must be opened before returning (e.g. by calling
finish_open_simple()).

---

**mandatory**

Calling convention for ->huge_fault has changed.  It now takes a page
order instead of an enum page_entry_size, and it may be called without the
mmap_lock held.  All in-tree users have been audited and do not seem to
depend on the mmap_lock being held, but out of tree users should verify
for themselves.  If they do need it, they can return VM_FAULT_RETRY to
be called with the mmap_lock held.

---

**mandatory**

The order of opening block devices and matching or creating superblocks has
changed.

The old logic opened block devices first and then tried to find a
suitable superblock to reuse based on the block device pointer.

The new logic tries to find a suitable superblock first based on the device
number, and opens the block device afterwards.

Since opening block devices cannot happen under s_umount because of lock
ordering requirements, s_umount is now dropped while opening block devices and
reacquired before calling fill_super().

In the old logic concurrent mounters would find the superblock on the list of
superblocks for the filesystem type. Since the first opener of the block device
would hold s_umount they would wait until the superblock became either born or
was discarded due to initialization failure.

Since the new logic drops s_umount, concurrent mounters could grab s_umount and
would spin. Instead they are now made to wait using an explicit wait-wake
mechanism without having to hold s_umount.
978060e6c7dSChristian Brauner
---

**mandatory**

The holder of a block device is now the superblock.

The holder of a block device used to be the file_system_type, which wasn't
particularly useful. It wasn't possible to go from block device to owning
superblock without matching on the device pointer stored in the superblock.
This mechanism would only work for a single device, so the block layer couldn't
find the owning superblock of any additional devices.

In the old mechanism, reusing or creating a superblock for a racing mount(2)
and umount(2) relied on the file_system_type as the holder. This was severely
underdocumented, however:

(1) Any concurrent mounter that managed to grab an active reference on an
    existing superblock was made to wait until the superblock either became
    ready or until the superblock was removed from the list of superblocks of
    the filesystem type. If the superblock was ready the caller would simply
    reuse it.

(2) If the mounter came after deactivate_locked_super() but before
    the superblock had been removed from the list of superblocks of the
    filesystem type the mounter would wait until the superblock was shut down,
    reuse the block device and allocate a new superblock.

(3) If the mounter came after deactivate_locked_super() and after
    the superblock had been removed from the list of superblocks of the
    filesystem type the mounter would reuse the block device and allocate a new
    superblock (the bd_holder pointer may still be set to the filesystem type).

Because the holder of the block device was the file_system_type, any
concurrent mounter could open the block devices of any superblock of the same
file_system_type without risking seeing EBUSY because the block device was
still in use by another superblock.

Making the superblock the owner of the block device changes this, as the
holder is now a unique superblock and thus block devices associated with it
cannot be reused by concurrent mounters. So a concurrent mounter in (2) could
suddenly see EBUSY when trying to open a block device whose holder was a
different superblock.

The new logic thus waits until the superblock and the devices are shut down in
->kill_sb(). Removal of the superblock from the list of superblocks of the
filesystem type is now moved to a later point when the devices are closed:

(1) Any concurrent mounter managing to grab an active reference on an existing
    superblock is made to wait until the superblock is either ready or until
    the superblock and all devices are shut down in ->kill_sb(). If the
    superblock is ready the caller will simply reuse it.

(2) If the mounter comes after deactivate_locked_super() but before
    the superblock has been removed from the list of superblocks of the
    filesystem type the mounter is made to wait until the superblock and the
    devices are shut down in ->kill_sb() and the superblock is removed from the
    list of superblocks of the filesystem type. The mounter will allocate a new
    superblock and grab ownership of the block device (the bd_holder pointer of
    the block device will be set to the newly allocated superblock).

(3) This case is now collapsed into (2) as the superblock is left on the list
    of superblocks of the filesystem type until all devices are shut down in
    ->kill_sb(). In other words, if the superblock isn't on the list of
    superblocks of the filesystem type anymore then it has given up ownership
    of all associated block devices (the bd_holder pointer is NULL).

As this is a VFS level change it has no practical consequences for filesystems
other than that all of them must use one of the provided kill_litter_super(),
kill_anon_super(), or kill_block_super() helpers.

---

**mandatory**

Lock ordering has been changed so that s_umount ranks above open_mutex again.
All places where s_umount was taken under open_mutex have been fixed up.

---

**mandatory**

export_operations ->encode_fh() no longer has a default implementation to
encode FILEID_INO32_GEN* file handles.
Filesystems that used the default implementation may use the generic helper
generic_encode_ino32_fh() explicitly.

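In-tree, the fix is normally a one-liner: set ->encode_fh in your
export_operations to generic_encode_ino32_fh. The userspace sketch below only
models that wiring. Every ``model_*`` name is hypothetical, parent handling is
left out, and the assumption that a caller rejects an unset method with
-EOPNOTSUPP is ours rather than something this note states.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror FILEID_INO32_GEN and FILEID_INVALID from enum fid_type. */
#define MODEL_FILEID_INO32_GEN	1
#define MODEL_FILEID_INVALID	0xff

/* Hypothetical cut-down inode: just what a 32-bit ino/generation handle needs. */
struct model_inode {
	uint32_t ino;
	uint32_t generation;
};

struct model_export_ops {
	int (*encode_fh)(const struct model_inode *inode, uint32_t *fh,
			 int *max_len);
};

/* Stand-in for generic_encode_ino32_fh(), without the parent variant. */
static int model_encode_ino32_fh(const struct model_inode *inode, uint32_t *fh,
				 int *max_len)
{
	if (*max_len < 2) {
		*max_len = 2;	/* report how many 32-bit words are needed */
		return MODEL_FILEID_INVALID;
	}
	fh[0] = inode->ino;
	fh[1] = inode->generation;
	*max_len = 2;
	return MODEL_FILEID_INO32_GEN;
}

/* Caller side: no fallback any more (assumed to fail with -EOPNOTSUPP). */
static int model_encode_fh(const struct model_export_ops *ops,
			   const struct model_inode *inode, uint32_t *fh,
			   int *max_len)
{
	if (!ops->encode_fh)
		return -EOPNOTSUPP;
	return ops->encode_fh(inode, fh, max_len);
}

/* A filesystem that relied on the old default now names the helper itself. */
static const struct model_export_ops model_foo_export_ops = {
	.encode_fh = model_encode_ino32_fh,
};
```
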
---

**mandatory**

If ->rename() update of .. on cross-directory move needs an exclusion with
directory modifications, do *not* lock the subdirectory in question in your
->rename() - it's done by the caller now [that item should've been added in
28eceeda130f "fs: Lock moved directories"].

---

**mandatory**

On same-directory ->rename() the (tautological) update of .. is not protected
by any locks; just don't do it if the old parent is the same as the new one.
We really can't lock two subdirectories in same-directory rename - not without
deadlocks.

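The rule in the two ->rename() items above boils down to a single pointer
comparison before touching "..". A minimal userspace sketch, with hypothetical
``model_*`` types standing in for the real inode and for whatever your
filesystem does to rewrite "..":

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the few inode bits this sketch needs. */
struct model_inode {
	unsigned long ino;
	unsigned long dotdot;	/* inode number that ".." resolves to */
	bool is_dir;
};

/*
 * Called from a model ->rename().  Returns true if ".." was rewritten.
 * No locking of the victim happens here: in the cross-directory case the
 * caller has already locked it, in the same-directory case nobody has.
 */
static bool model_rename_fix_dotdot(struct model_inode *old_dir,
				    struct model_inode *new_dir,
				    struct model_inode *victim)
{
	if (!victim->is_dir)
		return false;		/* only directories carry ".." */
	if (old_dir == new_dir)
		return false;		/* tautological and unprotected: skip */
	victim->dotdot = new_dir->ino;	/* cross-directory move */
	return true;
}
```
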
---

**mandatory**

lock_rename() and lock_rename_child() may fail in cross-directory case, if
their arguments do not have a common ancestor.  In that case ERR_PTR(-EXDEV)
is returned, with no locks taken.  In-tree users updated; out-of-tree ones
would need to do so.

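For out-of-tree callers the change amounts to checking the returned pointer
with IS_ERR() and bailing out *without* unlocking anything. The sketch below
models that in userspace; the ``model_*`` helpers are hypothetical
re-implementations of the ERR_PTR() idiom, and ``tree_id`` is an invented way
of saying "has no common ancestor".

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static void *model_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static long model_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

static int model_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Hypothetical directory: two directories share an ancestor iff ids match. */
struct model_dir {
	int tree_id;
};

/* Model of lock_rename(): on -EXDEV it returns with no locks taken. */
static struct model_dir *model_lock_rename(struct model_dir *a,
					   struct model_dir *b,
					   int *locks_held)
{
	if (a == b) {
		*locks_held = 1;	/* same directory: one lock */
		return NULL;
	}
	if (a->tree_id != b->tree_id)
		return model_err_ptr(-EXDEV);	/* *locks_held untouched */
	*locks_held = 2;
	return NULL;			/* no trap dentry in this model */
}

static int model_do_rename(struct model_dir *a, struct model_dir *b)
{
	int locks_held = 0;
	struct model_dir *trap = model_lock_rename(a, b, &locks_held);

	if (model_is_err(trap))
		return (int)model_ptr_err(trap);	/* nothing to unlock */

	/* ... the rename itself, then the matching unlock, would go here ... */
	return 0;
}
```
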
---

**mandatory**

The list of children anchored in parent dentry got turned into hlist now.
Field names got changed (->d_children/->d_sib instead of ->d_subdirs/->d_child
for anchor/entries resp.), so any affected places will be immediately caught
by the compiler.

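In kernel code the conversion usually means replacing a list_for_each_entry()
walk over ->d_subdirs with hlist_for_each_entry(child, &parent->d_children,
d_sib). The userspace sketch below shows the same anchor/entry relationship
with simplified copies of the hlist types; ``model_dentry`` and the two helpers
are hypothetical and carry only the fields needed here.

```c
#include <stddef.h>
#include <string.h>

/* Simplified userspace copies of the kernel's hlist types. */
struct hlist_node {
	struct hlist_node *next, **pprev;
};

struct hlist_head {
	struct hlist_node *first;
};

/* Cut-down dentry: only what this sketch needs. */
struct model_dentry {
	const char *name;
	struct hlist_head d_children;	/* anchor, formerly ->d_subdirs */
	struct hlist_node d_sib;	/* entry, formerly ->d_child */
};

/* Insert child at the head of parent's list of children. */
static void model_add_child(struct model_dentry *parent,
			    struct model_dentry *child)
{
	struct hlist_node *n = &child->d_sib;
	struct hlist_head *h = &parent->d_children;

	n->next = h->first;
	if (h->first)
		h->first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Walk the children the way hlist_for_each_entry() would. */
static struct model_dentry *model_find_child(struct model_dentry *parent,
					     const char *name)
{
	struct hlist_node *pos;

	for (pos = parent->d_children.first; pos; pos = pos->next) {
		/* Open-coded container_of(pos, struct model_dentry, d_sib). */
		struct model_dentry *child = (struct model_dentry *)
			((char *)pos - offsetof(struct model_dentry, d_sib));

		if (strcmp(child->name, name) == 0)
			return child;
	}
	return NULL;
}
```
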
---

**mandatory**

->d_delete() instances are now called for dentries with ->d_lock held
and refcount equal to 0.  They are not permitted to drop/regain ->d_lock.
None of the in-tree instances did anything of that sort.  Make sure yours
do not...

---

**mandatory**

->d_prune() instances are now called without ->d_lock held on the parent.
->d_lock on dentry itself is still held; if you need per-parent exclusions (none
of the in-tree instances did), use your own spinlock.

->d_iput() and ->d_release() are called with victim dentry still in the
list of parent's children.  It is still unhashed, marked killed, etc., just not
removed from parent's ->d_children yet.

Anyone iterating through the list of children needs to be aware of the
half-killed dentries that might be seen there; taking ->d_lock on those will
see them negative, unhashed and with negative refcount, which means that most
of the in-kernel users would've done the right thing anyway without any adjustment.

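A walker that only acts on positive, hashed children with a sane refcount
skips half-killed dentries for free. The userspace sketch below models such a
filter; the struct layout and every ``model_*`` name are hypothetical, and a
real walker would perform the check under the child's ->d_lock.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cut-down dentry for illustration only. */
struct model_dentry {
	int refcount;			/* negative once the dentry is being killed */
	const void *inode;		/* NULL for a negative dentry */
	bool hashed;
	struct model_dentry *next_sib;	/* stand-in for the ->d_sib linkage */
};

/* True only for children a typical walker would want to touch. */
static bool model_usable(const struct model_dentry *d)
{
	return d->refcount >= 0 && d->hashed && d->inode != NULL;
}

/* Count usable children, silently stepping over half-killed ones. */
static int model_count_usable(const struct model_dentry *first_child)
{
	const struct model_dentry *d;
	int n = 0;

	for (d = first_child; d; d = d->next_sib)
		if (model_usable(d))
			n++;
	return n;
}
```
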
---

**recommended**

Block device freezing and thawing have been moved to holder operations.

Before this change, get_active_super() would only be able to find the
superblock of the main block device, i.e., the one stored in sb->s_bdev. Block
device freezing now works for any block device owned by a given superblock, not
just the main block device. The get_active_super() helper and bd_fsfreeze_sb
pointer are gone.

---

**mandatory**

set_blocksize() now takes an opened struct file instead of a struct
block_device, and it *must* be opened exclusively.