====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i;

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that now you need explicit initialization of private data
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.
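
A minimal user-space sketch of the embedding pattern described above, not
kernel code.  ``struct inode`` here is a stand-in with a single field, the
allocator is plain calloc() (a real filesystem would most likely use a slab
cache), and the real ->alloc_inode() also receives the superblock; those
simplifications and the ``foo_private`` field are assumptions made so that
the example compiles on its own.

```c
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; only here so the sketch compiles. */
struct inode {
	unsigned long i_ino;
};

struct foo_inode_info {
	int foo_private;		/* hypothetical fs-private field */
	struct inode vfs_inode;		/* embedded VFS inode */
};

/* The same pointer arithmetic that list_entry()/container_of() perform. */
static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return (struct foo_inode_info *)
		((char *)inode - offsetof(struct foo_inode_info, vfs_inode));
}

/* Allocate the containing structure and hand back the embedded inode. */
struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	if (!fi)
		return NULL;
	return &fi->vfs_inode;
}

/* Free the containing structure, not the embedded inode itself. */
void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point of the layout is that the VFS only ever sees ``&fi->vfs_inode``,
while the filesystem recovers its private data from that pointer via FOO_I().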

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that would return 0 in case of
success and negative number in case of error (-EINVAL unless you have more
informative error value to report).  Call it foo_fill_super().  Now declare::

  int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
  {
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
  }

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with explicit initializer and have ->get_sb set as
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking.  Otherwise exclusion warranties remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

Now we have the exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).  If you used to need that exclusion and do
it by internal locking (most of filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now.  Grab it on entry, drop upon return
- that will guarantee the same locking you used to have.  If your method or its
parts do not need BKL - better yet, now you can shift lock_kernel() and
unlock_kernel() so that they would protect exactly what needs to be
protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions.  If you don't need it, remove it.

---

**informational**

check for ->link() target not being a directory is done by callers.  Feel
free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to.  Some of your
problems might be over...

---

**mandatory**

new file_system_type method - kill_sb(superblock).  If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/).  Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now.  Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS.  The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide file-system specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.  The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.
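
For the iget5_locked() case, a pair of 'test' and 'set' callbacks might look
roughly like the sketch below.  It is a user-space illustration only:
``struct inode`` is a two-field stand-in, and the lookup key
(``struct foo_iget_args``, inode number plus generation) is a hypothetical
example of "the inode number is not sufficient", not something taken from a
particular filesystem.

```c
/* Stand-in for the kernel's struct inode; only the fields used here. */
struct inode {
	unsigned long i_ino;
	unsigned int i_generation;
};

/* Hypothetical lookup key: inode number plus a generation counter. */
struct foo_iget_args {
	unsigned long ino;
	unsigned int generation;
};

/* 'test': return nonzero if this cached inode is the object we want. */
int foo_iget_test(struct inode *inode, void *data)
{
	struct foo_iget_args *args = data;

	return inode->i_ino == args->ino &&
	       inode->i_generation == args->generation;
}

/*
 * 'set': fill in just enough of a new inode for foo_iget_test() to match.
 * It must not block; returns 0 on success.
 */
int foo_iget_set(struct inode *inode, void *data)
{
	struct foo_iget_args *args = data;

	inode->i_ino = args->ino;
	inode->i_generation = args->generation;
	return 0;
}
```

With the prototype above, these would be passed as
``iget5_locked(sb, args.ino, foo_iget_test, foo_iget_set, &args)``; the rest of
the initialization still happens between the I_NEW check and
unlock_new_inode().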

---

**recommended**

->getattr() finally getting used.  See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone.  If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore.  Read access is safe
if at least one of the following is true:

	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held

Audit your code and add locking if needed.  Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups.  Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone.  If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held.  BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it.  If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

**mandatory**

block truncation on error exit from ->write_begin, and ->direct_IO
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at
ext2_write_failed and callers for an example.

**mandatory**

->truncate is gone.  The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes.  Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally.

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead.  It gets called whenever the inode is evicted, whether it has
remaining links or not.  Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped.  As before, generic_drop_inode() is still the default and it's been
updated appropriately.  generic_delete_inode() is also alive and it consists
simply of return 1.  Note that all actual eviction work is done by caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.
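
The ordering described above can be sketched as follows.  This is a
user-space mock, not kernel code: the three helpers are stubs that only
record their names, ``struct inode`` and ``struct address_space`` are empty
stand-ins, and ``foo_record()``/``foo_calls`` exist purely so that the call
order can be observed.

```c
/* User-space stand-ins so the sketch compiles; the real types are the kernel's. */
struct address_space { int unused; };
struct inode { struct address_space i_data; };

/* Hypothetical call log, used only to make the ordering visible. */
static const char *foo_calls[8];
static int foo_ncalls;

static void foo_record(const char *name)
{
	if (foo_ncalls < 8)
		foo_calls[foo_ncalls++] = name;
}

/* Stubs named after the kernel helpers; each one only records its name. */
static void truncate_inode_pages_final(struct address_space *mapping)
{
	(void)mapping;
	foo_record("truncate_inode_pages_final");
}

static void invalidate_inode_buffers(struct inode *inode)
{
	(void)inode;
	foo_record("invalidate_inode_buffers");
}

static void clear_inode(struct inode *inode)
{
	(void)inode;
	foo_record("clear_inode");
}

void foo_evict_inode(struct inode *inode)
{
	/* The caller does not evict the pagecache for us. */
	truncate_inode_pages_final(&inode->i_data);

	/*
	 * A real filesystem would release on-disk resources here when the
	 * inode has no remaining links.
	 */

	/* Needed only if inode-associated metadata buffers are in use. */
	invalidate_inode_buffers(inode);

	/* Exactly once per ->evict_inode() call. */
	clear_inode(inode);
}
```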

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes to
0. Even on 0 refcount transition, it must be able to tolerate being called 0,
1, or more times (e.g. constant, idempotent).
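
The simplest way to satisfy "constant, idempotent" is a method whose answer
does not depend on any state, as in the sketch below.  ``struct dentry`` is an
empty user-space stand-in, and the convention that a nonzero return means "do
not cache this dentry" is stated here as an assumption about the method's
contract rather than something spelled out in the text above.

```c
/* Stand-in for the kernel's struct dentry; only here so the sketch compiles. */
struct dentry { int unused; };

/*
 * Always give the same advice (assumed meaning: do not keep unreferenced
 * dentries cached).  No state is read or written, so the method can be
 * called zero, one or many times with the same effect.
 */
int foo_d_delete(const struct dentry *dentry)
{
	(void)dentry;
	return 1;
}
```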

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback.  It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which requires dropping out of rcu-walk mode. This
may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD should be
returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
Documentation/filesystems/vfs.rst for more details.
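
For a filesystem that cannot do either check without blocking, the minimal
rcu-walk awareness is to bail out early, roughly as sketched below.  The
sketch is user-space only: the ``FOO_`` flag values are placeholders rather
than the kernel's, the functions take bare flags instead of the real
arguments, and returning -ECHILD from the permission check under
MAY_NOT_BLOCK is an assumption carried over from the d_revalidate rule.

```c
#include <errno.h>

/* Placeholder values; the kernel defines the real LOOKUP_RCU and MAY_NOT_BLOCK. */
#define FOO_LOOKUP_RCU		0x0040
#define FOO_MAY_NOT_BLOCK	0x0080

/* d_revalidate-style check for a filesystem that has to block to revalidate. */
int foo_d_revalidate(unsigned int flags)
{
	if (flags & FOO_LOOKUP_RCU)
		return -ECHILD;	/* cannot handle rcu-walk */

	/* ... blocking revalidation would go here ... */
	return 1;		/* assumed convention: positive means still valid */
}

/* permission-style check that may need to block. */
int foo_permission(int mask)
{
	if (mask & FOO_MAY_NOT_BLOCK)
		return -ECHILD;	/* assumption: same bail-out as above */

	/* ... blocking permission check would go here ... */
	return 0;
}
```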

---

**mandatory**

In ->fallocate() you must check the mode option passed in.  If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.
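
A sketch of the mode check for a filesystem without hole punching support.
It is user-space code: the ``FOO_FALLOC_FL_*`` macros are local copies of the
flag bits so that the example stands alone (kernel code would use the
definitions from <linux/falloc.h>), and ``foo_fallocate_check_mode()`` is a
hypothetical helper, not an existing function.

```c
#include <errno.h>

/* Local copies of the flag bits, only to keep the sketch self-contained. */
#define FOO_FALLOC_FL_KEEP_SIZE		0x01
#define FOO_FALLOC_FL_PUNCH_HOLE	0x02

/* Reject hole punching; accept everything else this sketch knows about. */
int foo_fallocate_check_mode(int mode)
{
	if (mode & FOO_FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;
	return 0;
}
```

A real ->fallocate() would call something like this before touching any
blocks, so that an unsupported request fails without side effects.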

---

**mandatory**

->get_sb() is gone.  Switch to use of ->mount().  Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type.  If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer.  On errors return
ERR_PTR(...).
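
The "return the pointer, or ERR_PTR() on error" convention can be modelled in
user space as below.  The ``FOO_`` helpers are simplified re-creations of the
error-pointer idea written for this sketch (they are not the kernel's
macros), ``struct dentry`` is an empty stand-in, and ``foo_mount()`` takes a
made-up ``fail`` argument instead of the real ->mount() parameters.

```c
#include <errno.h>
#include <stdint.h>

/* Errors are encoded as pointers in the last 4095 values of the address space. */
#define FOO_MAX_ERRNO	4095

static inline void *FOO_ERR_PTR(long error)
{
	return (void *)error;
}

static inline long FOO_PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int FOO_IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-FOO_MAX_ERRNO;
}

/* Stand-in for the kernel's struct dentry. */
struct dentry { int unused; };

static struct dentry foo_root;

/* New style: hand the root back as the return value, errors as ERR_PTR. */
struct dentry *foo_mount(int fail)
{
	if (fail)
		return FOO_ERR_PTR(-ENOMEM);
	return &foo_root;
}
```

The caller then distinguishes the two cases with FOO_IS_ERR() and recovers
the error code with FOO_PTR_ERR(), so no separate int return is needed.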

---

**mandatory**

->permission() and generic_permission() have lost flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to
support it in some way.  The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file.  So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file.  If the offset is i_size or greater return -ENXIO in either case.
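
The generic behaviour described above fits in a few lines.  The sketch below
is user-space code: ``foo_seek_hole_data()`` is a hypothetical helper, the
``FOO_SEEK_*`` values are local stand-ins for the whence constants, offsets
are assumed to be non-negative, and other whence values are simply rejected
instead of being handled as a real ->llseek() would.

```c
#include <errno.h>

/* Local stand-ins for the whence values, to keep the sketch self-contained. */
#define FOO_SEEK_DATA	3
#define FOO_SEEK_HOLE	4

/*
 * Treat the whole file as data with one virtual hole at the end.
 * Returns the resulting offset, or a negative errno.
 */
long long foo_seek_hole_data(long long offset, int whence, long long i_size)
{
	switch (whence) {
	case FOO_SEEK_DATA:
		if (offset >= i_size)
			return -ENXIO;
		return offset;		/* already inside data */
	case FOO_SEEK_HOLE:
		if (offset >= i_size)
			return -ENXIO;
		return i_size;		/* the only hole is at EOF */
	default:
		return -EINVAL;		/* not covered by this sketch */
	}
}
```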

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.
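
The resulting shape of an ->fsync() that still needs i_mutex is roughly the
following.  This is a user-space mock: the lock, the address space and
filemap_write_and_wait_range() are trivial stand-ins, ``foo_sync_metadata()``
is a hypothetical fs-specific step, and ``foo_fsync()`` takes an inode
directly instead of the real method's arguments.

```c
/* User-space stand-ins so the sketch compiles; the real ones are the kernel's. */
struct mutex { int held; };
struct address_space { int flushed; };
struct inode {
	struct address_space i_data;
	struct mutex i_mutex;
	int metadata_synced;
};

static int filemap_write_and_wait_range(struct address_space *mapping,
					long long start, long long end)
{
	(void)start;
	(void)end;
	mapping->flushed = 1;	/* pretend the dirty pages were written */
	return 0;
}

static void mutex_lock(struct mutex *m)
{
	m->held = 1;
}

static void mutex_unlock(struct mutex *m)
{
	m->held = 0;
}

/* Hypothetical fs-specific step that relies on i_mutex being held. */
static int foo_sync_metadata(struct inode *inode)
{
	if (!inode->i_mutex.held || !inode->i_data.flushed)
		return -1;
	inode->metadata_synced = 1;
	return 0;
}

int foo_fsync(struct inode *inode, long long start, long long end)
{
	int err;

	/* Sync out the dirty pages first; the caller no longer does it. */
	err = filemap_write_and_wait_range(&inode->i_data, start, end);
	if (err)
		return err;

	/* i_mutex is not held on entry any more, so take it ourselves. */
	mutex_lock(&inode->i_mutex);
	err = foo_sync_metadata(inode);
	mutex_unlock(&inode->i_mutex);
	return err;
}
```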
48125b532ceSMauro Carvalho Chehab
48225b532ceSMauro Carvalho Chehab---
48325b532ceSMauro Carvalho Chehab
48425b532ceSMauro Carvalho Chehab**mandatory**
48525b532ceSMauro Carvalho Chehab
48625b532ceSMauro Carvalho Chehabd_alloc_root() is gone, along with a lot of bugs caused by code
48725b532ceSMauro Carvalho Chehabmisusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
48825b532ceSMauro Carvalho Chehaballocates and returns a new dentry instantiated with the passed in inode.
48925b532ceSMauro Carvalho ChehabOn failure NULL is returned and the passed in inode is dropped so the reference
49025b532ceSMauro Carvalho Chehabto inode is consumed in all cases and failure handling need not do any cleanup
49125b532ceSMauro Carvalho Chehabfor the inode.  If d_make_root(inode) is passed a NULL inode it returns NULL
49225b532ceSMauro Carvalho Chehaband also requires no further error handling. Typical usage is::
49325b532ceSMauro Carvalho Chehab
49425b532ceSMauro Carvalho Chehab	inode = foofs_new_inode(....);
49525b532ceSMauro Carvalho Chehab	s->s_root = d_make_root(inode);
49625b532ceSMauro Carvalho Chehab	if (!s->s_root)
49725b532ceSMauro Carvalho Chehab		/* Nothing needed for the inode cleanup */
49825b532ceSMauro Carvalho Chehab		return -ENOMEM;
49925b532ceSMauro Carvalho Chehab	...
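
The "reference is consumed in all cases" contract can be illustrated with a
small userspace model.  Everything prefixed with mock_ is made up for this
sketch and only imitates the ownership rule; it is not how the kernel
implements d_make_root().

```c
#include <stdlib.h>

/* Mock types: stand-ins for the kernel's struct inode and struct dentry. */
struct mock_inode { int refcount; };
struct mock_dentry { struct mock_inode *inode; };

int mock_fail_alloc;	/* set to 1 to simulate a failed dentry allocation */

void mock_iput(struct mock_inode *inode)
{
	if (inode)
		inode->refcount--;
}

/* Model of the contract: the caller's inode reference is always consumed. */
struct mock_dentry *mock_d_make_root(struct mock_inode *inode)
{
	struct mock_dentry *d;

	if (!inode)
		return NULL;		/* NULL in, NULL out, nothing to clean up */
	d = mock_fail_alloc ? NULL : malloc(sizeof(*d));
	if (!d) {
		mock_iput(inode);	/* dropped for the caller on failure */
		return NULL;
	}
	d->inode = inode;		/* on success the dentry owns the reference */
	return d;
}
```

In both outcomes the caller has nothing left to release, which is why the
usage above can return -ENOMEM without touching the inode.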

---

**mandatory**

The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument.  Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist.  It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate().

---

**mandatory**

vfs_follow_link has been removed.  Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did).  inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now.  Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.

---

**recommended**

for embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

calling conventions for ->follow_link() have changed.  Instead of returning
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using explicit void ** argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

calling conventions for ->put_link() have changed.  It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when cookie
is non-NULL.  Note that link body isn't available anymore, so if you need it,
store it as cookie.

---

**mandatory**

any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache.  No highmem pages should end up in the pagecache of such
symlinks.  That includes any preseeding that might be done during symlink
creation.  __page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.

---

**mandatory**

symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction.  As a result, you can't
assume that non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink.  Checking ->i_mode is really needed now.  In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as
they used to - they just take it exclusive.  However, ->lookup() may be
called with parent locked shared.  Its instances must follow these rules:

	* do not use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* do not use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* in the unlikely case when (read-only) access to filesystem
	  data structures needs exclusion for some reason, arrange it
	  yourself.  None of the in-tree filesystems needed that.
	* do not rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias().  Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all.  If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them.  If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

Old method is only used if the new one is absent; eventually it will
be removed.  Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The xattr_handler.set() gets passed the user namespace of the mount the inode
is seen from so filesystems can idmap the i_uid and i_gid accordingly.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore.  If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument.  Any flags not handled by the
filesystem should result in -EINVAL being returned.
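
A minimal sketch of such a check, written as standalone C so it can be
compiled outside the kernel.  foo_rename_check_flags() is a hypothetical
helper, and the assumption that the filesystem handles only RENAME_NOREPLACE
is made up for the example.

```c
#include <errno.h>

/* Values match the Linux UAPI; defined here in case the libc headers hide them. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE (1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE (1 << 1)
#endif

/* Hypothetical check for a filesystem that implements only RENAME_NOREPLACE. */
int foo_rename_check_flags(unsigned int flags)
{
	if (flags & ~RENAME_NOREPLACE)
		return -EINVAL;	/* some flag this filesystem does not handle */
	return 0;
}
```

Masking against the set of supported flags, rather than testing for known
unsupported ones, means flags added later are rejected automatically.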

---

**recommended**

->readlink is optional for symlinks.  Don't set it unless the filesystem needs
to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx.  Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed.  Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED.  In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.  Additionally, return
value for 'called finish_no_open(), open it yourself' case has become
0, not 1.  Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when dentry needs to be created; that's the majority of old alloc_file()
users.  Calling conventions: on success a reference to new struct file
is returned and caller's reference to inode is subsumed by that.  On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range().  See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode.  Moreover, passing NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.
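
The three cases can be modelled in userspace to show why the explicit check
is redundant.  All mock_ names are invented for this sketch; the error-pointer
helpers imitate the kernel idiom, and the model ignores everything else the
real d_splice_alias() does (directory aliases in particular).

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ERRNO 4095

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *mock_err_ptr(long err) { return (void *)(intptr_t)err; }
static inline long mock_ptr_err(const void *p) { return (long)(intptr_t)p; }
static inline int mock_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MOCK_MAX_ERRNO;
}

struct mock_inode { int ino; };
struct mock_dentry { struct mock_inode *inode; };

/* Model: error pointers pass through, NULL gives a negative dentry. */
struct mock_dentry *mock_d_splice_alias(struct mock_inode *inode,
					struct mock_dentry *dentry)
{
	if (mock_is_err(inode))
		return mock_err_ptr(mock_ptr_err(inode));	/* like ERR_CAST() */
	dentry->inode = inode;		/* NULL inode: negative dentry */
	return NULL;
}

/* The tail of a ->lookup() instance can therefore be a single call. */
struct mock_dentry *mock_lookup_tail(struct mock_inode *inode,
				     struct mock_dentry *dentry)
{
	return mock_d_splice_alias(inode, dentry);
}
```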

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode().  If ->destroy_inode() becomes empty - all the better,
just get rid of it.  Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
might have accumulated over the life of in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve the compatibility.

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might be already gone.  The filesystem driver is guaranteed to be still
there, but that's it.  Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.
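
The dispatch rules for inode destruction can be sketched as a userspace
model.  The mock_ and foo_ names are invented for illustration, and
mock_call_rcu() runs its callback immediately, whereas the real call_rcu()
defers it until after a grace period.

```c
#include <stddef.h>

struct mock_inode { int id; };

/* Mock of the two super_operations methods involved. */
struct mock_super_ops {
	void (*destroy_inode)(struct mock_inode *);
	void (*free_inode)(struct mock_inode *);
};

/* Counters standing in for real work, so the sequence is observable. */
int mock_rcu_scheduled, mock_default_frees;
int foo_destroyed, foo_freed;

void mock_free_inode_nonrcu(struct mock_inode *inode)
{
	(void)inode;
	mock_default_frees++;
}

/* Example filesystem methods (hypothetical). */
void foo_destroy_inode(struct mock_inode *inode) { (void)inode; foo_destroyed++; }
void foo_free_inode(struct mock_inode *inode) { (void)inode; foo_freed++; }

/* Stand-in for call_rcu(): no grace period, the callback runs at once. */
void mock_call_rcu(void (*cb)(struct mock_inode *), struct mock_inode *inode)
{
	mock_rcu_scheduled++;
	cb(inode);
}

/* Model of the three rules listed above. */
void mock_destroy_inode(const struct mock_super_ops *ops,
			struct mock_inode *inode)
{
	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		if (!ops->free_inode)
			return;		/* ->destroy_inode() did the freeing itself */
	}
	mock_call_rcu(ops->free_inode ? ops->free_inode : mock_free_inode_nonrcu,
		      inode);
}
```

With both methods set, the synchronous part runs first and the freeing is
deferred; with neither set, the default free is what gets deferred.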

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default.  DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules).  Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open() notwithstanding,
failure exits in ->atomic_open() instances should *NOT* fput() the file,
no matter what.  Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().

---

**mandatory**

zero-length bvec segments are disallowed; they must be filtered out before
being passed on to an iterator.
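
One way to do the filtering is to compact the segment array in place before
building the iterator.  The sketch below uses a made-up struct mini_seg
(base pointer plus length) instead of the kernel's struct bio_vec, so it
shows only the idea.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec: a base pointer and a length. */
struct mini_seg {
	void *base;
	size_t len;
};

/* Compact the array in place, dropping empty segments; returns the new count. */
size_t mini_drop_empty_segs(struct mini_seg *segs, size_t nr)
{
	size_t i, kept = 0;

	for (i = 0; i < nr; i++)
		if (segs[i].len)
			segs[kept++] = segs[i];
	return kept;
}
```

The relative order of the remaining segments is preserved, so the data seen
through the iterator is unchanged.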

---

**mandatory**

For bvec based iterators bio_iov_iter_get_pages() now doesn't copy bvecs but
uses the one provided. Anyone issuing kiocb-I/O should ensure that the bvec and
page references stay until I/O has completed, i.e. until ->ki_complete() has
been called or the call has returned a code other than -EIOCBQUEUED.

---

**mandatory**

mnt_want_write_file() can now only be paired with mnt_drop_write_file(),
whereas previously it could be paired with mnt_drop_write() as well.

---

**mandatory**

iov_iter_copy_from_user_atomic() is gone; use copy_page_from_iter_atomic().
The difference is that copy_page_from_iter_atomic() advances the iterator and
you don't need iov_iter_advance() after it.  However, if you decide to use
only a part of obtained data, you should do iov_iter_revert().
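
The "copy advances, revert what you did not use" pattern can be shown with a
toy iterator.  struct mini_iter and both helpers are hypothetical userspace
stand-ins; the kernel's struct iov_iter is considerably richer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-segment iterator. */
struct mini_iter {
	const char *base;
	size_t pos;	/* bytes already consumed */
	size_t len;	/* total bytes available */
};

/* Like the new helper: copies up to n bytes and advances the iterator. */
size_t mini_copy_from_iter(char *dst, size_t n, struct mini_iter *it)
{
	size_t left = it->len - it->pos;

	if (n > left)
		n = left;
	memcpy(dst, it->base + it->pos, n);
	it->pos += n;
	return n;
}

/* Like iov_iter_revert(): give back bytes that were copied but not used. */
void mini_iter_revert(struct mini_iter *it, size_t unroll)
{
	it->pos -= unroll;
}
```

A caller that copies 6 bytes but commits only 4 would revert by 2, so the
next copy starts at the first uncommitted byte.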