====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i.

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};

	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return container_of(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i.

Add foo_alloc_inode() and foo_destroy_inode(): the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).

Make them ->alloc_inode and ->destroy_inode in your super_operations.
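
The container layout can be tried out with a small userspace sketch.  It is
not kernel code: ``struct inode`` is a hypothetical stub, ``foo_flags`` is a
hypothetical private field, container_of() is redefined locally from
offsetof(), and malloc()/free() stand in for the kernel allocators (in-kernel
code must use alloc_inode_sb(), as the mandatory note further down says).  The
real ->alloc_inode() also receives a ``struct super_block *``, which the sketch
omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stub: the real struct inode is far larger. */
struct inode {
	unsigned long i_ino;
};

struct foo_inode_info {
	unsigned int foo_flags;		/* hypothetical fs-private field */
	struct inode vfs_inode;		/* embedded VFS inode */
};

static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* ->alloc_inode(): allocate the container, hand out the embedded inode.
 * The real method takes a struct super_block * argument. */
static struct inode *foo_alloc_inode(void)
{
	/* malloc() only stands in for alloc_inode_sb() in this sketch */
	struct foo_inode_info *fi = malloc(sizeof(*fi));

	if (!fi)
		return NULL;
	fi->foo_flags = 0;	/* private data must be initialized explicitly */
	return &fi->vfs_inode;
}

/* ->destroy_inode(): free the whole container, not just the inode. */
static void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point of the layout is that FOO_I() and ``&fi->vfs_inode`` are exact
inverses, so the VFS only ever sees the embedded inode while the filesystem
recovers its private data with pointer arithmetic alone.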

Keep in mind that you now need explicit initialization of private data,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

**mandatory**

The foo_inode_info should always be allocated through alloc_inode_sb() rather
than kmem_cache_alloc() or kmalloc(), so that the inode reclaim context is set
up correctly.

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that returns 0 on success and a
negative number on error (-EINVAL unless you have a more informative error
value to report).  Call it foo_fill_super().  Now declare::

	int foo_get_sb(struct file_system_type *fs_type,
		int flags, const char *dev_name, void *data, struct vfsmount *mnt)
	{
		return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
				   mnt);
	}

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and have ->get_sb set
to foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose, you need to
change your internal locking.  Otherwise the exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

We now have exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).  If you used to need that exclusion and did
it with internal locking (most filesystems couldn't care less), you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without the BKL now.  Grab it on entry and drop it
upon return; that will guarantee the same locking you used to have.  If your
method or parts of it do not need the BKL, better yet: you can now move
lock_kernel() and unlock_kernel() so that they protect exactly what needs to
be protected.

---

**mandatory**

The BKL has also been removed from around sb operations; it should have been
shifted into the individual fs sb_op functions.  If you don't need it, remove
it.

---

**informational**

The check for the ->link() target not being a directory is done by callers.
Feel free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to.  Some of your
problems might be over...

---

**mandatory**

New file_system_type method: kill_sb(superblock).  If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/).  Just remove it from fs_flags
(and see the ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without the BKL now.  The caller _always_ holds
->i_mutex, so watch for ->i_mutex-grabbing code that might be used by your
->setattr().  Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS.  The structure is fully
documented at its declaration in include/linux/exportfs.h, and in
Documentation/filesystems/nfs/exporting.rst.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh and to provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs and fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked(),
which has the following prototype::

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode that allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.
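
To make the division of labour between 'test' and 'set' concrete, here is a
userspace model of a find-or-create lookup; it is not the kernel
implementation.  Everything in it is hypothetical: the stub ``struct inode``
with an extra ``dev`` key for the case where the inode number alone is
ambiguous, ``struct foo_key`` as the opaque 'data', and model_iget5(), which
drops the hash value argument and does none of the hashing, locking or
waiting on I_NEW that iget5_locked() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stub inode; (i_ino, dev) together identify the object. */
struct inode {
	unsigned long i_ino;
	unsigned int dev;	/* extra key: i_ino alone is not unique */
	bool is_new;		/* models the I_NEW state */
	struct inode *next;
};

/* Opaque 'data' handed to both callbacks. */
struct foo_key {
	unsigned long ino;
	unsigned int dev;
};

/* 'test': does this cached inode belong to the object described by data? */
static int foo_test(struct inode *inode, void *data)
{
	const struct foo_key *k = data;

	return inode->i_ino == k->ino && inode->dev == k->dev;
}

/* 'set': initialize just enough of a new inode for foo_test() to match. */
static int foo_set(struct inode *inode, void *data)
{
	const struct foo_key *k = data;

	inode->i_ino = k->ino;
	inode->dev = k->dev;
	return 0;
}

static struct inode *foo_cache;	/* one list stands in for the inode hash */

/* Find-or-create in the spirit of iget5_locked(): no hashing, no locking
 * and no waiting on I_NEW, unlike the real function. */
static struct inode *model_iget5(int (*test)(struct inode *, void *),
				 int (*set)(struct inode *, void *),
				 void *data)
{
	struct inode *inode;

	for (inode = foo_cache; inode; inode = inode->next)
		if (test(inode, data))
			return inode;		/* cache hit */

	inode = calloc(1, sizeof(*inode));
	if (!inode)
		return NULL;
	if (set(inode, data)) {
		free(inode);
		return NULL;
	}
	inode->is_new = true;	/* caller finishes setup, then clears it */
	inode->next = foo_cache;
	foo_cache = inode;
	return inode;
}
```

A second lookup with the same key returns the cached inode, while the same
inode number with a different ``dev`` yields a distinct inode, which is the
situation 'test' exists for.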

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.  The filesystem then needs to finalize
the initialization.  Once the inode is initialized, it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;		/* inode allocation failed */
	if (inode->i_state & I_NEW) {
		/* read_inode_from_disk() stands for your fs-specific code */
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---

**recommended**

->getattr() is finally getting used.  See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone.  If your filesystem had it, provide ->getattr()
and let it call whatever you had as ->revalidate(), and (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by the BKL anymore.  Read access is safe
if at least one of the following is true:

	* the filesystem has no cross-directory rename().
	* we know that the parent had been locked (e.g. we are looking at
	  ->d_parent of the ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held.

Audit your code and add locking if needed.  Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on the BKL and that's prone to screwups.  The old tree had
quite a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone.  If you use it, just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed, is_read_only() will die.

---

**mandatory**

->permission() is called without the BKL now.  Grab it on entry and drop it
upon return; that will guarantee the same locking you used to have.  If
your method or parts of it do not need the BKL, better yet: you can now
move lock_kernel() and unlock_kernel() so that they protect exactly what
needs to be protected.

---

**mandatory**

->statfs() is now called without the BKL held.  The BKL should have been
shifted into the individual fs sb_op functions where it's not clear that
it's safe to remove it.  If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code, fixing it will become trivial; until then nothing can be
done.

---

**mandatory**

Block truncation on error exit from ->write_begin and ->direct_IO has
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at
ext2_write_failed and its callers for an example.

---

**mandatory**

->truncate is gone.  The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes.  Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in the order of zeroing blocks using block_truncate_page or similar helpers,
size update, and finally on-disk truncation, which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called at the beginning of ->setattr unconditionally.

---

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead.  It gets called whenever the inode is evicted, whether it has
remaining links or not.  The caller does *not* evict the pagecache or
inode-associated metadata buffers; the method has to use
truncate_inode_pages_final() to get rid of those.  The caller makes sure async
writeback cannot be running for the inode while (or after) ->evict_inode() is
called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped.  As before, generic_drop_inode() is still the default and it's been
updated appropriately.  generic_delete_inode() is also alive and it consists
simply of return 1.  Note that all actual eviction work is done by the caller
after ->drop_inode() returns.
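
The return-value contract can be illustrated with a small userspace model of
the two stock policies.  The stub ``struct inode`` and the ``model_*`` names
are hypothetical; the body of model_generic_drop_inode() follows what
generic_drop_inode() in fs/inode.c does (``!inode->i_nlink ||
inode_unhashed(inode)``), and neither function does any eviction work itself,
matching the note above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stub: only the state the decision depends on. */
struct inode {
	unsigned int i_nlink;
	bool hashed;		/* models !inode_unhashed(inode) */
};

/* Same decision as generic_drop_inode(): drop once the last link is gone
 * or the inode has been unhashed. */
static int model_generic_drop_inode(struct inode *inode)
{
	return !inode->i_nlink || !inode->hashed;
}

/* generic_delete_inode() always asks for the inode to be dropped. */
static int model_generic_delete_inode(struct inode *inode)
{
	(void)inode;
	return 1;
}
```

Pointing ->drop_inode at generic_delete_inode() therefore means that no
unused inode is kept in the inode cache after its final iput().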

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink at the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount goes
to 0.  Even on a 0 refcount transition, it must be able to tolerate being
called 0, 1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed.  Read the updated documentation in Documentation/filesystems/vfs.rst
(and look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed.  Read the updated documentation in Documentation/filesystems/vfs.rst
(and look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine-grained locks.  See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things.  Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback.  It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt).  The d_hash and d_compare changes
(above) are examples of the changes required to support this.  For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem.  However, this is costly and loses
the benefits of rcu-walk mode.  We will begin to add filesystem callbacks that
are rcu-walk aware, shown below.  Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which used to require dropping out of rcu-walk
mode.  It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU).  -ECHILD
should be returned if the filesystem cannot handle rcu-walk.  See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission).  It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
Documentation/filesystems/vfs.rst for more details.

---

**mandatory**

In ->fallocate() you must check the mode option passed in.  If your
filesystem does not support hole punching (deallocating space in the middle of a
file), you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end
of a file off.
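
A minimal sketch of the mode check for a filesystem that supports plain
preallocation but not hole punching.  It is a userspace model:
foo_fallocate_check_mode() is a hypothetical helper that a ->fallocate()
implementation could call first, and the two flag values are defined locally
(matching <linux/falloc.h>) only so the sketch builds without kernel headers.
Rejecting every other unknown bit as well is a design choice of the sketch,
not something the text above requires.

```c
#include <assert.h>
#include <errno.h>

/* Values as in the Linux UAPI <linux/falloc.h>, defined here so the
 * sketch builds without kernel headers. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/* Mode validation for a filesystem that can preallocate (optionally
 * keeping i_size) but cannot punch holes. */
static int foo_fallocate_check_mode(int mode)
{
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;	/* hole punching not supported */
	if (mode & ~FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;	/* reject any other unknown flag too */
	return 0;
}
```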

---

**mandatory**

->get_sb() is gone.  Switch to ->mount().  Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type.  If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer.  On errors return
ERR_PTR(...).
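
The new calling convention can be shown with a userspace sketch of the
error-pointer idiom that ->mount() relies on.  ERR_PTR(), PTR_ERR() and
IS_ERR() are simplified copies of the kernel helpers (MAX_ERRNO is 4095 there
as well); ``struct dentry``, foo_root and foo_mount() are hypothetical, and the
real ->mount() takes the file_system_type, flags, device name and data
arguments, typically forwarding them to a mount_bdev()/mount_nodev()-style
helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO	4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline bool IS_ERR(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-MAX_ERRNO;
}

struct dentry { int unused; };		/* hypothetical stub */

static struct dentry foo_root;		/* stands in for sb->s_root */

/* ->mount() style: return the root dentry itself, or an ERR_PTR() on
 * failure, instead of storing it in mnt->mnt_root.  fill_ok is a
 * hypothetical switch standing in for foo_fill_super() succeeding. */
static struct dentry *foo_mount(bool fill_ok)
{
	if (!fill_ok)
		return ERR_PTR(-EINVAL);
	return &foo_root;
}
```

Callers check the result with IS_ERR() before dereferencing it, which is why
an error must be encoded with ERR_PTR() rather than returned as NULL.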

---

**mandatory**

->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to
support it in some way.  The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file.  So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file.  If the offset is i_size or greater, return -ENXIO in either case.
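
The policy just described fits in a few lines.  This is a userspace sketch,
not the kernel's generic_file_llseek_size(): foo_seek_data_hole() is a
hypothetical helper that takes i_size explicitly, the SEEK_* fallbacks only
matter on a libc that does not define them, and rejecting negative offsets
with -ENXIO mirrors the unsigned comparison the generic code uses.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE	/* glibc exposes SEEK_DATA/SEEK_HOLE with this */
#endif
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#ifndef SEEK_DATA
#define SEEK_DATA	3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE	4
#endif

/* "Whole file is data, one virtual hole at EOF" policy.  Returns the new
 * offset or a negative errno; other whence values are not handled here. */
static long long foo_seek_data_hole(long long offset, int whence,
				    long long i_size)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;
	if (offset < 0 || offset >= i_size)
		return -ENXIO;		/* at/after EOF, or negative */
	return whence == SEEK_DATA ? offset : i_size;
}
```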

---

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed-in inode.
On failure NULL is returned and the passed-in inode is dropped, so the reference
to the inode is consumed in all cases and failure handling need not do any
cleanup for the inode.  If d_make_root(inode) is passed a NULL inode it returns
NULL and also requires no further error handling.  Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...
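
The reference-counting contract of d_make_root() can be illustrated with a
userspace model.  Everything below is hypothetical (model_inode, model_iput(),
model_make_root() and the alloc_fails switch used to force the failure path);
it only shows that the caller's inode reference is consumed on both the success
and the failure path, which is why the error branch in the usage above has
nothing to clean up.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's structures. */
struct model_inode {
	int refcount;
};

struct model_dentry {
	struct model_inode *inode;
};

static void model_iput(struct model_inode *inode)
{
	if (--inode->refcount == 0)
		free(inode);
}

/*
 * Mirrors the contract described for d_make_root(): the caller's inode
 * reference is consumed whether or not the call succeeds, and a NULL inode
 * simply yields NULL.  alloc_fails lets a test force the failure path.
 */
static struct model_dentry *model_make_root(struct model_inode *inode,
					    int alloc_fails)
{
	struct model_dentry *root;

	if (!inode)
		return NULL;
	root = alloc_fails ? NULL : malloc(sizeof(*root));
	if (!root) {
		model_iput(inode);	/* failure: drop the reference we were handed */
		return NULL;
	}
	root->inode = inode;		/* success: root now owns that reference */
	return root;
}
```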

---

**mandatory**

The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets an "is it an O_EXCL or equivalent?" boolean argument.  Note that
local filesystems can ignore the argument - they are guaranteed that the
object doesn't exist.  It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead.

---

**mandatory**

->readdir() is gone now; switch to ->iterate().

---

**mandatory**

vfs_follow_link has been removed.  Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did).  inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now.  Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

do *not* use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.

---

**recommended**

for embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

calling conventions for ->follow_link() have changed.  Instead of returning
a cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using an explicit ``void **``
argument.  nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

calling conventions for ->put_link() have changed.  It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when the
cookie is non-NULL.  Note that the link body isn't available anymore, so if
you need it, store it as the cookie.

---

**mandatory**

any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache.  No highmem pages should end up in the pagecache of such
symlinks.  That includes any preseeding that might be done during symlink
creation.  __page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().
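
Since ->put_link() is gone, the release logic lives in the struct delayed_call
filled in by ->get_link().  The sketch below is a userspace re-creation
modelled on the helpers in include/linux/delayed_call.h (check the exact
definitions in your tree); model_get_link() is a hypothetical ->get_link()
that returns a heap-allocated body and registers free() as its destructor.
Unlike a real ->get_link(), which reports errors with ERR_PTR(), it simply
returns NULL on allocation failure.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace copy of the delayed-call helpers (modelled on the kernel's). */
struct delayed_call {
	void (*fn)(void *);
	void *arg;
};

static void set_delayed_call(struct delayed_call *call,
			     void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

static void do_delayed_call(struct delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

/*
 * Hypothetical ->get_link(): build the link body and record how to release
 * it.  The set_delayed_call() here replaces what ->put_link() used to do.
 */
static const char *model_get_link(const char *target,
				  struct delayed_call *done)
{
	size_t len = strlen(target) + 1;
	char *body = malloc(len);

	if (!body)
		return NULL;	/* the real method would return ERR_PTR(-ENOMEM) */
	memcpy(body, target, len);
	set_delayed_call(done, free, body);
	return body;
}
```

The caller walks the returned body and then runs do_delayed_call(&done),
which is roughly the point where ->put_link() used to be invoked.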

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do *not* use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.

---

**mandatory**

symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction.  As a result, you can't
assume that a non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink.  Checking ->i_mode is really needed now.  In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as
they used to - they just take it exclusive.  However, ->lookup() may be
called with parent locked shared.  Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* in the unlikely case when (read-only) access to filesystem
	  data structures needs exclusion for some reason, arrange it
	  yourself.  None of the in-tree filesystems needed that.
	* rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias().  Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all.  If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them.  If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

Old method is only used if the new one is absent; eventually it will
be removed.  Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
The xattr_handler.set() gets passed the user namespace of the mount the inode
is seen from, so filesystems can idmap the i_uid and i_gid accordingly.
dentry might be yet to be attached to inode, so do *not* use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore.  If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument.  Any flags not handled by the
filesystem should result in -EINVAL being returned.
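
A minimal sketch of the expected flag check, written as a userspace model:
model_rename() is a hypothetical filesystem that implements only
RENAME_NOREPLACE.  The RENAME_* values are those of renameat2(2), defined
locally (and guarded) so the example compiles without kernel headers.

```c
#include <errno.h>

/* renameat2(2) flag values, defined here only if no header did so. */
#ifndef RENAME_NOREPLACE
#define RENAME_NOREPLACE	(1 << 0)
#endif
#ifndef RENAME_EXCHANGE
#define RENAME_EXCHANGE		(1 << 1)
#endif
#ifndef RENAME_WHITEOUT
#define RENAME_WHITEOUT		(1 << 2)
#endif

/* Hypothetical filesystem that supports RENAME_NOREPLACE and nothing else. */
static int model_rename(unsigned int flags)
{
	if (flags & ~(unsigned int)RENAME_NOREPLACE)
		return -EINVAL;		/* reject every flag we do not handle */

	/* ... perform the rename, failing with -EEXIST under NOREPLACE ... */
	return 0;
}
```

Masking against the supported set, rather than testing for each known
unsupported flag, also rejects flags added after the filesystem was written.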

---

**recommended**

->readlink is optional for symlinks.  Don't set it unless the filesystem
needs to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx.  Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed.  Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED.  In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.  Additionally, return
value for 'called finish_no_open(), open it yourself' case has become
0, not 1.  Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when dentry needs to be created; that's the majority of old alloc_file()
users.  Calling conventions: on success a reference to new struct file
is returned and the caller's reference to inode is subsumed by that.  On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range().  See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode.  Moreover, passing NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode().  If ->destroy_inode() becomes empty - all the better,
just get rid of it.  Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
might have accumulated over the life of in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve compatibility.

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might already be gone.  The filesystem driver is guaranteed to be still
there, but that's it.  Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.
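
The three destruction rules can be modelled in userspace to make the dispatch
explicit.  All names are hypothetical: model_call_rcu() merely queues the
callback and model_grace_period() runs the queue, standing in for call_rcu()
and an RCU grace period, and model_free_inode_nonrcu() stands in for
free_inode_nonrcu().

```c
#include <stdlib.h>

struct model_inode;

/* Hypothetical subset of super_operations. */
struct model_sops {
	void (*destroy_inode)(struct model_inode *);
	void (*free_inode)(struct model_inode *);
};

struct model_inode {
	const struct model_sops *sops;
};

/* Stand-in for call_rcu(): callbacks wait here until a "grace period". */
#define MODEL_MAX_PENDING 16
static struct {
	void (*fn)(struct model_inode *);
	struct model_inode *inode;
} model_pending[MODEL_MAX_PENDING];
static int model_npending;

static void model_call_rcu(void (*fn)(struct model_inode *),
			   struct model_inode *inode)
{
	if (model_npending == MODEL_MAX_PENDING)
		abort();		/* the toy queue is full */
	model_pending[model_npending].fn = fn;
	model_pending[model_npending].inode = inode;
	model_npending++;
}

static void model_grace_period(void)
{
	for (int i = 0; i < model_npending; i++)
		model_pending[i].fn(model_pending[i].inode);
	model_npending = 0;
}

/* Stand-in for free_inode_nonrcu(). */
static void model_free_inode_nonrcu(struct model_inode *inode)
{
	free(inode);
}

/* Dispatch following the three rules listed above. */
static void model_destroy(struct model_inode *inode)
{
	const struct model_sops *ops = inode->sops;	/* read before any callback */
	void (*free_fn)(struct model_inode *) = ops->free_inode;

	if (ops->destroy_inode)
		ops->destroy_inode(inode);
	else if (!free_fn)
		free_fn = model_free_inode_nonrcu;	/* NULL/NULL compatibility case */
	if (free_fn)
		model_call_rcu(free_fn, inode);
}
```

When only ->destroy_inode() is set, nothing is queued, so that method stays
responsible for freeing the inode (or for doing its own call_rcu()).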

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default.  DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules).  Such uses are very likely to
be misspelled d_alloc_anon().

---

**mandatory**

[should've been added in 2016] stale comment in finish_open() notwithstanding,
failure exits in ->atomic_open() instances should *NOT* fput() the file,
no matter what.  Everything is handled by the caller.

---

**mandatory**

clone_private_mount() returns a longterm mount now, so the proper destructor of
its result is kern_unmount() or kern_unmount_array().

---

**mandatory**

zero-length bvec segments are disallowed; they must be filtered out before
being passed on to an iterator.
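
One way to meet this requirement is to compact the segment array before
building the iterator.  The sketch uses a hypothetical model_bvec in place of
struct bio_vec so that it runs in userspace; in kernel code the compacted
array and the returned count would then be passed to iov_iter_bvec().

```c
#include <stddef.h>

/* Hypothetical stand-in for struct bio_vec. */
struct model_bvec {
	void *bv_page;
	unsigned int bv_len;
	unsigned int bv_offset;
};

/*
 * Compact the array in place, dropping zero-length segments while keeping
 * the order of the rest, and return how many segments remain.
 */
static size_t model_drop_empty_bvecs(struct model_bvec *bv, size_t nr)
{
	size_t out = 0;

	for (size_t i = 0; i < nr; i++)
		if (bv[i].bv_len)
			bv[out++] = bv[i];
	return out;
}
```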

---

**mandatory**

For bvec based iterators bio_iov_iter_get_pages() now doesn't copy bvecs but
uses the ones provided.  Anyone issuing kiocb-I/O should ensure that the bvec
and page references stay until I/O has completed, i.e. until ->ki_complete()
has been called, or until submission has returned with a code other than
-EIOCBQUEUED.

---

**mandatory**

mnt_want_write_file() can now only be paired with mnt_drop_write_file(),
whereas previously it could be paired with mnt_drop_write() as well.

---

**mandatory**

iov_iter_copy_from_user_atomic() is gone; use copy_page_from_iter_atomic().
The difference is that copy_page_from_iter_atomic() advances the iterator and
you don't need iov_iter_advance() after it.  However, if you decide to use
only a part of the obtained data, you should do iov_iter_revert().
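
The advance-then-revert pattern can be sketched with a toy iterator.
model_iter and its helpers are hypothetical userspace stand-ins for struct
iov_iter, copy_page_from_iter_atomic() and iov_iter_revert(); the usable
argument represents whatever limit makes the caller keep only part of the
copied data.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single-buffer iterator standing in for struct iov_iter. */
struct model_iter {
	const char *base;
	size_t offset;	/* bytes already consumed */
	size_t count;	/* bytes remaining */
};

static void model_iter_revert(struct model_iter *i, size_t bytes)
{
	i->offset -= bytes;
	i->count += bytes;
}

/* Like copy_page_from_iter_atomic(): copies up to 'bytes' AND advances. */
static size_t model_copy_from_iter(char *dst, size_t bytes,
				   struct model_iter *i)
{
	size_t n = bytes < i->count ? bytes : i->count;

	memcpy(dst, i->base + i->offset, n);
	i->offset += n;
	i->count -= n;
	return n;
}

/*
 * Hypothetical write step that can only commit 'usable' of the copied
 * bytes: hand the rest back to the iterator so it can be retried.
 */
static size_t model_write_step(char *dst, size_t want, size_t usable,
			       struct model_iter *i)
{
	size_t copied = model_copy_from_iter(dst, want, i);

	if (copied > usable) {
		model_iter_revert(i, copied - usable);
		copied = usable;
	}
	return copied;	/* no separate advance is needed */
}
```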

---

**mandatory**

Calling conventions for file_open_root() changed; now it takes struct path *
instead of passing mount and dentry separately.  For callers that used to
pass <mnt, mnt->mnt_root> pair (i.e. the root of given mount), a new helper
is provided - file_open_root_mnt().  In-tree users adjusted.
917