====================
Changes since 2.5.0:
====================

---

**recommended**

New helpers: sb_bread(), sb_getblk(), sb_find_get_block(), set_bh(),
sb_set_blocksize() and sb_min_blocksize().

Use them.

(sb_find_get_block() replaces 2.4's get_hash_table())

---

**recommended**

New methods: ->alloc_inode() and ->destroy_inode().

Remove inode->u.foo_inode_i

Declare::

	struct foo_inode_info {
		/* fs-private stuff */
		struct inode vfs_inode;
	};
	static inline struct foo_inode_info *FOO_I(struct inode *inode)
	{
		return list_entry(inode, struct foo_inode_info, vfs_inode);
	}

Use FOO_I(inode) instead of &inode->u.foo_inode_i;

Add foo_alloc_inode() and foo_destroy_inode() - the former should allocate
foo_inode_info and return the address of ->vfs_inode, the latter should free
FOO_I(inode) (see in-tree filesystems for examples).
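
As a rough illustration of the embedding pattern described above, the
following userspace sketch models it outside the kernel.  The cut-down
struct inode, the foo_private field and the calloc()-based allocator are
assumptions made for the example; a real ->alloc_inode() takes a
struct super_block * and typically allocates from a slab cache.  In the
kernel list_entry() is a wrapper around container_of(), which is re-created
here with offsetof().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for the kernel's struct inode; only one field is modelled. */
struct inode {
	unsigned long i_ino;
};

/* Userspace equivalent of the kernel's container_of() helper. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct foo_inode_info {
	int foo_private;		/* fs-private state (hypothetical) */
	struct inode vfs_inode;		/* embedded generic inode */
};

/* Recover the containing foo_inode_info from the embedded inode. */
static inline struct foo_inode_info *FOO_I(struct inode *inode)
{
	return container_of(inode, struct foo_inode_info, vfs_inode);
}

/* Model of ->alloc_inode(): allocate the container, hand out ->vfs_inode. */
struct inode *foo_alloc_inode(void)
{
	struct foo_inode_info *fi = calloc(1, sizeof(*fi));

	return fi ? &fi->vfs_inode : NULL;
}

/* Model of ->destroy_inode(): free the container, not the embedded inode. */
void foo_destroy_inode(struct inode *inode)
{
	free(FOO_I(inode));
}
```

The point of the layout is that the VFS only ever sees the embedded
struct inode, while the filesystem gets back to its private data with
pointer arithmetic instead of an extra pointer or a union member.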

Make them ->alloc_inode and ->destroy_inode in your super_operations.

Keep in mind that you now need to initialize private data explicitly,
typically between calling iget_locked() and unlocking the inode.

At some point that will become mandatory.

---

**mandatory**

Change of file_system_type method (->read_super to ->get_sb)

->read_super() is no more.  Ditto for DECLARE_FSTYPE and DECLARE_FSTYPE_DEV.

Turn your foo_read_super() into a function that returns 0 on success and a
negative number on error (-EINVAL unless you have a more informative error
value to report).  Call it foo_fill_super().  Now declare::

  int foo_get_sb(struct file_system_type *fs_type,
	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
  {
	return get_sb_bdev(fs_type, flags, dev_name, data, foo_fill_super,
			   mnt);
  }

(or similar with s/bdev/nodev/ or s/bdev/single/, depending on the kind of
filesystem).

Replace DECLARE_FSTYPE... with an explicit initializer and set ->get_sb to
foo_get_sb.

---

**mandatory**

Locking change: ->s_vfs_rename_sem is taken only by cross-directory renames.
Most likely there is no need to change anything, but if you relied on
global exclusion between renames for some internal purpose - you need to
change your internal locking.  Otherwise exclusion guarantees remain the
same (i.e. parents and victim are locked, etc.).

---

**informational**

There is now exclusion between ->lookup() and directory removal (by
->rmdir() and ->rename()).  If you needed that exclusion and provided
it with internal locking (most filesystems couldn't care less) - you
can relax your locking.

---

**mandatory**

->lookup(), ->truncate(), ->create(), ->unlink(), ->mknod(), ->mkdir(),
->rmdir(), ->link(), ->lseek(), ->symlink(), ->rename()
and ->readdir() are called without BKL now.  Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If your
method or its parts do not need BKL - better yet, now you can shift
lock_kernel() and unlock_kernel() so that they would protect exactly what
needs to be protected.

---

**mandatory**

BKL is also moved from around sb operations. BKL should have been shifted into
individual fs sb_op functions.  If you don't need it, remove it.

---

**informational**

The check for the ->link() target not being a directory is done by callers.
Feel free to drop it...

---

**informational**

->link() callers hold ->i_mutex on the object we are linking to.  Some of your
problems might be over...

---

**mandatory**

New file_system_type method - kill_sb(superblock).  If you are converting
an existing filesystem, set it according to ->fs_flags::

	FS_REQUIRES_DEV		-	kill_block_super
	FS_LITTER		-	kill_litter_super
	neither			-	kill_anon_super

FS_LITTER is gone - just remove it from fs_flags.

---

**mandatory**

FS_SINGLE is gone (actually, that had happened back when ->get_sb()
went in - and hadn't been documented ;-/).  Just remove it from fs_flags
(and see ->get_sb() entry for other actions).

---

**mandatory**

->setattr() is called without BKL now.  Caller _always_ holds ->i_mutex, so
watch for ->i_mutex-grabbing code that might be used by your ->setattr().
Callers of notify_change() need ->i_mutex now.

---

**recommended**

New super_block field ``struct export_operations *s_export_op`` for
explicit support for exporting, e.g. via NFS.  The structure is fully
documented at its declaration in include/linux/fs.h, and in
Documentation/filesystems/nfs/Exporting.

Briefly, it allows for the definition of decode_fh and encode_fh operations
to encode and decode filehandles, and allows the filesystem to use
a standard helper function for decode_fh, and provide filesystem-specific
support for this helper, particularly get_parent.

It is planned that this will be required for exporting once the code
settles down a bit.

---

**mandatory**

s_export_op is now required for exporting a filesystem.
isofs, ext2, ext3, reiserfs, fat
can be used as examples of very different filesystems.

---

**mandatory**

iget4() and the read_inode2 callback have been superseded by iget5_locked()
which has the following prototype::

    struct inode *iget5_locked(struct super_block *sb, unsigned long ino,
				int (*test)(struct inode *, void *),
				int (*set)(struct inode *, void *),
				void *data);

'test' is an additional function that can be used when the inode
number is not sufficient to identify the actual file object. 'set'
should be a non-blocking function that initializes those parts of a
newly created inode to allow the test function to succeed. 'data' is
passed as an opaque value to both test and set functions.

When the inode has been created by iget5_locked(), it will be returned with the
I_NEW flag set and will still be locked.  The filesystem then needs to finalize
the initialization. Once the inode is initialized it must be unlocked by
calling unlock_new_inode().

The filesystem is responsible for setting (and possibly testing) i_ino
when appropriate. There is also a simpler iget_locked() function that
just takes the superblock and inode number as arguments and does the
test and set for you.

e.g.::

	inode = iget_locked(sb, ino);
	if (!inode)
		return -ENOMEM;	/* allocation failed, nothing to clean up */
	if (inode->i_state & I_NEW) {
		err = read_inode_from_disk(inode);
		if (err < 0) {
			iget_failed(inode);
			return err;
		}
		unlock_new_inode(inode);
	}

Note that if the process of setting up a new inode fails, then iget_failed()
should be called on the inode to render it dead, and an appropriate error
should be passed back to the caller.

---

**recommended**

->getattr() is finally getting used.  See instances in nfs, minix, etc.

---

**mandatory**

->revalidate() is gone.  If your filesystem had it - provide ->getattr()
and let it call whatever you had as ->revalidate() + (for symlinks that
had ->revalidate()) add calls in ->follow_link()/->readlink().

---

**mandatory**

->d_parent changes are not protected by BKL anymore.  Read access is safe
if at least one of the following is true:

	* filesystem has no cross-directory rename()
	* we know that parent had been locked (e.g. we are looking at
	  ->d_parent of ->lookup() argument).
	* we are called from ->rename().
	* the child's ->d_lock is held

Audit your code and add locking if needed.  Notice that any place that is
not protected by the conditions above is risky even in the old tree - you
had been relying on BKL and that's prone to screwups.  Old tree had quite
a few holes of that kind - unprotected access to ->d_parent leading to
anything from oops to silent memory corruption.

---

**mandatory**

FS_NOMOUNT is gone.  If you use it - just set SB_NOUSER in flags
(see rootfs for one kind of solution and bdev/socket/pipe for another).

---

**recommended**

Use bdev_read_only(bdev) instead of is_read_only(kdev).  The latter
is still alive, but only because of the mess in drivers/s390/block/dasd.c.
As soon as it gets fixed is_read_only() will die.

---

**mandatory**

->permission() is called without BKL now. Grab it on entry, drop upon
return - that will guarantee the same locking you used to have.  If
your method or its parts do not need BKL - better yet, now you can
shift lock_kernel() and unlock_kernel() so that they would protect
exactly what needs to be protected.

---

**mandatory**

->statfs() is now called without BKL held.  BKL should have been
shifted into individual fs sb_op functions where it's not clear that
it's safe to remove it.  If you don't need it, remove it.

---

**mandatory**

is_read_only() is gone; use bdev_read_only() instead.

---

**mandatory**

destroy_buffers() is gone; use invalidate_bdev().

---

**mandatory**

fsync_dev() is gone; use fsync_bdev().  NOTE: lvm breakage is
deliberate; as soon as struct block_device * is propagated in a reasonable
way by that code fixing will become trivial; until then nothing can be
done.

---

**mandatory**

Block truncation on error exit from ->write_begin and ->direct_IO has
moved from generic methods (block_write_begin, cont_write_begin,
nobh_write_begin, blockdev_direct_IO*) to callers.  Take a look at
ext2_write_failed and callers for an example.

---

**mandatory**

->truncate is gone.  The whole truncate sequence needs to be
implemented in ->setattr, which is now mandatory for filesystems
implementing on-disk size changes.  Start with a copy of the old inode_setattr
and vmtruncate, and then reorder the vmtruncate + foofs_vmtruncate sequence to
be in order of zeroing blocks using block_truncate_page or similar helpers,
size update and finally on-disk truncation which should not fail.
setattr_prepare (which used to be inode_change_ok) now includes the size checks
for ATTR_SIZE and must be called in the beginning of ->setattr unconditionally.

---

**mandatory**

->clear_inode() and ->delete_inode() are gone; ->evict_inode() should
be used instead.  It gets called whenever the inode is evicted, whether it has
remaining links or not.  Caller does *not* evict the pagecache or inode-associated
metadata buffers; the method has to use truncate_inode_pages_final() to get rid
of those. Caller makes sure async writeback cannot be running for the inode while
(or after) ->evict_inode() is called.

->drop_inode() returns int now; it's called on final iput() with
inode->i_lock held and it returns true if the filesystem wants the inode to be
dropped.  As before, generic_drop_inode() is still the default and it's been
updated appropriately.  generic_delete_inode() is also alive and it consists
simply of return 1.  Note that all actual eviction work is done by caller after
->drop_inode() returns.

As before, clear_inode() must be called exactly once on each call of
->evict_inode() (as it used to be for each call of ->delete_inode()).  Unlike
before, if you are using inode-associated metadata buffers (i.e.
mark_buffer_dirty_inode()), it's your responsibility to call
invalidate_inode_buffers() before clear_inode().

NOTE: checking i_nlink in the beginning of ->write_inode() and bailing out
if it's zero is not *and* *never* *had* *been* enough.  Final unlink() and iput()
may happen while the inode is in the middle of ->write_inode(); e.g. if you blindly
free the on-disk inode, you may end up doing that while ->write_inode() is writing
to it.

---

**mandatory**

.d_delete() now only advises the dcache as to whether or not to cache
unreferenced dentries, and is now only called when the dentry refcount
goes to 0. Even on 0 refcount transition, it must be able to tolerate
being called 0, 1, or more times (e.g. constant, idempotent).

---

**mandatory**

.d_compare() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

.d_hash() calling convention and locking rules are significantly
changed. Read updated documentation in Documentation/filesystems/vfs.rst (and
look at examples of other filesystems) for guidance.

---

**mandatory**

dcache_lock is gone, replaced by fine grained locks. See fs/dcache.c
for details of what locks to replace dcache_lock with in order to protect
particular things. Most of the time, a filesystem only needs ->d_lock, which
protects *all* the dcache state of a given dentry.

---

**mandatory**

Filesystems must RCU-free their inodes, if they can have been accessed
via rcu-walk path walk (basically, if the file can have had a path name in the
vfs namespace).

Even though i_dentry and i_rcu share storage in a union, we will
initialize the former in inode_init_always(), so just leave it alone in
the callback.  It used to be necessary to clean it there, but not anymore
(starting at 3.2).

---

**recommended**

vfs now tries to do path walking in "rcu-walk mode", which avoids
atomic operations and scalability hazards on dentries and inodes (see
Documentation/filesystems/path-lookup.txt). d_hash and d_compare changes
(above) are examples of the changes required to support this. For more complex
filesystem callbacks, the vfs drops out of rcu-walk mode before the fs call, so
no changes are required to the filesystem. However, this is costly and loses
the benefits of rcu-walk mode. We will begin to add filesystem callbacks that
are rcu-walk aware, shown below. Filesystems should take advantage of this
where possible.

---

**mandatory**

d_revalidate is a callback that is made on every path element (if
the filesystem provides it), which used to require dropping out of rcu-walk
mode. It may now be called in rcu-walk mode (nd->flags & LOOKUP_RCU). -ECHILD
should be returned if the filesystem cannot handle rcu-walk. See
Documentation/filesystems/vfs.rst for more details.

permission is an inode permission check that is called on many or all
directory inodes on the way down a path walk (to check for exec permission). It
must now be rcu-walk aware (mask & MAY_NOT_BLOCK).  See
Documentation/filesystems/vfs.rst for more details.
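
The rcu-walk contract for permission can be modelled in a few lines of plain
C.  This is only a sketch: foo_permission_model(), its needs_blocking and
allowed parameters, and the locally defined mask values are stand-ins
invented for the example, not the kernel's prototype.  It shows the one rule
that matters here: when MAY_NOT_BLOCK is set and the check cannot be
completed without sleeping, return -ECHILD so that the walk can be retried
outside rcu-walk mode.

```c
#include <assert.h>
#include <errno.h>

/* Local stand-ins for the kernel's mask bits; treat the values as illustrative. */
#define MAY_EXEC	0x01
#define MAY_NOT_BLOCK	0x80

/*
 * Model of an rcu-walk aware permission check.
 * needs_blocking: the check would have to sleep (e.g. to fetch attributes).
 * allowed:        outcome of the actual permission test once it can run.
 */
int foo_permission_model(int mask, int needs_blocking, int allowed)
{
	/* In rcu-walk mode we must not sleep: bail out and let the caller retry. */
	if ((mask & MAY_NOT_BLOCK) && needs_blocking)
		return -ECHILD;

	return allowed ? 0 : -EACCES;
}
```

A filesystem that can answer from data already in memory never needs the
-ECHILD path; one that cannot simply returns -ECHILD whenever MAY_NOT_BLOCK
is set.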

---

**mandatory**

In ->fallocate() you must check the mode option passed in.  If your
filesystem does not support hole punching (deallocating space in the middle of a
file) you must return -EOPNOTSUPP if FALLOC_FL_PUNCH_HOLE is set in mode.
Currently you can only have FALLOC_FL_PUNCH_HOLE with FALLOC_FL_KEEP_SIZE set,
so the i_size should not change when hole punching, even when punching the end of
a file off.
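
The mode check can be sketched as a small helper, written here as plain C
so it can be built outside the kernel.  foo_fallocate_check_mode() is a
hypothetical name, and rejecting every bit other than FALLOC_FL_KEEP_SIZE
is an assumption about what this imaginary filesystem supports, not
something the text above requires.

```c
#include <assert.h>
#include <errno.h>

/* Same names as the uapi header <linux/falloc.h>; defined here so the sketch builds anywhere. */
#ifndef FALLOC_FL_KEEP_SIZE
#define FALLOC_FL_KEEP_SIZE	0x01
#endif
#ifndef FALLOC_FL_PUNCH_HOLE
#define FALLOC_FL_PUNCH_HOLE	0x02
#endif

/* Returns 0 if the mode is acceptable, -EOPNOTSUPP otherwise. */
int foo_fallocate_check_mode(int mode)
{
	/* This filesystem cannot punch holes, so say so explicitly. */
	if (mode & FALLOC_FL_PUNCH_HOLE)
		return -EOPNOTSUPP;

	/* Assumption: anything beyond KEEP_SIZE is unsupported as well. */
	if (mode & ~FALLOC_FL_KEEP_SIZE)
		return -EOPNOTSUPP;

	return 0;
}
```

A real ->fallocate() would run a check like this first and only then touch
any on-disk state.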

---

**mandatory**

->get_sb() is gone.  Switch to use of ->mount().  Typically it's just
a matter of switching from calling ``get_sb_``... to ``mount_``... and changing
the function type.  If you were doing it manually, just switch from setting
->mnt_root to some pointer to returning that pointer.  On errors return
ERR_PTR(...).
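
For readers unfamiliar with the ERR_PTR() convention, the following is a
userspace re-creation of the idea, not the kernel's own header: a small
negative errno is encoded in the top of the pointer range, so one return
value can carry either a valid pointer or an error.  The cut-down
struct dentry, foo_root and foo_mount_model() are hypothetical names used
only for the illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO	4095

/* Encode a negative errno value as a pointer. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr lies in the last MAX_ERRNO addresses, i.e. encodes an error. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

struct dentry {
	int dummy;	/* stand-in; the real structure is far larger */
};

static struct dentry foo_root;

/* Model of a ->mount() body: return the root dentry or an encoded errno. */
struct dentry *foo_mount_model(int fill_super_err)
{
	if (fill_super_err)
		return ERR_PTR(fill_super_err);
	return &foo_root;
}
```

The caller tests the result with IS_ERR() rather than comparing against
NULL, and extracts the error code with PTR_ERR().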

---

**mandatory**

->permission() and generic_permission() have lost the flags
argument; instead of passing IPERM_FLAG_RCU we add MAY_NOT_BLOCK into mask.

generic_permission() has also lost the check_acl argument; ACL checking
has been taken to VFS and filesystems need to provide a non-NULL ->i_op->get_acl
to read an ACL from disk.

---

**mandatory**

If you implement your own ->llseek() you must handle SEEK_HOLE and
SEEK_DATA.  You can handle this by returning -EINVAL, but it would be nicer to
support it in some way.  The generic handler assumes that the entire file is
data and there is a virtual hole at the end of the file.  So if the provided
offset is less than i_size and SEEK_DATA is specified, return the same offset.
If the above is true for the offset and you are given SEEK_HOLE, return the end
of the file.  If the offset is i_size or greater return -ENXIO in either case.
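
The generic behaviour described above fits in one small function.  The
sketch below is a userspace model under stated assumptions:
foo_seek_hole_data() is a hypothetical helper, offsets are plain long long
rather than loff_t, negative offsets are rejected with -ENXIO, and any
other whence value is answered with -EINVAL because the helper only covers
the two new cases.

```c
#include <assert.h>
#include <errno.h>

/* Linux values; defined here in case the libc headers do not expose them. */
#ifndef SEEK_DATA
#define SEEK_DATA	3
#endif
#ifndef SEEK_HOLE
#define SEEK_HOLE	4
#endif

/*
 * Treat the whole file as data followed by a virtual hole at i_size.
 * Returns the resulting offset, or a negative errno value.
 */
long long foo_seek_hole_data(long long offset, long long i_size, int whence)
{
	if (whence != SEEK_DATA && whence != SEEK_HOLE)
		return -EINVAL;

	/* At or beyond EOF there is neither data nor a further hole. */
	if (offset < 0 || offset >= i_size)
		return -ENXIO;

	/* Inside the file: data starts right here, the hole starts at EOF. */
	return whence == SEEK_DATA ? offset : i_size;
}
```

For a 100-byte file, seeking from offset 10 yields 10 for SEEK_DATA and
100 for SEEK_HOLE, while offset 100 yields -ENXIO for both.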

---

**mandatory**

If you have your own ->fsync() you must make sure to call
filemap_write_and_wait_range() so that all dirty pages are synced out properly.
You must also keep in mind that ->fsync() is not called with i_mutex held
anymore, so if you require i_mutex locking you must make sure to take it and
release it yourself.

---

**mandatory**

d_alloc_root() is gone, along with a lot of bugs caused by code
misusing it.  Replacement: d_make_root(inode).  On success d_make_root(inode)
allocates and returns a new dentry instantiated with the passed in inode.
On failure NULL is returned and the passed in inode is dropped so the reference
to inode is consumed in all cases and failure handling need not do any cleanup
for the inode.  If d_make_root(inode) is passed a NULL inode it returns NULL
and also requires no further error handling. Typical usage is::

	inode = foofs_new_inode(....);
	s->s_root = d_make_root(inode);
	if (!s->s_root)
		/* Nothing needed for the inode cleanup */
		return -ENOMEM;
	...

---

**mandatory**

The witch is dead!  Well, 2/3 of it, anyway.  ->d_revalidate() and
->lookup() do *not* take struct nameidata anymore; just the flags.

---

**mandatory**

->create() doesn't take ``struct nameidata *``; unlike the previous
two, it gets "is it an O_EXCL or equivalent?" boolean argument.  Note that
local filesystems can ignore that argument - they are guaranteed that the
object doesn't exist.  It's remote/distributed ones that might care...

---

**mandatory**

FS_REVAL_DOT is gone; if you used to have it, add ->d_weak_revalidate()
in your dentry operations instead.

---

**mandatory**

vfs_readdir() is gone; switch to iterate_dir() instead

---

**mandatory**

->readdir() is gone now; switch to ->iterate()

---

**mandatory**

vfs_follow_link has been removed.  Filesystems must use nd_set_link
from ->follow_link for normal symlinks, or nd_jump_link for magic
/proc/<pid> style links.

---

**mandatory**

iget5_locked()/ilookup5()/ilookup5_nowait() test() callback used to be
called with both ->i_lock and inode_hash_lock held; the former is *not*
taken anymore, so verify that your callbacks do not rely on it (none
of the in-tree instances did).  inode_hash_lock is still held,
of course, so they are still serialized wrt removal from inode hash,
as well as wrt set() callback of iget5_locked().

---

**mandatory**

d_materialise_unique() is gone; d_splice_alias() does everything you
need now.  Remember that they have opposite orders of arguments ;-/

---

**mandatory**

f_dentry is gone; use f_path.dentry, or, better yet, see if you can avoid
it entirely.

---

**mandatory**

never call ->read() and ->write() directly; use __vfs_{read,write} or
wrappers; instead of checking for ->write or ->read being NULL, look for
FMODE_CAN_{WRITE,READ} in file->f_mode.

---

**mandatory**

do _not_ use new_sync_{read,write} for ->read/->write; leave it NULL
instead.

---

**mandatory**

->aio_read/->aio_write are gone.  Use ->read_iter/->write_iter.

---

**recommended**

for embedded ("fast") symlinks just set inode->i_link to wherever the
symlink body is and use simple_follow_link() as ->follow_link().

---

**mandatory**

calling conventions for ->follow_link() have changed.  Instead of returning
cookie and using nd_set_link() to store the body to traverse, we return
the body to traverse and store the cookie using explicit void ** argument.
nameidata isn't passed at all - nd_jump_link() doesn't need it and
nd_[gs]et_link() is gone.

---

**mandatory**

calling conventions for ->put_link() have changed.  It gets inode instead of
dentry, it does not get nameidata at all and it gets called only when cookie
is non-NULL.  Note that link body isn't available anymore, so if you need it,
store it as cookie.

---

**mandatory**

any symlink that might use page_follow_link_light/page_put_link() must
have inode_nohighmem(inode) called before anything might start playing with
its pagecache.  No highmem pages should end up in the pagecache of such
symlinks.  That includes any preseeding that might be done during symlink
creation.  __page_symlink() will honour the mapping gfp flags, so once
you've done inode_nohighmem() it's safe to use, but if you allocate and
insert the page manually, make sure to use the right gfp flags.

---

**mandatory**

->follow_link() is replaced with ->get_link(); same API, except that

	* ->get_link() gets inode as a separate argument
	* ->get_link() may be called in RCU mode - in that case NULL
	  dentry is passed

---

**mandatory**

->get_link() gets struct delayed_call ``*done`` now, and should do
set_delayed_call() where it used to set ``*cookie``.

->put_link() is gone - just give the destructor to set_delayed_call()
in ->get_link().
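
A userspace sketch of the pattern, under stated assumptions: struct
delayed_call, set_delayed_call() and do_delayed_call() are re-declared locally
and only assumed to match the kernel helpers in shape, while toy_get_link(),
toy_free_link() and the toy_links_freed counter are hypothetical.  The
destructor that used to live in ->put_link() is handed over together with the
body it should release.

```c
#include <stdlib.h>
#include <string.h>

/* Userspace copy of the small helper type (assumed layout). */
struct delayed_call {
	void (*fn)(void *);
	void *arg;
};

static void set_delayed_call(struct delayed_call *call,
			     void (*fn)(void *), void *arg)
{
	call->fn = fn;
	call->arg = arg;
}

/* Run the registered destructor, if any. */
static void do_delayed_call(struct delayed_call *call)
{
	if (call->fn)
		call->fn(call->arg);
}

static int toy_links_freed;	/* bookkeeping for the example only */

/* The cleanup that an old ->put_link() would have done. */
static void toy_free_link(void *body)
{
	free(body);
	toy_links_freed++;
}

/* Hypothetical ->get_link()-like function: returns the body to traverse. */
static const char *toy_get_link(const char *target, struct delayed_call *done)
{
	size_t len = strlen(target) + 1;
	char *body = malloc(len);

	if (!body)
		return NULL;	/* a real instance would return an ERR_PTR() */
	memcpy(body, target, len);

	/* Where the old code stored *cookie, register the destructor. */
	set_delayed_call(done, toy_free_link, body);
	return body;
}
```

The caller traverses the returned body and runs do_delayed_call() once it is
done with it; if nothing was registered, nothing is called.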

---

**mandatory**

->getxattr() and xattr_handler.get() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode.

---

**mandatory**

symlinks are no longer the only inodes that do *not* have i_bdev/i_cdev/
i_pipe/i_link union zeroed out at inode eviction.  As a result, you can't
assume that non-NULL value in ->i_link at ->destroy_inode() implies that
it's a symlink.  Checking ->i_mode is really needed now.  In-tree we had
to fix shmem_destroy_callback() that used to take that kind of shortcut;
watch out, since that shortcut is no longer valid.

---

**mandatory**

->i_mutex is replaced with ->i_rwsem now.  inode_lock() et al. work as
they used to - they just take it exclusive.  However, ->lookup() may be
called with parent locked shared.  Its instances must not

	* use d_instantiate() and d_rehash() separately - use d_add() or
	  d_splice_alias() instead.
	* use d_rehash() alone - call d_add(new_dentry, NULL) instead.
	* assume exclusion for (read-only) access to filesystem data
	  structures - in the unlikely case when that is needed for some
	  reason, arrange it yourself.  None of the in-tree filesystems
	  needed that.
	* rely on ->d_parent and ->d_name not changing after dentry has
	  been fed to d_add() or d_splice_alias().  Again, none of the
	  in-tree instances relied upon that.

We are guaranteed that lookups of the same name in the same directory
will not happen in parallel ("same" in the sense of your ->d_compare()).
Lookups on different names in the same directory can and do happen in
parallel now.

---

**recommended**

->iterate_shared() is added; it's a parallel variant of ->iterate().
Exclusion on struct file level is still provided (as well as that
between it and lseek on the same struct file), but if your directory
has been opened several times, you can get these called in parallel.
Exclusion between that method and all directory-modifying ones is
still provided, of course.

Often enough ->iterate() can serve as ->iterate_shared() without any
changes - it is a read-only operation, after all.  If you have any
per-inode or per-dentry in-core data structures modified by ->iterate(),
you might need something to serialize the access to them.  If you
do dcache pre-seeding, you'll need to switch to d_alloc_parallel() for
that; look for in-tree examples.

Old method is only used if the new one is absent; eventually it will
be removed.  Switch while you still can; the old one won't stay.

---

**mandatory**

->atomic_open() calls without O_CREAT may happen in parallel.

---

**mandatory**

->setxattr() and xattr_handler.set() get dentry and inode passed separately.
dentry might be yet to be attached to inode, so do _not_ use its ->d_inode
in the instances.  Rationale: !@#!@# security_d_instantiate() needs to be
called before we attach dentry to inode and !@#!@##!@$!$#!@#$!@$!@$ smack
->d_instantiate() uses not just ->getxattr() but ->setxattr() as well.

---

**mandatory**

->d_compare() doesn't get parent as a separate argument anymore.  If you
used it for finding the struct super_block involved, dentry->d_sb will
work just as well; if it's something more complicated, use dentry->d_parent.
Just be careful not to assume that fetching it more than once will yield
the same value - in RCU mode it could change under you.

---

**mandatory**

->rename() has an added flags argument.  Any flags not handled by the
filesystem should result in -EINVAL being returned.
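
A sketch of the flag check, as a userspace model.  The TOY_RENAME_* constants
are local stand-ins (assumed to mirror the renameat2(2) flag bits), and both
toy_rename_check_flags() and the choice to support only the "no replace" flag
are hypothetical.

```c
#include <errno.h>

/* Local stand-ins for the rename flag bits (assumed values). */
#define TOY_RENAME_NOREPLACE (1u << 0)
#define TOY_RENAME_EXCHANGE  (1u << 1)
#define TOY_RENAME_WHITEOUT  (1u << 2)

/* This hypothetical filesystem only implements the "no replace" flag. */
#define TOY_RENAME_SUPPORTED TOY_RENAME_NOREPLACE

/* First thing a ->rename() instance would do: reject anything unknown. */
static int toy_rename_check_flags(unsigned int flags)
{
	if (flags & ~TOY_RENAME_SUPPORTED)
		return -EINVAL;
	return 0;
}
```

Masking against the supported set, rather than testing for specific
unsupported bits, also rejects flags that did not exist when the filesystem
was written.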

---

**recommended**

->readlink is optional for symlinks.  Don't set, unless filesystem needs
to fake something for readlink(2).

---

**mandatory**

->getattr() is now passed a struct path rather than a vfsmount and
dentry separately, and it now has request_mask and query_flags arguments
to specify the fields and sync type requested by statx.  Filesystems not
supporting any statx-specific features may ignore the new arguments.

---

**mandatory**

->atomic_open() calling conventions have changed.  Gone is ``int *opened``,
along with FILE_OPENED/FILE_CREATED.  In place of those we have
FMODE_OPENED/FMODE_CREATED, set in file->f_mode.  Additionally, return
value for 'called finish_no_open(), open it yourself' case has become
0, not 1.  Since finish_no_open() itself is returning 0 now, that part
does not need any changes in ->atomic_open() instances.

---

**mandatory**

alloc_file() has become static now; two wrappers are to be used instead.
alloc_file_pseudo(inode, vfsmount, name, flags, ops) is for the cases
when dentry needs to be created; that's the majority of old alloc_file()
users.  Calling conventions: on success a reference to new struct file
is returned and caller's reference to inode is subsumed by that.  On
failure, ERR_PTR() is returned and no caller's references are affected,
so the caller needs to drop the inode reference it held.
alloc_file_clone(file, flags, ops) does not affect any caller's references.
On success you get a new struct file sharing the mount/dentry with the
original, on failure - ERR_PTR().

---

**mandatory**

->clone_file_range() and ->dedupe_file_range() have been replaced with
->remap_file_range().  See Documentation/filesystems/vfs.rst for more
information.

---

**recommended**

->lookup() instances doing an equivalent of::

	if (IS_ERR(inode))
		return ERR_CAST(inode);
	return d_splice_alias(inode, dentry);

don't need to bother with the check - d_splice_alias() will do the
right thing when given ERR_PTR(...) as inode.  Moreover, passing NULL
inode to d_splice_alias() will also do the right thing (equivalent of
d_add(dentry, NULL); return NULL;), so that kind of special case
doesn't need separate treatment either.

---

**strongly recommended**

take the RCU-delayed parts of ->destroy_inode() into a new method -
->free_inode().  If ->destroy_inode() becomes empty - all the better,
just get rid of it.  Synchronous work (e.g. the stuff that can't
be done from an RCU callback, or any WARN_ON() where we want the
stack trace) *might* be movable to ->evict_inode(); however,
that goes only for the things that are not needed to balance something
done by ->alloc_inode().  IOW, if it's cleaning up the stuff that
might have accumulated over the life of in-core inode, ->evict_inode()
might be a fit.

Rules for inode destruction:

	* if ->destroy_inode() is non-NULL, it gets called
	* if ->free_inode() is non-NULL, it gets scheduled by call_rcu()
	* combination of NULL ->destroy_inode and NULL ->free_inode is
	  treated as NULL/free_inode_nonrcu, to preserve the compatibility.
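
The three rules can be modelled as a small decision function.  This is a
userspace sketch and not the VFS code: every toy_* name is hypothetical, and
the function only reports what would happen instead of freeing anything.
That a non-NULL ->destroy_inode() with a NULL ->free_inode() leaves all
freeing to ->destroy_inode() itself is my reading of the rules, not something
the list spells out.

```c
#include <stddef.h>

struct toy_inode { int unused; };

/* Hypothetical cut-down method table with only the two relevant hooks. */
struct toy_super_operations {
	void (*destroy_inode)(struct toy_inode *);
	void (*free_inode)(struct toy_inode *);
};

enum {
	TOY_CALLED_DESTROY = 1 << 0,	/* ->destroy_inode() ran synchronously */
	TOY_RCU_FREE_INODE = 1 << 1,	/* ->free_inode() would be RCU-scheduled */
	TOY_RCU_DEFAULT    = 1 << 2,	/* default free would be RCU-scheduled */
};

/* Do-nothing hook, handy for filling in a method table. */
static void toy_noop(struct toy_inode *inode)
{
	(void)inode;
}

/* Returns a bitmask describing what the destruction path would do. */
static int toy_destroy_inode(const struct toy_super_operations *ops,
			     struct toy_inode *inode)
{
	int actions = 0;

	if (ops->destroy_inode) {
		ops->destroy_inode(inode);
		actions |= TOY_CALLED_DESTROY;
		if (!ops->free_inode)
			return actions;	/* ->destroy_inode() owns the freeing */
	}

	/* Either the filesystem's ->free_inode() or the default is delayed. */
	actions |= ops->free_inode ? TOY_RCU_FREE_INODE : TOY_RCU_DEFAULT;
	return actions;
}
```

With both hooks NULL only the default delayed free is reported, which is the
compatibility case from the last rule.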

Note that the callback (be it via ->free_inode() or explicit call_rcu()
in ->destroy_inode()) is *NOT* ordered wrt superblock destruction;
as a matter of fact, the superblock and all associated structures
might be already gone.  The filesystem driver is guaranteed to be still
there, but that's it.  Freeing memory in the callback is fine; doing
more than that is possible, but requires a lot of care and is best
avoided.

---

**mandatory**

DCACHE_RCUACCESS is gone; having an RCU delay on dentry freeing is the
default.  DCACHE_NORCU opts out, and only d_alloc_pseudo() has any
business doing so.

---

**mandatory**

d_alloc_pseudo() is internal-only; uses outside of alloc_file_pseudo() are
very suspect (and won't work in modules).  Such uses are very likely to
be misspelled d_alloc_anon().