.. SPDX-License-Identifier: GPL-2.0

=================================
Network Filesystem Helper Library
=================================

.. Contents:

 - Overview.
 - Per-inode context.
   - Inode context helper functions.
 - Buffered read helpers.
   - Read helper functions.
   - Read helper structures.
   - Read helper operations.
   - Read helper procedure.
   - Read helper cache API.


Overview
========

The network filesystem helper library is a set of functions designed to aid a
network filesystem in implementing VM/VFS operations.  For the moment, that
just includes turning various VM buffered read operations into requests to read
from the server.  The helper library, however, can also interpose other
services, such as local caching or local data encryption.

Note that the library module doesn't link against local caching directly, so
access must be provided by the netfs.


Per-Inode Context
=================

The network filesystem helper library needs a place to store a bit of state for
its use on each netfs inode it is helping to manage.  To this end, a context
structure is defined::

	struct netfs_i_context {
		const struct netfs_request_ops *ops;
		struct fscache_cookie	*cache;
	};

A network filesystem that wants to use netfslib must place one of these
directly after the VFS ``struct inode`` it allocates, usually as part of its
own struct.  This can be done in a way similar to the following::

	struct my_inode {
		struct {
			/* These must be contiguous */
			struct inode		vfs_inode;
			struct netfs_i_context  netfs_ctx;
		};
		...
	};

This allows netfslib to find its state by simple offset from the inode pointer,
thereby allowing the netfslib helper functions to be pointed to directly by the
VFS/VM operation tables.

The structure contains the following fields:

 * ``ops``

   The set of operations provided by the network filesystem to netfslib.

 * ``cache``

   Local caching cookie, or NULL if no caching is enabled.  This field does not
   exist if fscache is disabled.


Inode Context Helper Functions
------------------------------

To help deal with the per-inode context, a number of helper functions are
provided.  Firstly, a function to perform basic initialisation on a context and
set the operations table pointer::

	void netfs_i_context_init(struct inode *inode,
				  const struct netfs_request_ops *ops);

then two functions to cast between the VFS inode structure and the netfs
context::

	struct netfs_i_context *netfs_i_context(struct inode *inode);
	struct inode *netfs_inode(struct netfs_i_context *ctx);

and finally, a function to get the cache cookie pointer from the context
attached to an inode (or NULL if fscache is disabled)::

	struct fscache_cookie *netfs_i_cookie(struct inode *inode);

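As a rough illustration of the offset-based lookup that the contiguous layout
makes possible, the following userspace sketch models the two casting helpers.
The ``struct inode`` and ``struct netfs_i_context`` definitions are cut-down
stand-ins, and ``model_i_context()`` and ``model_inode()`` are hypothetical
names chosen for this example; they are not the kernel functions themselves.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel types (assumption: the real ones are
 * far larger; only the layout relationship matters here). */
struct inode {
	unsigned long i_ino;
};

struct netfs_request_ops;

struct netfs_i_context {
	const struct netfs_request_ops *ops;
	void *cache;		/* stands in for struct fscache_cookie * */
};

/* Hypothetical filesystem inode laid out as the text requires. */
struct my_inode {
	struct {
		/* These must be contiguous */
		struct inode		vfs_inode;
		struct netfs_i_context	netfs_ctx;
	};
	int my_private_state;
};

/* The lookup only works if there is no padding between the two members. */
_Static_assert(offsetof(struct my_inode, netfs_ctx) == sizeof(struct inode),
	       "netfs context must directly follow the VFS inode");

/* Model of the inode-to-context cast: step over the VFS inode. */
static struct netfs_i_context *model_i_context(struct inode *inode)
{
	return (struct netfs_i_context *)((char *)inode + sizeof(*inode));
}

/* Model of the context-to-inode cast: step back by the size of the inode. */
static struct inode *model_inode(struct netfs_i_context *ctx)
{
	return (struct inode *)((char *)ctx - sizeof(struct inode));
}
```

Because the context is found purely by offset, no pointer from the inode to
the context needs to be stored, which is the property the layout rule exists
to guarantee.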

Buffered Read Helpers
=====================

The library provides a set of read helpers that handle the ->readpage(),
->readahead() and much of the ->write_begin() VM operations and translate them
into a common call framework.

The following services are provided:

 * Handle folios that span multiple pages.

 * Insulate the netfs from VM interface changes.

 * Allow the netfs to arbitrarily split reads up into pieces, even ones that
   don't match folio sizes or folio alignments and that may cross folios.

 * Allow the netfs to expand a readahead request in both directions to meet its
   needs.

 * Allow the netfs to partially fulfil a read, which will then be resubmitted.

 * Handle local caching, allowing cached data and server-read data to be
   interleaved for a single request.

 * Handle clearing of bufferage that isn't on the server.

 * Handle retrying of reads that failed, switching reads from the cache to the
   server as necessary.

 * In the future, this is a place where other services can be performed, such
   as local encryption of data to be stored remotely or in the cache.

From the network filesystem, the helpers require a table of operations.  This
includes a mandatory method to issue a read operation along with a number of
optional methods.


Read Helper Functions
---------------------

Three read helpers are provided::

	void netfs_readahead(struct readahead_control *ractl);
	int netfs_readpage(struct file *file,
			   struct page *page);
	int netfs_write_begin(struct file *file,
			      struct address_space *mapping,
			      loff_t pos,
			      unsigned int len,
			      unsigned int flags,
			      struct folio **_folio,
			      void **_fsdata);

Each corresponds to a VM address space operation.  These operations use the
state in the per-inode context.

For ->readahead() and ->readpage(), the network filesystem can just point
directly at the corresponding read helper; whereas for ->write_begin(), it may
be a little more complicated as the network filesystem might want to flush
conflicting writes or track dirty data and needs to put the acquired folio if
an error occurs after calling the helper.

The helpers manage the read request, calling back into the network filesystem
through the supplied table of operations.  Waits will be performed as
necessary before returning for helpers that are meant to be synchronous.

If an error occurs and netfs_priv is non-NULL, ops->cleanup() will be called to
deal with it.  If some parts of the request are in progress when an error
occurs, the request will get partially completed if sufficient data is read.

Additionally, there is::

	void netfs_subreq_terminated(struct netfs_io_subrequest *subreq,
				     ssize_t transferred_or_error,
				     bool was_async);

which should be called to complete a read subrequest.  This is given the number
of bytes transferred or a negative error code, plus a flag indicating whether
the operation was asynchronous (i.e. whether the follow-on processing can be
done in the current context, given this may involve sleeping).

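The ``transferred_or_error`` argument folds two outcomes into one signed
value.  The userspace sketch below shows one way such a value might be
unpacked; ``struct model_term_result`` and ``model_decode_termination()`` are
hypothetical names for this example and do not exist in the kernel.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>	/* ssize_t */

/* Hypothetical unpacked form of a termination report. */
struct model_term_result {
	bool	failed;		/* true if a negative error code was given */
	int	error;		/* the error code, or 0 */
	size_t	transferred;	/* bytes transferred, or 0 on failure */
};

/* Split the combined value: negative means error, otherwise a byte count. */
static struct model_term_result model_decode_termination(ssize_t transferred_or_error)
{
	struct model_term_result r = { false, 0, 0 };

	if (transferred_or_error < 0) {
		r.failed = true;
		r.error = (int)transferred_or_error;
	} else {
		r.transferred = (size_t)transferred_or_error;
	}
	return r;
}
```

Note that a count of zero is not an error in this encoding; it is simply a read
that made no progress.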

Read Helper Structures
----------------------

The read helpers make use of a couple of structures to maintain the state of
the read.  The first is a structure that manages a read request as a whole::

	struct netfs_io_request {
		struct inode		*inode;
		struct address_space	*mapping;
		struct netfs_cache_resources cache_resources;
		void			*netfs_priv;
		loff_t			start;
		size_t			len;
		loff_t			i_size;
		const struct netfs_request_ops *netfs_ops;
		unsigned int		debug_id;
		...
	};

The above fields are the ones the netfs can use.  They are:

 * ``inode``
 * ``mapping``

   The inode and the address space of the file being read from.  The mapping
   may or may not point to inode->i_data.

 * ``cache_resources``

   Resources for the local cache to use, if present.

 * ``netfs_priv``

   The network filesystem's private data.  The value for this can be passed in
   to the helper functions or set during the request.  The ->cleanup() op will
   be called if this is non-NULL at the end.

 * ``start``
 * ``len``

   The file position of the start of the read request and the length.  These
   may be altered by the ->expand_readahead() op.

 * ``i_size``

   The size of the file at the start of the request.

 * ``netfs_ops``

   A pointer to the operation table.  The value for this is passed into the
   helper functions.

 * ``debug_id``

   A number allocated to this operation that can be displayed in trace lines
   for reference.


The second structure is used to manage individual slices of the overall read
request::

	struct netfs_io_subrequest {
		struct netfs_io_request *rreq;
		loff_t			start;
		size_t			len;
		size_t			transferred;
		unsigned long		flags;
		unsigned short		debug_index;
		...
	};

Each subrequest is expected to access a single source, though the helpers will
handle falling back from one source type to another.  The members are:

 * ``rreq``

   A pointer to the read request.

 * ``start``
 * ``len``

   The file position of the start of this slice of the read request and the
   length.

 * ``transferred``

   The amount of data transferred so far of the length of this slice.  The
   network filesystem or cache should start the operation this far into the
   slice.  If a short read occurs, the helpers will call again, having updated
   this to reflect the amount read so far.

 * ``flags``

   Flags pertaining to the read.  There are two of interest to the filesystem
   or cache:

   * ``NETFS_SREQ_CLEAR_TAIL``

     This can be set to indicate that the remainder of the slice, from
     transferred to len, should be cleared.

   * ``NETFS_SREQ_SEEK_DATA_READ``

     This is a hint to the cache that it might want to try skipping ahead to
     the next data (i.e. using SEEK_DATA).

 * ``debug_index``

   A number allocated to this slice that can be displayed in trace lines for
   reference.

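The relationship between ``start``, ``len`` and ``transferred`` described above
can be sketched in userspace as follows.  ``struct model_subreq`` is a
hypothetical, cut-down model of the subrequest (with ``long long`` standing in
for ``loff_t`` and a plain boolean standing in for ``NETFS_SREQ_CLEAR_TAIL``),
and the helper names are invented for this example.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, cut-down model of struct netfs_io_subrequest. */
struct model_subreq {
	long long	start;		/* file position of the slice */
	size_t		len;		/* length of the slice */
	size_t		transferred;	/* bytes read so far */
	bool		clear_tail;	/* models NETFS_SREQ_CLEAR_TAIL */
};

/* File position at which the next read attempt for this slice begins. */
static long long model_resume_pos(const struct model_subreq *s)
{
	return s->start + (long long)s->transferred;
}

/* Number of bytes of the slice that are still outstanding. */
static size_t model_remaining(const struct model_subreq *s)
{
	return s->len - s->transferred;
}

/* Bytes to be zeroed rather than read: the range from transferred to len
 * if the clear-tail flag is set, otherwise nothing. */
static size_t model_tail_to_clear(const struct model_subreq *s)
{
	return s->clear_tail ? model_remaining(s) : 0;
}
```

For instance, a 4096-byte slice at position 8192 that has transferred 1024
bytes would resume at 9216 with 3072 bytes outstanding.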

Read Helper Operations
----------------------

The network filesystem must provide the read helpers with a table of operations
through which it can issue requests and negotiate::

	struct netfs_request_ops {
		void (*init_request)(struct netfs_io_request *rreq, struct file *file);
		int (*begin_cache_operation)(struct netfs_io_request *rreq);
		void (*expand_readahead)(struct netfs_io_request *rreq);
		bool (*clamp_length)(struct netfs_io_subrequest *subreq);
		void (*issue_read)(struct netfs_io_subrequest *subreq);
		bool (*is_still_valid)(struct netfs_io_request *rreq);
		int (*check_write_begin)(struct file *file, loff_t pos, unsigned len,
					 struct folio *folio, void **_fsdata);
		void (*done)(struct netfs_io_request *rreq);
		void (*cleanup)(struct address_space *mapping, void *netfs_priv);
	};

The operations are as follows:

 * ``init_request()``

   [Optional] This is called to initialise the request structure.  It is given
   the file for reference and can modify the ->netfs_priv value.

 * ``begin_cache_operation()``

   [Optional] This is called to ask the network filesystem to call into the
   cache (if present) to initialise the caching state for this read.  The netfs
   library module cannot access the cache directly, so the network filesystem
   should call something like fscache_begin_read_operation() to do this.

   The cache gets to store its state in ->cache_resources and must set a table
   of operations of its own there (though of a different type).

   This should return 0 on success and an error code otherwise.  If an error is
   reported, the operation may proceed anyway, just without local caching (only
   out of memory and interruption errors cause failure here).

 * ``expand_readahead()``

   [Optional] This is called to allow the filesystem to expand the size of a
   readahead read request.  The filesystem gets to expand the request in both
   directions, though it's not permitted to reduce it as the numbers may
   represent an allocation already made.  If local caching is enabled, it gets
   to expand the request first.

   Expansion is communicated by changing ->start and ->len in the request
   structure.  Note that if any change is made, ->len must be increased by at
   least as much as ->start is reduced.

 * ``clamp_length()``

   [Optional] This is called to allow the filesystem to reduce the size of a
   subrequest.  The filesystem can use this, for example, to chop up a request
   that has to be split across multiple servers or to put multiple reads in
   flight.

   This should return true on success and false on error.

 * ``issue_read()``

   [Required] The helpers use this to dispatch a subrequest to the server for
   reading.  In the subrequest, ->start, ->len and ->transferred indicate what
   data should be read from the server.

   There is no return value; the netfs_subreq_terminated() function should be
   called to indicate whether or not the operation succeeded and how much data
   it transferred.  The filesystem also should not deal with setting folios
   uptodate, unlocking them or dropping their refs - the helpers need to deal
   with this as they have to coordinate with copying to the local cache.

   Note that the helpers have the folios locked, but not pinned.  It is
   possible to use the ITER_XARRAY iov iterator to refer to the range of the
   inode that is being operated upon without the need to allocate large bvec
   tables.

 * ``is_still_valid()``

   [Optional] This is called to find out if the data just read from the local
   cache is still valid.  It should return true if it is still valid and false
   if not.  If it's not still valid, it will be reread from the server.

 * ``check_write_begin()``

   [Optional] This is called from the netfs_write_begin() helper once it has
   allocated/grabbed the folio to be modified to allow the filesystem to flush
   conflicting state before allowing it to be modified.

   It should return 0 if everything is now fine, -EAGAIN if the folio should be
   regrabbed and any other error code to abort the operation.

 * ``done()``

   [Optional] This is called after the folios in the request have all been
   unlocked (and marked uptodate if applicable).

 * ``cleanup()``

   [Optional] This is called as the request is being deallocated so that the
   filesystem can clean up ->netfs_priv.

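The constraints on ``expand_readahead()`` and ``clamp_length()`` can be
illustrated with a small userspace sketch.  It assumes a filesystem that wants
requests aligned to some granule size and subrequests limited to some maximum
read size; ``struct model_request``, ``model_expand()`` and
``model_clamp_length()`` are hypothetical names, and ``long long`` stands in
for ``loff_t``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ->start and ->len fields of a request. */
struct model_request {
	long long	start;
	size_t		len;
};

/* Expand a request outwards so that both ends fall on granule boundaries.
 * Assumes start >= 0 and granule > 0.  The request only ever grows, and
 * len grows by at least as much as start is reduced. */
static void model_expand(struct model_request *rreq, size_t granule)
{
	long long g = (long long)granule;
	long long end = rreq->start + (long long)rreq->len;
	long long new_start = rreq->start - rreq->start % g;
	long long new_end = (end + g - 1) / g * g;

	/* The rule from the text: never shrink the request. */
	assert(new_start <= rreq->start && new_end >= end);

	rreq->start = new_start;
	rreq->len = (size_t)(new_end - new_start);
}

/* Limit a slice to at most rsize bytes.  Returns false if no valid limit
 * is available (an assumption made for this sketch). */
static bool model_clamp_length(size_t *len, size_t rsize)
{
	if (rsize == 0)
		return false;
	if (*len > rsize)
		*len = rsize;
	return true;
}
```

With a 4096-byte granule, a request for 3000 bytes at position 5000 would
become a request for 4096 bytes at position 4096: the start moves down by 904
and the length grows by 1096, satisfying the rule that the length must grow by
at least as much as the start is reduced.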


Read Helper Procedure
---------------------

The read helpers work by the following general procedure:

 * Set up the request.

 * For readahead, allow the local cache and then the network filesystem to
   propose expansions to the read request.  This is then proposed to the VM.
   If the VM cannot fully perform the expansion, a partially expanded read will
   be performed, though this may not get written to the cache in its entirety.

 * Loop around slicing chunks off of the request to form subrequests:

   * If a local cache is present, it gets to do the slicing, otherwise the
     helpers just try to generate maximal slices.

   * The network filesystem gets to clamp the size of each slice if it is to be
     the source.  This allows rsize and chunking to be implemented.

   * The helpers issue a read from the cache or a read from the server or just
     clear the slice as appropriate.

   * The next slice begins at the end of the last one.

   * As slices finish being read, they terminate.

 * When all the subrequests have terminated, the subrequests are assessed and
   any that are short or have failed are reissued:

   * Failed cache requests are issued against the server instead.

   * Failed server requests just fail.

   * Short reads against either source will be reissued against that source
     provided they have transferred some more data:

     * The cache may need to skip holes that it can't do DIO from.

     * If NETFS_SREQ_CLEAR_TAIL was set, a short read will be cleared to the
       end of the slice instead of reissuing.

 * Once the data is read, the folios that have been fully read/cleared:

   * Will be marked uptodate.

   * If a cache is present, will be marked with PG_fscache.

   * Will be unlocked.

 * Any folios that need writing to the cache will then have DIO writes issued.

 * Synchronous operations will wait for reading to be complete.

 * Writes to the cache will proceed asynchronously and the folios will have the
   PG_fscache mark removed when that completes.

 * The request structures will be cleaned up when everything has completed.

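The slicing loop and the reissue rules in the procedure above can be modelled
roughly as follows.  This is a userspace sketch for the no-cache case only,
with a fixed maximum slice size standing in for whatever ``clamp_length()``
would impose; every name in it is hypothetical.  The text does not say what
happens to a short read that made no progress, so treating that as a failure
here is an assumption.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callback invoked once for each slice that is generated. */
typedef void (*model_issue_fn)(long long start, size_t len, void *priv);

/* Slice a request into consecutive subrequests of at most max_slice bytes.
 * Each slice begins where the previous one ended.  Returns the number of
 * slices generated. */
static unsigned int model_slice_request(long long start, size_t len,
					size_t max_slice,
					model_issue_fn issue, void *priv)
{
	unsigned int nr = 0;

	if (max_slice == 0)
		return 0;
	while (len > 0) {
		size_t slice = len < max_slice ? len : max_slice;

		if (issue)
			issue(start, slice, priv);
		start += (long long)slice;
		len -= slice;
		nr++;
	}
	return nr;
}

enum model_source { MODEL_FROM_CACHE, MODEL_FROM_SERVER };

enum model_action {
	MODEL_COMPLETE,			/* slice fully read */
	MODEL_REISSUE_SAME_SOURCE,	/* short read that made progress */
	MODEL_REISSUE_TO_SERVER,	/* cache read failed */
	MODEL_CLEAR_TAIL,		/* short read with clear-tail set */
	MODEL_FAIL,			/* server read failed */
};

/* Decide what to do with a terminated subrequest, following the bullet
 * points in the procedure. */
static enum model_action model_assess(enum model_source src, bool failed,
				      bool is_short, bool made_progress,
				      bool clear_tail)
{
	if (failed)
		return src == MODEL_FROM_CACHE ?
			MODEL_REISSUE_TO_SERVER : MODEL_FAIL;
	if (!is_short)
		return MODEL_COMPLETE;
	if (clear_tail)
		return MODEL_CLEAR_TAIL;
	if (made_progress)
		return MODEL_REISSUE_SAME_SOURCE;
	return MODEL_FAIL;	/* assumption: not specified by the text */
}
```

A 10000-byte request with a 4096-byte limit would be cut into three slices of
4096, 4096 and 1808 bytes.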

Read Helper Cache API
---------------------

When implementing a local cache to be used by the read helpers, two things are
required: some way for the network filesystem to initialise the caching for a
read request and a table of operations for the helpers to call.

The network filesystem's ->begin_cache_operation() method is called to set up a
cache and this must call into the cache to do the work.  If using fscache, for
example, the network filesystem would call::

	int fscache_begin_read_operation(struct netfs_io_request *rreq,
					 struct fscache_cookie *cookie);

passing in the request pointer and the cookie corresponding to the file.

The netfs_io_request object contains a place for the cache to hang its
state::

	struct netfs_cache_resources {
		const struct netfs_cache_ops	*ops;
		void				*cache_priv;
		void				*cache_priv2;
	};

This contains an operations table pointer and two private pointers.  The
operation table looks like the following::

	struct netfs_cache_ops {
		void (*end_operation)(struct netfs_cache_resources *cres);

		void (*expand_readahead)(struct netfs_cache_resources *cres,
					 loff_t *_start, size_t *_len, loff_t i_size);

		enum netfs_io_source (*prepare_read)(struct netfs_io_subrequest *subreq,
						       loff_t i_size);

		int (*read)(struct netfs_cache_resources *cres,
			    loff_t start_pos,
			    struct iov_iter *iter,
			    bool seek_data,
			    netfs_io_terminated_t term_func,
			    void *term_func_priv);

		int (*prepare_write)(struct netfs_cache_resources *cres,
				     loff_t *_start, size_t *_len, loff_t i_size,
				     bool no_space_allocated_yet);

		int (*write)(struct netfs_cache_resources *cres,
			     loff_t start_pos,
			     struct iov_iter *iter,
			     netfs_io_terminated_t term_func,
			     void *term_func_priv);

		int (*query_occupancy)(struct netfs_cache_resources *cres,
				       loff_t start, size_t len, size_t granularity,
				       loff_t *_data_start, size_t *_data_len);
	};

With a termination handler function pointer::

	typedef void (*netfs_io_terminated_t)(void *priv,
					      ssize_t transferred_or_error,
					      bool was_async);

The methods defined in the table are:

 * ``end_operation()``

   [Required] Called to clean up the resources at the end of the read request.

 * ``expand_readahead()``

   [Optional] Called at the beginning of a netfs_readahead() operation to allow
   the cache to expand a request in either direction.  This allows the cache to
   size the request appropriately for the cache granularity.

   The function is passed pointers to the start and length in its parameters,
   plus the size of the file for reference, and adjusts the start and length
   appropriately.

 * ``prepare_read()``

   [Required] Called to configure the next slice of a request.  ->start and
   ->len in the subrequest indicate where and how big the next slice can be;
   the cache gets to reduce the length to match its granularity requirements.

   It should return one of:

   * ``NETFS_FILL_WITH_ZEROES``
   * ``NETFS_DOWNLOAD_FROM_SERVER``
   * ``NETFS_READ_FROM_CACHE``
   * ``NETFS_INVALID_READ``

   to indicate whether the slice should just be cleared or whether it should be
   downloaded from the server or read from the cache - or whether slicing
   should be given up at the current point.

 * ``read()``

   [Required] Called to read from the cache.  The start file offset is given
   along with an iterator to read to, which gives the length also.  It can be
   given a hint requesting that it seek forward from that start position for
   data.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``prepare_write()``

   [Required] Called to prepare a write to the cache to take place.  This
   involves checking to see whether the cache has sufficient space to honour
   the write.  ``*_start`` and ``*_len`` indicate the region to be written; the
   region can be shrunk or it can be expanded to a page boundary either way as
   necessary to align for direct I/O.  i_size holds the size of the object and
   is provided for reference.  no_space_allocated_yet is set to true if the
   caller is certain that no data has been written to that region - for example
   if it tried to do a read from there already.

 * ``write()``

   [Required] Called to write to the cache.  The start file offset is given
   along with an iterator to write from, which gives the length also.

   Also provided is a pointer to a termination handler function and private
   data to pass to that function.  The termination function should be called
   with the number of bytes transferred or an error code, plus a flag
   indicating whether the termination is definitely happening in the caller's
   context.

 * ``query_occupancy()``

   [Required] Called to find out where the next piece of data is within a
   particular region of the cache.  The start and length of the region to be
   queried are passed in, along with the granularity to which the answer needs
   to be aligned.  The function passes back the start and length of the data,
   if any, available within that region.  Note that there may be a hole at the
   front.

   It returns 0 if some data was found, -ENODATA if there was no usable data
   within the region or -ENOBUFS if there is no caching on this file.

Note that these methods are passed a pointer to the cache resource structure,
not the read request structure as they could be used in other situations where
there isn't a read request structure as well, such as writing dirty data to the
cache.

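To make the ``query_occupancy()`` contract more concrete, the userspace sketch
below models a cache object as an array of flags, one per granule, and reports
the first run of data within a queried region.  This is purely illustrative:
``model_query_occupancy()`` and the flag array are hypothetical, the queried
start and length are assumed to be non-negative multiples of the granularity,
``long long`` stands in for ``loff_t``, and the -ENOBUFS case is not modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* present[i] is true if granule i of the hypothetical cache object holds
 * data.  On success, passes back the start and length of the first run of
 * data in the region, which may lie beyond a hole at the front. */
static int model_query_occupancy(const bool *present, size_t nr_granules,
				 long long start, size_t len,
				 size_t granularity,
				 long long *_data_start, size_t *_data_len)
{
	size_t i, end, run;

	if (granularity == 0 || start < 0)
		return -EINVAL;

	i = (size_t)start / granularity;
	end = ((size_t)start + len) / granularity;
	if (end > nr_granules)
		end = nr_granules;

	/* Skip any hole at the front of the region. */
	while (i < end && !present[i])
		i++;
	if (i >= end)
		return -ENODATA;

	/* Measure the run of consecutive granules that hold data. */
	for (run = 1; i + run < end && present[i + run]; run++)
		;

	*_data_start = (long long)(i * granularity);
	*_data_len = run * granularity;
	return 0;
}
```

Given four 4096-byte granules of which only the middle two hold data, a query
over the whole 16384-byte region would pass back a data start of 4096 and a
data length of 8192, while a query over the last granule alone would give
-ENODATA.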

API Function Reference
======================

.. kernel-doc:: include/linux/netfs.h
.. kernel-doc:: fs/netfs/buffered_read.c
.. kernel-doc:: fs/netfs/io.c
