==========================
Reference counting in pnfs
==========================

There are several inter-related caches.  We have layouts which can
reference multiple devices, each of which can reference multiple data servers.
Each data server can be referenced by multiple devices.  Each device
can be referenced by multiple layouts. To keep all of this straight,
we need to reference count.


struct pnfs_layout_hdr
======================

The on-the-wire command LAYOUTGET corresponds to struct
pnfs_layout_segment, usually referred to by the variable name lseg.
Each nfs_inode may hold a pointer to a cache of these layout
segments in nfsi->layout, of type struct pnfs_layout_hdr.

A reference on the header is held by the inode pointing to it, by each
outstanding RPC call that references it (LAYOUTGET, LAYOUTRETURN,
LAYOUTCOMMIT), and by each lseg held within.
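
The three kinds of holder listed above can be modelled in a few lines of
userspace C.  This is only a sketch under stated assumptions: the
``sketch_`` names are hypothetical, a plain ``int`` stands in for whatever
counter type the kernel uses, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct pnfs_layout_hdr. */
struct sketch_layout_hdr {
    int refcount;   /* one per holder: inode, in-flight RPC, lseg */
    bool freed;     /* set when the last reference is dropped */
};

/* Take a reference on behalf of a new holder. */
static void sketch_hdr_get(struct sketch_layout_hdr *lo)
{
    assert(!lo->freed);
    lo->refcount++;
}

/* Drop a reference; returns true when this was the last one. */
static bool sketch_hdr_put(struct sketch_layout_hdr *lo)
{
    assert(lo->refcount > 0);
    if (--lo->refcount == 0) {
        lo->freed = true;   /* a real implementation would free here */
        return true;
    }
    return false;
}

/* Walk through one plausible lifetime and return the final count. */
int sketch_hdr_lifetime(void)
{
    struct sketch_layout_hdr lo = { 0, false };

    sketch_hdr_get(&lo);        /* inode starts pointing at the header */
    sketch_hdr_get(&lo);        /* a LAYOUTGET RPC goes out */
    sketch_hdr_get(&lo);        /* the reply adds an lseg to the header */
    sketch_hdr_put(&lo);        /* the LAYOUTGET RPC completes */
    assert(lo.refcount == 2);   /* inode and one lseg remain */
    sketch_hdr_put(&lo);        /* the lseg goes away */
    sketch_hdr_put(&lo);        /* the inode drops its pointer */
    return lo.refcount;
}
```

In this model the header can only reach a count of zero after the inode,
every outstanding RPC, and every lseg have each dropped their reference.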

Each header is also (when non-empty) put on a list associated with
struct nfs_client (cl_layouts).  Being put on this list does not bump
the reference count, as the layout is kept around by the lseg that
keeps it in the list.

deviceid_cache
==============

lsegs reference device ids, which are resolved per nfs_client and
layout driver type.  The device ids are held in an RCU cache (struct
nfs4_deviceid_cache).  The cache itself is referenced across each
mount.  The entries (struct nfs4_deviceid) themselves are held across
the lifetime of each lseg referencing them.

RCU is used because the deviceid is basically a write-once, read-many
data structure.  The hlist size of 32 buckets needs better
justification, but seems reasonable given that we can have multiple
deviceids per filesystem, and multiple filesystems per nfs_client.
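
A 32-bucket table with per-entry reference counts might look roughly like
the userspace sketch below.  Several things here are assumptions made for
illustration: the ``sketch_`` names, the 16-byte id size, the simple
byte-wise hash (not claimed to match the nfsd-derived one), and the choice
to free an entry on its last put.  RCU and locking are omitted, and the
lookup is keyed by id only rather than per nfs_client and layout driver
type.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_DEVICEID_SIZE 16   /* assumed size of a device id in bytes */
#define SKETCH_HASH_BUCKETS  32   /* bucket count mentioned in the text */

struct sketch_deviceid_node {
    unsigned char id[SKETCH_DEVICEID_SIZE];
    int refcount;                        /* one per lseg using this entry */
    struct sketch_deviceid_node *next;   /* chain within a bucket */
};

struct sketch_deviceid_cache {
    struct sketch_deviceid_node *buckets[SKETCH_HASH_BUCKETS];
};

/* Placeholder byte-wise hash, masked down to a bucket index. */
unsigned int sketch_deviceid_hash(const unsigned char *id)
{
    uint32_t x = 0;
    size_t i;

    for (i = 0; i < SKETCH_DEVICEID_SIZE; i++)
        x = x * 37 + id[i];
    return x & (SKETCH_HASH_BUCKETS - 1);
}

/* Find an entry and take a reference, or add it with a count of one. */
struct sketch_deviceid_node *
sketch_deviceid_get(struct sketch_deviceid_cache *c, const unsigned char *id)
{
    unsigned int b = sketch_deviceid_hash(id);
    struct sketch_deviceid_node *n;

    for (n = c->buckets[b]; n; n = n->next) {
        if (memcmp(n->id, id, SKETCH_DEVICEID_SIZE) == 0) {
            n->refcount++;
            return n;
        }
    }
    n = calloc(1, sizeof(*n));
    if (!n)
        return NULL;
    memcpy(n->id, id, SKETCH_DEVICEID_SIZE);
    n->refcount = 1;
    n->next = c->buckets[b];
    c->buckets[b] = n;
    return n;
}

/* Drop a reference; unlink and free the entry when none remain. */
void sketch_deviceid_put(struct sketch_deviceid_cache *c,
                         struct sketch_deviceid_node *n)
{
    struct sketch_deviceid_node **pp =
        &c->buckets[sketch_deviceid_hash(n->id)];

    assert(n->refcount > 0);
    if (--n->refcount > 0)
        return;
    while (*pp != n)
        pp = &(*pp)->next;
    *pp = n->next;
    free(n);
}
```

Two lsegs that name the same device id would share one node here, each
holding its own reference until it is released.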

The hash code is copied from the nfsd code base.  A discussion of
hashing and variations of this algorithm can be found `here
<http://groups.google.com/group/comp.lang.c/browse_thread/thread/9522965e2b8d3809>`_.

data server cache
=================

File driver devices refer to data servers, which are kept in a module
level cache.  A reference to each data server is held over the lifetime
of the deviceid pointing to it.

lseg
====

Each lseg maintains an extra reference corresponding to the NFS_LSEG_VALID
bit, which holds it in the pnfs_layout_hdr's list.  When the final lseg
is removed from the pnfs_layout_hdr's list, the NFS_LAYOUT_DESTROYED
bit is set, preventing any new lsegs from being added.
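
The pairing of a flag bit with one reference can be sketched as follows.
The ``sketch_`` types, the flag values, and the ``nr_lsegs`` counter that
stands in for the header's list are all hypothetical.  The sketch only
illustrates the rule described above and is not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the two flags live in different flag words. */
#define SKETCH_LSEG_VALID        (1u << 0)  /* lseg is on the header's list */
#define SKETCH_LAYOUT_DESTROYED  (1u << 0)  /* header accepts no new lsegs */

struct sketch_hdr {
    unsigned int flags;
    int nr_lsegs;            /* stands in for the list of lsegs */
};

struct sketch_lseg {
    unsigned int flags;
    int refcount;
    struct sketch_hdr *hdr;
};

/* Add an lseg to the header; list membership holds one reference. */
bool sketch_lseg_insert(struct sketch_hdr *lo, struct sketch_lseg *lseg)
{
    if (lo->flags & SKETCH_LAYOUT_DESTROYED)
        return false;                  /* no new lsegs once destroyed */
    lseg->hdr = lo;
    lseg->refcount = 1;                /* the reference tied to the valid bit */
    lseg->flags |= SKETCH_LSEG_VALID;
    lo->nr_lsegs++;
    return true;
}

/* Clear the valid bit once and drop the reference it stood for. */
void sketch_lseg_invalidate(struct sketch_lseg *lseg)
{
    if (!(lseg->flags & SKETCH_LSEG_VALID))
        return;                        /* already off the list */
    lseg->flags &= ~SKETCH_LSEG_VALID;
    assert(lseg->refcount > 0);
    lseg->refcount--;
    if (--lseg->hdr->nr_lsegs == 0)
        lseg->hdr->flags |= SKETCH_LAYOUT_DESTROYED;
}
```

Because the valid bit is tested before the count is touched, invalidating
the same lseg twice drops the list's reference only once.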

layout drivers
==============

pNFS uses layout drivers. The standard defines four basic layout
types: "files", "objects", "blocks", and "flexfiles". For each of
these types there is a layout driver with a common table of function
vectors, which the NFS client's pNFS core calls to implement the
different layout types.
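
The idea of a common function-vector table can be illustrated with a
small userspace sketch.  The structure, its single ``read`` member, the
driver functions and their return codes are all made up for the example.
The real table has many more operations and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down operations table shared by all drivers. */
struct sketch_layoutdriver {
    const char *name;                      /* layout type this driver serves */
    int (*read)(long offset, long count);  /* driver-specific read path */
};

static int sketch_files_read(long offset, long count)
{
    (void)offset;
    (void)count;
    return 1;   /* pretend the I/O went to file data servers */
}

static int sketch_blocks_read(long offset, long count)
{
    (void)offset;
    (void)count;
    return 2;   /* pretend the I/O went to block devices */
}

/* Each driver contributes one filled-in table. */
static const struct sketch_layoutdriver sketch_drivers[] = {
    { "files",  sketch_files_read  },
    { "blocks", sketch_blocks_read },
};

/* Core-side dispatch: find the driver, then call through its table. */
int sketch_pnfs_read(const char *layout_type, long offset, long count)
{
    size_t i;

    assert(layout_type != NULL);
    for (i = 0; i < sizeof(sketch_drivers) / sizeof(sketch_drivers[0]); i++) {
        if (strcmp(sketch_drivers[i].name, layout_type) == 0)
            return sketch_drivers[i].read(offset, count);
    }
    return -1;  /* no driver available for this layout type */
}
```

The core never needs to know which driver it is talking to. It only looks
up the table for the layout type and calls through the pointer.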

Files-layout-driver code is in: fs/nfs/filelayout/.. directory
Blocks-layout-driver code is in: fs/nfs/blocklayout/.. directory
Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory

blocks-layout setup
===================

TODO: Document the setup needs of the blocks layout driver