	$Id: README.Locking,v 1.9 2004/11/20 10:35:40 dwmw2 Exp $

	JFFS2 LOCKING DOCUMENTATION
	---------------------------

At least theoretically, JFFS2 does not require the Big Kernel Lock
(BKL), which was always helpfully obtained for it by Linux 2.4 VFS
code. It has its own locking, as described below.

This document attempts to describe the existing locking rules for
JFFS2. It is not expected to remain perfectly up to date, but ought to
be fairly close.


	alloc_sem
	---------

The alloc_sem is a per-filesystem semaphore, used primarily to ensure
contiguous allocation of space on the medium. It is automatically
obtained during space allocations (jffs2_reserve_space()) and freed
upon write completion (jffs2_complete_reservation()). Note that
the garbage collector will obtain this right at the beginning of
jffs2_garbage_collect_pass() and release it at the end, thereby
preventing any other write activity on the file system during a
garbage collect pass.

When writing new nodes, the alloc_sem must be held until the new nodes
have been properly linked into the data structures for the inode to
which they belong. This is for the benefit of NAND flash - adding new
nodes to an inode may obsolete old ones, and by holding the alloc_sem
until this happens we ensure that any data in the write-buffer at the
time this happens are part of the new node, not just something that
was written afterwards. Hence, we can ensure the newly-obsoleted nodes
don't actually get erased until the write-buffer has been flushed to
the medium.
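
The reserve, link, release sequence described above can be sketched
as a small userspace model. This is not kernel code: a pthread mutex
stands in for the alloc_sem, and every name beginning with model_ or
MODEL_ is a hypothetical stand-in invented for this illustration. The
"new node obsoletes the previous one" step is a deliberate
simplification.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Userspace model only: hypothetical stand-ins, not the kernel's
 * struct jffs2_sb_info or struct jffs2_inode_info. */
struct model_node {
	struct model_node *next;
	int obsolete;		/* set once a newer node supersedes this one */
};

struct model_inode {
	struct model_node *nodes;	/* newest node first */
};

struct model_fs {
	pthread_mutex_t alloc_sem;	/* stands in for c->alloc_sem */
	int alloc_held;			/* bookkeeping for the assertion below */
};

#define MODEL_FS_INIT { PTHREAD_MUTEX_INITIALIZER, 0 }

/* Plays the role of jffs2_reserve_space(): returns with alloc_sem held. */
static void model_reserve_space(struct model_fs *c)
{
	pthread_mutex_lock(&c->alloc_sem);
	c->alloc_held = 1;
}

/* Plays the role of jffs2_complete_reservation(): drops alloc_sem. */
static void model_complete_reservation(struct model_fs *c)
{
	c->alloc_held = 0;
	pthread_mutex_unlock(&c->alloc_sem);
}

/* Link a freshly written node into its inode. The alloc_sem must still
 * be held here, because linking may obsolete an older node. */
static void model_link_node(struct model_fs *c, struct model_inode *f,
			    struct model_node *n)
{
	assert(c->alloc_held);
	if (f->nodes)
		f->nodes->obsolete = 1;	/* simplification: newest wins */
	n->obsolete = 0;
	n->next = f->nodes;
	f->nodes = n;
}

/* Whole write path: reserve, write, link, and only then release. */
static void model_write_node(struct model_fs *c, struct model_inode *f,
			     struct model_node *n)
{
	model_reserve_space(c);
	/* ... the node would be written to the medium here ... */
	model_link_node(c, f, n);
	model_complete_reservation(c);
}
```

The point of the ordering in model_write_node() is that the lock is
released only after the node is linked, never between the write and
the link.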

With the introduction of NAND flash support and the write-buffer,
the alloc_sem is also used to protect the wbuf-related members of the
jffs2_sb_info structure. Atomically reading the wbuf_len member to see
if the wbuf is currently holding any data is permitted, though.

Ordering constraints: See f->sem.


	File Semaphore f->sem
	---------------------

This is the JFFS2-internal equivalent of the inode semaphore i->i_sem.
It protects the contents of the jffs2_inode_info private inode data,
including the linked list of node fragments (but see the notes below on
erase_completion_lock), etc.

The reason that the i_sem itself isn't used for this purpose is to
avoid deadlocks with garbage collection -- the VFS will lock the i_sem
before calling a function which may need to allocate space. The
allocation may trigger garbage-collection, which may need to move a
node belonging to the inode which was locked in the first place by the
VFS. If the garbage collection code were to attempt to lock the i_sem
of the inode from which it's garbage-collecting a physical node, this
would lead to deadlock, unless we played games with unlocking the i_sem
before calling the space allocation functions.

Instead of playing such games, we just have an extra internal
semaphore, which is obtained by the garbage collection code and also
by the normal file system code _after_ allocation of space.

Ordering constraints:

	1. Never attempt to allocate space or lock alloc_sem with
	   any f->sem held.
	2. Never attempt to lock two file semaphores in one thread.
	   No ordering rules have been made for doing so.
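
The two ordering constraints above can be expressed as a small
userspace checker. This is a sketch under stated assumptions, not
something JFFS2 itself contains: pthread mutexes stand in for the
alloc_sem and f->sem, and the model_ functions and the per-thread
counter are hypothetical helpers invented for this illustration.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical userspace stand-ins for c->alloc_sem and f->sem. */
struct model_fs    { pthread_mutex_t alloc_sem; };
struct model_inode { pthread_mutex_t sem; };

/* Number of file semaphores held by the calling thread. */
static _Thread_local int fsems_held;

/* Returns 0 on success, -1 if locking alloc_sem now would break rule 1. */
static int model_lock_alloc(struct model_fs *c)
{
	if (fsems_held > 0)
		return -1;	/* rule 1: no f->sem may be held */
	pthread_mutex_lock(&c->alloc_sem);
	return 0;
}

static void model_unlock_alloc(struct model_fs *c)
{
	pthread_mutex_unlock(&c->alloc_sem);
}

/* Returns 0 on success, -1 if locking f->sem now would break rule 2. */
static int model_lock_fsem(struct model_inode *f)
{
	if (fsems_held > 0)
		return -1;	/* rule 2: at most one f->sem per thread */
	pthread_mutex_lock(&f->sem);
	fsems_held++;
	return 0;
}

static void model_unlock_fsem(struct model_inode *f)
{
	assert(fsems_held > 0);
	fsems_held--;
	pthread_mutex_unlock(&f->sem);
}
```

In this model, taking the alloc_sem first and one f->sem second
succeeds, while the reverse order, or a second f->sem, is refused
rather than risking a deadlock.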


	erase_completion_lock spinlock
	------------------------------

This is used to serialise access to the eraseblock lists, to the
per-eraseblock lists of physical jffs2_raw_node_ref structures, and
(NB) the per-inode list of physical nodes. The latter is a special
case - see below.

As the MTD API no longer permits erase-completion callback functions
to be called from bottom-half (timer) context (on the basis that nobody
ever actually implemented such a thing), it's now sufficient to use
a simple spin_lock() rather than spin_lock_bh().

Note that the per-inode list of physical nodes (f->nodes) is a special
case. Any changes to _valid_ nodes (i.e. (->flash_offset & 1) == 0) in
the list are protected by the file semaphore f->sem. But the erase
code may remove _obsolete_ nodes from the list while holding only the
erase_completion_lock. So you can walk the list only while holding the
erase_completion_lock, and can drop the lock temporarily mid-walk as
long as the pointer you're holding is to a _valid_ node, not an
obsolete one.
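
The walking rule above can be sketched in userspace as follows. This
is a model, not the kernel code: a pthread mutex stands in for the
erase_completion_lock spinlock, struct model_ref is a hypothetical
cut-down node reference, and the caller is assumed to hold f->sem so
that valid nodes cannot change while the lock is dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a node reference on the per-inode list.
 * Bit 0 of flash_offset marks an obsolete node, as in the text. */
struct model_ref {
	struct model_ref *next;
	unsigned int flash_offset;
};

struct model_fs {
	pthread_mutex_t erase_completion_lock;	/* a spinlock in the kernel */
};

static int model_ref_valid(const struct model_ref *r)
{
	return (r->flash_offset & 1) == 0;
}

/* Walk the list and call work() on each valid node with the lock
 * dropped. Returns the number of valid nodes visited.
 * Assumption: the caller holds f->sem for the owning inode. */
static int model_walk_nodes(struct model_fs *c, struct model_ref *head,
			    void (*work)(struct model_ref *))
{
	struct model_ref *r;
	int visited = 0;

	pthread_mutex_lock(&c->erase_completion_lock);
	for (r = head; r; r = r->next) {
		if (!model_ref_valid(r))
			continue;	/* obsolete: step past with the lock held */

		/* r is valid, so the erase code will not remove it; it is
		 * safe to drop the lock for work that may sleep. */
		pthread_mutex_unlock(&c->erase_completion_lock);
		if (work)
			work(r);
		visited++;
		pthread_mutex_lock(&c->erase_completion_lock);
	}
	pthread_mutex_unlock(&c->erase_completion_lock);
	return visited;
}
```

Note that the lock is always re-taken before r->next is read, and is
never dropped while r points at an obsolete node.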

The erase_completion_lock is also used to protect the c->gc_task
pointer when the garbage collection thread exits. The code to kill the
GC thread locks it, sends the signal, then unlocks it - while the GC
thread itself locks it, zeroes c->gc_task, then unlocks on the exit path.


	inocache_lock spinlock
	----------------------

This spinlock protects the hashed list (c->inocache_list) of the
in-core jffs2_inode_cache objects (each inode in JFFS2 has a
corresponding jffs2_inode_cache object). So, the inocache_lock
has to be locked while walking the c->inocache_list hash buckets.

Note, the f->sem guarantees that the corresponding jffs2_inode_cache
will not be removed. So, it is allowed to access it without locking
the inocache_lock spinlock.

Ordering constraints:

	If both erase_completion_lock and inocache_lock are needed, the
	c->erase_completion_lock has to be acquired first.


	erase_free_sem
	--------------

This semaphore is only used by the erase code which frees obsolete
node references and the jffs2_garbage_collect_deletion_dirent()
function. The latter function on NAND flash must read _obsolete_ nodes
to determine whether the 'deletion dirent' under consideration can be
discarded or whether it is still required to show that an inode has
been unlinked. Because reading from the flash may sleep, the
erase_completion_lock cannot be held, so an alternative, more
heavyweight lock was required to prevent the erase code from freeing
the jffs2_raw_node_ref structures in question while the garbage
collection code is looking at them.

Suggestions for alternative solutions to this problem would be welcomed.


	wbuf_sem
	--------

This read/write semaphore protects against concurrent access to the
write-behind buffer ('wbuf') used for flash chips where we must write
in blocks. It protects both the contents of the wbuf and the metadata
which indicates which flash region (if any) is currently covered by
the buffer.

Ordering constraints:
	Lock wbuf_sem last, after the alloc_sem and/or f->sem.
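
The reader/writer split described in this section can be sketched in
userspace with a pthread read/write lock standing in for the wbuf_sem.
This is an illustrative model only: struct model_fs, the buffer size
and both model_ functions are hypothetical, and a real caller would
already hold the alloc_sem and/or f->sem where it needs them, taking
the wbuf_sem last.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

#define MODEL_WBUF_SIZE 16	/* arbitrary size for the model */

struct model_fs {
	pthread_rwlock_t wbuf_sem;	/* stands in for c->wbuf_sem */
	unsigned char wbuf[MODEL_WBUF_SIZE];
	unsigned int wbuf_ofs;		/* flash offset covered by wbuf[0] */
	unsigned int wbuf_len;		/* bytes currently buffered */
};

/* Append data to the buffer under the write lock.
 * Returns the number of bytes accepted. */
static unsigned int model_wbuf_write(struct model_fs *c, const void *data,
				     unsigned int len)
{
	unsigned int room;

	pthread_rwlock_wrlock(&c->wbuf_sem);
	room = MODEL_WBUF_SIZE - c->wbuf_len;
	if (len > room)
		len = room;
	memcpy(c->wbuf + c->wbuf_len, data, len);
	c->wbuf_len += len;
	pthread_rwlock_unlock(&c->wbuf_sem);
	return len;
}

/* Read the byte at flash offset ofs if the buffer covers it.
 * Returns 1 and stores the byte in *out, or 0 if it is not buffered. */
static int model_wbuf_read_byte(struct model_fs *c, unsigned int ofs,
				unsigned char *out)
{
	int hit = 0;

	pthread_rwlock_rdlock(&c->wbuf_sem);
	if (ofs >= c->wbuf_ofs && ofs - c->wbuf_ofs < c->wbuf_len) {
		*out = c->wbuf[ofs - c->wbuf_ofs];
		hit = 1;
	}
	pthread_rwlock_unlock(&c->wbuf_sem);
	return hit;
}
```

Both the buffer contents and the metadata saying which flash region it
covers (wbuf_ofs and wbuf_len in this model) are only touched with the
lock held, for reading or for writing as appropriate.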