.. SPDX-License-Identifier: GPL-2.0

===================================
File management in the Linux kernel
===================================

This document describes how locking for files (struct file)
and the file descriptor table (struct files_struct) works.

Up until 2.6.12, the file descriptor table was protected
with a lock (files->file_lock) and a reference count (files->count).
->file_lock protected accesses to all the file related fields
of the table. ->count was used for sharing the file descriptor
table between tasks cloned with the CLONE_FILES flag. Typically
this would be the case for POSIX threads. As with the common
refcounting model in the kernel, the last task doing
a put_files_struct() frees the file descriptor (fd) table.
The files (struct file) themselves are protected using
a reference count (->f_count).

In the new lock-free model of file descriptor management,
the reference counting is similar, but the locking is
based on RCU. The file descriptor table contains multiple
elements: the fd sets (open_fds and close_on_exec), the
array of file pointers, the sizes of the sets and the array,
etc. In order for the updates to appear atomic to
a lock-free reader, all the elements of the file descriptor
table are in a separate structure, struct fdtable.
files_struct contains a pointer to struct fdtable through
which the actual fd table is accessed. Initially the
fdtable is embedded in files_struct itself. On a subsequent
expansion of fdtable, a new fdtable structure is allocated
and files->fdt points to the new structure. The fdtable
structure is freed with RCU and lock-free readers either
see the old fdtable or the new fdtable, making the update
appear atomic. Here are the locking rules for
the fdtable structure:

1. All references to the fdtable must be done through
   the files_fdtable() macro::

        struct fdtable *fdt;

        rcu_read_lock();

        fdt = files_fdtable(files);
        ....
        if (n <= fdt->max_fds)
                ....
        ...
        rcu_read_unlock();

   files_fdtable() uses the rcu_dereference() macro, which takes care of
   the memory barrier requirements for lock-free dereference.
   The fdtable pointer must be read within the read-side
   critical section.

2. Reading of the fdtable as described above must be protected
   by rcu_read_lock()/rcu_read_unlock().

3. For any update to the fd table, files->file_lock must
   be held.

4. To look up the file structure given an fd, a reader
   must use either the lookup_fdget_rcu() or the files_lookup_fdget_rcu() API.
   These take care of barrier requirements due to lock-free lookup.

   An example::

        struct file *file;

        rcu_read_lock();
        file = lookup_fdget_rcu(fd);
        rcu_read_unlock();
        if (file) {
                ...
                fput(file);
        }
        ....

5. Since both fdtable and file structures can be looked up
   lock-free, they must be installed using the rcu_assign_pointer()
   API. If they are looked up lock-free, rcu_dereference()
   must be used. However, it is advisable to use files_fdtable()
   and lookup_fdget_rcu()/files_lookup_fdget_rcu(), which take care of these
   issues.

6. While updating, the fdtable pointer must be looked up while
   holding files->file_lock. If ->file_lock is dropped, then
   another thread may expand the files, thereby creating a new
   fdtable and making the earlier fdtable pointer stale.

   For example::

        spin_lock(&files->file_lock);
        fd = locate_fd(files, file, start);
        if (fd >= 0) {
                /* locate_fd() may have expanded fdtable, load the ptr */
                fdt = files_fdtable(files);
                __set_open_fd(fd, fdt);
                __clear_close_on_exec(fd, fdt);
                spin_unlock(&files->file_lock);
        .....

   Since locate_fd() can drop ->file_lock (and reacquire it),
   the fdtable pointer (fdt) must be loaded after locate_fd().

On newer kernels, RCU-based file lookup has been switched to rely on
SLAB_TYPESAFE_BY_RCU instead of call_rcu().
It isn't sufficient anymore
to just acquire a reference to the file in question under RCU using
atomic_long_inc_not_zero(), since the file might have already been
recycled and someone else might have bumped the reference. In other
words, callers might see reference count bumps from newer users. For
this reason it is necessary to verify that the pointer is the same
before and after the reference count increment. This pattern can be seen
in get_file_rcu() and __files_get_rcu().

In addition, it isn't possible to access or check fields in struct file
without first acquiring a reference on it under RCU lookup. Not doing
that was always very dodgy and it was only usable for non-pointer data
in struct file. With SLAB_TYPESAFE_BY_RCU it is necessary that callers
either first acquire a reference or hold files->file_lock.
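
The "take a reference, then recheck the pointer" rule described above can be
illustrated with a minimal userspace sketch. This is not the kernel's
implementation of get_file_rcu(): it uses C11 atomics in place of the kernel
primitives, it does not model RCU or slab recycling, and every name in it
(demo_file, demo_slot, demo_inc_not_zero(), demo_put(), demo_try_get()) is
hypothetical and exists only for illustration.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct file; only the refcount is modeled. */
struct demo_file {
        atomic_long count;
};

/* Hypothetical stand-in for one slot of the fd array. */
struct demo_slot {
        _Atomic(struct demo_file *) file;
};

enum demo_status { DEMO_OK, DEMO_EMPTY, DEMO_RETRY };

/* Take a reference unless the count has already dropped to zero. */
static bool demo_inc_not_zero(struct demo_file *f)
{
        long old = atomic_load(&f->count);

        while (old != 0) {
                /* On failure, old is reloaded with the current count. */
                if (atomic_compare_exchange_weak(&f->count, &old, old + 1))
                        return true;
        }
        return false;
}

/* Drop a reference (freeing is deliberately not modeled). */
static void demo_put(struct demo_file *f)
{
        atomic_fetch_sub(&f->count, 1);
}

/*
 * One lookup attempt. The increment alone proves nothing when objects can
 * be recycled in place, so the slot is read again afterwards and the
 * reference is kept only if the slot still holds the same pointer.
 */
static enum demo_status demo_try_get(struct demo_slot *slot,
                                     struct demo_file **out)
{
        struct demo_file *f = atomic_load(&slot->file);

        if (!f)
                return DEMO_EMPTY;

        if (!demo_inc_not_zero(f))
                return DEMO_RETRY;      /* object was on its way out */

        if (f != atomic_load(&slot->file)) {
                demo_put(f);            /* slot changed under us */
                return DEMO_RETRY;
        }

        *out = f;
        return DEMO_OK;
}
```

In this sketch a caller would be expected to call demo_try_get() again while
it returns DEMO_RETRY, and to call demo_put() once it is done with a file
obtained through DEMO_OK.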