.. SPDX-License-Identifier: GPL-2.0

Casefolding
===========

bcachefs supports case-insensitive file and directory lookups
using the regular `chattr +F` (`S_CASEFOLD`, `FS_CASEFOLD_FL`)
casefolding attributes.
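
From userspace, the same attribute can also be set through the generic inode
flags ioctls. The following is a minimal sketch and is not taken from bcachefs
itself: `set_casefold` is a hypothetical helper name, and the fallback
definition of `FS_CASEFOLD_FL` is only there for older uapi headers.

.. code-block:: c

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include <linux/fs.h>

    #ifndef FS_CASEFOLD_FL
    #define FS_CASEFOLD_FL 0x40000000 /* fallback for older uapi headers */
    #endif

    /*
     * Hypothetical helper: set the casefold attribute on a directory,
     * roughly what `chattr +F` does. Returns 0 on success, -1 on failure.
     */
    static int set_casefold(const char *dir_path)
    {
        int flags = 0; /* the flags ioctls operate on an int-sized value */
        int ret = -1;
        int fd = open(dir_path, O_RDONLY);

        if (fd < 0)
            return -1;

        /* Read the current flags, add the casefold bit, write them back. */
        if (ioctl(fd, FS_IOC_GETFLAGS, &flags) == 0) {
            flags |= FS_CASEFOLD_FL;
            if (ioctl(fd, FS_IOC_SETFLAGS, &flags) == 0)
                ret = 0;
        }

        close(fd);
        return ret;
    }

Whether the call succeeds depends on the filesystem and the kernel
configuration, so callers should be prepared for it to fail.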

The main use case for casefolding is compatibility with software written
against other filesystems that rely on casefolded lookups
(e.g. NTFS and Wine/Proton).
Taking advantage of filesystem-level casefolding can significantly reduce
loading times in many applications and games.

Casefolding support requires a kernel with `CONFIG_UNICODE` enabled.
Once a directory has been flagged for casefolding, a feature bit
is enabled on the superblock which marks the filesystem as using
casefolding.
Once the casefolding feature bit is enabled, the filesystem can no longer
be mounted on kernels without `CONFIG_UNICODE` enabled.

On the lookup/query side, casefolding is implemented by allocating a new
string of `BCH_NAME_MAX` length and using the `utf8_casefold` function to
casefold the query string into it.
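
A minimal userspace sketch of that step follows. It is illustrative only:
`sketch_casefold` is an ASCII-only stand-in for the kernel's `utf8_casefold`
(which handles full Unicode and takes different arguments), `SKETCH_NAME_MAX`
is an assumed placeholder for `BCH_NAME_MAX`, and all helper names are
hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdlib.h>

    /* Assumed placeholder for BCH_NAME_MAX; not the authoritative value. */
    #define SKETCH_NAME_MAX 512

    /*
     * ASCII-only stand-in for utf8_casefold(): folds src into dest.
     * Returns the folded length, or -1 if it does not fit in dlen bytes.
     */
    static int sketch_casefold(const char *src, size_t len,
                               unsigned char *dest, size_t dlen)
    {
        size_t i;

        if (len > dlen)
            return -1;

        for (i = 0; i < len; i++) {
            unsigned char c = (unsigned char)src[i];

            if (c >= 'A' && c <= 'Z')
                c = (unsigned char)(c - 'A' + 'a');
            dest[i] = c;
        }
        return (int)len;
    }

    /*
     * Allocate a SKETCH_NAME_MAX sized buffer and fold the query into it.
     * Returns NULL on allocation failure or if the query is too long;
     * the caller frees the result.
     */
    static unsigned char *sketch_fold_query(const char *query, size_t len,
                                            size_t *folded_len)
    {
        unsigned char *buf = malloc(SKETCH_NAME_MAX);
        int n;

        if (!buf)
            return NULL;

        n = sketch_casefold(query, len, buf, SKETCH_NAME_MAX);
        if (n < 0) {
            free(buf);
            return NULL;
        }
        *folded_len = (size_t)n;
        return buf;
    }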

On the dirent side, casefolding is implemented by ensuring the `bkey`'s
hash is computed from the casefolded string, and by storing the cached
casefolded name alongside the regular name in the dirent.

The structure looks like this:

* Regular:    [dirent data][regular name][nul][nul]...
* Casefolded: [dirent data][reg len][cf len][regular name][casefolded name][nul][nul]...

(Note that the number of NULs shown here is purely illustrative; their count
can vary per key, and they may not be present at all if the key is already
aligned to `sizeof(u64)`.)
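
The casefolded layout can be sketched as a small packing routine. This is an
illustration under stated assumptions, not the bcachefs implementation: the
two lengths are assumed to be 16-bit little-endian values, the padding is
applied to the name block alone (in a real key it applies to the whole value,
including the dirent data), and all names are hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Round n up to the next multiple of 8, i.e. sizeof(u64). */
    static size_t sketch_round_up_u64(size_t n)
    {
        return (n + 7) & ~(size_t)7;
    }

    /*
     * Pack [reg len][cf len][regular name][casefolded name][nul]...
     * into out. Returns the padded size in bytes, or 0 if a length
     * does not fit in 16 bits or out is too small.
     */
    static size_t sketch_pack_cf_names(unsigned char *out, size_t out_size,
                                       const char *name, size_t name_len,
                                       const char *cf_name, size_t cf_len)
    {
        size_t used, padded;

        if (name_len > UINT16_MAX || cf_len > UINT16_MAX)
            return 0;

        used = 4 + name_len + cf_len;
        padded = sketch_round_up_u64(used);
        if (padded > out_size)
            return 0;

        /* Assumed encoding: two 16-bit little-endian lengths. */
        out[0] = (unsigned char)(name_len & 0xff);
        out[1] = (unsigned char)(name_len >> 8);
        out[2] = (unsigned char)(cf_len & 0xff);
        out[3] = (unsigned char)(cf_len >> 8);

        memcpy(out + 4, name, name_len);
        memcpy(out + 4 + name_len, cf_name, cf_len);

        /* NUL padding; zero bytes long when already aligned. */
        memset(out + used, 0, padded - used);
        return padded;
    }

For example, packing "BLAH" with its casefolded form "blah" uses 4 + 4 + 4 = 12
bytes, which this sketch pads to 16.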

This is efficient, as it means that a file lookup that requires casefolding
has identical performance to a regular lookup:
a hash comparison and a `memcmp` of the name.
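
The comparison described above can be sketched as follows. This assumes the
query has already been casefolded, uses FNV-1a purely as a stand-in hash (it
is not the hash bcachefs uses), and `struct sketch_dirent` is a hypothetical,
simplified view of a dirent rather than the on-disk format.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* FNV-1a, used here only as a stand-in hash function. */
    static uint64_t sketch_hash(const unsigned char *s, size_t len)
    {
        uint64_t h = 0xcbf29ce484222325ULL;
        size_t i;

        for (i = 0; i < len; i++) {
            h ^= s[i];
            h *= 0x100000001b3ULL;
        }
        return h;
    }

    /* Hypothetical, simplified view of a casefolded dirent. */
    struct sketch_dirent {
        uint64_t hash;                /* hash of the casefolded name */
        const unsigned char *cf_name; /* cached casefolded name */
        size_t cf_len;
    };

    /*
     * Match an already-casefolded query against a dirent: a hash
     * comparison followed by a memcmp of the cached casefolded name.
     * Returns nonzero on a match.
     */
    static int sketch_dirent_matches(const struct sketch_dirent *d,
                                     const unsigned char *folded, size_t len)
    {
        return d->hash == sketch_hash(folded, len) &&
               d->cf_len == len &&
               memcmp(d->cf_name, folded, len) == 0;
    }

Because the casefolded name is cached in the dirent, nothing stored on disk
has to be folded again at lookup time; only the query is.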

Rationale
---------

Several designs were considered for this system.
One was to introduce a dirent_v2; however, that would be painful, especially as
the hash system only supports a single key type. It would also require
`BCH_NAME_MAX` to change between versions, and a new feature bit.

Another option was to store the names without the two lengths, and simply take
the combined length of the regular and casefolded names divided by 2 as the
length of each. This would assume that the regular length == casefolded length,
which is not always true: the uppercase Unicode glyph may have a different
UTF-8 encoded length than the lowercase Unicode glyph.
It would be possible to disregard the casefold cache for those cases, but it was
decided to simply encode the two string lengths in the key to avoid random
performance issues if this edge case was ever hit.
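
To make the edge case concrete, the sketch below computes the UTF-8 encoded
length of a single code point. The helper name is hypothetical. In the Unicode
case mappings, U+023A (LATIN CAPITAL LETTER A WITH STROKE) maps to U+2C65,
growing from 2 to 3 bytes, while U+212A (KELVIN SIGN) folds to U+006B ("k"),
shrinking from 3 bytes to 1.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>

    /*
     * Number of bytes UTF-8 needs to encode one code point,
     * or 0 if the value is not a valid Unicode scalar value.
     */
    static size_t sketch_utf8_len(uint32_t cp)
    {
        if (cp < 0x80)
            return 1;
        if (cp < 0x800)
            return 2;
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0; /* surrogates are not encodable */
        if (cp < 0x10000)
            return 3;
        if (cp < 0x110000)
            return 4;
        return 0;
    }

A name containing either character therefore has a casefolded form whose byte
length differs from that of the regular name, so halving the combined length
would split the two names at the wrong offset.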

The option settled on was to use a free bit in d_type to mark a dirent as having
a casefold cache, and then treat the first 4 bytes of the name block as lengths.
You can see this in the `d_cf_name_block` member of the union in `bch_dirent`.
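
Reading such a name block back could look roughly like the sketch below. It is
not the bcachefs code: the flag position `SKETCH_DT_CASEFOLD` is a made-up
value, the lengths are assumed to be 16-bit little-endian, and the struct and
function names are hypothetical.

.. code-block:: c

    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Hypothetical bit position for the "has a casefold cache" flag. */
    #define SKETCH_DT_CASEFOLD 0x80

    struct sketch_names {
        const unsigned char *name;    /* regular name */
        size_t name_len;
        const unsigned char *cf_name; /* casefolded name, or NULL */
        size_t cf_len;
    };

    /* Returns 0 on success, -1 if the block is malformed. */
    static int sketch_unpack_names(uint8_t d_type, const unsigned char *block,
                                   size_t block_len, struct sketch_names *out)
    {
        size_t name_len, cf_len;

        if (!(d_type & SKETCH_DT_CASEFOLD)) {
            /* Regular layout: the name ends at the first NUL, if any. */
            const unsigned char *nul = memchr(block, 0, block_len);

            out->name = block;
            out->name_len = nul ? (size_t)(nul - block) : block_len;
            out->cf_name = NULL;
            out->cf_len = 0;
            return 0;
        }

        /* Casefolded layout: the first 4 bytes hold the two lengths. */
        if (block_len < 4)
            return -1;
        name_len = (size_t)block[0] | ((size_t)block[1] << 8);
        cf_len = (size_t)block[2] | ((size_t)block[3] << 8);
        if (name_len + cf_len > block_len - 4)
            return -1;

        out->name = block + 4;
        out->name_len = name_len;
        out->cf_name = block + 4 + name_len;
        out->cf_len = cf_len;
        return 0;
    }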

A feature bit was used so that casefolding support can be enabled for the majority
of users, while still allowing users who have no need for the feature to use
bcachefs. `CONFIG_UNICODE` can increase the kernel size by a significant amount
due to the tables used, which may be the deciding factor in whether bcachefs can
be used on e.g. embedded platforms.

Other filesystems like ext4 and f2fs have a superblock-level option for the
casefolding encoding, but bcachefs currently does not provide this. ext4 and f2fs
do not expose any encodings other than a single UTF-8 version. If further
encodings become desirable, they can be added using the opts mechanism.

dentry/dcache considerations
----------------------------

Currently, in casefolded directories, bcachefs (like other filesystems) will not cache
negative dentries.

This is because doing so currently presents a problem in the following scenario:

 - Lookup file "blAH" in a casefolded directory
 - Creation of file "BLAH" in a casefolded directory
 - Lookup file "blAH" in a casefolded directory

If negative dentries were cached, the final lookup would fail even though the
file now exists.

This is slightly suboptimal, but could be fixed in the future with some VFS work.