$FreeBSD$

Design of xlocale
=================

The xlocale APIs come from Darwin, although a subset is now part of POSIX 2008.
They fall into two broad categories:

- Manipulation of per-thread locales (POSIX)
- Locale-aware functions taking an explicit locale argument (Darwin)

This document describes the implementation of these APIs for FreeBSD.

Goals
-----

The overall goal of this implementation is to be compatible with the Darwin
version. Additionally, it should include minimal changes to the existing
locale code. A lot of the existing locale code originates with 4BSD or earlier
and has had over a decade of testing. Replacing this code, unless absolutely
necessary, gives us the potential for more bugs without much benefit.

With this in mind, various libc-private functions have been modified to take a
`locale_t` parameter. This causes a compiler error if they are accidentally
called without a locale. This approach was taken, rather than adding _l
variants of these functions, to make it harder for accidental uses of the
global-locale versions to slip in.

Locale Objects
--------------

A locale is encapsulated in a `locale_t`, which is an opaque type: a pointer to
a `struct _xlocale`. The name `_xlocale` is unfortunate, as it does not fit
well with existing conventions, but is used because this is the name the Darwin
implementation gives to this structure and so may be used by existing (bad)
code.

This structure should include all of the information corresponding to a locale.
A `locale_t` is almost immutable after creation. There are no functions that
modify it, and it can therefore be used without locking. It is the
responsibility of the caller to ensure that a locale is not deallocated during
a call that uses it.

Each locale contains a number of components, one for each of the categories
supported by `setlocale()`. These are likewise immutable after creation.
This differs from the Darwin implementation, which includes a deprecated
`setinvalidrune()` function that can modify the rune locale.

The exception to these mutability rules is a set of `mbstate_t` flags stored
with each locale. These are used by various functions that previously had a
static local `mbstate_t` variable.

The components are reference counted, and so can be aliased between locale
objects. This makes copying locales very cheap.

The Global Locale
-----------------

All locales and locale components are reference counted. The global locale,
however, is special. It, and all of its components, are static and so no
`malloc()` memory is required when using a single locale.

This means that threads using the global locale are subject to the same
constraints as with the pre-xlocale libc. Calls to any locale-aware functions
in threads using the global locale, while modifying the global locale, have
undefined behaviour.

Because of this, we have to ensure that we always copy the components of the
global locale, rather than alias them.

It would be cleaner to simply remove the special treatment of the global locale
and have a `locale_t` lazily allocated for the global context. This would cost
a little more `malloc()` memory, so is not done in the initial version.

Caching
-------

The existing locale implementation included several ad-hoc caching layers.
None of these were thread safe. Caching is only really of use for supporting
the pattern where the locale is briefly changed to something and then changed
back.

The current xlocale implementation removes the caching entirely. This pattern
is not one that should be encouraged. If you need to make some calls with a
modified locale, then you should use the _l suffix versions of the calls,
rather than switch the global locale.
If you do need to temporarily switch the locale and then switch it back,
`uselocale()` provides a way of doing this very easily: it returns the old
locale, which can then be passed to a subsequent call to `uselocale()` to
restore it, without the need to load any locale data from the disk.

If, in the future, it is determined that caching is beneficial, it can be added
quite easily in xlocale.c. Given, however, that any locale-aware call is going
to be a preparation for presenting data to the user, and so is invariably going
to be part of an I/O operation, this seems like a case of premature
optimisation.

localeconv
----------

The `localeconv()` function is an exception to the immutable-after-creation
rule. In the classic implementation, this function returns a pointer to some
global storage, which is initialised with the data from the current locale.
This is not possible in a multithreaded environment, with multiple locales.

Instead, each locale contains a `struct lconv` that is lazily initialised on
calls to `localeconv()`. This is not protected by any locking; however, it is
still safe on any machine where word-sized stores are atomic: two concurrent
calls will write the same data into the structure.

Explicit Locale Calls
---------------------

A large number of functions have been modified to take an explicit `locale_t`
parameter. The old APIs are then reimplemented with a call to `__get_locale()`
to supply the `locale_t` parameter. This is in line with the Darwin public
APIs, but also simplifies the modifications to these functions. The
`__get_locale()` function is now the only way to access the current locale
within libc. All of the old globals have gone, so there is now a linker error
if any functions attempt to use them.

The ctype.h functions are a little different. These are not implemented in
terms of their locale-aware versions, for performance reasons.
Each of these is implemented as a short inline function.

Differences to Darwin APIs
--------------------------

`strtoq_l()` and `strtouq_l()` are not provided. These are extensions to
deprecated functions - we should not be encouraging people to use deprecated
interfaces.

Locale Placeholders
-------------------

The pointer values 0 and -1 have special meanings as `locale_t` values. Any
public function that accepts a `locale_t` parameter must use the `FIX_LOCALE()`
macro on it before using it. For efficiency, this can be omitted in functions
which *only* use their locale parameter as an argument to another public
function, as the callee will do the `FIX_LOCALE()` itself.

Potential Improvements
----------------------

At present, the current rune set is accessed via a function call. This makes it
fairly expensive to use any of the ctype.h functions. We could improve this
quite a lot by storing the rune locale data in a `__thread`-qualified variable.

Several of the existing FreeBSD locale-aware functions appear to be wrong. For
example, most of the `strto*()` family should probably use `digittoint_l()`,
but instead they assume ASCII. These will break if using a character encoding
that does not put numbers and the letters A-F in the same location as ASCII.
Some functions, like `strcoll()`, only work on single-byte encodings. No
attempt has been made to fix existing limitations in the libc functions other
than to add support for xlocale.

Intuitively, setting a thread-local locale should ensure that all locale-aware
functions can be used safely from that thread. In fact, this is not the case
in either this implementation or the Darwin one. You must call `duplocale()`
or `newlocale()` before calling `uselocale()`. This is a bit ugly, and it
would be better if libc ensured that every thread had its own locale object.
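
The `newlocale()` / `uselocale()` sequence described above can be sketched as
follows. This is a minimal illustration, not code from libc: the helper name
`format_in_c_locale()` is hypothetical, and the header guards are an assumption
about where each platform declares the POSIX 2008 locale functions. It creates
a private locale object, installs it for the calling thread, and then restores
whatever `uselocale()` returned.

```c
/* Sketch only: format_in_c_locale() is a hypothetical helper, not a libc API. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <locale.h>
#include <stdio.h>
#if defined(__APPLE__) || defined(__FreeBSD__)
#include <xlocale.h>    /* assumption: extra declarations live here */
#endif

/*
 * Format `value` with two decimal places while the calling thread uses
 * a private copy of the "C" locale.  Returns 0 on success, -1 on failure
 * (locale creation failed or the buffer was too small).
 */
int
format_in_c_locale(char *buf, size_t len, double value)
{
    assert(buf != NULL && len > 0);

    /* Create a new locale object rather than touching the global one. */
    locale_t loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
    if (loc == (locale_t)0)
        return -1;

    /* Install it for this thread only; keep the previous value. */
    locale_t old = uselocale(loc);

    int n = snprintf(buf, len, "%.2f", value);

    /* Restore the previous locale before releasing our own. */
    uselocale(old);
    freelocale(loc);

    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}
```

Because the previous locale is restored from the value that `uselocale()`
returned, no locale data needs to be reloaded, and other threads are unaffected
throughout.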