Design of xlocale
=================

The xlocale APIs come from Darwin, although a subset is now part of POSIX 2008.
They fall into two broad categories:

- Manipulation of per-thread locales (POSIX)
- Locale-aware functions taking an explicit locale argument (Darwin)

This document describes the implementation of these APIs for FreeBSD.

Goals
-----

The overall goal of this implementation is to be compatible with the Darwin
version. Additionally, it should include minimal changes to the existing
locale code. A lot of the existing locale code originates with 4BSD or earlier
and has had over a decade of testing. Replacing this code, unless absolutely
necessary, gives us the potential for more bugs without much benefit.

With this in mind, various libc-private functions have been modified to take a
`locale_t` parameter. This causes a compiler error if they are accidentally
called without a locale. This approach was taken, rather than adding `_l`
variants of these functions, to make it harder for accidental uses of the
global-locale versions to slip in.
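
As an illustration of this convention, here is a minimal sketch of a private helper written in that style. The function `count_digits()` is hypothetical, not a real libc function; it relies only on the POSIX 2008 `isdigit_l()` interface. Because the locale is a mandatory parameter, a call such as `count_digits(s)` is rejected by the compiler instead of quietly falling back to the global locale.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stddef.h>

/*
 * Hypothetical libc-private helper in the style described above: the
 * locale is a required parameter and there is no global-locale variant,
 * so forgetting to pass one is a compile-time error.
 * Callers are assumed to pass a real locale object, not a placeholder.
 */
static int
count_digits(const char *s, locale_t loc)
{
	int n = 0;

	assert(s != NULL && loc != (locale_t)0);
	for (; *s != '\0'; s++) {
		/* Classify using the caller's locale, never the global one. */
		if (isdigit_l((unsigned char)*s, loc))
			n++;
	}
	return (n);
}
```

A caller obtains a locale first, for example with `newlocale(LC_ALL_MASK, "C", (locale_t)0)`, and passes it down explicitly.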

Locale Objects
--------------

A locale is encapsulated in a `locale_t`, which is an opaque type: a pointer to
a `struct _xlocale`. The name `_xlocale` is unfortunate, as it does not fit
well with existing conventions, but is used because this is the name the Darwin
implementation gives to this structure and so may be used by existing (bad) code.

This structure should include all of the information corresponding to a locale.
A `locale_t` is almost immutable after creation. There are no functions that
modify it, and it can therefore be used without locking. It is the
responsibility of the caller to ensure that a locale is not deallocated during
a call that uses it.

Each locale contains a number of components, one for each of the categories
supported by `setlocale()`. These are likewise immutable after creation. This
differs from the Darwin implementation, which includes a deprecated
`setinvalidrune()` function that can modify the rune locale.

The exception to these mutability rules is a set of `mbstate_t` flags stored
with each locale. These are used by various functions that previously had a
static local `mbstate_t` variable.

The components are reference counted, and so can be aliased between locale
objects. This makes copying locales very cheap.
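
The aliasing can be pictured with a small model. The sketch below is hypothetical: `struct rc_component`, `struct rc_locale` and the helper functions are illustrative names, not FreeBSD's definitions. Only the component counts are modelled, and the static global locale, whose components must be copied rather than aliased, is ignored. Duplicating a locale only bumps each component's reference count, which is why copying is cheap. The counts are atomic because a locale may be shared between threads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical category indices, one per setlocale() category. */
enum {
	CAT_COLLATE, CAT_CTYPE, CAT_MONETARY,
	CAT_NUMERIC, CAT_TIME, CAT_MESSAGES, CAT_COUNT
};

/* Immutable per-category data plus a reference count. */
struct rc_component {
	atomic_long refcount;
	/* ... category data, never modified after creation ... */
};

/* A locale is just a set of (possibly shared) component pointers. */
struct rc_locale {
	struct rc_component *components[CAT_COUNT];
};

static void
component_release(struct rc_component *c)
{
	assert(atomic_load(&c->refcount) > 0);
	/* Whoever drops the last reference frees the component. */
	if (atomic_fetch_sub(&c->refcount, 1) == 1)
		free(c);
}

/* Create a locale whose components are all fresh, with one reference each. */
static struct rc_locale *
rc_locale_new(void)
{
	struct rc_locale *loc = malloc(sizeof(*loc));

	if (loc == NULL)
		return (NULL);
	for (int i = 0; i < CAT_COUNT; i++) {
		struct rc_component *c = malloc(sizeof(*c));

		if (c == NULL) {
			/* Undo the components created so far. */
			while (i-- > 0)
				component_release(loc->components[i]);
			free(loc);
			return (NULL);
		}
		atomic_init(&c->refcount, 1);
		loc->components[i] = c;
	}
	return (loc);
}

/* Copying aliases every component: no category data is duplicated. */
static struct rc_locale *
rc_locale_dup(const struct rc_locale *src)
{
	struct rc_locale *dst = malloc(sizeof(*dst));

	if (dst == NULL)
		return (NULL);
	for (int i = 0; i < CAT_COUNT; i++) {
		atomic_fetch_add(&src->components[i]->refcount, 1);
		dst->components[i] = src->components[i];
	}
	return (dst);
}

static void
rc_locale_free(struct rc_locale *loc)
{
	for (int i = 0; i < CAT_COUNT; i++)
		component_release(loc->components[i]);
	free(loc);
}
```

Freeing the original after a copy leaves the shared components alive until the copy is freed as well.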

The Global Locale
-----------------

All locales and locale components are reference counted. The global locale,
however, is special. It and all of its components are static, so no
`malloc()` memory is required when using a single locale.

This means that threads using the global locale are subject to the same
constraints as with the pre-xlocale libc. Calling locale-aware functions in
threads that use the global locale while the global locale is being modified
has undefined behaviour.

Because of this, we have to ensure that we always copy the components of the
global locale, rather than alias them: an aliased component would change
whenever the global locale is modified.

It would be cleaner to simply remove the special treatment of the global locale
and have a `locale_t` lazily allocated for the global context. This would cost
a little more `malloc()` memory, so it is not done in the initial version.

Caching
-------

The existing locale implementation included several ad-hoc caching layers.
None of these were thread safe. Caching is only really useful for supporting
the pattern where the locale is briefly changed to something else and then
changed back.

The current xlocale implementation removes the caching entirely. This pattern
is not one that should be encouraged.
If you need to make some calls with a
modified locale, then you should use the `_l`-suffixed versions of the calls
rather than switch the global locale. If you do need to temporarily switch the
locale and then switch it back, `uselocale()` makes this very easy: it returns
the old locale, which can then be passed to a subsequent call to `uselocale()`
to restore it, without the need to load any locale data from the disk.

If, in the future, it is determined that caching is beneficial, it can be added
quite easily in xlocale.c. Given, however, that any locale-aware call is going
to be a preparation for presenting data to the user, and so is invariably going
to be part of an I/O operation, this seems like a case of premature
optimisation.

localeconv
----------

The `localeconv()` function is an exception to the immutable-after-creation
rule. In the classic implementation, this function returns a pointer to some
global storage, which is initialised with the data from the current locale.
This is not possible in a multithreaded environment with multiple locales.

Instead, each locale contains a `struct lconv` that is lazily initialised on
calls to `localeconv()`.
This is not protected by any locking. It is,
however, still safe on any machine where word-sized stores are atomic, because
two concurrent calls will write the same data into the structure.

Explicit Locale Calls
---------------------

A large number of functions have been modified to take an explicit `locale_t`
parameter. The old APIs are then reimplemented with a call to `__get_locale()`
to supply the `locale_t` parameter. This is in line with the Darwin public
APIs, but also simplifies the modifications to these functions. The
`__get_locale()` function is now the only way to access the current locale
within libc. All of the old globals have gone, so there is now a linker error
if any functions attempt to use them.

The ctype.h functions are a little different. For performance reasons, these
are not implemented in terms of their locale-aware versions. Each of these is
implemented as a short inline function.

Differences to Darwin APIs
--------------------------

`strtoq_l()` and `strtouq_l()` are not provided. These are extensions to
deprecated functions, and we should not be encouraging people to use deprecated
interfaces.
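
The wrapper arrangement described under Explicit Locale Calls can be modelled without any libc internals. In this sketch, `struct model_locale`, `get_locale()` (standing in for `__get_locale()`), `set_thread_locale()`, `format_tenths()` and `format_tenths_l()` are all hypothetical names. The per-thread override is assumed to be a C11 `_Thread_local` pointer, with NULL meaning that the thread uses the global locale; this is a simplification, not a description of how FreeBSD stores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for struct _xlocale; only one field is modelled. */
struct model_locale {
	char decimal_point;
};

static const struct model_locale global_locale = { '.' };

/* Per-thread override; NULL means "this thread uses the global locale". */
static _Thread_local const struct model_locale *thread_locale = NULL;

/* Stand-in for __get_locale(): the one place the current locale is found. */
static const struct model_locale *
get_locale(void)
{
	return (thread_locale != NULL ? thread_locale : &global_locale);
}

/* Install (or, with NULL, clear) this thread's locale. */
static void
set_thread_locale(const struct model_locale *loc)
{
	thread_locale = loc;
}

/* The explicit-locale version does all of the work. */
static int
format_tenths_l(char *buf, size_t len, int tenths,
    const struct model_locale *loc)
{
	assert(tenths >= 0);	/* negative values are out of scope here */
	return (snprintf(buf, len, "%d%c%d", tenths / 10,
	    loc->decimal_point, tenths % 10));
}

/* The old API is reduced to a wrapper that supplies the current locale. */
static int
format_tenths(char *buf, size_t len, int tenths)
{
	return (format_tenths_l(buf, len, tenths, get_locale()));
}
```

Because every old entry point goes through a single lookup function, changing the thread's locale changes the behaviour of all of them at once, while the `_l` version can still be called with any locale directly.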

Locale Placeholders
-------------------

The pointer values 0 and -1 have special meanings as `locale_t` values. Any
public function that accepts a `locale_t` parameter must use the `FIX_LOCALE()`
macro on it before using it. For efficiency, this can be omitted in functions
that *only* use their locale parameter as an argument to another public
function, as the callee will do the `FIX_LOCALE()` itself.

Potential Improvements
----------------------

Currently, the active rune set is accessed via a function call. This makes it
fairly expensive to use any of the ctype.h functions. We could improve this
quite a lot by storing the rune locale data in a `__thread`-qualified variable.

Several of the existing FreeBSD locale-aware functions appear to be wrong. For
example, most of the `strto*()` family should probably use `digittoint_l()`,
but instead they assume ASCII. These will break with any character encoding
that does not put the digits and the letters A-F in the same positions as
ASCII. Some functions, like `strcoll()`, only work on single-byte encodings. No
attempt has been made to fix existing limitations in the libc functions other
than to add support for xlocale.

Intuitively, setting a thread-local locale should ensure that all locale-aware
functions can be used safely from that thread. In fact, this is not the case
in either this implementation or the Darwin one. You must call `duplocale()`
or `newlocale()` before calling `uselocale()`. This is a bit ugly, and it
would be better if libc ensured that every thread had its own locale object.
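
A minimal sketch of both recommended patterns, using only POSIX 2008 interfaces. The function names `upper_explicit()` and `upper_with_uselocale()` are hypothetical. The first passes a locale to an `_l` variant and leaves the thread's locale alone. The second creates a private locale object with `newlocale()`, installs it for the calling thread with `uselocale()`, and restores the previous setting (which may be `LC_GLOBAL_LOCALE`) before freeing the object. The `"C"` locale is used only because it is guaranteed to exist; `duplocale()` could be used instead when the new object should start as a copy of an existing one.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <ctype.h>
#include <locale.h>

/* Preferred: pass the locale explicitly and leave the thread's locale alone. */
static int
upper_explicit(int c, locale_t loc)
{
	return (toupper_l(c, loc));
}

/*
 * Temporarily switch this thread to a private "C" locale, do some
 * locale-sensitive work, then restore whatever was in effect before.
 * Returns -1 if the locale object cannot be created.
 */
static int
upper_with_uselocale(int c)
{
	locale_t loc, old;
	int result;

	loc = newlocale(LC_ALL_MASK, "C", (locale_t)0);
	if (loc == (locale_t)0)
		return (-1);

	/* Affects only the calling thread; returns the previous setting. */
	old = uselocale(loc);
	assert(old != (locale_t)0);
	result = toupper(c);	/* now uses loc */

	/* Restore first: loc must not be freed while it is still in use. */
	(void)uselocale(old);
	freelocale(loc);
	return (result);
}
```

Calling `uselocale((locale_t)0)` queries the current setting without changing it, which is a convenient way to check that the restore step really happened.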