Design of xlocale
=================

The xlocale APIs come from Darwin, although a subset is now part of POSIX 2008.
They fall into two broad categories:

- Manipulation of per-thread locales (POSIX)
- Locale-aware functions taking an explicit locale argument (Darwin)

This document describes the implementation of these APIs for FreeBSD.

Goals
-----

The overall goal of this implementation is to be compatible with the Darwin
version.  Additionally, it should include minimal changes to the existing
locale code.  A lot of the existing locale code originates with 4BSD or earlier
and has had over a decade of testing.  Replacing this code, unless absolutely
necessary, gives us the potential for more bugs without much benefit.

With this in mind, various libc-private functions have been modified to take a
`locale_t` parameter.  This causes a compiler error if they are accidentally
called without a locale.  This approach was taken, rather than adding `_l`
variants of these functions, to make it harder for accidental uses of the
global-locale versions to slip in.

Locale Objects
--------------

A locale is encapsulated in a `locale_t`, which is an opaque type: a pointer to
a `struct _xlocale`.  The name `_xlocale` is unfortunate, as it does not fit
well with existing conventions, but is used because this is the name the Darwin
implementation gives to this structure and so may be used by existing (bad) code.

This structure should include all of the information corresponding to a locale.
A `locale_t` is almost immutable after creation.  There are no functions that
modify it, and it can therefore be used without locking.  It is the
responsibility of the caller to ensure that a locale is not deallocated during
a call that uses it.

Each locale contains a number of components, one for each of the categories
supported by `setlocale()`.  These are likewise immutable after creation.  This
differs from the Darwin implementation, which includes a deprecated
`setinvalidrune()` function that can modify the rune locale.

The exception to these mutability rules is a set of `mbstate_t` flags stored
with each locale.  These are used by various functions that previously had a
static local `mbstate_t` variable.

The components are reference counted, and so can be aliased between locale
objects.  This makes copying locales very cheap.
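
The aliasing described above can be sketched roughly as follows.  This is an
illustration only, not the libc code: `struct component`,
`struct locale_sketch`, `component_new()`, `component_retain()`,
`component_release()`, `locale_copy()` and `locale_free()` are hypothetical
names, only two categories are modelled, and the use of C11 atomics for the
counter is an assumption.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for one locale component (e.g. the LC_CTYPE data). */
struct component {
	atomic_int refcount;
	int data;		/* placeholder for the immutable tables */
};

/* Hypothetical stand-in for struct _xlocale: it only holds pointers. */
struct locale_sketch {
	struct component *parts[2];
};

static struct component *
component_new(int data)
{
	struct component *c = malloc(sizeof(*c));

	if (c == NULL)
		return (NULL);
	atomic_init(&c->refcount, 1);
	c->data = data;
	return (c);
}

static struct component *
component_retain(struct component *c)
{
	assert(c != NULL);
	atomic_fetch_add(&c->refcount, 1);
	return (c);
}

static void
component_release(struct component *c)
{
	/* atomic_fetch_sub() returns the old value; 1 means last owner. */
	if (atomic_fetch_sub(&c->refcount, 1) == 1)
		free(c);
}

/* Copying a locale aliases its components instead of duplicating them. */
static struct locale_sketch *
locale_copy(const struct locale_sketch *src)
{
	struct locale_sketch *dst = malloc(sizeof(*dst));

	if (dst == NULL)
		return (NULL);
	for (int i = 0; i < 2; i++)
		dst->parts[i] = component_retain(src->parts[i]);
	return (dst);
}

static void
locale_free(struct locale_sketch *l)
{
	for (int i = 0; i < 2; i++)
		component_release(l->parts[i]);
	free(l);
}
```

In this sketch a copy costs one small allocation and one counter increment per
category, regardless of how large the component data is.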

The Global Locale
-----------------

All locales and locale components are reference counted.  The global locale,
however, is special.  It, and all of its components, are static and so no
`malloc()` memory is required when using a single locale.

This means that threads using the global locale are subject to the same
constraints as with the pre-xlocale libc.  Calls to any locale-aware functions
in threads using the global locale, while modifying the global locale, have
undefined behaviour.

Because of this, we have to ensure that we always copy the components of the
global locale, rather than alias them.

It would be cleaner to simply remove the special treatment of the global locale
and have a `locale_t` lazily allocated for the global context.  This would cost
a little more `malloc()` memory, so is not done in the initial version.

Caching
-------

The existing locale implementation included several ad-hoc caching layers.
None of these were thread safe.  Caching is only really of use for supporting
the pattern where the locale is briefly changed to something and then changed
back.

The current xlocale implementation removes the caching entirely.  This pattern
is not one that should be encouraged.  If you need to make some calls with a
modified locale, then you should use the `_l` suffix versions of the calls,
rather than switch the global locale.  If you do need to temporarily switch the
locale and then switch it back, `uselocale()` provides a way of doing this very
easily: it returns the old locale, which can then be passed to a subsequent
call to `uselocale()` to restore it, without the need to load any locale data
from the disk.
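
A minimal sketch of the save-and-restore pattern, using only the POSIX 2008
calls `uselocale()` and `localeconv()`.  The helper `with_locale()` and the
callback `decimal_point_is_dot()` are hypothetical names introduced for
illustration.  The feature-test macro is an assumption about the build
environment, and some systems (Darwin, for example) declare these functions
in `<xlocale.h>` rather than `<locale.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/*
 * Run fn(arg) with `tmp` installed as the calling thread's locale, then
 * restore whatever was in effect before.  Returns -1 if the switch fails.
 */
static int
with_locale(locale_t tmp, int (*fn)(void *), void *arg)
{
	locale_t old;
	int ret;

	assert(fn != NULL);
	old = uselocale(tmp);	/* returns the previous thread locale */
	if (old == (locale_t)0)
		return (-1);
	ret = fn(arg);
	uselocale(old);		/* restoring reloads no locale data */
	return (ret);
}

/* Example callback: is the active decimal point a full stop? */
static int
decimal_point_is_dot(void *unused)
{
	(void)unused;
	return (localeconv()->decimal_point[0] == '.');
}
```

A caller would create `tmp` once with `newlocale()`, pass it to
`with_locale()` as often as needed, and release it with `freelocale()` when it
is no longer in use by any thread.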

If, in the future, it is determined that caching is beneficial, it can be added
quite easily in xlocale.c.  Given, however, that any locale-aware call is going
to be a preparation for presenting data to the user, and so is invariably going
to be part of an I/O operation, this seems like a case of premature
optimisation.

localeconv
----------

The `localeconv()` function is an exception to the immutable-after-creation
rule.  In the classic implementation, this function returns a pointer to some
global storage, which is initialised with the data from the current locale.
This does not work in a multithreaded environment with multiple locales.

Instead, each locale contains a `struct lconv` that is lazily initialised on
calls to `localeconv()`.  This is not protected by any locking; however, it is
still safe on any machine where word-sized stores are atomic: two concurrent
calls will write the same data into the structure.
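
The lazy initialisation described above might look something like the sketch
below.  The names `struct locale_sketch`, `lconv_ready` and
`localeconv_sketch()` are hypothetical, and only one `struct lconv` field is
filled in.  As in the text, the sketch takes no lock and relies on racing
callers storing identical, word-sized values.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-locale state. */
struct locale_sketch {
	const char *decimal_point_src;	/* immutable source data */
	int lconv_ready;		/* set once the cache is filled */
	struct lconv cache;		/* lazily initialised copy */
};

/*
 * Per-locale localeconv(): fill the cache on first use and hand out a
 * pointer to it.  Two threads racing here write the same values.
 */
static struct lconv *
localeconv_sketch(struct locale_sketch *loc)
{
	assert(loc != NULL);
	if (!loc->lconv_ready) {
		loc->cache.decimal_point = (char *)loc->decimal_point_src;
		loc->lconv_ready = 1;
	}
	return (&loc->cache);
}
```

Because the cache lives inside the locale, each locale hands out its own
`struct lconv` instead of sharing one piece of global storage.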

Explicit Locale Calls
---------------------

A large number of functions have been modified to take an explicit `locale_t`
parameter.  The old APIs are then reimplemented with a call to `__get_locale()`
to supply the `locale_t` parameter.  This is in line with the Darwin public
APIs, but also simplifies the modifications to these functions.  The
`__get_locale()` function is now the only way to access the current locale
within libc.  All of the old globals have gone, so there is now a linker error
if any functions attempt to use them.
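
The wrapper pattern can be illustrated with a self-contained mock.  Nothing
here is libc code: `struct locale_sketch`, `get_locale_sketch()` (standing in
for `__get_locale()`), `is_decimal_point_l()` and `is_decimal_point()` are
hypothetical, and the rule "thread locale if set, otherwise global" is an
assumption about how the current locale is chosen.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, drastically simplified locale. */
struct locale_sketch {
	char decimal_point;
};

static struct locale_sketch global_sketch = { '.' };

/* Per-thread override; NULL means "use the global locale". */
static _Thread_local struct locale_sketch *thread_sketch;

/* Stand-in for the libc-private __get_locale(). */
static struct locale_sketch *
get_locale_sketch(void)
{
	return (thread_sketch != NULL ? thread_sketch : &global_sketch);
}

/* All of the real work lives in the variant with an explicit locale. */
static int
is_decimal_point_l(int c, struct locale_sketch *loc)
{
	assert(loc != NULL);
	return (c == loc->decimal_point);
}

/* The classic entry point only supplies the current locale. */
static int
is_decimal_point(int c)
{
	return (is_decimal_point_l(c, get_locale_sketch()));
}
```

With this shape there is a single implementation per function, and the
classic entry point cannot consult locale state through any path other than
the accessor.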

The ctype.h functions are a little different.  These are not implemented in
terms of their locale-aware versions, for performance reasons.  Each of these
is implemented as a short inline function.

Differences to Darwin APIs
--------------------------

`strtoq_l()` and `strtouq_l()` are not provided.  These are extensions to
deprecated functions, and we should not be encouraging people to use deprecated
interfaces.

Locale Placeholders
-------------------

The pointer values 0 and -1 have special meanings as `locale_t` values.  Any
public function that accepts a `locale_t` parameter must use the `FIX_LOCALE()`
macro on it before using it.  For efficiency, this can be omitted in functions
which *only* use their locale parameter as an argument to another public
function, as the callee will do the `FIX_LOCALE()` itself.

Potential Improvements
----------------------

At present, the current rune set is accessed via a function call.  This makes
it fairly expensive to use any of the ctype.h functions.  We could improve this
quite a lot by storing the rune locale data in a `__thread`-qualified variable.
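
One possible shape for that improvement, as a sketch only.
`struct rune_sketch`, `global_runes`, `thread_runes` and `isupper_sketch()`
are hypothetical, and the table is reduced to one flag byte per character.
`__thread` is the GCC/Clang spelling; C11 provides `_Thread_local`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified rune table. */
struct rune_sketch {
	unsigned char is_upper[256];
};

static struct rune_sketch global_runes;

/* Per-thread pointer to the active table; NULL means "use the global". */
static __thread const struct rune_sketch *thread_runes;

/* An inline classifier reads the thread-local pointer directly. */
static inline int
isupper_sketch(int c)
{
	const struct rune_sketch *r;

	r = thread_runes != NULL ? thread_runes : &global_runes;
	assert(r != NULL);
	return (c >= 0 && c < 256 && r->is_upper[c]);
}
```

The lookup then costs a thread-local load and a table index rather than a
call into libc to find the current rune set.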

Several of the existing FreeBSD locale-aware functions appear to be wrong.  For
example, most of the `strto*()` family should probably use `digittoint_l()`,
but instead they assume ASCII.  These will break if using a character encoding
that does not put numbers and the letters A-F in the same location as ASCII.
Some functions, like `strcoll()`, only work on single-byte encodings.  No
attempt has been made to fix existing limitations in the libc functions other
than to add support for xlocale.

Intuitively, setting a thread-local locale should ensure that all locale-aware
functions can be used safely from that thread.  In fact, this is not the case
in either this implementation or the Darwin one.  You must call `duplocale()`
or `newlocale()` before calling `uselocale()`.  This is a bit ugly, and it
would be better if libc ensured that every thread had its own locale object.
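
A sketch of the required sequence, using the POSIX 2008 calls `duplocale()`,
`uselocale()` and `freelocale()`.  The helper names `thread_locale_init()`
and `thread_locale_fini()` are hypothetical, the feature-test macro is an
assumption about the build environment, and some systems declare these
functions in `<xlocale.h>` instead of `<locale.h>`.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>

/*
 * Give the calling thread a private copy of the global locale.
 * Returns the new locale, or (locale_t)0 on failure.
 */
static locale_t
thread_locale_init(void)
{
	locale_t mine = duplocale(LC_GLOBAL_LOCALE);

	if (mine == (locale_t)0)
		return ((locale_t)0);
	uselocale(mine);
	return (mine);
}

/* Undo thread_locale_init() for the calling thread. */
static void
thread_locale_fini(locale_t mine)
{
	assert(mine != (locale_t)0);
	/* Switch away first: a locale must not be freed while in use. */
	uselocale(LC_GLOBAL_LOCALE);
	freelocale(mine);
}
```

Each thread that needs isolation from changes to the global locale would call
`thread_locale_init()` once at start-up and `thread_locale_fini()` before it
exits.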