
.. index:: Field Formatting
.. _field-formatting:

Field Formatting
----------------

The field format is similar to the format string for printf(3).  Its
use varies based on the role of the field, but generally is used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note that a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp".  Each directive
is matched by one or more arguments to the xo_emit function.

The format string has the form::

  '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0') indicating the output value should be padded on the
  left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the
  argument.  If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum
  number of bytes which will be examined for a string argument, or the maximum
  width for a non-string argument.  When handling ASCII strings this
  functions as the field width, but with multi-byte characters a single
  character may be composed of multiple bytes, so the byte count and
  the column count can differ.
  xo_emit will never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum
  width for a string argument.  This modifier cannot be given for non-string
  arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.
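
The flag and width modifiers listed above follow printf(3), so their
effect can be observed with a plain snprintf(3) call.  The following is
a minimal sketch in standard C, not libxo code; the helper name
"int_formats_as" is hypothetical and exists only for illustration:

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not a libxo function): format 'value' with a
 * printf-style directive and report whether the result equals 'expected'.
 */
int
int_formats_as (const char *fmt, int value, const char *expected)
{
    char buf[64];

    snprintf(buf, sizeof(buf), fmt, value);
    return strcmp(buf, expected) == 0;
}
```

Under these assumptions, "%-5d" pads 42 on the right ("42   "), "%5d"
pads it on the left ("   42"), "%05d" pads with zeroes ("00042"),
"%+d" gives "+42", and "% d" gives " 42".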

The format character is described in the following table:

  ===== ================= ======================
   Ltr   Argument Type     Format
  ===== ================= ======================
   d     int               base 10 (decimal)
   i     int               base 10 (decimal)
   o     unsigned          base 8 (octal)
   u     unsigned          base 10 (decimal)
   x     unsigned          base 16 (hex)
   X     unsigned          base 16 (hex)
   D     long              base 10 (decimal)
   O     unsigned long     base 8 (octal)
   U     unsigned long     base 10 (decimal)
   e     double            [-]d.ddde+-dd
   E     double            [-]d.dddE+-dd
   f     double            [-]ddd.ddd
   F     double            [-]ddd.ddd
   g     double            as 'e' or 'f'
   G     double            as 'E' or 'F'
   a     double            [-]0xh.hhhp[+-]d
   A     double            [-]0Xh.hhhp[+-]d
   c     unsigned char     a character
   C     wint_t            a character
   s     char \*           a UTF-8 string
   S     wchar_t \*        a unicode/WCS string
   p     void \*           '%#lx'
  ===== ================= ======================

The 'h' and 'l' modifiers affect the size and treatment of the
argument:

  ===== ============= ====================
   Mod   d, i          o, u, x, X
  ===== ============= ====================
   hh    signed char   unsigned char
   h     short         unsigned short
   l     long          unsigned long
   ll    long long     unsigned long long
   j     intmax_t      uintmax_t
   t     ptrdiff_t     ptrdiff_t
   z     size_t        size_t
   q     quad_t        u_quad_t
  ===== ============= ====================
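
The size modifiers in the table must match the C type of the argument
actually passed.  The sketch below uses standard snprintf(3) rather
than libxo, and the function name "format_sized" is a hypothetical
helper; it pairs each directive with an argument of the matching type:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: each directive carries the size modifier that
 * matches its argument's C type ('h' short, 'll' long long,
 * 'z' size_t, 'j' intmax_t).  Returns the snprintf result.
 */
int
format_sized (char *buf, size_t bufsiz, short s, long long ll,
              size_t z, intmax_t j)
{
    return snprintf(buf, bufsiz, "%hd %lld %zu %jd", s, ll, z, j);
}
```

Passing an argument whose type does not match its modifier (for
example a 'long long' value for a plain "%d") is undefined behavior in
C, which is one reason build-time format checking is valuable.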

.. index:: UTF-8
.. index:: Locale

.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.  The default '%s' string is a
'char \*' pointer to a string encoded as UTF-8.  Since UTF-8 is
compatible with ASCII data, a normal 7-bit ASCII string can be used.
'%ls' expects a 'wchar_t \*' pointer to a wide-character string,
encoded as 32-bit Unicode values.  '%hs' expects a 'char \*' pointer
to a multi-byte string encoded with the current locale, as given by
the LC_CTYPE, LANG, or LC_ALL environment variables.  The first
variable in this list that is set is used; if none of the variables
is set, the locale defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

   xo_emit("All strings are utf-8 content {:tag/%ls}",
           L"except for wide strings");

  ======== ================== ===============================
   Format   Argument Type      Argument Contents
  ======== ================== ===============================
   %s       const char \*      UTF-8 string
   %S       const wchar_t \*   Wide string (alias for '%ls')
   %ls      const wchar_t \*   Wide character UNICODE string
   %hs      const char \*      locale-based string
  ======== ================== ===============================

.. admonition:: "Long", not "locale"

  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
  It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
  "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value.  The hat size is formatted in a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        /* Format the size only when it is known (positive) */
        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        wchar_t when[32];
        /* wcsftime takes a count of wide characters, not bytes */
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

It is important to note that xo_emit will perform the conversion
required to make appropriate output.  Text style output uses the
current locale (as described above), while XML, JSON, and HTML use
UTF-8.

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.  The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit.  xo_emit uses the precision as the former,
and adds a third value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6.  Up to ten bytes of data at the location given by
'name' are used in filling those columns::

    xo_emit("{:name/%3.10.6s}", name);
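
To make the difference between the byte limit and the column limit
concrete, here is a minimal sketch in standard C.  It is not libxo's
implementation: the function names are hypothetical, the input is
assumed to be valid UTF-8, and every character is assumed to occupy
exactly one column (real display widths can differ):

```c
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by lead byte c. */
static size_t
utf8_seq_len (unsigned char c)
{
    if (c < 0x80)
        return 1;               /* ASCII */
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;                   /* invalid lead byte: treat as one byte */
}

/*
 * Hypothetical helper: return how many bytes of 's' would be emitted
 * when at most 'max_bytes' bytes may be examined and at most
 * 'max_cols' columns may be filled.  Never reads past 'max_bytes'
 * and never splits a multi-byte character.
 */
size_t
utf8_clip (const char *s, size_t max_bytes, size_t max_cols)
{
    size_t used = 0, cols = 0;

    while (used < max_bytes && cols < max_cols && s[used] != '\0') {
        size_t len = utf8_seq_len((unsigned char) s[used]);

        if (used + len > max_bytes)
            break;              /* character does not fit in byte budget */
        used += len;
        cols += 1;              /* assumption: one column per character */
    }
    return used;
}
```

For the UTF-8 bytes of "héllo wörld", a byte limit of 10 and a column
limit of 6 select 7 bytes ("héllo " is six columns, but "é" takes two
bytes).  With a byte limit of 2, only the single byte "h" is selected,
since the two-byte "é" would cross the limit.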

Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field
definition are copied to the output for the TEXT style, and are
ignored for the JSON and XML styles.  For HTML, these characters are
placed in a <div> with class "text"::

  EXAMPLE:
      xo_emit("The hat is {:size/%s}.\n", size_val);
  TEXT:
      The hat is extra small.
  XML:
      <size>extra small</size>
  JSON:
      "size": "extra small"
  HTML:
      <div class="text">The hat is </div>
      <div class="data" data-tag="size">extra small</div>
      <div class="text">.</div>
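
The split between field definitions and surrounding characters can be
pictured with a small scanner.  This is only a sketch in standard C
under simplifying assumptions (no escape handling, no nested braces);
"copy_outside_fields" is a hypothetical name and does not describe how
libxo parses format strings internally:

```c
#include <stddef.h>

/*
 * Hypothetical helper: copy the characters of 'fmt' that lie outside
 * "{...}" field definitions into 'out', NUL-terminating the result.
 * Returns the number of bytes copied (excluding the NUL).
 */
size_t
copy_outside_fields (const char *fmt, char *out, size_t outsiz)
{
    size_t n = 0;
    int in_field = 0;

    if (outsiz == 0)
        return 0;

    for (; *fmt != '\0'; fmt++) {
        if (in_field) {
            if (*fmt == '}')
                in_field = 0;   /* field definition ends */
        } else if (*fmt == '{') {
            in_field = 1;       /* field definition starts */
        } else if (n + 1 < outsiz) {
            out[n++] = *fmt;    /* plain text: keep it */
        }
    }
    out[n] = '\0';
    return n;
}
```

Applied to the format string from the example above, the scanner keeps
"The hat is " and ".\n", which are the pieces that appear as "text"
content in the TEXT and HTML output.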

.. index:: errno

"%m" Is Supported
~~~~~~~~~~~~~~~~~

libxo supports the '%m' directive, which formats the error message
associated with the current value of "errno".  It is the equivalent
of "%s" with the argument strerror(errno)::

    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
    xo_emit("{:filename} cannot be opened: {:error/%s}",
            filename, strerror(errno));
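
Because '%m' reads "errno" at the time of formatting, any library call
made between the failure and the emit call may change the message.  A
common precaution is to save the value first.  The sketch below uses
standard snprintf(3) and strerror(3) rather than libxo, and the helper
name "format_open_error" is hypothetical:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build "<filename> cannot be opened: <message>"
 * from an errno value saved right after the failing call.
 */
int
format_open_error (char *buf, size_t bufsiz, const char *filename,
                   int saved_errno)
{
    return snprintf(buf, bufsiz, "%s cannot be opened: %s",
                    filename, strerror(saved_errno));
}
```

A caller would typically copy "errno" into a local variable immediately
after the failing open call and pass that copy along.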

"%n" Is Not Supported
~~~~~~~~~~~~~~~~~~~~~

libxo does not support the '%n' directive.  It's a bad idea and we
just don't do it.

The Encoding Format (eformat)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "eformat" string is the format string used when encoding the field
for JSON and XML.  If not provided, it defaults to the primary format
with any minimum width removed.  If the primary is not given, both
default to "%s".

Content Strings
~~~~~~~~~~~~~~~

For padding and labels, the content string is considered the content,
unless a format is given.

.. index:: printf-like

Argument Validation
~~~~~~~~~~~~~~~~~~~

Many compilers and tool chains support validation of printf-like
arguments.  When the format string fails to match the argument list,
a warning is generated.  This is a valuable feature and while the
formatting strings for libxo differ considerably from printf, many of
these checks can still provide build-time protection against bugs.
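
As background, GCC and Clang enable this kind of checking through a
function attribute.  The sketch below shows the mechanism on a
hypothetical wrapper named "log_to_buf"; it is not part of libxo and
only illustrates how a printf-like declaration is written:

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical wrapper, not part of libxo.  The attribute tells GCC
 * and Clang that argument 3 is a printf-style format string whose
 * arguments begin at argument 4, so mismatches can be reported at
 * build time.
 */
int log_to_buf (char *buf, size_t bufsiz, const char *fmt, ...)
    __attribute__((__format__(__printf__, 3, 4)));

int
log_to_buf (char *buf, size_t bufsiz, const char *fmt, ...)
{
    va_list ap;
    int rc;

    va_start(ap, fmt);
    rc = vsnprintf(buf, bufsiz, fmt, ap);
    va_end(ap);
    return rc;
}
```

With format warnings enabled, a call that passes a string where "%d"
expects an int would be flagged by the compiler instead of failing at
run time.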

libxo provides variants of functions that provide this ability, if the
"--enable-printflike" option is passed to the "configure" script.
These functions use the "_p" suffix, like "xo_emit_p()",
"xo_emit_hp()", etc.

The following are features of libxo formatting strings that are
incompatible with printf-like testing:

- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
  ten bytes of data can be inspected to fill a minimum of 4 columns and
  a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single,
  trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
  locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
  styles (text and HTML) will use "#%s" where other styles use "%s".

If none of these features are in use by your code, then using the "_p"
variants might be wise:

  ================== ========================
   Function           printf-like Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvp
   xo_emit_h          xo_emit_hp
   xo_emit            xo_emit_p
   xo_emit_warn_hcv   xo_emit_warn_hcvp
   xo_emit_warn_hc    xo_emit_warn_hcp
   xo_emit_warn_c     xo_emit_warn_cp
   xo_emit_warn       xo_emit_warn_p
   xo_emit_warnx      xo_emit_warnx_p
   xo_emit_err        xo_emit_err_p
   xo_emit_errx       xo_emit_errx_p
   xo_emit_errc       xo_emit_errc_p
  ================== ========================

.. index:: performance
.. index:: XOEF_RETAIN

.. _retain:

Retaining Parsed Format Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

libxo can retain the parsed internal information related to the given
format string, so that subsequent xo_emit calls reuse the retained
information and avoid repetitive parsing of the format string::

    SYNTAX:
      int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
    EXAMPLE:
      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
                     some, thing, fancy);

To retain parsed format information, use the XOEF_RETAIN flag to the
xo_emit_f() function.  A complete set of xo_emit_f functions exists to
match all the xo_emit function signatures (with handles, variadic
arguments, and printf-like flags):

  ================== ========================
   Function           Flags Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvf
   xo_emit_h          xo_emit_hf
   xo_emit            xo_emit_f
   xo_emit_hvp        xo_emit_hvfp
   xo_emit_hp         xo_emit_hfp
   xo_emit_p          xo_emit_fp
  ================== ========================

The format string must be immutable across multiple calls to xo_emit_f(),
since the library retains the string.  Typically this is done by using
static constant strings, such as string literals.  If the string is not
immutable, the XOEF_RETAIN flag must not be used.

The functions xo_retain_clear() and xo_retain_clear_all() release
internal information on either a single format string or all format
strings, respectively.  Neither is required, but the library will
retain this information until it is cleared or the process exits::

    const char *fmt = "{:name}  {:count/%d}\n";
    for (i = 0; i < 1000; i++) {
        xo_open_instance("item");
        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
        xo_close_instance("item");
    }
    xo_retain_clear(fmt);

The retained information is kept as thread-specific data.
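
The immutability requirement is easier to see with a toy cache.  The
sketch below is standard C and is not libxo's implementation: it
assumes, purely for illustration, that retained data is looked up by
the address of the format string, and every name in it is hypothetical.
It also ignores the thread-specific aspect:

```c
#include <stddef.h>

#define CACHE_SLOTS 8

struct parsed_fmt {
    const char *key;        /* address of the format string, not a copy */
    int field_count;        /* stand-in for real parsed information */
};

static struct parsed_fmt cache[CACHE_SLOTS];
int parse_calls;            /* how often the "expensive" parse ran */

/* Toy parser: counts '{' characters as a stand-in for real parsing. */
static int
parse_format (const char *fmt)
{
    int count = 0;

    parse_calls++;
    for (; *fmt != '\0'; fmt++)
        if (*fmt == '{')
            count++;
    return count;
}

/* Return the (possibly retained) field count for a non-NULL 'fmt'. */
int
retained_field_count (const char *fmt)
{
    size_t i;

    for (i = 0; i < CACHE_SLOTS; i++) {
        if (cache[i].key == fmt)
            return cache[i].field_count;    /* hit: no parsing */
        if (cache[i].key == NULL) {
            cache[i].key = fmt;             /* miss: parse and retain */
            cache[i].field_count = parse_format(fmt);
            return cache[i].field_count;
        }
    }
    return parse_format(fmt);   /* cache full: parse without retaining */
}
```

In this model a second call with the same string skips parsing
entirely.  If the bytes of the string were modified in place between
calls, the lookup would still hit and return stale results, which is
why a retained format string has to stay unchanged.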

Example
~~~~~~~

In this example, the value for the number of items in stock is emitted::

        xo_emit("{P:   }{Lwc:In stock}{:in-stock/%u}\n",
                instock);

This call will generate the following output::

  TEXT:
       In stock: 144
  XML:
      <in-stock>144</in-stock>
  JSON:
      "in-stock": 144,
  HTML:
      <div class="line">
        <div class="padding">   </div>
        <div class="label">In stock</div>
        <div class="decoration">:</div>
        <div class="padding"> </div>
        <div class="data" data-tag="in-stock">144</div>
      </div>

Clearly HTML wins the verbosity award, and this output does
not include XOF_XPATH or XOF_INFO data, which would expand the
penultimate line to::

       <div class="data" data-tag="in-stock"
          data-xpath="/top/data/item/in-stock"
          data-type="number"
          data-help="Number of items in stock">144</div>