.. index:: Field Formatting
.. _field-formatting:

Field Formatting
----------------

The field format is similar to the format string for printf(3).  Its
use varies based on the role of the field, but it is generally used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note that a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp".  Each
directive is matched by one or more arguments to the xo_emit function.

The format string has the form::

  '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0'), indicating the output value should be padded on
  the left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9'), indicating the minimum width of the
  argument.  If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits, indicating the maximum
  number of bytes which will be examined for a string argument, or the
  maximum width for a non-string argument.  For ASCII strings this
  also acts as the maximum field width, but in a multi-byte encoding a
  single character may be composed of multiple bytes, so the two can
  differ.  xo_emit will never dereference memory beyond the given
  number of bytes.
- a second period followed by one or more digits, indicating the maximum
  width for a string argument.  This modifier cannot be given for
  non-string arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.

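As a rough illustration of the flag and width modifiers listed above,
the sketch below uses plain snprintf(3) rather than xo_emit, on the
assumption that libxo follows printf for these modifiers.  The helper
name check_modifiers is hypothetical and exists only for this example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format small values with several modifiers and compare each result
 * against the text printf(3) produces.  Returns 1 when all match.
 */
static int
check_modifiers(void)
{
    char buf[6][16];
    int len;

    len = snprintf(buf[0], sizeof(buf[0]), "%#x", 255u); /* '0x' prefix */
    assert(len > 0 && (size_t) len < sizeof(buf[0]));
    snprintf(buf[1], sizeof(buf[1]), "%-5d|", 42);  /* pad on the right */
    snprintf(buf[2], sizeof(buf[2]), "%05d", 42);   /* pad with zeroes */
    snprintf(buf[3], sizeof(buf[3]), "%5d", 42);    /* minimum width of 5 */
    snprintf(buf[4], sizeof(buf[4]), "% d", 42);    /* space before positive */
    snprintf(buf[5], sizeof(buf[5]), "%+d", 42);    /* always emit a sign */

    return strcmp(buf[0], "0xff") == 0
        && strcmp(buf[1], "42   |") == 0
        && strcmp(buf[2], "00042") == 0
        && strcmp(buf[3], "   42") == 0
        && strcmp(buf[4], " 42") == 0
        && strcmp(buf[5], "+42") == 0;
}
```

The same modifiers would appear after the slash in a field definition,
for example "{:count/%05d}".
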
The format character is described in the following table:

  ===== ================= ======================
   Ltr   Argument Type     Format
  ===== ================= ======================
   d     int               base 10 (decimal)
   i     int               base 10 (decimal)
   o     int               base 8 (octal)
   u     unsigned          base 10 (decimal)
   x     unsigned          base 16 (hex)
   X     unsigned long     base 16 (hex)
   D     long              base 10 (decimal)
   O     unsigned long     base 8 (octal)
   U     unsigned long     base 10 (decimal)
   e     double            [-]d.ddde+-dd
   E     double            [-]d.dddE+-dd
   f     double            [-]ddd.ddd
   F     double            [-]ddd.ddd
   g     double            as 'e' or 'f'
   G     double            as 'E' or 'F'
   a     double            [-]0xh.hhhp[+-]d
   A     double            [-]0Xh.hhhp[+-]d
   c     unsigned char     a character
   C     wint_t            a character
   s     char \*           a UTF-8 string
   S     wchar_t \*        a unicode/WCS string
   p     void \*           '%#lx'
  ===== ================= ======================

The 'h' and 'l' modifiers affect the size and treatment of the
argument:

  ===== ============= ====================
   Mod   d, i          o, u, x, X
  ===== ============= ====================
   hh    signed char   unsigned char
   h     short         unsigned short
   l     long          unsigned long
   ll    long long     unsigned long long
   j     intmax_t      uintmax_t
   t     ptrdiff_t     ptrdiff_t
   z     size_t        size_t
   q     quad_t        u_quad_t
  ===== ============= ====================

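The pairing of size modifiers with argument types can be sketched with
plain snprintf(3), again assuming that libxo follows printf here.  The
function name check_size_modifiers and the sample values are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Pass each argument with the type its modifier calls for and verify
 * the formatted text.  Returns 1 when all results match.
 */
static int
check_size_modifiers(void)
{
    char buf[5][32];
    signed char tiny = -7;            /* 'hh' with d */
    unsigned short small = 65535;     /* 'h' with u */
    long long big = 1234567890123LL;  /* 'll' with d */
    size_t count = 10;                /* 'z' with u */
    intmax_t widest = -42;            /* 'j' with d */
    int len;

    len = snprintf(buf[0], sizeof(buf[0]), "%hhd", tiny);
    assert(len > 0 && (size_t) len < sizeof(buf[0]));
    snprintf(buf[1], sizeof(buf[1]), "%hu", small);
    snprintf(buf[2], sizeof(buf[2]), "%lld", big);
    snprintf(buf[3], sizeof(buf[3]), "%zu", count);
    snprintf(buf[4], sizeof(buf[4]), "%jd", widest);

    return strcmp(buf[0], "-7") == 0
        && strcmp(buf[1], "65535") == 0
        && strcmp(buf[2], "1234567890123") == 0
        && strcmp(buf[3], "10") == 0
        && strcmp(buf[4], "-42") == 0;
}
```
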
.. index:: UTF-8
.. index:: Locale

.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.  The default '%s' string is a
'char \*' pointer to a string encoded as UTF-8.  Since UTF-8 is
compatible with ASCII data, a normal 7-bit ASCII string can be used.
'%ls' expects a 'wchar_t \*' pointer to a wide-character string,
encoded as 32-bit Unicode values.  '%hs' expects a 'char \*' pointer to
a multi-byte string encoded with the current locale, as given by the
LC_CTYPE, LANG, or LC_ALL environment variables.  The first variable in
this list that is set is used; if none of the variables are set, the
locale defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

   xo_emit("All strings are utf-8 content {:tag/%ls}",
           L"except for wide strings");

  ======== ================== ===============================
   Format   Argument Type      Argument Contents
  ======== ================== ===============================
   %s       const char \*      UTF-8 string
   %S       const wchar_t \*   Wide character UNICODE string (alias for '%ls')
   %ls      const wchar_t \*   Wide character UNICODE string
   %hs      const char \*      locale-based string
  ======== ================== ===============================

.. admonition:: "Long", not "locale"

  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
  It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
  "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value.  The hat size is formatted into a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    #include <stdio.h>
    #include <time.h>
    #include <wchar.h>
    #include <libxo/xo.h>

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        /* wcsftime takes a count of wide characters, not bytes */
        wchar_t when[32];
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

It is important to note that xo_emit will perform the conversion
required to make appropriate output.  Text style output uses the
current locale (as described above), while XML, JSON, and HTML use
UTF-8.

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.  The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit.  xo_emit uses the precision as the former,
and adds a third value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6.  Up to ten bytes of data at the location given by
'name' are used in filling those columns::

    xo_emit("{:name/%3.10.6s}", name);

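The interplay between the byte limit and the column limit can be
sketched in plain C.  The helpers below are hypothetical and are not
libxo's implementation; they assume well-formed UTF-8 and treat every
code point as one column, which ignores wide and combining characters.

```c
#include <assert.h>
#include <stddef.h>

/* Length in bytes of the UTF-8 sequence introduced by 'lead' */
static size_t
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 1;   /* invalid lead byte: count it as a single byte */
}

/*
 * Return how many bytes of 's' to emit when at most 'max_bytes' bytes
 * may be examined and at most 'max_cols' columns may be filled.
 */
static size_t
utf8_clip(const char *s, size_t max_bytes, size_t max_cols)
{
    size_t used = 0, cols = 0;

    assert(s != NULL);
    while (used < max_bytes && s[used] != '\0' && cols < max_cols) {
        size_t len = utf8_seq_len((unsigned char) s[used]);
        size_t k;

        if (len > max_bytes - used)
            break;      /* sequence would cross the byte limit */
        for (k = 1; k < len; k++)
            if (((unsigned char) s[used + k] & 0xC0) != 0x80)
                break;
        if (k < len)
            break;      /* malformed or truncated sequence */
        used += len;
        cols++;
    }
    return used;
}
```

For the string "café!" (six bytes, five code points), limits of 10
bytes and 4 columns select 5 bytes, while limits of 4 bytes and 6
columns select only 3, since the two-byte "é" does not fit within the
byte limit.
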
Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field
definition are copied to the output for the TEXT style, and are
ignored for the JSON and XML styles.  For HTML, these characters are
placed in a <div> with class "text"::

  EXAMPLE:
      xo_emit("The hat is {:size/%s}.\n", size_val);
  TEXT:
      The hat is extra small.
  XML:
      <size>extra small</size>
  JSON:
      "size": "extra small"
  HTML:
      <div class="text">The hat is </div>
      <div class="data" data-tag="size">extra small</div>
      <div class="text">.</div>

.. index:: errno

"%m" Is Supported
~~~~~~~~~~~~~~~~~

libxo supports the '%m' directive, which formats the error message
associated with the current value of "errno".  It is the equivalent
of "%s" with the argument strerror(errno)::

    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
    xo_emit("{:filename} cannot be opened: {:error/%s}",
            filename, strerror(errno));

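Because both forms depend on the value of errno at the time of the
call, it can help to capture errno right after the failing operation.
The sketch below shows the "%s" form with plain snprintf(3); the helper
name format_open_error is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the message for error number 'err'.  Taking the error number
 * as a parameter keeps it from being overwritten by library calls
 * made between the failure and the formatting.
 */
static int
format_open_error(char *buf, size_t size, const char *filename, int err)
{
    assert(buf != NULL && filename != NULL);
    return snprintf(buf, size, "%s cannot be opened: %s",
                    filename, strerror(err));
}
```

A caller would typically save errno immediately after the failed
open(2) or fopen(3) and pass the saved value in.
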
"%n" Is Not Supported
~~~~~~~~~~~~~~~~~~~~~

libxo does not support the '%n' directive.  It's a bad idea and we
just don't do it.

The Encoding Format (eformat)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "eformat" string is the format string used when encoding the field
for JSON and XML.  If not provided, it defaults to the primary format
with any minimum width removed.  If the primary is not given, both
default to "%s".

Content Strings
~~~~~~~~~~~~~~~

For padding and labels, the content string is considered the content,
unless a format is given.

.. index:: printf-like

Argument Validation
~~~~~~~~~~~~~~~~~~~

Many compilers and tool chains support validation of printf-like
arguments.  When the format string fails to match the argument list,
a warning is generated.  This is a valuable feature and while the
formatting strings for libxo differ considerably from printf, many of
these checks can still provide build-time protection against bugs.

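As a sketch of how this kind of validation is commonly requested, GCC
and Clang accept a "format" function attribute on variadic functions.
The wrapper format_checked and the PRINTFLIKE macro below are
hypothetical names for illustration; they are not taken from libxo.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Ask GCC or Clang to check the arguments against the format string */
#if defined(__GNUC__) || defined(__clang__)
#define PRINTFLIKE(fmt_idx, arg_idx) \
    __attribute__((__format__(__printf__, fmt_idx, arg_idx)))
#else
#define PRINTFLIKE(fmt_idx, arg_idx)
#endif

/* Parameter 3 is the format string; variadic arguments start at 4 */
static int format_checked(char *buf, size_t size, const char *fmt, ...)
    PRINTFLIKE(3, 4);

static int
format_checked(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int len;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    len = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return len;
}
```

With the attribute in place, a call that passes a string where "%d"
expects an int would normally draw a compile-time warning when format
checking (such as -Wformat) is enabled.
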
libxo provides variants of its functions that enable this checking, if
the "--enable-printflike" option is passed to the "configure" script.
These functions use the "_p" suffix, like "xo_emit_p()",
"xo_emit_hp()", etc.

The following are features of libxo formatting strings that are
incompatible with printf-like testing:

- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
  ten bytes of data can be inspected to fill a minimum of 4 columns and
  a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single,
  trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means
  locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
  styles (text and HTML) will use "#%s" where other styles use "%s".

If none of these features are in use by your code, then using the "_p"
variants might be wise:

  ================== ========================
   Function           printf-like Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvp
   xo_emit_h          xo_emit_hp
   xo_emit            xo_emit_p
   xo_emit_warn_hcv   xo_emit_warn_hcvp
   xo_emit_warn_hc    xo_emit_warn_hcp
   xo_emit_warn_c     xo_emit_warn_cp
   xo_emit_warn       xo_emit_warn_p
   xo_emit_warnx      xo_emit_warnx_p
   xo_emit_err        xo_emit_err_p
   xo_emit_errx       xo_emit_errx_p
   xo_emit_errc       xo_emit_errc_p
  ================== ========================

.. index:: performance
.. index:: XOEF_RETAIN

.. _retain:

Retaining Parsed Format Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

libxo can retain the parsed internal information related to the given
format string, so that subsequent xo_emit calls reuse the retained
information and avoid repetitive parsing of the format string::

    SYNTAX:
      int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
    EXAMPLE:
      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
                     some, thing, fancy);

To retain parsed format information, use the XOEF_RETAIN flag to the
xo_emit_f() function.  A complete set of xo_emit_f functions exists to
match all the xo_emit function signatures (with handles, variadic
arguments, and printf-like flags):

  ================== ========================
   Function           Flags Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvf
   xo_emit_h          xo_emit_hf
   xo_emit            xo_emit_f
   xo_emit_hvp        xo_emit_hvfp
   xo_emit_hp         xo_emit_hfp
   xo_emit_p          xo_emit_fp
  ================== ========================

The format string must be immutable across multiple calls to xo_emit_f(),
since the library retains the string.  Typically this is done by using
static constant strings, such as string literals.  If the string is not
immutable, the XOEF_RETAIN flag must not be used.

The functions xo_retain_clear() and xo_retain_clear_all() release
internal information on either a single format string or all format
strings, respectively.  Neither is required, but the library will
retain this information until it is cleared or the process exits::

    const char *fmt = "{:name}  {:count/%d}\n";
    for (i = 0; i < 1000; i++) {
        xo_open_instance("item");
        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
        xo_close_instance("item");
    }
    xo_retain_clear(fmt);

The retained information is kept as thread-specific data.

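One way to picture why the format string must stay immutable is a
cache keyed on the string's address.  The sketch below is a
hypothetical miniature and not libxo's code: fields_retained(),
parse_fields(), and the fixed-size table are illustrative names, and
keying on the pointer is an assumption about how such a cache could
work.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_SLOTS 8

struct parsed_fmt {
    const char *fmt;    /* key: the pointer itself, not the contents */
    size_t nfields;     /* stand-in for real parsed data */
};

static struct parsed_fmt cache[CACHE_SLOTS];
static size_t cache_used;
static unsigned parse_calls;    /* how many times parsing really ran */

/* "Parse" a format string by counting its '{' field openers */
static size_t
parse_fields(const char *fmt)
{
    size_t n = 0;

    parse_calls++;
    for (; *fmt != '\0'; fmt++)
        if (*fmt == '{')
            n++;
    return n;
}

/* Return the field count, parsing only on the first use of 'fmt' */
static size_t
fields_retained(const char *fmt)
{
    size_t i, n;

    assert(fmt != NULL);
    for (i = 0; i < cache_used; i++)
        if (cache[i].fmt == fmt)
            return cache[i].nfields;    /* hit: skip parsing */

    n = parse_fields(fmt);
    if (cache_used < CACHE_SLOTS) {
        cache[cache_used].fmt = fmt;
        cache[cache_used].nfields = n;
        cache_used++;
    }
    return n;
}
```

In a cache like this, rewriting the bytes behind a retained pointer
would leave stale parsed data in place, which is the kind of hazard
the immutability requirement guards against.
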
Example
~~~~~~~

In this example, the value for the number of items in stock is emitted::

        xo_emit("{P:   }{Lwc:In stock}{:in-stock/%u}\n",
                instock);

This call will generate the following output::

  TEXT:
         In stock: 144
  XML:
      <in-stock>144</in-stock>
  JSON:
      "in-stock": 144,
  HTML:
      <div class="line">
        <div class="padding">   </div>
        <div class="label">In stock</div>
        <div class="decoration">:</div>
        <div class="padding"> </div>
        <div class="data" data-tag="in-stock">144</div>
      </div>

Clearly HTML wins the verbosity award, and this output does
not include XOF_XPATH or XOF_INFO data, which would expand the
penultimate line to::

       <div class="data" data-tag="in-stock"
          data-xpath="/top/data/item/in-stock"
          data-type="number"
          data-help="Number of items in stock">144</div>
372