.. xref: /freebsd/contrib/libxo/doc/field-formatting.rst (revision b3512b30dbec579da28028e29d8b33ec7242af68)

.. index:: Field Formatting

Field Formatting
----------------

The field format is similar to the format string for printf(3).  Its
use varies based on the role of the field, but generally it is used to
format the field's contents.

If the format string is not provided for a value field, it defaults to
"%s".

Note that a field definition can contain zero or more printf-style
'directives', which are sequences that start with a '%' and end with
one of the following characters: "diouxXDOUeEfFgGaAcCsSp".  Each
directive is matched by one or more arguments to the xo_emit function.

The format string has the form::

  '%' format-modifier * format-character

The format-modifier can be:

- a '#' character, indicating the output value should be prefixed
  with '0x', typically to indicate a base 16 (hex) value.
- a minus sign ('-'), indicating the output value should be padded on
  the right instead of the left.
- a leading zero ('0'), indicating the output value should be padded on
  the left with zeroes instead of spaces (' ').
- one or more digits ('0' - '9') indicating the minimum width of the
  argument.  If the width in columns of the output value is less than
  the minimum width, the value will be padded to reach the minimum.
- a period followed by one or more digits indicating the maximum
  number of bytes which will be examined for a string argument, or the maximum
  width for a non-string argument.  When handling ASCII strings, this
  acts as the field width, but for multi-byte characters a single
  character may be composed of multiple bytes.
  xo_emit will never dereference memory beyond the given number of bytes.
- a second period followed by one or more digits indicating the maximum
  width for a string argument.  This modifier cannot be given for non-string
  arguments.
- one or more 'h' characters, indicating shorter input data.
- one or more 'l' characters, indicating longer input data.
- a 'j' character, indicating an 'intmax_t' argument.
- a 'z' character, indicating a 'size_t' argument.
- a 't' character, indicating a 'ptrdiff_t' argument.
- a ' ' character, indicating a space should be emitted before
  positive numbers.
- a '+' character, indicating a sign should be emitted before any number.

Note that 'q', 'D', 'O', and 'U' are considered deprecated and will be
removed eventually.
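
Because the field format is modelled on printf(3), the flag and width
modifiers above behave for numbers as they do for snprintf(3).  The
following is a minimal self-checking sketch in plain C (it does not call
libxo, and check_modifiers() is a hypothetical helper) that records what
each modifier is expected to produce::

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical self-check: each row pairs a printf(3) modifier with
     * the text snprintf() is expected to produce for it. */
    struct modifier_case {
        const char *what;      /* which modifier the row demonstrates */
        char got[16];          /* filled in by snprintf() below       */
        const char *expect;    /* the standard printf(3) result       */
    };

    int check_modifiers(void)
    {
        struct modifier_case c[] = {
            { "'#' adds a 0x prefix",          "", "0xff"   },
            { "'-' pads on the right",         "", "42   |" },
            { "'0' pads with zeroes",          "", "00042"  },
            { "width pads on the left",        "", "   42"  },
            { "' ' adds a space to positives", "", " 42"    },
            { "'+' always emits a sign",       "", "+42"    },
        };
        int bad = 0;

        snprintf(c[0].got, sizeof c[0].got, "%#x", 255u);
        snprintf(c[1].got, sizeof c[1].got, "%-5d|", 42);
        snprintf(c[2].got, sizeof c[2].got, "%05d", 42);
        snprintf(c[3].got, sizeof c[3].got, "%5d", 42);
        snprintf(c[4].got, sizeof c[4].got, "% d", 42);
        snprintf(c[5].got, sizeof c[5].got, "%+d", 42);

        for (size_t i = 0; i < sizeof c / sizeof c[0]; i++) {
            if (strcmp(c[i].got, c[i].expect) != 0) {
                printf("%s: got \"%s\", expected \"%s\"\n",
                       c[i].what, c[i].got, c[i].expect);
                bad++;
            }
        }
        return bad;    /* number of rows that did not match */
    }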

The format character is described in the following table:

  ===== ================= ======================
   Ltr   Argument Type     Format
  ===== ================= ======================
   d     int               base 10 (decimal)
   i     int               base 10 (decimal)
   o     unsigned          base 8 (octal)
   u     unsigned          base 10 (decimal)
   x     unsigned          base 16 (hex)
   X     unsigned          base 16 (upper hex)
   D     long              base 10 (decimal)
   O     unsigned long     base 8 (octal)
   U     unsigned long     base 10 (decimal)
   e     double            [-]d.ddde+-dd
   E     double            [-]d.dddE+-dd
   f     double            [-]ddd.ddd
   F     double            [-]ddd.ddd
   g     double            as 'e' or 'f'
   G     double            as 'E' or 'F'
   a     double            [-]0xh.hhhp[+-]d
   A     double            [-]0Xh.hhhp[+-]d
   c     unsigned char     a character
   C     wint_t            a character
   s     char \*           a UTF-8 string
   S     wchar_t \*        a unicode/WCS string
   p     void \*           '%#lx'
  ===== ================= ======================

The size modifiers ('h', 'l', 'j', 't', 'z', and 'q') affect the size
and treatment of the argument:

  ===== ============= ====================
   Mod   d, i          o, u, x, X
  ===== ============= ====================
   hh    signed char   unsigned char
   h     short         unsigned short
   l     long          unsigned long
   ll    long long     unsigned long long
   j     intmax_t      uintmax_t
   t     ptrdiff_t     ptrdiff_t
   z     size_t        size_t
   q     quad_t        u_quad_t
  ===== ============= ====================
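
The size modifier must agree with the type actually passed, because a
printf-style function cannot see the types of its variadic arguments.
A minimal sketch in plain C (format_sizes() is a hypothetical helper,
and snprintf() stands in for xo_emit) passing one value of each kind
with its matching modifier::

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: each argument has exactly the type its
     * modifier names in the table above; a mismatch is undefined. */
    void format_sizes(char *buf, size_t bufsz)
    {
        char text[] = "libxo";
        char *end = text + strlen(text);
        int64_t big = INT64_C(1) << 40;
        unsigned int wide = 300;

        snprintf(buf, bufsz, "%hhu %zu %td %jd",
                 (unsigned char) wide,   /* hh: unsigned char, 300 wraps to 44 */
                 sizeof text,            /* z:  size_t, 6 bytes incl. NUL      */
                 end - text,             /* t:  ptrdiff_t, 5                   */
                 (intmax_t) big);        /* j:  int64_t widened to intmax_t    */
    }

Casting an int64_t to intmax_t and using 'j' avoids guessing whether
int64_t is 'long' or 'long long' on a given platform.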

.. index:: UTF-8
.. index:: Locale

.. _utf-8:

UTF-8 and Locale Strings
~~~~~~~~~~~~~~~~~~~~~~~~

For strings, the 'h' and 'l' modifiers affect the interpretation of
the bytes pointed to by the argument.  The default '%s' string is a
'char \*' pointer to a string encoded as UTF-8.  Since UTF-8 is
compatible with ASCII data, a normal 7-bit ASCII string can be used.
'%ls' expects a 'wchar_t \*' pointer to a wide-character string,
encoded as 32-bit Unicode values.  '%hs' expects a 'char \*' pointer
to a multi-byte string encoded with the current locale, as given by
the LC_CTYPE, LANG, or LC_ALL environment variables.  The first of
these variables that is set is used; if none of them are set, the
locale defaults to "UTF-8".

libxo will convert these arguments as needed to either UTF-8 (for XML,
JSON, and HTML styles) or locale-based strings for display in text
style::

   xo_emit("All strings are utf-8 content {:tag/%ls}",
           L"except for wide strings");

  ======== ================== ========================================
   Format   Argument Type      Argument Contents
  ======== ================== ========================================
   %s       const char \*      UTF-8 string
   %S       const wchar_t \*   Wide character string (alias for '%ls')
   %ls      const wchar_t \*   Wide character UNICODE string
   %hs      const char \*      locale-based string
  ======== ================== ========================================

.. admonition:: "Long", not "locale"

  The "*l*" in "%ls" is for "*long*", following the convention of "%ld".
  It is not "*locale*", a common mis-mnemonic.  "%S" is equivalent to
  "%ls".

For example, the following function is passed a locale-based name, a
hat size, and a time value.  The hat size is formatted into a UTF-8
(ASCII) string, and the time value is formatted into a wchar_t
string::

    #include <stdio.h>
    #include <time.h>
    #include <wchar.h>
    #include <libxo/xo.h>

    void print_order (const char *name, int size,
                      struct tm *timep) {
        char buf[32];
        const char *size_val = "unknown";

        if (size > 0) {
            snprintf(buf, sizeof(buf), "%d", size);
            size_val = buf;
        }

        /* wcsftime() takes the buffer size in wide characters, not bytes */
        wchar_t when[32];
        wcsftime(when, sizeof(when) / sizeof(when[0]), L"%d%b%y", timep);

        /* name is locale-encoded (%hs); when is a wide string (%ls) */
        xo_emit("The hat for {:name/%hs} is {:size/%s}.\n",
                name, size_val);
        xo_emit("It was ordered on {:order-time/%ls}.\n",
                when);
    }

Note that xo_emit performs whatever conversion is required to produce
appropriate output: text style output uses the current locale (as
described above), while XML, JSON, and HTML use UTF-8.
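
To make the XML, JSON, and HTML side of that conversion concrete, here
is a minimal sketch of encoding a '%ls' wide string as UTF-8.  It is not
libxo's code: wide_to_utf8() is a hypothetical helper, and it assumes
each wchar_t holds one full Unicode code point (true for the 32-bit
wchar_t described above, but not for a 16-bit wchar_t)::

    #include <stddef.h>
    #include <string.h>
    #include <wchar.h>

    /* Hypothetical helper: encode a NUL-terminated wide string as UTF-8.
     * Assumes one code point per wchar_t; invalid values become U+FFFD.
     * Returns the number of bytes written, excluding the terminator. */
    size_t wide_to_utf8(const wchar_t *ws, char *out, size_t outsz)
    {
        size_t o = 0;

        if (outsz == 0)
            return 0;
        for (; *ws != L'\0'; ws++) {
            unsigned long cp = (unsigned long) *ws;
            unsigned char b[4];
            size_t len;

            if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
                cp = 0xFFFD;                    /* replacement character */
            if (cp < 0x80) {
                b[0] = (unsigned char) cp;
                len = 1;
            } else if (cp < 0x800) {
                b[0] = (unsigned char) (0xC0 | (cp >> 6));
                b[1] = (unsigned char) (0x80 | (cp & 0x3F));
                len = 2;
            } else if (cp < 0x10000) {
                b[0] = (unsigned char) (0xE0 | (cp >> 12));
                b[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
                b[2] = (unsigned char) (0x80 | (cp & 0x3F));
                len = 3;
            } else {
                b[0] = (unsigned char) (0xF0 | (cp >> 18));
                b[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
                b[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
                b[3] = (unsigned char) (0x80 | (cp & 0x3F));
                len = 4;
            }
            if (o + len >= outsz)               /* keep room for the NUL */
                break;
            memcpy(out + o, b, len);
            o += len;
        }
        out[o] = '\0';
        return o;
    }

The text style goes the other way, from these values to the current
locale's encoding, which is why the same argument can produce different
bytes in different output styles.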

UTF-8 and locale-encoded strings can use multiple bytes to encode one
column of data.  The traditional "precision" (aka "max-width") value
for "%s" printf formatting becomes overloaded since it specifies both
the number of bytes that can be safely referenced and the maximum
number of columns to emit.  xo_emit uses the precision as the former,
and adds a third value for specifying the maximum number of columns.

In this example, the name field is printed with a minimum of 3 columns
and a maximum of 6.  Up to ten bytes of data at the location given by
'name' are used in filling those columns::

    xo_emit("{:name/%3.10.6s}", name);
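
The byte limit versus column limit distinction can be sketched in plain
C.  This is not libxo's code: fit_columns() is a hypothetical helper
that, as a simplifying assumption, treats every UTF-8 character as one
column (real column widths need wcwidth(3) and the locale).  It never
reads past max_bytes, stops at max_cols, and then right-justifies to
min_cols, as "%3.10.6s" would::

    #include <stddef.h>
    #include <string.h>

    /* Length of a UTF-8 sequence from its lead byte (1 for invalid bytes). */
    static size_t utf8_len(unsigned char c)
    {
        if (c < 0x80) return 1;
        if ((c & 0xE0) == 0xC0) return 2;
        if ((c & 0xF0) == 0xE0) return 3;
        if ((c & 0xF8) == 0xF0) return 4;
        return 1;
    }

    /* Hypothetical sketch of "%MIN.BYTES.COLSs": examine at most max_bytes
     * of s, keep at most max_cols characters (one column each, by
     * assumption), then pad on the left to min_cols.  Returns the number
     * of columns emitted, or 0 if 'out' is too small. */
    size_t fit_columns(char *out, size_t outsz, const char *s,
                       size_t min_cols, size_t max_bytes, size_t max_cols)
    {
        size_t i = 0, cols = 0;

        /* Pass 1: find how many whole characters fit within both limits. */
        while (i < max_bytes && s[i] != '\0' && cols < max_cols) {
            size_t len = utf8_len((unsigned char) s[i]);
            size_t k;

            if (i + len > max_bytes)
                break;              /* character would cross the byte limit */
            for (k = 1; k < len; k++)
                if (s[i + k] == '\0')
                    break;
            if (k < len)
                break;              /* truncated multi-byte sequence */
            i += len;
            cols++;
        }

        /* Pass 2: left padding, then the selected bytes. */
        size_t pad = (cols < min_cols) ? min_cols - cols : 0;
        if (pad + i >= outsz) {     /* caller's buffer is too small */
            if (outsz > 0)
                out[0] = '\0';
            return 0;
        }
        memset(out, ' ', pad);
        memcpy(out + pad, s, i);
        out[pad + i] = '\0';
        return cols + pad;
    }

For a name such as "cafés and more", six columns need seven bytes,
because "é" is two bytes in UTF-8; a plain "%.6s" would cut the string
one character short.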

Characters Outside of Field Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Characters in the format string that are not part of a field
definition are copied to the output for the TEXT style, and are
ignored for the JSON and XML styles.  For HTML, these characters are
placed in a <div> with class "text"::

  EXAMPLE:
      xo_emit("The hat is {:size/%s}.\n", size_val);
  TEXT:
      The hat is extra small.
  XML:
      <size>extra small</size>
  JSON:
      "size": "extra small"
  HTML:
      <div class="text">The hat is </div>
      <div class="data" data-tag="size">extra small</div>
      <div class="text">.</div>
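
A rough way to picture which characters count as "outside" is to drop
everything from each opening brace to its closing brace.  The sketch
below is hypothetical: literal_text() is not a libxo function, and it
ignores any brace-escaping rules the real parser may have.  It returns
the text that the TEXT style copies through around the field values::

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: copy only the text outside "{...}" field
     * definitions into 'out'.  Returns the number of bytes copied. */
    size_t literal_text(const char *fmt, char *out, size_t outsz)
    {
        size_t o = 0;

        if (outsz == 0)
            return 0;
        while (*fmt != '\0') {
            size_t run = strcspn(fmt, "{");   /* literal text before a field */
            size_t room = outsz - 1 - o;
            size_t n = run < room ? run : room;

            memcpy(out + o, fmt, n);          /* keep the literal text */
            o += n;
            fmt += run;
            if (*fmt == '{') {                /* skip the field definition */
                const char *close = strchr(fmt, '}');
                fmt = (close != NULL) ? close + 1 : fmt + strlen(fmt);
            }
        }
        out[o] = '\0';
        return o;
    }

For the format above this yields "The hat is " followed by ".\n"; the
HTML style wraps the "The hat is " and "." runs in separate "text" divs
instead.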

.. index:: errno

"%m" Is Supported
~~~~~~~~~~~~~~~~~

libxo supports the '%m' directive, which formats the error message
associated with the current value of "errno".  It is the equivalent
of "%s" with the argument strerror(errno)::

    xo_emit("{:filename} cannot be opened: {:error/%m}", filename);
    xo_emit("{:filename} cannot be opened: {:error/%s}",
            filename, strerror(errno));
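
Since "%m" uses the value errno has when the message is formatted, no
other library call should run between the failure and the emit.  A
minimal plain-C sketch of the explicit "%s" form (describe_open_failure()
is a hypothetical helper, and snprintf() stands in for xo_emit) that
saves errno immediately after the failing call::

    #include <errno.h>
    #include <stdio.h>
    #include <string.h>

    /* Hypothetical helper: try to open 'filename'; on failure, build the
     * same text "{:filename} cannot be opened: {:error/%m}" would give in
     * the TEXT style.  Returns 0 on success, else the saved errno. */
    int describe_open_failure(const char *filename, char *msg, size_t msgsz)
    {
        FILE *fp = fopen(filename, "r");

        if (fp != NULL) {
            fclose(fp);
            if (msgsz > 0)
                msg[0] = '\0';
            return 0;
        }

        int saved = errno;   /* capture errno before any other library call */
        snprintf(msg, msgsz, "%s cannot be opened: %s",
                 filename, strerror(saved));
        return saved;
    }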

"%n" Is Not Supported
~~~~~~~~~~~~~~~~~~~~~

libxo does not support the '%n' directive.  It's a bad idea and we
just don't do it.

The Encoding Format (eformat)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The "eformat" string is the format string used when encoding the field
for JSON and XML.  If not provided, it defaults to the primary format
with any minimum width removed.  If the primary is not given, both
default to "%s".
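
As a sketch of that defaulting rule, derive_eformat() below is a
hypothetical helper, not the library's code.  Stripping the width digits
follows the text; also dropping the '-' and '0' padding flags, which
only matter when a width is present, is an added assumption::

    #include <stddef.h>
    #include <string.h>

    /* Hypothetical helper: derive an encoding format from a display
     * format by removing the minimum width (and, by assumption, the
     * '-' and '0' flags).  A NULL display format yields "%s". */
    void derive_eformat(const char *fmt, char *out, size_t outsz)
    {
        const char *cp = (fmt != NULL) ? fmt : "%s";
        size_t o = 0;

        if (outsz == 0)
            return;
        for (; *cp != '\0' && o + 1 < outsz; cp++) {
            out[o++] = *cp;
            if (*cp != '%')
                continue;
            if (cp[1] == '%') {             /* "%%" is a literal percent */
                if (o + 1 < outsz)
                    out[o++] = '%';
                cp++;
                continue;
            }
            /* Flags: keep '#', ' ' and '+'; drop padding-only '-' and '0'. */
            while (cp[1] != '\0' && strchr("#-0 +", cp[1]) != NULL) {
                cp++;
                if (*cp != '-' && *cp != '0' && o + 1 < outsz)
                    out[o++] = *cp;
            }
            /* Minimum width: skip the digits entirely. */
            while (cp[1] >= '0' && cp[1] <= '9')
                cp++;
        }
        out[o] = '\0';
    }

Under this rule "%-6s" encodes as "%s", "%02d" as "%d", and "%3.10.6s"
keeps its byte and column limits as "%.10.6s".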

Content Strings
~~~~~~~~~~~~~~~

For padding and labels, the content string is considered the content,
unless a format is given.

.. index:: printf-like

Argument Validation
~~~~~~~~~~~~~~~~~~~

Many compilers and tool chains support validation of printf-like
arguments.  When the format string fails to match the argument list,
a warning is generated.  This is a valuable feature, and while the
formatting strings for libxo differ considerably from printf, many of
these checks can still provide build-time protection against bugs.

libxo provides variants of its functions that support this checking,
if the "--enable-printflike" option is passed to the "configure"
script.  These functions use the "_p" suffix, like "xo_emit_p()",
"xo_emit_hp()", etc.
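
In GCC and Clang, this kind of checking is driven by the "format"
function attribute, which asks the compiler to check a function's
arguments against a format string as if it were printf(3).  The
wrapper below is hypothetical and only illustrates the attribute; it is
not taken from libxo's headers::

    #include <stdarg.h>
    #include <stdio.h>

    /* Hypothetical wrapper: argument 3 is the format string and checking
     * of the variadic arguments starts at argument 4.  With -Wformat
     * (enabled by -Wall), GCC and Clang warn when they do not match. */
    __attribute__((format(printf, 3, 4)))
    int checked_format(char *buf, size_t bufsz, const char *fmt, ...)
    {
        va_list ap;
        int rc;

        va_start(ap, fmt);
        rc = vsnprintf(buf, bufsz, fmt, ap);
        va_end(ap);
        return rc;
    }

Because the compiler applies printf(3) rules to the whole string, any
libxo-specific syntax in a checked format string is judged by those
rules rather than libxo's.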

The following are features of libxo formatting strings that are
incompatible with printf-like testing:

- implicit formats, where "{:tag}" has an implicit "%s";
- the "max" parameter for strings, where "{:tag/%4.10.6s}" means up to
  ten bytes of data can be inspected to fill a minimum of 4 columns and
  a maximum of 6;
- percent signs in strings, where "{:filled}%" makes a single,
  trailing percent sign;
- the "l" and "h" modifiers for strings, where "{:tag/%hs}" means a
  locale-based string and "{:tag/%ls}" means a wide character string;
- distinct encoding formats, where "{:tag/#%s/%s}" means the display
  styles (text and HTML) will use "#%s" where other styles use "%s".

If none of these features are in use by your code, then using the "_p"
variants might be wise:

  ================== ========================
   Function           printf-like Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvp
   xo_emit_h          xo_emit_hp
   xo_emit            xo_emit_p
   xo_emit_warn_hcv   xo_emit_warn_hcvp
   xo_emit_warn_hc    xo_emit_warn_hcp
   xo_emit_warn_c     xo_emit_warn_cp
   xo_emit_warn       xo_emit_warn_p
   xo_emit_warnx      xo_emit_warnx_p
   xo_emit_err        xo_emit_err_p
   xo_emit_errx       xo_emit_errx_p
   xo_emit_errc       xo_emit_errc_p
  ================== ========================

.. index:: performance
.. index:: XOEF_RETAIN

.. _retain:

Retaining Parsed Format Information
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

libxo can retain the parsed internal information related to a given
format string, so that subsequent xo_emit calls reuse the retained
information instead of parsing the format string again::

    SYNTAX:
      int xo_emit_f(xo_emit_flags_t flags, const char *fmt, ...);
    EXAMPLE:
      xo_emit_f(XOEF_RETAIN, "{:some/%02d}{:thing/%-6s}{:fancy}\n",
                     some, thing, fancy);

To retain parsed format information, pass the XOEF_RETAIN flag to the
xo_emit_f() function.  A complete set of xo_emit_f functions exists to
match all the xo_emit function signatures (with handles, va_list
arguments, and printf-like checking):

  ================== ========================
   Function           Flags Equivalent
  ================== ========================
   xo_emit_hv         xo_emit_hvf
   xo_emit_h          xo_emit_hf
   xo_emit            xo_emit_f
   xo_emit_hvp        xo_emit_hvfp
   xo_emit_hp         xo_emit_hfp
   xo_emit_p          xo_emit_fp
  ================== ========================

The format string must be immutable across multiple calls to xo_emit_f(),
since the library retains the string.  Typically this is done by using
static constant strings, such as string literals.  If the string is not
immutable, the XOEF_RETAIN flag must not be used.

The functions xo_retain_clear() and xo_retain_clear_all() release
internal information on either a single format string or all format
strings, respectively.  Neither is required, but the library will
retain this information until it is cleared or the process exits::

    const char *fmt = "{:name}  {:count/%d}\n";

    xo_open_list("item");
    for (int i = 0; i < 1000; i++) {
        xo_open_instance("item");
        xo_emit_f(XOEF_RETAIN, fmt, name[i], count[i]);
        xo_close_instance("item");
    }
    xo_close_list("item");

    /* Release the parsed information retained for this format string */
    xo_retain_clear(fmt);

The retained information is kept as thread-specific data.
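
The immutability rule is easier to see with a model of such a cache.
The sketch below is hypothetical: the names are not libxo's, it caches
only a field count as a stand-in for the real parsed data, and keying
entries by the format string's address is an assumption about one
plausible design.  Entries live in C11 _Thread_local storage to mirror
the thread-specific data mentioned above.  With address keys, a buffer
whose contents change at the same address would silently reuse stale
parse results::

    #include <stddef.h>
    #include <stdint.h>

    #define RETAIN_SLOTS 16

    /* One cache entry: the key is the address of the format string. */
    struct retained {
        const char *fmt;    /* NULL when the slot is empty */
        int nfields;        /* stand-in for the real parsed information */
    };

    static _Thread_local struct retained retain_table[RETAIN_SLOTS];
    static _Thread_local int parse_calls;     /* counts slow-path parses */

    /* The "expensive" step: here it only counts '{' field openers. */
    static int parse_format(const char *fmt)
    {
        int n = 0;

        parse_calls++;
        for (const char *cp = fmt; *cp != '\0'; cp++)
            if (*cp == '{')
                n++;
        return n;
    }

    static struct retained *retain_slot(const char *fmt)
    {
        return &retain_table[((uintptr_t) fmt >> 4) % RETAIN_SLOTS];
    }

    /* Return the parsed data, parsing only when this address is not cached. */
    int retained_nfields(const char *fmt)
    {
        struct retained *rp = retain_slot(fmt);

        if (rp->fmt != fmt) {       /* miss: parse and (re)fill the slot */
            rp->fmt = fmt;
            rp->nfields = parse_format(fmt);
        }
        return rp->nfields;
    }

    void retained_clear(const char *fmt)
    {
        struct retained *rp = retain_slot(fmt);

        if (rp->fmt == fmt)
            rp->fmt = NULL;
    }

    void retained_clear_all(void)
    {
        for (size_t i = 0; i < RETAIN_SLOTS; i++)
            retain_table[i].fmt = NULL;
    }

    int retained_parse_count(void)
    {
        return parse_calls;
    }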

Example
~~~~~~~

In this example, the value for the number of items in stock is emitted::

        xo_emit("{P:   }{Lwc:In stock}{:in-stock/%u}\n",
                instock);

This call will generate the following output::

  TEXT:
       In stock: 144
  XML:
      <in-stock>144</in-stock>
  JSON:
      "in-stock": 144,
  HTML:
      <div class="line">
        <div class="padding">   </div>
        <div class="label">In stock</div>
        <div class="decoration">:</div>
        <div class="padding"> </div>
        <div class="data" data-tag="in-stock">144</div>
      </div>

Clearly HTML wins the verbosity award, and this output does
not include XOF_XPATH or XOF_INFO data, which would expand the
penultimate line to::

       <div class="data" data-tag="in-stock"
          data-xpath="/top/data/item/in-stock"
          data-type="number"
          data-help="Number of items in stock">144</div>
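
To see how one field definition fans out into several styles, the
following hypothetical sketch renders the in-stock field by hand for
three of them with snprintf().  It only mimics the TEXT, XML, and JSON
lines shown above; render_in_stock() and its style names are
illustrative, and libxo itself is not used::

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical: render "{P:   }{Lwc:In stock}{:in-stock/%u}\n" by hand.
     * P = padding, L = label, w = trailing space, c = trailing colon. */
    int render_in_stock(const char *style, unsigned int instock,
                        char *buf, size_t bufsz)
    {
        if (strcmp(style, "text") == 0)    /* padding, label, ':', ' ', value */
            return snprintf(buf, bufsz, "   In stock: %u\n", instock);
        if (strcmp(style, "xml") == 0)     /* padding and label are dropped */
            return snprintf(buf, bufsz, "<in-stock>%u</in-stock>", instock);
        if (strcmp(style, "json") == 0)    /* "%u" values are emitted as numbers */
            return snprintf(buf, bufsz, "\"in-stock\": %u", instock);
        return -1;                         /* other styles are not modelled */
    }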
371