xref: /illumos-gate/usr/src/man/man5/ctf.5 (revision 20a7641f9918de8574b8b3b47dbe35c4bfc78df1)
1.\"
2.\" This file and its contents are supplied under the terms of the
3.\" Common Development and Distribution License ("CDDL"), version 1.0.
4.\" You may only use this file in accordance with the terms of version
5.\" 1.0 of the CDDL.
6.\"
7.\" A full copy of the text of the CDDL should have accompanied this
8.\" source.  A copy of the CDDL is also available via the Internet at
9.\" http://www.illumos.org/license/CDDL.
10.\"
11.\"
12.\" Copyright (c) 2014 Joyent, Inc.
13.\"
14.Dd Sep 26, 2014
15.Dt CTF 5
16.Os
17.Sh NAME
18.Nm ctf
19.Nd Compact C Type Format
20.Sh SYNOPSIS
21.In sys/ctf.h
22.Sh DESCRIPTION
23.Nm
24is designed to be a compact representation of the C programming
25language's type information focused on serving the needs of dynamic
26tracing, debuggers, and other in-situ and post-mortem introspection
27tools.
28.Nm
29data is generally included in
30.Sy ELF
31objects and is tagged as
32.Sy SHT_PROGBITS
33to ensure that the data is accessible in a running process and in subsequent
34core dumps, if generated.
35.Lp
36The
37.Nm
38data contained in each file has information about the layout and
39sizes of C types, including intrinsic types, enumerations, structures,
40typedefs, and unions, that are used by the corresponding
41.Sy ELF
42object.
43The
44.Nm
45data may also include information about the types of global objects and
46the return type and arguments of functions in the symbol table.
47.Lp
48Because a
49.Nm
50file is often embedded inside a file, rather than being a standalone
51file itself, it may also be referred to as a
52.Nm
53.Sy container .
54.Lp
55On illumos systems,
56.Nm
57data is consumed by multiple programs.
58It can be used by the modular debugger,
59.Xr mdb 1 ,
60as well as by
61.Xr dtrace 8 .
62Programmatic access to
63.Nm
64data can be obtained through
65.Xr libctf 3LIB .
66.Lp
67The
68.Nm
69file format is broken down into seven different sections.
70The first section is the
71.Sy preamble
72and
73.Sy header ,
74which describes the version of the
75.Nm
76file, links it has to other
77.Nm
78files, and the sizes of the other sections.
79The next section is the
80.Sy label
81section,
82which provides a way of identifying similar groups of
83.Nm
84data across multiple files.
85This is followed by the
86.Sy object
87information section, which describes the type of global
88symbols.
89The subsequent section is the
90.Sy function
91information section, which describes the return
92types and arguments of functions.
93The next section is the
94.Sy type
95information section, which describes
96the format and layout of the C types themselves, and finally the last
97section is the
98.Sy string
99section, which contains the names of types, enumerations, members, and
100labels.
101.Lp
102While strictly speaking, only the
103.Sy preamble
104and
105.Sy header
106are required, to be actually useful, both the type and string
107sections are necessary.
108.Lp
109A
110.Nm
111file may contain all of the type information that it requires, or it
112may optionally refer to another
113.Nm
114file which holds the remaining types.
115When a
116.Nm
117file refers to another file, it is called the
118.Sy child
119and the file it refers to is called the
120.Sy parent .
121A given file may only refer to one parent.
122This process is called
123.Em uniquification
124because it ensures each child only has type information that is
125unique to it.
126A common example of this is that most kernel modules in illumos are uniquified
127against the kernel module
128.Sy genunix
129and the type information that comes from the
130.Sy IP
131module.
132This means that a module only has types that are unique to itself and the most
133common types in the kernel are not duplicated.
134.Sh FILE FORMAT
135This documents version
136.Em two
137of the
138.Nm
139file format.
140All applications and tools currently produce and operate on this version.
141.Lp
142The file format can be summarized with the following image, the
143following sections will cover this in more detail.
144.Bd -literal
145
146         +-------------+  0t0
147+--------| Preamble    |
148|        +-------------+  0t4
149|+-------| Header      |
150||       +-------------+  0t36 + cth_lbloff
151||+------| Labels      |
152|||      +-------------+  0t36 + cth_objtoff
153|||+-----| Objects     |
154||||     +-------------+  0t36 + cth_funcoff
155||||+----| Functions   |
156|||||    +-------------+  0t36 + cth_typeoff
157|||||+---| Types       |
158||||||   +-------------+  0t36 + cth_stroff
159||||||+--| Strings     |
160|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
161|||||||
162|||||||
163|||||||
164|||||||    +-- magic -   vers   flags
165|||||||    |          |    |      |
166|||||||   +------+------+------+------+
167+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
168 ||||||   +------+------+------+------+
169 ||||||   0      1      2      3      4
170 ||||||
171 ||||||    + parent label        + objects
172 ||||||    |       + parent name |     + functions    + strings
173 ||||||    |       |     + label |     |      + types |       + strlen
174 ||||||    |       |     |       |     |      |       |       |
175 ||||||   +------+------+------+------+------+-------+-------+-------+
176 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
177  |||||   +------+------+------+------+------+-------+-------+-------+
178  |||||   0x04   0x08   0x0c   0x10   0x14    0x18    0x1c    0x20   0x24
179  |||||
180  |||||         + Label name
181  |||||         |       + Label type
182  |||||         |       |       + Next label
183  |||||         |       |       |
184  |||||       +-------+------+-----+
185  +-----------| 0x01  | 0x42 | ... |
186   ||||       +-------+------+-----+
187   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
188   ||||
189   ||||
190   |||| Symidx  0t15   0t43   0t44
191   ||||       +------+------+------+-----+
192   +----------| 0x00 | 0x42 | 0x36 | ... |
193    |||       +------+------+------+-----+
194    ||| cth_objtoff  +0x2   +0x4   +0x6   cth_funcoff
195    |||
196    |||        + CTF_TYPE_INFO         + CTF_TYPE_INFO
197    |||        |        + Return type  |
198    |||        |        |       + arg0 |
199    |||       +--------+------+------+-----+
200    +---------| 0x2c10 | 0x08 | 0x0c | ... |
201     ||       +--------+------+------+-----+
202     || cth_funcff     +0x2   +0x4   +0x6  cth_typeoff
203     ||
204     ||         + ctf_stype_t for type 1
205     ||         |  integer           + integer encoding
206     ||         |                    |          + ctf_stype_t for type 2
207     ||         |                    |          |
208     ||       +--------------------+-----------+-----+
209     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
210      |       +--------------------+-----------+-----+
211      | cth_typeoff               +0x08      +0x0c  cth_stroff
212      |
213      |     +--- str 0
214      |     |    +--- str 1       + str 2
215      |     |    |                |
216      |     v    v                v
217      |   +----+---+---+---+----+---+---+---+---+---+----+
218      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
219          +----+---+---+---+----+---+---+---+---+---+----+
220          0    1   2   3   4    5   6   7   8   9   10   11
221.Ed
222.Lp
223Every
224.Nm
225file begins with a
226.Sy preamble ,
227followed by a
228.Sy header .
229The
230.Sy preamble
231is defined as follows:
232.Bd -literal
233typedef struct ctf_preamble {
234	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
235	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
236	uchar_t ctp_flags;	/* flags (see below) */
237} ctf_preamble_t;
238.Ed
239.Pp
240The
241.Sy preamble
242is four bytes long and must be four byte aligned.
243This
244.Sy preamble
245defines the version of the
246.Nm
247file which defines the format of the rest of the header.
248While the header may change in subsequent versions, the preamble will not change
249across versions, though the interpretation of its flags may change from
250version to version.
251The
252.Em ctp_magic
253member defines the magic number for the
254.Nm
255file format.
256This must always be
257.Li 0xcff1 .
258If another value is encountered, then the file should not be treated as
259a
260.Nm
261file.
262The
263.Em ctp_version
264member defines the version of the
265.Nm
266file.
267The current version is
268.Li 2 .
269It is possible to encounter an unsupported version.
270In that case, software should not try to parse the format, as it may have
271changed.
272Finally, the
273.Em ctp_flags
274member describes aspects of the file which modify its interpretation.
275The following flags are currently defined:
276.Bd -literal
277#define	CTF_F_COMPRESS		0x01
278.Ed
279.Pp
280The flag
281.Sy CTF_F_COMPRESS
282indicates that the body of the
283.Nm
284file, all the data following the
285.Sy header ,
286has been compressed through the
287.Sy zlib
288library and its
289.Sy deflate
290algorithm.
291If this flag is not present, then the body has not been compressed and no
292special action is needed to interpret it.
293All offsets into the data as described by
294.Sy header ,
295always refer to the
296.Sy uncompressed
297data.
298.Lp
299In version two of the
300.Nm
301file format, the
302.Sy header
303denotes whether whether or not this
304.Nm
305file is the child of another
306.Nm
307file and also indicates the size of the remaining sections.
308The structure for the
309.Sy header ,
310logically contains a copy of the
311.Sy preamble
312and the two have a combined size of 36 bytes.
313.Bd -literal
314typedef struct ctf_header {
315	ctf_preamble_t cth_preamble;
316	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
317	uint_t cth_parname;	/* ref to basename of parent */
318	uint_t cth_lbloff;	/* offset of label section */
319	uint_t cth_objtoff;	/* offset of object section */
320	uint_t cth_funcoff;	/* offset of function section */
321	uint_t cth_typeoff;	/* offset of type section */
322	uint_t cth_stroff;	/* offset of string section */
323	uint_t cth_strlen;	/* length of string section in bytes */
324} ctf_header_t;
325.Ed
326.Pp
327After the
328.Sy preamble ,
329the next two members
330.Em cth_parlablel
331and
332.Em cth_parname ,
333are used to identify the parent.
334The value of both members are offsets into the
335.Sy string
336section which point to the start of a null-terminated string.
337For more information on the encoding of strings, see the subsection on
338.Sx String Identifiers .
339If the value of either is zero, then there is no entry for that
340member.
341If the member
342.Em cth_parlabel
343is set, then the
344.Em ctf_parname
345member must be set, otherwise it will not be possible to find the
346parent.
347If
348.Em ctf_parname
349is set, it is not necessary to define
350.Em cth_parlabel ,
351as the parent may not have a label.
352For more information on labels and their interpretation, see
353.Sx The Label Section .
354.Lp
355The remaining members (excepting
356.Em cth_strlen )
357describe the beginning of the corresponding sections.
358These offsets are relative to the end of the
359.Sy header .
360Therefore, something with an offset of 0 is at an offset of thirty-six
361bytes relative to the start of the
362.Nm
363file.
364The difference between members indicates the size of the section itself.
365Different offsets have different alignment requirements.
366The start of the
367.Em cth_objotoff
368and
369.Em cth_funcoff
370must be two byte aligned, while the sections
371.Em cth_lbloff
372and
373.Em cth_typeoff
374must be four-byte aligned.
375The section
376.Em cth_stroff
377has no alignment requirements.
378To calculate the size of a given section, excepting the
379.Sy string
380section, one should subtract the offset of the section from the following one.
381For example, the size of the
382.Sy types
383section can be calculated by subtracting
384.Em cth_stroff
385from
386.Em cth_typeoff .
387.Lp
388Finally, the member
389.Em cth_strlen
390describes the length of the string section itself.
391From it, you can also calculate the size of the entire
392.Nm
393file by adding together the size of the
394.Sy ctf_header_t ,
395the offset of the string section in
396.Em cth_stroff ,
397and the size of the string section in
398.Em cth_srlen .
399.Ss Type Identifiers
400Through the
401.Nm ctf
402data, types are referred to by identifiers.
403A given
404.Nm
405file supports up to 32767 (0x7fff) types.
406The first valid type identifier is 0x1.
407When a given
408.Nm
409file is a child, indicated by a non-zero entry for the
410.Sy header Ns 's
411.Em cth_parname ,
412then the first valid type identifier is 0x8000 and the last is 0xffff.
413In this case, type identifiers 0x1 through 0x7fff are references to the
414parent.
415.Lp
416The type identifier zero is a sentinel value used to indicate that there
417is no type information available or it is an unknown type.
418.Lp
419Throughout the file format, the identifier is stored in different sized
420values; however, the minimum size to represent a given identifier is a
421.Sy uint16_t .
422Other consumers of
423.Nm
424information may use larger or opaque identifiers.
425.Ss String Identifiers
426String identifiers are always encoded as four byte unsigned integers
427which are an offset into a string table.
428The
429.Nm
430format supports two different string tables which have an identifier of
431zero or one.
432This identifier is stored in the high-order bit of the unsigned four byte
433offset.
434Therefore, the maximum supported offset into one of these tables is 0x7ffffffff.
435.Lp
436Table identifier zero, always refers to the
437.Sy string
438section in the CTF file itself.
439String table identifier one refers to an external string table which is the ELF
440string table for the ELF symbol table associated with the
441.Nm
442container.
443.Ss Type Encoding
444Every
445.Nm
446type begins with metadata encoded into a
447.Sy uint16_t .
448This encoded information tells us three different pieces of information:
449.Bl -bullet -offset indent -compact
450.It
451The kind of the type
452.It
453Whether this type is a root type or not
454.It
455The length of the variable data
456.El
457.Lp
458The 16 bits that make up the encoding are broken down such that you have
459five bits for the kind, one bit for indicating whether or not it is a
460root type, and 10 bits for the variable length.
461This is laid out as follows:
462.Bd -literal -offset indent
463+--------------------+
464| kind | root | vlen |
465+--------------------+
46615   11   10   9    0
467.Ed
468.Lp
469The current version of the file format defines 14 different kinds.
470The interpretation of these different kinds will be discussed in the section
471.Sx The Type Section .
472If a kind is encountered that is not listed below, then it is not a valid
473.Nm
474file.
475The kinds are defined as follows:
476.Bd -literal -offset indent
477#define	CTF_K_UNKNOWN	0
478#define	CTF_K_INTEGER	1
479#define	CTF_K_FLOAT	2
480#define	CTF_K_POINTER	3
481#define	CTF_K_ARRAY	4
482#define	CTF_K_FUNCTION	5
483#define	CTF_K_STRUCT	6
484#define	CTF_K_UNION	7
485#define	CTF_K_ENUM	8
486#define	CTF_K_FORWARD	9
487#define	CTF_K_TYPEDEF	10
488#define	CTF_K_VOLATILE	11
489#define	CTF_K_CONST	12
490#define	CTF_K_RESTRICT	13
491.Ed
492.Lp
493Programs directly reference many types; however, other types are referenced
494indirectly because they are part of some other structure.
495These types that are referenced directly and used are called
496.Sy root
497types.
498Other types may be used indirectly, for example, a program may reference
499a structure directly, but not one of its members which has a type.
500That type is not considered a
501.Sy root
502type.
503If a type is a
504.Sy root
505type, then it will have bit 10 set.
506.Lp
507The variable length section is specific to each kind and is discussed in the
508section
509.Sx The Type Section .
510.Lp
511The following macros are useful for constructing and deconstructing the encoded
512type information:
513.Bd -literal -offset indent
514
515#define	CTF_MAX_VLEN	0x3ff
516#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
517#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
518#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))
519
520#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
521	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
522.Ed
523.Ss The Label Section
524When consuming
525.Nm
526data, it is often useful to know whether two different
527.Nm
528containers come from the same source base and version.
529For example, when building illumos, there are many kernel modules that are built
530against a single collection of source code.
531A label is encoded into the
532.Nm
533files that corresponds with the particular build.
534This ensures that if files on the system were to become mixed up from multiple
535releases, that they are not used together by tools, particularly when a child
536needs to refer to a type in the parent.
537Because they are linked used the type identifiers, if the wrong parent is used
538then the wrong type will be encountered.
539.Lp
540Each label is encoded in the file format using the following eight byte
541structure:
542.Bd -literal
543typedef struct ctf_lblent {
544	uint_t ctl_label;	/* ref to name of label */
545	uint_t ctl_typeidx;	/* last type associated with this label */
546} ctf_lblent_t;
547.Ed
548.Lp
549Each label has two different components, a name and a type identifier.
550The name is encoded in the
551.Em ctl_label
552member which is in the format defined in the section
553.Sx String Identifiers .
554Generally, the names of all labels are found in the internal string
555section.
556.Lp
557The type identifier encoded in the member
558.Em ctl_typeidx
559refers to the last type identifier that a label refers to in the current
560file.
561Labels only refer to types in the current file, if the
562.Nm
563file is a child, then it will have the same label as its parent;
564however, its label will only refer to its types, not its parents.
565.Lp
566It is also possible, though rather uncommon, for a
567.Nm
568file to have multiple labels.
569Labels are placed one after another, every eight bytes.
570When multiple labels are present, types may only belong to a single label.
571.Ss The Object Section
572The object section provides a mapping from ELF symbols of type
573.Sy STT_OBJECT
574in the symbol table to a type identifier.
575Every entry in this section is a
576.Sy uint16_t
577which contains a type identifier as described in the section
578.Sx Type Identifiers .
579If there is no information for an object, then the type identifier 0x0
580is stored for that entry.
581.Lp
582To walk the object section, you need to have a corresponding
583.Sy symbol table
584in the ELF object that contains the
585.Nm
586data.
587Not every object is included in this section.
588Specifically, when walking the symbol table.
589An entry is skipped if it matches any of the following conditions:
590.Lp
591.Bl -bullet -offset indent -compact
592.It
593The symbol type is not
594.Sy STT_OBJECT
595.It
596The symbol's section index is
597.Sy SHN_UNDEF
598.It
599The symbol's name offset is zero
600.It
601The symbol's section index is
602.Sy SHN_ABS
603and the value of the symbol is zero.
604.It
605The symbol's name is
606.Li _START_
607or
608.Li _END_ .
609These are skipped because they are used for scoping local symbols in
610ELF.
611.El
612.Lp
613The following sample code shows an example of iterating the object
614section and skipping the correct symbols:
615.Bd -literal
616#include <gelf.h>
617#include <stdio.h>
618
619/*
620 * Given the start of the object section in the CTF file, the number of symbols,
621 * and the ELF Data sections for the symbol table and the string table, this
622 * prints the type identifiers that correspond to objects. Note, a more robust
623 * implementation should ensure that they don't walk beyond the end of the CTF
624 * object section.
625 */
626static int
627walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
628    long nsyms)
629{
630	long i;
631	uintptr_t strbase = strdata->d_buf;
632
633	for (i = 1; i < nsyms; i++, objftoff++) {
634		const char *name;
635		GElf_Sym sym;
636
637		if (gelf_getsym(symdata, i, &sym) == NULL)
638			return (1);
639
640		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
641			continue;
642		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
643			continue;
644		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
645			continue;
646		name = (const char *)(strbase + sym.st_name);
647		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
648			continue;
649
650		(void) printf("Symbol %d has type %d\n", i, *objtoff);
651	}
652
653	return (0);
654}
655.Ed
656.Ss The Function Section
657The function section of the
658.Nm
659file encodes the types of both the function's arguments and the function's
660return type.
661Similar to
662.Sx The Object Section ,
663the function section encodes information for all symbols of type
664.Sy STT_FUNCTION ,
665excepting those that fit specific criteria.
666Unlike with objects, because functions have a variable number of arguments, they
667start with a type encoding as defined in
668.Sx Type Encoding ,
669which is the size of a
670.Sy uint16_t .
671For functions which have no type information available, they are encoded as
672.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
673Functions with arguments are encoded differently.
674Here, the variable length is turned into the number of arguments in the
675function.
676If a function is a
677.Sy varargs
678type function, then the number of arguments is increased by one.
679Functions with type information are encoded as:
680.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
681.Lp
682For functions that have no type information, nothing else is encoded, and the
683next function is encoded.
684For functions with type information, the next
685.Sy uint16_t
686is encoded with the type identifier of the return type of the function.
687It is followed by each of the type identifiers of the arguments, if any exist,
688in the order that they appear in the function.
689Therefore, argument 0 is the first type identifier and so on.
690When a function has a final varargs argument, that is encoded with the type
691identifier of zero.
692.Lp
693Like
694.Sx The Object Section ,
695the function section is encoded in the order of the symbol table.
696It has similar, but slightly different considerations from objects.
697While iterating the symbol table, if any of the following conditions are true,
698then the entry is skipped and no corresponding entry is written:
699.Lp
700.Bl -bullet -offset indent -compact
701.It
702The symbol type is not
703.Sy STT_FUNCTION
704.It
705The symbol's section index is
706.Sy SHN_UNDEF
707.It
708The symbol's name offset is zero
709.It
710The symbol's name is
711.Li _START_
712or
713.Li _END_ .
714These are skipped because they are used for scoping local symbols in
715ELF.
716.El
717.Ss The Type Section
718The type section is the heart of the
719.Nm
720data.
721It encodes all of the information about the types themselves.
722The base of the type information comes in two forms, a short form and a long
723form, each of which may be followed by a variable number of arguments.
724The following definitions describe the short and long forms:
725.Bd -literal
726#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
727#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
728#define	CTF_MAX_LSIZE	UINT64_MAX
729
730typedef struct ctf_stype {
731	uint_t ctt_name;	/* reference to name in string table */
732	ushort_t ctt_info;	/* encoded kind, variant length */
733	union {
734		ushort_t _size;	/* size of entire type in bytes */
735		ushort_t _type;	/* reference to another type */
736	} _u;
737} ctf_stype_t;
738
739typedef struct ctf_type {
740	uint_t ctt_name;	/* reference to name in string table */
741	ushort_t ctt_info;	/* encoded kind, variant length */
742	union {
743		ushort_t _size;	/* always CTF_LSIZE_SENT */
744		ushort_t _type; /* do not use */
745	} _u;
746	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
747	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
748} ctf_type_t;
749
750#define	ctt_size _u._size	/* for fundamental types that have a size */
751#define	ctt_type _u._type	/* for types that reference another type */
752.Ed
753.Pp
754Type sizes are stored in
755.Sy bytes .
756The basic small form uses a
757.Sy ushort_t
758to store the number of bytes.
759If the number of bytes in a structure would exceed 0xfffe, then the alternate
760form, the
761.Sy ctf_type_t ,
762is used instead.
763To indicate that the larger form is being used, the member
764.Em ctt_size
765is set to value of
766.Sy CTF_LSIZE_SENT
767(0xffff).
768In general, when going through the type section, consumers use the
769.Sy ctf_type_t
770structure, but pay attention to the value of the member
771.Em ctt_size
772to determine whether they should increment their scan by the size of the
773.Sy ctf_stype_t
774or
775.Sy ctf_type_t .
776Not all kinds of types use
777.Sy ctt_size .
778Those which do not, will always use the
779.Sy ctf_stype_t
780structure.
781The individual sections for each kind have more information.
782.Lp
783Types are written out in order.
784Therefore the first entry encountered has a type id of 0x1, or 0x8000 if a
785child.
786The member
787.Em ctt_name
788is encoded as described in the section
789.Sx String Identifiers .
790The string that it points to is the name of the type.
791If the identifier points to an empty string (one that consists solely of a null
792terminator) then the type does not have a name, this is common with anonymous
793structures and unions that only have a typedef to name them, as well as,
794pointers and qualifiers.
795.Lp
796The next member, the
797.Em ctt_info ,
798is encoded as described in the section
799.Sx Type Encoding .
800The types kind tells us how to interpret the remaining data in the
801.Sy ctf_type_t
802and any variable length data that may exist.
803The rest of this section will be broken down into the interpretation of the
804various kinds.
805.Ss Encoding of Integers
806Integers, which are of type
807.Sy CTF_K_INTEGER ,
808have no variable length arguments.
809Instead, they are followed by a four byte
810.Sy uint_t
811which describes their encoding.
812All integers must be encoded with a variable length of zero.
813The
814.Em ctt_size
815member describes the length of the integer in bytes.
816In general, integer sizes will be rounded up to the closest power of two.
817.Lp
818The integer encoding contains three different pieces of information:
819.Bl -bullet -offset indent -compact
820.It
821The encoding of the integer
822.It
823The offset in
824.Sy bits
825of the type
826.It
827The size in
828.Sy bits
829of the type
830.El
831.Pp
832This encoding can be expressed through the following macros:
833.Bd -literal -offset indent
834#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
835#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
836#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))
837
838#define	CTF_INT_DATA(encoding, offset, bits) \\
839	(((encoding) << 24) | ((offset) << 16) | (bits))
840.Ed
841.Pp
842The following flags are defined for the encoding at this time:
843.Bd -literal -offset indent
844#define	CTF_INT_SIGNED		0x01
845#define	CTF_INT_CHAR		0x02
846#define	CTF_INT_BOOL		0x04
847#define	CTF_INT_VARARGS		0x08
848.Ed
849.Lp
850By default, an integer is considered to be unsigned, unless it has the
851.Sy CTF_INT_SIGNED
852flag set.
853If the flag
854.Sy CTF_INT_CHAR
855is set, that indicates that the integer is of a type that stores character
856data, for example the intrinsic C type
857.Sy char
858would have the
859.Sy CTF_INT_CHAR
860flag set.
861If the flag
862.Sy CTF_INT_BOOL
863is set, that indicates that the integer represents a boolean type.
864For example, the intrinsic C type
865.Sy _Bool
866would have the
867.Sy CTF_INT_BOOL
868flag set.
869Finally, the flag
870.Sy CTF_INT_VARARGS
871indicates that the integer is used as part of a variable number of arguments.
872This encoding is rather uncommon.
873.Ss Encoding of Floats
874Floats, which are of type
875.Sy CTF_K_FLOAT ,
876are similar to their integer counterparts.
877They have no variable length arguments and are followed by a four byte encoding
878which describes the kind of float that exists.
879The
880.Em ctt_size
881member is the size, in bytes, of the float.
882The float encoding has three different pieces of information inside of it:
883.Lp
884.Bl -bullet -offset indent -compact
885.It
886The specific kind of float that exists
887.It
888The offset in
889.Sy bits
890of the float
891.It
892The size in
893.Sy bits
894of the float
895.El
896.Lp
897This encoding can be expressed through the following macros:
898.Bd -literal -offset indent
899#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
900#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
901#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))
902
903#define	CTF_FP_DATA(encoding, offset, bits) \\
904	(((encoding) << 24) | ((offset) << 16) | (bits))
905.Ed
906.Lp
907Where as the encoding for integers was a series of flags, the encoding for
908floats maps to a specific kind of float.
909It is not a flag-based value.
910The kinds of floats correspond to both their size, and the encoding.
911This covers all of the basic C intrinsic floating point types.
912The following are the different kinds of floats represented in the encoding:
913.Bd -literal -offset indent
914#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
915#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
916#define	CTF_FP_CPLX	3	/* Complex encoding */
917#define	CTF_FP_DCPLX	4	/* Double complex encoding */
918#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
919#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
920#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
921#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
922#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
923#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
924#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
925#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
926.Ed
927.Ss Encoding of Arrays
928Arrays, which are of type
929.Sy CTF_K_ARRAY ,
930have no variable length arguments.
931They are followed by a structure which describes the number of elements in the
932array, the type identifier of the elements in the array, and the type identifier
933of the index of the array.
934With arrays, the
935.Em ctt_size
936member is set to zero.
937The structure that follows an array is defined as:
938.Bd -literal
939typedef struct ctf_array {
940	ushort_t cta_contents;	/* reference to type of array contents */
941	ushort_t cta_index;	/* reference to type of array index */
942	uint_t cta_nelems;	/* number of elements */
943} ctf_array_t;
944.Ed
945.Lp
946The
947.Em cta_contents
948and
949.Em cta_index
950members of the
951.Sy ctf_array_t
952are type identifiers which are encoded as per the section
953.Sx Type Identifiers .
954The member
955.Em cta_nelems
956is a simple four byte unsigned count of the number of elements.
957This count may be zero when encountering C99's flexible array members.
958.Ss Encoding of Functions
959Function types, which are of type
960.Sy CTF_K_FUNCTION ,
961use the variable length list to be the number of arguments in the function.
962When the function has a final member which is a varargs, then the argument count
963is incremented by one to account for the variable argument.
964Here, the
965.Em ctt_type
966member is encoded with the type identifier of the return type of the function.
967Note that the
968.Em ctt_size
969member is not used here.
970.Lp
971The variable argument list contains the type identifiers for the arguments of
972the function, if any.
973Each one is represented by a
974.Sy uint16_t
975and encoded according to the
976.Sx Type Identifiers
977section.
978If the function's last argument is of type varargs, then it is also written out,
979but the type identifier is zero.
980This is included in the count of the function's arguments.
981.Ss Encoding of Structures and Unions
982Structures and Unions, which are encoded with
983.Sy CTF_K_STRUCT
984and
985.Sy CTF_K_UNION
986respectively,  are very similar constructs in C.
987The main difference between them is the fact that every member of a structure
988follows one another, where as in a union, all members share the same memory.
989They are also very similar in terms of their encoding in
990.Nm .
991The variable length argument for structures and unions represents the number of
992members that they have.
993The value of the member
994.Em ctt_size
995is the size of the structure and union.
996There are two different structures which are used to encode members in the
997variable list.
998When the size of a structure or union is greater than or equal to the large
999member threshold, 8192, then a different structure is used to encode the member,
1000all members are encoded using the same structure.
1001The structure for members is as follows:
1002.Bd -literal
1003typedef struct ctf_member {
1004	uint_t ctm_name;	/* reference to name in string table */
1005	ushort_t ctm_type;	/* reference to type of member */
1006	ushort_t ctm_offset;	/* offset of this member in bits */
1007} ctf_member_t;
1008
1009typedef struct ctf_lmember {
1010	uint_t ctlm_name;	/* reference to name in string table */
1011	ushort_t ctlm_type;	/* reference to type of member */
1012	ushort_t ctlm_pad;	/* padding */
1013	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
1014	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
1015} ctf_lmember_t;
1016.Ed
1017.Lp
1018Both the
1019.Em ctm_name
1020and
1021.Em ctlm_name
1022refer to the name of the member.
1023The name is encoded as an offset into the string table as described by the
1024section
1025.Sx String Identifiers .
1026The members
1027.Sy ctm_type
1028and
1029.Sy ctlm_type
1030both refer to the type of the member.
1031They are encoded as per the section
1032.Sx Type Identifiers .
1033.Lp
1034The last piece of information that is present is the offset which describes the
1035offset in memory that the member begins at.
1036For unions, this value will always be zero because the start of unions in memory
1037is always zero.
1038For structures, this is the offset in
1039.Sy bits
1040that the member begins at.
1041Note that a compiler may lay out a type with padding.
1042This means that the difference in offset between two consecutive members may be
1043larger than the size of the member.
1044When the size of the overall structure is strictly less than 8192 bytes, the
1045normal structure,
1046.Sy ctf_member_t ,
1047is used and the offset in bits is stored in the member
1048.Em ctm_offset .
1049However, when the size of the structure is greater than or equal to 8192 bytes,
1050then the number of bits is split into two 32-bit quantities.
1051One member,
1052.Em ctlm_offsethi ,
1053represents the upper 32 bits of the offset, while the other member,
1054.Em ctlm_offsetlo ,
1055represents the lower 32 bits of the offset.
1056These can be joined together to get a 64-bit sized offset in bits by shifting
1057the member
1058.Em ctlm_offsethi
1059to the left by thirty two and then doing a binary or of
1060.Em ctlm_offsetlo .
1061.Ss Encoding of Enumerations
1062Enumerations, noted by the type
1063.Sy CTF_K_ENUM ,
1064are similar to structures.
1065Enumerations use the variable list to note the number of values that the
1066enumeration contains, which we'll term enumerators.
1067In C, an enumeration is always equivalent to the intrinsic type
1068.Sy int ,
1069thus the value of the member
1070.Em ctt_size
1071is always the size of an integer which is determined based on the current model.
1072For illumos systems, this will always be 4, as an integer is always defined to
1073be 4 bytes large in both
1074.Sy ILP32
1075and
1076.Sy LP64 ,
1077regardless of the architecture.
1078.Lp
1079The enumerators encoded in an enumeration have the following structure in the
1080variable list:
1081.Bd -literal
1082typedef struct ctf_enum {
1083	uint_t cte_name;	/* reference to name in string table */
1084	int cte_value;		/* value associated with this name */
1085} ctf_enum_t;
1086.Ed
1087.Pp
1088The member
1089.Em cte_name
1090refers to the name of the enumerator's value, it is encoded according to the
1091rules in the section
1092.Sx String Identifiers .
1093The member
1094.Em cte_value
1095contains the integer value of this enumerator.
1096.Ss Encoding of Forward References
1097Forward references, types of kind
1098.Sy CTF_K_FORWARD ,
1099in a
1100.Nm
1101file refer to types which may not have a definition at all, only a name.
1102If the
1103.Nm
1104file is a child, then it may be that the forward is resolved to an
1105actual type in the parent, otherwise the definition may be in another
1106.Nm
1107container or may not be known at all.
1108The only member of the
1109.Sy ctf_type_t
1110that matters for a forward declaration is the
1111.Em ctt_name
1112which points to the name of the forward reference in the string table as
1113described earlier.
1114There is no other information recorded for forward references.
1115.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
1116Pointers, typedefs, volatile, const, and restrict are all similar in
1117.Nm .
1118They all refer to another type.
1119In the case of typedefs, they provide an alternate name, while volatile, const,
1120and restrict change how the type is interpreted in the C programming language.
1121This covers the
1122.Nm
1123kinds
1124.Sy CTF_K_POINTER ,
1125.Sy CTF_K_TYPEDEF ,
1126.Sy CTF_K_VOLATILE ,
1127.Sy CTF_K_RESTRICT ,
1128and
1129.Sy CTF_K_CONST .
1130.Lp
1131These types have no variable list entries and use the member
1132.Em ctt_type
1133to refer to the base type that they modify.
1134.Ss Encoding of Unknown Types
1135Types with the kind
1136.Sy CTF_K_UNKNOWN
1137are used to indicate gaps in the type identifier space.
1138These entries consume an identifier, but do not define anything.
1139Nothing should refer to these gap identifiers.
1140.Ss Dependencies Between Types
1141C types can be imagined as a directed, cyclic, graph.
1142Structures and unions may refer to each other in a way that creates a cyclic
1143dependency.
1144In cases such as these, the entire type section must be read in and processed.
1145Consumers must not assume that every type can be laid out in dependency order;
1146they cannot.
1147.Ss The String Section
1148The last section of the
1149.Nm
1150file is the
1151.Sy string
1152section.
1153This section encodes all of the strings that appear throughout the other
1154sections.
1155It is laid out as a series of characters followed by a null terminator.
1156Generally, all names are written out in ASCII, as most C compilers do not allow
1157and characters to appear in identifiers outside of a subset of ASCII.
1158However, any extended characters sets should be written out as a series of UTF-8
1159bytes.
1160.Lp
1161The first entry in the section, at offset zero, is a single null
1162terminator to reference the empty string.
1163Following that, each C string should be written out, including the null
1164terminator.
1165Offsets that refer to something in this section should refer to the first byte
1166which begins a string.
1167Beyond the first byte in the section being the null terminator, the order of
1168strings is unimportant.
1169.Sh Data Encoding and ELF Considerations
1170.Nm
1171data is generally included in ELF objects which specify information to
1172identify the architecture and endianness of the file.
1173A
1174.Nm
1175container inside such an object must match the endianness of the ELF object.
1176Aside from the question of the endian encoding of data, there should be no other
1177differences between architectures.
1178While many of the types in this document refer to non-fixed size C integral
1179types, they are equivalent in the models
1180.Sy ILP32
1181and
1182.Sy LP64 .
1183If any other model is being used with
1184.Nm
1185data that has different sizes, then it must not use the model's sizes for
1186those integral types and instead use the fixed size equivalents based on an
1187.Sy ILP32
1188environment.
1189.Lp
1190When placing a
1191.Nm
1192container inside of an ELF object, there are certain conventions that are
1193expected for the purposes of tooling being able to find the
1194.Nm
1195data.
1196In particular, a given ELF object should only contain a single
1197.Nm
1198section.
1199Multiple containers should be merged together into a single one.
1200.Lp
1201The
1202.Nm
1203file should be included in its own ELF section.
1204The section's name must be
1205.Ql .SUNW_ctf .
1206The type of the section must be
1207.Sy SHT_PROGBITS .
1208The section should have a link set to the symbol table and its address
1209alignment must be 4.
1210.Sh SEE ALSO
1211.Xr mdb 1 ,
1212.Xr gelf 3ELF ,
1213.Xr libelf 3LIB ,
1214.Xr a.out 5 ,
1215.Xr dtrace 8
1216