.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source.  A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd Sep 26, 2014
.Dt CTF 4
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object. The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside another file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On illumos systems,
.Nm
data is consumed by multiple programs. It can be used by the modular
debugger,
.Xr mdb 1 ,
as well as by
.Xr dtrace 1M .
Programmatic access to
.Nm
data can be obtained through
.Xr libctf 3LIB .
.Lp
The
.Nm
file format is broken down into seven different sections. The first
two sections are the
.Sy preamble
and
.Sy header ,
which describe the version of the
.Nm
file, links it has to other
.Nm
files, and the sizes of the other sections. The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files. This is followed by the
.Sy object
information section, which describes the types of global
symbols. The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions. The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves. The last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
While strictly speaking, only the
.Sy preamble
and
.Sy header
are required, to be actually useful, both the type and string
sections are necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types. When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent. This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it. A common example of this is that most kernel modules in
illumos are uniquified against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module. This means that a module only has types that are unique to
itself and the most common types in the kernel are not duplicated.
.Sh FILE FORMAT
This documents version
.Em two
of the
.Nm
file format. All applications and tools currently produce and operate on
this version.
.Lp
The file format can be summarized with the following image. The
sections that follow cover each part in more detail.
.Bd -literal

         +-------------+  0t0
+--------| Preamble    |
|        +-------------+  0t4
|+-------| Header      |
||       +-------------+  0t36 + cth_lbloff
||+------| Labels      |
|||      +-------------+  0t36 + cth_objtoff
|||+-----| Objects     |
||||     +-------------+  0t36 + cth_funcoff
||||+----| Functions   |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---| Types       |
||||||   +-------------+  0t36 + cth_stroff
||||||+--| Strings     |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -   vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x02 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label        + objects
 ||||||    |       + parent name |     + functions    + strings
 ||||||    |       |     + label |     |      + types |       + strlen
 ||||||    |       |     |       |     |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14    0x18    0x1c    0x20   0x24
  |||||
  |||||         + Label name
  |||||         |       + Label type
  |||||         |       |       + Next label
  |||||         |       |       |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
   ||||
   ||||
   |||| Symidx  0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    ||| cth_objtoff  +0x2   +0x4   +0x6   cth_funcoff
    |||
    |||        + CTF_TYPE_INFO         + CTF_TYPE_INFO
    |||        |        + Return type  |
    |||        |        |       + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     || cth_funcoff    +0x2   +0x4   +0x6  cth_typeoff
     ||
     ||         + ctf_stype_t for type 1
     ||         |  integer           + integer encoding
     ||         |                    |          + ctf_stype_t for type 2
     ||         |                    |          |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      | cth_typeoff               +0x08      +0x0c  cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	ushort_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uchar_t ctp_version;	/* data format version number (CTF_VERSION) */
	uchar_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file which defines the format of the rest of the header. While the
header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version. The
.Em ctp_magic
member defines the magic number for the
.Nm
file format. This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file. The
.Em ctp_version
member defines the version of the
.Nm
file. The current version is
.Li 2 .
It is possible to encounter an unsupported version. In that case,
software should not try to parse the format, as it may have changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS		0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, that is, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm. If this flag is not present, then the body has not been
compressed and no special action is needed to interpret it. All offsets
into the data as described by the
.Sy header
always refer to the
.Sy uncompressed
data.
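.Lp
The checks described above can be sketched as a small validation routine.
This is only an illustration: the structure below is a fixed-width stand-in
for ctf_preamble_t, the function name check_preamble is hypothetical, and the
sketch assumes the data was written in the byte order of the host reading it.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	CTF_MAGIC	0xcff1	/* value required in ctp_magic */
#define	CTF_VERSION	2	/* the version this page documents */
#define	CTF_F_COMPRESS	0x01

/* Fixed-width stand-in for the ctf_preamble_t shown above. */
typedef struct preamble {
	uint16_t ctp_magic;
	uint8_t ctp_version;
	uint8_t ctp_flags;
} preamble_t;

/*
 * Hypothetical helper: returns -1 if buf does not start with a supported
 * CTF preamble, 1 if the body is compressed, and 0 if it is not.
 */
static int
check_preamble(const void *buf, size_t len)
{
	preamble_t p;

	if (len < sizeof (p))
		return (-1);
	/* Copy out rather than cast, so an unaligned buffer is harmless. */
	(void) memcpy(&p, buf, sizeof (p));
	if (p.ctp_magic != CTF_MAGIC || p.ctp_version != CTF_VERSION)
		return (-1);
	return ((p.ctp_flags & CTF_F_COMPRESS) != 0 ? 1 : 0);
}
```

A consumer that gets 1 back would inflate everything after the header before
applying any of the header's offsets.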
.Lp
In version two of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections. The
structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint_t cth_parname;	/* ref to basename of parent */
	uint_t cth_lbloff;	/* offset of label section */
	uint_t cth_objtoff;	/* offset of object section */
	uint_t cth_funcoff;	/* offset of function section */
	uint_t cth_typeoff;	/* offset of type section */
	uint_t cth_stroff;	/* offset of string section */
	uint_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent. The values of both members are offsets
into the
.Sy string
section which point to the start of a null-terminated string. For more
information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member. If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must be set, otherwise it will not be possible to find the
parent. If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label. For more information on labels
and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections. These offsets are
relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file. The difference between adjacent members
indicates the size of the section itself. Different offsets have
different alignment requirements. The sections that start at
.Em cth_objtoff
and
.Em cth_funcoff
must be two-byte aligned, while the sections that start at
.Em cth_lbloff
and
.Em cth_typeoff
must be four-byte aligned. The section that starts at
.Em cth_stroff
has no alignment requirements. To calculate the size of a given section,
excepting the
.Sy string
section, one should subtract the offset of the section from the offset of
the following one. For
example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself. From it, you can also
calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
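.Lp
The size arithmetic above can be sketched as follows. The structure is a
fixed-width stand-in for ctf_header_t with the preamble members inlined, and
the helper names section_size and ctf_file_size are hypothetical. The
sketch assumes the offsets have already been validated as non-decreasing.

```c
#include <stdint.h>

/* Fixed-width stand-in for ctf_header_t; the preamble is inlined. */
typedef struct hdr {
	uint16_t ctp_magic;
	uint8_t ctp_version;
	uint8_t ctp_flags;
	uint32_t cth_parlabel;
	uint32_t cth_parname;
	uint32_t cth_lbloff;
	uint32_t cth_objtoff;
	uint32_t cth_funcoff;
	uint32_t cth_typeoff;
	uint32_t cth_stroff;
	uint32_t cth_strlen;
} hdr_t;

/* Size of a section: offset of the following section minus its own. */
static uint32_t
section_size(uint32_t off, uint32_t next_off)
{
	return (next_off - off);
}

/* Size of the whole (uncompressed) file: header + string offset + length. */
static uint64_t
ctf_file_size(const hdr_t *h)
{
	return ((uint64_t)sizeof (hdr_t) + h->cth_stroff + h->cth_strlen);
}
```

With the header values used in the diagram in this section (cth_typeoff of
0x110, cth_stroff of 0x5f4, and cth_strlen of 0x611), the type section is
0x4e4 bytes long and the whole file is 3113 bytes.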
.Ss Type Identifiers
Throughout the
.Nm ctf
data, types are referred to by identifiers. A given
.Nm
file supports up to 32767 (0x7fff) types. The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x8000 and the last is 0xffff.
In this case, type identifiers 0x1 through 0x7fff are references to the
parent.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, the minimum size to represent a given identifier is a
.Sy uint16_t .
Other consumers of
.Nm
information may use larger or opaque identifiers.
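.Lp
A minimal sketch of how a consumer might classify an identifier under the
rules above. The function names are hypothetical, and is_child stands for
"the header's cth_parname is non-zero".

```c
#include <stdint.h>

#define	ID_PARENT_MAX	0x7fff	/* last identifier in the parent's range */
#define	ID_CHILD_MIN	0x8000	/* first identifier defined by a child */

/* Returns 1 if id names a type defined in this container, else 0. */
static int
id_is_local(uint16_t id, int is_child)
{
	if (is_child)
		return (id >= ID_CHILD_MIN);
	return (id != 0 && id <= ID_PARENT_MAX);
}

/* Returns 1 if id must be looked up in the parent container, else 0. */
static int
id_is_parent_ref(uint16_t id, int is_child)
{
	return (is_child && id != 0 && id <= ID_PARENT_MAX);
}
```

Identifier zero is neither local nor a parent reference; it is the sentinel
for a missing or unknown type.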
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table. The
.Nm
format supports two different string tables which have an identifier of
zero or one. This identifier is stored in the high-order bit of the
unsigned four byte offset. Therefore, the maximum supported offset into
one of these tables is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the CTF file itself. String table identifier one refers to an
external string table which is the ELF string table for the ELF symbol
table associated with the
.Nm
container.
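.Lp
Splitting a string identifier into its table and offset can be sketched as
below. The helper names are hypothetical, and the caller is assumed to have
already located both string tables and their lengths.

```c
#include <stddef.h>
#include <stdint.h>

/* High-order bit: 0 is the CTF string section, 1 the ELF string table. */
static uint32_t
name_table(uint32_t name)
{
	return (name >> 31);
}

/* Low 31 bits: byte offset into the selected table. */
static uint32_t
name_offset(uint32_t name)
{
	return (name & 0x7fffffff);
}

/*
 * Resolve a string identifier, returning NULL if the selected table is
 * absent or the offset is out of range.
 */
static const char *
name_lookup(uint32_t name, const char *ctfstr, size_t ctflen,
    const char *elfstr, size_t elflen)
{
	const char *tab = name_table(name) == 0 ? ctfstr : elfstr;
	size_t len = name_table(name) == 0 ? ctflen : elflen;
	uint32_t off = name_offset(name);

	if (tab == NULL || off >= len)
		return (NULL);
	return (tab + off);
}
```

Using the string section from the diagram in FILE FORMAT, identifier 1
resolves to "int" and identifier 5 resolves to "foo_t".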
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint16_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 16 bits that make up the encoding are broken down such that you have
five bits for the kind, one bit for indicating whether or not it is a
root type, and 10 bits for the variable length. This is laid out as
follows:
.Bd -literal -offset indent
+--------------------+
| kind | root | vlen |
+--------------------+
15   11   10   9    0
.Ed
.Lp
The current version of the file format defines 14 different kinds. The
interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file. The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure. Types that are
referenced directly are called
.Sy root
types. Other types may be used indirectly. For example, a program may reference
a structure directly, but not the type of one of its members. That member's
type is not considered a
.Sy root
type. If a type is a
.Sy root
type, then it will have bit 10 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent

#define	CTF_MAX_VLEN	0x3ff
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))
.Ed
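.Lp
As a worked example of the macros above, the sketch below unpacks an encoded
value into its three parts. The macros are copied from this section so that
the example stands alone; info_parts_t and unpack_info are hypothetical
names.

```c
#include <stdint.h>

#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6

#define	CTF_MAX_VLEN	0x3ff
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_ISROOT(info)	(((info) & 0x0400) >> 10)
#define	CTF_INFO_VLEN(info)	(((info) & CTF_MAX_VLEN))

#define	CTF_TYPE_INFO(kind, isroot, vlen) \
	(((kind) << 11) | (((isroot) ? 1 : 0) << 10) | ((vlen) & CTF_MAX_VLEN))

/* Hypothetical container for the three decoded fields. */
typedef struct info_parts {
	unsigned int kind;	/* bits 15-11 */
	unsigned int isroot;	/* bit 10 */
	unsigned int vlen;	/* bits 9-0 */
} info_parts_t;

static info_parts_t
unpack_info(uint16_t info)
{
	info_parts_t p;

	p.kind = CTF_INFO_KIND(info);
	p.isroot = CTF_INFO_ISROOT(info);
	p.vlen = CTF_INFO_VLEN(info);
	return (p);
}
```

For instance, a root structure with three members encodes as
CTF_TYPE_INFO(CTF_K_STRUCT, 1, 3), which is 0x3403, and the value 0x2c10
decodes to kind CTF_K_FUNCTION with the root bit set and a variable length
of 16.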
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version. For example, when
building illumos, there are many kernel modules that are built against a
single collection of source code. A label is encoded into the
.Nm
files that corresponds with the particular build. This ensures that if
files on the system were to become mixed up from multiple releases,
they are not used together by tools, particularly when a child needs to
refer to a type in the parent. Because they are linked using the type
identifiers, if the wrong parent is used then the wrong type will be
encountered.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint_t ctl_label;	/* ref to name of label */
	uint_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file. Labels only refer to types in the current file. If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its own types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels. Labels are placed one after another, every
eight bytes. When multiple labels are present, types may only belong to
a single label.
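.Lp
Walking the label section can be sketched as below. The structure is a
fixed-width stand-in for ctf_lblent_t and the function names are
hypothetical. The lookup additionally assumes that labels appear in
ascending order of ctl_typeidx, which this page does not state.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-width stand-in for the eight byte ctf_lblent_t. */
typedef struct lblent {
	uint32_t ctl_label;
	uint32_t ctl_typeidx;
} lblent_t;

/* The label section runs from cth_lbloff up to cth_objtoff. */
static size_t
label_count(uint32_t lbloff, uint32_t objtoff)
{
	return ((objtoff - lbloff) / sizeof (lblent_t));
}

/*
 * Return the first label whose last type identifier is at or beyond id,
 * or NULL if no label covers it. Assumes ascending ctl_typeidx order.
 */
static const lblent_t *
label_for_type(const lblent_t *lbls, size_t n, uint32_t id)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (id <= lbls[i].ctl_typeidx)
			return (&lbls[i]);
	}
	return (NULL);
}
```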
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier. Every entry in this section is
a
.Sy uint16_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data. Not every object is included in this section. Specifically, when
walking the symbol table, an entry is skipped if it matches any of the
following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in the CTF file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Skipped symbols have
 * no entry in the object section, so the cursor only advances when a symbol is
 * kept. Note, a more robust implementation should ensure that it doesn't walk
 * beyond the end of the CTF object section.
 */
static int
walk_symbols(uint16_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	const char *strbase = strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = strbase + sym.st_name;
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		(void) printf("Symbol %ld has type %u\n", i, *objtoff);
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return type. Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
excepting those that fit specific criteria. Unlike with objects, because
functions have a variable number of arguments, they start with a type encoding
as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint16_t .
Functions which have no type information available are encoded as
.Li CTF_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with arguments are encoded differently. Here, the variable length is
turned into the number of arguments in the function. If a function is a
.Sy varargs
type function, then the number of arguments is increased by one. Functions with
type information are encoded as:
.Li CTF_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded. For functions with type information, the next
.Sy uint16_t
is encoded with the type identifier of the return type of the function. It is
followed by each of the type identifiers of the arguments, if any exist, in the
order that they appear in the function.  Therefore, argument 0 is the first type
identifier and so on. When a function has a final varargs argument, that is
encoded with the type identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table. It has
similar, but slightly different considerations from objects. While iterating the
symbol table, if any of the following conditions are true, then the entry is
skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
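.Lp
Because entries in the function section have different lengths, a consumer
has to decode each one to find the next. The sketch below computes the
length of a single entry. The macros are copied from
.Sx Type Encoding ,
and func_entry_words is a hypothetical helper name.

```c
#include <stddef.h>
#include <stdint.h>

#define	CTF_K_UNKNOWN	0
#define	CTF_K_FUNCTION	5
#define	CTF_INFO_KIND(info)	(((info) & 0xf800) >> 11)
#define	CTF_INFO_VLEN(info)	(((info) & 0x3ff))

/*
 * Return the number of uint16_t words used by the function entry at fp,
 * given that avail words remain in the section. Returns 0 if the entry is
 * malformed or would run past the end of the section.
 */
static size_t
func_entry_words(const uint16_t *fp, size_t avail)
{
	unsigned int kind, nargs;
	size_t words;

	if (avail == 0)
		return (0);
	kind = CTF_INFO_KIND(fp[0]);
	nargs = CTF_INFO_VLEN(fp[0]);

	/* No type information: the info word is the whole entry. */
	if (kind == CTF_K_UNKNOWN && nargs == 0)
		return (1);
	if (kind != CTF_K_FUNCTION)
		return (0);

	/* Info word, return type, then one word per argument. */
	words = 2 + (size_t)nargs;
	return (words <= avail ? words : 0);
}
```

In an entry with type information, fp[1] is the return type and fp[2]
onwards are the argument types; a trailing identifier of zero marks varargs.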
.Ss The Type Section
The type section is the heart of the
.Nm
data. It encodes all of the information about the types themselves. The base of
the type information comes in two forms, a short form and a long form, each of
which may be followed by a variable number of arguments. The following
definitions describe the short and long forms:
.Bd -literal
#define	CTF_MAX_SIZE	0xfffe	/* max size of a type in bytes */
#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */
#define	CTF_MAX_LSIZE	UINT64_MAX

typedef struct ctf_stype {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* size of entire type in bytes */
		ushort_t _type;	/* reference to another type */
	} _u;
} ctf_stype_t;

typedef struct ctf_type {
	uint_t ctt_name;	/* reference to name in string table */
	ushort_t ctt_info;	/* encoded kind, variant length */
	union {
		ushort_t _size;	/* always CTF_LSIZE_SENT */
		ushort_t _type; /* do not use */
	} _u;
	uint_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
} ctf_type_t;

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy ushort_t
to store the number of bytes. If the number of bytes in a structure would exceed
0xfffe, then the alternate form, the
.Sy ctf_type_t ,
is used instead. To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_LSIZE_SENT
(0xffff). In general, when going through the type section, consumers use the
.Sy ctf_type_t
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of the
.Sy ctf_stype_t
or
.Sy ctf_type_t .
Not all kinds of types use
.Sy ctt_size .
Those which do not will always use the
.Sy ctf_stype_t
structure. The individual sections for each kind have more information.
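.Lp
The choice between the short and long forms can be sketched as below. The
structures are fixed-width stand-ins for ctf_stype_t and ctf_type_t with the
union flattened to ctt_size, and type_record is a hypothetical helper. It is
only meaningful for kinds that use ctt_size.

```c
#include <stddef.h>
#include <stdint.h>

#define	CTF_LSIZE_SENT	0xffff	/* sentinel for ctt_size */

/* Stand-in for ctf_stype_t (short form, 8 bytes). */
typedef struct stype {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;
} stype_t;

/* Stand-in for ctf_type_t (long form, 16 bytes). */
typedef struct ltype {
	uint32_t ctt_name;
	uint16_t ctt_info;
	uint16_t ctt_size;
	uint32_t ctt_lsizehi;
	uint32_t ctt_lsizelo;
} ltype_t;

/*
 * Store the size of the type in *sizep and return how many bytes the fixed
 * part of the record occupies. Only the first eight bytes of *tp are read
 * when the record is in the short form.
 */
static size_t
type_record(const ltype_t *tp, uint64_t *sizep)
{
	if (tp->ctt_size == CTF_LSIZE_SENT) {
		*sizep = ((uint64_t)tp->ctt_lsizehi << 32) | tp->ctt_lsizelo;
		return (sizeof (ltype_t));
	}
	*sizep = tp->ctt_size;
	return (sizeof (stype_t));
}
```

The returned length covers only the fixed part of the record; any variable
length data for the kind follows it.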
.Lp
Types are written out in order. Therefore the first entry encountered has a type
id of 0x1, or 0x8000 if a child. The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type. If the identifier points
to an empty string (one that consists solely of a null terminator) then the type
does not have a name. This is common with anonymous structures and unions that
only have a typedef to name them, as well as pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy ctf_type_t
and any variable length data that may exist. The rest of this section will be
broken down into the interpretation of the various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments. Instead, they are followed by a four byte
.Sy uint_t
which describes their encoding. All integers must be encoded with a variable
length of zero. The
.Em ctt_size
member describes the length of the integer in bytes. In general, integer sizes
will be rounded up to the closest power of two.
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set. If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set. If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type. For example,
the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set. Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
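.Lp
A short worked example of the integer encoding follows. The macros and flags
are copied from this section so the sketch stands alone; int_is_signed and
int_is_char are hypothetical helpers. The two sample values assume a signed
32-bit int and a signed 8-bit char, which is typical but platform dependent.

```c
#include <stdint.h>

#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \
	(((encoding) << 24) | ((offset) << 16) | (bits))

#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02

/* The encoding is a set of flags, so individual bits are tested. */
static int
int_is_signed(uint32_t data)
{
	return ((CTF_INT_ENCODING(data) & CTF_INT_SIGNED) != 0);
}

static int
int_is_char(uint32_t data)
{
	return ((CTF_INT_ENCODING(data) & CTF_INT_CHAR) != 0);
}

/* Sample encodings: a signed 32-bit int and a signed 8-bit char. */
static const uint32_t sample_int = CTF_INT_DATA(CTF_INT_SIGNED, 0, 32);
static const uint32_t sample_char =
    CTF_INT_DATA(CTF_INT_SIGNED | CTF_INT_CHAR, 0, 8);
```

Here sample_int is 0x01000020 and sample_char is 0x03000008.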
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts. They have no variable length
arguments and are followed by a four byte encoding which describes the kind of
float that exists. The
.Em ctt_size
member is the size, in bytes, of the float. The float encoding has three
different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers was a series of flags, the encoding for
floats maps to a specific kind of float. It is not a flag-based value. The kinds of floats
correspond to both their size and their encoding. This covers all of the basic C
intrinsic floating point types. The following are the different kinds of floats
represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
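.Lp
Because the float encoding is an enumerated value rather than a set of
flags, it has to be compared for equality. The sketch below illustrates this
with a hypothetical helper, fp_is_complex; the macros and constants are
copied from this section.

```c
#include <stdint.h>

#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \
	(((encoding) << 24) | ((offset) << 16) | (bits))

#define	CTF_FP_SINGLE	1
#define	CTF_FP_DOUBLE	2
#define	CTF_FP_CPLX	3
#define	CTF_FP_DCPLX	4
#define	CTF_FP_LDCPLX	5

/*
 * Returns 1 for any of the complex kinds. A bit test would be wrong here:
 * CTF_FP_CPLX (3) shares bits with both CTF_FP_SINGLE and CTF_FP_DOUBLE.
 */
static int
fp_is_complex(uint32_t data)
{
	switch (CTF_FP_ENCODING(data)) {
	case CTF_FP_CPLX:
	case CTF_FP_DCPLX:
	case CTF_FP_LDCPLX:
		return (1);
	default:
		return (0);
	}
}
```

For example, CTF_FP_DATA(CTF_FP_DOUBLE, 0, 64) yields 0x02000040, which is
not a complex kind.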
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments. They are followed by a structure which
describes the number of elements in the array, the type identifier of the
elements in the array, and the type identifier of the index of the array. With
arrays, the
.Em ctt_size
member is set to zero. The structure that follows an array is defined as:
.Bd -literal
typedef struct ctf_array {
	ushort_t cta_contents;	/* reference to type of array contents */
	ushort_t cta_index;	/* reference to type of array index */
	uint_t cta_nelems;	/* number of elements */
} ctf_array_t;
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy ctf_array_t
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements. This count may
be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length value to record the number of arguments that the
function takes. When the function is variadic, that is, when its argument list
ends with varargs, the argument count is incremented by one to account for the
variable argument. Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable length list contains the type identifiers for the arguments of
the function, if any. Each one is represented by a
.Sy uint16_t
and encoded according to the
.Sx Type Identifiers
section. If the function's last argument is of type varargs, then it is also
written out, but with a type identifier of zero. This entry is included in the
count of the function's arguments.
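.Lp
As a minimal sketch of how a consumer might interpret this list, the
following C fragment reports the number of named arguments and whether the
function is variadic. The helper name
.Sy func_named_args
is hypothetical and is not part of
.Xr libctf 3LIB ;
the sketch also assumes that the type identifiers have already been read
into host byte order.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: given the list of type identifiers that follows a
 * CTF_K_FUNCTION type and its variable length count, return the number of
 * named arguments and note whether the function is variadic.
 */
static size_t
func_named_args(const uint16_t *args, size_t vlen, int *is_varargs)
{
	assert(args != NULL || vlen == 0);
	assert(is_varargs != NULL);

	/* A trailing type identifier of zero marks the varargs entry. */
	*is_varargs = (vlen > 0 && args[vlen - 1] == 0);
	return (*is_varargs ? vlen - 1 : vlen);
}
```
.Ed
.Lp
For a list holding the identifiers 5, 7, and 0 with a count of three, this
sketch reports two named arguments and sets the variadic flag.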
.Ss Encoding of Structures and Unions
Structures and unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C. The main difference
between them is that the members of a structure follow one another in memory,
whereas the members of a union all share the same memory. They are also very
similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have. The value of the member
.Em ctt_size
is the size of the structure or union in bytes. There are two different
structures which are used to encode members in the variable list. When the size
of a structure or union is greater than or equal to the large member threshold,
8192 bytes, the larger structure,
.Sy ctf_lmember_t ,
is used to encode its members; otherwise,
.Sy ctf_member_t
is used. All members of a given structure or union are encoded using the same
structure. The two structures for members are as follows:
.Bd -literal
typedef struct ctf_member {
	uint_t ctm_name;	/* reference to name in string table */
	ushort_t ctm_type;	/* reference to type of member */
	ushort_t ctm_offset;	/* offset of this member in bits */
} ctf_member_t;

typedef struct ctf_lmember {
	uint_t ctlm_name;	/* reference to name in string table */
	ushort_t ctlm_type;	/* reference to type of member */
	ushort_t ctlm_pad;	/* padding */
	uint_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
members refer to the name of the member. The name is encoded as an offset into
the string table as described by the section
.Sx String Identifiers .
The members
.Em ctm_type
and
.Em ctlm_type
both refer to the type of the member. They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information that is present is the offset at which the
member begins. For unions, this value will always be zero because every member
of a union begins at the start of the union. For structures, this is the
offset in
.Sy bits
at which the member begins. Note that a compiler may lay out a type with
padding. This means that the difference in offset between two consecutive
members may be larger than the size of the first of them. When the size of the
overall structure is strictly less than 8192 bytes, the normal structure,
.Sy ctf_member_t ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
Since
.Em ctm_offset
is 16 bits wide, it can only describe offsets of up to 65535 bits, one less
than the 65536 bits in 8192 bytes. When the size of the structure is greater
than or equal to 8192 bytes, the offset in bits is instead split into two
32-bit quantities. One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset. These can be joined together to get
a 64-bit offset in bits by shifting the member
.Em ctlm_offsethi
to the left by 32 bits and then performing a bitwise OR with
.Em ctlm_offsetlo .
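.Lp
The choice of member structure and the reassembly of a large member offset
can be sketched in C as follows. This is an illustration only: the macro name
.Sy LMEMBER_THRESH
and both helper functions are hypothetical, the structure is restated with
fixed-width types from
.In stdint.h ,
and the fields are assumed to already be in host byte order.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name for the 8192 byte large member threshold. */
#define	LMEMBER_THRESH	8192

/* Fixed-width restatement of ctf_lmember_t; an assumption of this sketch. */
typedef struct ctf_lmember {
	uint32_t ctlm_name;	/* reference to name in string table */
	uint16_t ctlm_type;	/* reference to type of member */
	uint16_t ctlm_pad;	/* padding */
	uint32_t ctlm_offsethi;	/* high 32 bits of member offset in bits */
	uint32_t ctlm_offsetlo;	/* low 32 bits of member offset in bits */
} ctf_lmember_t;

/* Returns non-zero when members are encoded as ctf_lmember_t. */
static int
uses_lmember(uint64_t size_in_bytes)
{
	return (size_in_bytes >= LMEMBER_THRESH);
}

/* Joins the two 32-bit halves into a single 64-bit offset in bits. */
static uint64_t
lmember_offset(const ctf_lmember_t *lm)
{
	assert(lm != NULL);
	return (((uint64_t)lm->ctlm_offsethi << 32) | lm->ctlm_offsetlo);
}
```
.Ed
.Lp
The cast to a 64-bit type before shifting matters here; shifting a 32-bit
value left by 32 bits is undefined in C.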
.Ss Encoding of Enumerations
Enumerations, noted by the type
.Sy CTF_K_ENUM ,
are similar to structures. Enumerations use the variable length argument to
note the number of values that the enumeration contains, which we'll term
enumerators. In C, an enumeration is always equivalent to the intrinsic type
.Sy int .
Thus the value of the member
.Em ctt_size
is always the size of an integer, which is determined based on the current
model. For illumos systems, this will always be 4, as an integer is always
defined to be 4 bytes large in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
	uint_t cte_name;	/* reference to name in string table */
	int cte_value;		/* value associated with this name */
} ctf_enum_t;
.Ed
.Lp
The member
.Em cte_name
refers to the name of the enumerator. It is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
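.Lp
As a rough sketch, a consumer could map an integer value back to the name of
its enumerator as shown below. The function
.Sy enum_name
is hypothetical, the structure is restated with fixed-width types, and the
sketch assumes that every
.Em cte_name
is a plain byte offset into a string table that has already been loaded
into memory.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fixed-width restatement of ctf_enum_t; an assumption of this sketch. */
typedef struct ctf_enum {
	uint32_t cte_name;	/* reference to name in string table */
	int32_t cte_value;	/* value associated with this name */
} ctf_enum_t;

/*
 * Hypothetical helper: return the name of the first enumerator whose
 * value matches, or NULL if none does or its name is out of bounds.
 */
static const char *
enum_name(const ctf_enum_t *ents, size_t vlen, const char *strtab,
    size_t strtab_len, int32_t value)
{
	size_t i;

	assert(ents != NULL && strtab != NULL);
	for (i = 0; i < vlen; i++) {
		if (ents[i].cte_value == value &&
		    ents[i].cte_name < strtab_len)
			return (strtab + ents[i].cte_name);
	}
	return (NULL);
}
```
.Ed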
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name. If
the
.Nm
file is a child, then the forward may be resolved to an
actual type in the parent; otherwise, the definition may be in another
.Nm
container or may not be known at all. The only member of the
.Sy ctf_type_t
that matters for a forward declaration is the
.Em ctt_name
which points to the name of the forward reference in the string table as
described earlier. There is no other information recorded for forward
references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type. In the case of typedefs, they provide an
alternate name, while volatile, const, and restrict change how the type is
interpreted in the C programming language. This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
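.Lp
One way a consumer might follow these references is sketched below. It is
not how
.Xr libctf 3LIB
is implemented. The
.Sy simple_type_t
table, indexed by type identifier with entry zero unused, and the function
.Sy resolve
are hypothetical simplifications, and the kind constants are assumed to
mirror the corresponding
.Sy CTF_K_
values. The sketch strips typedefs and qualifiers but stops at pointers, and
it bounds the walk so that a malformed, self-referential chain cannot loop
forever.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror the CTF_K_* kind values; names are hypothetical. */
enum {
	K_INTEGER = 1, K_POINTER = 3, K_TYPEDEF = 10,
	K_VOLATILE = 11, K_CONST = 12, K_RESTRICT = 13
};

/* Hypothetical in-memory summary of one type entry. */
typedef struct simple_type {
	uint8_t st_kind;	/* kind of this type */
	uint16_t st_ref;	/* value of ctt_type, the referenced type */
} simple_type_t;

/*
 * Follow typedef, volatile, const, and restrict entries until some other
 * kind is reached. Returns zero for a bad identifier or a cycle.
 */
static uint16_t
resolve(const simple_type_t *types, size_t ntypes, uint16_t id)
{
	size_t hops;

	assert(types != NULL);
	for (hops = 0; hops < ntypes; hops++) {
		if (id == 0 || id >= ntypes)
			return (0);
		switch (types[id].st_kind) {
		case K_TYPEDEF:
		case K_VOLATILE:
		case K_CONST:
		case K_RESTRICT:
			id = types[id].st_ref;
			break;
		default:
			return (id);
		}
	}
	return (0);
}
```
.Ed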
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space. These entries consume an
identifier, but do not define anything. Nothing should refer to these gap
identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed, possibly cyclic graph. Structures and
unions may refer to each other in a way that creates a cyclic dependency. In
cases such as these, the entire type section must be read in and processed.
Consumers must not assume that every type can be laid out in dependency order;
they cannot.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section. This section encodes all of the strings that appear throughout
the other sections. It is laid out as a series of strings, each of which is
a sequence of characters followed by a null terminator. Generally, all names
are written out in ASCII, as most C compilers do not allow any characters to
appear in identifiers outside of a subset of ASCII. However, any extended
character sets should be written out as a series of UTF-8 bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string. Following that, each C string
should be written out, including the null terminator. Offsets that refer
to something in this section should refer to the first byte which begins
a string. Beyond the first byte in the section being the null
terminator, the order of strings is unimportant.
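.Lp
A defensive lookup into this section might look like the following sketch.
The function
.Sy strtab_get
is hypothetical. It assumes the whole section has been read into memory and
that the caller passes a plain byte offset; it rejects a section that does
not begin with the empty string as well as any offset whose string is not
terminated within the section.
.Bd -literal
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: return the string that begins at the given offset
 * of the string section, or NULL if the section or offset is invalid.
 */
static const char *
strtab_get(const char *tab, size_t len, uint32_t off)
{
	assert(tab != NULL);

	/* The byte at offset zero must be the empty string. */
	if (len == 0 || tab[0] != 0)
		return (NULL);
	if (off >= len)
		return (NULL);
	/* The string must be terminated before the section ends. */
	if (memchr(tab + off, 0, len - off) == NULL)
		return (NULL);
	return (tab + off);
}
```
.Ed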
.Sh Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file. A
.Nm
container inside such an object must match the endianness of the ELF
object. Aside from the question of the endian encoding of data, there
should be no other differences between architectures. While many of the
types in this document refer to non-fixed size C integral types, they
are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If any other model is being used with
.Nm
data that has different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that are
expected for the purposes of tooling being able to find the
.Nm
data. In particular, a given ELF object should only contain a single
.Nm
section. Multiple containers should be merged together into a single
one.
.Lp
The
.Nm
file should be included in its own ELF section. The section's name
must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr mdb 1 ,
.Xr dtrace 1M ,
.Xr gelf 3ELF ,
.Xr libelf 3LIB ,
.Xr a.out 4