.\"
.\" This file and its contents are supplied under the terms of the
.\" Common Development and Distribution License ("CDDL"), version 1.0.
.\" You may only use this file in accordance with the terms of version
.\" 1.0 of the CDDL.
.\"
.\" A full copy of the text of the CDDL should have accompanied this
.\" source. A copy of the CDDL is also available via the Internet at
.\" http://www.illumos.org/license/CDDL.
.\"
.\"
.\" Copyright (c) 2014 Joyent, Inc.
.\"
.Dd February 28, 2022
.Dt CTF 5
.Os
.Sh NAME
.Nm ctf
.Nd Compact C Type Format
.Sh SYNOPSIS
.In sys/ctf.h
.Sh DESCRIPTION
.Nm
is designed to be a compact representation of the C programming
language's type information focused on serving the needs of dynamic
tracing, debuggers, and other in-situ and post-mortem introspection
tools.
.Nm
data is generally included in
.Sy ELF
objects and is tagged as
.Sy SHT_PROGBITS
to ensure that the data is accessible in a running process and in subsequent
core dumps, if generated.
.Lp
The
.Nm
data contained in each file has information about the layout and
sizes of C types, including intrinsic types, enumerations, structures,
typedefs, and unions, that are used by the corresponding
.Sy ELF
object.
The
.Nm
data may also include information about the types of global objects and
the return type and arguments of functions in the symbol table.
.Lp
Because a
.Nm
file is often embedded inside a file, rather than being a standalone
file itself, it may also be referred to as a
.Nm
.Sy container .
.Lp
On
.Fx
systems,
.Nm
data is consumed by
.Xr dtrace 1 .
Programmatic access to
.Nm
data can be obtained through libctf.
.Lp
The
.Nm
file format is broken down into seven different sections.
The first two sections are the
.Sy preamble
and
.Sy header ,
which describe the version of the
.Nm
file, the links it has to other
.Nm
files, and the sizes of the other sections.
The next section is the
.Sy label
section,
which provides a way of identifying similar groups of
.Nm
data across multiple files.
This is followed by the
.Sy object
information section, which describes the types of global
symbols.
The subsequent section is the
.Sy function
information section, which describes the return
types and arguments of functions.
The next section is the
.Sy type
information section, which describes
the format and layout of the C types themselves, and finally the last
section is the
.Sy string
section, which contains the names of types, enumerations, members, and
labels.
.Lp
Strictly speaking, only the
.Sy preamble
and
.Sy header
are required; to be useful in practice, however, both the type and string
sections are also necessary.
.Lp
A
.Nm
file may contain all of the type information that it requires, or it
may optionally refer to another
.Nm
file which holds the remaining types.
When a
.Nm
file refers to another file, it is called the
.Sy child
and the file it refers to is called the
.Sy parent .
A given file may only refer to one parent.
This process is called
.Em uniquification
because it ensures each child only has type information that is
unique to it.
A common example of this is that most kernel modules in illumos are uniquified
against the kernel module
.Sy genunix
and the type information that comes from the
.Sy IP
module.
This means that a module only has types that are unique to itself and the most
common types in the kernel are not duplicated.
Uniquification is not used when building kernel modules on
.Fx .
.Sh FILE FORMAT
This documents version
.Em three
of the
.Nm
file format.
The
.Xr ctfconvert 1
and
.Xr ctfmerge 1
utilities emit
.Nm
version 3, and all other applications and libraries can operate on
versions 2 and 3.
.Lp
The file format can be summarized with the following image; the
sections that follow cover it in more detail.
.Bd -literal

         +-------------+  0t0
+--------| Preamble    |
|        +-------------+  0t4
|+-------| Header      |
||       +-------------+  0t36 + cth_lbloff
||+------| Labels      |
|||      +-------------+  0t36 + cth_objtoff
|||+-----| Objects     |
||||     +-------------+  0t36 + cth_funcoff
||||+----| Functions   |
|||||    +-------------+  0t36 + cth_typeoff
|||||+---| Types       |
||||||   +-------------+  0t36 + cth_stroff
||||||+--| Strings     |
|||||||  +-------------+  0t36 + cth_stroff + cth_strlen
|||||||
|||||||
|||||||
|||||||    +-- magic -   vers   flags
|||||||    |          |    |      |
|||||||   +------+------+------+------+
+---------| 0xcf | 0xf1 | 0x03 | 0x00 |
 ||||||   +------+------+------+------+
 ||||||   0      1      2      3      4
 ||||||
 ||||||    + parent label        + objects
 ||||||    |       + parent name |     + functions    + strings
 ||||||    |       |     + label |     |      + types |       + strlen
 ||||||    |       |     |       |     |      |       |       |
 ||||||   +------+------+------+------+------+-------+-------+-------+
 +--------| 0x00 | 0x00 | 0x00 | 0x08 | 0x36 | 0x110 | 0x5f4 | 0x611 |
  |||||   +------+------+------+------+------+-------+-------+-------+
  |||||   0x04   0x08   0x0c   0x10   0x14    0x18    0x1c    0x20   0x24
  |||||
  |||||         + Label name
  |||||         |       + Label type
  |||||         |       |       + Next label
  |||||         |       |       |
  |||||       +-------+------+-----+
  +-----------| 0x01  | 0x42 | ... |
   ||||       +-------+------+-----+
   ||||  cth_lbloff   +0x4   +0x8  cth_objtoff
   ||||
   ||||
   ||||  Symidx  0t15   0t43   0t44
   ||||       +------+------+------+-----+
   +----------| 0x00 | 0x42 | 0x36 | ... |
    |||       +------+------+------+-----+
    |||  cth_objtoff  +0x4   +0x8   +0xc   cth_funcoff
    |||
    |||        + CTF_TYPE_INFO         + CTF_TYPE_INFO
    |||        |        + Return type  |
    |||        |        |       + arg0 |
    |||       +--------+------+------+-----+
    +---------| 0x2c10 | 0x08 | 0x0c | ... |
     ||       +--------+------+------+-----+
     ||  cth_funcoff   +0x4   +0x8   +0xc  cth_typeoff
     ||
     ||         + ctf_stype_t for type 1
     ||         |  integer           + integer encoding
     ||         |                    |          + ctf_stype_t for type 2
     ||         |                    |          |
     ||       +--------------------+-----------+-----+
     +--------| 0x19 * 0xc01 * 0x0 | 0x1000000 | ... |
      |       +--------------------+-----------+-----+
      |  cth_typeoff               +0x0c       +0x10  cth_stroff
      |
      |     +--- str 0
      |     |    +--- str 1       + str 2
      |     |    |                |
      |     v    v                v
      |   +----+---+---+---+----+---+---+---+---+---+----+
      +---| \\0 | i | n | t | \\0 | f | o | o | _ | t | \\0 |
          +----+---+---+---+----+---+---+---+---+---+----+
          0    1   2   3   4    5   6   7   8   9   10   11
.Ed
.Lp
Every
.Nm
file begins with a
.Sy preamble ,
followed by a
.Sy header .
The
.Sy preamble
is defined as follows:
.Bd -literal
typedef struct ctf_preamble {
	uint16_t ctp_magic;	/* magic number (CTF_MAGIC) */
	uint8_t ctp_version;	/* data format version number (CTF_VERSION) */
	uint8_t ctp_flags;	/* flags (see below) */
} ctf_preamble_t;
.Ed
.Pp
The
.Sy preamble
is four bytes long and must be four byte aligned.
This
.Sy preamble
defines the version of the
.Nm
file, which in turn determines the format of the rest of the header.
While the header may change in subsequent versions, the preamble will not change
across versions, though the interpretation of its flags may change from
version to version.
The
.Em ctp_magic
member defines the magic number for the
.Nm
file format.
This must always be
.Li 0xcff1 .
If another value is encountered, then the file should not be treated as
a
.Nm
file.
The
.Em ctp_version
member defines the version of the
.Nm
file.
The current version is
.Li 3 .
It is possible to encounter an unsupported version.
In that case, software should not try to parse the format, as it may have
changed.
Finally, the
.Em ctp_flags
member describes aspects of the file which modify its interpretation.
The following flags are currently defined:
.Bd -literal
#define	CTF_F_COMPRESS		0x01
.Ed
.Pp
The flag
.Sy CTF_F_COMPRESS
indicates that the body of the
.Nm
file, all the data following the
.Sy header ,
has been compressed through the
.Sy zlib
library and its
.Sy deflate
algorithm.
If this flag is not present, then the body has not been compressed and no
special action is needed to interpret it.
All offsets into the data, as described by the
.Sy header ,
always refer to the
.Sy uncompressed
data.
.Lp
In versions two and three of the
.Nm
file format, the
.Sy header
denotes whether or not this
.Nm
file is the child of another
.Nm
file and also indicates the size of the remaining sections.
The structure for the
.Sy header
logically contains a copy of the
.Sy preamble
and the two have a combined size of 36 bytes.
.Bd -literal
typedef struct ctf_header {
	ctf_preamble_t cth_preamble;
	uint32_t cth_parlabel;	/* ref to name of parent lbl uniq'd against */
	uint32_t cth_parname;	/* ref to basename of parent */
	uint32_t cth_lbloff;	/* offset of label section */
	uint32_t cth_objtoff;	/* offset of object section */
	uint32_t cth_funcoff;	/* offset of function section */
	uint32_t cth_typeoff;	/* offset of type section */
	uint32_t cth_stroff;	/* offset of string section */
	uint32_t cth_strlen;	/* length of string section in bytes */
} ctf_header_t;
.Ed
.Pp
After the
.Sy preamble ,
the next two members,
.Em cth_parlabel
and
.Em cth_parname ,
are used to identify the parent.
The values of both members are offsets into the
.Sy string
section which point to the start of a null-terminated string.
For more information on the encoding of strings, see the subsection on
.Sx String Identifiers .
If the value of either is zero, then there is no entry for that
member.
If the member
.Em cth_parlabel
is set, then the
.Em cth_parname
member must be set, otherwise it will not be possible to find the
parent.
If
.Em cth_parname
is set, it is not necessary to define
.Em cth_parlabel ,
as the parent may not have a label.
For more information on labels and their interpretation, see
.Sx The Label Section .
.Lp
The remaining members (excepting
.Em cth_strlen )
describe the beginning of the corresponding sections.
These offsets are relative to the end of the
.Sy header .
Therefore, something with an offset of 0 is at an offset of thirty-six
bytes relative to the start of the
.Nm
file.
The difference between members indicates the size of the section itself.
Different offsets have different alignment requirements.
The sections at
.Em cth_objtoff
and
.Em cth_funcoff
must be two-byte aligned, while the sections at
.Em cth_lbloff
and
.Em cth_typeoff
must be four-byte aligned.
The section at
.Em cth_stroff
has no alignment requirements.
To calculate the size of a given section, excepting the
.Sy string
section, one should subtract the offset of the section from the following one.
For example, the size of the
.Sy types
section can be calculated by subtracting
.Em cth_typeoff
from
.Em cth_stroff .
.Lp
Finally, the member
.Em cth_strlen
describes the length of the string section itself.
From it, you can also calculate the size of the entire
.Nm
file by adding together the size of the
.Sy ctf_header_t ,
the offset of the string section in
.Em cth_stroff ,
and the size of the string section in
.Em cth_strlen .
.Ss Type Identifiers
Throughout the
.Nm ctf
data, types are referred to by identifiers.
A given
.Nm
file supports up to 2147483646 (0x7ffffffe) types.
.Nm
version 2 had a much smaller limit of 32767 types.
The first valid type identifier is 0x1.
When a given
.Nm
file is a child, indicated by a non-zero entry for the
.Sy header Ns 's
.Em cth_parname ,
then the first valid type identifier is 0x80000000 and the last is 0xfffffffe.
In this case, type identifiers 0x1 through 0x7ffffffe are references to the
parent.
0x7fffffff and 0xffffffff are not treated as valid type identifiers so as to
enable the use of -1 as an error value.
.Lp
The type identifier zero is a sentinel value used to indicate that there
is no type information available or it is an unknown type.
.Lp
Throughout the file format, the identifier is stored in different sized
values; however, in version 3 the minimum size to represent a given
identifier is a
.Sy uint32_t
(version 2 used a
.Sy uint16_t ) .
Other consumers of
.Nm
information may use larger or opaque identifiers.
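.Lp
The identifier ranges described in this subsection can be turned into a small
classifier.
The following is a minimal sketch, not code taken from libctf: the names
classify_type_id and enum ex_id_class are hypothetical, and treating
identifiers of 0x80000000 and above as invalid in a file that is not a child
is an assumption drawn from the stated limit of 0x7ffffffe types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical classification of a CTFv3 type identifier. */
enum ex_id_class {
	EX_ID_NONE,	/* 0: no type information, or an unknown type */
	EX_ID_INVALID,	/* reserved value or out of range */
	EX_ID_PARENT,	/* refers to a type in the parent container */
	EX_ID_LOCAL	/* refers to a type in this container */
};

/*
 * is_child should be true when the header's cth_parname is non-zero.
 * Assumption: a non-child file never uses identifiers >= 0x80000000.
 */
static enum ex_id_class
classify_type_id(uint32_t id, bool is_child)
{
	if (id == 0)
		return (EX_ID_NONE);
	/* Reserved so that -1 can be used as an error value. */
	if (id == 0x7fffffff || id == 0xffffffff)
		return (EX_ID_INVALID);
	if (id >= 0x80000000)
		return (is_child ? EX_ID_LOCAL : EX_ID_INVALID);
	/* 0x1 through 0x7ffffffe */
	return (is_child ? EX_ID_PARENT : EX_ID_LOCAL);
}
```

A consumer that has loaded both containers could use the result to decide
which type section to search before resolving the identifier.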
.Ss String Identifiers
String identifiers are always encoded as four byte unsigned integers
which are an offset into a string table.
The
.Nm
format supports two different string tables which have an identifier of
zero or one.
This identifier is stored in the high-order bit of the unsigned four byte
offset.
Therefore, the maximum supported offset into one of these tables is 0x7fffffff.
.Lp
Table identifier zero always refers to the
.Sy string
section in the CTF file itself.
String table identifier one refers to an external string table which is the ELF
string table for the ELF symbol table associated with the
.Nm
container.
.Ss Type Encoding
Every
.Nm
type begins with metadata encoded into a
.Sy uint32_t .
This encoded information tells us three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The kind of the type
.It
Whether this type is a root type or not
.It
The length of the variable data
.El
.Lp
The 32 bits that make up the encoding are broken down into six bits
for the kind (bits 26 to 31), one bit for the root type flag (bit 25),
and 25 bits for the length of the variable data.
Note that the
.Sy CTF_V3_MAX_VLEN
mask listed below, 0x00ffffff, covers only the low 24 of those bits.
.Lp
The current version of the file format defines 14 different kinds.
The interpretation of these different kinds will be discussed in the section
.Sx The Type Section .
If a kind is encountered that is not listed below, then it is not a valid
.Nm
file.
The kinds are defined as follows:
.Bd -literal -offset indent
#define	CTF_K_UNKNOWN	0
#define	CTF_K_INTEGER	1
#define	CTF_K_FLOAT	2
#define	CTF_K_POINTER	3
#define	CTF_K_ARRAY	4
#define	CTF_K_FUNCTION	5
#define	CTF_K_STRUCT	6
#define	CTF_K_UNION	7
#define	CTF_K_ENUM	8
#define	CTF_K_FORWARD	9
#define	CTF_K_TYPEDEF	10
#define	CTF_K_VOLATILE	11
#define	CTF_K_CONST	12
#define	CTF_K_RESTRICT	13
.Ed
.Lp
Programs directly reference many types; however, other types are referenced
indirectly because they are part of some other structure.
These types that are referenced directly and used are called
.Sy root
types.
Other types may be used indirectly, for example, a program may reference
a structure directly, but not one of its members which has a type.
That type is not considered a
.Sy root
type.
If a type is a
.Sy root
type, then it will have bit 25 set.
.Lp
The variable length section is specific to each kind and is discussed in the
section
.Sx The Type Section .
.Lp
The following macros are useful for constructing and deconstructing the encoded
type information:
.Bd -literal -offset indent

#define	CTF_V3_MAX_VLEN			0x00ffffff
#define	CTF_V3_INFO_KIND(info)		(((info) & 0xfc000000) >> 26)
#define	CTF_V3_INFO_ISROOT(info)	(((info) & 0x02000000) >> 25)
#define	CTF_V3_INFO_VLEN(info)		(((info) & CTF_V3_MAX_VLEN))

#define	CTF_V3_TYPE_INFO(kind, isroot, vlen) \\
	(((kind) << 26) | (((isroot) ? 1 : 0) << 25) | ((vlen) & CTF_V3_MAX_VLEN))
.Ed
.Ss The Label Section
When consuming
.Nm
data, it is often useful to know whether two different
.Nm
containers come from the same source base and version.
For example, when building illumos, there are many kernel modules that are built
against a single collection of source code.
A label is encoded into the
.Nm
files that corresponds with the particular build.
This ensures that if files on the system were to become mixed up from multiple
releases, they are not used together by tools, particularly when a child
needs to refer to a type in the parent.
Because they are linked using the type identifiers, if the wrong parent is used
then the wrong type will be encountered.
Note that this mechanism is not currently used on
.Fx .
In particular, kernel modules built on
.Fx
each contain a complete type graph.
.Lp
Each label is encoded in the file format using the following eight byte
structure:
.Bd -literal
typedef struct ctf_lblent {
	uint32_t ctl_label;	/* ref to name of label */
	uint32_t ctl_typeidx;	/* last type associated with this label */
} ctf_lblent_t;
.Ed
.Lp
Each label has two different components, a name and a type identifier.
The name is encoded in the
.Em ctl_label
member which is in the format defined in the section
.Sx String Identifiers .
Generally, the names of all labels are found in the internal string
section.
.Lp
The type identifier encoded in the member
.Em ctl_typeidx
refers to the last type identifier that a label refers to in the current
file.
Labels only refer to types in the current file.
If the
.Nm
file is a child, then it will have the same label as its parent;
however, its label will only refer to its types, not its parent's.
.Lp
It is also possible, though rather uncommon, for a
.Nm
file to have multiple labels.
Labels are placed one after another, every eight bytes.
When multiple labels are present, types may only belong to a single label.
.Ss The Object Section
The object section provides a mapping from ELF symbols of type
.Sy STT_OBJECT
in the symbol table to a type identifier.
Every entry in this section is a
.Sy uint32_t
which contains a type identifier as described in the section
.Sx Type Identifiers .
If there is no information for an object, then the type identifier 0x0
is stored for that entry.
.Lp
To walk the object section, you need to have a corresponding
.Sy symbol table
in the ELF object that contains the
.Nm
data.
Not every object is included in this section.
Specifically, when walking the symbol table, an entry is skipped if it matches
any of the following conditions:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_OBJECT
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's section index is
.Sy SHN_ABS
and the value of the symbol is zero.
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Lp
The following sample code shows an example of iterating the object
section and skipping the correct symbols:
.Bd -literal
#include <gelf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Given the start of the object section in a CTFv3 file, the number of symbols,
 * and the ELF Data sections for the symbol table and the string table, this
 * prints the type identifiers that correspond to objects. Note, a more robust
 * implementation should ensure that it doesn't walk beyond the end of the CTF
 * object section.
 *
 * An implementation that handles CTFv2 must take into account the fact that
 * type identifiers are 16 bits wide rather than 32 bits wide.
 */
static int
walk_symbols(uint32_t *objtoff, Elf_Data *symdata, Elf_Data *strdata,
    long nsyms)
{
	long i;
	uintptr_t strbase = (uintptr_t)strdata->d_buf;

	for (i = 1; i < nsyms; i++) {
		const char *name;
		GElf_Sym sym;

		if (gelf_getsym(symdata, i, &sym) == NULL)
			return (1);

		if (GELF_ST_TYPE(sym.st_info) != STT_OBJECT)
			continue;
		if (sym.st_shndx == SHN_UNDEF || sym.st_name == 0)
			continue;
		if (sym.st_shndx == SHN_ABS && sym.st_value == 0)
			continue;
		name = (const char *)(strbase + sym.st_name);
		if (strcmp(name, "_START_") == 0 || strcmp(name, "_END_") == 0)
			continue;

		(void) printf("Symbol %ld has type %u\n", i,
		    (unsigned int)*objtoff);
		/* Skipped symbols have no entry, so advance only here. */
		objtoff++;
	}

	return (0);
}
.Ed
.Ss The Function Section
The function section of the
.Nm
file encodes the types of both the function's arguments and the function's
return value.
Similar to
.Sx The Object Section ,
the function section encodes information for all symbols of type
.Sy STT_FUNC ,
excepting those that fit specific criteria.
Unlike with objects, because functions have a variable number of arguments, they
start with a type encoding as defined in
.Sx Type Encoding ,
which is the size of a
.Sy uint32_t .
Functions which have no type information available are encoded as
.Li CTF_V3_TYPE_INFO(CTF_K_UNKNOWN, 0, 0) .
Functions with arguments are encoded differently.
Here, the variable length is turned into the number of arguments in the
function.
If a function is a
.Sy varargs
type function, then the number of arguments is increased by one.
Functions with type information are encoded as:
.Li CTF_V3_TYPE_INFO(CTF_K_FUNCTION, 0, nargs) .
.Lp
For functions that have no type information, nothing else is encoded, and the
next function is encoded.
For functions with type information, the next
.Sy uint32_t
is encoded with the type identifier of the return type of the function.
It is followed by each of the type identifiers of the arguments, if any exist,
in the order that they appear in the function.
Therefore, argument 0 is the first type identifier and so on.
When a function has a final varargs argument, that is encoded with the type
identifier of zero.
.Lp
Like
.Sx The Object Section ,
the function section is encoded in the order of the symbol table.
It has similar, but slightly different considerations from objects.
While iterating the symbol table, if any of the following conditions are true,
then the entry is skipped and no corresponding entry is written:
.Lp
.Bl -bullet -offset indent -compact
.It
The symbol type is not
.Sy STT_FUNC
.It
The symbol's section index is
.Sy SHN_UNDEF
.It
The symbol's name offset is zero
.It
The symbol's name is
.Li _START_
or
.Li _END_ .
These are skipped because they are used for scoping local symbols in
ELF.
.El
.Ss The Type Section
The type section is the heart of the
.Nm
data.
It encodes all of the information about the types themselves.
The base of the type information comes in two forms, a short form and a long
form, each of which may be followed by a variable number of arguments.
The following definitions describe the short and long forms:
.Bd -literal
#define	CTF_V3_MAX_SIZE		0xfffffffe	/* max size of a type in bytes */
#define	CTF_V3_LSIZE_SENT	0xffffffff	/* sentinel for ctt_size */
#define	CTF_V3_MAX_LSIZE	UINT64_MAX

struct ctf_stype_v3 {
	uint32_t ctt_name;	/* reference to name in string table */
	uint32_t ctt_info;	/* encoded kind, variant length */
	union {
		uint32_t _size;	/* size of entire type in bytes */
		uint32_t _type;	/* reference to another type */
	} _u;
};

struct ctf_type_v3 {
	uint32_t ctt_name;	/* reference to name in string table */
	uint32_t ctt_info;	/* encoded kind, variant length */
	union {
		uint32_t _size;	/* always CTF_V3_LSIZE_SENT */
		uint32_t _type;	/* do not use */
	} _u;
	uint32_t ctt_lsizehi;	/* high 32 bits of type size in bytes */
	uint32_t ctt_lsizelo;	/* low 32 bits of type size in bytes */
};

#define	ctt_size _u._size	/* for fundamental types that have a size */
#define	ctt_type _u._type	/* for types that reference another type */
.Ed
.Pp
Type sizes are stored in
.Sy bytes .
The basic small form uses a
.Sy uint32_t
to store the number of bytes.
If the number of bytes in a structure would exceed 0xfffffffe, then the
alternate form, the
.Sy struct ctf_type_v3 ,
is used instead.
To indicate that the larger form is being used, the member
.Em ctt_size
is set to the value of
.Sy CTF_V3_LSIZE_SENT
(0xffffffff).
In general, when going through the type section, consumers use the
.Sy struct ctf_type_v3
structure, but pay attention to the value of the member
.Em ctt_size
to determine whether they should increment their scan by the size of
.Sy struct ctf_stype_v3
or
.Sy struct ctf_type_v3 .
Not all kinds of types use
.Sy ctt_size .
Those which do not will always use the
.Sy struct ctf_stype_v3
structure.
The individual sections for each kind have more information.
.Lp
Types are written out in order.
Therefore the first entry encountered has a type id of 0x1, or 0x80000000 if a
child.
The member
.Em ctt_name
is encoded as described in the section
.Sx String Identifiers .
The string that it points to is the name of the type.
If the identifier points to an empty string (one that consists solely of a null
terminator) then the type does not have a name.
This is common with anonymous
structures and unions that only have a typedef to name them, as well as
pointers and qualifiers.
.Lp
The next member, the
.Em ctt_info ,
is encoded as described in the section
.Sx Type Encoding .
The type's kind tells us how to interpret the remaining data in the
.Sy struct ctf_type_v3
and any variable length data that may exist.
The rest of this section will be broken down into the interpretation of the
various kinds.
.Ss Encoding of Integers
Integers, which are of type
.Sy CTF_K_INTEGER ,
have no variable length arguments.
Instead, they are followed by a
.Sy uint32_t
which describes their encoding.
All integers must be encoded with a variable length of zero.
The
.Em ctt_size
member describes the length of the integer in bytes.
In general, integer sizes will be rounded up to the closest power of two.
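.Lp
As an illustration of how a consumer might read the word that follows an
integer record, here is a minimal sketch.
It reproduces the CTF_V3_INFO_* and CTF_INT_* macros listed in this manual so
that it stands alone; the function decode_integer and the structure
struct ex_int_desc are hypothetical names introduced only for this example
and are not part of libctf.

```c
#include <stdint.h>

/* Macros as listed in this manual, repeated so the sketch is standalone. */
#define	CTF_K_INTEGER		1
#define	CTF_V3_MAX_VLEN		0x00ffffff
#define	CTF_V3_INFO_KIND(info)	(((info) & 0xfc000000) >> 26)
#define	CTF_V3_INFO_VLEN(info)	(((info) & CTF_V3_MAX_VLEN))
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))
#define	CTF_INT_SIGNED		0x01

/* Hypothetical holder for the three decoded fields. */
struct ex_int_desc {
	unsigned int enc;	/* CTF_INT_* flags */
	unsigned int offset;	/* offset of the value, in bits */
	unsigned int bits;	/* size of the value, in bits */
};

/*
 * Decode the uint32_t that follows a CTF_K_INTEGER record.
 * Returns 0 on success, or -1 if ctt_info does not describe an integer
 * with a variable length of zero.
 */
static int
decode_integer(uint32_t ctt_info, uint32_t data, struct ex_int_desc *out)
{
	if (CTF_V3_INFO_KIND(ctt_info) != CTF_K_INTEGER ||
	    CTF_V3_INFO_VLEN(ctt_info) != 0)
		return (-1);

	out->enc = CTF_INT_ENCODING(data);
	out->offset = CTF_INT_OFFSET(data);
	out->bits = CTF_INT_BITS(data);
	return (0);
}
```

For a signed 32-bit integer, the encoding word would be 0x01000020: the
CTF_INT_SIGNED flag in the top byte, an offset of zero, and a size of 32 bits.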
.Lp
The integer encoding contains three different pieces of information:
.Bl -bullet -offset indent -compact
.It
The encoding of the integer
.It
The offset in
.Sy bits
of the type
.It
The size in
.Sy bits
of the type
.El
.Pp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_INT_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_INT_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_INT_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_INT_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Pp
The following flags are defined for the encoding at this time:
.Bd -literal -offset indent
#define	CTF_INT_SIGNED		0x01
#define	CTF_INT_CHAR		0x02
#define	CTF_INT_BOOL		0x04
#define	CTF_INT_VARARGS		0x08
.Ed
.Lp
By default, an integer is considered to be unsigned, unless it has the
.Sy CTF_INT_SIGNED
flag set.
If the flag
.Sy CTF_INT_CHAR
is set, that indicates that the integer is of a type that stores character
data, for example the intrinsic C type
.Sy char
would have the
.Sy CTF_INT_CHAR
flag set.
If the flag
.Sy CTF_INT_BOOL
is set, that indicates that the integer represents a boolean type.
For example, the intrinsic C type
.Sy _Bool
would have the
.Sy CTF_INT_BOOL
flag set.
Finally, the flag
.Sy CTF_INT_VARARGS
indicates that the integer is used as part of a variable number of arguments.
This encoding is rather uncommon.
.Ss Encoding of Floats
Floats, which are of type
.Sy CTF_K_FLOAT ,
are similar to their integer counterparts.
They have no variable length arguments and are followed by a four byte encoding
which describes the kind of float that exists.
The
.Em ctt_size
member is the size, in bytes, of the float.
The float encoding has three different pieces of information inside of it:
.Lp
.Bl -bullet -offset indent -compact
.It
The specific kind of float that exists
.It
The offset in
.Sy bits
of the float
.It
The size in
.Sy bits
of the float
.El
.Lp
This encoding can be expressed through the following macros:
.Bd -literal -offset indent
#define	CTF_FP_ENCODING(data)	(((data) & 0xff000000) >> 24)
#define	CTF_FP_OFFSET(data)	(((data) & 0x00ff0000) >> 16)
#define	CTF_FP_BITS(data)	(((data) & 0x0000ffff))

#define	CTF_FP_DATA(encoding, offset, bits) \\
	(((encoding) << 24) | ((offset) << 16) | (bits))
.Ed
.Lp
Whereas the encoding for integers is a series of flags, the encoding for
floats maps to a specific kind of float.
It is not a flag-based value.
The kinds of floats correspond to both their size and their encoding.
This covers all of the basic C intrinsic floating point types.
The following are the different kinds of floats represented in the encoding:
.Bd -literal -offset indent
#define	CTF_FP_SINGLE	1	/* IEEE 32-bit float encoding */
#define	CTF_FP_DOUBLE	2	/* IEEE 64-bit float encoding */
#define	CTF_FP_CPLX	3	/* Complex encoding */
#define	CTF_FP_DCPLX	4	/* Double complex encoding */
#define	CTF_FP_LDCPLX	5	/* Long double complex encoding */
#define	CTF_FP_LDOUBLE	6	/* Long double encoding */
#define	CTF_FP_INTRVL	7	/* Interval (2x32-bit) encoding */
#define	CTF_FP_DINTRVL	8	/* Double interval (2x64-bit) encoding */
#define	CTF_FP_LDINTRVL	9	/* Long double interval (2x128-bit) encoding */
#define	CTF_FP_IMAGRY	10	/* Imaginary (32-bit) encoding */
#define	CTF_FP_DIMAGRY	11	/* Long imaginary (64-bit) encoding */
#define	CTF_FP_LDIMAGRY	12	/* Long double imaginary (128-bit) encoding */
.Ed
.Ss Encoding of Arrays
Arrays, which are of type
.Sy CTF_K_ARRAY ,
have no variable length arguments.
They are followed by a structure which describes the number of elements in the
array, the type identifier of the elements in the array, and the type identifier
of the index of the array.
With arrays, the
.Em ctt_size
member is set to zero.
The structure that follows an array is defined as:
.Bd -literal
struct ctf_array_v3 {
	uint32_t cta_contents;	/* reference to type of array contents */
	uint32_t cta_index;	/* reference to type of array index */
	uint32_t cta_nelems;	/* number of elements */
};
.Ed
.Lp
The
.Em cta_contents
and
.Em cta_index
members of the
.Sy struct ctf_array_v3
are type identifiers which are encoded as per the section
.Sx Type Identifiers .
The member
.Em cta_nelems
is a simple four byte unsigned count of the number of elements.
This count may be zero when encountering C99's flexible array members.
.Ss Encoding of Functions
Function types, which are of type
.Sy CTF_K_FUNCTION ,
use the variable length field to hold the number of arguments in the function.
When the function has a final member which is a varargs, then the argument count
is incremented by one to account for the variable argument.
Here, the
.Em ctt_type
member is encoded with the type identifier of the return type of the function.
Note that the
.Em ctt_size
member is not used here.
.Lp
The variable argument list contains the type identifiers for the arguments of
the function, if any.
Each one is represented by a
.Sy uint32_t
and encoded according to the
.Sx Type Identifiers
section.
If the function's last argument is of type varargs, then it is also written out,
but the type identifier is zero.
This is included in the count of the function's arguments.
In
.Nm
version 2, an extra type identifier may follow the argument and return type
identifiers in order to maintain four-byte alignment for the following type
definition.
Such a type identifier is not included in the argument count and has a value
of zero.
In
.Nm
version 3, four-byte alignment occurs naturally and no padding is used.
.Ss Encoding of Structures and Unions
Structures and unions, which are encoded with
.Sy CTF_K_STRUCT
and
.Sy CTF_K_UNION
respectively, are very similar constructs in C.
The main difference between them is the fact that members of a structure
follow one another, whereas in a union, all members share the same memory.
They are also very similar in terms of their encoding in
.Nm .
The variable length argument for structures and unions represents the number of
members that they have.
The value of the member
.Em ctt_size
is the size of the structure or union.
There are two different structures which are used to encode members in the
variable list.
When the size of a structure or union is greater than or equal to the large
member threshold, 536870912 bytes, then
.Sy struct ctf_lmember_v3
is used to encode each member; otherwise
.Sy struct ctf_member_v3
is used.
Within a given structure or union, all members are encoded using the same
structure.
The two member structures are defined as follows:
.Bd -literal
struct ctf_member_v3 {
    uint32_t ctm_name;      /* reference to name in string table */
    uint32_t ctm_type;      /* reference to type of member */
    uint32_t ctm_offset;    /* offset of this member in bits */
};

struct ctf_lmember_v3 {
    uint32_t ctlm_name;     /* reference to name in string table */
    uint32_t ctlm_type;     /* reference to type of member */
    uint32_t ctlm_offsethi; /* high 32 bits of member offset in bits */
    uint32_t ctlm_offsetlo; /* low 32 bits of member offset in bits */
};
.Ed
.Lp
Both the
.Em ctm_name
and
.Em ctlm_name
refer to the name of the member.
The name is encoded as an offset into the string table as described by the
section
.Sx String Identifiers .
The members
.Em ctm_type
and
.Em ctlm_type
both refer to the type of the member.
They are encoded as per the section
.Sx Type Identifiers .
.Lp
The last piece of information is the offset in memory at which the member
begins.
For unions, this value will always be zero because each member of a union has
an offset of zero.
For structures, this is the offset in
.Sy bits
at which the member begins.
Note that a compiler may lay out a type with padding.
This means that the difference in offset between two consecutive members may be
larger than the size of the member.
When the size of the overall structure is strictly less than 536870912 bytes,
the normal structure,
.Sy struct ctf_member_v3 ,
is used and the offset in bits is stored in the member
.Em ctm_offset .
However, when the size of the structure is greater than or equal to 536870912
bytes, then the number of bits is split into two 32-bit quantities.
One member,
.Em ctlm_offsethi ,
represents the upper 32 bits of the offset, while the other member,
.Em ctlm_offsetlo ,
represents the lower 32 bits of the offset.
These can be joined together to get a 64-bit offset in bits by shifting
the member
.Em ctlm_offsethi
to the left by thirty-two bits and then performing a bitwise OR with
.Em ctlm_offsetlo .
.Ss Encoding of Enumerations
Enumerations, noted by the type
.Sy CTF_K_ENUM ,
are similar to structures.
Enumerations use the variable list to note the number of values that the
enumeration contains, which we'll term enumerators.
In C, an enumeration is always equivalent to the intrinsic type
.Sy int ,
thus the value of the member
.Em ctt_size
is always the size of an integer which is determined based on the current model.
For
.Fx
systems, this will always be 4, as an integer is always defined to
be 4 bytes in size in both
.Sy ILP32
and
.Sy LP64 ,
regardless of the architecture.
For further details, see
.Xr arch 7 .
.Lp
The enumerators encoded in an enumeration have the following structure in the
variable list:
.Bd -literal
typedef struct ctf_enum {
    uint32_t cte_name;   /* reference to name in string table */
    int32_t cte_value;   /* value associated with this name */
} ctf_enum_t;
.Ed
.Pp
The member
.Em cte_name
refers to the name of the enumerator's value; it is encoded according to the
rules in the section
.Sx String Identifiers .
The member
.Em cte_value
contains the integer value of this enumerator.
.Ss Encoding of Forward References
Forward references, types of kind
.Sy CTF_K_FORWARD ,
in a
.Nm
file refer to types which may not have a definition at all, only a name.
If the
.Nm
file is a child, then it may be that the forward is resolved to an
actual type in the parent; otherwise, the definition may be in another
.Nm
container or may not be known at all.
The only member of the
.Sy struct ctf_type_v3
that matters for a forward declaration is the
.Em ctt_name
which points to the name of the forward reference in the string table as
described earlier.
There is no other information recorded for forward references.
.Ss Encoding of Pointers, Typedefs, Volatile, Const, and Restrict
Pointers, typedefs, volatile, const, and restrict are all similar in
.Nm .
They all refer to another type.
In the case of typedefs, they provide an alternate name, while volatile, const,
and restrict change how the type is interpreted in the C programming language.
This covers the
.Nm
kinds
.Sy CTF_K_POINTER ,
.Sy CTF_K_TYPEDEF ,
.Sy CTF_K_VOLATILE ,
.Sy CTF_K_RESTRICT ,
and
.Sy CTF_K_CONST .
.Lp
These types have no variable list entries and use the member
.Em ctt_type
to refer to the base type that they modify.
.Ss Encoding of Unknown Types
Types with the kind
.Sy CTF_K_UNKNOWN
are used to indicate gaps in the type identifier space.
These entries consume an identifier, but do not define anything.
Nothing should refer to these gap identifiers.
.Ss Dependencies Between Types
C types can be imagined as a directed, cyclic graph.
Structures and unions may refer to each other in a way that creates a cyclic
dependency.
In cases such as these, the entire type section must be read in and processed.
Consumers must not assume that every type can be laid out in dependency order;
they cannot.
.Ss The String Section
The last section of the
.Nm
file is the
.Sy string
section.
This section encodes all of the strings that appear throughout the other
sections.
It is laid out as a series of strings, each a sequence of characters followed
by a null terminator.
Generally, all names are written out in ASCII, as most C compilers do not allow
any characters to appear in identifiers outside of a subset of ASCII.
However, any extended character sets should be written out as a series of UTF-8
bytes.
.Lp
The first entry in the section, at offset zero, is a single null
terminator to reference the empty string.
Following that, each C string should be written out, including the null
terminator.
Offsets that refer to something in this section should refer to the first byte
which begins a string.
Beyond the first byte in the section being the null terminator, the order of
strings is unimportant.
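.Lp
As a minimal sketch of how a consumer might resolve such an offset, the
following C fragment looks up a string in an in-memory copy of the string
section.
The helper
.Fn strtab_lookup
and the sample table
.Va sample_strtab
are hypothetical names used only for illustration; they are not part of libctf.
The sketch assumes that the offset has already been extracted from a string
identifier and that it refers to this container's own string section.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sample string section: the empty string at offset 0,
 * "int" at offset 1, and "foo_t" at offset 5.  The array is 11 bytes
 * long, including the terminator the compiler appends.
 */
static const char sample_strtab[] = "\0int\0foo_t";

/*
 * Hypothetical helper (not a libctf interface): return the string that
 * begins at byte offset `off` in a string section of `len` bytes, or
 * NULL if the offset is out of range or no null terminator is found
 * before the end of the section.
 */
static const char *
strtab_lookup(const char *strtab, size_t len, uint32_t off)
{
    if (off >= len)
        return (NULL);
    if (memchr(strtab + off, '\0', len - off) == NULL)
        return (NULL);
    return (strtab + off);
}
```

With the sample table, offset 0 yields the empty string, offset 1 yields
.Ql int ,
and offset 5 yields
.Ql foo_t ;
an offset at or past the end of the section yields NULL.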
.Ss Data Encoding and ELF Considerations
.Nm
data is generally included in ELF objects which specify information to
identify the architecture and endianness of the file.
A
.Nm
container inside such an object must match the endianness of the ELF object.
Aside from the question of the endian encoding of data, there should be no other
differences between architectures.
While many of the types in this document refer to non-fixed size C integral
types, they are equivalent in the models
.Sy ILP32
and
.Sy LP64 .
If any other model is being used with
.Nm
data that has different sizes, then it must not use the model's sizes for
those integral types and instead use the fixed size equivalents based on an
.Sy ILP32
environment.
.Lp
When placing a
.Nm
container inside of an ELF object, there are certain conventions that are
expected for the purposes of tooling being able to find the
.Nm
data.
In particular, a given ELF object should only contain a single
.Nm
section.
Multiple containers should be merged together into a single one.
.Lp
The
.Nm
file should be included in its own ELF section.
The section's name must be
.Ql .SUNW_ctf .
The type of the section must be
.Sy SHT_PROGBITS .
The section should have a link set to the symbol table and its address
alignment must be 4.
.Sh SEE ALSO
.Xr ctfconvert 1 ,
.Xr ctfdump 1 ,
.Xr ctfmerge 1 ,
.Xr dtrace 1 ,
.Xr elf 3 ,
.Xr gelf 3 ,
.Xr a.out 5 ,
.Xr elf 5 ,
.Xr arch 7