10b57cec5SDimitry AndricThe ELF, COFF and Wasm Linkers 20b57cec5SDimitry Andric============================== 30b57cec5SDimitry Andric 40b57cec5SDimitry AndricThe ELF Linker as a Library 50b57cec5SDimitry Andric--------------------------- 60b57cec5SDimitry Andric 70b57cec5SDimitry AndricYou can embed LLD to your program by linking against it and calling the linker's 8*06c3fb27SDimitry Andricentry point function ``lld::lldMain``. 90b57cec5SDimitry Andric 10480093f4SDimitry AndricThe current policy is that it is your responsibility to give trustworthy object 110b57cec5SDimitry Andricfiles. The function is guaranteed to return as long as you do not pass corrupted 120b57cec5SDimitry Andricor malicious object files. A corrupted file could cause a fatal error or SEGV. 130b57cec5SDimitry AndricThat being said, you don't need to worry too much about it if you create object 140b57cec5SDimitry Andricfiles in the usual way and give them to the linker. It is naturally expected to 150b57cec5SDimitry Andricwork, or otherwise it's a linker's bug. 160b57cec5SDimitry Andric 170b57cec5SDimitry AndricDesign 180b57cec5SDimitry Andric====== 190b57cec5SDimitry Andric 200b57cec5SDimitry AndricWe will describe the design of the linkers in the rest of the document. 210b57cec5SDimitry Andric 220b57cec5SDimitry AndricKey Concepts 230b57cec5SDimitry Andric------------ 240b57cec5SDimitry Andric 250b57cec5SDimitry AndricLinkers are fairly large pieces of software. 260b57cec5SDimitry AndricThere are many design choices you have to make to create a complete linker. 270b57cec5SDimitry Andric 280b57cec5SDimitry AndricThis is a list of design choices we've made for ELF and COFF LLD. 290b57cec5SDimitry AndricWe believe that these high-level design choices achieved a right balance 300b57cec5SDimitry Andricbetween speed, simplicity and extensibility. 310b57cec5SDimitry Andric 320b57cec5SDimitry Andric* Implement as native linkers 330b57cec5SDimitry Andric 340b57cec5SDimitry Andric We implemented the linkers as native linkers for each file format. 350b57cec5SDimitry Andric 360b57cec5SDimitry Andric The linkers share the same design but share very little code. 370b57cec5SDimitry Andric Sharing code makes sense if the benefit is worth its cost. 380b57cec5SDimitry Andric In our case, the object formats are different enough that we thought the layer 390b57cec5SDimitry Andric to abstract the differences wouldn't be worth its complexity and run-time 400b57cec5SDimitry Andric cost. Elimination of the abstract layer has greatly simplified the 410b57cec5SDimitry Andric implementation. 420b57cec5SDimitry Andric 430b57cec5SDimitry Andric* Speed by design 440b57cec5SDimitry Andric 450b57cec5SDimitry Andric One of the most important things in archiving high performance is to 460b57cec5SDimitry Andric do less rather than do it efficiently. 470b57cec5SDimitry Andric Therefore, the high-level design matters more than local optimizations. 480b57cec5SDimitry Andric Since we are trying to create a high-performance linker, 490b57cec5SDimitry Andric it is very important to keep the design as efficient as possible. 500b57cec5SDimitry Andric 510b57cec5SDimitry Andric Broadly speaking, we do not do anything until we have to do it. 520b57cec5SDimitry Andric For example, we do not read section contents or relocations 530b57cec5SDimitry Andric until we need them to continue linking. 540b57cec5SDimitry Andric When we need to do some costly operation (such as looking up 550b57cec5SDimitry Andric a hash table for each symbol), we do it only once. 560b57cec5SDimitry Andric We obtain a handle (which is typically just a pointer to actual data) 570b57cec5SDimitry Andric on the first operation and use it throughout the process. 580b57cec5SDimitry Andric 590b57cec5SDimitry Andric* Efficient archive file handling 600b57cec5SDimitry Andric 610b57cec5SDimitry Andric LLD's handling of archive files (the files with ".a" file extension) is 620b57cec5SDimitry Andric different from the traditional Unix linkers and similar to Windows linkers. 630b57cec5SDimitry Andric We'll describe how the traditional Unix linker handles archive files, what the 640b57cec5SDimitry Andric problem is, and how LLD approached the problem. 650b57cec5SDimitry Andric 660b57cec5SDimitry Andric The traditional Unix linker maintains a set of undefined symbols during 670b57cec5SDimitry Andric linking. The linker visits each file in the order as they appeared in the 680b57cec5SDimitry Andric command line until the set becomes empty. What the linker would do depends on 690b57cec5SDimitry Andric file type. 700b57cec5SDimitry Andric 710b57cec5SDimitry Andric - If the linker visits an object file, the linker links object files to the 720b57cec5SDimitry Andric result, and undefined symbols in the object file are added to the set. 730b57cec5SDimitry Andric 740b57cec5SDimitry Andric - If the linker visits an archive file, it checks for the archive file's 750b57cec5SDimitry Andric symbol table and extracts all object files that have definitions for any 760b57cec5SDimitry Andric symbols in the set. 770b57cec5SDimitry Andric 780b57cec5SDimitry Andric This algorithm sometimes leads to a counter-intuitive behavior. If you give 790b57cec5SDimitry Andric archive files before object files, nothing will happen because when the linker 800b57cec5SDimitry Andric visits archives, there is no undefined symbols in the set. As a result, no 810b57cec5SDimitry Andric files are extracted from the first archive file, and the link is done at that 820b57cec5SDimitry Andric point because the set is empty after it visits one file. 830b57cec5SDimitry Andric 840b57cec5SDimitry Andric You can fix the problem by reordering the files, 850b57cec5SDimitry Andric but that cannot fix the issue of mutually-dependent archive files. 860b57cec5SDimitry Andric 870b57cec5SDimitry Andric Linking mutually-dependent archive files is tricky. You may specify the same 880b57cec5SDimitry Andric archive file multiple times to let the linker visit it more than once. Or, 890b57cec5SDimitry Andric you may use the special command line options, `--start-group` and 900b57cec5SDimitry Andric `--end-group`, to let the linker loop over the files between the options until 910b57cec5SDimitry Andric no new symbols are added to the set. 920b57cec5SDimitry Andric 930b57cec5SDimitry Andric Visiting the same archive files multiple times makes the linker slower. 940b57cec5SDimitry Andric 950b57cec5SDimitry Andric Here is how LLD approaches the problem. Instead of memorizing only undefined 960b57cec5SDimitry Andric symbols, we program LLD so that it memorizes all symbols. When it sees an 970b57cec5SDimitry Andric undefined symbol that can be resolved by extracting an object file from an 980b57cec5SDimitry Andric archive file it previously visited, it immediately extracts the file and links 990b57cec5SDimitry Andric it. It is doable because LLD does not forget symbols it has seen in archive 1000b57cec5SDimitry Andric files. 1010b57cec5SDimitry Andric 1020b57cec5SDimitry Andric We believe that LLD's way is efficient and easy to justify. 1030b57cec5SDimitry Andric 1040b57cec5SDimitry Andric The semantics of LLD's archive handling are different from the traditional 1050b57cec5SDimitry Andric Unix's. You can observe it if you carefully craft archive files to exploit 1060b57cec5SDimitry Andric it. However, in reality, we don't know any program that cannot link with our 1070b57cec5SDimitry Andric algorithm so far, so it's not going to cause trouble. 1080b57cec5SDimitry Andric 1090b57cec5SDimitry AndricNumbers You Want to Know 1100b57cec5SDimitry Andric------------------------ 1110b57cec5SDimitry Andric 1120b57cec5SDimitry AndricTo give you intuition about what kinds of data the linker is mainly working on, 1130b57cec5SDimitry AndricI'll give you the list of objects and their numbers LLD has to read and process 1140b57cec5SDimitry Andricin order to link a very large executable. In order to link Chrome with debug 1150b57cec5SDimitry Andricinfo, which is roughly 2 GB in output size, LLD reads 1160b57cec5SDimitry Andric 1170b57cec5SDimitry Andric- 17,000 files, 1180b57cec5SDimitry Andric- 1,800,000 sections, 1190b57cec5SDimitry Andric- 6,300,000 symbols, and 1200b57cec5SDimitry Andric- 13,000,000 relocations. 1210b57cec5SDimitry Andric 1220b57cec5SDimitry AndricLLD produces the 2 GB executable in 15 seconds. 1230b57cec5SDimitry Andric 1240b57cec5SDimitry AndricThese numbers vary depending on your program, but in general, 1250b57cec5SDimitry Andricyou have a lot of relocations and symbols for each file. 1260b57cec5SDimitry AndricIf your program is written in C++, symbol names are likely to be 1270b57cec5SDimitry Andricpretty long because of name mangling. 1280b57cec5SDimitry Andric 1290b57cec5SDimitry AndricIt is important to not waste time on relocations and symbols. 1300b57cec5SDimitry Andric 1310b57cec5SDimitry AndricIn the above case, the total amount of symbol strings is 450 MB, 1320b57cec5SDimitry Andricand inserting all of them to a hash table takes 1.5 seconds. 1330b57cec5SDimitry AndricTherefore, if you causally add a hash table lookup for each symbol, 1340b57cec5SDimitry Andricit would slow down the linker by 10%. So, don't do that. 1350b57cec5SDimitry Andric 1360b57cec5SDimitry AndricOn the other hand, you don't have to pursue efficiency 1370b57cec5SDimitry Andricwhen handling files. 1380b57cec5SDimitry Andric 1390b57cec5SDimitry AndricImportant Data Structures 1400b57cec5SDimitry Andric------------------------- 1410b57cec5SDimitry Andric 1420b57cec5SDimitry AndricWe will describe the key data structures in LLD in this section. The linker can 1430b57cec5SDimitry Andricbe understood as the interactions between them. Once you understand their 1440b57cec5SDimitry Andricfunctions, the code of the linker should look obvious to you. 1450b57cec5SDimitry Andric 1460b57cec5SDimitry Andric* Symbol 1470b57cec5SDimitry Andric 1480b57cec5SDimitry Andric This class represents a symbol. 1490b57cec5SDimitry Andric They are created for symbols in object files or archive files. 1500b57cec5SDimitry Andric The linker creates linker-defined symbols as well. 1510b57cec5SDimitry Andric 1520b57cec5SDimitry Andric There are basically three types of Symbols: Defined, Undefined, or Lazy. 1530b57cec5SDimitry Andric 1540b57cec5SDimitry Andric - Defined symbols are for all symbols that are considered as "resolved", 1550b57cec5SDimitry Andric including real defined symbols, COMDAT symbols, common symbols, 1560b57cec5SDimitry Andric absolute symbols, linker-created symbols, etc. 1570b57cec5SDimitry Andric - Undefined symbols represent undefined symbols, which need to be replaced by 1580b57cec5SDimitry Andric Defined symbols by the resolver until the link is complete. 1590b57cec5SDimitry Andric - Lazy symbols represent symbols we found in archive file headers 1600b57cec5SDimitry Andric which can turn into Defined if we read archive members. 1610b57cec5SDimitry Andric 1620b57cec5SDimitry Andric There's only one Symbol instance for each unique symbol name. This uniqueness 1630b57cec5SDimitry Andric is guaranteed by the symbol table. As the resolver reads symbols from input 1640b57cec5SDimitry Andric files, it replaces an existing Symbol with the "best" Symbol for its symbol 1650b57cec5SDimitry Andric name using the placement new. 1660b57cec5SDimitry Andric 1670b57cec5SDimitry Andric The above mechanism allows you to use pointers to Symbols as a very cheap way 1680b57cec5SDimitry Andric to access name resolution results. Assume for example that you have a pointer 1690b57cec5SDimitry Andric to an undefined symbol before name resolution. If the symbol is resolved to a 1700b57cec5SDimitry Andric defined symbol by the resolver, the pointer will "automatically" point to the 1710b57cec5SDimitry Andric defined symbol, because the undefined symbol the pointer pointed to will have 1720b57cec5SDimitry Andric been replaced by the defined symbol in-place. 1730b57cec5SDimitry Andric 1740b57cec5SDimitry Andric* SymbolTable 1750b57cec5SDimitry Andric 1760b57cec5SDimitry Andric SymbolTable is basically a hash table from strings to Symbols 1770b57cec5SDimitry Andric with logic to resolve symbol conflicts. It resolves conflicts by symbol type. 1780b57cec5SDimitry Andric 1790b57cec5SDimitry Andric - If we add Defined and Undefined symbols, the symbol table will keep the 1800b57cec5SDimitry Andric former. 1810b57cec5SDimitry Andric - If we add Defined and Lazy symbols, it will keep the former. 1820b57cec5SDimitry Andric - If we add Lazy and Undefined, it will keep the former, 1830b57cec5SDimitry Andric but it will also trigger the Lazy symbol to load the archive member 1840b57cec5SDimitry Andric to actually resolve the symbol. 1850b57cec5SDimitry Andric 1860b57cec5SDimitry Andric* Chunk (COFF specific) 1870b57cec5SDimitry Andric 1880b57cec5SDimitry Andric Chunk represents a chunk of data that will occupy space in an output. 1890b57cec5SDimitry Andric Each regular section becomes a chunk. 1900b57cec5SDimitry Andric Chunks created for common or BSS symbols are not backed by sections. 1910b57cec5SDimitry Andric The linker may create chunks to append additional data to an output as well. 1920b57cec5SDimitry Andric 1930b57cec5SDimitry Andric Chunks know about their size, how to copy their data to mmap'ed outputs, 1940b57cec5SDimitry Andric and how to apply relocations to them. 1950b57cec5SDimitry Andric Specifically, section-based chunks know how to read relocation tables 1960b57cec5SDimitry Andric and how to apply them. 1970b57cec5SDimitry Andric 1980b57cec5SDimitry Andric* InputSection (ELF specific) 1990b57cec5SDimitry Andric 2000b57cec5SDimitry Andric Since we have less synthesized data for ELF, we don't abstract slices of 2010b57cec5SDimitry Andric input files as Chunks for ELF. Instead, we directly use the input section 2020b57cec5SDimitry Andric as an internal data type. 2030b57cec5SDimitry Andric 2040b57cec5SDimitry Andric InputSection knows about their size and how to copy themselves to 2050b57cec5SDimitry Andric mmap'ed outputs, just like COFF Chunks. 2060b57cec5SDimitry Andric 2070b57cec5SDimitry Andric* OutputSection 2080b57cec5SDimitry Andric 2090b57cec5SDimitry Andric OutputSection is a container of InputSections (ELF) or Chunks (COFF). 2100b57cec5SDimitry Andric An InputSection or Chunk belongs to at most one OutputSection. 2110b57cec5SDimitry Andric 2120b57cec5SDimitry AndricThere are mainly three actors in this linker. 2130b57cec5SDimitry Andric 2140b57cec5SDimitry Andric* InputFile 2150b57cec5SDimitry Andric 2160b57cec5SDimitry Andric InputFile is a superclass of file readers. 2170b57cec5SDimitry Andric We have a different subclass for each input file type, 2180b57cec5SDimitry Andric such as regular object file, archive file, etc. 2190b57cec5SDimitry Andric They are responsible for creating and owning Symbols and InputSections/Chunks. 2200b57cec5SDimitry Andric 2210b57cec5SDimitry Andric* Writer 2220b57cec5SDimitry Andric 2230b57cec5SDimitry Andric The writer is responsible for writing file headers and InputSections/Chunks to 2240b57cec5SDimitry Andric a file. It creates OutputSections, put all InputSections/Chunks into them, 2250b57cec5SDimitry Andric assign unique, non-overlapping addresses and file offsets to them, and then 2260b57cec5SDimitry Andric write them down to a file. 2270b57cec5SDimitry Andric 2280b57cec5SDimitry Andric* Driver 2290b57cec5SDimitry Andric 2300b57cec5SDimitry Andric The linking process is driven by the driver. The driver: 2310b57cec5SDimitry Andric 2320b57cec5SDimitry Andric - processes command line options, 2330b57cec5SDimitry Andric - creates a symbol table, 2340b57cec5SDimitry Andric - creates an InputFile for each input file and puts all symbols within into 2350b57cec5SDimitry Andric the symbol table, 2360b57cec5SDimitry Andric - checks if there's no remaining undefined symbols, 2370b57cec5SDimitry Andric - creates a writer, 2380b57cec5SDimitry Andric - and passes the symbol table to the writer to write the result to a file. 2390b57cec5SDimitry Andric 2400b57cec5SDimitry AndricLink-Time Optimization 2410b57cec5SDimitry Andric---------------------- 2420b57cec5SDimitry Andric 2430b57cec5SDimitry AndricLTO is implemented by handling LLVM bitcode files as object files. 2440b57cec5SDimitry AndricThe linker resolves symbols in bitcode files normally. If all symbols 2450b57cec5SDimitry Andricare successfully resolved, it then runs LLVM passes 2460b57cec5SDimitry Andricwith all bitcode files to convert them to one big regular ELF/COFF file. 2470b57cec5SDimitry AndricFinally, the linker replaces bitcode symbols with ELF/COFF symbols, 2480b57cec5SDimitry Andricso that they are linked as if they were in the native format from the beginning. 2490b57cec5SDimitry Andric 2500b57cec5SDimitry AndricThe details are described in this document. 2515ffd83dbSDimitry Andrichttps://llvm.org/docs/LinkTimeOptimization.html 2520b57cec5SDimitry Andric 2530b57cec5SDimitry AndricGlossary 2540b57cec5SDimitry Andric-------- 2550b57cec5SDimitry Andric 2560b57cec5SDimitry Andric* RVA (COFF) 2570b57cec5SDimitry Andric 2580b57cec5SDimitry Andric Short for Relative Virtual Address. 2590b57cec5SDimitry Andric 2600b57cec5SDimitry Andric Windows executables or DLLs are not position-independent; they are 2610b57cec5SDimitry Andric linked against a fixed address called an image base. RVAs are 2620b57cec5SDimitry Andric offsets from an image base. 2630b57cec5SDimitry Andric 2640b57cec5SDimitry Andric Default image bases are 0x140000000 for executables and 0x18000000 2650b57cec5SDimitry Andric for DLLs. For example, when we are creating an executable, we assume 2660b57cec5SDimitry Andric that the executable will be loaded at address 0x140000000 by the 2670b57cec5SDimitry Andric loader, so we apply relocations accordingly. Result texts and data 2680b57cec5SDimitry Andric will contain raw absolute addresses. 2690b57cec5SDimitry Andric 2700b57cec5SDimitry Andric* VA 2710b57cec5SDimitry Andric 2720b57cec5SDimitry Andric Short for Virtual Address. For COFF, it is equivalent to RVA + image base. 2730b57cec5SDimitry Andric 2740b57cec5SDimitry Andric* Base relocations (COFF) 2750b57cec5SDimitry Andric 2760b57cec5SDimitry Andric Relocation information for the loader. If the loader decides to map 2770b57cec5SDimitry Andric an executable or a DLL to a different address than their image 2780b57cec5SDimitry Andric bases, it fixes up binaries using information contained in the base 2790b57cec5SDimitry Andric relocation table. A base relocation table consists of a list of 2800b57cec5SDimitry Andric locations containing addresses. The loader adds a difference between 2810b57cec5SDimitry Andric RVA and actual load address to all locations listed there. 2820b57cec5SDimitry Andric 2830b57cec5SDimitry Andric Note that this run-time relocation mechanism is much simpler than ELF. 2840b57cec5SDimitry Andric There's no PLT or GOT. Images are relocated as a whole just 2850b57cec5SDimitry Andric by shifting entire images in memory by some offsets. Although doing 2860b57cec5SDimitry Andric this breaks text sharing, I think this mechanism is not actually bad 2870b57cec5SDimitry Andric on today's computers. 2880b57cec5SDimitry Andric 2890b57cec5SDimitry Andric* ICF 2900b57cec5SDimitry Andric 2910b57cec5SDimitry Andric Short for Identical COMDAT Folding (COFF) or Identical Code Folding (ELF). 2920b57cec5SDimitry Andric 2930b57cec5SDimitry Andric ICF is an optimization to reduce output size by merging read-only sections 2940b57cec5SDimitry Andric by not only their names but by their contents. If two read-only sections 2950b57cec5SDimitry Andric happen to have the same metadata, actual contents and relocations, 2960b57cec5SDimitry Andric they are merged by ICF. It is known as an effective technique, 2970b57cec5SDimitry Andric and it usually reduces C++ program's size by a few percent or more. 2980b57cec5SDimitry Andric 2990b57cec5SDimitry Andric Note that this is not an entirely sound optimization. C/C++ require 3000b57cec5SDimitry Andric different functions have different addresses. If a program depends on 3010b57cec5SDimitry Andric that property, it would fail at runtime. 3020b57cec5SDimitry Andric 3030b57cec5SDimitry Andric On Windows, that's not really an issue because MSVC link.exe enabled 3040b57cec5SDimitry Andric the optimization by default. As long as your program works 3050b57cec5SDimitry Andric with the linker's default settings, your program should be safe with ICF. 3060b57cec5SDimitry Andric 3070b57cec5SDimitry Andric On Unix, your program is generally not guaranteed to be safe with ICF, 3080b57cec5SDimitry Andric although large programs happen to work correctly. 3090b57cec5SDimitry Andric LLD works fine with ICF for example. 3100b57cec5SDimitry Andric 3110b57cec5SDimitry AndricOther Info 3120b57cec5SDimitry Andric---------- 3130b57cec5SDimitry Andric 3140b57cec5SDimitry Andric.. toctree:: 3150b57cec5SDimitry Andric :maxdepth: 1 3160b57cec5SDimitry Andric 3170b57cec5SDimitry Andric missingkeyfunction 318