123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701 |
- \input texinfo
- @setfilename ldint.info
- @c Copyright (C) 1992-2015 Free Software Foundation, Inc.
- @ifnottex
- @dircategory Software development
- @direntry
- * Ld-Internals: (ldint). The GNU linker internals.
- @end direntry
- @end ifnottex
- @copying
- This file documents the internals of the GNU linker ld.
- Copyright @copyright{} 1992-2015 Free Software Foundation, Inc.
- Contributed by Cygnus Support.
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3 or
- any later version published by the Free Software Foundation; with the
- Invariant Sections being ``GNU General Public License'' and ``Funding
- Free Software'', the Front-Cover texts being (a) (see below), and with
- the Back-Cover Texts being (b) (see below). A copy of the license is
- included in the section entitled ``GNU Free Documentation License''.
- (a) The FSF's Front-Cover Text is:
- A GNU Manual
- (b) The FSF's Back-Cover Text is:
- You have freedom to copy and modify this GNU Manual, like GNU
- software. Copies published by the Free Software Foundation raise
- funds for GNU development.
- @end copying
- @iftex
- @finalout
- @setchapternewpage off
- @settitle GNU Linker Internals
- @titlepage
- @title{A guide to the internals of the GNU linker}
- @author Per Bothner, Steve Chamberlain, Ian Lance Taylor, DJ Delorie
- @author Cygnus Support
- @page
- @tex
- \def\$#1${{#1}} % Kluge: collect RCS revision info without $...$
- \xdef\manvers{2.10.91} % For use in headers, footers too
- {\parskip=0pt
- \hfill Cygnus Support\par
- \hfill \manvers\par
- \hfill \TeX{}info \texinfoversion\par
- }
- @end tex
- @vskip 0pt plus 1filll
- Copyright @copyright{} 1992-2015 Free Software Foundation, Inc.
- Permission is granted to copy, distribute and/or modify this document
- under the terms of the GNU Free Documentation License, Version 1.3
- or any later version published by the Free Software Foundation;
- with no Invariant Sections, with no Front-Cover Texts, and with no
- Back-Cover Texts. A copy of the license is included in the
- section entitled "GNU Free Documentation License".
- @end titlepage
- @end iftex
- @node Top
- @top
- This file documents the internals of the GNU linker @code{ld}. It is a
- collection of miscellaneous information with little form at this point.
- Mostly, it is a repository into which you can put information about
- GNU @code{ld} as you discover it (or as you design changes to @code{ld}).
- This document is distributed under the terms of the GNU Free
- Documentation License. A copy of the license is included in the
- section entitled "GNU Free Documentation License".
- @menu
- * README:: The README File
- * Emulations:: How linker emulations are generated
- * Emulation Walkthrough:: A Walkthrough of a Typical Emulation
- * Architecture Specific:: Some Architecture Specific Notes
- * GNU Free Documentation License:: GNU Free Documentation License
- @end menu
- @node README
- @chapter The @file{README} File
- Check the @file{README} file; it often has useful information that does not
- appear anywhere else in the directory.
- @node Emulations
- @chapter How linker emulations are generated
- Each linker target has an @dfn{emulation}. The emulation includes the
- default linker script, and certain emulations also modify certain types
- of linker behaviour.
- Emulations are created during the build process by the shell script
- @file{genscripts.sh}.
- The @file{genscripts.sh} script starts by reading a file in the
- @file{emulparams} directory. This is a shell script which sets various
- shell variables used by @file{genscripts.sh} and the other shell scripts
- it invokes.
- The @file{genscripts.sh} script will invoke a shell script in the
- @file{scripttempl} directory in order to create default linker scripts
- written in the linker command language. The @file{scripttempl} script
- will be invoked 5 (or, in some cases, 6) times, with different
- assignments to shell variables, to create different default scripts.
- The choice of script is made based on the command line options.
- After creating the scripts, @file{genscripts.sh} will invoke yet another
- shell script, this time in the @file{emultempl} directory. That shell
- script will create the emulation source file, which contains C code.
- This C code permits the linker emulation to override various linker
- behaviours. Most targets use the generic emulation code, which is in
- @file{emultempl/generic.em}.
- To summarize, @file{genscripts.sh} reads three shell scripts: an
- emulation parameters script in the @file{emulparams} directory, a linker
- script generation script in the @file{scripttempl} directory, and an
- emulation source file generation script in the @file{emultempl}
- directory.
- For example, the Sun 4 linker sets up variables in
- @file{emulparams/sun4.sh}, creates linker scripts using
- @file{scripttempl/aout.sc}, and creates the emulation code using
- @file{emultempl/sunos.em}.
- Note that the linker can support several emulations simultaneously,
- depending upon how it is configured. An emulation can be selected with
- the @code{-m} option. The @code{-V} option will list all supported
- emulations.
- @menu
- * emulation parameters:: @file{emulparams} scripts
- * linker scripts:: @file{scripttempl} scripts
- * linker emulations:: @file{emultempl} scripts
- @end menu
- @node emulation parameters
- @section @file{emulparams} scripts
- Each target selects a particular file in the @file{emulparams} directory
- by setting the shell variable @code{targ_emul} in @file{configure.tgt}.
- This shell variable is used by the @file{configure} script to control
- building an emulation source file.
- Certain conventions are enforced. Suppose the @code{targ_emul} variable
- is set to @var{emul} in @file{configure.tgt}. The name of the emulation
- shell script will be @file{emulparams/@var{emul}.sh}. The
- @file{Makefile} must have a target named @file{e@var{emul}.c}; this
- target must depend upon @file{emulparams/@var{emul}.sh}, as well as the
- appropriate scripts in the @file{scripttempl} and @file{emultempl}
- directories. The @file{Makefile} target must invoke @code{GENSCRIPTS}
- with two arguments: @var{emul}, and the value of the make variable
- @code{tdir_@var{emul}}. The value of the latter variable will be set by
- the @file{configure} script, and is used to set the default target
- directory to search.
- By convention, the @file{emulparams/@var{emul}.sh} shell script should
- only set shell variables. It may set shell variables which are to be
- interpreted by the @file{scripttempl} and the @file{emultempl} scripts.
- Certain shell variables are interpreted directly by the
- @file{genscripts.sh} script.
- Here is a list of shell variables interpreted by @file{genscripts.sh},
- as well as some conventional shell variables interpreted by the
- @file{scripttempl} and @file{emultempl} scripts.
- @table @code
- @item SCRIPT_NAME
- This is the name of the @file{scripttempl} script to use. If
- @code{SCRIPT_NAME} is set to @var{script}, @file{genscripts.sh} will use
- the script @file{scripttempl/@var{script}.sc}.
- @item TEMPLATE_NAME
- This is the name of the @file{emultempl} script to use. If
- @code{TEMPLATE_NAME} is set to @var{template}, @file{genscripts.sh} will
- use the script @file{emultempl/@var{template}.em}. If this variable is
- not set, the default value is @samp{generic}.
- @item GENERATE_SHLIB_SCRIPT
- If this is set to a nonempty string, @file{genscripts.sh} will invoke
- the @file{scripttempl} script an extra time to create a shared library
- script. @ref{linker scripts}.
- @item OUTPUT_FORMAT
- This is normally set to indicate the BFD output format use (e.g.,
- @samp{"a.out-sunos-big"}. The @file{scripttempl} script will normally
- use it in an @code{OUTPUT_FORMAT} expression in the linker script.
- @item ARCH
- This is normally set to indicate the architecture to use (e.g.,
- @samp{sparc}). The @file{scripttempl} script will normally use it in an
- @code{OUTPUT_ARCH} expression in the linker script.
- @item ENTRY
- Some @file{scripttempl} scripts use this to set the entry address, in an
- @code{ENTRY} expression in the linker script.
- @item TEXT_START_ADDR
- Some @file{scripttempl} scripts use this to set the start address of the
- @samp{.text} section.
- @item SEGMENT_SIZE
- The @file{genscripts.sh} script uses this to set the default value of
- @code{DATA_ALIGNMENT} when running the @file{scripttempl} script.
- @item TARGET_PAGE_SIZE
- If @code{SEGMENT_SIZE} is not defined, the @file{genscripts.sh} script
- uses this to define it.
- @item ALIGNMENT
- Some @file{scripttempl} scripts set this to a number to pass to
- @code{ALIGN} to set the required alignment for the @code{end} symbol.
- @end table
- @node linker scripts
- @section @file{scripttempl} scripts
- Each linker target uses a @file{scripttempl} script to generate the
- default linker scripts. The name of the @file{scripttempl} script is
- set by the @code{SCRIPT_NAME} variable in the @file{emulparams} script.
- If @code{SCRIPT_NAME} is set to @var{script}, @code{genscripts.sh} will
- invoke @file{scripttempl/@var{script}.sc}.
- The @file{genscripts.sh} script will invoke the @file{scripttempl}
- script 5 to 9 times. Each time it will set the shell variable
- @code{LD_FLAG} to a different value. When the linker is run, the
- options used will direct it to select a particular script. (Script
- selection is controlled by the @code{get_script} emulation entry point;
- this describes the conventional behaviour).
- The @file{scripttempl} script should just write a linker script, written
- in the linker command language, to standard output. If the emulation
- name--the name of the @file{emulparams} file without the @file{.sc}
- extension--is @var{emul}, then the output will be directed to
- @file{ldscripts/@var{emul}.@var{extension}} in the build directory,
- where @var{extension} changes each time the @file{scripttempl} script is
- invoked.
- Here is the list of values assigned to @code{LD_FLAG}.
- @table @code
- @item (empty)
- The script generated is used by default (when none of the following
- cases apply). The output has an extension of @file{.x}.
- @item n
- The script generated is used when the linker is invoked with the
- @code{-n} option. The output has an extension of @file{.xn}.
- @item N
- The script generated is used when the linker is invoked with the
- @code{-N} option. The output has an extension of @file{.xbn}.
- @item r
- The script generated is used when the linker is invoked with the
- @code{-r} option. The output has an extension of @file{.xr}.
- @item u
- The script generated is used when the linker is invoked with the
- @code{-Ur} option. The output has an extension of @file{.xu}.
- @item shared
- The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
- this value if @code{GENERATE_SHLIB_SCRIPT} is defined in the
- @file{emulparams} file. The @file{emultempl} script must arrange to use
- this script at the appropriate time, normally when the linker is invoked
- with the @code{-shared} option. The output has an extension of
- @file{.xs}.
- @item c
- The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
- this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
- @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf}. The
- @file{emultempl} script must arrange to use this script at the appropriate
- time, normally when the linker is invoked with the @code{-z combreloc}
- option. The output has an extension of
- @file{.xc}.
- @item cshared
- The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
- this value if @code{GENERATE_COMBRELOC_SCRIPT} is defined in the
- @file{emulparams} file or if @code{SCRIPT_NAME} is @code{elf} and
- @code{GENERATE_SHLIB_SCRIPT} is defined in the @file{emulparams} file.
- The @file{emultempl} script must arrange to use this script at the
- appropriate time, normally when the linker is invoked with the @code{-shared
- -z combreloc} option. The output has an extension of @file{.xsc}.
- @item auto_import
- The @file{scripttempl} script is only invoked with @code{LD_FLAG} set to
- this value if @code{GENERATE_AUTO_IMPORT_SCRIPT} is defined in the
- @file{emulparams} file. The @file{emultempl} script must arrange to
- use this script at the appropriate time, normally when the linker is
- invoked with the @code{--enable-auto-import} option. The output has
- an extension of @file{.xa}.
- @end table
- Besides the shell variables set by the @file{emulparams} script, and the
- @code{LD_FLAG} variable, the @file{genscripts.sh} script will set
- certain variables for each run of the @file{scripttempl} script.
- @table @code
- @item RELOCATING
- This will be set to a non-empty string when the linker is doing a final
- relocation (e.g., all scripts other than @code{-r} and @code{-Ur}).
- @item CONSTRUCTING
- This will be set to a non-empty string when the linker is building
- global constructor and destructor tables (e.g., all scripts other than
- @code{-r}).
- @item DATA_ALIGNMENT
- This will be set to an @code{ALIGN} expression when the output should be
- page aligned, or to @samp{.} when generating the @code{-N} script.
- @item CREATE_SHLIB
- This will be set to a non-empty string when generating a @code{-shared}
- script.
- @item COMBRELOC
- This will be set to a non-empty string when generating @code{-z combreloc}
- scripts to a temporary file name which can be used during script generation.
- @end table
- The conventional way to write a @file{scripttempl} script is to first
- set a few shell variables, and then write out a linker script using
- @code{cat} with a here document. The linker script will use variable
- substitutions, based on the above variables and those set in the
- @file{emulparams} script, to control its behaviour.
- When there are parts of the @file{scripttempl} script which should only
- be run when doing a final relocation, they should be enclosed within a
- variable substitution based on @code{RELOCATING}. For example, on many
- targets special symbols such as @code{_end} should be defined when doing
- a final link. Naturally, those symbols should not be defined when doing
- a relocatable link using @code{-r}. The @file{scripttempl} script
- could use a construct like this to define those symbols:
- @smallexample
- $@{RELOCATING+ _end = .;@}
- @end smallexample
- This will do the symbol assignment only if the @code{RELOCATING}
- variable is defined.
- The basic job of the linker script is to put the sections in the correct
- order, and at the correct memory addresses. For some targets, the
- linker script may have to do some other operations.
- For example, on most MIPS platforms, the linker is responsible for
- defining the special symbol @code{_gp}, used to initialize the
- @code{$gp} register. It must be set to the start of the small data
- section plus @code{0x8000}. Naturally, it should only be defined when
- doing a final relocation. This will typically be done like this:
- @smallexample
- $@{RELOCATING+ _gp = ALIGN(16) + 0x8000;@}
- @end smallexample
- This line would appear just before the sections which compose the small
- data section (@samp{.sdata}, @samp{.sbss}). All those sections would be
- contiguous in memory.
- Many COFF systems build constructor tables in the linker script. The
- compiler will arrange to output the address of each global constructor
- in a @samp{.ctor} section, and the address of each global destructor in
- a @samp{.dtor} section (this is done by defining
- @code{ASM_OUTPUT_CONSTRUCTOR} and @code{ASM_OUTPUT_DESTRUCTOR} in the
- @code{gcc} configuration files). The @code{gcc} runtime support
- routines expect the constructor table to be named @code{__CTOR_LIST__}.
- They expect it to be a list of words, with the first word being the
- count of the number of entries. There should be a trailing zero word.
- (Actually, the count may be -1 if the trailing word is present, and the
- trailing word may be omitted if the count is correct, but, as the
- @code{gcc} behaviour has changed slightly over the years, it is safest
- to provide both). Here is a typical way that might be handled in a
- @file{scripttempl} file.
- @smallexample
- $@{CONSTRUCTING+ __CTOR_LIST__ = .;@}
- $@{CONSTRUCTING+ LONG((__CTOR_END__ - __CTOR_LIST__) / 4 - 2)@}
- $@{CONSTRUCTING+ *(.ctors)@}
- $@{CONSTRUCTING+ LONG(0)@}
- $@{CONSTRUCTING+ __CTOR_END__ = .;@}
- $@{CONSTRUCTING+ __DTOR_LIST__ = .;@}
- $@{CONSTRUCTING+ LONG((__DTOR_END__ - __DTOR_LIST__) / 4 - 2)@}
- $@{CONSTRUCTING+ *(.dtors)@}
- $@{CONSTRUCTING+ LONG(0)@}
- $@{CONSTRUCTING+ __DTOR_END__ = .;@}
- @end smallexample
- The use of @code{CONSTRUCTING} ensures that these linker script commands
- will only appear when the linker is supposed to be building the
- constructor and destructor tables. This example is written for a target
- which uses 4 byte pointers.
- Embedded systems often need to set a stack address. This is normally
- best done by using the @code{PROVIDE} construct with a default stack
- address. This permits the user to easily override the stack address
- using the @code{--defsym} option. Here is an example:
- @smallexample
- $@{RELOCATING+ PROVIDE (__stack = 0x80000000);@}
- @end smallexample
- The value of the symbol @code{__stack} would then be used in the startup
- code to initialize the stack pointer.
- @node linker emulations
- @section @file{emultempl} scripts
- Each linker target uses an @file{emultempl} script to generate the
- emulation code. The name of the @file{emultempl} script is set by the
- @code{TEMPLATE_NAME} variable in the @file{emulparams} script. If the
- @code{TEMPLATE_NAME} variable is not set, the default is
- @samp{generic}. If the value of @code{TEMPLATE_NAME} is @var{template},
- @file{genscripts.sh} will use @file{emultempl/@var{template}.em}.
- Most targets use the generic @file{emultempl} script,
- @file{emultempl/generic.em}. A different @file{emultempl} script is
- only needed if the linker must support unusual actions, such as linking
- against shared libraries.
- The @file{emultempl} script is normally written as a simple invocation
- of @code{cat} with a here document. The document will use a few
- variable substitutions. Typically each function names uses a
- substitution involving @code{EMULATION_NAME}, for ease of debugging when
- the linker supports multiple emulations.
- Every function and variable in the emitted file should be static. The
- only globally visible object must be named
- @code{ld_@var{EMULATION_NAME}_emulation}, where @var{EMULATION_NAME} is
- the name of the emulation set in @file{configure.tgt} (this is also the
- name of the @file{emulparams} file without the @file{.sh} extension).
- The @file{genscripts.sh} script will set the shell variable
- @code{EMULATION_NAME} before invoking the @file{emultempl} script.
- The @code{ld_@var{EMULATION_NAME}_emulation} variable must be a
- @code{struct ld_emulation_xfer_struct}, as defined in @file{ldemul.h}.
- It defines a set of function pointers which are invoked by the linker,
- as well as strings for the emulation name (normally set from the shell
- variable @code{EMULATION_NAME} and the default BFD target name (normally
- set from the shell variable @code{OUTPUT_FORMAT} which is normally set
- by the @file{emulparams} file).
- The @file{genscripts.sh} script will set the shell variable
- @code{COMPILE_IN} when it invokes the @file{emultempl} script for the
- default emulation. In this case, the @file{emultempl} script should
- include the linker scripts directly, and return them from the
- @code{get_scripts} entry point. When the emulation is not the default,
- the @code{get_scripts} entry point should just return a file name. See
- @file{emultempl/generic.em} for an example of how this is done.
- At some point, the linker emulation entry points should be documented.
- @node Emulation Walkthrough
- @chapter A Walkthrough of a Typical Emulation
- This chapter is to help people who are new to the way emulations
- interact with the linker, or who are suddenly thrust into the position
- of having to work with existing emulations. It will discuss the files
- you need to be aware of. It will tell you when the given "hooks" in
- the emulation will be called. It will, hopefully, give you enough
- information about when and how things happen that you'll be able to
- get by. As always, the source is the definitive reference to this.
- The starting point for the linker is in @file{ldmain.c} where
- @code{main} is defined. The bulk of the code that's emulation
- specific will initially be in @code{emultempl/@var{emulation}.em} but
- will end up in @code{e@var{emulation}.c} when the build is done.
- Most of the work to select and interface with emulations is in
- @code{ldemul.h} and @code{ldemul.c}. Specifically, @code{ldemul.h}
- defines the @code{ld_emulation_xfer_struct} structure your emulation
- exports.
- Your emulation file exports a symbol
- @code{ld_@var{EMULATION_NAME}_emulation}. If your emulation is
- selected (it usually is, since usually there's only one),
- @code{ldemul.c} sets the variable @var{ld_emulation} to point to it.
- @code{ldemul.c} also defines a number of API functions that interface
- to your emulation, like @code{ldemul_after_parse} which simply calls
- your @code{ld_@var{EMULATION}_emulation.after_parse} function. For
- the rest of this section, the functions will be mentioned, but you
- should assume the indirect reference to your emulation also.
- We will also skip or gloss over parts of the link process that don't
- relate to emulations, like setting up internationalization.
- After initialization, @code{main} selects an emulation by pre-scanning
- the command line arguments. It calls @code{ldemul_choose_target} to
- choose a target. If you set @code{choose_target} to
- @code{ldemul_default_target}, it picks your @code{target_name} by
- default.
- @code{main} calls @code{ldemul_before_parse}, then @code{parse_args}.
- @code{parse_args} calls @code{ldemul_parse_args} for each arg, which
- must update the @code{getopt} globals if it recognizes the argument.
- If the emulation doesn't recognize it, then parse_args checks to see
- if it recognizes it.
- Now that the emulation has had access to all its command-line options,
- @code{main} calls @code{ldemul_set_symbols}. This can be used for any
- initialization that may be affected by options. It is also supposed
- to set up any variables needed by the emulation script.
- @code{main} now calls @code{ldemul_get_script} to get the emulation
- script to use (based on arguments, no doubt, @pxref{Emulations}) and
- runs it. While parsing, @code{ldgram.y} may call @code{ldemul_hll} or
- @code{ldemul_syslib} to handle the @code{HLL} or @code{SYSLIB}
- commands. It may call @code{ldemul_unrecognized_file} if you asked
- the linker to link a file it doesn't recognize. It will call
- @code{ldemul_recognized_file} for each file it does recognize, in case
- the emulation wants to handle some files specially. All the while,
- it's loading the files (possibly calling
- @code{ldemul_open_dynamic_archive}) and symbols and stuff. After it's
- done reading the script, @code{main} calls @code{ldemul_after_parse}.
- Use the after-parse hook to set up anything that depends on stuff the
- script might have set up, like the entry point.
- @code{main} next calls @code{lang_process} in @code{ldlang.c}. This
- appears to be the main core of the linking itself, as far as emulation
- hooks are concerned(*). It first opens the output file's BFD, calling
- @code{ldemul_set_output_arch}, and calls
- @code{ldemul_create_output_section_statements} in case you need to use
- other means to find or create object files (i.e. shared libraries
- found on a path, or fake stub objects). Despite the name, nobody
- creates output sections here.
- (*) In most cases, the BFD library does the bulk of the actual
- linking, handling symbol tables, symbol resolution, relocations, and
- building the final output file. See the BFD reference for all the
- details. Your emulation is usually concerned more with managing
- things at the file and section level, like "put this here, add this
- section", etc.
- Next, the objects to be linked are opened and BFDs created for them,
- and @code{ldemul_after_open} is called. At this point, you have all
- the objects and symbols loaded, but none of the data has been placed
- yet.
- Next comes the Big Linking Thingy (except for the parts BFD does).
- All input sections are mapped to output sections according to the
- script. If a section doesn't get mapped by default,
- @code{ldemul_place_orphan} will get called to figure out where it goes.
- Next it figures out the offsets for each section, calling
- @code{ldemul_before_allocation} before and
- @code{ldemul_after_allocation} after deciding where each input section
- ends up in the output sections.
- The last part of @code{lang_process} is to figure out all the symbols'
- values. After assigning final values to the symbols,
- @code{ldemul_finish} is called, and after that, any undefined symbols
- are turned into fatal errors.
- OK, back to @code{main}, which calls @code{ldwrite} in
- @file{ldwrite.c}. @code{ldwrite} calls BFD's final_link, which does
- all the relocation fixups and writes the output bfd to disk, and we're
- done.
- In summary,
- @itemize @bullet
- @item @code{main()} in @file{ldmain.c}
- @item @file{emultempl/@var{EMULATION}.em} has your code
- @item @code{ldemul_choose_target} (defaults to your @code{target_name})
- @item @code{ldemul_before_parse}
- @item Parse argv, calls @code{ldemul_parse_args} for each
- @item @code{ldemul_set_symbols}
- @item @code{ldemul_get_script}
- @item parse script
- @itemize @bullet
- @item may call @code{ldemul_hll} or @code{ldemul_syslib}
- @item may call @code{ldemul_open_dynamic_archive}
- @end itemize
- @item @code{ldemul_after_parse}
- @item @code{lang_process()} in @file{ldlang.c}
- @itemize @bullet
- @item create @code{output_bfd}
- @item @code{ldemul_set_output_arch}
- @item @code{ldemul_create_output_section_statements}
- @item read objects, create input bfds - all symbols exist, but have no values
- @item may call @code{ldemul_unrecognized_file}
- @item will call @code{ldemul_recognized_file}
- @item @code{ldemul_after_open}
- @item map input sections to output sections
- @item may call @code{ldemul_place_orphan} for remaining sections
- @item @code{ldemul_before_allocation}
- @item gives input sections offsets into output sections, places output sections
- @item @code{ldemul_after_allocation} - section addresses valid
- @item assigns values to symbols
- @item @code{ldemul_finish} - symbol values valid
- @end itemize
- @item output bfd is written to disk
- @end itemize
- @node Architecture Specific
- @chapter Some Architecture Specific Notes
- This is the place for notes on the behavior of @code{ld} on
- specific platforms. Currently, only Intel x86 is documented (and
- of that, only the auto-import behavior for DLLs).
- @menu
- * ix86:: Intel x86
- @end menu
- @node ix86
- @section Intel x86
- @table @emph
- @code{ld} can create DLLs that operate with various runtimes available
- on a common x86 operating system. These runtimes include native (using
- the mingw "platform"), cygwin, and pw.
- @item auto-import from DLLs
- @enumerate
- @item
- With this feature on, DLL clients can import variables from DLL
- without any concern from their side (for example, without any source
- code modifications). Auto-import can be enabled using the
- @code{--enable-auto-import} flag, or disabled via the
- @code{--disable-auto-import} flag. Auto-import is disabled by default.
- @item
- This is done completely in bounds of the PE specification (to be fair,
- there's a minor violation of the spec at one point, but in practice
- auto-import works on all known variants of that common x86 operating
- system) So, the resulting DLL can be used with any other PE
- compiler/linker.
- @item
- Auto-import is fully compatible with standard import method, in which
- variables are decorated using attribute modifiers. Libraries of either
- type may be mixed together.
- @item
- Overhead (space): 8 bytes per imported symbol, plus 20 for each
- reference to it; Overhead (load time): negligible; Overhead
- (virtual/physical memory): should be less than effect of DLL
- relocation.
- @end enumerate
- Motivation
- The obvious and only way to get rid of dllimport insanity is
- to make client access variable directly in the DLL, bypassing
- the extra dereference imposed by ordinary DLL runtime linking.
- I.e., whenever client contains something like
- @code{mov dll_var,%eax,}
- address of dll_var in the command should be relocated to point
- into loaded DLL. The aim is to make OS loader do so, and than
- make ld help with that. Import section of PE made following
- way: there's a vector of structures each describing imports
- from particular DLL. Each such structure points to two other
- parallel vectors: one holding imported names, and one which
- will hold address of corresponding imported name. So, the
- solution is de-vectorize these structures, making import
- locations be sparse and pointing directly into code.
- Implementation
- For each reference of data symbol to be imported from DLL (to
- set of which belong symbols with name <sym>, if __imp_<sym> is
- found in implib), the import fixup entry is generated. That
- entry is of type IMAGE_IMPORT_DESCRIPTOR and stored in .idata$3
- subsection. Each fixup entry contains pointer to symbol's address
- within .text section (marked with __fuN_<sym> symbol, where N is
- integer), pointer to DLL name (so, DLL name is referenced by
- multiple entries), and pointer to symbol name thunk. Symbol name
- thunk is singleton vector (__nm_th_<symbol>) pointing to
- IMAGE_IMPORT_BY_NAME structure (__nm_<symbol>) directly containing
- imported name. Here comes that "om the edge" problem mentioned above:
- PE specification rambles that name vector (OriginalFirstThunk) should
- run in parallel with addresses vector (FirstThunk), i.e. that they
- should have same number of elements and terminated with zero. We violate
- this, since FirstThunk points directly into machine code. But in
- practice, OS loader implemented the sane way: it goes thru
- OriginalFirstThunk and puts addresses to FirstThunk, not something
- else. It once again should be noted that dll and symbol name
- structures are reused across fixup entries and should be there
- anyway to support standard import stuff, so sustained overhead is
- 20 bytes per reference. Other question is whether having several
- IMAGE_IMPORT_DESCRIPTORS for the same DLL is possible. Answer is yes,
- it is done even by native compiler/linker (libth32's functions are in
- fact resident in windows9x kernel32.dll, so if you use it, you have
- two IMAGE_IMPORT_DESCRIPTORS for kernel32.dll). Yet other question is
- whether referencing the same PE structures several times is valid.
- The answer is why not, prohibiting that (detecting violation) would
- require more work on behalf of loader than not doing it.
- @end table
- @node GNU Free Documentation License
- @chapter GNU Free Documentation License
- @include fdl.texi
- @contents
- @bye
|