- Format of Inform 6 Debugging Information Files
- Version 1.0
- 0: Introduction
- This is a specification of the Version 1 format for the debugging information
- files emitted by the Inform 6 compiler. It replaces Version 0, which is
- documented in Section 12.5 of the Inform Technical Manual.
- 1: Overview
- Debugging information files are written in XML and encoded in UTF-8. They
- therefore begin with the following declaration:
- <?xml version="1.0" encoding="UTF-8"?>
- Beyond the usual requirements for well-formed XML, the file adheres to the
- conventions that all numbers are written in decimal, all strings are
- case-sensitive, and all excerpts from binary files are Base64-encoded.
- 2: The Top Level
- The root element is the tag <inform-story-file>, which carries three attributes:
- the version of the debug file format being used, the name of the program that
- produced the file, and that program's version. For instance,
- <inform-story-file version="1.0" content-creator="Inform"
- content-creator-version="6.33">
- ...
- </inform-story-file>
- The elements from Sections 3--8 may appear in place of the ellipsis.
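As an illustration of the top level, the following Python sketch reads the three root attributes with the standard library. The function name read_root_attributes is hypothetical and not part of this specification; the sample text repeats the example above with the ellipsis left empty.

```python
import xml.etree.ElementTree as ET

# Sample document: the root element from the example above, with no children.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<inform-story-file version="1.0" content-creator="Inform"
                   content-creator-version="6.33">
</inform-story-file>
"""

def read_root_attributes(xml_text):
    """Return (format version, creator, creator version) from the root element."""
    # Parse from bytes so that the encoding declaration is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    if root.tag != "inform-story-file":
        raise ValueError("unexpected root element: " + root.tag)
    return (root.get("version"),
            root.get("content-creator"),
            root.get("content-creator-version"))

print(read_root_attributes(SAMPLE))
```

A tool would normally compare the first value against the format versions it understands before reading any further.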
- 3: Story File Prefix
- The story file prefix contains a Base64 encoding of the story file's first bytes
- so that a debugging tool can easily check whether the story and the debug
- information file are mismatched. For example, the prefix for a Glulx story
- might appear as
- <story-file-prefix>
- R2x1bAADAQEACqEAAAwsAAAMLAAAAQAAAAAAPAAIo2Jc
- 6B2XSW5mbwABAAA2LjMyMC4zOAABMTIxMDE1wQAAMA==
- </story-file-prefix>
- The story file prefix is mandatory, but its length is unspecified. Version 6.33
- of the Inform compiler records 64 bytes, which seems sufficient.
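The mismatch check described above might be sketched as follows. This is only an illustration: the helper names decode_prefix and prefix_matches are hypothetical, and the sketch assumes that whitespace inside the element is layout only and can be discarded before decoding.

```python
import base64

# Prefix text copied from the example above.
PREFIX_TEXT = """
R2x1bAADAQEACqEAAAwsAAAMLAAAAQAAAAAAPAAIo2Jc
6B2XSW5mbwABAAA2LjMyMC4zOAABMTIxMDE1wQAAMA==
"""

def decode_prefix(prefix_text):
    """Decode the Base64 content of a <story-file-prefix> element."""
    # Assumption: line breaks and indentation are not part of the encoding.
    return base64.b64decode("".join(prefix_text.split()))

def prefix_matches(prefix_text, story_bytes):
    """Return True if the story file begins with the recorded prefix."""
    prefix = decode_prefix(prefix_text)
    # The prefix length is unspecified, so compare exactly as many bytes
    # as the debug file recorded.
    return story_bytes[:len(prefix)] == prefix

# A story that starts with the recorded bytes matches; a truncated one does not.
story = decode_prefix(PREFIX_TEXT) + b"\x00" * 16
print(prefix_matches(PREFIX_TEXT, story))
print(prefix_matches(PREFIX_TEXT, b"Glul"))
```

Because the comparison length comes from the debug file rather than a fixed constant, the same check works for compilers that record more or fewer bytes.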
- 4: Story File Sections
- Story file sections partition the story file according to how the data will be
- used. For the Inform 6 compiler, this partitioning is the same as the one that
- the `z' flag prints.
- A record for a story file section gives a name for that section, its beginning
- address (inclusive), and its end address (exclusive):
- <story-file-section>
- <type>abbreviations table</type>
- <address>64</address>
- <end-address>128</end-address>
- </story-file-section>
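Since the beginning address is inclusive and the end address exclusive, a section lookup reduces to a half-open interval test. The sketch below is illustrative only: section_containing is a hypothetical helper, and the "header" record with addresses 0 to 64 is invented sample data.

```python
import xml.etree.ElementTree as ET

# Sample data: the record from the example above plus an invented header record.
SAMPLE = """<inform-story-file version="1.0" content-creator="Inform"
                   content-creator-version="6.33">
  <story-file-section>
    <type>header</type>
    <address>0</address>
    <end-address>64</end-address>
  </story-file-section>
  <story-file-section>
    <type>abbreviations table</type>
    <address>64</address>
    <end-address>128</end-address>
  </story-file-section>
</inform-story-file>"""

ROOT = ET.fromstring(SAMPLE)

def section_containing(root, address):
    """Return the type of the first section that contains address, or None."""
    for section in root.iter("story-file-section"):
        start = int(section.findtext("address"))
        end = int(section.findtext("end-address"))
        # Half-open interval: start is inside the section, end is not.
        if start <= address < end:
            return section.findtext("type")
    return None

print(section_containing(ROOT, 64))
```

Address 64 belongs to the abbreviations table rather than the header, and address 128 belongs to neither.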
- The names currently in use include those from Section 12.5 of the Inform
- Technical Manual:
- abbreviations table
- header extension (Z-code only)
- alphabets table (Z-code only)
- Unicode table (Z-code only)
- property defaults
- object tree
- common properties
- class numbers
- individual properties (Z-code only)
- global variables
- array space
- grammar table
- actions table
- parsing routines (Z-code only)
- adjectives table (Z-code only)
- dictionary
- code area
- strings area
- plus one addition for Z-code:
- abbreviations
- two additions for Glulx:
- memory layout id
- string decoding table
- and three additions for both targets:
- header
- identifier names
- zero padding
- Names may repeat; Glulx story files, for example, sometimes have two zero
- padding sections.
- A compiler that does not wish to subdivide the story file should emit one
- section for the entirety and give it the name
- story
- 5: Source Files
- Source files are encoded as in the example below. Each file has a unique index,
- which is used by other elements when referring to source code locations; these
- indices count from zero. The file's path is recorded in two forms, first as it
- was given to the compiler via a command-line argument or include directive but
- without any path abbreviations like `>' (the form suitable for presentation to a
- human) and second after resolution to an absolute path (the form suitable for
- loading the file contents). All paths are written according to the conventions
- of the host OS. The language is, at present, either "Inform 6" or "Inform 7".
- More languages may be added in the future.
- <source index="0">
- <given-path>example.inf</given-path>
- <resolved-path>/home/user/directory/example.inf</resolved-path>
- <language>Inform 6</language>
- </source>
- If the source file is known to appear in the story's Blorb, its chunk number
- will also be recorded:
- <source index="0">
- <given-path>example.inf</given-path>
- <resolved-path>/home/user/directory/example.inf</resolved-path>
- <language>Inform 6</language>
- <blorb-chunk-number>18</blorb-chunk-number>
- </source>
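Other elements refer to source files by index, so a reader will usually build a lookup table first. The following is a minimal sketch under assumptions: load_sources is a hypothetical name, and the second file, extra.inf, is invented to show a record without a Blorb chunk number.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<inform-story-file version="1.0" content-creator="Inform"
                   content-creator-version="6.33">
  <source index="0">
    <given-path>example.inf</given-path>
    <resolved-path>/home/user/directory/example.inf</resolved-path>
    <language>Inform 6</language>
    <blorb-chunk-number>18</blorb-chunk-number>
  </source>
  <source index="1">
    <given-path>extra.inf</given-path>
    <resolved-path>/home/user/directory/extra.inf</resolved-path>
    <language>Inform 6</language>
  </source>
</inform-story-file>"""

def load_sources(root):
    """Map each source file index to the fields recorded for that file."""
    sources = {}
    for source in root.iter("source"):
        chunk = source.findtext("blorb-chunk-number")
        sources[int(source.get("index"))] = {
            "given_path": source.findtext("given-path"),        # for display
            "resolved_path": source.findtext("resolved-path"),  # for loading
            "language": source.findtext("language"),
            # The chunk number is present only when the file is in the Blorb.
            "blorb_chunk": int(chunk) if chunk is not None else None,
        }
    return sources

SOURCES = load_sources(ET.fromstring(SAMPLE))
print(SOURCES[0]["given_path"], SOURCES[0]["blorb_chunk"])
```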
- 6: Table Entries; Grammar Lines
- Table entries are data defined by particular parts of the source code, but
- without any corresponding identifiers. The <table-entry> element notes the
- entry's type, the address where it begins (inclusive), the address where it ends
- (exclusive), and the defining source code location(s), if any:
- <table-entry>
- <type>grammar line</type>
- <address>1004</address>
- <end-address>1030</end-address>
- <source-code-location>...</source-code-location>
- </table-entry>
- Version 6.33 of the Inform compiler only emits <table-entry> tags for grammar
- lines; these data are all located in the grammar table section.
- 7: Named Values; Constants, Attributes, Properties, Actions, Fake Actions,
- Objects, Classes, Arrays, and Routines
- Records for named values store their identifier, their value, and the source
- code location(s) of their definition, if any. For instance,
- <constant>
- <identifier>MAX_SCORE</identifier>
- <value>40</value>
- <source-code-location>...</source-code-location>
- </constant>
- would represent a named constant. Attributes, properties, actions, fake
- actions, objects, arrays, and routines are also names for numbers, and differ
- only in their use; they are represented in the same format under the tags
- <attribute>, <property>, <action>, <fake-action>, <object>, <array>, and
- <routine>. (Moreover, unlike Version 0 of the debug information format, fake
- actions are not recorded as both fake actions and actions.)
- The records for constants include some extra entries for the system constants
- tabulated in Section 12.2 of the Inform Technical Manual, even though these are
- not created by Constant directives. Entries for #undefed constants are also
- included, but necessarily without values.
- Some records for objects will represent class objects. In that case, they will
- be given with the tag <class> rather than <object> and include an additional
- child to indicate their class number:
- <class>
- <identifier>lamp</identifier>
- <class-number>5</class-number>
- <value>1560</value>
- <source-code-location>...</source-code-location>
- </class>
- Records for arrays also have extra children, which record their size, their
- element size, and the intended semantics for their zeroth element:
- <array>
- <identifier>route</identifier>
- <value>1500</value>
- <byte-count>20</byte-count>
- <bytes-per-element>4</bytes-per-element>
- <zeroth-element-holds-length>true</zeroth-element-holds-length>
- <source-code-location>...</source-code-location>
- </array>
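The extra children are enough to work out how many elements an array has. The sketch below is one possible reading, not a definitive one: array_layout is a hypothetical helper, and it assumes that when the zeroth element holds the length, that element should not be counted among the data entries.

```python
import xml.etree.ElementTree as ET

# The array record from the example above, without its source code location.
SAMPLE = """<array>
  <identifier>route</identifier>
  <value>1500</value>
  <byte-count>20</byte-count>
  <bytes-per-element>4</bytes-per-element>
  <zeroth-element-holds-length>true</zeroth-element-holds-length>
</array>"""

ARRAY = ET.fromstring(SAMPLE)

def array_layout(array_element):
    """Summarise the element layout of an <array> record."""
    byte_count = int(array_element.findtext("byte-count"))
    bytes_per_element = int(array_element.findtext("bytes-per-element"))
    length_slot = array_element.findtext("zeroth-element-holds-length") == "true"
    elements = byte_count // bytes_per_element
    # Assumption: a length slot occupies element zero and is not a data entry.
    data_entries = elements - 1 if length_slot else elements
    return {"elements": elements,
            "length_slot": length_slot,
            "data_entries": data_entries}

print(array_layout(ARRAY))
```

For the example record, 20 bytes at 4 bytes per element gives five elements, one of which is the length slot.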
- And finally, <routine> records contain an <address> and a <byte-count> element,
- along with any number of the <local-variable> and <sequence-point> elements,
- which are described in Sections 9 and 10. The address is provided because the
- identifier's value may be packed.
- Sometimes what would otherwise be a named value is in fact anonymous; unnamed
- objects, embedded routines, some replaced routines, veneer properties, and the
- Infix attribute are all examples. In such a case, the <identifier> subelement
- will carry the XML attribute
- artificial
- to indicate that the compiler is providing a sensible name of its own, which
- could be presented to a human, but is not actually an identifier. For instance:
- <routine>
- <identifier artificial="true">lantern.time_left</identifier>
- <value>1820</value>
- <byte-count>80</byte-count>
- <source-code-location>...</source-code-location>
- ...
- </routine>
- Artificial identifiers may contain characters, like the full stop in
- ``lantern.time_left'', that would not be legal in the source language.
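A reader of named values therefore has to allow for two irregularities: the artificial attribute on <identifier>, and the missing <value> of an #undefed constant. The following sketch handles both; named_values is a hypothetical helper, and the sample combines records from the examples above with their source code locations omitted.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<inform-story-file version="1.0" content-creator="Inform"
                   content-creator-version="6.33">
  <constant>
    <identifier>MAX_SCORE</identifier>
    <value>40</value>
  </constant>
  <routine>
    <identifier artificial="true">lantern.time_left</identifier>
    <value>1820</value>
    <byte-count>80</byte-count>
  </routine>
</inform-story-file>"""

ROOT = ET.fromstring(SAMPLE)

def named_values(root, tag):
    """Return (name, value, is_artificial) for every record with the given tag."""
    results = []
    for record in root.iter(tag):
        identifier = record.find("identifier")
        value = record.findtext("value")
        results.append((
            identifier.text,
            # Entries for #undefed constants carry no value.
            int(value) if value is not None else None,
            # An artificial name is for display only; it is not an identifier.
            identifier.get("artificial") == "true",
        ))
    return results

print(named_values(ROOT, "constant"))
print(named_values(ROOT, "routine"))
```

A debugger could use the third field to decide whether a name may be offered back to the user as something to type in an expression.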
- 8: Global Variables
- Globals are similar to named values, except that they are not interpreted as a
- fixed value, but rather have an address where their value can be found. Their
- records therefore contain an <address> tag in place of the <value> tag, as in:
- <global-variable>
- <identifier>darkness_witnessed</identifier>
- <address>1520</address>
- <source-code-location>...</source-code-location>
- </global-variable>
- 9: Local Variables
- The format for local variables mimics the format for global variables, except
- that a source code location is never included, and their memory locations are
- not given by address. For Z-code, locals are specified by index:
- <local-variable>
- <identifier>parameter</identifier>
- <index>1</index>
- </local-variable>
- whereas for Glulx they are specified by frame offset:
- <local-variable>
- <identifier>parameter</identifier>
- <frame-offset>4</frame-offset>
- </local-variable>
- If a local variable identifier is only in scope for part of a routine, its
- scope will be encoded as a beginning instruction address (inclusive) and an
- ending instruction address (exclusive):
- <local-variable>
- <identifier>rulebook</identifier>
- <index>0</index>
- <scope-address>1628</scope-address>
- <end-scope-address>1678</end-scope-address>
- </local-variable>
- Identifiers with noncontiguous scopes are recorded as one <local-variable>
- element per contiguous region. It is possible for the same identifier to map to
- different variables, so long as the corresponding scopes are disjoint.
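The scoping rules above can be sketched as a lookup from an instruction address to the locals visible there. This is an illustration under assumptions: locals_in_scope is a hypothetical helper, the local named "other" is invented sample data, and the sketch assumes that <scope-address> and <end-scope-address> always appear together.

```python
import xml.etree.ElementTree as ET

# A routine fragment holding only local variables. "other" is invented to
# show two scoped identifiers sharing index 0 over disjoint ranges.
SAMPLE = """<routine>
  <local-variable>
    <identifier>parameter</identifier>
    <index>1</index>
  </local-variable>
  <local-variable>
    <identifier>rulebook</identifier>
    <index>0</index>
    <scope-address>1628</scope-address>
    <end-scope-address>1678</end-scope-address>
  </local-variable>
  <local-variable>
    <identifier>other</identifier>
    <index>0</index>
    <scope-address>1678</scope-address>
    <end-scope-address>1700</end-scope-address>
  </local-variable>
</routine>"""

ROUTINE = ET.fromstring(SAMPLE)

def locals_in_scope(routine, pc):
    """Map each identifier visible at address pc to (kind, slot)."""
    visible = {}
    for local in routine.iter("local-variable"):
        start = local.findtext("scope-address")
        if start is not None:
            end = int(local.findtext("end-scope-address"))
            # Scope is a half-open range of instruction addresses.
            if not (int(start) <= pc < end):
                continue
        index = local.findtext("index")
        if index is not None:
            storage = ("index", int(index))           # Z-code
        else:
            storage = ("frame-offset", int(local.findtext("frame-offset")))  # Glulx
        visible[local.findtext("identifier")] = storage
    return visible

print(locals_in_scope(ROUTINE, 1628))
print(locals_in_scope(ROUTINE, 1678))
```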
- 10: Sequence Points
- Sequence points are stored as an instruction address and the corresponding
- single location in the source code:
- <sequence-point>
- <address>1628</address>
- <source-code-location>...</source-code-location>
- </sequence-point>
- The source code location will always be exactly one position with overlapping
- endpoints.
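A common use of sequence points is to translate an instruction address into a source position. The sketch below picks the last sequence point at or before the address, which is an assumption about how a tool might use the data rather than anything this format requires. The helper names are hypothetical, and the second sequence point is invented sample data.

```python
import bisect
import xml.etree.ElementTree as ET

SAMPLE = """<routine>
  <sequence-point>
    <address>1628</address>
    <source-code-location>
      <file-index>0</file-index>
      <line>1024</line>
      <character>4</character>
      <file-position>44153</file-position>
    </source-code-location>
  </sequence-point>
  <sequence-point>
    <address>1640</address>
    <source-code-location>
      <file-index>0</file-index>
      <line>1025</line>
      <character>4</character>
      <file-position>44190</file-position>
    </source-code-location>
  </sequence-point>
</routine>"""

def load_sequence_points(routine):
    """Return a sorted list of (address, file index, line, character)."""
    points = []
    for point in routine.iter("sequence-point"):
        location = point.find("source-code-location")
        points.append((int(point.findtext("address")),
                       int(location.findtext("file-index")),
                       int(location.findtext("line")),
                       int(location.findtext("character"))))
    points.sort()
    return points

def sequence_point_at_or_before(points, pc):
    """Return the last sequence point whose address is <= pc, or None."""
    addresses = [point[0] for point in points]
    position = bisect.bisect_right(addresses, pc) - 1
    return points[position] if position >= 0 else None

POINTS = load_sequence_points(ET.fromstring(SAMPLE))
print(sequence_point_at_or_before(POINTS, 1635))
```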
- Sequence points are defined as in Section 12.4 of the Inform Technical Manual,
- but with the further stipulation that labels do not influence their source code
- locations, as they did in Version 0 of the debug information format. For
- instance, in code like
- say__p = 1; ParaContent(); .L_Say59; .LSayX59;
- t_0 = 0;
- the sequence points are to be placed like this:
- <*> say__p = 1; <*> ParaContent(); .L_Say59; .LSayX59;
- <*> t_0 = 0;
- rather than like this:
- <*> say__p = 1; <*> ParaContent(); <*> .L_Say59; .LSayX59;
- t_0 = 0;
- 11: Source Code Locations
- Most source code locations take the following format, which describes their
- file, the line and character number where they begin (inclusive), the line and
- character number where they end (exclusive), and the file positions (in bytes)
- corresponding to those endpoints:
- <source-code-location>
- <file-index>0</file-index>
- <line>1024</line>
- <character>4</character>
- <file-position>44153</file-position>
- <end-line>1025</end-line>
- <end-character>1</end-character>
- <end-file-position>44186</end-file-position>
- </source-code-location>
- Line numbers and character numbers begin at one, but file positions count from
- zero.
- In the special case where the endpoints coincide, as happens with sequence
- points, the end elements may be elided:
- <source-code-location>
- <file-index>0</file-index>
- <line>1024</line>
- <character>4</character>
- <file-position>44153</file-position>
- </source-code-location>
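A reader can treat the two forms uniformly by letting each elided end element default to its beginning counterpart. The following sketch shows one way to do so; parse_location is a hypothetical helper, and the two samples are the examples above.

```python
import xml.etree.ElementTree as ET

FULL = """<source-code-location>
  <file-index>0</file-index>
  <line>1024</line>
  <character>4</character>
  <file-position>44153</file-position>
  <end-line>1025</end-line>
  <end-character>1</end-character>
  <end-file-position>44186</end-file-position>
</source-code-location>"""

POINT = """<source-code-location>
  <file-index>0</file-index>
  <line>1024</line>
  <character>4</character>
  <file-position>44153</file-position>
</source-code-location>"""

def parse_location(element):
    """Return the location as a dict, filling in any elided end fields."""
    def number(tag, fallback=None):
        text = element.findtext(tag)
        return int(text) if text is not None else fallback

    line = number("line")
    character = number("character")
    position = number("file-position")
    return {
        "file_index": number("file-index"),
        "line": line,
        "character": character,
        "file_position": position,
        # Elided end elements mean that the endpoints coincide.
        "end_line": number("end-line", line),
        "end_character": number("end-character", character),
        "end_file_position": number("end-file-position", position),
    }

print(parse_location(ET.fromstring(POINT)))
```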
- At the other extreme, sometimes definitions span several source files or appear
- in two different languages. The former case is dealt with by including multiple
- code location elements and indexing them to indicate order:
- <!-- First Part of Inform 6 Definition -->
- <source-code-location index="0">
- <!-- Assuming file 0 was given with the language "Inform 6" -->
- <file-index>0</file-index>
- <line>1024</line>
- <character>4</character>
- <file-position>44153</file-position>
- <end-line>1025</end-line>
- <end-character>1</end-character>
- <end-file-position>44186</end-file-position>
- </source-code-location>
- <!-- Second Part of Inform 6 Definition -->
- <source-code-location index="1">
- <!-- Assuming file 1 was given with the language "Inform 6" -->
- <file-index>1</file-index>
- <line>1</line>
- <character>0</character>
- <file-position>0</file-position>
- <end-line>3</end-line>
- <end-character>1</end-character>
- <end-file-position>59</end-file-position>
- </source-code-location>
- The latter case is also handled with multiple elements. Note that indexing is
- only used to indicate order among locations in the same language.
- <!-- Inform 7 Definition -->
- <source-code-location>
- <!-- Assuming file 2 was given with the language "Inform 7" -->
- <file-index>2</file-index>
- <line>12</line>
- <character>0</character>
- <file-position>308</file-position>
- <end-line>12</end-line>
- <end-character>112</end-character>
- <end-file-position>420</end-file-position>
- </source-code-location>
- <!-- Inform 6 Definition -->
- <source-code-location>
- <!-- Assuming file 0 was given with the language "Inform 6" -->
- <file-index>0</file-index>
- <line>1024</line>
- <character>4</character>
- <file-position>44153</file-position>
- <end-line>1025</end-line>
- <end-character>1</end-character>
- <end-file-position>44186</end-file-position>
- </source-code-location>
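Putting both cases together, a reader might group the locations of a record by language and then order each group by index. The sketch below is illustrative only. The helper locations_by_language is hypothetical, the language table stands in for the one built from the <source> elements, the locations are abbreviated to two children each, and a location without an index attribute is assumed to sort as index 0.

```python
import xml.etree.ElementTree as ET

# A record with three locations, deliberately listed out of order.
SAMPLE = """<constant>
  <identifier>MAX_SCORE</identifier>
  <value>40</value>
  <source-code-location index="1">
    <file-index>1</file-index>
    <line>1</line>
  </source-code-location>
  <source-code-location>
    <file-index>2</file-index>
    <line>12</line>
  </source-code-location>
  <source-code-location index="0">
    <file-index>0</file-index>
    <line>1024</line>
  </source-code-location>
</constant>"""

# Stand-in for the languages recorded by the <source> elements.
LANGUAGES = {0: "Inform 6", 1: "Inform 6", 2: "Inform 7"}

def locations_by_language(record, languages):
    """Return {language: [(file index, line), ...]} ordered by index."""
    grouped = {}
    for location in record.findall("source-code-location"):
        file_index = int(location.findtext("file-index"))
        # Assumption: an unindexed location is the only one in its language.
        order = int(location.get("index", "0"))
        entry = (order, file_index, int(location.findtext("line")))
        grouped.setdefault(languages[file_index], []).append(entry)
    return {language: [(f, l) for _, f, l in sorted(entries)]
            for language, entries in grouped.items()}

RESULT = locations_by_language(ET.fromstring(SAMPLE), LANGUAGES)
print(RESULT)
```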