  1. Format of Inform 6 Debugging Information Files
  2. Version 1.0
  3. 0: Introduction
  4. This is a specification of the Version 1 format for the debugging information
  5. files emitted by the Inform 6 compiler. It replaces Version 0, which is
  6. documented in Section 12.5 of the Inform Technical Manual.
  7. 1: Overview
  8. Debugging information files are written in XML and encoded in UTF-8. They
  9. therefore begin with the following declaration:
  10. <?xml version="1.0" encoding="UTF-8"?>
  11. Beyond the usual requirements for well-formed XML, the file adheres to the
  12. conventions that all numbers are written in decimal, all strings are
  13. case-sensitive, and all excerpts from binary files are Base64-encoded.
  14. 2: The Top Level
  15. The root element is given by the tag <inform-story-file> with three attributes,
  16. the version of the debug file format being used, the name of the program that
  17. produced the file, and that program's version. For instance,
  18. <inform-story-file version="1.0" content-creator="Inform"
  19. content-creator-version="6.33">
  20. ...
  21. </inform-story-file>
  22. The elements described in Sections 3--8 may appear in place of the ellipsis.
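As a minimal sketch of reading the root element, the following Python uses the
standard xml.etree.ElementTree module. The function name read_root_info is an
illustrative assumption and not part of the format.

```python
import xml.etree.ElementTree as ET

def read_root_info(xml_bytes):
    """Return (format version, creator, creator version) from the root element."""
    root = ET.fromstring(xml_bytes)
    if root.tag != "inform-story-file":
        raise ValueError("unexpected root element: " + root.tag)
    # All three attributes are described in Section 2 of this document.
    return (root.get("version"),
            root.get("content-creator"),
            root.get("content-creator-version"))

# A tiny file built from the example in Section 2 (no child elements).
example = b"""<?xml version="1.0" encoding="UTF-8"?>
<inform-story-file version="1.0" content-creator="Inform"
                   content-creator-version="6.33">
</inform-story-file>"""

print(read_root_info(example))
```

A tool might refuse to continue when the reported format version is not one it
understands; that policy is left to the tool.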
  23. 3: Story File Prefix
  24. The story file prefix contains a Base64 encoding of the story file's first bytes
  25. so that a debugging tool can easily check whether the story and the debug
  26. information file are mismatched. For example, the prefix for a Glulx story
  27. might appear as
  28. <story-file-prefix>
  29. R2x1bAADAQEACqEAAAwsAAAMLAAAAQAAAAAAPAAIo2Jc
  30. 6B2XSW5mbwABAAA2LjMyMC4zOAABMTIxMDE1wQAAMA==
  31. </story-file-prefix>
  32. The story file prefix is mandatory, but its length is unspecified. Version 6.33
  33. of the Inform compiler records 64 bytes, which seems sufficient.
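The mismatch check described above might be sketched in Python as follows. The
function name prefix_matches and the made-up story bytes are assumptions for
illustration; only the <story-file-prefix> element comes from the format.

```python
import base64
import xml.etree.ElementTree as ET

def prefix_matches(debug_xml, story_bytes):
    """Return True if the story file begins with the recorded prefix."""
    root = ET.fromstring(debug_xml)
    element = root.find("story-file-prefix")
    if element is None or element.text is None:
        raise ValueError("missing <story-file-prefix>")
    # The Base64 text may be wrapped over several lines, so drop whitespace.
    encoded = "".join(element.text.split())
    prefix = base64.b64decode(encoded)
    # The prefix length is unspecified, so compare however many bytes we got.
    return story_bytes.startswith(prefix)

# Made-up 64-byte prefix for demonstration; real files record actual header bytes.
story = b"Glul" + bytes(60) + b"rest of the story file"
debug = ("<inform-story-file><story-file-prefix>\n"
         + base64.b64encode(story[:64]).decode("ascii")
         + "\n</story-file-prefix></inform-story-file>")

print(prefix_matches(debug, story))                 # True
print(prefix_matches(debug, b"\x05" + bytes(100)))  # False
```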
  34. 4: Story File Sections
  35. Story file sections partition the story file according to how the data will be
  36. used. For the Inform 6 compiler, this partitioning is the same as the one that
  37. the `z' flag prints.
  38. A record for a story file section gives a name for that section, its beginning
  39. address (inclusive), and its end address (exclusive):
  40. <story-file-section>
  41. <type>abbreviations table</type>
  42. <address>64</address>
  43. <end-address>128</end-address>
  44. </story-file-section>
  45. The names currently in use include those from Section 12.5 of the Inform
  46. Technical Manual:
  47. abbreviations table
  48. header extension (Z-code only)
  49. alphabets table (Z-code only)
  50. Unicode table (Z-code only)
  51. property defaults
  52. object tree
  53. common properties
  54. class numbers
  55. individual properties (Z-code only)
  56. global variables
  57. array space
  58. grammar table
  59. actions table
  60. parsing routines (Z-code only)
  61. adjectives table (Z-code only)
  62. dictionary
  63. code area
  64. strings area
  65. plus one addition for Z-code:
  66. abbreviations
  67. two additions for Glulx:
  68. memory layout id
  69. string decoding table
  70. and three additions for both targets:
  71. header
  72. identifier names
  73. zero padding
  74. Names may repeat; Glulx story files, for example, sometimes have two zero
  75. padding sections.
  76. A compiler that does not wish to subdivide the story file should emit one
  77. section for the entirety and give it the name
  78. story
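Because each section has an inclusive start and an exclusive end, finding the
section that contains an address is a half-open interval test. The sketch below
assumes Python and hypothetical helper names (load_sections, section_for); the
addresses in the sample are invented.

```python
import xml.etree.ElementTree as ET

def load_sections(root):
    """Collect (start, end, name) triples, sorted by start address."""
    sections = []
    for element in root.iter("story-file-section"):
        sections.append((int(element.findtext("address")),
                         int(element.findtext("end-address")),
                         element.findtext("type")))
    return sorted(sections)

def section_for(sections, address):
    """Return the name of the section containing address, or None."""
    for start, end, name in sections:
        if start <= address < end:  # start inclusive, end exclusive
            return name
    return None

sample = ET.fromstring("""
<inform-story-file>
  <story-file-section>
    <type>header</type><address>0</address><end-address>64</end-address>
  </story-file-section>
  <story-file-section>
    <type>abbreviations table</type><address>64</address><end-address>128</end-address>
  </story-file-section>
</inform-story-file>""")

sections = load_sections(sample)
print(section_for(sections, 64))   # abbreviations table
print(section_for(sections, 128))  # None
```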
  79. 5: Source Files
  80. Source files are encoded as in the example below. Each file has a unique index,
  81. which is used by other elements when referring to source code locations; these
  82. indices count from zero. The file's path is recorded in two forms, first as it
  83. was given to the compiler via a command-line argument or include directive but
  84. without any path abbreviations like `>' (the form suitable for presentation to a
  85. human) and second after resolution to an absolute path (the form suitable for
  86. loading the file contents). All paths are written according to the conventions
  87. of the host OS. The language is, at present, either "Inform 6" or "Inform 7".
  88. More languages may be added in the future.
  89. <source index="0">
  90. <given-path>example.inf</given-path>
  91. <resolved-path>/home/user/directory/example.inf</resolved-path>
  92. <language>Inform 6</language>
  93. </source>
  94. If the source file is known to appear in the story's Blorb, its chunk number
  95. will also be recorded:
  96. <source index="0">
  97. <given-path>example.inf</given-path>
  98. <resolved-path>/home/user/directory/example.inf</resolved-path>
  99. <language>Inform 6</language>
  100. <blorb-chunk-number>18</blorb-chunk-number>
  101. </source>
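Since other elements refer to source files by index, a reader will probably
want a table keyed on that index. A possible Python sketch follows; load_sources
and the dictionary keys are names chosen here, not part of the format.

```python
import xml.etree.ElementTree as ET

def load_sources(root):
    """Map each source index to its paths, language, and optional Blorb chunk."""
    sources = {}
    for element in root.iter("source"):
        chunk = element.findtext("blorb-chunk-number")
        sources[int(element.get("index"))] = {
            "given": element.findtext("given-path"),
            "resolved": element.findtext("resolved-path"),
            "language": element.findtext("language"),
            # The chunk number is present only when the file is in the Blorb.
            "blorb_chunk": int(chunk) if chunk is not None else None,
        }
    return sources

sample = ET.fromstring("""
<inform-story-file>
  <source index="0">
    <given-path>example.inf</given-path>
    <resolved-path>/home/user/directory/example.inf</resolved-path>
    <language>Inform 6</language>
    <blorb-chunk-number>18</blorb-chunk-number>
  </source>
</inform-story-file>""")

sources = load_sources(sample)
print(sources[0]["resolved"], sources[0]["blorb_chunk"])
```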
  102. 6: Table Entries; Grammar Lines
  103. Table entries are data defined by particular parts of the source code, but
  104. without any corresponding identifiers. The <table-entry> element notes the
  105. entry's type, the address where it begins (inclusive), the address where it ends
  106. (exclusive), and the defining source code location(s), if any:
  107. <table-entry>
  108. <type>grammar line</type>
  109. <address>1004</address>
  110. <end-address>1030</end-address>
  111. <source-code-location>...</source-code-location>
  112. </table-entry>
  113. Version 6.33 of the Inform compiler only emits <table-entry> tags for grammar
  114. lines; these data are all located in the grammar table section.
  115. 7: Named Values; Constants, Attributes, Properties, Actions, Fake Actions,
  116. Objects, Classes, Arrays, and Routines
  117. Records for named values store their identifier, their value, and the source
  118. code location(s) of their definition, if any. For instance,
  119. <constant>
  120. <identifier>MAX_SCORE</identifier>
  121. <value>40</value>
  122. <source-code-location>...</source-code-location>
  123. </constant>
  124. would represent a named constant. Attributes, properties, actions, fake
  125. actions, objects, arrays, and routines are also names for numbers, and differ
  126. only in their use; they are represented in the same format under the tags
  127. <attribute>, <property>, <action>, <fake-action>, <object>, <array>, and
  128. <routine>. (Moreover, unlike Version 0 of the debug information format, fake
  129. actions are not recorded as both fake actions and actions.)
  130. The records for constants include some extra entries for the system constants
  131. tabulated in Section 12.2 of the Inform Technical Manual, even though these are
  132. not created by Constant directives. Entries for #undefed constants are also
  133. included, but necessarily without values.
  134. Some records for objects will represent class objects. In that case, they will
  135. be given with the tag <class> rather than <object> and include an additional
  136. child to indicate their class number:
  137. <class>
  138. <identifier>lamp</identifier>
  139. <class-number>5</class-number>
  140. <value>1560</value>
  141. <source-code-location>...</source-code-location>
  142. </class>
  143. Records for arrays also have extra children, which record their size, their
  144. element size, and the intended semantics for their zeroth element:
  145. <array>
  146. <identifier>route</identifier>
  147. <value>1500</value>
  148. <byte-count>20</byte-count>
  149. <bytes-per-element>4</bytes-per-element>
  150. <zeroth-element-holds-length>true</zeroth-element-holds-length>
  151. <source-code-location>...</source-code-location>
  152. </array>
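The array children allow a tool to work out how many entries an array has. The
Python sketch below assumes that, when the zeroth element holds the length, a
tool would want to exclude it from the count of data entries; that reading, and
the name array_shape, are assumptions rather than requirements of the format.

```python
import xml.etree.ElementTree as ET

def array_shape(element):
    """Return (total entries, data entries) for an <array> record."""
    byte_count = int(element.findtext("byte-count"))
    width = int(element.findtext("bytes-per-element"))
    entries = byte_count // width
    has_length = element.findtext("zeroth-element-holds-length") == "true"
    # Assumption: a length-holding zeroth entry is not counted as data.
    data_entries = entries - 1 if has_length else entries
    return entries, data_entries

example = ET.fromstring("""
<array>
  <identifier>route</identifier>
  <value>1500</value>
  <byte-count>20</byte-count>
  <bytes-per-element>4</bytes-per-element>
  <zeroth-element-holds-length>true</zeroth-element-holds-length>
</array>""")

print(array_shape(example))  # (5, 4)
```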
  153. And finally, <routine> records contain an <address> and a <byte-count> element,
  154. along with any number of the <local-variable> and <sequence-point> elements,
  155. which are described in Sections 9 and 10. The address is provided because the
  156. identifier's value may be packed.
  157. Sometimes what would otherwise be a named value is in fact anonymous; unnamed
  158. objects, embedded routines, some replaced routines, veneer properties, and the
  159. Infix attribute are all examples. In such a case, the <identifier> subelement
  160. will carry the XML attribute
  161. artificial
  162. to indicate that the compiler is providing a sensible name of its own, which
  163. could be presented to a human, but is not actually an identifier. For instance:
  164. <routine>
  165. <identifier artificial="true">lantern.time_left</identifier>
  166. <value>1820</value>
  167. <byte-count>80</byte-count>
  168. <source-code-location>...</source-code-location>
  169. ...
  170. </routine>
  171. Artificial identifiers may contain characters, like the full stop in
  172. ``lantern.time_left'', that would not be legal in the source language.
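A reader of named-value records has to allow for artificial identifiers and for
records without a value (the #undefed constants mentioned above). One possible
Python sketch is given here; named_values is a hypothetical helper, and the
identifiers Main and OLD_FLAG in the sample are invented.

```python
import xml.etree.ElementTree as ET

def named_values(root, tag):
    """List the records with the given tag as small dictionaries."""
    result = []
    for element in root.iter(tag):
        identifier = element.find("identifier")
        value = element.findtext("value")
        result.append({
            "name": identifier.text,
            # Artificial names are compiler-supplied, not real identifiers.
            "artificial": identifier.get("artificial") == "true",
            # Undefined constants carry no <value> child.
            "value": int(value) if value is not None else None,
        })
    return result

sample = ET.fromstring("""
<inform-story-file>
  <routine>
    <identifier artificial="true">lantern.time_left</identifier>
    <value>1820</value>
    <byte-count>80</byte-count>
  </routine>
  <routine>
    <identifier>Main</identifier>
    <value>1700</value>
  </routine>
  <constant>
    <identifier>OLD_FLAG</identifier>
  </constant>
</inform-story-file>""")

for record in named_values(sample, "routine"):
    print(record)
```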
  173. 8: Global Variables
  174. Globals are similar to named values, except that they are not interpreted as a
  175. fixed value, but rather have an address where their value can be found. Their
  176. records therefore contain an <address> tag in place of the <value> tag, as in:
  177. <global-variable>
  178. <identifier>darkness_witnessed</identifier>
  179. <address>1520</address>
  180. <source-code-location>...</source-code-location>
  181. </global-variable>
  182. 9: Local Variables
  183. The format for local variables mimics the format for global variables, except
  184. that a source code location is never included, and their memory locations are
  185. not given by address. For Z-code, locals are specified by index:
  186. <local-variable>
  187. <identifier>parameter</identifier>
  188. <index>1</index>
  189. </local-variable>
  190. whereas for Glulx they are specified by frame offset:
  191. <local-variable>
  192. <identifier>parameter</identifier>
  193. <frame-offset>4</frame-offset>
  194. </local-variable>
  195. If a local variable identifier is only in scope for part of a routine, its
  196. scope will be encoded as a beginning instruction address (inclusive) and an
  197. ending instruction address (exclusive):
  198. <local-variable>
  199. <identifier>rulebook</identifier>
  200. <index>0</index>
  201. <scope-address>1628</scope-address>
  202. <end-scope-address>1678</end-scope-address>
  203. </local-variable>
  204. Identifiers with noncontiguous scopes are recorded as one <local-variable>
  205. element per contiguous region. It is possible for the same identifier to map to
  206. different variables, so long as the corresponding scopes are disjoint.
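To illustrate the scoping rules, the Python sketch below lists the locals
visible at a given instruction address. It treats a record without scope
addresses as visible throughout the routine, which is how the text above reads
but is still an interpretation; locals_in_scope is a hypothetical name.

```python
import xml.etree.ElementTree as ET

def locals_in_scope(routine, pc):
    """Map each visible identifier to ("index" or "frame-offset", number)."""
    visible = {}
    for element in routine.iter("local-variable"):
        start = element.findtext("scope-address")
        end = element.findtext("end-scope-address")
        if start is not None and end is not None:
            # Scope is inclusive at the start and exclusive at the end.
            if not (int(start) <= pc < int(end)):
                continue
        kind = "index"                      # Z-code
        slot = element.findtext("index")
        if slot is None:
            kind = "frame-offset"           # Glulx
            slot = element.findtext("frame-offset")
        visible[element.findtext("identifier")] = (kind, int(slot))
    return visible

routine = ET.fromstring("""
<routine>
  <local-variable>
    <identifier>parameter</identifier><index>1</index>
  </local-variable>
  <local-variable>
    <identifier>rulebook</identifier><index>0</index>
    <scope-address>1628</scope-address>
    <end-scope-address>1678</end-scope-address>
  </local-variable>
</routine>""")

print(locals_in_scope(routine, 1630))
print(locals_in_scope(routine, 1678))
```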
  207. 10: Sequence Points
  208. Sequence points are stored as an instruction address and the corresponding
  209. single location in the source code:
  210. <sequence-point>
  211. <address>1628</address>
  212. <source-code-location>...</source-code-location>
  213. </sequence-point>
  214. The source code location will always be a single position, that is, one whose
  215. endpoints coincide.
  216. Sequence points are defined as in Section 12.4 of the Inform Technical Manual,
  217. but with the further stipulation that labels do not influence their source code
  218. locations, as they did in Version 0 of the debug information format. For
  219. instance, in code like
  220. say__p = 1; ParaContent(); .L_Say59; .LSayX59;
  221. t_0 = 0;
  222. the sequence points are to be placed like this:
  223. <*> say__p = 1; <*> ParaContent(); .L_Say59; .LSayX59;
  224. <*> t_0 = 0;
  225. rather than like this:
  226. <*> say__p = 1; <*> ParaContent(); <*> .L_Say59; .LSayX59;
  227. t_0 = 0;
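A debugger will typically want to map an instruction address back to a source
position. One plausible approach, sketched in Python below, is to pick the last
sequence point at or before the address; this lookup policy is an assumption of
the sketch and is not mandated by the format. The name nearest_sequence_point
and the sample addresses are likewise invented.

```python
import bisect
import xml.etree.ElementTree as ET

def nearest_sequence_point(routine, pc):
    """Return the last <sequence-point> at or before pc, or None."""
    points = sorted(routine.iter("sequence-point"),
                    key=lambda p: int(p.findtext("address")))
    addresses = [int(p.findtext("address")) for p in points]
    # bisect_right gives the count of addresses <= pc.
    position = bisect.bisect_right(addresses, pc) - 1
    return points[position] if position >= 0 else None

routine = ET.fromstring("""
<routine>
  <sequence-point>
    <address>1628</address>
    <source-code-location>
      <file-index>0</file-index><line>10</line><character>4</character>
      <file-position>200</file-position>
    </source-code-location>
  </sequence-point>
  <sequence-point>
    <address>1640</address>
    <source-code-location>
      <file-index>0</file-index><line>11</line><character>4</character>
      <file-position>230</file-position>
    </source-code-location>
  </sequence-point>
</routine>""")

point = nearest_sequence_point(routine, 1639)
print(point.findtext("source-code-location/line"))  # 10
```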
  228. 11: Source Code Locations
  229. Most source code locations take the following format, which describes their
  230. file, the line and character number where they begin (inclusive), the line and
  231. character number where they end (exclusive), and the file positions (in bytes)
  232. corresponding to those endpoints:
  233. <source-code-location>
  234. <file-index>0</file-index>
  235. <line>1024</line>
  236. <character>4</character>
  237. <file-position>44153</file-position>
  238. <end-line>1025</end-line>
  239. <end-character>1</end-character>
  240. <end-file-position>44186</end-file-position>
  241. </source-code-location>
  242. Line numbers and character numbers begin at one, but file positions count from
  243. zero.
  244. In the special case where the endpoints coincide, as happens with sequence
  245. points, the end elements may be elided:
  246. <source-code-location>
  247. <file-index>0</file-index>
  248. <line>1024</line>
  249. <character>4</character>
  250. <file-position>44153</file-position>
  251. </source-code-location>
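A reader can treat the elided form uniformly by letting each missing end
element default to the corresponding start value. The Python below is a sketch
of that idea; parse_location and its result layout are assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

def parse_location(element):
    """Return file index plus (line, character, file position) start and end."""
    def number(tag, fallback=None):
        text = element.findtext(tag)
        return int(text) if text is not None else fallback

    start = (number("line"), number("character"), number("file-position"))
    # Elided end elements mean the endpoints coincide with the start.
    end = (number("end-line", start[0]),
           number("end-character", start[1]),
           number("end-file-position", start[2]))
    return {"file": number("file-index"), "start": start, "end": end}

full = ET.fromstring("""
<source-code-location>
  <file-index>0</file-index>
  <line>1024</line><character>4</character><file-position>44153</file-position>
  <end-line>1025</end-line><end-character>1</end-character>
  <end-file-position>44186</end-file-position>
</source-code-location>""")

point = ET.fromstring("""
<source-code-location>
  <file-index>0</file-index>
  <line>1024</line><character>4</character><file-position>44153</file-position>
</source-code-location>""")

print(parse_location(full))
print(parse_location(point))
```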
  252. At the other extreme, sometimes definitions span several source files or appear
  253. in two different languages. The former case is dealt with by including multiple
  254. code location elements and indexing them to indicate order:
  255. <!-- First Part of Inform 6 Definition -->
  256. <source-code-location index="0">
  257. <!-- Assuming file 0 was given with the language "Inform 6" -->
  258. <file-index>0</file-index>
  259. <line>1024</line>
  260. <character>4</character>
  261. <file-position>44153</file-position>
  262. <end-line>1025</end-line>
  263. <end-character>1</end-character>
  264. <end-file-position>44186</end-file-position>
  265. </source-code-location>
  266. <!-- Second Part of Inform 6 Definition -->
  267. <source-code-location index="1">
  268. <!-- Assuming file 1 was given with the language "Inform 6" -->
  269. <file-index>1</file-index>
  270. <line>1</line>
  271. <character>0</character>
  272. <file-position>0</file-position>
  273. <end-line>3</end-line>
  274. <end-character>1</end-character>
  275. <end-file-position>59</end-file-position>
  276. </source-code-location>
  277. The latter case is also handled with multiple elements. Note that indexing is
  278. only used to indicate order among locations in the same language.
  279. <!-- Inform 7 Definition -->
  280. <source-code-location>
  281. <!-- Assuming file 2 was given with the language "Inform 7" -->
  282. <file-index>2</file-index>
  283. <line>12</line>
  284. <character>0</character>
  285. <file-position>308</file-position>
  286. <end-line>12</end-line>
  287. <end-character>112</end-character>
  288. <end-file-position>420</end-file-position>
  289. </source-code-location>
  290. <!-- Inform 6 Definition -->
  291. <source-code-location>
  292. <!-- Assuming file 0 was given with the language "Inform 6" -->
  293. <file-index>0</file-index>
  294. <line>1024</line>
  295. <character>4</character>
  296. <file-position>44153</file-position>
  297. <end-line>1025</end-line>
  298. <end-character>1</end-character>
  299. <end-file-position>44186</end-file-position>
  300. </source-code-location>
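Putting the two cases together, a reader might group a record's locations by
the language of their source file and order each group by the index attribute.
The Python sketch below does this; ordered_locations and the language_of_file
mapping (file index to language, as would be gathered from the <source>
elements) are hypothetical, and treating a missing index as 0 is an assumption.

```python
import xml.etree.ElementTree as ET

def ordered_locations(record, language_of_file):
    """Return {language: [file indices in definition order]} for a record."""
    groups = {}
    for location in record.findall("source-code-location"):
        file_index = int(location.findtext("file-index"))
        language = language_of_file[file_index]
        # Assumption: a location without an index attribute sorts as 0.
        order = int(location.get("index", "0"))
        groups.setdefault(language, []).append((order, file_index))
    return {language: [file_index for _, file_index in sorted(parts)]
            for language, parts in groups.items()}

record = ET.fromstring("""
<constant>
  <identifier>MAX_SCORE</identifier>
  <value>40</value>
  <source-code-location index="1"><file-index>1</file-index></source-code-location>
  <source-code-location index="0"><file-index>0</file-index></source-code-location>
  <source-code-location><file-index>2</file-index></source-code-location>
</constant>""")

languages = {0: "Inform 6", 1: "Inform 6", 2: "Inform 7"}
print(ordered_locations(record, languages))
```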