Info file internals, produced by texinfo-format-buffer -*-Text-*-
from file internals.texinfo

This file documents the internals of the GNU compiler.

Copyright (C) 1987 Richard M. Stallman.

Permission is granted to make and distribute verbatim copies of
this manual provided the copyright notice and this permission notice
are preserved on all copies.

Permission is granted to copy and distribute modified versions of this
manual under the conditions for verbatim copying, provided also that the
section entitled "GNU CC General Public License" is included exactly as
in the original, and provided that the entire resulting derived work is
distributed under the terms of a permission notice identical to this one.

Permission is granted to copy and distribute translations of this manual
into another language, under the above conditions for modified versions,
except that the section entitled "GNU CC General Public License" may be
included in a translation approved by the author instead of in the original
English.

File: internals Node: Top, Up: (DIR), Next: Switches

Introduction
************

This manual documents how to install and port the GNU C compiler.

* Menu:

* Copying::         GNU CC General Public License says
                    how you can copy and share GNU CC.
* Switches::        Command switches supported by `gcc'.
* Installation::    How to configure, compile and install GNU CC.
* Portability::     Goals of GNU CC's portability features.
* Passes::          Order of passes, what they do, and what each file is for.
* RTL::             The intermediate representation that most passes work on.
* Machine Desc::    How to write machine description instruction patterns.
* Machine Macros::  How to write the machine description C macros.

File: internals Node: Copying, Prev: Top, Up: Top, Next: Switches

GNU CC GENERAL PUBLIC LICENSE
*****************************

The license agreements of most software companies keep you at the
mercy of those companies. By contrast, our general public license is
intended to give everyone the right to share GNU CC. To make sure that
you get the rights we want you to have, we need to make restrictions
that forbid anyone to deny you these rights or to ask you to surrender
the rights. Hence this license agreement.

Specifically, we want to make sure that you have the right to give
away copies of GNU CC, that you receive source code or else can get it
if you want it, that you can change GNU CC or use pieces of it in new
free programs, and that you know you can do these things.

To make sure that everyone has such rights, we have to forbid you to
deprive anyone else of these rights. For example, if you distribute
copies of GNU CC, you must give the recipients all the rights that you
have. You must make sure that they, too, receive or can get the
source code. And you must tell them their rights.

Also, for our own protection, we must make certain that everyone
finds out that there is no warranty for GNU CC. If GNU CC is modified by
someone else and passed on, we want its recipients to know that what
they have is not what we distributed, so that any problems introduced
by others will not reflect on our reputation.

Therefore we (Richard Stallman and the Free Software Fundation,
Inc.) make the following terms which say what you must do to be
allowed to distribute or change GNU CC.

COPYING POLICIES
================

1. You may copy and distribute verbatim copies of GNU CC source code as
   you receive it, in any medium, provided that you conspicuously and
   appropriately publish on each copy a valid copyright notice
   "Copyright (C) 1987 Free Software Foundation, Inc." (or
   with the year updated if that is appropriate); keep intact the notices
   on all files that refer to this License Agreement and to the absence
   of any warranty; and give any other recipients of the GNU CC program a
   copy of this License Agreement along with the program. You may charge
   a distribution fee for the physical act of transferring a copy.

2. You may modify your copy or copies of GNU CC or any portion of it,
   and copy and distribute such modifications under the terms of
   Paragraph 1 above, provided that you also do the following:

   * cause the modified files to carry prominent notices stating
     that you changed the files and the date of any change; and

   * cause the whole of any work that you distribute or publish,
     that in whole or in part contains or is a derivative of GNU CC or
     any part thereof, to be licensed at no charge to all third
     parties on terms identical to those contained in this License
     Agreement (except that you may choose to grant more extensive
     warranty protection to some or all third parties, at your
     option).

   * You may charge a distribution fee for the physical act of
     transferring a copy, and you may at your option offer warranty
     protection in exchange for a fee.

3. You may copy and distribute GNU CC or any portion of it in
   compiled, executable or object code form under the terms of Paragraphs
   1 and 2 above provided that you do the following:

   * cause each such copy to be accompanied by the
     corresponding machine-readable source code, which must
     be distributed under the terms of Paragraphs 1 and 2 above; or,

   * cause each such copy to be accompanied by a
     written offer, with no time limit, to give any third party
     free (except for a nominal shipping charge) a machine readable
     copy of the corresponding source code, to be distributed
     under the terms of Paragraphs 1 and 2 above; or,

   * in the case of a recipient of GNU CC in compiled, executable
     or object code form (without the corresponding source code) you
     shall cause copies you distribute to be accompanied by a copy
     of the written offer of source code which you received along
     with the copy you received.

4. You may not copy, sublicense, distribute or transfer GNU CC
   except as expressly provided under this License Agreement. Any attempt
   otherwise to copy, sublicense, distribute or transfer GNU CC is void and
   your rights to use the program under this License agreement shall be
   automatically terminated. However, parties who have received computer
   software programs from you with this License Agreement will not have
   their licenses terminated so long as such parties remain in full compliance.

5. If you wish to incorporate parts of GNU CC into other free programs
   whose distribution conditions are different, write to the Free Software
   Foundation at 1000 Mass Ave, Cambridge, MA 02138. We have not yet worked
   out a simple rule that can be stated here, but we will often permit this.
   We will be guided by the two goals of preserving the free status of all
   derivatives our free software and of promoting the sharing and reuse of
   software.

Your comments and suggestions about our licensing policies and our
software are welcome! Please contact the Free Software Foundation, Inc.,
1000 Mass Ave, Cambridge, MA 02138, or call (617) 876-3296.

NO WARRANTY
===========

BECAUSE GNU CC IS LICENSED FREE OF CHARGE, WE PROVIDE ABSOLUTELY NO
WARRANTY, TO THE EXTENT PERMITTED BY APPLICABLE STATE LAW. EXCEPT
WHEN OTHERWISE STATED IN WRITING, FREE SOFTWARE FOUNDATION, INC,
RICHARD M. STALLMAN AND/OR OTHER PARTIES PROVIDE GNU CC "AS IS" WITHOUT
WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND
PERFORMANCE OF GNU CC IS WITH YOU. SHOULD GNU CC PROVE DEFECTIVE, YOU
ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.

IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW WILL RICHARD M.
STALLMAN, THE FREE SOFTWARE FOUNDATION, INC., AND/OR ANY OTHER PARTY
WHO MAY MODIFY AND REDISTRIBUTE GNU CC AS PERMITTED ABOVE, BE LIABLE TO
YOU FOR DAMAGES, INCLUDING ANY LOST PROFITS, LOST MONIES, OR OTHER
SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR
INABILITY TO USE (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA
BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY THIRD PARTIES OR A
FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS) GNU CC, EVEN
IF YOU HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES, OR FOR
ANY CLAIM BY ANY OTHER PARTY.

File: internals Node: Switches, Prev: Copying, Up: Top, Next: Installation

GNU CC Switches
***************

`-O'
     Do optimize.

`-g'
     Produce debugging information in DBX format.

`-c'
     Compile but do not link the object files.

`-o FILE'
     Place linker output in file FILE.

`-S'
     Compile into assembler code but do not assemble.

`-mMACHINESPEC'
     Machine-dependent switch specifying something about the type
     of target machine. For example, using the 68000 machine description,
     `-m68000' says not to use the 68020 instructions, and
     `-msoft-float' says not to use the 68881 floating point
     instructions.

`-dLETTERS'
     Says to make debugging dumps at times specified by LETTERS.
     Here are the possible letters:

     `t'
          Dump syntax-tree.
     `r'
          Dump after RTL generation.
     `j'
          Dump after first jump optimization.
     `s'
          Dump after CSE.
     `L'
          Dump after loop optimization.
     `f'
          Dump after flow analysis.
     `c'
          Dump after instruction combination.
     `l'
          Dump after local register allocation.
     `g'
          Dump after global register allocation.

`-pedantic'
     Attempt to support strict ANSI standard C. Valid ANSI standard C
     programs should compile properly with or without this switch.
     However, without this switch, certain useful or traditional constructs
     banned by the standard are supported. With this switch, they are
     rejected. There is no reason to use this switch; it exists only
     to satisfy pedants.

`-E'
     Preprocess the input files and output the results to standard output.

`-C'
     Tell the preprocessor not to discard comments. Used with the `-E'
     switch.

`-IDIR'
     Search directory DIR for include files.

`-DMACRO'
     Define macro MACRO with the empty string as its definition.

`-DMACRO=DEFN'
     Define macro MACRO as DEFN.

`-UMACRO'
     Undefine macro MACRO.

`-w'
     Inhibit warning messages.

`-v'
     The compiler driver program prints the commands it executes as it
     runs the preprocessor, compiler proper, assembler and linker.

`-BPREFIX'
     The compiler driver program tries PREFIX as a prefix for each program
     it tries to run. These programs are `cpp', `cc1', `as' and `ld'.

     For each subprogram to be run, the compiler driver first tries the
     `-B' prefix, if any. If that name is not found, or if `-B'
     was not specified, the driver tries two standard prefixes, which are
     `/usr/lib/gcc-' and `/usr/local/lib/gcc-'. If neither of
     those results in a file name that is found, the unmodified program
     name is searched for using the `PATH' environment variable.
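The search order just described can be sketched in C as follows. This is a minimal illustration, not the driver's actual code: the function name `find_subprogram' and its signature are hypothetical, and counting a candidate as "found" when `access' reports it executable is an assumption.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* The two standard prefixes, tried in this order.  */
static const char *const standard_prefixes[] = {
  "/usr/lib/gcc-", "/usr/local/lib/gcc-"
};

/* Hypothetical helper: store in OUT the file name to run for program
   NAME (such as "cc1").  B_PREFIX is the argument of `-B', or NULL if
   no `-B' was given.  "Found" is assumed to mean executable.  */
static void
find_subprogram (const char *b_prefix, const char *name,
                 char *out, size_t size)
{
  size_t i;

  /* 1. The `-B' prefix, if any.  */
  if (b_prefix != NULL)
    {
      snprintf (out, size, "%s%s", b_prefix, name);
      if (access (out, X_OK) == 0)
        return;
    }

  /* 2. The standard prefixes.  */
  for (i = 0; i < sizeof standard_prefixes / sizeof standard_prefixes[0]; i++)
    {
      snprintf (out, size, "%s%s", standard_prefixes[i], name);
      if (access (out, X_OK) == 0)
        return;
    }

  /* 3. The unmodified name; running it with execvp () would then
     search the directories listed in `PATH'.  */
  snprintf (out, size, "%s", name);
}
```

For example, a driver given `-B./' would first try `./cc1', which is convenient when running a compiler straight out of its build directory.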

File: internals Node: Installation, Prev: Switches, Up: Top, Next: Portability

Installing GNU CC
*****************

1. Choose configuration files.

   * Make a symbolic link from file `config.h' to the top-level
     config file for the machine you are using. Its name should be
     `config-MACHINE.h'. This file is responsible for defining
     information about the host machine. It includes `tm.h'.

   * Make a symbolic link from `tm.h' to the machine-description
     macro file for your machine (its name should be `tm-MACHINE.h').

   * Make a symbolic link from `md' to the machine description
     pattern file (its name should be `MACHINE.md').

   * Make a symbolic link from `aux-output.c' to the output-subroutine
     file for your machine (its name should be `MACHINE-output.c').

2. Make sure the Bison parser generator is installed.

3. Build the compiler. Just type `make' in the compiler directory.

4. Delete `*.o' in the compiler directory. Keep the executables from
   the previous step; the next step uses them.

5. Remake the compiler with

        make CC=./gcc CFLAGS="-g -O -I."

6. Install the compiler's passes. Copy the file `cc1' just made
   to `/usr/local/lib/gcc-cc1'.

   Make the file `/usr/local/lib/gcc-cpp' either a link to `/lib/cpp'
   or a copy of the file `cpp' generated by `make'.

   *Warning: the GNU CPP may not work for `ioctl.h'.* This cannot be
   fixed in the GNU CPP because the bug is in `ioctl.h': at least on
   some machines, it relies on behavior that is incompatible with
   ANSI C. This behavior consists of substituting for macro argument
   names when they appear inside of character constants.

7. Install the compiler driver. This is the file `gcc' generated
   by `make'.
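The fragment below illustrates the incompatibility described in the warning. The macro names `OLD_IO' and `ANSI_IO' are hypothetical and are not copied from any particular `ioctl.h'; the fragment only shows how the two preprocessor behaviors differ on the same definition.

```c
/* Written for a pre-ANSI preprocessor, which substituted the argument
   for the parameter name g even inside the character constant 'g', so
   OLD_IO (t, 8) became (('t' << 8) | (8)).  An ANSI C preprocessor,
   GNU CPP included, never looks inside character constants, so the
   same call expands to (('g' << 8) | (8)) and the argument t is
   ignored.  */
#define OLD_IO(g, n) (('g' << 8) | (n))

/* A portable form passes the character constant itself as the
   argument.  */
#define ANSI_IO(c, n) (((c) << 8) | (n))
```

Under GNU CPP, a header written in the first style still compiles, but it quietly produces different constants, which is why the problem cannot be worked around in the preprocessor.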

File: internals Node: Portability, Prev: Installation, Up: Top, Next: Passes

GNU CC and Portability
**********************

The main goal of GNU CC was to make a good, fast compiler for machines in
the class that the GNU system aims to run on: 32-bit machines that address
8-bit bytes and have several general registers. Elegance, theoretical
power and simplicity are only secondary.

GNU CC gets most of the information about the target machine from a machine
description which gives an algebraic formula for each of the machine's
instructions. This is a very clean way to describe the target. But when
the compiler needs information that is difficult to express in this
fashion, I have not hesitated to define an ad-hoc parameter to the machine
description. The purpose of portability is to reduce the total work needed
on the compiler; it was not of interest for its own sake.

GNU CC does not contain machine dependent code, but it does contain code
that depends on machine parameters such as endianness (whether the most
significant byte has the highest or lowest address of the bytes in a word)
and the availability of autoincrement addressing. In the RTL-generation
pass, it is often necessary to have multiple strategies for generating code
for a particular kind of syntax tree, strategies that are usable for different
combinations of parameters. Often I have not tried to address all possible
cases, but only the common ones or only the ones that I have encountered.
As a result, a new target may require additional strategies. You will know
if this happens because the compiler will call `abort'. Fortunately,
the new strategies can be added to all versions of the compiler, and will
be relevant only for target machines that need them.
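To make the endianness parameter concrete, the sketch below tests the byte order of the machine it runs on. It is only an illustration of what the parameter means: GNU CC learns the target's byte order from a machine-description parameter rather than by any run-time test, and the function name `msb_at_lowest_address' is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if the most significant byte of a 32-bit word is stored at
   the lowest address of the word, 0 if it is stored at the highest.
   Mixed byte orders are not considered.  */
static int
msb_at_lowest_address (void)
{
  uint32_t word = 0x01020304;
  unsigned char bytes[sizeof word];

  /* Copy the word's bytes out in address order.  */
  memcpy (bytes, &word, sizeof word);
  return bytes[0] == 0x01;
}
```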

File: internals Node: Passes, Prev: Portability, Up: Top, Next: RTL

Passes and Files of the Compiler
********************************

The overall control structure of the compiler is in `toplev.c'. This
file is responsible for initialization, decoding arguments, opening and
closing files, and sequencing the passes.

The parsing pass is invoked only once, to parse the entire input. Each
time a complete function definition or top-level data definition is read,
the parsing pass calls the function `rest_of_compilation' in
`toplev.c', which is responsible for all further processing necessary,
ending with output of the assembler language. All other compiler passes
run, in sequence, within `rest_of_compilation'. After
`rest_of_compilation' returns from compiling a function definition,
the storage used for its compilation is entirely freed.

Here is a list of all the passes of the compiler and their source files.
Also included is a description of where debugging dumps can be requested
with `-d' switches.

* Parsing. This pass reads the entire text of a function definition,
  constructing a syntax tree. The tree representation does not entirely
  follow C syntax, because it is intended to support other languages as
  well. C data type analysis is also done in this pass, and every tree
  node that represents an expression has a data type attached. Variables
  are represented as declaration nodes.

  Constant folding and associative-law simplifications are also done
  during this pass.

  The source files of the parsing pass are `parse.y', `decl.c',
  `typecheck.c', `stor-layout.c', `fold-const.c', and `tree.c'. The
  last three are intended to be language-independent. There are also
  header files `parse.h', `c-tree.h', `tree.h' and `tree.def'. The last
  two define the format of the tree representation.

* RTL generation. This pass converts the tree structure for one
  function into RTL code.

  This is where the bulk of target-parameter-dependent code is found,
  since often it is necessary for strategies to apply only when certain
  standard kinds of instructions are available. The purpose of named
  instruction patterns is to provide this information to the RTL
  generation pass.

  Optimization is done in this pass for `if'-conditions that are
  comparisons, boolean operations or conditional expressions. Tail
  recursion is detected at this time also. Decisions are made about how
  best to arrange loops and how to output `switch' statements.

  The files of the RTL generation pass are `stmt.c', `expr.c',
  `explow.c', `expmed.c', `optabs.c' and `emit-rtl.c'. Also, the file
  `insn-emit.c', generated from the machine description by the program
  `genemit', is used in this pass. The header file `expr.h' is used for
  communication within this pass.

  The header files `insn-flags.h' and `insn-codes.h', generated from
  the machine description by the programs `genflags' and `gencodes',
  tell this pass which standard names are available for use and which
  patterns correspond to them.

  Aside from debugging information output, none of the following passes
  refers to the tree structure representation of the function.

  The switch `-dr' causes a debugging dump of the RTL code after this
  pass. This dump file's name is made by appending `.rtl' to the
  input file name.

* Jump optimization. This pass simplifies jumps to the following
  instruction, jumps across jumps, and jumps to jumps. It deletes
  unreferenced labels and unreachable code, except that unreachable code
  that contains a loop is not recognized as unreachable in this pass.
  (Such loops are deleted later in the basic block analysis.)

  Jump optimization is performed two or three times. The first time is
  immediately following RTL generation.

  The source file of this pass is `jump.c'.

  The switch `-dj' causes a debugging dump of the RTL code after this
  pass is run for the first time. This dump file's name is made by
  appending `.jump' to the input file name.

* Register scan. This pass finds the first and last use of each
  register, as a guide for common subexpression elimination. Its source
  is in `regclass.c'.

* Common subexpression elimination. This pass also does constant
  propagation. Its source file is `cse.c'. If constant propagation
  causes conditional jumps to become unconditional or to become no-ops,
  jump optimization is run again when CSE is finished.

  The switch `-ds' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.cse' to
  the input file name.

* Loop optimization. This pass moves constant expressions out of loops.
  Its source file is `loop.c'.

  The switch `-dL' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.loop' to
  the input file name.

* Stupid register allocation is performed at this point in a
  nonoptimizing compilation. It does a little data flow analysis as
  well. When stupid register allocation is in use, the next pass
  executed is the reloading pass; the others in between are skipped.

  The source file is `stupid.c', with header file `stupid.h'
  used for communication with the RTL generation pass.

* Data flow analysis (`flow.c'). This pass divides the program
  into basic blocks (and in the process deletes unreachable loops); then
  it computes which pseudo-registers are live at each point in the
  program, and makes the first instruction that uses a value point at
  the instruction that computed the value.

  This pass also deletes computations whose results are never used, and
  combines memory references with add or subtract instructions to make
  autoincrement or autodecrement addressing.

  The switch `-df' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.flow' to
  the input file name. If stupid register allocation is in use, this
  dump file reflects the full results of such allocation.

* Instruction combination (`combine.c'). This pass attempts to
  combine groups of two or three instructions that are related by data
  flow into single instructions. It combines the RTL expressions for
  the instructions by substitution, simplifies the result using algebra,
  and then attempts to match the result against the machine description.

  The switch `-dc' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.combine'
  to the input file name.

* Register class preferencing. The RTL code is scanned to find out
  which register class is best for each pseudo register. The source file
  is `regclass.c'.

* Local register allocation (`local-alloc.c'). This pass allocates
  hard registers to pseudo registers that are used only within one basic
  block. Because the basic block is linear, it can use fast and powerful
  techniques to do a very good job.

  The switch `-dl' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.lreg' to
  the input file name.

* Global register allocation (`global-alloc.c'). This pass
  allocates hard registers for the remaining pseudo registers (those
  whose life spans are not contained in one basic block).

* Reloading. This pass finds instructions that are invalid because a
  value has failed to end up in a register, or has ended up in a
  register of the wrong kind. It fixes up these instructions by
  reloading the problematical values into registers temporarily.
  Additional instructions are generated to do the copying.

  Source files are `reload.c' and `reload1.c', plus the header
  `reload.h' used for communication between them.

  The switch `-dg' causes a debugging dump of the RTL code after
  this pass. This dump file's name is made by appending `.greg' to
  the input file name.

* Jump optimization is repeated, this time including cross-jumping.

* Final. This pass outputs the assembler code for the function. It is
  also responsible for identifying no-op move instructions and spurious
  test and compare instructions. The function entry and exit sequences
  are generated directly as assembler code in this pass; they never
  exist as RTL. Pseudo registers that did not get hard registers are
  given stack slots in this pass.

  The source files are `final.c' plus `insn-output.c'; the
  latter is generated automatically from the machine description by the
  tool `genoutput'. The header file `conditions.h' is used
  for communication between these files.

* Debugging information output. This is run after final because it must
  output the stack slot offsets for pseudo registers that did not get
  hard registers. Source files are `dbxout.c' for DBX symbol table
  format and `symout.c' for GDB's own symbol table format.
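Two of the jump simplifications listed above, retargeting a jump to a jump and deleting a jump to the following instruction, can be sketched on a toy instruction list. This is not how `jump.c' works on RTL: the `struct insn' representation and all function names are hypothetical, only unconditional jumps are modeled, and jumps across jumps, label deletion and unreachable-code removal are left out.

```c
#include <stddef.h>

/* Toy instruction kinds; a real insn stream is RTL.  */
enum insn_kind { INSN_LABEL, INSN_JUMP, INSN_OTHER, INSN_DELETED };

struct insn {
  enum insn_kind kind;
  int label;   /* label number (INSN_LABEL) or target label (INSN_JUMP) */
};

/* Index of the definition of LABEL, or N if it is not defined.  */
static size_t
find_label (const struct insn *insns, size_t n, int label)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (insns[i].kind == INSN_LABEL && insns[i].label == label)
      return i;
  return n;
}

/* Index of the first insn at or after START that is neither a label
   nor deleted, or N if there is none.  */
static size_t
next_real_insn (const struct insn *insns, size_t n, size_t start)
{
  size_t i = start;

  while (i < n && (insns[i].kind == INSN_LABEL
                   || insns[i].kind == INSN_DELETED))
    i++;
  return i;
}

static void
simplify_jumps (struct insn *insns, size_t n)
{
  size_t i, hops, lab, dest;

  for (i = 0; i < n; i++)
    {
      if (insns[i].kind != INSN_JUMP)
        continue;

      /* Jump to a jump: aim straight at the final target.  At most N
         hops are followed, so a cycle of jumps cannot loop forever.  */
      for (hops = 0; hops < n; hops++)
        {
          lab = find_label (insns, n, insns[i].label);
          dest = lab < n ? next_real_insn (insns, n, lab + 1) : n;
          if (dest >= n || dest == i || insns[dest].kind != INSN_JUMP)
            break;
          insns[i].label = insns[dest].label;
        }

      /* Jump to the following instruction: only labels or deleted insns
         lie between the jump and its target, so the jump does nothing.  */
      lab = find_label (insns, n, insns[i].label);
      if (lab < n && lab > i && next_real_insn (insns, n, i + 1) > lab)
        insns[i].kind = INSN_DELETED;
    }
}
```

With the stream "jump L1; other; L1: jump L2; other; L2: other", the first jump ends up aimed directly at L2, while a jump whose target label immediately follows it is marked deleted.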

Some additional files are used by all or many passes:

* Every pass uses `machmode.def', which defines the machine modes.

* All the passes that work with RTL use the header files `rtl.h'
  and `rtl.def', and subroutines in file `rtl.c'. The
  tools `gen*' also use these files to read and work with the
  machine description RTL.

* Several passes refer to the header file `insn-config.h' which
  contains a few parameters (C macro definitions) generated
  automatically from the machine description RTL by the tool
  `genconfig'.

* Several passes use the instruction recognizer, which consists of
  `recog.c' and `recog.h', plus the files `insn-recog.c'
  and `insn-extract.c' that are generated automatically from the
  machine description by the tools `genrecog' and `genextract'.

* Several passes use the header files `regs.h', which defines the
  information recorded about pseudo register usage, and `basic-block.h',
  which defines the information recorded about basic blocks.

* `hard-reg-set.h' defines the type `HARD_REG_SET', a bit-vector
  with a bit for each hard register, and some macros to manipulate it.
  This type is just `int' if the machine has few enough hard registers;
  otherwise it is an array of `int' and some of the macros expand
  into loops.
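For the case where one `int' is not enough, a `HARD_REG_SET' along the lines just described might look like the following sketch. The register count, the word-size calculation and all the macro definitions are assumptions made for illustration (the macro names merely echo the style of `hard-reg-set.h'). In the one-word case the type would simply be a plain integer and each macro a single expression.

```c
#include <limits.h>

/* Number of hard registers: an arbitrary value chosen so that the set
   needs more than one word on a machine with 32-bit ints.  */
#define FIRST_PSEUDO_REGISTER 40

#define HARD_REG_ELT_BITS ((int) (sizeof (unsigned int) * CHAR_BIT))
#define HARD_REG_SET_WORDS \
  ((FIRST_PSEUDO_REGISTER + HARD_REG_ELT_BITS - 1) / HARD_REG_ELT_BITS)

/* One bit per hard register, spread over an array of words.  */
typedef unsigned int HARD_REG_SET[HARD_REG_SET_WORDS];

/* Single-register operations select one word and one bit in it.  */
#define SET_HARD_REG_BIT(SET, R) \
  ((SET)[(R) / HARD_REG_ELT_BITS] |= 1u << ((R) % HARD_REG_ELT_BITS))
#define CLEAR_HARD_REG_BIT(SET, R) \
  ((SET)[(R) / HARD_REG_ELT_BITS] &= ~(1u << ((R) % HARD_REG_ELT_BITS)))
#define TEST_HARD_REG_BIT(SET, R) \
  (((SET)[(R) / HARD_REG_ELT_BITS] >> ((R) % HARD_REG_ELT_BITS)) & 1u)

/* Whole-set operations expand into loops over the words.  */
#define CLEAR_HARD_REG_SET(SET) \
  do { int i_; for (i_ = 0; i_ < HARD_REG_SET_WORDS; i_++) \
         (SET)[i_] = 0; } while (0)
#define IOR_HARD_REG_SET(TO, FROM) \
  do { int i_; for (i_ = 0; i_ < HARD_REG_SET_WORDS; i_++) \
         (TO)[i_] |= (FROM)[i_]; } while (0)
```

Keeping every operation behind a macro is what lets the rest of the compiler stay unchanged whichever representation a given machine needs.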

File: internals Node: RTL, Prev: Passes, Up: Top, Next: Machine Desc

RTL Representation
******************

Most of the work of the compiler is done on an intermediate representation
called register transfer language. In this language, the instructions to be
output are described, pretty much one by one, in an algebraic form that
describes what the instruction does.

RTL is inspired by Lisp lists. It has both an internal form, made up of
structures that point at other structures, and a textual form that is used
in the machine description and in printed debugging dumps. The textual
form uses nested parentheses to indicate the pointers in the internal form.

* Menu:

* RTL Objects::       Expressions vs vectors vs strings vs integers.
* Accessors::         Macros to access expression operands or vector elts.
* Machine Modes::     Describing the size and format of a datum.
* Constants::         Expressions with constant values.
* Regs and Memory::   Expressions representing register contents or memory.
* Arithmetic::        Expressions representing arithmetic on other expressions.
* Comparisons::       Expressions representing comparison of expressions.
* Bit Fields::        Expressions representing bit-fields in memory or reg.
* Conversions::       Extending, truncating, floating or fixing.
* RTL Declarations::  Declaring volatility, constancy, etc.
* Side Effects::      Expressions for storing in registers, etc.
* Incdec::            Embedded side-effects for autoincrement addressing.
* Insns::             Expression types for entire insns.
* Sharing::           Some expressions are unique; others *must* be copied.

File: internals Node: RTL Objects, Prev: RTL, Up: RTL, Next: Accessors

RTL Object Types
================

RTL uses four kinds of objects: expressions, integers, strings and vectors.
Expressions are the most important ones. An RTL expression is a C
structure, but it is usually referred to with a pointer, and that pointer
type is given the typedef name `rtx'.

An integer is simply an `int'; its written form uses decimal digits.

A string is a sequence of characters. In core it is represented as a
`char *' in usual C fashion, and it is written in C syntax as well.
However, RTL never contains an empty string as such: if you write an empty
string in a machine description, it is represented in core as a null
pointer rather than as a pointer to a null character. In certain contexts,
these null pointers are valid in place of strings. Within rtl code,
strings appear only inside `symbol_ref' expressions, but they appear in
other contexts in the rtl expressions that make up machine descriptions.

A vector contains an arbitrary, specified number of pointers to
expressions. The number of elements in the vector is explicitly present in
the vector. The written form of a vector consists of square brackets
(`[...]') surrounding the elements, in sequence and with
whitespace separating them. Vectors of length zero are not created; null
pointers are used instead.
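A vector as just described, with its length stored in the object and a null pointer standing for the empty vector, could be represented along these lines. The type name, field names and helper functions are assumptions for illustration, not the definitions in `rtl.h'; the expressions themselves are left opaque.

```c
#include <stdlib.h>

struct rtx_def;                 /* expressions are left opaque here */
typedef struct rtx_def *rtx;

/* A vector records its own length, followed by that many expression
   pointers.  */
typedef struct rtvec_def {
  int num_elem;
  rtx elem[];                   /* C99 flexible array member */
} *rtvec;

/* Allocate a vector of N (not negative) elements, all null.  A
   zero-length vector is never created: a null pointer stands for it.  */
static rtvec
make_rtvec (int n)
{
  rtvec v;
  int i;

  if (n == 0)
    return NULL;
  v = malloc (sizeof *v + n * sizeof (rtx));
  if (v == NULL)
    abort ();
  v->num_elem = n;
  for (i = 0; i < n; i++)
    v->elem[i] = NULL;
  return v;
}

/* Length of V, treating a null pointer as the empty vector.  */
static int
rtvec_length (rtvec v)
{
  return v == NULL ? 0 : v->num_elem;
}
```

Because the empty vector is a null pointer, code that walks a vector has to check for null before reading the length, which is what `rtvec_length' does.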

Expressions are classified by "expression code". The expression code
is a name defined in `rtl.def', which is also (in upper case) a C
enumeration constant. The possible expression codes and their meanings are
machine-independent. The code of an rtx can be extracted with the macro
`GET_CODE (X)' and altered with `PUT_CODE (X, NEWCODE)'.

The expression code determines how many operands the expression contains,
and what kinds of objects they are. In RTL, unlike Lisp, you cannot tell
by looking at an operand what kind of object it is. Instead, you must know
from its context---from the expression code of the containing expression.
For example, in an expression of code `subreg', the first operand is
to be regarded as an expression and the second operand as an integer. In
an expression of code `plus', there are two operands, both of which
are to be regarded as expressions. In a `symbol_ref' expression,
there is one operand, which is to be regarded as a string.

Expressions are written as parentheses containing the name of the
expression type, its flags and machine mode if any, and then the operands
of the expression (separated by spaces).

In a few contexts a null pointer is valid where an expression is normally
wanted. The written form of this is `(nil)'.

  520. File: internals Node: Accessors, Prev: RTL Objects, Up: RTL, Next: Machine Modes
  521. Access to Operands
  522. ==================
  523. For each expression type `rtl.def' specifies the number of contained
  524. objects and their kinds, with four possibilities: `e' for expression
  525. (actually a pointer to an expression), `i' for integer, `s' for
  526. string, and `E' for vector of expressions. The sequence of letters
  527. for an expression code is called its "format". Thus, the format of
  528. `subreg' is `ei'.
  529. Two other format characters are used occasionally: `u' and `0'.
  530. `u' is equivalent to `e' except that it is printed differently in
  531. debugging dumps, and `0' means a slot whose contents do not fit any
  532. normal category. `0' slots are not printed at all in dumps, and are
  533. often used in special ways by small parts of the compiler.
  534. There are macros to get the number of operands and the format of an
  535. expression code:
  536. `GET_RTX_LENGTH (CODE)'
  537. Number of operands of an rtx of code CODE.
  538. `GET_RTX_FORMAT (CODE)'
  539. The format of an rtx of code CODE, as a C string.
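
The format string lets generic code treat every expression code uniformly. The sketch below models a tiny format table. The entries for `const_int', `symbol_ref', `plus' and `subreg' follow the descriptions in this manual. The vector-valued code (called `PARALLEL' here), the table itself and `count_expr_operands' are illustrative assumptions. The sketch also assumes that each operand has exactly one format letter, so the operand count equals the length of the format string.

```c
#include <string.h>
#include <assert.h>

/* Hypothetical miniature of the tables derived from `rtl.def'. */
enum rtx_code { CONST_INT, SYMBOL_REF, PLUS, SUBREG, PARALLEL, NUM_RTX_CODE };

static const char *const rtx_format[NUM_RTX_CODE] = {
  "i",    /* const_int: one integer */
  "s",    /* symbol_ref: one string */
  "ee",   /* plus: two expressions */
  "ei",   /* subreg: an expression, then a word number */
  "E"     /* illustrative: one vector of expressions */
};

#define GET_RTX_FORMAT(CODE) (rtx_format[(int) (CODE)])
/* Assumption: one format letter per operand. */
#define GET_RTX_LENGTH(CODE) ((int) strlen (GET_RTX_FORMAT (CODE)))

/* Count the operands of CODE that hold expressions (`e' or `u'). */
static int count_expr_operands (enum rtx_code code)
{
  const char *fmt = GET_RTX_FORMAT (code);
  int i, n = 0;

  for (i = 0; i < GET_RTX_LENGTH (code); i++)
    if (fmt[i] == 'e' || fmt[i] == 'u')
      n++;
  return n;
}
```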

Operands of expressions are accessed using the macros `XEXP',
`XINT' and `XSTR'. Each of these macros takes two arguments: an
expression-pointer (rtx) and an operand number (counting from zero). Thus,

     XEXP (x, 2)

accesses operand 2 of expression X, as an expression.

     XINT (x, 2)

accesses the same operand as an integer. `XSTR', used in the same
fashion, would access it as a string.

Any operand can be accessed as an integer, as an expression or as a string.
You must choose the correct method of access for the kind of value actually
stored in the operand. You determine this from the expression code of
the containing expression, which is also how you know how many operands
there are.

For example, if X is a `subreg' expression, you know that it has
two operands which can be correctly accessed as `XEXP (x, 0)' and
`XINT (x, 1)'. If you did `XINT (x, 0)', you would get the
address of the expression operand but cast as an integer; that might
occasionally be useful, but it would be cleaner to write `(int) XEXP (x, 0)'.
`XEXP (x, 1)' would also compile without error, and would return the
second, integer operand cast as an expression pointer, which would probably
result in a crash when accessed. Nothing stops you from writing
`XEXP (x, 28)' either, but this will access memory past the end of the
expression with unpredictable results.
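
Here is a minimal sketch of the access macros. It assumes a hypothetical layout in which each operand slot is a union of an integer, a string and an expression pointer. With that layout, `XEXP', `XINT' and `XSTR' just select different union members of the same slot, which is why nothing prevents the wrong choice. The hypothetical walker `count_rtx_nodes' checks the format letter before following an operand, so it never dereferences the integer word number of a `subreg'. The one-integer format given to `reg' here is a simplifying assumption.

```c
#include <stddef.h>
#include <assert.h>

/* Hypothetical, simplified rtx layout; the real `rtl.h' differs. */
enum rtx_code { REG, CONST_INT, PLUS, SUBREG, NUM_RTX_CODE };
static const char *const rtx_format[NUM_RTX_CODE] = { "i", "i", "ee", "ei" };

struct rtx_def;
typedef struct rtx_def *rtx;
typedef union { int rt_int; const char *rt_str; rtx rt_exp; } rtunion;
struct rtx_def { enum rtx_code code; rtunion fld[2]; };

/* Each accessor picks a different member of the same operand slot. */
#define GET_CODE(X)  ((X)->code)
#define XEXP(X, N)   ((X)->fld[N].rt_exp)
#define XINT(X, N)   ((X)->fld[N].rt_int)
#define XSTR(X, N)   ((X)->fld[N].rt_str)

/* Count the rtx objects in X, following only the operands whose
   format letter says they hold an expression. */
static int count_rtx_nodes (rtx x)
{
  const char *fmt;
  int i, n = 1;

  if (x == NULL)
    return 0;
  fmt = rtx_format[GET_CODE (x)];
  for (i = 0; fmt[i] != '\0'; i++)
    if (fmt[i] == 'e')
      n += count_rtx_nodes (XEXP (x, i));
  return n;
}
```

Calling `XEXP (x, 1)' on the `subreg' in such a tree would reinterpret its word number as a pointer, which is the crash described above.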

Access to operands which are vectors is more complicated. You can use the
macro `XVEC' to get the vector-pointer itself, or the macros
`XVECEXP' and `XVECLEN' to access the elements and length of a
vector.

`XVEC (EXP, IDX)'
     Access the vector-pointer which is operand number IDX in EXP.

`XVECLEN (EXP, IDX)'
     Access the length (number of elements) in the vector which is
     in operand number IDX in EXP. This value is an `int'.

`XVECEXP (EXP, IDX, ELTNUM)'
     Access element number ELTNUM in the vector which is
     in operand number IDX in EXP. This value is an `rtx'.

     It is up to you to make sure that ELTNUM is not negative
     and is less than `XVECLEN (EXP, IDX)'.

All the macros defined in this section expand into lvalues and therefore
can be used to assign the operands, lengths and vector elements as well as
to access them.
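
The vector macros can be sketched the same way. Several parts of this sketch are hypothetical: the vector representation (a length followed by a C99 flexible array of element pointers), the single-slot `struct rtx_def', and `make_vector_operand'. Each macro expands into a member access, so the same expression serves both to read and to assign, as stated above.

```c
#include <stdlib.h>
#include <assert.h>

/* Hypothetical, simplified layout; the real vector type differs. */
struct rtx_def;
typedef struct rtx_def *rtx;
typedef struct rtvec_def { int num_elem; rtx elem[]; } *rtvec;
typedef union { int rt_int; rtx rt_exp; rtvec rt_vec; } rtunion;
struct rtx_def { int code; rtunion fld[1]; };

#define XVEC(EXP, IDX)             ((EXP)->fld[IDX].rt_vec)
#define XVECLEN(EXP, IDX)          (XVEC (EXP, IDX)->num_elem)
#define XVECEXP(EXP, IDX, ELTNUM)  (XVEC (EXP, IDX)->elem[ELTNUM])

/* Give operand IDX of EXP a fresh vector of LEN null elements. */
static void make_vector_operand (rtx exp, int idx, int len)
{
  rtvec v = malloc (sizeof *v + len * sizeof (rtx));
  int i;

  assert (v != NULL);
  XVEC (exp, idx) = v;          /* the macros are lvalues ... */
  XVECLEN (exp, idx) = len;     /* ... so they can be assigned */
  for (i = 0; i < len; i++)
    XVECEXP (exp, idx, i) = NULL;
}
```

The macros do no bounds checking, so keeping ELTNUM in range remains the caller's job.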

File: internals Node: Machine Modes, Prev: Accessors, Up: RTL, Next: Constants
Machine Modes
=============

A machine mode describes a size of data object and the representation used
for it. In the C code, machine modes are represented by an enumeration
type, `enum machine_mode'. Each rtl expression has room for a machine
mode and so do certain kinds of tree expressions (declarations and types,
to be precise).

In debugging dumps and machine descriptions, the machine mode of an RTL
expression is written after the expression code with a colon to separate
them. The letters `mode' which appear at the end of each machine mode
name are omitted. For example, `(reg:SI 38)' is a `reg'
expression with machine mode `SImode'. If the mode is
`VOIDmode', it is not written at all.
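
The notation rule can be written as a short C function. `write_code_and_mode' is a hypothetical helper, not the compiler's dump routine. It takes the code name and the full mode name as strings, strips the trailing `mode', and omits `VOIDmode' entirely.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Write CODE into BUF, followed by ":" and MODE_NAME minus its trailing
   "mode", unless the mode is VOIDmode, which is not written at all. */
static void write_code_and_mode (char *buf, size_t size,
                                 const char *code, const char *mode_name)
{
  size_t len = strlen (mode_name);

  if (strcmp (mode_name, "VOIDmode") == 0)
    snprintf (buf, size, "%s", code);
  else
    {
      if (len >= 4 && strcmp (mode_name + len - 4, "mode") == 0)
        len -= 4;                       /* drop the "mode" suffix */
      snprintf (buf, size, "%s:%.*s", code, (int) len, mode_name);
    }
}
```

For example, the code `reg' with `SImode' yields `reg:SI', which is the prefix of `(reg:SI 38)'.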

Here is a table of machine modes.

`QImode'
     "Quarter-Integer" mode represents a single byte treated as an integer.

`HImode'
     "Half-Integer" mode represents a two-byte integer.

`SImode'
     "Single Integer" mode represents a four-byte integer.

`DImode'
     "Double Integer" mode represents an eight-byte integer.

`TImode'
     "Tetra Integer" (?) mode represents a sixteen-byte integer.

`SFmode'
     "Single Floating" mode represents a single-precision (four byte) floating
     point number.

`DFmode'
     "Double Floating" mode represents a double-precision (eight byte) floating
     point number.

`TFmode'
     "Tetra Floating" mode represents a quadruple-precision (sixteen byte)
     floating point number.

`BLKmode'
     "Block" mode represents values that are aggregates to which none of
     the other modes apply. In rtl, only memory references can have this mode,
     and only if they appear in string-move or vector instructions. On machines
     which have no such instructions, `BLKmode' will not appear in RTL.

`VOIDmode'
     Void mode means the absence of a mode or an unspecified mode.
     For example, RTL expressions of code `const_int' have mode
     `VOIDmode' because they can be taken to have whatever mode the context
     requires. In debugging dumps of RTL, `VOIDmode' is expressed by
     the absence of any mode.

`EPmode'
     "Entry Pointer" mode is intended to be used for function variables in
     Pascal and other block structured languages. Such values contain
     both a function address and a static chain pointer for access to
     automatic variables of outer levels. This mode is only partially
     implemented since C does not use it.

`CSImode, ...'
     "Complex Single Integer" mode stands for a complex number represented
     as a pair of `SImode' integers. Any of the integer and floating modes
     may have `C' prefixed to its name to obtain a complex number mode.
     For example, there are `CQImode', `CSFmode', and `CDFmode'.
     Since C does not support complex numbers, these machine modes are only
     partially implemented.

`BImode'
     This is the machine mode of a bit-field in a structure. It is used
     only in the syntax tree, never in RTL, and in the syntax tree it appears
     only in declaration nodes. In C, it appears only in `FIELD_DECL'
     nodes for structure fields defined with a bit size.

The machine description defines `Pmode' as a C macro which expands
into the machine mode used for addresses. Normally this is `SImode'.

The only modes which a machine description must support are
`QImode', `SImode', `SFmode' and `DFmode'. The
compiler will attempt to use `DImode' for two-word structures and
unions, but it would not be hard to program it to avoid this. Likewise,
you can arrange for the C type `short int' to avoid using
`HImode'. In the long term it would be desirable to make the set of
available machine modes machine-dependent and eliminate all assumptions
about specific machine modes or their uses from the machine-independent
code of the compiler.

Here are some C macros that relate to machine modes:

`GET_MODE (X)'
     Returns the machine mode of the rtx X.

`PUT_MODE (X, NEWMODE)'
     Alters the machine mode of the rtx X to be NEWMODE.

`GET_MODE_SIZE (M)'
     Returns the size in bytes of a datum of mode M.

`GET_MODE_BITSIZE (M)'
     Returns the size in bits of a datum of mode M.

`GET_MODE_UNIT_SIZE (M)'
     Returns the size in bytes of the subunits of a datum of mode M.
     This is the same as `GET_MODE_SIZE' except in the case of
     complex modes and `EPmode'. For a complex mode, the unit size is the
     size of the real or imaginary part; for `EPmode', it is the size of
     the function pointer or of the context pointer.
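
The size macros can be modeled with per-mode lookup tables. The whole sketch is an assumption. It lists only a few modes and takes their byte sizes from the table earlier in this node. It assumes 8-bit bytes for `GET_MODE_BITSIZE', and it treats a complex mode as a pair of its component mode, so that its unit size is the component's size. `mode_nunits' is a hypothetical helper.

```c
#include <assert.h>

/* Hypothetical, trimmed mode table using the sizes listed above. */
enum machine_mode { QImode, HImode, SImode, DImode, SFmode, DFmode,
                    CSImode, CDFmode, NUM_MACHINE_MODES };

#define BITS_PER_UNIT 8   /* assumption: 8-bit bytes */

/* Size in bytes, and size in bytes of each subunit. */
static const int mode_size[NUM_MACHINE_MODES]      = { 1, 2, 4, 8, 4, 8, 8, 16 };
static const int mode_unit_size[NUM_MACHINE_MODES] = { 1, 2, 4, 8, 4, 8, 4, 8 };

#define GET_MODE_SIZE(M)      (mode_size[(int) (M)])
#define GET_MODE_BITSIZE(M)   (GET_MODE_SIZE (M) * BITS_PER_UNIT)
#define GET_MODE_UNIT_SIZE(M) (mode_unit_size[(int) (M)])

/* Number of subunits in a datum of mode M: 2 for complex modes, else 1. */
static int mode_nunits (enum machine_mode m)
{
  return GET_MODE_SIZE (m) / GET_MODE_UNIT_SIZE (m);
}
```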

File: internals Node: Constants, Prev: Machine Modes, Up: RTL, Next: Regs and Memory
Constant Expression Types
=========================

The simplest RTL expressions are those that represent constant values.

`(const_int I)'
     This type of expression represents the integer value I. I
     is customarily accessed with the macro `INTVAL' as in
     `INTVAL (exp)', which is equivalent to `XINT (exp, 0)'.

     There is only one expression object for the integer value zero;
     it is the value of the variable `const0_rtx'. Likewise, the
     only expression for integer value one is found in `const1_rtx'.
     Any attempt to create an expression of code `const_int' and
     value zero or one will return `const0_rtx' or `const1_rtx'
     as appropriate.

`(const_double:M I0 I1)'
     Represents a floating point constant value of mode M. The two
     integers I0 and I1 together contain the bits of a
     `double' value. To convert them to a `double', do

          union { double d; int i[2];} u;
          u.i[0] = XINT (x, 0);
          u.i[1] = XINT (x, 1);

     and then refer to `u.d'. The value of the constant is
     represented as a double in this fashion even if the value represented
     is single-precision.

     `dconst0_rtx' and `fconst0_rtx' are `CONST_DOUBLE'
     expressions with value 0 and modes `DFmode' and `SFmode', respectively.

`(symbol_ref SYMBOL)'
     Represents the value of an assembler label for data. SYMBOL is
     a string that describes the name of the assembler label. If it starts
     with a `*', the label is the rest of SYMBOL not including
     the `*'. Otherwise, the label is SYMBOL, prefixed with
     `_'.

`(label_ref LABEL)'
     Represents the value of an assembler label for code. It contains one
     operand, an expression, which must be a `code_label' that appears
     in the instruction sequence to identify the place where the label
     should go.

     The reason for using a distinct expression type for code label
     references is so that jump optimization can distinguish them.

`(const EXP)'
     Represents a constant that is the result of an assembly-time
     arithmetic computation. The operand, EXP, is an expression that
     contains only constants (`const_int', `symbol_ref' and
     `label_ref' expressions) combined with `plus' and
     `minus'. However, not all combinations are valid, since the
     assembler cannot do arbitrary arithmetic on relocatable symbols.
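
The `symbol_ref' naming rule described above is simple enough to state in code. `symbol_label' is a hypothetical helper, not the compiler's output routine. It writes the assembler label for a given SYMBOL string into a caller-supplied buffer.

```c
#include <stdio.h>
#include <string.h>
#include <assert.h>

/* Assembler label for a `symbol_ref' string SYMBOL: a leading `*' is
   stripped; otherwise `_' is prefixed.  The result is written to BUF. */
static void symbol_label (char *buf, size_t size, const char *symbol)
{
  if (symbol[0] == '*')
    snprintf (buf, size, "%s", symbol + 1);
  else
    snprintf (buf, size, "_%s", symbol);
}
```

So `main' becomes `_main', while `*L5' becomes `L5'.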

File: internals Node: Regs and Memory, Prev: Constants, Up: RTL, Next: Arithmetic
Registers and Memory
====================

Here are the RTL expression types for describing access to machine
registers and to main memory.

`(reg:M N)'
     For small values of the integer N (less than
     `FIRST_PSEUDO_REGISTER'), this stands for a reference to machine
     register number N: a "hard register". For larger values of
     N, it stands for a temporary value or "pseudo register".
     The compiler's strategy is to generate code assuming an unlimited
     number of such pseudo registers, and later convert them into hard
     registers or into memory references.

     The symbol `FIRST_PSEUDO_REGISTER' is defined by the machine
     description, since the number of hard registers on the machine is an
     invariant characteristic of the machine. Note, however, that not
     all of the machine registers must be general registers. All the
     machine registers that can be used for storage of data are given
     hard register numbers, even those that can be used only in certain
     instructions or can hold only certain types of data.

     Each pseudo register number used in a function's rtl code is
     represented by a unique `reg' expression.

     M is the machine mode of the reference. It is necessary because
     machines can generally refer to each register in more than one mode.
     For example, a register may contain a full word but there may be
     instructions to refer to it as a half word or as a single byte, as
     well as instructions to refer to it as a floating point number of
     various precisions.

     Even for a register that the machine can access in only one mode,
     the mode must always be specified.

     A hard register may be accessed in various modes throughout one
     function, but each pseudo register is given a natural mode
     and is accessed only in that mode. When it is necessary to describe
     an access to a pseudo register using a nonnatural mode, a `subreg'
     expression is used.

     A `reg' expression with a machine mode that specifies more than
     one word of data may actually stand for several consecutive registers.
     If in addition the register number specifies a hardware register, then
     it actually represents several consecutive hardware registers starting
     with the specified one.

     Such multi-word hardware register `reg' expressions may not be live
     across the boundary of a basic block. The lifetime analysis pass does not
     know how to record properly that several consecutive registers are
     actually live there, and therefore register allocation would be confused.
     The CSE pass must go out of its way to make sure the situation does
     not arise.

`(subreg:M REG WORDNUM)'
     `subreg' expressions are used to refer to a register in a machine
     mode other than its natural one, or to refer to one register of
     a multi-word `reg' that actually refers to several registers.

     Each pseudo-register has a natural mode. If it is necessary to
     operate on it in a different mode (for example, to perform a fullword
     move instruction on a pseudo-register that contains a single byte),
     the pseudo-register must be enclosed in a `subreg'. In such
     a case, WORDNUM is zero.

     The other use of `subreg' is to extract the individual registers
     of a multi-register value. Machine modes such as `DImode' and
     `EPmode' indicate values longer than a word, values which usually
     require two consecutive registers. To access one of the registers,
     use a `subreg' with mode `SImode' and a WORDNUM that
     says which register.

     The compilation parameter `WORDS_BIG_ENDIAN', if defined, says
     that word number zero is the most significant part; otherwise, it is
     the least significant part.

     Note that it is not valid to access a `DFmode' value in `SFmode'
     using a `subreg'. On some machines the most significant part of a
     `DFmode' value does not have the same format as a single-precision
     floating value.

`(cc0)'
     This refers to the machine's condition code register. It has no
     operands and may not have a machine mode. It may be validly used in
     only two contexts: as the destination of an assignment (in test and
     compare instructions) and in comparison operators comparing against
     zero (a `const_int' with value zero; that is to say, `const0_rtx').

     There is only one expression object of code `cc0'; it is the
     value of the variable `cc0_rtx'. Any attempt to create an
     expression of code `cc0' will return `cc0_rtx'.

     One special thing about the condition code register is that instructions
     can set it implicitly. On many machines, nearly all instructions set
     the condition code based on the value that they compute or store.
     It is not necessary to record these actions explicitly in the RTL
     because the machine description includes a prescription for recognizing
     the instructions that do so (by means of the macro `NOTICE_UPDATE_CC').
     Only instructions whose sole purpose is to set the condition code,
     and instructions that use the condition code, need to mention `(cc0)'.

`(pc)'
     This represents the machine's program counter. It has no operands and
     may not have a machine mode. `(pc)' may be validly used only in
     certain specific contexts in jump instructions.

     There is only one expression object of code `pc'; it is the value of
     the variable `pc_rtx'. Any attempt to create an expression of code
     `pc' will return `pc_rtx'.

     All instructions that do not jump alter the program counter implicitly,
     but there is no need to mention this in the RTL.

`(mem:M ADDR)'
     This rtx represents a reference to main memory at an address
     represented by the expression ADDR. M specifies how
     large a unit of memory is accessed.
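
Some of the register rules above reduce to simple arithmetic. In the sketch below, several things are hypothetical: the values of `FIRST_PSEUDO_REGISTER' and of a word-size parameter named `UNITS_PER_WORD' here, the definition of `WORDS_BIG_ENDIAN', and all function names. A real machine description supplies its own. The sketch assumes that a multi-word value in hard registers occupies its size rounded up to whole words. It also assumes that `(subreg:SI (reg:M N) WORDNUM)' on a hard register N designates register N + WORDNUM.

```c
#include <assert.h>

/* Hypothetical machine parameters, for illustration only. */
#define FIRST_PSEUDO_REGISTER 16   /* hard registers are 0..15 */
#define UNITS_PER_WORD 4           /* bytes per word */
#define WORDS_BIG_ENDIAN           /* defined: word 0 is most significant */

/* Nonzero if register number N denotes a hard register. */
static int hard_register_p (int n)
{
  return n < FIRST_PSEUDO_REGISTER;
}

/* Consecutive hard registers used by a value of MODE_SIZE bytes,
   assuming it is rounded up to whole words. */
static int hard_regs_needed (int mode_size)
{
  return (mode_size + UNITS_PER_WORD - 1) / UNITS_PER_WORD;
}

/* Hard register designated by (subreg:SI (reg:M REGNO) WORDNUM) when
   REGNO is a hard register: the WORDNUMth register of the group. */
static int subreg_hard_regno (int regno, int wordnum)
{
  return regno + wordnum;
}

/* Nonzero if word WORDNUM of a two-word value is its most significant. */
static int word_is_high_part (int wordnum)
{
#ifdef WORDS_BIG_ENDIAN
  return wordnum == 0;
#else
  return wordnum == 1;
#endif
}
```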

File: internals Node: Arithmetic, Prev: Regs and Memory, Up: RTL, Next: Comparisons
RTL Expressions for Arithmetic
==============================

`(plus:M X Y)'
     Represents the sum of the values represented by X and Y
     carried out in machine mode M. This is valid only if
     X and Y both are valid for mode M.

`(minus:M X Y)'
     Like `plus' but represents subtraction.

`(minus X Y)'
     Represents the result of subtracting Y from X
     for purposes of comparison. The absence of a machine mode
     in the `minus' expression indicates that the result is
     computed without overflow, as if with infinite precision.

     Of course, machines can't really subtract with infinite precision.
     However, they can pretend to do so when only the sign of the
     result will be used, which is the case when the result is stored
     in `(cc0)'. And that is the only way this kind of expression
     may validly be used: as a value to be stored in the condition codes.

`(neg:M X)'
     Represents the negation (subtraction from zero) of the value
     represented by X, carried out in mode M. X must be
     valid for mode M.

`(mult:M X Y)'
     Represents the signed product of the values represented by X and
     Y carried out in machine mode M. If
     X and Y are both valid for mode M, this is ordinary
     size-preserving multiplication. Alternatively, both X and Y
     may be valid for a different, narrower mode. This represents the
     kind of multiplication that generates a product wider than the operands.
     Widening multiplication and same-size multiplication are completely
     distinct and supported by different machine instructions; machines may
     support one but not the other.

     `mult' may be used for floating point multiplication as well.
     Then M is a floating point machine mode.

`(umult:M X Y)'
     Like `mult' but represents unsigned multiplication. It may be
     used in both same-size and widening forms, like `mult'.
     `umult' is used only for fixed-point multiplication.

`(div:M X Y)'
     Represents the quotient in signed division of X by Y,
     carried out in machine mode M. If M is a floating-point
     mode, it represents the exact quotient; otherwise, the integerized
     quotient. If X and Y are both valid for mode M,
     this is ordinary size-preserving division. Some machines have
     division instructions in which the operands and quotient widths are
     not all the same; such instructions are represented by `div'
     expressions in which the machine modes are not all the same.

`(udiv:M X Y)'
     Like `div' but represents unsigned division.

`(mod:M X Y)'
`(umod:M X Y)'
     Like `div' and `udiv' but represent the remainder instead of
     the quotient.

`(not:M X)'
     Represents the bitwise complement of the value represented by X,
     carried out in mode M, which must be a fixed-point machine mode.
     X must be valid for mode M.

`(and:M X Y)'
     Represents the bitwise logical-and of the values represented by
     X and Y, carried out in machine mode M. This is
     valid only if X and Y both are valid for mode M,
     which must be a fixed-point mode.

`(ior:M X Y)'
     Represents the bitwise inclusive-or of the values represented by
     X and Y, carried out in machine mode M. This is
     valid only if X and Y both are valid for mode M,
     which must be a fixed-point mode.

`(xor:M X Y)'
     Represents the bitwise exclusive-or of the values represented by
     X and Y, carried out in machine mode M. This is
     valid only if X and Y both are valid for mode M,
     which must be a fixed-point mode.

`(lshift:M X C)'
     Represents the result of logically shifting X left by C
     places. X must be valid for the mode M, a fixed-point
     machine mode. C must be valid for a fixed-point mode;
     which mode is determined by the mode called for in the machine
     description entry for the left-shift instruction. For example,
     on the Vax, the mode of C is `QImode' regardless of M.

     On some machines, negative values of C may be meaningful; this
     is why logical left shift and arithmetic left shift are distinguished.
     For example, Vaxes have no right-shift instructions, and right shifts
     are represented as left-shift instructions whose counts happen
     to be negative constants or else computed (in a previous instruction)
     by negation.

`(ashift:M X C)'
     Like `lshift' but for arithmetic left shift.

`(lshiftrt:M X C)'
`(ashiftrt:M X C)'
     Like `lshift' and `ashift' but for right shift.

`(rotate:M X C)'
`(rotatert:M X C)'
     Similar, but represent left and right rotation.

`(abs:M X)'
     Represents the absolute value of X, computed in mode M.
     X must be valid for M.

`(sqrt:M X)'
     Represents the square root of X, computed in mode M.
     X must be valid for M. Most often M will be
     a floating point mode.
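
The following sketch makes the Vax convention for negative shift counts concrete. It evaluates `lshift' and `ashift' on 32-bit values, treating a negative count C as a right shift by -C. It also computes the sign that a modeless `minus' stands for. Everything here is illustrative: the 32-bit width, the function names, and the use of a 64-bit intermediate to stand in for "infinite precision" are all assumptions. This is not how the compiler itself simplifies RTL.

```c
#include <stdint.h>
#include <assert.h>

/* Logical shift of a 32-bit X by C places; a negative C shifts right by -C
   and fills with zeros (hypothetical Vax-style convention). */
static uint32_t eval_lshift (uint32_t x, int c)
{
  if (c >= 32 || c <= -32)
    return 0;                        /* every bit shifted out */
  return c >= 0 ? (uint32_t) (x << c) : (uint32_t) (x >> -c);
}

/* Arithmetic shift: same as eval_lshift for C >= 0, but a right shift
   (negative C) copies the sign bit into the vacated positions. */
static uint32_t eval_ashift (uint32_t x, int c)
{
  uint32_t sign = x & 0x80000000u;

  if (c >= 0)
    return c >= 32 ? 0 : (uint32_t) (x << c);
  if (c <= -32)
    return sign ? 0xFFFFFFFFu : 0;   /* only copies of the sign remain */
  c = -c;                            /* now 1..31 */
  return (uint32_t) ((x >> c) | (sign ? ~(0xFFFFFFFFu >> c) : 0u));
}

/* Sign (-1, 0 or 1) of the modeless (minus X Y).  A 64-bit difference of
   two 32-bit values cannot overflow, standing in for infinite precision. */
static int minus_sign (int32_t x, int32_t y)
{
  int64_t d = (int64_t) x - (int64_t) y;
  return (d > 0) - (d < 0);
}
```

The interesting case is overflow. `minus_sign (INT32_MIN, 1)' returns -1, even though the same subtraction carried out in 32-bit arithmetic would wrap to a positive value.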