123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341 |
- In the compiler `continuation' means a continuation that is a lambda node.
- Non-lambda continuation arguments, such as the argument to a RETURN, are
- not referred to as continuations (the argument isn't a continuation, it
- is a variable that is bound to a continuation).
- Every node has the following fields:
- variant ; one of LITERAL, REFERENCE, LAMBDA, or CALL
- parent ; parent node
- index ; index of this node in parent, if parent is a call node
- simplified? ; true if it has already been simplified; if this is #F
- ; then all of this node's ancestors must also be unsimplified
- flag ; useful flag, all users must leave this is #F
- Literal nodes:
- value ; the value
- type ; the type of the value (important for statically typed languages,
- ; not so useful for Scheme)
- Reference nodes:
- variable ; the referenced variable; the binder of the variable must be
- ; an ancestor of the reference node
- Call nodes:
- primop ; the primitive being called
- args ; vector of argument nodes
- exits ; the number of arguments that are continuations; the continuation
- ; arguments come before the non-continuation ones
- source ; source info; used for error messages
- Primops are either trivial or nontrivial. Trivial primops only return a value
- and have no side effects. Calls to trivial primops never have continuation
- arguments and are always arguments to other calls. Calls to nontrivial primops
- may or may not have continuations and are always the body of a lambda node.
- Lambda nodes:
- type ; one of PROC, CONT, or JUMP (and maybe THROW at some point)
- name ; symbol (for debugging)
- id ; unique integer (for debugging)
- body ; the call-node that is the body of the lambda
- variables ; a list of variable records, with #Fs for ignored positions
- source ; source info; used for error messages
- protocol ; calling protocol from the source language
- block ; for use during code generation
- env ; for use when adding explicit environments
- PROC's are general procedures. The first variable of a PROC will be bound
- to the PROC's continuation.
- CONT's are continuation arguments to calls.
- JUMP's are continuations bound by LET or LETREC, whose calling points are
- known, and which are created and called within a single PROC.
- Variables:
- name ; source code name for variable (used for debugging only)
- id ; unique numeric identifier (used for debugging only)
- type ; type of variable's value
- binder ; LAMBDA node which binds this variable (or #F if none)
- refs ; list of reference nodes n for which (REFERENCE-VARIABLE n)
- ; = this variable
- flag ; useful slot, used by shapes, COPY-NODE, NODE->VECTOR, etc.
- ; all users must leave this is #F
- flags ; list of various annotations, e.g. IGNORABLE
- generate ; for whatever code generation wants
- ----------------------------------------------------------------
- The node tree has a very regular lexical structure:
- The body of every lambda node is a non-trivial call.
- The parent of every non-trivial call is a lambda node.
- Every CONT lambda is a continuation of a non-trivial call.
- Every JUMP lambda is an argument to either the LET or the LETREC
- primops (described below).
- The lambda node that binds a variable is an ancestor of every reference
- to that variable.
-
- If you start from any leaf node and follow the parent pointers up through the
- node tree, you first go through some number, possible zero, of trivial calls
- until a non-trivial call is reached. From that point on non-trivial calls
- alternate with CONT nodes until a PROC or JUMP lambda is reached. Going up
- from a PROC lambda is the same as going up from a leaf, while JUMP lambdas
- are always arguments to LET or LETREC, both of which are non-trivial.
- A basic block appears as a sequence of non-trivial calls with a single
- continuation apiece. The block begins with a PROC or JUMP lambda, or
- with a CONT lambda that is an argument to a call with two or more
- continuations, and ends with a call that has either no continuations,
- or two or more.
- Basic blocks are grouped into trees. The root of every tree is either
- a PROC or JUMP lambda, the branch points are calls with two or more
- continuations, and the leaves are jumps or returns. Within a tree
- the control flow follows the lexical structure of the program from
- parent to child (if we ignore calls to other PROCs).
- Every JUMP lambda is called from within only one PROC lambda, so a PROC
- can be considered to consist of a set of trees, the leaves of which either
- return from that PROC or jump to the top of another tree in the set.
- ----------------------------------------------------------------
- Primops:
- id ; unique symbol identifying this primop
- trivial? ; #t if this primop has does not accept a continuation
- side-effects ; one of #F, READ, WRITE, ALLOCATE, or IO
- simplify-call-proc ; simplify method
- primop-cost-proc ; cost of executing this operation
- ; (in some undisclosed metric)
- return-type-proc ; the type of the value returned (for trivial primops only)
- proc-data ; more data for the procedure primops
- cond-data ; more data for conditional primops
- code-data ; code generation data
- `procedure' primops are those that call one of their values.
- `conditional' primops are those that have more than one continuation.
- Below is a list of the standard primops. All but the last two are non-trivial.
- For the following the five primops the lambda node being called, jumped to,
- or whatever has been identified by the compiler, and the number of variables
- that the lambda node has matches the number of arguments.
- (CALL <cont> <proc> . <args>)
- (TAIL-CALL <cont-var> <proc> . <args>)
- (RETURN <cont-var> . <args>)
- (JUMP <jump-var> . <args>)
- ; (THROW <throw-var> . <args>) not yet implemented
- These are the same as the above except that the procedure has not been
- identified by the compiler. There is no UNKNOWN-JUMP because all calls
- to JUMP lambdas must be known.
- (UNKNOWN-CALL <cont> <proc> . <args>)
- (UNKNOWN-TAIL-CALL <cont> <proc> . <args>)
- (UNKNOWN-RETURN <cont-var> . <args>)
- PROC lambdas are called with either CALL or TAIL-CALL if all of their call
- sites have been identified, or with UNKNOWN-CALL or UNKNOWN-TAIL-CALL if not.
- JUMP lambdas are called using JUMP.
- LET binds random values, such as lambda nodes or the results of trivial
- calls, to variables. This primop only exists because of the requirement
- that every call have a primop; all it does is apply <cont> to <args>
- (it is called LET instead of APPLY because LET forms in the source code
- become calls to this primop).
- (LET <cont> . <args>)
- Recursive binding:
- (LETREC1 <cont>)
- (LETREC2 <cont> <id-var> <lambda1> <lambda2> ...)
- These are always used together, with the body of the continuation to LETREC1
- being a call to LETREC2. The two calls together look like:
- (LETREC1 (lambda (<id-var> <var1> ... <varN>)
- (LETREC2 <cont> <id-var> <lambda1> ... <lambdaN>)))
- which the CPS pretty-printer prints as:
- (let* (...
- ((id-var var1 ... varN) (letrec1))
- (() (letrec2 id-var lambda1 ... lambdaN))
- ...)
- ...)
- The end result is to bind <varI> to <lambdaI>. The point to the excercise
- is that lambdas occur within the scope of the variables.
- Undefined effect. This takes a continuation variable as an argument only
- so that the continuation variable is always reached.
- (UNDEFINED-EFFECT <cont-var> ...)
- Accessing and mutating the store.
- Cells are used to implement SET! on lexically bound variables. GLOBAL-SET!
- and GLOBAL-REF are used for module variables that may be set.
- (CELL-SET! <cont> <cell> <value>)
- (GLOBAL-SET! <cont> <global-var> <value>)
- (CELL-REF <cell>) ; trivial
- (GLOBAL-REF <global-var>) ; trivial
- ----------------------------------------------------------------
- Printing out the node tree.
- The following procedure:
- (define (fact n)
- (let loop ((n n) (r 1))
- (if (< n 2)
- r
- (loop (- n 1) (* n r)))))
- when converted into nodes is:
- (LAMBDAp (c_6 n_1)
- (letrec1 (LAMBDAc (x_13 loop_2)
- (letrec2 (LAMBDAc ()
- (unknown-tail-call c_6 loop_2 n_1 '1))
- x_13
- (LAMBDAp (c_8 n_3 r_4)
- (test
- (LAMBDAc ()
- (unknown-return c_8 r_4))
- (LAMBDAc ()
- (unknown-tail-call c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
- (< n_3 '2)))))))
- where LAMBDAp is a PROC lambda and LAMBDAc is a CONT lambda. Lexically bound
- variables are printed as <name>_<id> and constants as '<value>. This is not
- very readable, and larger procedures are much worse. The first step in making
- it more comprehensible is to print each lambda node separately with a marker
- to indicate where it appears in the tree.
- (LAMBDAp fact_7 (c_6 n_1)
- (letrec1 1 ^c_14))
- (LAMBDAc c_14 (x_13 loop_2)
- (letrec2 1 ^c_12 x_13 ^loop_9))
- (LAMBDAc c_12 ()
- (unknown-tail-call 0 c_6 loop_2 n_1 '1))
- (LAMBDAp loop9 (c_8 n_3 r_4)
- (test 2 ^g_10 ^g_11 (< n_3 '2)))
- (LAMBDAc g_10 ()
- (unknown-return 0 c_8 r_4))
- (LAMBDAc g_11 ()
- (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
- The labels used are the names and id's of the lambda nodes, with a ^ in front
- to distinguish them from variables. The code for each lambda is indented
- slightly more than the lambda in which it actually occurs. To make the
- distinction between continuation and non-continuation lambdas clearer the
- number of continuation arguments to a call is printed just after the primop
- (for example the first two arguments to TEST are continuations).
- The first three calls form a basic block because the first two calls have
- exactly one continuation apiece. To make this more easily seen these
- calls can be printed using a more condensed notation:
- (LAMBDAp fact_7 (c_6 n_1)
- (LET* (((x_13 loop_2) (letrec1))
- (() (letrec2 x_13 ^loop_9)))
- (unknown-tail-call 0 c_6 loop_2 n_1 '1)))
- The continuations are not printed as arguments but instead their variables
- are printed to the left of the call in a parody of Scheme's LET*. The results
- of the LETREC1 are bound to the variables X_13 and LOOP_2 as would happen with
- the real LET* (if it allowed calls to return multiple values).
- Finally, here is the way the code for FACT is actually printed:
- 7 (P fact_7 (c_6 n_1)
- 14 (LET* (((x_13 loop_2)
- (letrec1))
- 12 (() (letrec2 x_13 ^loop_9)))
- (unknown-tail-call 0 c_6 loop_2 n_1 '1)))
- 9 (P loop_9 (c_8 n_3 r_4)
- (test 2 ^g_10 ^g_11 (< n_3 '2)))
- 10 (C g_10 ()
- (unknown-return 0 c_8 r_4))
- 11 (C g_11 ()
- (unknown-tail-call 0 c_8 loop_2 (- n_3 '1) (* n_3 r_4)))
- The ID number of every lambda node is printed out at the beginning of the
- line on which the code for the lambda appears. This is redundant for the
- lambdas that are not printed as part of a LET*. The word `LAMBDA' is not
- printed. The (letrec1) call appears on a new line because the printer
- indents the calls in LET* a fixed amount.
- The reason for printing the ID numbers is so that the actual nodes can be
- obtained. Once a lambda has been printed (either by the pretty printer or
- by the regular printer), (NODE-UNHASH <id>) will return it:
- scheme-compiler> (node-unhash 9)
- '#{Node lambda loop 9}
- scheme-compiler> ,inspect ##
- '#{Node lambda loop 9}
- [0: variant] 'lambda
- [1: parent] '#{Node call letrec2}
- [2: index] 2
- [3: simplified?] #t
- [4: flag] #f
- [5: stuff-0] '#{Node call test}
- [6: stuff-1] '(#{Variable n 3} #{Variable r 4})
- [7: stuff-2] '(#{Name #} (n r) (if # r #))
- [8: stuff-3] '#{Lambda-data}
- ----------------------------------------------------------------
- Simplification.
- The factorial procedure above is how it looks when originally translated
- into a node tree. The next step in compilation is to simplify the tree,
- doing constant folding, identifying call points, and so on. The simplified
- version of FACT is:
- 7 (P fact_7 (c_6 n_1)
- 14 (LET* (((x_13 loop_2)
- (letrec1))
- 12 (() (letrec2 x_13 ^loop_9)))
- (jump 0 loop_2 n_1 '1)))
- 9 (J loop_9 (n_3 r_4)
- (test 2 ^g_10 ^g_11 (< n_3 '2)))
- 10 (C g_10 ()
- (unknown-return 0 c_6 r_4))
- 11 (C g_11 ()
- (jump 0 loop_2 (+ '-1 n_3) (* n_3 r_4)))
- The only change is that the loop has been turned into a JUMP lambda.
- ----------------------------------------------------------------
- Still to describe:
- protocol determination
- simplifier moving stuff down, duplicating, later passes move values back up
|