  1. @c -*-texinfo-*-
  2. @c This is part of the GNU Guile Reference Manual.
  3. @c Copyright (C) 2008-2011, 2013, 2015, 2018, 2019, 2020, 2022
  4. @c Free Software Foundation, Inc.
  5. @c See the file guile.texi for copying conditions.
  6. @node A Virtual Machine for Guile
  7. @section A Virtual Machine for Guile
  8. Enough about data---how does Guile run code?
  9. Code is a grammatical production of a language. Sometimes these
  10. languages are implemented using interpreters: programs that run
  11. along-side the program being interpreted, dynamically translating the
  12. high-level code to low-level code. Sometimes these languages are
  13. implemented using compilers: programs that translate high-level
  14. programs to equivalent low-level code, and pass on that low-level code
  15. to some other language implementation. Each of these languages can be
  16. thought of as a virtual machine: each offers programs an abstract machine
  17. on which to run.
  18. Guile implements a number of interpreters and compilers on different
  19. language levels. For example, there is an interpreter for the Scheme
  20. language that is itself implemented as a Scheme program compiled to a
  21. bytecode for a low-level virtual machine shipped with Guile. That
  22. virtual machine is implemented by both an interpreter---a C program that
  23. interprets the bytecodes---and a compiler---a C program that dynamically
  24. translates bytecode programs to native machine code@footnote{Even the
  25. lowest-level machine code can be thought to be interpreted by the CPU,
  26. and indeed is often implemented by compiling machine instructions to
  27. ``micro-operations''.}.
  28. This section describes the language implemented by Guile's bytecode
  29. virtual machine, as well as some examples of translations of Scheme
  30. programs to Guile's VM.
  31. @menu
  32. * Why a VM?::
  33. * VM Concepts::
  34. * Stack Layout::
  35. * Variables and the VM::
  36. * VM Programs::
  37. * Object File Format::
  38. * Instruction Set::
  39. * Just-In-Time Native Code::
  40. @end menu
  41. @node Why a VM?
  42. @subsection Why a VM?
  43. @cindex interpreter
  44. For a long time, Guile only had a Scheme interpreter, implemented in C.
  45. Guile's interpreter operated directly on the S-expression representation
  46. of Scheme source code.
  47. But while the interpreter was highly optimized and hand-tuned, it still
  48. performed many needless computations during the course of evaluating a
  49. Scheme expression. For example, application of a function to arguments
  50. needlessly consed up the arguments in a list. Evaluation of an
  51. expression like @code{(f x y)} always had to figure out whether @var{f}
  52. was a procedure, or a special form like @code{if}, or something else.
  53. The interpreter represented the lexical environment as a heap data
  54. structure, so every evaluation caused allocation, which was of course
  55. slow. Et cetera.
  56. The solution to the slow-interpreter problem was to compile the
  57. higher-level language, Scheme, into a lower-level language for which all
  58. of the checks and dispatching have already been done---the code is
  59. instead stripped to the bare minimum needed to ``do the job''.
  60. The question becomes then, what low-level language to choose? There are
  61. many options. We could compile to native code directly, but that poses
  62. portability problems for Guile, as it is a highly cross-platform
  63. project.
  64. So we want the performance gains that compilation provides, but we
  65. also want to maintain the portability benefits of a single code path.
  66. The obvious solution is to compile to a virtual machine that is
  67. present on all Guile installations.
  68. The easiest (and most fun) way to depend on a virtual machine is to
  69. implement the virtual machine within Guile itself. Guile contains a
  70. bytecode interpreter (written in C) and a Scheme to bytecode compiler
  71. (written in Scheme). This way the virtual machine provides what Scheme
  72. needs (tail calls, multiple values, @code{call/cc}) and can provide
  73. optimized inline instructions for Guile as well (GC-managed allocations,
  74. type checks, etc.).
  75. Guile also includes a just-in-time (JIT) compiler to translate bytecode
  76. to native code. Because Guile embeds a portable code generation library
  77. (@url{https://gitlab.com/wingo/lightening}), we keep the benefits of
  78. portability while also benefitting from fast native code. To avoid too
  79. much time spent in the JIT compiler itself, Guile is tuned to only emit
  80. machine code for bytecode that is called often.
  81. The rest of this section describes the VM that Guile implements, and
  82. the compiled procedures that run on it.
  83. Before moving on, though, we should note that though we spoke of the
  84. interpreter in the past tense, Guile still has an interpreter. The
  85. difference is that before, it was Guile's main Scheme implementation,
  86. and so was implemented in highly optimized C; now, it is actually
  87. implemented in Scheme, and compiled down to VM bytecode, just like any
  88. other program. (There is still a C interpreter around, used to
  89. bootstrap the compiler, but it is not normally used at runtime.)
  90. The upside of implementing the interpreter in Scheme is that we preserve
  91. tail calls and multiple-value handling between interpreted and compiled
  92. code, and with the advent of the JIT compiler in Guile 3.0 we reach the
  93. speed of the old hand-tuned C implementation; it's the best of both
  94. worlds.
  95. Also note that this decision to implement a bytecode compiler does not
  96. preclude ahead-of-time native compilation. More possibilities are
  97. discussed in @ref{Extending the Compiler}.
  98. @node VM Concepts
  99. @subsection VM Concepts
  100. The bytecode in a Scheme procedure is interpreted by a virtual machine
  101. (VM). Each thread has its own instantiation of the VM. The virtual
  102. machine executes the sequence of instructions in a procedure.
  103. Each VM instruction starts by indicating which operation it is, and then
  104. follows by encoding its source and destination operands. Each procedure
  105. declares that it has some number of local variables, including the
  106. function arguments. These local variables form the available operands
  107. of the procedure, and are accessed by index.
  108. The local variables for a procedure are stored on a stack. Calling a
  109. procedure typically enlarges the stack, and returning from a procedure
  110. shrinks it. Stack memory is exclusive to the virtual machine that owns
  111. it.
  112. In addition to their stacks, virtual machines also have access to the
  113. global memory (modules, global bindings, etc.) that is shared among other
  114. parts of Guile, including other VMs.
  115. The registers that a VM has are as follows:
  116. @itemize
  117. @item ip - Instruction pointer
  118. @item sp - Stack pointer
  119. @item fp - Frame pointer
  120. @end itemize
  121. In other architectures, the instruction pointer is sometimes called the
  122. ``program counter'' (pc). This set of registers is pretty typical for
  123. virtual machines; their exact meanings in the context of Guile's VM are
  124. described in the next section.
  125. @node Stack Layout
  126. @subsection Stack Layout
  127. The stack of Guile's virtual machine is composed of @dfn{frames}. Each
  128. frame corresponds to the application of one compiled procedure, and
  129. contains storage space for arguments, local variables, and some
  130. bookkeeping information (such as what to do after the frame is
  131. finished).
  132. While the compiler is free to do whatever it wants to, as long as the
  133. semantics of a computation are preserved, in practice every time you
  134. call a function, a new frame is created. (The notable exception of
  135. course is the tail call case, @pxref{Tail Calls}.)
  136. The structure of the top stack frame is as follows:
  137. @example
  138. | ...previous frame locals...  |
  139. +==============================+ <- fp + 3
  140. | Dynamic link                 |
  141. +------------------------------+
  142. | Virtual return address (vRA) |
  143. +------------------------------+
  144. | Machine return address (mRA) |
  145. +==============================+ <- fp
  146. | Local 0                      |
  147. +------------------------------+
  148. | Local 1                      |
  149. +------------------------------+
  150. | ...                          |
  151. +------------------------------+
  152. | Local N-1                    |
  153. \------------------------------/ <- sp
  154. @end example
  155. In the above drawing, the stack grows downward. At the beginning of a
  156. function call, the procedure being applied is in local 0, followed by
  157. the arguments from local 1. After the procedure checks that it is being
  158. passed a compatible set of arguments, the procedure allocates some
  159. additional space in the frame to hold variables local to the function.
  160. Note that once a value in a local variable slot is no longer needed,
  161. Guile is free to re-use that slot. This applies to the slots that were
  162. initially used for the callee and arguments, too. For this reason,
  163. backtraces in Guile aren't always able to show all of the arguments: it
  164. could be that the slot corresponding to that argument was re-used by
  165. some other variable.
  166. The @dfn{virtual return address} is the @code{ip} that was in effect
  167. before this program was applied. When we return from this activation
  168. frame, we will jump back to this @code{ip}. Likewise, the @dfn{dynamic
  169. link} is the offset of the @code{fp} that was in effect before this
  170. program was applied, relative to the current @code{fp}.
  171. There are two return addresses: the virtual return address (vRA), and
  172. the machine return address (mRA). The vRA is always present and
  173. indicates a bytecode address. The mRA is only present when a call is
  174. made from a function with machine code (e.g. a function that has been
  175. JIT-compiled).
  176. To prepare for a non-tail application, Guile's VM will emit code that
  177. shuffles the function to apply and its arguments into appropriate stack
  178. slots, with three free slots below them. The call then initializes
  179. those free slots to hold the machine return address (or NULL), the
  180. virtual return address, and the offset to the previous frame pointer
  181. (@code{fp}). It then gets the @code{ip} for the function being called
  182. and adjusts @code{fp} to point to the new call frame.
  183. In this way, the dynamic link links the current frame to the previous
  184. frame. Computing a stack trace involves traversing these frames.
  185. Each stack local in Guile is 64 bits wide, even on 32-bit architectures.
  186. This allows Guile to preserve its uniform treatment of stack locals
  187. while allowing for unboxed arithmetic on 64-bit integers and
  188. floating-point numbers. @xref{Instruction Set}, for more on unboxed
  189. arithmetic.
  190. As an implementation detail, we actually store the dynamic link as an
  191. offset and not an absolute value because the stack can move at runtime
  192. as it expands or during partial continuation calls. If it were an
  193. absolute value, we would have to walk the frames, relocating frame
  194. pointers.
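
The benefit of a relative dynamic link can be illustrated with a small C
model. This is a minimal sketch under stated assumptions, not Guile's
actual code: the @code{slot_t} union, the slot indices, and the function
names @code{push_frame}, @code{previous_fp} and @code{count_frames} are
all hypothetical names chosen for this example. It fills in the three
bookkeeping slots described above and then walks the frames by following
the offsets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 64-bit stack slot; the member names are illustrative. */
typedef union slot {
  uint64_t as_u64;
  const uint32_t *as_vcode;   /* bytecode address */
  const uint8_t *as_mcode;    /* machine code address */
} slot_t;

/* Assumed positions of the bookkeeping slots, counted upward from fp. */
enum { MRA = 0, VRA = 1, DYNAMIC_LINK = 2 };

/* Initialize the bookkeeping slots of a new frame at NEW_FP for a call
   made from the frame at OLD_FP.  The stack grows downward, so the new
   frame lives at a lower address than the old one. */
void push_frame(slot_t *new_fp, slot_t *old_fp, const uint32_t *vra)
{
  assert(new_fp < old_fp);
  new_fp[MRA].as_mcode = NULL;   /* no machine code in this sketch */
  new_fp[VRA].as_vcode = vra;
  new_fp[DYNAMIC_LINK].as_u64 = (uint64_t)(old_fp - new_fp);
}

/* Follow the dynamic link: an offset in slots, not an absolute pointer. */
slot_t *previous_fp(slot_t *fp)
{
  assert(fp[DYNAMIC_LINK].as_u64 > 0);
  return fp + fp[DYNAMIC_LINK].as_u64;
}

/* Count the frames between FP and the top (oldest end) of the stack. */
size_t count_frames(slot_t *fp, slot_t *stack_top)
{
  size_t n = 0;
  while (fp < stack_top)
    {
      n++;
      fp = previous_fp(fp);
    }
  return n;
}
```

If the whole array of slots is copied to a different address, the same
walk still works on the copy without touching any slot, because every
link is relative to the frame that holds it.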
  195. @node Variables and the VM
  196. @subsection Variables and the VM
  197. Consider the following Scheme code as an example:
  198. @example
  199. (define (foo a)
  200.   (lambda (b) (vector foo a b)))
  201. @end example
  202. Within the lambda expression, @code{foo} is a top-level variable,
  203. @code{a} is a lexically captured variable, and @code{b} is a local
  204. variable.
  205. Another way to refer to @code{a} and @code{b} is to say that @code{a} is
  206. a ``free'' variable, since it is not defined within the lambda, and
  207. @code{b} is a ``bound'' variable. These are the terms used in the
  208. @dfn{lambda calculus}, a mathematical notation for describing functions.
  209. The lambda calculus is useful because it is a language in which to
  210. reason precisely about functions and variables. It is especially good
  211. at describing scope relations, and it is for that reason that we mention
  212. it here.
  213. Guile allocates all variables on the stack. When a lexically enclosed
  214. procedure with free variables---a @dfn{closure}---is created, it copies
  215. those variables into its free variable vector. References to free
  216. variables are then redirected through the free variable vector.
  217. If a variable is ever @code{set!}, however, it will need to be
  218. heap-allocated instead of stack-allocated, so that different closures
  219. that capture the same variable can see the same value. Also, this
  220. allows continuations to capture a reference to the variable, instead
  221. of to its value at one point in time. For these reasons, @code{set!}
  222. variables are allocated in ``boxes''---actually, in variable cells.
  223. @xref{Variables}, for more information. References to @code{set!}
  224. variables are indirected through the boxes.
  225. Thus perhaps counterintuitively, what would seem ``closer to the
  226. metal'', viz @code{set!}, actually forces an extra memory allocation and
  227. indirection. Sometimes Guile's optimizer can remove this allocation,
  228. but not always.
  229. Going back to our example, @code{b} may be allocated on the stack, as
  230. it is never mutated.
  231. @code{a} may also be allocated on the stack, as it too is never
  232. mutated. Within the enclosed lambda, its value will be copied into
  233. (and referenced from) the free variables vector.
  234. @code{foo} is a top-level variable, because @code{foo} is not
  235. lexically bound in this example.
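
The difference between copying a never-mutated variable and boxing a
@code{set!} variable can be modelled in C. The sketch below is only an
illustration: @code{box_t}, @code{closure_t} and the functions are
hypothetical names and do not reflect Guile's internal object layout.
Two closures share one heap-allocated box, so a mutation made through
one is visible through the other, whereas the copied value is private to
each closure.

```c
#include <assert.h>
#include <stdlib.h>

/* A heap cell standing in for the box of a set! variable. */
typedef struct box { long value; } box_t;

/* A closure with two captured variables (illustrative layout). */
typedef struct closure {
  long step;        /* never mutated: its value is simply copied in */
  box_t *total;     /* mutated: captured by reference through a box */
} closure_t;

box_t *make_box(long initial)
{
  box_t *b = malloc(sizeof *b);
  assert(b != NULL);
  b->value = initial;
  return b;
}

closure_t make_closure(long step, box_t *total)
{
  closure_t c = { step, total };
  return c;
}

/* Models (set! total (+ total step)) inside the closure body. */
void closure_add(closure_t *c)
{
  c->total->value += c->step;
}

/* Models a reference to total: one extra indirection through the box. */
long closure_total(const closure_t *c)
{
  return c->total->value;
}
```

A caller would create one box with @code{make_box}, pass it to two
@code{make_closure} calls, and observe that @code{closure_add} on the
first changes what @code{closure_total} returns for the second.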
  236. @node VM Programs
  237. @subsection Compiled Procedures are VM Programs
  238. By default, when you enter in expressions at Guile's REPL, they are
  239. first compiled to bytecode. Then that bytecode is executed to produce a
  240. value. If the expression evaluates to a procedure, the result of this
  241. process is a compiled procedure.
  242. A compiled procedure is a compound object consisting of its bytecode and
  243. a reference to any captured lexical variables. In addition, when a
  244. procedure is compiled, it has associated metadata written to side
  245. tables, for instance a line number mapping, or its docstring. You can
  246. pick apart these pieces with the accessors in @code{(system vm
  247. program)}. @xref{Compiled Procedures}, for a full API reference.
  248. A procedure may reference data that was statically allocated when the
  249. procedure was compiled. For example, a pair of immediate objects
  250. (@pxref{Immediate Objects}) can be allocated directly in the memory
  251. segment that contains the compiled bytecode, and accessed directly by
  252. the bytecode.
  253. Another use for statically allocated data is to serve as a cache for a
  254. bytecode. Top-level variable lookups are handled in this way; the first
  255. time a top-level binding is referenced, the resolved variable will be
  256. stored in a cache. Thereafter all access to the variable goes through
  257. the cache cell. The variable's value may change in the future, but the
  258. variable itself will not.
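
The caching scheme can be sketched in C as follows. Everything here is
an assumption made for illustration: the @code{variable_t} type, the
@code{toplevel} table standing in for a module, and the functions
@code{lookup_slow} and @code{toplevel_ref} are hypothetical. The point
is that the cache cell stores the variable object, not its value, so
later changes to the value remain visible without another lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A variable object: a named cell whose value may change over time. */
typedef struct variable { const char *name; long value; } variable_t;

/* A tiny stand-in for a module's table of top-level bindings. */
variable_t toplevel[] = { { "foo", 1 }, { "bar", 2 } };

/* Counts how often the slow path runs, to make the caching observable. */
int slow_lookups = 0;

/* Slow path: search the module by name. */
variable_t *lookup_slow(const char *name)
{
  slow_lookups++;
  for (size_t i = 0; i < sizeof toplevel / sizeof toplevel[0]; i++)
    if (strcmp(toplevel[i].name, name) == 0)
      return &toplevel[i];
  return NULL;
}

/* Reference a top-level variable through a cache cell.  The first call
   resolves and stores the variable; later calls only dereference it. */
long toplevel_ref(variable_t **cache, const char *name)
{
  if (*cache == NULL)
    *cache = lookup_slow(name);
  assert(*cache != NULL);   /* a real VM would signal an unbound variable */
  return (*cache)->value;
}
```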
  259. We can see how these concepts tie together by disassembling the
  260. @code{foo} function we defined earlier to see what is going on:
  261. @smallexample
  262. scheme@@(guile-user)> (define (foo a) (lambda (b) (vector foo a b)))
  263. scheme@@(guile-user)> ,x foo
  264. Disassembly of #<procedure foo (a)> at #xf1da30:
  265. 0 (instrument-entry 164) at (unknown file):5:0
  266. 2 (assert-nargs-ee/locals 2 1) ;; 3 slots (1 arg)
  267. 3 (allocate-words/immediate 2 3) at (unknown file):5:16
  268. 4 (load-u64 0 0 65605)
  269. 7 (word-set!/immediate 2 0 0)
  270. 8 (load-label 0 7) ;; anonymous procedure at #xf1da6c
  271. 10 (word-set!/immediate 2 1 0)
  272. 11 (scm-set!/immediate 2 2 1)
  273. 12 (reset-frame 1) ;; 1 slot
  274. 13 (handle-interrupts)
  275. 14 (return-values)
  276. ----------------------------------------
  277. Disassembly of anonymous procedure at #xf1da6c:
  278. 0 (instrument-entry 183) at (unknown file):5:16
  279. 2 (assert-nargs-ee/locals 2 3) ;; 5 slots (1 arg)
  280. 3 (static-ref 2 152) ;; #<variable 112e530 value: #<procedure foo (a)>>
  281. 5 (immediate-tag=? 2 7 0) ;; heap-object?
  282. 7 (je 19) ;; -> L2
  283. 8 (static-ref 2 119) ;; #<directory (guile-user) ca9750>
  284. 10 (static-ref 1 127) ;; foo
  285. 12 (call-scm<-scm-scm 2 2 1 40)
  286. 14 (immediate-tag=? 2 7 0) ;; heap-object?
  287. 16 (jne 8) ;; -> L1
  288. 17 (scm-ref/immediate 0 2 1)
  289. 18 (immediate-tag=? 0 4095 2308) ;; undefined?
  290. 20 (je 4) ;; -> L1
  291. 21 (static-set! 2 134) ;; #<variable 112e530 value: #<procedure foo (a)>>
  292. 23 (j 3) ;; -> L2
  293. L1:
  294. 24 (throw/value 1 151) ;; #(unbound-variable #f "Unbound variable: ~S")
  295. L2:
  296. 26 (scm-ref/immediate 2 2 1)
  297. 27 (allocate-words/immediate 1 4) at (unknown file):5:28
  298. 28 (load-u64 0 0 781)
  299. 31 (word-set!/immediate 1 0 0)
  300. 32 (scm-set!/immediate 1 1 2)
  301. 33 (scm-ref/immediate 4 4 2)
  302. 34 (scm-set!/immediate 1 2 4)
  303. 35 (scm-set!/immediate 1 3 3)
  304. 36 (mov 4 1)
  305. 37 (reset-frame 1) ;; 1 slot
  306. 38 (handle-interrupts)
  307. 39 (return-values)
  308. @end smallexample
  309. The first thing to notice is that the bytecode is at a fairly low level.
  310. When a program is compiled from Scheme to bytecode, it is expressed in
  311. terms of more primitive operations. As such, there can be more
  312. instructions than you might expect.
  313. The first chunk of instructions is the outer @code{foo} procedure. It
  314. is followed by the code for the contained closure. The code can look
  315. daunting at first glance, but with practice it quickly becomes
  316. comprehensible, and indeed being able to read bytecode is an important
  317. step to understanding the low-level performance of Guile programs.
  318. The @code{foo} function begins with a prelude. The
  319. @code{instrument-entry} bytecode increments a counter associated with
  320. the function. If the counter reaches a certain threshold, Guile will
  321. emit machine code (``JIT-compile'') for @code{foo}. Emitting machine
  322. code is fairly cheap but it does take time, so it's not something you
  323. want to do for every function. Using a per-function counter and a
  324. global threshold allows Guile to spend time JIT-compiling only the
  325. ``hot'' functions.
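
The counter-and-threshold idea can be written down in a few lines of C.
This is a simplified sketch, not Guile's implementation: the structure
@code{function_data_t}, the function name, and the threshold value of
1000 are all hypothetical, and setting a flag stands in for actually
emitting machine code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder value; the real threshold is a tuning parameter. */
enum { HYPOTHETICAL_JIT_THRESHOLD = 1000 };

/* Per-function data updated on every entry to the function. */
typedef struct function_data {
  uint32_t counter;
  bool has_machine_code;
} function_data_t;

/* Called at function entry.  Returns true when the function has (or has
   just been given) machine code, false when it should be interpreted. */
bool instrument_entry(function_data_t *f)
{
  assert(f != NULL);
  if (!f->has_machine_code
      && ++f->counter >= HYPOTHETICAL_JIT_THRESHOLD)
    f->has_machine_code = true;   /* stand-in for JIT compilation */
  return f->has_machine_code;
}
```

With these assumptions, the first 999 entries are interpreted, the
1000th triggers compilation, and the counter is not touched afterwards.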
  326. Next in the prelude is an argument-checking instruction, which checks
  327. that it was called with only 1 argument (plus the callee function itself
  328. makes 2) and then reserves stack space for an additional 1 local.
  329. Then from @code{ip} 3 to 11, we allocate a new closure by allocating a
  330. three-word object, initializing its first word to store a type tag,
  331. setting its second word to its code pointer, and finally at @code{ip}
  332. 11, storing local value 1 (the @code{a} argument) into the third word
  333. (the first free variable).
  334. Before returning, @code{foo} ``resets the frame'' to hold only one local
  335. (the return value), runs any pending interrupts (@pxref{Asyncs}) and
  336. then returns.
  337. Note that local variables in Guile's virtual machine are usually
  338. addressed relative to the stack pointer, which leads to a pleasantly
  339. efficient @code{sp[@var{n}]} access. However it can make the
  340. disassembly hard to read, because the @code{sp} can change during the
  341. function, and because incoming arguments are relative to the @code{fp},
  342. not the @code{sp}.
  343. To know what @code{fp}-relative slot corresponds to an
  344. @code{sp}-relative reference, scan up in the disassembly until you get
  345. to a ``@var{n} slots'' annotation; in our case, 3, indicating that the
  346. frame has space for 3 slots. Thus a zero-indexed @code{sp}-relative
  347. slot of 2 corresponds to the @code{fp}-relative slot of 0, which
  348. initially held the value of the closure being called. This means that
  349. Guile doesn't need the value of the closure to compute its result, and
  350. so slot 0 was free for re-use, in this case for the result of making a
  351. new closure.
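
The conversion described in the previous paragraph is a one-line
calculation. The helper below is a sketch with a hypothetical name,
@code{fp_slot_from_sp_slot}; it assumes the frame currently has
@var{nslots} slots, as reported by the most recent ``@var{n} slots''
annotation in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a zero-indexed sp-relative slot to the fp-relative slot that
   names the same storage, in a frame that currently holds NSLOTS slots. */
uint32_t fp_slot_from_sp_slot(uint32_t nslots, uint32_t sp_slot)
{
  assert(sp_slot < nslots);
  return nslots - 1 - sp_slot;
}
```

For the 3-slot frame of @code{foo}, @code{fp_slot_from_sp_slot (3, 2)}
gives 0, the slot that initially held the closure being called.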
  352. A closure is code with data. As you can see, making the closure
  353. involved making an object (@code{ip} 3), putting a code pointer in it
  354. (@code{ip} 8 and 10), and putting in the closure's free variable
  355. (@code{ip} 11).
  356. The second stanza disassembles the code for the closure. After the
  357. prelude, all of the code between @code{ip} 5 and 24 is related to
  358. loading the toplevel variable @code{foo} into slot 2. This lookup
  359. happens only once, and is associated with a cache; after the first run,
  360. the value in the cache will be a bound variable, and the code will jump
  361. from @code{ip} 7 to 26. On the first run, Guile gets the module
  362. associated with the function, calls out to a run-time routine to look up
  363. the variable, and checks that the variable is bound before initializing
  364. the cache. Either way, @code{ip} 26 dereferences the variable into
  365. local 2.
  366. What follows is the allocation and initialization of the vector return
  367. value. @code{Ip} 27 does the allocation, and the following two
  368. instructions initialize the type-and-length tag for the object's first
  369. word. @code{Ip} 32 sets word 1 of the object (the first vector slot) to
  370. the value of @code{foo}; @code{ip} 33 fetches the closure variable for
  371. @code{a}, then in @code{ip} 34 stores it in the second vector slot; and
  372. finally, in @code{ip} 35, local @code{b} is stored to the third vector
  373. slot. This is followed by the return sequence.
  374. @node Object File Format
  375. @subsection Object File Format
  376. To compile a file to disk, we need a format in which to write the
  377. compiled code to disk, and later load it into Guile. A good @dfn{object
  378. file format} has a number of characteristics:
  379. @itemize
  380. @item Above all else, it should be very cheap to load a compiled file.
  381. @item It should be possible to statically allocate constants in the
  382. file. For example, a bytevector literal in source code can be emitted
  383. directly into the object file.
  384. @item The compiled file should enable maximum code and data sharing
  385. between different processes.
  386. @item The compiled file should contain debugging information, such as
  387. line numbers, but that information should be separated from the code
  388. itself. It should be possible to strip debugging information if space
  389. is tight.
  390. @end itemize
  391. These characteristics are not specific to Scheme. Indeed, mainstream
  392. languages like C and C++ have solved this issue many times in the past.
  393. Guile builds on their work by adopting ELF, the object file format of
  394. GNU and other Unix-like systems, as its object file format. Although
  395. Guile uses ELF on all platforms, we do not use platform support for ELF.
  396. Guile implements its own linker and loader. The advantage of using ELF
  397. is not sharing code, but sharing ideas. ELF is simply a well-designed
  398. object file format.
  399. An ELF file has two meta-tables describing its contents. The first
  400. meta-table is for the loader, and is called the @dfn{program table} or
  401. sometimes the @dfn{segment table}. The program table divides the file
  402. into big chunks that should be treated differently by the loader.
  403. Mostly the difference between these @dfn{segments} is their
  404. permissions.
  405. Typically all segments of an ELF file are marked as read-only, except
  406. the part that represents modifiable static data or static data that
  407. needs load-time initialization. Loading an ELF file is as simple as
  408. mmapping the thing into memory with read-only permissions, then using
  409. the segment table to mark a small sub-region of the file as writable.
  410. This writable section is typically added to the root set of the garbage
  411. collector as well.
  412. One ELF segment is marked as ``dynamic'', meaning that it has data of
  413. interest to the loader. Guile uses this segment to record the Guile
  414. version corresponding to this file. There is also an entry in the
  415. dynamic segment that points to the address of an initialization thunk
  416. that is run to perform any needed link-time initialization. (This is
  417. like dynamic relocations for normal ELF shared objects, except that we
  418. compile the relocations as a procedure instead of having the loader
  419. interpret a table of relocations.) Finally, the dynamic segment marks
  420. the location of the ``entry thunk'' of the object file. This thunk is
  421. returned to the caller of @code{load-thunk-from-memory} or
  422. @code{load-thunk-from-file}. When called, it will execute the ``body''
  423. of the compiled expression.
  424. The other meta-table in an ELF file is the @dfn{section table}. Whereas
  425. the program table divides an ELF file into big chunks for the loader,
  426. the section table specifies small sections for use by introspective
  427. tools like debuggers or the like. One segment (program table entry)
  428. typically contains many sections. There may be sections outside of any
  429. segment, as well.
  430. Typical sections in a Guile @code{.go} file include:
  431. @table @code
  432. @item .rtl-text
  433. Bytecode.
  434. @item .data
  435. Data that needs initialization, or which may be modified at runtime.
  436. @item .rodata
  437. Statically allocated data that needs no run-time initialization, and
  438. which therefore can be shared between processes.
  439. @item .dynamic
  440. The dynamic section, discussed above.
  441. @item .symtab
  442. @itemx .strtab
  443. A table mapping addresses in the @code{.rtl-text} to procedure names.
  444. @code{.strtab} is used by @code{.symtab}.
  445. @item .guile.procprops
  446. @itemx .guile.arities
  447. @itemx .guile.arities.strtab
  448. @itemx .guile.docstrs
  449. @itemx .guile.docstrs.strtab
  450. Side tables of procedure properties, arities, and docstrings.
  451. @item .guile.frame-maps
  452. Side table of frame maps, describing the set of live slots for every
  453. return point in the program text, and whether those slots are pointers
  454. or not. Used by the garbage collector.
  455. @item .debug_info
  456. @itemx .debug_abbrev
  457. @itemx .debug_str
  458. @itemx .debug_loc
  459. @itemx .debug_line
  460. Debugging information, in DWARF format. See the DWARF specification,
  461. for more information.
  462. @item .shstrtab
  463. Section name string table.
  464. @end table
  465. For more information, see @uref{http://linux.die.net/man/5/elf,,the
  466. elf(5) man page}. See @uref{http://dwarfstd.org/,the DWARF
  467. specification} for more on the DWARF debugging format. Or if you are an
  468. adventurous explorer, try running @code{readelf} or @code{objdump} on
  469. compiled @code{.go} files. It's good times!
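
For readers who want to poke at a @code{.go} file programmatically, the
first bytes of any ELF file identify it. The C sketch below checks the
four magic bytes and reads the class byte, which are defined by the ELF
format itself; the function names and the @code{EI_CLASS_INDEX} constant
name are hypothetical, and nothing here is specific to Guile's loader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Offset of the class byte within the ELF identification bytes. */
enum { EI_CLASS_INDEX = 4 };

/* True if BUF starts with the ELF magic number: 0x7f 'E' 'L' 'F'. */
bool has_elf_magic(const unsigned char *buf, size_t len)
{
  assert(buf != NULL);
  return len >= 4
    && buf[0] == 0x7f && buf[1] == 'E' && buf[2] == 'L' && buf[3] == 'F';
}

/* Return 32 or 64 according to the ELF class byte, or 0 if BUF is not
   a recognizable ELF header. */
int elf_word_size(const unsigned char *buf, size_t len)
{
  if (!has_elf_magic(buf, len) || len <= EI_CLASS_INDEX)
    return 0;
  switch (buf[EI_CLASS_INDEX])
    {
    case 1: return 32;   /* 32-bit class */
    case 2: return 64;   /* 64-bit class */
    default: return 0;
    }
}
```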
  470. @node Instruction Set
  471. @subsection Instruction Set
  472. There are currently about 150 instructions in Guile's virtual machine.
  473. These instructions represent atomic units of a program's execution.
  474. Ideally, they perform one task without conditional branches, then
  475. dispatch to the next instruction in the stream.
  476. Instructions themselves are composed of 1 or more 32-bit units. The low
  477. 8 bits of the first word indicate the opcode, and the rest of the
  478. instruction describes the operands. There are a number of different ways
  479. operands can be encoded.
  480. @table @code
  481. @item s@var{n}
  482. An unsigned @var{n}-bit integer, indicating the @code{sp}-relative index
  483. of a local variable.
  484. @item f@var{n}
  485. An unsigned @var{n}-bit integer, indicating the @code{fp}-relative index
  486. of a local variable. Used when a continuation accepts a variable number
  487. of values, to shuffle received values into known locations in the
  488. frame.
  489. @item c@var{n}
  490. An unsigned @var{n}-bit integer, indicating a constant value.
  491. @item l24
  492. An offset from the current @code{ip}, in 32-bit units, as a signed
  493. 24-bit value. Indicates a bytecode address, for a relative jump.
  494. @item zi16
  495. @itemx i16
  496. @itemx i32
  497. An immediate Scheme value (@pxref{Immediate Objects}), encoded directly
  498. in 16 or 32 bits. @code{zi16} is sign-extended; the others are
  499. zero-extended.
  500. @item a32
  501. @itemx b32
  502. An immediate Scheme value, encoded as a pair of 32-bit words.
  503. @code{a32} and @code{b32} values always go together on the same opcode,
  504. and indicate the high and low bits, respectively. Normally only used on
  505. 64-bit systems.
  506. @item n32
  507. A statically allocated non-immediate. The address of the non-immediate
  508. is encoded as a signed 32-bit integer, and indicates a relative offset
  509. in 32-bit units. Think of it as @code{SCM x = ip + offset}.
  510. @item r32
  511. Indirect Scheme value, like @code{n32} but indirected. Think of it as
  512. @code{SCM *x = ip + offset}.
  513. @item l32
  514. @itemx lo32
  515. An ip-relative address, as a signed 32-bit integer. Could indicate a
  516. bytecode address, as in @code{make-closure}, or a non-immediate address,
  517. as with @code{static-patch!}.
  518. @code{l32} and @code{lo32} are the same from the perspective of the
  519. virtual machine. The difference is that an assembler might want to
  520. allow an @code{lo32} address to be specified as a label and then some
  521. number of words offset from that label, for example when patching a
  522. field of a statically allocated object.
  523. @item v32:x8-l24
  524. Almost all VM instructions have a fixed size. The @code{jtable}
  525. instruction used to perform optimized @code{case} branches is an
  526. exception, which uses a @code{v32} trailing word to indicate the number
  527. of additional words in the instruction, which themselves are encoded as
  528. @code{x8-l24} values.
  529. @item b1
  530. A boolean value: 1 for true, otherwise 0.
  531. @item x@var{n}
  532. An ignored sequence of @var{n} bits.
  533. @end table
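
As an illustration of the @code{l24} encoding, the following C sketch
decodes a signed 24-bit offset from the upper bits of an instruction
word and computes a jump target in 32-bit units. It assumes a one-word
jump whose low 8 bits hold the opcode; the function names and the opcode
value used by callers are hypothetical. In the disassembly shown
earlier, @code{(je 19)} at @code{ip} 7 jumps to @code{ip} 26, which is
the arithmetic modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extract the signed 24-bit offset stored in the upper 24 bits of WORD. */
int32_t decode_l24(uint32_t word)
{
  uint32_t raw = word >> 8;
  if (raw & 0x800000u)                  /* sign bit of the 24-bit field */
    return (int32_t)raw - 0x1000000;
  return (int32_t)raw;
}

/* Build a one-word instruction from an 8-bit opcode and a signed
   24-bit offset. */
uint32_t encode_l24(uint32_t opcode, int32_t offset)
{
  assert(opcode < 256);
  assert(offset >= -0x800000 && offset < 0x800000);
  return opcode | (((uint32_t)offset & 0xffffffu) << 8);
}

/* Compute the target of a relative jump.  IP and the result are
   indices in 32-bit units, relative to the start of the function. */
size_t jump_target(size_t ip, uint32_t word)
{
  return (size_t)((int64_t)ip + decode_l24(word));
}
```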
  534. An instruction is specified by giving its name, then describing its
  535. operands. The operands are packed by 32-bit words, with earlier
  536. operands occupying the lower bits.
  537. For example, consider the following instruction specification:
  538. @deftypefn Instruction {} call f24:@var{proc} x8:@var{_} c24:@var{nlocals}
  539. @end deftypefn
  540. The first word in the instruction will start with the 8-bit value
  541. corresponding to the @code{call} opcode in the low bits, followed by
  542. @var{proc} as a 24-bit value. The second word starts with 8 dead bits,
  543. followed by @var{nlocals} as a 24-bit immediate value.
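
The packing of this two-word instruction can be sketched in C. The
opcode number below is a placeholder (the real value is assigned by
Guile's instruction table), and the function names are hypothetical; the
bit layout follows the specification above, with earlier operands in the
lower bits of each word.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder opcode number, chosen only for this example. */
enum { HYPOTHETICAL_OP_CALL = 1 };

/* Pack "call f24:proc x8:_ c24:nlocals" into two 32-bit words. */
void encode_call(uint32_t words[2], uint32_t proc, uint32_t nlocals)
{
  assert(proc < (1u << 24));
  assert(nlocals < (1u << 24));
  words[0] = HYPOTHETICAL_OP_CALL | (proc << 8);  /* opcode, then f24 */
  words[1] = nlocals << 8;                        /* x8 left as zero, then c24 */
}

/* The opcode occupies the low 8 bits of the first word. */
uint32_t decode_opcode(uint32_t word)
{
  return word & 0xffu;
}

/* A 24-bit operand that follows an 8-bit field in the same word. */
uint32_t decode_operand24(uint32_t word)
{
  return word >> 8;
}
```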
  544. For instructions with operands that encode references to the stack, the
  545. interpretation of those stack values is up to the instruction itself.
  546. Most instructions expect their operands to be tagged SCM values
  547. (@code{scm} representation), but some instructions expect unboxed
  548. integers (@code{u64} and @code{s64} representations) or floating-point
  549. numbers (@code{f64} representation). It is assumed that the bits for a
  550. @code{u64} value are the same as those for an @code{s64} value, and that
  551. @code{s64} values are stored in two's complement.
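
The assumption about shared bit patterns can be made concrete in C. The
helpers below are a sketch with hypothetical names: they reinterpret the
64 bits of a stack slot as an @code{s64} or @code{f64} value by copying
bytes, which avoids relying on implementation-defined integer
conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* View the bits of a slot as a signed 64-bit integer. */
int64_t s64_from_bits(uint64_t bits)
{
  int64_t v;
  memcpy(&v, &bits, sizeof v);
  return v;
}

/* Store a signed 64-bit integer into a slot, bit for bit. */
uint64_t bits_from_s64(int64_t v)
{
  uint64_t bits;
  memcpy(&bits, &v, sizeof bits);
  return bits;
}

/* View the bits of a slot as a double-precision float. */
double f64_from_bits(uint64_t bits)
{
  double d;
  assert(sizeof d == sizeof bits);
  memcpy(&d, &bits, sizeof d);
  return d;
}

/* Store a double-precision float into a slot, bit for bit. */
uint64_t bits_from_f64(double d)
{
  uint64_t bits;
  assert(sizeof d == sizeof bits);
  memcpy(&bits, &d, sizeof bits);
  return bits;
}
```

Because @code{int64_t} is a two's complement type, a slot whose bits are
all ones reads as @code{-1} in the @code{s64} representation and as the
largest unsigned value in the @code{u64} representation.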
  552. Instructions have static types: they must receive their operands in the
  553. format they expect. It's up to the compiler to ensure this is the case.
  554. Unless otherwise mentioned, all operands and results are in the
  555. @code{scm} representation.
  556. @menu
  557. * Call and Return Instructions::
  558. * Function Prologue Instructions::
  559. * Shuffling Instructions::
  560. * Trampoline Instructions::
  561. * Non-Local Control Flow Instructions::
  562. * Instrumentation Instructions::
  563. * Intrinsic Call Instructions::
  564. * Constant Instructions::
  565. * Memory Access Instructions::
  566. * Atomic Memory Access Instructions::
  567. * Tagging and Untagging Instructions::
  568. * Integer Arithmetic Instructions::
  569. * Floating-Point Arithmetic Instructions::
  570. * Comparison Instructions::
  571. * Branch Instructions::
  572. * Raw Memory Access Instructions::
  573. @end menu
  574. @node Call and Return Instructions
  575. @subsubsection Call and Return Instructions
  576. As described earlier (@pxref{Stack Layout}), Guile's calling convention
  577. is that arguments are passed and values returned on the stack.
  578. For calls, both in tail position and in non-tail position, we require
  579. that the procedure and the arguments already be shuffled into place
  580. before the call instruction. ``Into place'' for a tail call means that
  581. the procedure should be in slot 0, relative to the @code{fp}, and the
  582. arguments should follow. For a non-tail call, if the procedure is in
  583. @code{fp}-relative slot @var{n}, the arguments should follow from slot
  584. @var{n}+1, and there should be three free slots between @var{n}-1 and
  585. @var{n}-3 in which to save the mRA, vRA, and @code{fp}.
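The slot arrangement for a non-tail call can be summarized with a small
Python sketch. The function name and the returned dictionary are
hypothetical, and the sketch does not model which of the three reserved
slots holds the mRA, the vRA, or the @code{fp}.

```python
def non_tail_call_layout(n, nargs):
    """Describe the fp-relative slots used by a non-tail call.

    `n` is the slot holding the procedure and `nargs` the number of
    arguments.  The three slots below `n` are reserved for the saved
    mRA, vRA, and fp; their relative order is not modeled here.
    """
    if n < 3:
        raise ValueError("need three free slots below the procedure")
    if nargs < 0:
        raise ValueError("nargs must be non-negative")
    return {
        "saved_frame_data": [n - 3, n - 2, n - 1],
        "procedure": n,
        "arguments": list(range(n + 1, n + 1 + nargs)),
        # The nlocals operand of `call`: the procedure plus its arguments.
        "nlocals": 1 + nargs,
    }
```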
  586. Returning values is similar. Multiple-value returns should have values
  587. already shuffled down to start from @code{fp}-relative slot 0 before
  588. emitting @code{return-values}.
  589. In both calls and returns, the @code{sp} is used to indicate to the
  590. callee or caller the number of arguments or return values, respectively.
  591. After receiving return values, it is the caller's responsibility to
  592. @dfn{restore the frame} by resetting the @code{sp} to its former value.
  593. @deftypefn Instruction {} call f24:@var{proc} x8:@var{_} c24:@var{nlocals}
  594. Call a procedure. @var{proc} is the local corresponding to a procedure.
  595. The three values below @var{proc} will be overwritten by the saved call
  596. frame data. The new frame will have space for @var{nlocals} locals: one
  597. for the procedure, and the rest for the arguments which should already
  598. have been pushed on.
  599. When the call returns, execution proceeds with the next instruction.
  600. There may be any number of values on the return stack; the precise
  601. number can be had by subtracting the address of @var{proc}-1 from the
  602. post-call @code{sp}.
  603. @end deftypefn
  604. @deftypefn Instruction {} call-label f24:@var{proc} x8:@var{_} c24:@var{nlocals} l32:@var{label}
  605. Call a procedure in the same compilation unit.
  606. This instruction is just like @code{call}, except that instead of
  607. dereferencing @var{proc} to find the call target, the call target is
  608. known to be at @var{label}, a signed 32-bit offset in 32-bit units from
  609. the current @code{ip}. Since @var{proc} is not dereferenced, it may be
  610. some other representation of the closure.
  611. @end deftypefn
  612. @deftypefn Instruction {} tail-call x24:@var{_}
  613. Tail-call a procedure. Requires that the procedure and all of the
  614. arguments have already been shuffled into position, and that the frame
  615. has already been reset to the number of arguments to the call.
  616. @end deftypefn
  617. @deftypefn Instruction {} tail-call-label x24:@var{_} l32:@var{label}
  618. Tail-call a known procedure. As @code{call} is to @code{call-label},
  619. @code{tail-call} is to @code{tail-call-label}.
  620. @end deftypefn
  621. @deftypefn Instruction {} return-values x24:@var{_}
  622. Return a number of values from a call frame. The return values should
  623. have already been shuffled down to a contiguous array starting at slot
  624. 0, and the frame already reset.
  625. @end deftypefn
  626. @deftypefn Instruction {} receive f12:@var{dst} f12:@var{proc} x8:@var{_} c24:@var{nlocals}
  627. Receive a single return value from a call whose procedure was in
  628. @var{proc}, asserting that the call actually returned at least one
  629. value. Afterwards, resets the frame to @var{nlocals} locals.
  630. @end deftypefn
  631. @deftypefn Instruction {} receive-values f24:@var{proc} b1:@var{allow-extra?} x7:@var{_} c24:@var{nvalues}
  632. Receive a return of multiple values from a call whose procedure was in
  633. @var{proc}. If fewer than @var{nvalues} values were returned, signal an
  634. error. Unless @var{allow-extra?} is true, require that the number of
  635. return values equals @var{nvalues} exactly. After @code{receive-values}
  636. has run, the values can be copied down via @code{mov}, or used in place.
  637. @end deftypefn
  638. @node Function Prologue Instructions
  639. @subsubsection Function Prologue Instructions
  640. A function call in Guile is very cheap: the VM simply hands control to
  641. the procedure. The procedure itself is responsible for asserting that it
  642. has been passed an appropriate number of arguments. This strategy allows
  643. arbitrarily complex argument parsing idioms to be developed, without
  644. harming the common case.
  645. For example, only calls to keyword-argument procedures ``pay'' for the
  646. cost of parsing keyword arguments. (At the time of this writing, calling
  647. procedures with keyword arguments is typically two to four times as
  648. costly as calling procedures with a fixed set of arguments.)
  649. @deftypefn Instruction {} assert-nargs-ee c24:@var{expected}
  650. @deftypefnx Instruction {} assert-nargs-ge c24:@var{expected}
  651. @deftypefnx Instruction {} assert-nargs-le c24:@var{expected}
  652. If the number of actual arguments is not @code{==}, @code{>=}, or
  653. @code{<=} @var{expected}, respectively, signal an error.
  654. The number of arguments is determined by subtracting the stack pointer
  655. from the frame pointer (@code{fp - sp}). @xref{Stack Layout}, for more
  656. details on stack frames. Note that @var{expected} includes the
  657. procedure itself.
  658. @end deftypefn
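The following Python sketch models these arity checks. It treats
@code{fp} and @code{sp} as word indices on a downward-growing stack, so
that @code{fp - sp} is the argument count; the exception class and
function names are hypothetical.

```python
class WrongNumberOfArgs(Exception):
    """Raised when an arity assertion fails (illustrative only)."""

def nargs(fp, sp):
    """Number of arguments, including the procedure itself."""
    return fp - sp

def assert_nargs_ee(fp, sp, expected):
    if nargs(fp, sp) != expected:
        raise WrongNumberOfArgs(nargs(fp, sp))

def assert_nargs_ge(fp, sp, expected):
    if nargs(fp, sp) < expected:
        raise WrongNumberOfArgs(nargs(fp, sp))

def assert_nargs_le(fp, sp, expected):
    if nargs(fp, sp) > expected:
        raise WrongNumberOfArgs(nargs(fp, sp))
```

Under this model, a two-argument procedure would check for an
@var{expected} value of 3, since the procedure occupies a slot as well.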
  659. @deftypefn Instruction {} arguments<=? c24:@var{expected}
  660. Set the @code{LESS_THAN}, @code{EQUAL}, or @code{NONE} comparison result
  661. values if the number of arguments is respectively less than, equal to,
  662. or greater than @var{expected}.
  663. @end deftypefn
  664. @deftypefn Instruction {} positional-arguments<=? c24:@var{nreq} x8:@var{_} c24:@var{expected}
  665. Set the @code{LESS_THAN}, @code{EQUAL}, or @code{NONE} comparison result
  666. values if the number of positional arguments is respectively less than,
  667. equal to, or greater than @var{expected}. The first @var{nreq}
  668. arguments are positional arguments, as are the subsequent arguments that
  669. are not keywords.
  670. @end deftypefn
  671. The @code{arguments<=?} and @code{positional-arguments<=?} instructions
  672. are used to implement multiple arities, as in @code{case-lambda}.
  673. @xref{Case-lambda}, for more information. @xref{Branch Instructions},
  674. for more on comparison results.
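A simplified Python sketch of how @code{arguments<=?} could drive clause
selection is given below. It only handles clauses of fixed arity, and
the names @code{arguments_le} and @code{select_fixed_arity_clause} are
hypothetical; Guile's compiled @code{case-lambda} code is more general.

```python
LESS_THAN, EQUAL, NONE = "LESS_THAN", "EQUAL", "NONE"

def arguments_le(nargs, expected):
    """Model the comparison result set by `arguments<=?`."""
    if nargs < expected:
        return LESS_THAN
    if nargs == expected:
        return EQUAL
    return NONE

def select_fixed_arity_clause(nargs, clause_arities):
    """Return the index of the first clause whose arity equals nargs.

    Both nargs and the entries of clause_arities count the procedure
    itself.  Returns None when no clause matches.
    """
    for index, expected in enumerate(clause_arities):
        if arguments_le(nargs, expected) == EQUAL:
            return index
    return None
```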
  675. @deftypefn Instruction {} bind-kwargs c24:@var{nreq} c8:@var{flags} c24:@var{nreq-and-opt} x8:@var{_} c24:@var{ntotal} n32:@var{kw-offset}
  676. @var{flags} is a bitfield, whose lowest bit is @var{allow-other-keys},
  677. second bit is @var{has-rest}, and whose following six bits are unused.
  678. Find the last positional argument, and shuffle all the rest above
  679. @var{ntotal}. Initialize the intervening locals to
  680. @code{SCM_UNDEFINED}. Then load the constant at @var{kw-offset} words
  681. from the current @var{ip}, and use it and the @var{allow-other-keys}
  682. flag to bind keyword arguments. If @var{has-rest}, collect all shuffled
  683. arguments into a list, and store it in @var{nreq-and-opt}. Finally,
  684. clear the arguments that we shuffled up.
  685. The parsing is driven by a keyword arguments association list, looked up
  686. using @var{kw-offset}. The alist is a list of pairs of the form
  687. @code{(@var{kw} . @var{index})}, mapping keyword arguments to their
  688. local slot indices. Unless @code{allow-other-keys} is set, the parser
  689. will signal an error if an unknown key is found.
  690. A macro-mega-instruction.
  691. @end deftypefn
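The alist-driven part of keyword parsing can be sketched in Python as
follows. This is a simplified model: keywords are represented as plain
strings, the shuffling of arguments above @var{ntotal} and the
@var{has-rest} case are omitted, and the function name
@code{bind_keywords} is hypothetical.

```python
UNDEFINED = object()  # stands in for SCM_UNDEFINED

def bind_keywords(frame, kw_args, kw_alist, allow_other_keys=False):
    """Store keyword argument values into their local slots.

    frame:     list of local slots, modified in place
    kw_args:   flat list of the form [kw, value, kw, value, ...]
    kw_alist:  list of (kw, slot_index) pairs
    """
    slots = dict(kw_alist)
    if len(kw_args) % 2 != 0:
        raise ValueError("odd number of keyword arguments")
    for i in range(0, len(kw_args), 2):
        kw, value = kw_args[i], kw_args[i + 1]
        if kw in slots:
            frame[slots[kw]] = value
        elif not allow_other_keys:
            raise KeyError("unrecognized keyword: " + kw)
    return frame
```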
  692. @deftypefn Instruction {} bind-optionals f24:@var{nlocals}
  693. Expand the current frame to have at least @var{nlocals} locals, filling
  694. in any fresh values with @code{SCM_UNDEFINED}. If the frame has more
  695. than @var{nlocals} locals, it is left as it is.
  696. @end deftypefn
  697. @deftypefn Instruction {} bind-rest f24:@var{dst}
  698. Collect any arguments at or above @var{dst} into a list, and store that
  699. list at @var{dst}.
  700. @end deftypefn
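The effect of @code{bind-optionals} and @code{bind-rest} on a frame can
be illustrated with the Python sketch below. The frame is modeled as a
list of @code{fp}-relative slots with slot 0 first; this representation
and the function names are assumptions made for illustration.

```python
UNDEFINED = object()  # stands in for SCM_UNDEFINED

def bind_optionals(frame, nlocals):
    """Grow the frame to at least nlocals slots, filling with UNDEFINED."""
    missing = nlocals - len(frame)
    if missing > 0:
        frame.extend([UNDEFINED] * missing)
    return frame

def bind_rest(frame, dst):
    """Collect the slots at or above dst into a list stored at dst."""
    if len(frame) < dst:
        raise ValueError("frame is shorter than dst")
    rest = frame[dst:]
    del frame[dst:]
    frame.append(rest)  # the frame now has exactly dst + 1 slots
    return frame
```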
  701. @deftypefn Instruction {} alloc-frame c24:@var{nlocals}
  702. Ensure that there is space on the stack for @var{nlocals} local
  703. variables. The value of any new local is undefined.
  704. @end deftypefn
  705. @deftypefn Instruction {} reset-frame c24:@var{nlocals}
  706. Like @code{alloc-frame}, but doesn't check that the stack is big enough,
  707. and doesn't initialize values to @code{SCM_UNDEFINED}. Used to reset
  708. the frame size to something less than the size that was previously set
  709. via @code{alloc-frame}.
  710. @end deftypefn
  711. @deftypefn Instruction {} assert-nargs-ee/locals c12:@var{expected} c12:@var{nlocals}
  712. Equivalent to a sequence of @code{assert-nargs-ee} and
  713. @code{alloc-frame}. The number of locals reserved is @var{expected}
  714. + @var{nlocals}.
  715. @end deftypefn
  716. @node Shuffling Instructions
  717. @subsubsection Shuffling Instructions
  718. These instructions are used to move around values on the stack.
  719. @deftypefn Instruction {} mov s12:@var{dst} s12:@var{src}
  720. @deftypefnx Instruction {} long-mov s24:@var{dst} x8:@var{_} s24:@var{src}
  721. Copy a value from one local slot to another.
  722. As discussed previously, procedure arguments and local variables are
  723. allocated to local slots. Guile's compiler tries to avoid shuffling
  724. variables around to different slots, which often makes @code{mov}
  725. instructions redundant. However, there are some cases in which shuffling
  726. is necessary, and in those cases, @code{mov} is the thing to use.
  727. @end deftypefn
  728. @deftypefn Instruction {} long-fmov f24:@var{dst} x8:@var{_} f24:@var{src}
  729. Copy a value from one local slot to another, but addressing slots
  730. relative to the @code{fp} instead of the @code{sp}. This is used when
  731. shuffling values into place after multiple-value returns.
  732. @end deftypefn
  733. @deftypefn Instruction {} push s24:@var{src}
  734. Bump the stack pointer by one word, and fill it with the value from slot
  735. @var{src}. The offset to @var{src} is calculated before the stack
  736. pointer is adjusted.
  737. @end deftypefn
  738. The @code{push} instruction is used when another instruction is unable
  739. to address an operand because the operand is encoded with fewer than 24
  740. bits. In that case, Guile's assembler will transparently emit code that
  741. temporarily pushes any needed operands onto the stack, emits the
  742. original instruction to address those now-near variables, then shuffles
  743. the result (if any) back into place.
  744. @deftypefn Instruction {} pop s24:@var{dst}
  745. Pop the stack pointer, storing the value that was there in slot
  746. @var{dst}. The offset to @var{dst} is calculated after the stack
  747. pointer is adjusted.
  748. @end deftypefn
  749. @deftypefn Instruction {} drop c24:@var{count}
  750. Pop the stack pointer by @var{count} words, discarding any values that
  751. were stored there.
  752. @end deftypefn
  753. @deftypefn Instruction {} shuffle-down f12:@var{from} f12:@var{to}
  754. Shuffle down values from @var{from} to @var{to}, reducing the frame size
  755. by @var{from}-@var{to} slots. Part of the internal implementation of
  756. @code{call-with-values}, @code{values}, and @code{apply}.
  757. @end deftypefn
  758. @deftypefn Instruction {} expand-apply-argument x24:@var{_}
  759. Take the last local in a frame and expand it out onto the stack, as for
  760. the last argument to @code{apply}.
  761. @end deftypefn
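As a sketch of the two instructions above, the following Python code
models a frame as a list of @code{fp}-relative slots with slot 0 first.
The list representation and the function names are assumptions for
illustration; the real instructions operate on the VM stack directly.

```python
def shuffle_down(frame, src, dst):
    """Move the values starting at slot src down to start at slot dst.

    The frame shrinks by src - dst slots.
    """
    if not 0 <= dst <= src <= len(frame):
        raise ValueError("expected 0 <= dst <= src <= frame size")
    frame[dst:] = frame[src:]
    return frame

def expand_apply_argument(frame):
    """Replace the last local, a list, with its elements."""
    last = frame.pop()
    frame.extend(last)
    return frame
```

For instance, shuffling from slot 3 down to slot 1 in a five-slot frame
leaves three slots, and expanding @code{["f", 1, [2, 3]]} produces a
frame holding @code{f}, 1, 2, and 3.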
  762. @node Trampoline Instructions
  763. @subsubsection Trampoline Instructions
  764. Though most applicable objects in Guile are procedures implemented in
  765. bytecode, not all are. There are primitives, continuations, and other
  766. procedure-like objects that have their own calling convention. Instead
  767. of adding special cases to the @code{call} instruction, Guile wraps
  768. these other applicable objects in VM trampoline procedures, then
  769. provides special support for these objects in bytecode.
  770. Trampoline procedures are typically generated by Guile at runtime, for
  771. example in response to a call to @code{scm_c_make_gsubr}. As such, a
  772. compiler probably shouldn't emit code with these instructions. However,
  773. it's still interesting to know how these things work, so we document
  774. these trampoline instructions here.
  775. @deftypefn Instruction {} subr-call c24:@var{idx}
  776. Call a subr, passing all locals in this frame as arguments, and storing
  777. the results on the stack, ready to be returned.
  778. @end deftypefn
  779. @deftypefn Instruction {} foreign-call c12:@var{cif-idx} c12:@var{ptr-idx}
  780. Call a foreign function. Fetch the @var{cif} and foreign pointer from
  781. @var{cif-idx} and @var{ptr-idx} closure slots of the callee. Arguments
  782. are taken from the stack, and results placed on the stack, ready to be
  783. returned.
  784. @end deftypefn
  785. @deftypefn Instruction {} builtin-ref s12:@var{dst} c12:@var{idx}
  786. Load a builtin stub by index into @var{dst}.
  787. @end deftypefn
  788. @node Non-Local Control Flow Instructions
  789. @subsubsection Non-Local Control Flow Instructions
  790. @deftypefn Instruction {} capture-continuation s24:@var{dst}
  791. Capture the current continuation, and write it to @var{dst}. Part of
  792. the implementation of @code{call/cc}.
  793. @end deftypefn
  794. @deftypefn Instruction {} continuation-call c24:@var{contregs}
  795. Return to a continuation, nonlocally. The arguments to the continuation
  796. are taken from the stack. @var{contregs} is a free variable containing
  797. the reified continuation.
  798. @end deftypefn
  799. @deftypefn Instruction {} abort x24:@var{_}
  800. Abort to a prompt handler. The tag is expected in slot 1, and the rest
  801. of the values in the frame are returned to the prompt handler. This
  802. corresponds to a tail application of @code{abort-to-prompt}.
  803. If no prompt can be found in the dynamic environment with the given tag,
  804. an error is signalled. Otherwise all arguments are passed to the
  805. prompt's handler, along with the captured continuation, if necessary.
  806. If the prompt's handler can be proven to not reference the captured
  807. continuation, no continuation is allocated. This decision happens
  808. dynamically, at run-time; the general case is that the continuation may
  809. be captured, and thus resumed. A reinstated continuation will have its
  810. arguments pushed on the stack from slot 0, as if from a multiple-value
  811. return, and control resumes in the caller. Thus to the calling
  812. function, a call to @code{abort-to-prompt} looks like any other function
  813. call.
  814. @end deftypefn
  815. @deftypefn Instruction {} compose-continuation c24:@var{cont}
  816. Compose a partial continuation with the current continuation. The
  817. arguments to the continuation are taken from the stack. @var{cont} is a
  818. free variable containing the reified continuation.
  819. @end deftypefn
  820. @deftypefn Instruction {} prompt s24:@var{tag} b1:@var{escape-only?} x7:@var{_} f24:@var{proc-slot} x8:@var{_} l24:@var{handler-offset}
  821. Push a new prompt on the dynamic stack, with a tag from @var{tag} and a
  822. handler at @var{handler-offset} words from the current @var{ip}.
  823. If an abort is made to this prompt, control will jump to the handler.
  824. The handler will expect a multiple-value return as if from a call with
  825. the procedure at @var{proc-slot}, with the reified partial continuation
  826. as the first argument, followed by the values returned to the handler.
  827. If control returns to the handler, the prompt is already popped off by
  828. the abort mechanism. (Guile's @code{prompt} implements Felleisen's
  829. @dfn{--F--} operator.)
  830. If @var{escape-only?} is nonzero, the prompt will be marked as
  831. escape-only, which allows an abort to this prompt to avoid reifying the
  832. continuation.
  833. @xref{Prompts}, for more information on prompts.
  834. @end deftypefn
  835. @deftypefn Instruction {} throw s12:@var{key} s12:@var{args}
  836. Raise an error by throwing to @var{key} and @var{args}. @var{args}
  837. should be a list.
  838. @end deftypefn
  839. @deftypefn Instruction {} throw/value s24:@var{value} n32:@var{key-subr-and-message}
  840. @deftypefnx Instruction {} throw/value+data s24:@var{value} n32:@var{key-subr-and-message}
  841. Raise an error, indicating @var{value} as the bad value.
  842. @var{key-subr-and-message} should be a vector, where the first element
  843. is the symbol to which to throw, the second is the procedure in which to
  844. signal the error (a string) or @code{#f}, and the third is a format
  845. string for the message, with one template. These instructions do not
  846. fall through.
  847. Both of these instructions throw to a key with four arguments: the
  848. procedure that indicates the error (or @code{#f}), the format string, a
  849. list with @var{value}, and either @code{#f} or the list with @var{value}
  850. as the last argument respectively.
  851. @end deftypefn
  852. @node Instrumentation Instructions
  853. @subsubsection Instrumentation Instructions
  854. @deftypefn Instruction {} instrument-entry x24:@var{_} n32:@var{data}
  855. @deftypefnx Instruction {} instrument-loop x24:@var{_} n32:@var{data}
  856. Increase execution counter for this function and potentially tier up to
  857. the next JIT level. @var{data} is an offset to a structure recording
  858. execution counts and the next-level JIT code corresponding to this
  859. function. The increment values are currently 30 for
  860. @code{instrument-entry} and 2 for @code{instrument-loop}.
  861. @code{instrument-entry} will also run the apply hook, if VM hooks are
  862. enabled.
  863. @end deftypefn
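The counting scheme can be sketched in Python as below. The increments
of 30 and 2 come from the description above; the threshold value, the
class name, and the @code{tiered_up} flag are hypothetical stand-ins for
the per-function data structure and the JIT's actual policy.

```python
ENTRY_INCREMENT = 30  # bump applied by instrument-entry
LOOP_INCREMENT = 2    # bump applied by instrument-loop

class FunctionData:
    """Illustrative per-function execution counter."""

    def __init__(self, threshold):
        self.counter = 0
        self.threshold = threshold  # hypothetical tier-up threshold
        self.tiered_up = False

    def _bump(self, increment):
        self.counter += increment
        if self.counter >= self.threshold:
            self.tiered_up = True

    def instrument_entry(self):
        self._bump(ENTRY_INCREMENT)

    def instrument_loop(self):
        self._bump(LOOP_INCREMENT)
```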
  864. @deftypefn Instruction {} handle-interrupts x24:@var{_}
  865. Handle pending asynchronous interrupts (asyncs). @xref{Asyncs}. The
  866. compiler inserts @code{handle-interrupts} instructions before any call,
  867. return, or loop back-edge.
  868. @end deftypefn
  869. @deftypefn Instruction {} return-from-interrupt x24:@var{_}
  870. A special instruction to return from a call and also pop off the stack
  871. frame from the call. Used when returning from asynchronous interrupts.
  872. @end deftypefn
  873. @node Intrinsic Call Instructions
  874. @subsubsection Intrinsic Call Instructions
  875. Guile's instruction set is low-level. This is good because the separate
  876. components of, say, a @code{vector-ref} operation might be able to be
  877. optimized out, leaving only the operations that need to be performed at
  878. run-time.
  879. However, some macro-operations may need to perform large amounts of
  880. computation at run-time to handle all the edge cases, and their
  881. micro-operation components aren't amenable to optimization.
  882. Residualizing code for the entire macro-operation would lead to code
  883. bloat with no benefit.
  884. In this kind of a case, Guile's VM calls out to @dfn{intrinsics}:
  885. run-time routines written in the host language (currently C, possibly
  886. more in the future if Guile gains more run-time targets like
  887. WebAssembly). There is one instruction for each intrinsic prototype;
  888. the intrinsic is specified by index in the instruction.
  889. @deftypefn Instruction {} call-thread x24:@var{_} c32:@var{idx}
  890. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  891. the current @code{scm_thread*} as the argument.
  892. @end deftypefn
  893. @deftypefn Instruction {} call-thread-scm s24:@var{a} c32:@var{idx}
  894. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  895. the current @code{scm_thread*} and the @code{scm} local @var{a} as
  896. arguments.
  897. @end deftypefn
  898. @deftypefn Instruction {} call-thread-scm-scm s12:@var{a} s12:@var{b} c32:@var{idx}
  899. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  900. the current @code{scm_thread*} and the @code{scm} locals @var{a} and
  901. @var{b} as arguments.
  902. @end deftypefn
  903. @deftypefn Instruction {} call-scm-sz-u32 s8:@var{a} s8:@var{b} s8:@var{c} c32:@var{idx}
  904. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  905. the locals @var{a}, @var{b}, and @var{c} as arguments. @var{a} is a
  906. @code{scm} value, while @var{b} and @var{c} are raw @code{u64} values
  907. which fit into @code{size_t} and @code{uint32_t} types, respectively.
  908. @end deftypefn
  909. @deftypefn Instruction {} call-scm<-thread s24:@var{dst} c32:@var{idx}
  910. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  911. the current @code{scm_thread*} as the argument. Place the result in
  912. @var{dst}.
  913. @end deftypefn
  914. @deftypefn Instruction {} call-scm<-u64 s12:@var{dst} s12:@var{a} c32:@var{idx}
  915. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  916. @code{u64} local @var{a} as the argument. Place the result in
  917. @var{dst}.
  918. @end deftypefn
  919. @deftypefn Instruction {} call-scm<-s64 s12:@var{dst} s12:@var{a} c32:@var{idx}
  920. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  921. @code{s64} local @var{a} as the argument. Place the result in
  922. @var{dst}.
  923. @end deftypefn
  924. @deftypefn Instruction {} call-scm<-scm s12:@var{dst} s12:@var{a} c32:@var{idx}
  925. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  926. @code{scm} local @var{a} as the argument. Place the result in
  927. @var{dst}.
  928. @end deftypefn
  929. @deftypefn Instruction {} call-u64<-scm s12:@var{dst} s12:@var{a} c32:@var{idx}
  930. Call the @code{uint64_t}-returning intrinsic with index @var{idx},
  931. passing @code{scm} local @var{a} as the argument. Place the @code{u64}
  932. result in @var{dst}.
  933. @end deftypefn
  934. @deftypefn Instruction {} call-s64<-scm s12:@var{dst} s12:@var{a} c32:@var{idx}
  935. Call the @code{int64_t}-returning intrinsic with index @var{idx},
  936. passing @code{scm} local @var{a} as the argument. Place the @code{s64}
  937. result in @var{dst}.
  938. @end deftypefn
  939. @deftypefn Instruction {} call-f64<-scm s12:@var{dst} s12:@var{a} c32:@var{idx}
  940. Call the @code{double}-returning intrinsic with index @var{idx},
  941. passing @code{scm} local @var{a} as the argument. Place the @code{f64}
  942. result in @var{dst}.
  943. @end deftypefn
  944. @deftypefn Instruction {} call-scm<-scm-scm s8:@var{dst} s8:@var{a} s8:@var{b} c32:@var{idx}
  945. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  946. @code{scm} locals @var{a} and @var{b} as arguments. Place the
  947. @code{scm} result in @var{dst}.
  948. @end deftypefn
  949. @deftypefn Instruction {} call-scm<-scm-uimm s8:@var{dst} s8:@var{a} c8:@var{b} c32:@var{idx}
  950. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  951. @code{scm} local @var{a} and @code{uint8_t} immediate @var{b} as
  952. arguments. Place the @code{scm} result in @var{dst}.
  953. @end deftypefn
  954. @deftypefn Instruction {} call-scm<-thread-scm s12:@var{dst} s12:@var{a} c32:@var{idx}
  955. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  956. the current @code{scm_thread*} and @code{scm} local @var{a} as
  957. arguments. Place the @code{scm} result in @var{dst}.
  958. @end deftypefn
  959. @deftypefn Instruction {} call-scm<-scm-u64 s8:@var{dst} s8:@var{a} s8:@var{b} c32:@var{idx}
  960. Call the @code{SCM}-returning intrinsic with index @var{idx}, passing
  961. @code{scm} local @var{a} and @code{u64} local @var{b} as arguments.
  962. Place the @code{scm} result in @var{dst}.
  963. @end deftypefn
  964. @deftypefn Instruction {} call-scm-scm s12:@var{a} s12:@var{b} c32:@var{idx}
  965. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  966. @code{scm} locals @var{a} and @var{b} as arguments.
  967. @end deftypefn
  968. @deftypefn Instruction {} call-scm-scm-scm s8:@var{a} s8:@var{b} s8:@var{c} c32:@var{idx}
  969. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  970. @code{scm} locals @var{a}, @var{b}, and @var{c} as arguments.
  971. @end deftypefn
  972. @deftypefn Instruction {} call-scm-uimm-scm s8:@var{a} c8:@var{b} s8:@var{c} c32:@var{idx}
  973. Call the @code{void}-returning intrinsic with index @var{idx}, passing
  974. @code{scm} local @var{a}, @code{uint8_t} immediate @var{b}, and
  975. @code{scm} local @var{c} as arguments.
  976. @end deftypefn
  977. There are corresponding macro-instructions for specific intrinsics.
  978. These are equivalent to @code{call-@var{intrinsic-kind}} instructions
  979. with the appropriate intrinsic @var{idx} arguments.
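The relationship between an intrinsic call instruction and its
macro-instructions can be sketched in Python as follows. The intrinsics
table, its ordering, and the helper names are hypothetical; Guile's real
table is defined in C and its indices are not reproduced here.

```python
import operator

# Hypothetical intrinsics table; the indices are illustrative only.
INTRINSICS = [operator.add, operator.sub, operator.mul]

def call_scm_from_scm_scm(frame, dst, a, b, idx):
    """Model `call-scm<-scm-scm`: call intrinsic idx on locals a and b."""
    frame[dst] = INTRINSICS[idx](frame[a], frame[b])

def emit_add(frame, dst, a, b):
    """Model the `add` macro-instruction as a call with a fixed index."""
    call_scm_from_scm_scm(frame, dst, a, b, INTRINSICS.index(operator.add))
```

In this model a macro-instruction carries no behavior of its own: it
only fixes the @var{idx} operand of the underlying call instruction.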
  980. @deffn {Macro Instruction} add dst a b
  981. @deffnx {Macro Instruction} add/immediate dst a b/imm
  982. Add @code{SCM} values @var{a} and @var{b} and place the result in
  983. @var{dst}.
  984. @end deffn
  985. @deffn {Macro Instruction} sub dst a b
  986. @deffnx {Macro Instruction} sub/immediate dst a b/imm
  987. Subtract @code{SCM} value @var{b} from @var{a} and place the result in
  988. @var{dst}.
  989. @end deffn
  990. @deffn {Macro Instruction} mul dst a b
  991. Multiply @code{SCM} values @var{a} and @var{b} and place the result in
  992. @var{dst}.
  993. @end deffn
  994. @deffn {Macro Instruction} div dst a b
  995. Divide @code{SCM} value @var{a} by @var{b} and place the result in
  996. @var{dst}.
  997. @end deffn
  998. @deffn {Macro Instruction} quo dst a b
  999. Compute the quotient of @code{SCM} values @var{a} and @var{b} and place
  1000. the result in @var{dst}.
  1001. @end deffn
  1002. @deffn {Macro Instruction} rem dst a b
  1003. Compute the remainder of @code{SCM} values @var{a} and @var{b} and place
  1004. the result in @var{dst}.
  1005. @end deffn
  1006. @deffn {Macro Instruction} mod dst a b
  1007. Compute the modulo of @code{SCM} value @var{a} by @var{b} and place the
  1008. result in @var{dst}.
  1009. @end deffn
  1010. @deffn {Macro Instruction} logand dst a b
  1011. Compute the bitwise @code{and} of @code{SCM} values @var{a} and @var{b}
  1012. and place the result in @var{dst}.
  1013. @end deffn
  1014. @deffn {Macro Instruction} logior dst a b
  1015. Compute the bitwise inclusive @code{or} of @code{SCM} values @var{a} and
  1016. @var{b} and place the result in @var{dst}.
  1017. @end deffn
  1018. @deffn {Macro Instruction} logxor dst a b
  1019. Compute the bitwise exclusive @code{or} of @code{SCM} values @var{a} and
  1020. @var{b} and place the result in @var{dst}.
  1021. @end deffn
  1022. @deffn {Macro Instruction} logsub dst a b
  1023. Compute the bitwise @code{and} of @code{SCM} value @var{a} and the
  1024. bitwise @code{not} of @var{b} and place the result in @var{dst}.
  1025. @end deffn
  1026. @deffn {Macro Instruction} lsh dst a b
  1027. @deffnx {Macro Instruction} lsh/immediate dst a b/imm
  1028. Shift @code{SCM} value @var{a} left by @code{u64} value @var{b} bits and
  1029. place the result in @var{dst}.
  1030. @end deffn
  1031. @deffn {Macro Instruction} rsh dst a b
  1032. @deffnx {Macro Instruction} rsh/immediate dst a b/imm
  1033. Shift @code{SCM} value @var{a} right by @code{u64} value @var{b} bits
  1034. and place the result in @var{dst}.
  1035. @end deffn
  1036. @deffn {Macro Instruction} scm->f64 dst src
  1037. Convert @var{src} to an unboxed @code{f64} and place the result in
  1038. @var{dst}, or raise an error if @var{src} is not a real number.
  1039. @end deffn
  1040. @deffn {Macro Instruction} scm->u64 dst src
  1041. Convert @var{src} to an unboxed @code{u64} and place the result in
  1042. @var{dst}, or raise an error if @var{src} is not an integer within
  1043. range.
  1044. @end deffn
  1045. @deffn {Macro Instruction} scm->u64/truncate dst src
  1046. Convert @var{src} to an unboxed @code{u64} and place the result in
  1047. @var{dst}, truncating to the low 64 bits, or raise an error if
  1048. @var{src} is not an integer.
  1049. @end deffn
  1050. @deffn {Macro Instruction} scm->s64 dst src
  1051. Convert @var{src} to an unboxed @code{s64} and place the result in
  1052. @var{dst}, or raise an error if @var{src} is not an integer within
  1053. range.
  1054. @end deffn
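The difference between the range-checked and truncating conversions can
be illustrated with the Python sketch below. Python integers stand in
for Scheme integers, the function names are hypothetical, and the
treatment of negative inputs in the truncating case assumes two's
complement semantics.

```python
def _require_integer(value):
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("not an integer")

def scm_to_u64(value):
    """Model scm->u64: error unless 0 <= value < 2**64."""
    _require_integer(value)
    if not 0 <= value < (1 << 64):
        raise OverflowError("out of range for u64")
    return value

def scm_to_u64_truncate(value):
    """Model scm->u64/truncate: keep only the low 64 bits."""
    _require_integer(value)
    return value & ((1 << 64) - 1)

def scm_to_s64(value):
    """Model scm->s64: error unless -2**63 <= value < 2**63."""
    _require_integer(value)
    if not -(1 << 63) <= value < (1 << 63):
        raise OverflowError("out of range for s64")
    return value
```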
  1055. @deffn {Macro Instruction} u64->scm dst src
  1056. Convert @code{u64} value @var{src} to a Scheme integer in @var{dst}.
  1057. @end deffn
  1058. @deffn {Macro Instruction} s64->scm dst src
  1059. Convert @code{s64} value @var{src} to a Scheme integer in @var{dst}.
  1060. @end deffn
  1061. @deffn {Macro Instruction} string-set! str idx ch
  1062. Set the character at index @var{idx} (a @code{u64}) of string @var{str} to
  1063. @var{ch} (a @code{u64} that is a valid character value).
  1064. @end deffn
  1065. @deffn {Macro Instruction} string->number dst src
  1066. Call @code{string->number} on @var{src} and place the result in
  1067. @var{dst}.
  1068. @end deffn
  1069. @deffn {Macro Instruction} string->symbol dst src
  1070. Call @code{string->symbol} on @var{src} and place the result in
  1071. @var{dst}.
  1072. @end deffn
  1073. @deffn {Macro Instruction} symbol->keyword dst src
  1074. Call @code{symbol->keyword} on @var{src} and place the result in
  1075. @var{dst}.
  1076. @end deffn
  1077. @deffn {Macro Instruction} class-of dst src
  1078. Set @var{dst} to the GOOPS class of @var{src}.
  1079. @end deffn
  1080. @deffn {Macro Instruction} wind winder unwinder
  1081. Push wind and unwind procedures onto the dynamic stack. Note that
  1082. neither is actually called; the compiler should emit calls to
  1083. @var{winder} and @var{unwinder} for the normal dynamic-wind control
  1084. flow. Also note that the compiler should have inserted checks that
  1085. @var{winder} and @var{unwinder} are thunks, if it could not prove that
  1086. to be the case. @xref{Dynamic Wind}.
  1087. @end deffn
  1088. @deffn {Macro Instruction} unwind
  1089. Exit from the dynamic extent of an expression, popping the top entry off
  1090. of the dynamic stack.
  1091. @end deffn
  1092. @deffn {Macro Instruction} push-fluid fluid value
  1093. Dynamically bind @var{value} to @var{fluid} by creating a with-fluids
  1094. object, pushing that object on the dynamic stack. @xref{Fluids and
  1095. Dynamic States}.
  1096. @end deffn
  1097. @deffn {Macro Instruction} pop-fluid
  1098. Leave the dynamic extent of a @code{with-fluid*} expression, restoring
  1099. the fluid to its previous value. @code{push-fluid} should always be
  1100. balanced with @code{pop-fluid}.
  1101. @end deffn
  1102. @deffn {Macro Instruction} fluid-ref dst fluid
  1103. Place the value associated with the fluid @var{fluid} in @var{dst}.
  1104. @end deffn
  1105. @deffn {Macro Instruction} fluid-set! fluid value
  1106. Set the value of the fluid @var{fluid} to @var{value}.
  1107. @end deffn
  1108. @deffn {Macro Instruction} push-dynamic-state state
  1109. Save the current set of fluid bindings on the dynamic stack and instate
  1110. the bindings from @var{state} instead. @xref{Fluids and Dynamic
  1111. States}.
  1112. @end deffn
  1113. @deffn {Macro Instruction} pop-dynamic-state
  1114. Restore a saved set of fluid bindings from the dynamic stack.
  1115. @code{push-dynamic-state} should always be balanced with
  1116. @code{pop-dynamic-state}.
  1117. @end deffn
  1118. @deffn {Macro Instruction} resolve-module dst name public?
  1119. Look up the module named @var{name}, resolve its public interface if the
  1120. immediate operand @var{public?} is true, then place the result in
  1121. @var{dst}.
  1122. @end deffn
  1123. @deffn {Macro Instruction} lookup dst mod sym
  1124. Look up @var{sym} in module @var{mod}, placing the resulting variable
  1125. (or @code{#f} if not found) in @var{dst}.
  1126. @end deffn
  1127. @deffn {Macro Instruction} define! dst mod sym
  1128. Look up @var{sym} in module @var{mod}, placing the resulting variable in
  1129. @var{dst}, creating the variable if needed.
  1130. @end deffn
  1131. @deffn {Macro Instruction} current-module dst
  1132. Set @var{dst} to the current module.
  1133. @end deffn
  1134. @deffn {Macro Instruction} $car dst src
  1135. @deffnx {Macro Instruction} $cdr dst src
  1136. @deffnx {Macro Instruction} $set-car! x val
  1137. @deffnx {Macro Instruction} $set-cdr! x val
  1138. @deffnx {Macro Instruction} $variable-ref dst src
  1139. @deffnx {Macro Instruction} $variable-set! x val
  1140. @deffnx {Macro Instruction} $vector-length dst x
  1141. @deffnx {Macro Instruction} $vector-ref dst x idx
  1142. @deffnx {Macro Instruction} $vector-ref/immediate dst x idx/imm
  1143. @deffnx {Macro Instruction} $vector-set! x idx v
  1144. @deffnx {Macro Instruction} $vector-set!/immediate x idx/imm v
  1145. @deffnx {Macro Instruction} $allocate-struct dst vtable nwords
  1146. @deffnx {Macro Instruction} $struct-vtable dst src
  1147. @deffnx {Macro Instruction} $struct-ref dst src idx
  1148. @deffnx {Macro Instruction} $struct-ref/immediate dst src idx/imm
  1149. @deffnx {Macro Instruction} $struct-set! x idx v
  1150. @deffnx {Macro Instruction} $struct-set!/immediate x idx/imm v
  1151. Intrinsics for use by the baseline compiler. The usual strategy for CPS
  1152. compilation is to expose the component parts of e.g. @code{vector-ref}
  1153. so that the compiler can learn from them and eliminate needless bits.
  1154. However in the non-optimizing baseline compiler, that's just overhead,
  1155. so we have some intrinsics that encapsulate all the usual type checks.
  1156. @end deffn
  1157. @node Constant Instructions
  1158. @subsubsection Constant Instructions
  1159. The following instructions load literal data into a program. There are
  1160. two kinds.
  1161. The first set of instructions loads immediate values. These
  1162. instructions encode the immediate directly into the instruction stream.
  1163. @deftypefn Instruction {} make-immediate s8:@var{dst} zi16:@var{low-bits}
  1164. Make an immediate whose low bits are @var{low-bits}, sign-extended.
  1165. @end deftypefn
  1166. @deftypefn Instruction {} make-short-immediate s8:@var{dst} i16:@var{low-bits}
  1167. Make an immediate whose low bits are @var{low-bits}, and whose top bits are
  1168. 0.
  1169. @end deftypefn
  1170. @deftypefn Instruction {} make-long-immediate s24:@var{dst} i32:@var{low-bits}
  1171. Make an immediate whose low bits are @var{low-bits}, and whose top bits are
  1172. 0.
  1173. @end deftypefn
  1174. @deftypefn Instruction {} make-long-long-immediate s24:@var{dst} a32:@var{high-bits} b32:@var{low-bits}
  1175. Make an immediate with @var{high-bits} and @var{low-bits}.
  1176. @end deftypefn
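As a rough sketch of how these four instructions differ, the helpers below build a 64-bit word from the encoded fields. The function names are hypothetical and a 64-bit @code{scm} word is assumed; this is not Guile's implementation.

```c
#include <stdint.h>

/* Hypothetical helpers; names are illustrative, and a 64-bit word is assumed. */

/* make-immediate: the 16 low bits are sign-extended to the full word. */
static uint64_t imm_sign_extended(uint16_t low_bits)
{
    /* XOR/subtract sign extension, done in unsigned arithmetic so that
       wrap-around is well defined. */
    return ((uint64_t)low_bits ^ 0x8000u) - 0x8000u;
}

/* make-short-immediate and make-long-immediate: the top bits are zero. */
static uint64_t imm_zero_extended(uint32_t low_bits)
{
    return (uint64_t)low_bits;
}

/* make-long-long-immediate: join two 32-bit halves. */
static uint64_t imm_joined(uint32_t high_bits, uint32_t low_bits)
{
    return ((uint64_t)high_bits << 32) | low_bits;
}
```

The same 16-bit pattern therefore yields different words depending on whether the instruction sign-extends or zero-extends it.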
  1177. Non-immediate constant literals are referenced either directly or
  1178. indirectly. For example, Guile knows at compile-time what the layout of
  1179. a string will be like, and arranges to embed that object directly in the
  1180. compiled image. A reference to a string will use
  1181. @code{make-non-immediate} to treat a pointer into the compilation unit
  1182. as a @code{scm} value directly.
  1183. @deftypefn Instruction {} make-non-immediate s24:@var{dst} n32:@var{offset}
  1184. Load a pointer to statically allocated memory into @var{dst}. The
  1185. object's memory will be found @var{offset} 32-bit words away from the
  1186. current instruction pointer. Whether the object is mutable or immutable
  1187. depends on where it was allocated by the compiler, and loaded by the
  1188. loader.
  1189. @end deftypefn
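The addressing used by @code{make-non-immediate} can be sketched as plain pointer arithmetic on 32-bit units. The helper name @code{ip_relative} is hypothetical; the sketch only illustrates that @var{offset} counts 32-bit words, not bytes.

```c
#include <stdint.h>

/* Hypothetical helper: resolve an offset, counted in 32-bit words,
   relative to the current instruction pointer. */
static const uint32_t *ip_relative(const uint32_t *ip, int32_t offset)
{
    /* Pointer arithmetic on uint32_t scales the offset by 4 bytes. */
    return ip + offset;
}
```

A positive @var{offset} points past the current instruction and a negative one points before it.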
  1190. Sometimes you need to load up a code pointer into a register; for this,
  1191. use @code{load-label}.
  1192. @deftypefn Instruction {} load-label s24:@var{dst} l32:@var{offset}
  1193. Load a label @var{offset} words away from the current @code{ip} and
  1194. write it to @var{dst}. @var{offset} is a signed 32-bit integer.
  1195. @end deftypefn
  1196. Finally, Guile supports a number of unboxed data types, with their
  1197. associated constant loaders.
  1198. @deftypefn Instruction {} load-f64 s24:@var{dst} au32:@var{high-bits} au32:@var{low-bits}
  1199. Load a double-precision floating-point value formed by joining
  1200. @var{high-bits} and @var{low-bits}, and write it to @var{dst}.
  1201. @end deftypefn
  1202. @deftypefn Instruction {} load-u64 s24:@var{dst} au32:@var{high-bits} au32:@var{low-bits}
  1203. Load an unsigned 64-bit integer formed by joining @var{high-bits} and
  1204. @var{low-bits}, and write it to @var{dst}.
  1205. @end deftypefn
  1206. @deftypefn Instruction {} load-s64 s24:@var{dst} au32:@var{high-bits} au32:@var{low-bits}
  1207. Load a signed 64-bit integer formed by joining @var{high-bits} and
  1208. @var{low-bits}, and write it to @var{dst}.
  1209. @end deftypefn
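Joining @var{high-bits} and @var{low-bits} might look like the following sketch. The function names are hypothetical, and the @code{f64} case assumes that @code{double} is an IEEE 754 binary64 value.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helpers, not Guile's own code. */

/* load-u64: concatenate the two 32-bit halves. */
static uint64_t join_u64(uint32_t high_bits, uint32_t low_bits)
{
    return ((uint64_t)high_bits << 32) | low_bits;
}

/* load-s64: same bits, reinterpreted as a signed integer. */
static int64_t join_s64(uint32_t high_bits, uint32_t low_bits)
{
    uint64_t bits = join_u64(high_bits, low_bits);
    int64_t value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* load-f64: same bits, reinterpreted as a double (assumes IEEE 754). */
static double join_f64(uint32_t high_bits, uint32_t low_bits)
{
    uint64_t bits = join_u64(high_bits, low_bits);
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Using @code{memcpy} for the reinterpretation avoids relying on pointer casts between unrelated types.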
  1210. Some objects must be unique across the whole system. This is the case
  1211. for symbols and keywords. For these objects, Guile arranges to
  1212. initialize them when the compilation unit is loaded, storing them into a
  1213. slot in the image. References go indirectly through that slot.
  1214. @code{static-ref} is used in this case.
  1215. @deftypefn Instruction {} static-ref s24:@var{dst} r32:@var{offset}
  1216. Load a @code{scm} value into @var{dst}. The @code{scm} value will be fetched from
  1217. memory, @var{offset} 32-bit words away from the current instruction
  1218. pointer. @var{offset} is a signed value.
  1219. @end deftypefn
  1220. Fields of non-immediates may need to be fixed up at load time, because
  1221. we do not know in advance at what address they will be loaded. This is
  1222. the case, for example, for a pair containing a non-immediate in one of
  1223. its fields. @code{static-set!} and @code{static-patch!} are used in
  1224. these situations.
  1225. @deftypefn Instruction {} static-set! s24:@var{src} lo32:@var{offset}
  1226. Store a @code{scm} value into memory, @var{offset} 32-bit words away from the
  1227. current instruction pointer. @var{offset} is a signed value.
  1228. @end deftypefn
  1229. @deftypefn Instruction {} static-patch! x24:@var{_} lo32:@var{dst-offset} l32:@var{src-offset}
  1230. Patch a pointer at @var{dst-offset} to point to @var{src-offset}. Both offsets
  1231. are signed 32-bit values, indicating a memory address as a number
  1232. of 32-bit words away from the current instruction pointer.
  1233. @end deftypefn
  1234. @node Memory Access Instructions
  1235. @subsubsection Memory Access Instructions
  1236. In these instructions, the @code{/immediate} variants represent their
  1237. indexes or counts as immediates; otherwise these values are unboxed u64
  1238. locals.
  1239. @deftypefn Instruction {} allocate-words s12:@var{dst} s12:@var{count}
  1240. @deftypefnx Instruction {} allocate-words/immediate s12:@var{dst} c12:@var{count}
  1241. Allocate a fresh GC-traced object consisting of @var{count} words and
  1242. store it into @var{dst}.
  1243. @end deftypefn
  1244. @deftypefn Instruction {} scm-ref s8:@var{dst} s8:@var{obj} s8:@var{idx}
  1245. @deftypefnx Instruction {} scm-ref/immediate s8:@var{dst} s8:@var{obj} c8:@var{idx}
  1246. Load the @code{SCM} object at word offset @var{idx} from local
  1247. @var{obj}, and store it to @var{dst}.
  1248. @end deftypefn
  1249. @deftypefn Instruction {} scm-set! s8:@var{obj} s8:@var{idx} s8:@var{val}
  1250. @deftypefnx Instruction {} scm-set!/immediate s8:@var{obj} c8:@var{idx} s8:@var{val}
  1251. Store the @code{scm} local @var{val} into object @var{obj} at word
  1252. offset @var{idx}.
  1253. @end deftypefn
  1254. @deftypefn Instruction {} scm-ref/tag s8:@var{dst} s8:@var{obj} c8:@var{tag}
  1255. Load the first word of @var{obj}, subtract the immediate @var{tag}, and store the
  1256. resulting @code{SCM} to @var{dst}.
  1257. @end deftypefn
  1258. @deftypefn Instruction {} scm-set!/tag s8:@var{obj} c8:@var{tag} s8:@var{val}
  1259. Set the first word of @var{obj} to the unpacked bits of the @code{scm}
  1260. value @var{val} plus the immediate value @var{tag}.
  1261. @end deftypefn
  1262. @deftypefn Instruction {} word-ref s8:@var{dst} s8:@var{obj} s8:@var{idx}
  1263. @deftypefnx Instruction {} word-ref/immediate s8:@var{dst} s8:@var{obj} c8:@var{idx}
  1264. Load the word at offset @var{idx} from local @var{obj}, and store it to
  1265. the @code{u64} local @var{dst}.
  1266. @end deftypefn
  1267. @deftypefn Instruction {} word-set! s8:@var{obj} s8:@var{idx} s8:@var{val}
  1268. @deftypefnx Instruction {} word-set!/immediate s8:@var{obj} c8:@var{idx} s8:@var{val}
  1269. Store the @code{u64} local @var{val} into object @var{obj} at word
  1270. offset @var{idx}.
  1271. @end deftypefn
  1272. @deftypefn Instruction {} pointer-ref/immediate s8:@var{dst} s8:@var{obj} c8:@var{idx}
  1273. Load the pointer at offset @var{idx} from local @var{obj}, and store it
  1274. to the unboxed pointer local @var{dst}.
  1275. @end deftypefn
  1276. @deftypefn Instruction {} pointer-set!/immediate s8:@var{obj} c8:@var{idx} s8:@var{val}
  1277. Store the unboxed pointer local @var{val} into object @var{obj} at word
  1278. offset @var{idx}.
  1279. @end deftypefn
  1280. @deftypefn Instruction {} tail-pointer-ref/immediate s8:@var{dst} s8:@var{obj} c8:@var{idx}
  1281. Compute the address of word offset @var{idx} from local @var{obj}, and store it
  1282. to @var{dst}.
  1283. @end deftypefn
  1284. @node Atomic Memory Access Instructions
  1285. @subsubsection Atomic Memory Access Instructions
  1286. @deftypefn Instruction {} current-thread s24:@var{dst}
  1287. Write the current thread into @var{dst}.
  1288. @end deftypefn
  1289. @deftypefn Instruction {} atomic-scm-ref/immediate s8:@var{dst} s8:@var{obj} c8:@var{idx}
  1290. Atomically load the @code{SCM} object at word offset @var{idx} from
  1291. local @var{obj}, using the sequential consistency memory model. Store
  1292. the result to @var{dst}.
  1293. @end deftypefn
  1294. @deftypefn Instruction {} atomic-scm-set!/immediate s8:@var{obj} c8:@var{idx} s8:@var{val}
  1295. Atomically set the @code{SCM} object at word offset @var{idx} from local
  1296. @var{obj} to @var{val}, using the sequential consistency memory model.
  1297. @end deftypefn
  1298. @deftypefn Instruction {} atomic-scm-swap!/immediate s24:@var{dst} x8:@var{_} s24:@var{obj} c8:@var{idx} s24:@var{val}
  1299. Atomically swap the @code{SCM} value stored in object @var{obj} at word
  1300. offset @var{idx} with @var{val}, using the sequentially consistent
  1301. memory model. Store the previous value to @var{dst}.
  1302. @end deftypefn
  1303. @deftypefn Instruction {} atomic-scm-compare-and-swap!/immediate s24:@var{dst} x8:@var{_} s24:@var{obj} c8:@var{idx} s24:@var{expected} x8:@var{_} s24:@var{desired}
  1304. Atomically swap the @code{SCM} value stored in object @var{obj} at word
  1305. offset @var{idx} with @var{desired}, if and only if the value that was
  1306. there was @var{expected}, using the sequentially consistent memory
  1307. model. Store the value that was previously at @var{idx} from @var{obj}
  1308. in @var{dst}.
  1309. @end deftypefn
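The swap and compare-and-swap semantics described above correspond closely to C11 atomics. The following is a minimal sketch under that assumption, with hypothetical function names and a plain @code{uintptr_t} standing in for an @code{SCM} word; it does not claim to be how Guile implements these instructions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Like atomic-scm-swap!: store val and return the previous value. */
static uintptr_t swap_word(_Atomic uintptr_t *slot, uintptr_t val)
{
    return atomic_exchange_explicit(slot, val, memory_order_seq_cst);
}

/* Like atomic-scm-compare-and-swap!: store desired only if the slot
   holds expected; return whatever was in the slot beforehand. */
static uintptr_t compare_and_swap_word(_Atomic uintptr_t *slot,
                                       uintptr_t expected,
                                       uintptr_t desired)
{
    /* On failure, C11 writes the observed value back into expected;
       on success, expected already equals the previous value. */
    atomic_compare_exchange_strong_explicit(slot, &expected, desired,
                                            memory_order_seq_cst,
                                            memory_order_seq_cst);
    return expected;
}
```

A caller can tell whether the compare-and-swap succeeded by comparing the returned value with the value it expected.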
  1310. @node Tagging and Untagging Instructions
  1311. @subsubsection Tagging and Untagging Instructions
  1312. @deftypefn Instruction {} tag-char s12:@var{dst} s12:@var{src}
  1313. Make a @code{SCM} character whose integer value is the @code{u64} in
  1314. @var{src}, and store it in @var{dst}.
  1315. @end deftypefn
  1316. @deftypefn Instruction {} untag-char s12:@var{dst} s12:@var{src}
  1317. Extract the integer value from the @code{SCM} character @var{src}, and
  1318. store the resulting @code{u64} in @var{dst}.
  1319. @end deftypefn
  1320. @deftypefn Instruction {} tag-fixnum s12:@var{dst} s12:@var{src}
  1321. Make a @code{SCM} integer whose value is the @code{s64} in @var{src},
  1322. and store it in @var{dst}.
  1323. @end deftypefn
  1324. @deftypefn Instruction {} untag-fixnum s12:@var{dst} s12:@var{src}
  1325. Extract the integer value from the @code{SCM} integer @var{src}, and
  1326. store the resulting @code{s64} in @var{dst}.
  1327. @end deftypefn
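To make tagging concrete, here is a sketch that assumes a fixnum carries the tag value 2 in its two low bits and a character carries the tag value @code{0x0c} in its low byte. These tag values and the function names are assumptions made for illustration; @pxref{The SCM Type in Guile} for the authoritative representation.

```c
#include <stdint.h>

/* Assumed tag values, for illustration only. */
enum { FIXNUM_TAG = 2, CHAR_TAG = 0x0c };

/* tag-fixnum: shift the s64 left by two bits and add the tag.
   The caller is assumed to have checked that n fits in a fixnum. */
static uint64_t tag_fixnum(int64_t n)
{
    return ((uint64_t)n << 2) + FIXNUM_TAG;
}

/* untag-fixnum: remove the tag, then divide exactly by four.
   The conversion back to int64_t assumes two's complement. */
static int64_t untag_fixnum(uint64_t scm)
{
    int64_t bits = (int64_t)(scm - FIXNUM_TAG);
    return bits / 4;
}

/* tag-char: the code point lives above the low tag byte. */
static uint64_t tag_char(uint64_t code_point)
{
    return (code_point << 8) + CHAR_TAG;
}

/* untag-char: drop the tag byte. */
static uint64_t untag_char(uint64_t scm)
{
    return scm >> 8;
}
```

Untagging is the exact inverse of tagging for any value in range, which is why the compiler can freely move between the boxed and unboxed forms.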
  1328. @node Integer Arithmetic Instructions
  1329. @subsubsection Integer Arithmetic Instructions
  1330. @deftypefn Instruction {} uadd s8:@var{dst} s8:@var{a} s8:@var{b}
  1331. @deftypefnx Instruction {} uadd/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1332. Add the @code{u64} values @var{a} and @var{b}, and store the @code{u64}
  1333. result to @var{dst}. Overflow will wrap.
  1334. @end deftypefn
  1335. @deftypefn Instruction {} usub s8:@var{dst} s8:@var{a} s8:@var{b}
  1336. @deftypefnx Instruction {} usub/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1337. Subtract the @code{u64} value @var{b} from @var{a}, and store the
  1338. @code{u64} result to @var{dst}. Underflow will wrap.
  1339. @end deftypefn
  1340. @deftypefn Instruction {} umul s8:@var{dst} s8:@var{a} s8:@var{b}
  1341. @deftypefnx Instruction {} umul/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1342. Multiply the @code{u64} values @var{a} and @var{b}, and store the
  1343. @code{u64} result to @var{dst}. Overflow will wrap.
  1344. @end deftypefn
  1345. @deftypefn Instruction {} ulogand s8:@var{dst} s8:@var{a} s8:@var{b}
  1346. Place the bitwise @code{and} of the @code{u64} values @var{a} and
  1347. @var{b} into the @code{u64} local @var{dst}.
  1348. @end deftypefn
  1349. @deftypefn Instruction {} ulogior s8:@var{dst} s8:@var{a} s8:@var{b}
  1350. Place the bitwise inclusive @code{or} of the @code{u64} values @var{a}
  1351. and @var{b} into the @code{u64} local @var{dst}.
  1352. @end deftypefn
  1353. @deftypefn Instruction {} ulogxor s8:@var{dst} s8:@var{a} s8:@var{b}
  1354. Place the bitwise exclusive @code{or} of the @code{u64} values @var{a}
  1355. and @var{b} into the @code{u64} local @var{dst}.
  1356. @end deftypefn
  1357. @deftypefn Instruction {} ulogsub s8:@var{dst} s8:@var{a} s8:@var{b}
  1358. Place the bitwise @code{and} of the @code{u64} values @var{a} and the
  1359. bitwise @code{not} of @var{b} into the @code{u64} local @var{dst}.
  1360. @end deftypefn
  1361. @deftypefn Instruction {} ulsh s8:@var{dst} s8:@var{a} s8:@var{b}
  1362. @deftypefnx Instruction {} ulsh/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1363. Shift the unboxed unsigned 64-bit integer in @var{a} left by @var{b}
  1364. bits, also an unboxed unsigned 64-bit integer. Truncate to 64 bits and
  1365. write to @var{dst} as an unboxed value. Only the lower 6 bits of
  1366. @var{b} are used.
  1367. @end deftypefn
  1368. @deftypefn Instruction {} ursh s8:@var{dst} s8:@var{a} s8:@var{b}
  1369. @deftypefnx Instruction {} ursh/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1370. Shift the unboxed unsigned 64-bit integer in @var{a} right by @var{b}
  1371. bits, also an unboxed unsigned 64-bit integer. Truncate to 64 bits and
  1372. write to @var{dst} as an unboxed value. Only the lower 6 bits of
  1373. @var{b} are used.
  1374. @end deftypefn
  1375. @deftypefn Instruction {} srsh s8:@var{dst} s8:@var{a} s8:@var{b}
  1376. @deftypefnx Instruction {} srsh/immediate s8:@var{dst} s8:@var{a} c8:@var{b}
  1377. Shift the unboxed signed 64-bit integer in @var{a} right by @var{b}
  1378. bits, also an unboxed signed 64-bit integer. Truncate to 64 bits and
  1379. write to @var{dst} as an unboxed value. Only the lower 6 bits of
  1380. @var{b} are used.
  1381. @end deftypefn
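The wrapping and shift-count rules above map directly onto unsigned 64-bit arithmetic in C. This sketch uses hypothetical function names that mirror the instruction names; it covers only the unsigned operations.

```c
#include <stdint.h>

/* Unsigned arithmetic in C wraps modulo 2^64, like uadd and usub. */
static uint64_t uadd(uint64_t a, uint64_t b) { return a + b; }
static uint64_t usub(uint64_t a, uint64_t b) { return a - b; }

/* ulogsub: a AND (NOT b), i.e. clear in a every bit that is set in b. */
static uint64_t ulogsub(uint64_t a, uint64_t b) { return a & ~b; }

/* Only the lower 6 bits of the shift count are used. */
static uint64_t ulsh(uint64_t a, uint64_t b) { return a << (b & 63); }
static uint64_t ursh(uint64_t a, uint64_t b) { return a >> (b & 63); }
```

Masking the count means that a shift by 64 behaves like a shift by 0, rather than producing zero.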
  1382. @node Floating-Point Arithmetic Instructions
  1383. @subsubsection Floating-Point Arithmetic Instructions
  1384. @deftypefn Instruction {} fadd s8:@var{dst} s8:@var{a} s8:@var{b}
  1385. Add the @code{f64} values @var{a} and @var{b}, and store the @code{f64}
  1386. result to @var{dst}.
  1387. @end deftypefn
  1388. @deftypefn Instruction {} fsub s8:@var{dst} s8:@var{a} s8:@var{b}
  1389. Subtract the @code{f64} value @var{b} from @var{a}, and store the
  1390. @code{f64} result to @var{dst}.
  1391. @end deftypefn
  1392. @deftypefn Instruction {} fmul s8:@var{dst} s8:@var{a} s8:@var{b}
  1393. Multiply the @code{f64} values @var{a} and @var{b}, and store the
  1394. @code{f64} result to @var{dst}.
  1395. @end deftypefn
  1396. @deftypefn Instruction {} fdiv s8:@var{dst} s8:@var{a} s8:@var{b}
  1397. Divide the @code{f64} value @var{a} by @var{b}, and store the
  1398. @code{f64} result to @var{dst}.
  1399. @end deftypefn
  1400. @node Comparison Instructions
  1401. @subsubsection Comparison Instructions
  1402. @deftypefn Instruction {} u64=? s12:@var{a} s12:@var{b}
  1403. Set the comparison result to @code{EQUAL} if the @code{u64} values
  1404. @var{a} and @var{b} are the same, or @code{NONE} otherwise.
  1405. @end deftypefn
  1406. @deftypefn Instruction {} u64<? s12:@var{a} s12:@var{b}
  1407. Set the comparison result to @code{LESS_THAN} if the @code{u64} value
  1408. @var{a} is less than the @code{u64} value @var{b}, or
  1409. @code{NONE} otherwise.
  1410. @end deftypefn
  1411. @deftypefn Instruction {} s64<? s12:@var{a} s12:@var{b}
  1412. Set the comparison result to @code{LESS_THAN} if the @code{s64} value
  1413. @var{a} is less than the @code{s64} value @var{b}, or
  1414. @code{NONE} otherwise.
  1415. @end deftypefn
  1416. @deftypefn Instruction {} s64-imm=? s12:@var{a} z12:@var{b}
  1417. Set the comparison result to @code{EQUAL} if the @code{s64} value @var{a}
  1418. is equal to the immediate @code{s64} value @var{b}, or @code{NONE}
  1419. otherwise.
  1420. @end deftypefn
  1421. @deftypefn Instruction {} u64-imm<? s12:@var{a} c12:@var{b}
  1422. Set the comparison result to @code{LESS_THAN} if the @code{u64} value
  1423. @var{a} is less than the immediate @code{u64} value @var{b}, or
  1424. @code{NONE} otherwise.
  1425. @end deftypefn
  1426. @deftypefn Instruction {} imm-u64<? s12:@var{a} c12:@var{b}
  1427. Set the comparison result to @code{LESS_THAN} if the @code{u64}
  1428. immediate @var{b} is less than the @code{u64} value @var{a}, or
  1429. @code{NONE} otherwise.
  1430. @end deftypefn
  1431. @deftypefn Instruction {} s64-imm<? s12:@var{a} z12:@var{b}
  1432. Set the comparison result to @code{LESS_THAN} if the @code{s64} value
  1433. @var{a} is less than the immediate @code{s64} value @var{b}, or
  1434. @code{NONE} otherwise.
  1435. @end deftypefn
  1436. @deftypefn Instruction {} imm-s64<? s12:@var{a} z12:@var{b}
  1437. Set the comparison result to @code{LESS_THAN} if the @code{s64}
  1438. immediate @var{b} is less than the @code{s64} value @var{a}, or
  1439. @code{NONE} otherwise.
  1440. @end deftypefn
  1441. @deftypefn Instruction {} f64=? s12:@var{a} s12:@var{b}
  1442. Set the comparison result to @code{EQUAL} if the f64 value @var{a} is
  1443. equal to the f64 value @var{b}, or @code{NONE} otherwise.
  1444. @end deftypefn
  1445. @deftypefn Instruction {} f64<? s12:@var{a} s12:@var{b}
  1446. Set the comparison result to @code{LESS_THAN} if the f64 value @var{a}
  1447. is less than the f64 value @var{b}, @code{NONE} if @var{a} is greater
  1448. than or equal to @var{b}, or @code{INVALID} otherwise.
  1449. @end deftypefn
  1450. @deftypefn Instruction {} =? s12:@var{a} s12:@var{b}
  1451. Set the comparison result to @code{EQUAL} if the SCM values @var{a} and
  1452. @var{b} are numerically equal, in the sense of the Scheme @code{=}
  1453. operator. Set to @code{NONE} otherwise.
  1454. @end deftypefn
  1455. @deftypefn Instruction {} heap-numbers-equal? s12:@var{a} s12:@var{b}
  1456. Set the comparison result to @code{EQUAL} if the SCM values @var{a} and
  1457. @var{b} are numerically equal, in the sense of Scheme @code{=}. Set to
  1458. @code{NONE} otherwise. It is known that both @var{a} and @var{b} are
  1459. heap numbers.
  1460. @end deftypefn
  1461. @deftypefn Instruction {} <? s12:@var{a} s12:@var{b}
  1462. Set the comparison result to @code{LESS_THAN} if the SCM value @var{a}
  1463. is less than the SCM value @var{b}, @code{NONE} if @var{a} is greater
  1464. than or equal to @var{b}, or @code{INVALID} otherwise.
  1465. @end deftypefn
  1466. @deftypefn Instruction {} immediate-tag=? s24:@var{obj} c16:@var{mask} c16:@var{tag}
  1467. Set the comparison result to @code{EQUAL} if the result of a bitwise
  1468. @code{and} between the bits of @code{scm} value @var{obj} and the
  1469. immediate @var{mask} is @var{tag}, or @code{NONE} otherwise.
  1470. @end deftypefn
  1471. @deftypefn Instruction {} heap-tag=? s24:@var{obj} c16:@var{mask} c16:@var{tag}
  1472. Set the comparison result to @code{EQUAL} if the result of a bitwise
  1473. @code{and} between the first word of @code{scm} value @var{obj} and the
  1474. immediate @var{mask} is @var{tag}, or @code{NONE} otherwise.
  1475. @end deftypefn
  1476. @deftypefn Instruction {} eq? s12:@var{a} s12:@var{b}
  1477. Set the comparison result to @code{EQUAL} if the SCM values @var{a} and
  1478. @var{b} are @code{eq?}, or @code{NONE} otherwise.
  1479. @end deftypefn
  1480. @deftypefn Instruction {} eq-immediate? s8:@var{a} zi16:@var{b}
  1481. Set the comparison result to @code{EQUAL} if the SCM value @var{a} is
  1482. equal to the immediate SCM value @var{b} (sign-extended), or @code{NONE}
  1483. otherwise.
  1484. @end deftypefn
  1485. There is also a set of macro-instructions for @code{immediate-tag=?} and
  1486. @code{heap-tag=?} that abstract away the precise type tag
  1487. values. @xref{The SCM Type in Guile}.
  1488. @deffn {Macro Instruction} fixnum? x
  1489. @deffnx {Macro Instruction} heap-object? x
  1490. @deffnx {Macro Instruction} char? x
  1491. @deffnx {Macro Instruction} eq-false? x
  1492. @deffnx {Macro Instruction} eq-nil? x
  1493. @deffnx {Macro Instruction} eq-null? x
  1494. @deffnx {Macro Instruction} eq-true? x
  1495. @deffnx {Macro Instruction} unspecified? x
  1496. @deffnx {Macro Instruction} undefined? x
  1497. @deffnx {Macro Instruction} eof-object? x
  1498. @deffnx {Macro Instruction} null? x
  1499. @deffnx {Macro Instruction} false? x
  1500. @deffnx {Macro Instruction} nil? x
  1501. Emit an @code{immediate-tag=?} instruction that will set the comparison
  1502. result to @code{EQUAL} if @var{x} would pass the corresponding predicate
  1503. (e.g. @code{null?}), or @code{NONE} otherwise.
  1504. @end deffn
  1505. @deffn {Macro Instruction} pair? x
  1506. @deffnx {Macro Instruction} struct? x
  1507. @deffnx {Macro Instruction} symbol? x
  1508. @deffnx {Macro Instruction} variable? x
  1509. @deffnx {Macro Instruction} vector? x
  1510. @deffnx {Macro Instruction} immutable-vector? x
  1511. @deffnx {Macro Instruction} mutable-vector? x
  1512. @deffnx {Macro Instruction} weak-vector? x
  1513. @deffnx {Macro Instruction} string? x
  1514. @deffnx {Macro Instruction} heap-number? x
  1515. @deffnx {Macro Instruction} hash-table? x
  1516. @deffnx {Macro Instruction} pointer? x
  1517. @deffnx {Macro Instruction} fluid? x
  1518. @deffnx {Macro Instruction} stringbuf? x
  1519. @deffnx {Macro Instruction} dynamic-state? x
  1520. @deffnx {Macro Instruction} frame? x
  1521. @deffnx {Macro Instruction} keyword? x
  1522. @deffnx {Macro Instruction} atomic-box? x
  1523. @deffnx {Macro Instruction} syntax? x
  1524. @deffnx {Macro Instruction} program? x
  1525. @deffnx {Macro Instruction} vm-continuation? x
  1526. @deffnx {Macro Instruction} bytevector? x
  1527. @deffnx {Macro Instruction} weak-set? x
  1528. @deffnx {Macro Instruction} weak-table? x
  1529. @deffnx {Macro Instruction} array? x
  1530. @deffnx {Macro Instruction} bitvector? x
  1531. @deffnx {Macro Instruction} smob? x
  1532. @deffnx {Macro Instruction} port? x
  1533. @deffnx {Macro Instruction} bignum? x
  1534. @deffnx {Macro Instruction} flonum? x
  1535. @deffnx {Macro Instruction} compnum? x
  1536. @deffnx {Macro Instruction} fracnum? x
  1537. Emit a @code{heap-tag=?} instruction that will set the comparison result
  1538. to @code{EQUAL} if @var{x} would pass the corresponding predicate
  1539. (e.g. @code{pair?}), or @code{NONE} otherwise.
  1540. @end deffn
  1541. @node Branch Instructions
  1542. @subsubsection Branch Instructions
  1543. All offsets to branch instructions are 24-bit signed numbers, which
  1544. count 32-bit units. This gives Guile effectively a 26-bit address range
  1545. for relative jumps.
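The arithmetic behind the 26-bit figure can be sketched as follows: a signed 24-bit count of 4-byte units spans byte distances from -2^25 to 2^25 - 4. The helper names are hypothetical, and the sketch deliberately says nothing about where the 24-bit field sits inside an instruction word.

```c
#include <stdint.h>

/* Sign-extend a raw 24-bit field to a 32-bit signed word count. */
static int32_t sign_extend_24(uint32_t raw24)
{
    raw24 &= 0xFFFFFFu;
    /* Flip the sign bit, then subtract its weight. */
    return (int32_t)(raw24 ^ 0x800000u) - 0x800000;
}

/* Convert a word-count offset into a byte distance. */
static int64_t branch_byte_delta(int32_t offset_words)
{
    return (int64_t)offset_words * 4;
}
```

The extreme offsets are therefore about 32 MiB away from the branch in either direction.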
  1546. @deftypefn Instruction {} j l24:@var{offset}
  1547. Add @var{offset} to the current instruction pointer.
  1548. @end deftypefn
  1549. @deftypefn Instruction {} jl l24:@var{offset}
  1550. If the last comparison result is @code{LESS_THAN}, add @var{offset}, a
  1551. signed 24-bit number, to the current instruction pointer.
  1552. @end deftypefn
  1553. @deftypefn Instruction {} je l24:@var{offset}
  1554. If the last comparison result is @code{EQUAL}, add @var{offset}, a
  1555. signed 24-bit number, to the current instruction pointer.
  1556. @end deftypefn
  1557. @deftypefn Instruction {} jnl l24:@var{offset}
  1558. If the last comparison result is not @code{LESS_THAN}, add @var{offset},
  1559. a signed 24-bit number, to the current instruction pointer.
  1560. @end deftypefn
  1561. @deftypefn Instruction {} jne l24:@var{offset}
  1562. If the last comparison result is not @code{EQUAL}, add @var{offset}, a
  1563. signed 24-bit number, to the current instruction pointer.
  1564. @end deftypefn
  1565. @deftypefn Instruction {} jge l24:@var{offset}
  1566. If the last comparison result is @code{NONE}, add @var{offset}, a
  1567. signed 24-bit number, to the current instruction pointer.
  1568. This is intended for use after a @code{<?} comparison, and is different
  1569. from @code{jnl} in the way it handles not-a-number (NaN) values:
  1570. @code{<?} sets @code{INVALID} instead of @code{NONE} if either value is
  1571. a NaN. For exact numbers, @code{jge} is the same as @code{jnl}.
  1572. @end deftypefn
  1573. @deftypefn Instruction {} jnge l24:@var{offset}
  1574. If the last comparison result is not @code{NONE}, add @var{offset}, a
  1575. signed 24-bit number, to the current instruction pointer.
  1576. This is intended for use after a @code{<?} comparison, and is different
  1577. from @code{jl} in the way it handles not-a-number (NaN) values:
  1578. @code{<?} sets @code{INVALID} instead of @code{NONE} if either value is
  1579. a NaN. For exact numbers, @code{jnge} is the same as @code{jl}.
  1580. @end deftypefn
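The interaction between the comparison result and the conditional branches can be modeled in a few lines. In this sketch the result names come from the text above, but the enum's numeric values and the function names are hypothetical; it models @code{f64<?} only.

```c
#include <math.h>
#include <stdbool.h>

/* Result names follow the manual; the numeric values are arbitrary. */
enum compare_result { NONE, EQUAL, LESS_THAN, INVALID };

/* Model of f64<?: NaN operands produce INVALID rather than NONE. */
static enum compare_result f64_less(double a, double b)
{
    if (isnan(a) || isnan(b))
        return INVALID;
    return a < b ? LESS_THAN : NONE;
}

/* Whether each branch instruction would be taken for a given result. */
static bool jl_taken(enum compare_result r)   { return r == LESS_THAN; }
static bool jnl_taken(enum compare_result r)  { return r != LESS_THAN; }
static bool jge_taken(enum compare_result r)  { return r == NONE; }
static bool jnge_taken(enum compare_result r) { return r != NONE; }
```

With a NaN operand, @code{jnl} and @code{jnge} are taken while @code{jl} and @code{jge} are not, which is the only case in which the two pairs disagree.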
  1581. @deftypefn Instruction {} jtable s24:@var{idx} v32:@var{length} [x8:_ l24:@var{offset}]...
  1582. Branch to an entry in a table, as in C's @code{switch} statement.
  1583. @var{idx} is a @code{u64} local indicating which entry to branch to.
  1584. The immediate @var{length} indicates the number of entries in the table,
  1585. and should be greater than or equal to 1. The last entry in the table
  1586. is the ``catch-all'' entry. The @var{offset}... values are signed 24-bit
  1587. immediates (@code{l24} encoding), indicating a memory address as a
  1588. number of 32-bit words away from the current instruction pointer.
  1589. @end deftypefn
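One plausible reading of the ``catch-all'' rule is that any @var{idx} at or beyond the last entry selects that last entry. The sketch below encodes that reading; the function name is hypothetical and the offsets are assumed to be already decoded into signed integers.

```c
#include <assert.h>
#include <stdint.h>

/* Pick the branch offset for idx, falling back to the final
   (catch-all) entry when idx is out of range. */
static int32_t jtable_offset(uint64_t idx, uint32_t length,
                             const int32_t *offsets)
{
    assert(length >= 1);  /* the table always has a catch-all entry */
    return offsets[idx < length ? idx : length - 1];
}
```

Under this reading, a C @code{switch} with @var{n} dense cases and a @code{default} would compile to a table of @var{n} + 1 entries.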
  1590. @node Raw Memory Access Instructions
  1591. @subsubsection Raw Memory Access Instructions
  1592. Bytevector operations correspond closely to what the current hardware
  1593. can do, so it makes sense to inline them to VM instructions, providing
  1594. a clear path for eventual native compilation. Without this, Scheme
  1595. programs would need other primitives for accessing raw bytes, but
  1596. these primitives are as good as any.
  1597. @deftypefn Instruction {} u8-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1598. @deftypefnx Instruction {} s8-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1599. @deftypefnx Instruction {} u16-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1600. @deftypefnx Instruction {} s16-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1601. @deftypefnx Instruction {} u32-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1602. @deftypefnx Instruction {} s32-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1603. @deftypefnx Instruction {} u64-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1604. @deftypefnx Instruction {} s64-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1605. @deftypefnx Instruction {} f32-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1606. @deftypefnx Instruction {} f64-ref s8:@var{dst} s8:@var{ptr} s8:@var{idx}
  1607. Fetch the item at byte offset @var{idx} from the raw pointer local
  1608. @var{ptr}, and store it in @var{dst}. All accesses use native
  1609. endianness.
  1610. The @var{idx} value should be an unboxed unsigned 64-bit integer.
  1611. The results are all written to the stack as unboxed values, either as
  1612. signed 64-bit integers, unsigned 64-bit integers, or IEEE double
  1613. floating point numbers.
  1614. @end deftypefn
  1615. @deftypefn Instruction {} u8-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1616. @deftypefnx Instruction {} s8-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1617. @deftypefnx Instruction {} u16-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1618. @deftypefnx Instruction {} s16-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1619. @deftypefnx Instruction {} u32-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1620. @deftypefnx Instruction {} s32-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1621. @deftypefnx Instruction {} u64-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1622. @deftypefnx Instruction {} s64-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1623. @deftypefnx Instruction {} f32-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1624. @deftypefnx Instruction {} f64-set! s8:@var{ptr} s8:@var{idx} s8:@var{val}
  1625. Store @var{val} into memory pointed to by raw pointer local @var{ptr},
  1626. at byte offset @var{idx}. Multibyte values are written using native
  1627. endianness.
  1628. The @var{idx} value should be an unboxed unsigned 64-bit integer.
  1629. The @var{val} values are all unboxed, either as signed 64-bit integers,
  1630. unsigned 64-bit integers, or IEEE double floating point numbers.
  1631. @end deftypefn
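In C terms, these accesses resemble @code{memcpy}-based loads and stores at a byte offset, followed by widening to a 64-bit or double-precision value. The sketch below shows a few representative cases with hypothetical function names; the truncation of @var{val} in the store is an assumption.

```c
#include <stdint.h>
#include <string.h>

/* u32-ref: native-endian load, widened to an unsigned 64-bit value. */
static uint64_t u32_ref(const uint8_t *ptr, uint64_t idx)
{
    uint32_t v;
    memcpy(&v, ptr + idx, sizeof v);  /* safe for unaligned offsets */
    return v;
}

/* s8-ref: the byte is sign-extended to a signed 64-bit value. */
static int64_t s8_ref(const uint8_t *ptr, uint64_t idx)
{
    int8_t v;
    memcpy(&v, ptr + idx, sizeof v);
    return v;
}

/* f32-ref: a single-precision value is widened to a double. */
static double f32_ref(const uint8_t *ptr, uint64_t idx)
{
    float v;
    memcpy(&v, ptr + idx, sizeof v);
    return v;
}

/* u32-set!: store the low 32 bits of val in native byte order
   (truncation is assumed here). */
static void u32_set(uint8_t *ptr, uint64_t idx, uint64_t val)
{
    uint32_t v = (uint32_t)val;
    memcpy(ptr + idx, &v, sizeof v);
}
```

Because both directions use the host's byte order, a value stored with @code{u32_set} reads back unchanged through @code{u32_ref} on any platform.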
@node Just-In-Time Native Code
@subsection Just-In-Time Native Code
@cindex just-in-time compiler
@cindex jit compiler
@cindex template jit
@cindex compiler, just-in-time

The final piece of Guile's virtual machine is a just-in-time (JIT)
compiler from bytecode instructions to native code. It is faster to run
a function when its bytecode instructions are compiled to native code,
compared to having the VM interpret the instructions.

The JIT compiler runs automatically, triggered by counters associated
with each function. A function's counter is incremented each time the
function is called and on each loop iteration. Once a function's counter
passes a certain value, the function gets JIT-compiled.
@xref{Instrumentation Instructions}, for full details.

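The counter-based trigger described above can be sketched in C as
follows. Everything here is illustrative: the
@code{sketch_function} structure, the helper names, and the choice to
add one to the counter per event are assumptions for the example, not
Guile's actual data structures or increments.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-function bookkeeping; field names are illustrative.  */
struct sketch_function
{
  uint32_t counter;       /* bumped on entry and on each loop iteration */
  bool has_native_code;   /* set once the function has been compiled */
};

/* Stand-in for the real compiler: only records that it ran.  */
static void
sketch_jit_compile (struct sketch_function *fn)
{
  fn->has_native_code = true;
}

/* Bump the counter, and compile once it passes THRESHOLD.  */
static void
sketch_bump_counter (struct sketch_function *fn, uint32_t threshold)
{
  if (fn->has_native_code)
    return;               /* already compiled, nothing left to count */
  fn->counter++;
  if (fn->counter > threshold)
    sketch_jit_compile (fn);
}

/* The two events that the text says increment the counter.  */
static void
sketch_on_call (struct sketch_function *fn, uint32_t threshold)
{
  sketch_bump_counter (fn, threshold);
}

static void
sketch_on_loop_iteration (struct sketch_function *fn, uint32_t threshold)
{
  sketch_bump_counter (fn, threshold);
}
```

Counting loop iterations as well as calls means that a function called
only once, but which spends a long time in a loop, can still become
eligible for compilation.
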
Guile's JIT compiler is what is known as a @dfn{template JIT}. This
kind of JIT is very simple: for each instruction in a function, the JIT
compiler emits a generic sequence of machine code corresponding to
the instruction kind, specializing that generic template to reference
the specific operands of the instruction being compiled.

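To make the idea of a template concrete, here is a minimal sketch in C.
It emits textual pseudo-assembly instead of real machine code, and the
two-instruction bytecode, the @code{sketch_} names, and the
pseudo-assembly syntax are all invented for the example. The point is
only the shape of the algorithm: one linear pass, one fixed template per
instruction kind, with the operands filled in.

```c
#include <stddef.h>
#include <stdio.h>

/* A made-up two-instruction bytecode, for illustration only.  */
enum sketch_opcode { SKETCH_MOV, SKETCH_ADD };

struct sketch_instruction
{
  enum sketch_opcode op;
  int dst, a, b;          /* stack slot indices; B is unused by MOV */
};

/* Expand the template for one instruction into OUT.  Returns the
   length the text needs, as snprintf does, or -1 on a bad opcode.  */
static int
sketch_emit_one (char *out, size_t size,
                 const struct sketch_instruction *insn)
{
  switch (insn->op)
    {
    case SKETCH_MOV:
      return snprintf (out, size,
                       "load r0, slot[%d]\nstore slot[%d], r0\n",
                       insn->a, insn->dst);
    case SKETCH_ADD:
      return snprintf (out, size,
                       "load r0, slot[%d]\nload r1, slot[%d]\n"
                       "add r0, r1\nstore slot[%d], r0\n",
                       insn->a, insn->b, insn->dst);
    }
  return -1;
}

/* Compile COUNT instructions in a single pass, with no analysis.
   Returns the total length written, or -1 if OUT is too small.  */
static int
sketch_compile (char *out, size_t size,
                const struct sketch_instruction *code, size_t count)
{
  size_t used = 0;
  for (size_t i = 0; i < count; i++)
    {
      int n = sketch_emit_one (out + used, size - used, &code[i]);
      if (n < 0 || (size_t) n >= size - used)
        return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

Note that each template reloads its operands from the stack even when
the previous instruction just stored them. A template JIT accepts that
redundancy in exchange for not having to look beyond the instruction at
hand.
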
The principal strength of a template JIT is that it emits code very
quickly. It does its job without any time-consuming analysis of the
bytecode that it is compiling.

A template JIT is also very predictable: the native code emitted by a
template JIT has the same performance characteristics as the
corresponding bytecode, except that it runs faster. In theory you could
even generate the template-JIT machine code ahead of time, as it doesn't
depend on any value seen at run-time.

This predictability makes it possible to reason about the performance of
a system in terms of bytecode, knowing that the conclusions apply to
native code emitted by a template JIT.

Because the machine code corresponding to an instruction always performs
the same tasks that the interpreter would do for that instruction, the
combination of bytecode and a template JIT also allows Guile programmers
to debug their programs in terms of the bytecode model. When a Guile
programmer sets a breakpoint, Guile will disable the JIT for the thread
being debugged, falling back to the interpreter (which has the
corresponding code to run the hooks). @xref{VM Hooks}.

To emit native code, Guile uses a forked version of GNU Lightning. This
``Lightening'' effort, spun out as a separate project, aims to build on
the back-end support from GNU Lightning while adapting the API and
behavior of the library to match Guile's needs. This code is included
in the Guile source distribution. For more information, see
@url{https://gitlab.com/wingo/lightening}. As of mid-2019, Lightening
supports code generation for the x86-64, ia32, ARMv7, and AArch64
architectures.

The weaknesses of a template JIT are twofold. Firstly, as a simple
back-end that has to run fast, a template JIT doesn't have time to do
analysis that could help it generate better code, notably global
register allocation and instruction selection.

However, this is a minor weakness compared to the second: the inability
to perform significant, speculative program transformations. For
example, Guile could see that in an expression @code{(f x)}, @var{f}
in practice always refers to the same function. An advanced JIT compiler
would speculatively inline @var{f} into the call-site, along with a
dynamic check to make sure that the assumption still holds. But as a
template JIT doesn't pay attention to values only known at run-time, it
can't make this transformation.

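The speculative inlining that a template JIT cannot do has roughly the
following shape, sketched here in C rather than in machine code. The
functions and names are hypothetical. The sketch assumes the run-time
system has observed that @var{f} is always @code{sketch_add1}, and shows
the guarded code an advanced JIT might then produce.

```c
typedef long (*sketch_fn) (long);

static long
sketch_add1 (long x)
{
  return x + 1;
}

static long
sketch_double (long x)
{
  return x * 2;
}

/* Generic call-site: always an indirect call through F.  This is the
   only form available when nothing is assumed about F.  */
static long
sketch_call_generic (sketch_fn f, long x)
{
  return f (x);
}

/* Speculated call-site: a guard checks that the assumption "F is
   sketch_add1" still holds.  If so, the inlined body runs with no
   call at all; otherwise control falls back to the generic path.  */
static long
sketch_call_speculated (sketch_fn f, long x)
{
  if (f == sketch_add1)
    return x + 1;                       /* inlined body of sketch_add1 */
  return sketch_call_generic (f, x);    /* guard failed: slow path */
}
```

The guard is what makes the transformation safe: both paths compute the
same result, so the speculation affects only speed, never meaning.
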
This limitation is mitigated in part by Guile's robust ahead-of-time
compiler, which can already perform significant optimizations when it
can prove they will always be valid, and by its low-level bytecode,
which is able to represent the effect of those optimizations (e.g. elided
type-checks). @xref{Compiling to the Virtual Machine}, for more on
Guile's compiler.

An ahead-of-time Scheme-to-bytecode strategy, complemented by a template
JIT, also particularly suits the somewhat static nature of Scheme.
Scheme programmers often write code in a way that makes the identity of
free variable references lexically apparent. For example, the @code{(f
x)} expression could appear within a @code{(let ((f (lambda (x) (1+
x)))) ...)} expression, or we could see that @code{f} was imported from
a particular module where we know its binding. Ahead-of-time
compilation techniques can work well for a language like Scheme where
there is little polymorphism and much first-order programming. They do
not work so well for a language like JavaScript, which is highly mutable
at run-time and difficult to analyze due to method calls (which are
effectively higher-order calls).

All that said, a template JIT works well for Guile at this point. It's
only a few thousand lines of maintainable code, it speeds up Scheme
programs, and it keeps the bulk of the Guile Scheme implementation
written in Scheme itself. The next step is probably to add
ahead-of-time native code emission to the back-end of the compiler
written in Scheme, to take advantage of the opportunity to do global
register allocation and instruction selection. Once this is working, it
can allow Guile to experiment with speculative optimizations in Scheme
as well. @xref{Extending the Compiler}, for more on future directions.

Finally, note that there are a few environment variables that can be
tweaked to make JIT compilation happen sooner, later, or never.
@xref{Environment Variables}, for more.