[TOC]
SCM
unitypeScheme is an untyped language; any value can be stored in the common
SCM
unitype. Which concrete kind of value a given SCM
holds can be
determined by inspecting the value. (Of course, Guile’s compiler can
unbox values and sometimes will be able to work in e.g. raw f64
values. But we need a story for the general case).
Since we are targetting the GC
MVP,
we can represent SCM
as (ref eq)
. There is no need to allow
nullability in this type.
All immediates (fixnums, chars, bools, etc) are encoded as (ref i31)
values. The 231 possible values are partitioned by tagging.
The partition is the similar to what native Guile
does,
except that the bottom 0 bit is left off and we eliminate some
intermediate 0 bits. The fixnum range is therefore the same as in
native Guile for a 32-bit target: [-229, 229-1].
Note that there is a risk here, that i31ref
gets punted to
post-MVP, but this
appears to be unlikely.
In native Guile, there are immediate values and heap objects. Immediate
values have their contents directly in the bits of the SCM
. Heap
objects are SCM
values that are pointers to the heap. The type of the
object is encoded in the first word of memory pointed to by the SCM
value.
In WebAssembly, garbage-collected objects can also also be associated with a type identifier which WebAssembly programs can introspect over, for example to branch to a label if a value has a given type. We will use these built-in capabilities when implementing the various Scheme data types.
In order to support lightweight concurrency via delimited continuations and variable argument counts, Guile-on-WebAssembly uses the standard WebAssembly function call mechanism in a strange way.
On the side of continuations, the basic idea is that the compiler from
Guile to WebAssembly will transform the program so that all calls are
tail calls. Non-tail calls within a function are transformed so that
the caller pushs the live values flowing to the return point onto an
explicit stack, pushes the return continuation then tail-calls the
callee. Returning from a function is transforms so that the callee pops
the return continuation from the stack and tail-calls it. For full
details, see tailify.scm
from the wip-tailify
Guile
branch.
The advantage of this approach is that with an explicit stack representation, we can capture and reinstate delimited continuations, and write debuggers in terms of continuations whose structure can be inspected by Guile instead of the host WebAssembly system. The transformation is minimal, so that e.g. inner loops without calls are still as fast as direct-style compilation.
(This calling convention may be obviated by either the typed continuations or the fiber-based stack switching proposal, but we don't see either of these features reaching consensus and shipping before 2025 or so.)
Additionally, Guile functions can accept a variable number of arguments, whereas WebAssembly functions have a fixed type. In the general case we may need to pass arguments via a global argument-passing array. The first few arguments can be passed as parameters though. The number of arguments is also passed as a function parameter. Since all calls are tail calls, this convention applies to returning values as well.
The dynamic stack associates stack frames with dynamic-wind
winders,
prompts, individual fluid bindings, and dynamic states (whole sets of
fluid bindings). Not yet specified. Really upstream Guile should be
using continuation marks here, but we don't do this yet.
See below for more on fluids and threads, but basically Guile should keep a cache of the current values of some dynamically-scoped variables. Not yet specified.
The return stack is for stack-allocated continuations: basically those
that correspond to the return points of non-tail calls in the source
program. There will be separate stacks for SCM
values, raw i64
and
f64
values, and (ref func)
return continuations, but the concrete
representation remains to be specified.
An immediate is a fixnum, a char, or an oddball.
All immediate values are encoded as a (ref i31)
.
The i32
payload is extracted from the (ref i31)
via sign extension,
using i31.get_s
.
If the low bit of the payload is #b0
, the value is a fixnum, and its
signed integer value is in the high bits of payload.
If the low 2 bits of the payload are #b11
, the value is a char, and
its codepoint is in the high bits of the payload. The sign bit will
always be unset, because codepoints only go up to 221.
Otherwise, the possible payloads are:
#b000001
: 1: #f
#b000101
: 5: nil
#b001101
: 13: '()
(null)#b010001
: 17: #t
#b100001
: 33: the unspecified value#b101001
: 41: EOF#b111001
: 57: the undefined value (used internally)Some common oddball tests:
null?
: check for null or nil; (= (logand payload #b110111) #b000101)
false?
: check for false or nil; (= (logand payload #b111011) #b000001)
elisp-false?
: check for false or nil or null; (= (logand payload #b110011) #b000001)
(type $raw-immutable-bitvector (array i32))
(type $raw-immutable-bytevector (array i8))
(type $raw-bitvector (array (mut i32)))
(type $raw-bytevector (array (mut i8)))
(type $raw-scmvector (array (mut (ref eq))))
The functions residualized by the tailify transform take a variable
number of arguments. The number of values to take is passed to the
function as an argument. The first three arguments are passed in
parameters; any addition arguments get passed through a global array.
If a function has fewer than three parameters, you can pass any value as
the argument, for example (i31.new (i32.const 0))
.
Since we tailify everything, all calls are tail calls, so there are no return values.
(type $kvarargs (func (param $nargs i32)
(param $arg0 (ref eq))
(param $arg1 (ref eq))
(param $arg2 (ref eq))
(result)))
All Guile objects which are not represented as (ref i31)
are "heap
objects". These objects are all subtypes of $heap-object
:
(type $void-struct (struct))
(type $heap-object
(sub $void-struct
(struct
(field $hash (mut i32)))))
There are subtypes for each concrete kind of object that Guile uses (approximately 20 types).
As a WebAssembly type system detail, note that usually two types that
have the same shape will be deemed equivalent and thus
indistinguishable. It is the case that there are Guile objects that
have the same shape. However we use the "hybrid nominal typing"
facility to declare our types as being distinct, even when some of them
have the same shape. By wrapping all our types in a rec
block, we
ensure that dynamic type checks can be made with a simple ref.test
.
For symbols and keywords, the $hash
field is eagerly computed based on
the string-hash of the underlying string. For other data types, the
hash is computed lazily. A hash of 0 indicates an uninitialized hash.
Bit 0 is always set if the hash is initialized. Otherwise for immediate
values, there is a simple bit-mixing hash function.
(type $extern-ref
(sub $heap-object
(struct
(field $hash (mut i32))
(field $val (ref extern)))))
Sometimes we want to refer to a reference-typed value from the host, so
one data type is an $extern-ref
.
(type $heap-number
(sub $heap-object
(struct
(field $hash (mut i32)))))
There is a supertype for heap numbers, in case we need to quickly check that a non-fixnum is indeed a number. Then there are the concrete heap number types.
(type $bignum
(sub $heap-number
(struct
(field $hash (mut i32))
(field $val (ref extern)))))
(type $flonum
(sub $heap-number
(struct
(field $hash (mut i32))
(field $val f64))))
(type $complex
(sub $heap-number
(struct
(field $hash (mut i32))
(field $real f64)
(field $imag f64))))
(type $fraction
(sub $heap-number
(struct
(field $hash (mut i32))
(field $num (ref eq))
(field $denom (ref eq)))))
(type $pair
(sub $heap-object
(struct
(field $hash (mut i32))
(field $car (mut (ref eq)))
(field $cdr (mut (ref eq))))))
There is also a $mutable-pair
subtype of $pair
, with the same
fields. car
requires that the object be a $pair
, but set-car!
requires the subset of $pair
values which are also $mutable-pair
.
(type $vector
(sub $heap-object
(struct
(field $hash (mut i32))
(field $vals (ref $raw-scmvector)))))
There is also a $mutable-vector
subtype of $vector
with the same
fields.
You can get the length of the vector using array.length
on the $vals
field.
(type $bytevector
(sub $heap-object
(struct
(field $hash (mut i32))
(field $vals (ref $raw-bytevector)))))
There is also a $mutable-bytevector
subtype of $bytevector
with the same
fields.
You can get the length of the bytevector using array.length
on the
$vals
field.
We need an explicit length which is generally smaller than the storage
space in the raw i32
array.
(type $bitvector
(sub $heap-object
(struct
(field $hash (mut i32))
(field $len i32)
(field $bits (ref $raw-bitvector)))))
There is also a $mutable-bitvector
subtype of $bitvector
with the
same fields.
(type $string
(sub $heap-object
(struct
(field $hash (mut i32))
(field $str (mut (ref string))))))
It would be nice to just have (ref string)
be the string
representation, but stringref
is not a subtype of
eqref
. Therefore
we have to wrap strings with a tagged struct. But, this also gives us
the possibility to have a hashq field, and to possibly mutate the string
(by replacing its contents).
There is also a $mutable-string
subtype of $string
with the same
fields.
(type $proc
(sub $heap-object
(struct
(field $hash (mut i32))
(field $func (ref $kvarargs)))))
A procedure is just another name for a function. A $proc
is a tagged
function.
Some functions close over a set of free variables; for them,
there are subtypes of $proc
:
(type $closure1
(sub $proc
(struct
(field $hash (mut i32))
(field $func (ref $kvarargs))
(field $free0 (ref eq)))))
(type $closure2
(sub $proc
(struct
(field $hash (mut i32))
(field $func (ref $kvarargs))
(field $free0 (ref eq))
(field $free1 (ref eq)))))
;; ...
The set of closure types will depend on what is needed by the code being
compiled, and are not part of the rec
block of distinct types.
Also note that in the future, WebAssembly will support funcrefs which
are themselves closures; in that case we can avoid the separate
$closureN
types, storing the data in the funcref directly.
(type $symbol
(sub $heap-object
(struct
(field $hash (mut i32))
(field $name (ref string)))))
(type $keyword
(sub $heap-object
(struct
(field $hash (mut i32))
(field $name (ref $symbol)))))
How to compute the symbol's hash is not yet determined.
(type $variable
(sub $heap-object
(struct
(field $hash (mut i32))
(field $val (mut (ref eq))))))
(type $atomic-box
(sub $heap-object
(struct
(field $hash (mut i32))
(field $val (mut (ref eq))))))
WebAssembly does support multiple threads, but there is no multi-thread support for GC objects, so for the time being atomic boxes don't need to use atomic operations.
We use a simple buckets-and-chains hash table, implemented in Scheme.
(type $hash-table
(sub $heap-object
(struct
(field $hash (mut i32))
(field $size (mut (ref i31)))
(field $buckets (ref $vector)))))
(type $weak-table
(sub $heap-object
(struct
(field $hash (mut i32))
(field $val (ref extern)))))
The external value held by the weak table is a host-supplied weak map.
(type $fluid
(sub $heap-object
(struct
(field $hash (mut i32))
(field $init (ref eq)))))
(type $dynamic-state
(sub $heap-object
(struct
(field $hash (mut i32))
(field $val (ref extern)))))
In native Guile, a fluid is essentially a key and a dynamic state is a weak hash table mapping all fluids to their values. At run-time there is a per-thread cache for faster access to fluid values. There will have to be some run-time support routines for fluids.
(type $syntax
(sub $heap-object
(struct
(field $hash (mut i32))
(field $expr (ref eq))
(field $wrap (ref eq))
(field $module (ref eq))
(field $source (ref eq)))))
I dearly hope that we can avoid having psyntax
compiled to WebAssembly
for any module that doesn't include eval
. Still, read-syntax
can
produce syntax objects, which is a nice way of associating source info
with objects; it's a type in native Guile, so I guess it makes sense to
hollow out a space for it for Guile-on-WebAssembly.
Note there is also a scm_tc16_macro
for syntax transformers in native
Guile, that we will also need to implement at some point.
In this first version, we'll punt on these. We should check with Daniel Lloda about the state of his rewrite of arrays in Scheme.
We take inspiration from how native Guile represents ports. However for both WASI and Web environments, we can assume that I/O routines are all capable of returning a promise instead of blocking, so we don't need explicit support for e.g. read or write wait FDs; instead we can assume that we use the pure-Scheme suspendable port implementation, so delimited continuation suspend and resume will just work.
We can also simplify and assume UTF-8 encoding for textual I/O on ports.
(type $port-type
(struct
(field $name (ref string))
;; in guile these are (port, bv, start, count) -> size_t
(field $read (ref null $proc)) ;; could have a more refined type
(field $write (ref null $proc))
(field $seek (ref null $proc)) ;; (port, offset, whence) -> offset
(field $close (ref null $proc)) ;; (port) -> ()
(field $get-natural-buffer-sizes (ref null $proc)) ;; port -> (rdsz, wrsz)
(field $random-access? (ref null $proc)) ;; port -> bool
(field $input-waiting (ref null $proc)) ;; port -> bool
(field $truncate (ref null $proc)) ;; (port, length) -> ()
;; Guile also has GOOPS classes here.
))
(type $port
(sub $heap-object
(struct
(field $hash (mut i32))
(field $pt (ref $port-type))
(field $stream (mut (ref eq)))
(field $file_name (mut (ref eq)))
(field $position (ref $pair))
(field $read_buf (mut (ref eq))) ;; A 5-vector
(field $write_buf (mut (ref eq))) ;; A 5-vector
(field $write_buf_aux (mut (ref eq))) ;; A 5-vector
(field $read_buffering (mut i32))
(field $refcount (mut i32))
(field $rw_random (mut i8))
(field $properties (mut (ref eq))))))
The meanings of these various fields are as in native Guile; you have to go spelunking a bit to find this information as the port representation isn't public API/ABI. Also, there is quite a bit of run-time work needed here.
In native Guile there is also a "port-with-print-state" data type; unclear if we will need this eventually. Probably not.
(type $struct
(sub $heap-object
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $vtable))))))
(type $vtable
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $vtable)))
(field $field0 (mut (ref eq)))
(field $field1 (mut (ref eq)))
(field $field2 (mut (ref eq)))
(field $field3 (mut (ref eq))))))
Guile's structs are the underlying facility that implements records and object-orientation. They have the oddity that their vtable is also a struct with at least 4 fields.
The $struct
and $vtable
definitions are inside the rec
block.
Specific struct types as needed by the program will be residualized as
needed. If the struct has more than 4 fields, it may store them in a
heap vector.
(type $struct1
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $struct4)))
(field $field0 (mut (ref eq))))))
(type $struct2
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $struct4)))
(field $field0 (mut (ref eq)))
(field $field1 (mut (ref eq))))))
(type $struct3
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $struct4)))
(field $field0 (mut (ref eq)))
(field $field1 (mut (ref eq)))
(field $field2 (mut (ref eq))))))
(type $struct4
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $struct4)))
(field $field0 (mut (ref eq)))
(field $field1 (mut (ref eq)))
(field $field2 (mut (ref eq)))
(field $field3 (mut (ref eq))))))
(type $structN
(sub $struct
(struct
(field $hash (mut i32))
(field $vtable (mut (ref null $struct4)))
(field $field0 (mut (ref eq)))
(field $field1 (mut (ref eq)))
(field $field2 (mut (ref eq)))
(field $field3 (mut (ref eq)))
(field $tail (ref $raw-scmvector)))))
Weak vectors are not yet supported.
Regular expressions: not yet supported. Would be a JS callout.
Random states.
Charsets.
First-class threads, mutexes, and condition variables.
The representation of first-class delimited and undelimited continuations is currently unspecified. (It will be a slice of all stacks, though.)
Should we be annotating struct types with final
? The MVP document
mentions it but binaryen does not seem to support it.
Wasm GC values (and thus Guile-on-Wasm values) are opaque to JavaScript; they can pass through JS by reference but if JS is to do something with them on its own, there need to be explicit conversions to and from JS, for example to unpack the integer in a fixnum. There will be a side Wasm library to do this.