rillian
/
rust-prehistory


			
							123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233
							
suppose rust uses a module system similar to the flatt & owens one.

we have 2 types that make it up: units and modules.

both are 1st class values: they can be passed around as values.

a module defines a single letrec of types and constants. modules are
*always* concrete, usable values. the types can be inhabited by
values, the constants can be accessed and used. A top-level module
needs to have all its dependencies satisfied to compile properly.

a unit defines a statically-parameterized module. units cannot be
"reached into". units have imports and exports. the set of imports and
exports, as well as any relationships between those, makes up a unit
type for the unit.

units can be wired together into composite units, in a specific
expression form, and if a unit value has no imports remaining it can
be instantiated into a module containing all of its exports.

the compilation model of rust is then that you write your code in a
bunch of files and then you write a top-level "compilation entry"
module that implements a standard tooling interface: one that, when
it is itself compiled, produces a data structure that the compiler
uses to drive compilation of the rest of your files into a module.

the module produced by compile time should probably itself be one that
implements a load-time tooling interface, and only *that* causes the
construction of runtime artifacts (rather than baking a set of
load-time actions into the language semantics).

the module produced by compilation, and the subsequent module produced
by loading, may not be done of course; you can always have it depend
on some further runtime parameterization. but this organization gives
you the *option* to do quite a lot of compile-time and load-time
metaprogramming.


so erm...


         units: static parts with imports and exports; what the compiler produces and the linker consumes
       modules: records of constants (including functions and types) that can be projected to interfaces
      programs: templates for processes
     processes: identities, running programs; each copy differs, arranged in a tree
   pure values: values constructed from literals and type-constructors alone
general values: pure values AND channels to processes (that may go dead)
limited values: general values AND processes

all copies of general values are identical; they may be freely
copied. limited values may not be copied, as they represent
identities.

All type constructors (including modules and functions) may be pure,
general or limited. Purity requires that all subcomponents -- and all
arguments to subcomponents -- are pure. Limitation requires that all
*supercomponents* -- and all functions taking or returning the limited
type -- are limited. Purity is delicate, limitation is
infectious. 

Modules may implement a particular interface. There's an explicit
structural subtyping "action" that can coerce a module of one
interface to a subtype-compatible interface. This produces a remapping
table on the fly.

Note that functions can accept or return limited values, whereas
channels cannot; channels can also not be pure, by definition. A
general channel can therefore be promoted to a general function, and
this is performed automatically by the explicit subtyping operation on
interfaces (as is every form of type-state weakening on function
signatures that follows subtype logic).

When you spawn a process, you get a limited module holding the
process, and a collection of channels for all the process' ports.
This limited module can be coerced down to a non-limited (general)
interface full of functions, which is the usual mechanism.


---- snip ----

No. Let's back off from recursive "modules". It's a difficult and
researchy area and we're not in the business of doing research. Rust
is supposed to *work* predictably and reliably, even if that means
being a little sub-optimal. Let's go back to the original design:

A crate is a unit of compilation, loading, naming and distribution:
an ELF .so or PE .dll.

A crate depends on other crates in a strict DAG. 

crate dependencies are expressed by label, label+version, or
label+version+authenticators (SHA1 hashes or such). We call each of
these a crate specifier. A given specifier gets a versioned list of
crates at runtime, so you can do a little bit of hot-code
upgrade/rollback.

crate linkage requires type compatibility on transparent exports, and
specifier refinement for opaque exports. If P and Q both require
specifiers that can be satisfied by the same crate R, then their
dependencies on R are merged into a composite dependency that has the
more-specific of the two dependencies.

inter-crate calls twiddle crate refcounts.

double vision: A uses B and C. B and C use D. D exports T. B defines
a function f(int)->T and C defines a function g(T)->int. Code in A 
tries to do g(f(10)). Does this typecheck?

Let's assume T is opaquely exported because if it's transparent, it's
simple. So for the opaque case, typechecking succeeds if we decide
the two Ts are the same. How?

Every name reference (type or otherwise) resolves to a number inside a
specifier. Every specifier in the dependency set of a crate causes the
allocation of a crate-slot at runtime and is filled in by a crate that
satisfies the specifier. For B or C to export f or g, they have to
export their view of T, which means they have to export their
specifiers for D. In other words: each crate exports a set of names
and a map from those names to offsets in runtime crate-slots, each of
which is loaded in order to satisfy a specifier. So the crate has to
statically export every specifier used to support a name it's
exporting, as part of the description of the exported name.

A imports B and C, so its set of specifiers is at least the union of
the exported specifiers of B and C. This includes two specifiers for
D, of possibly-varying specificity. The expression g(f(10)) checks iff
the specifiers for D from B and C can be merged into a single
specifier / dependency for D.

Conventional "double vision" as dreyer complains about is only a
problem in the context of *recursive* modules, and we're doing a
system that is not concerned with that. Recursion *within* a crate is
fine, recursion *between* crates is verboten.

This is a perfectly reasonable design, allows medium-level static
recursion (within the crate), and has important practical
interpretation (fast loading, clear load boundaries, hot loading, hot
unloading, etc.) so I am happy with the restriction.

Question: do we support existential ADTs? Or just plugs? I would say
the latter, but what about "pure data structures", eg. "associative
map"? How do you abstract it away w/o abstracting away from
concurrency / direct function-call access? An existential would be
ok... but syntactically perhaps awkward. Seems also to duplicate some
of the machinery of crates. And plugs. Bleah.

Maybe a plug should be an existential? Well sure. But let's call it a
module.

assoc_map = mod[T] { type C[T];
                     fun* each(C[T])->T;
                     fun* has(C[T],T)->bool; }


Here is the key point: a module is not a functor, is not recursive,
and does not imply independent compilation. A crate, a record *or* a
process can be used to implement a module type.

--- aug 2009 ---


a worked-out container module example


type map[k,v] = mod { type t;
                      fn new(fn compare(~k a,~k b)->bool) -> t;
                      fn set(^t m, ~k key, ~v val) -> ();
                      fn? find(~t m, ~k key) -> ~v; };

mod treemap[k,v] {
    type t = rec(...);
    fn new() -> t {
        ...
    }
    fn set(^t m, ~k key, ~v val) -> () {
        ...
    }
    fn? find(~t m, ~k key) -> ~v {
        ...
    }
}

fn get_a_map_impl[k,v]() -> map[k,v] {
    typecase (k) { 
       ... 
    }
}

fn compare(~int a, ~int b) -> bool { ... }
let map[int,str] mv = get_a_map_impl[int,str]();
mod m = mapv;

// or ...


mod m = get_a_map_impl[int,str]();

auto t = m.new(compare);
m.set(t, 10, "hello");
m.set(t, 12, "there");


// Note that from a semantics level, ophel suggests can treat generic
// items as modules with an opaque type:

type modt = mod { type t; fn f(~t) -> (); } 

~=

fn[t] f(~t) -> () {...}

at least for the purposes of passing them around. I think this is a possible de-sugaring if you're
doing a proof in coq, but I don't want to force users to write this way. The "throwaway module whose
only purpose is to parameterize its contained item" idiom seems clunky at best. Also it has a 
weird falling-apart-ness when it comes to generic *types*:

eg: how would you do the map case?

fn mk_map(mod {type k; type v; } params) -> 
   mod { type t; 
         fn new() -> t;
         fn add(~params.k, ~params.v) -> ();
         fn? find(~params.k) -> ~params.v; };

now we run into the sticky question of the type of the return value of mk_map.  I think it's not
denotable in this language; or if it is it involves nutty things like existentials and higher-kinds.
even ophel defers here and says "let's have a separate binding form map[T]" for a type constructor.