- Support for compilation into real machine code in CSL
- =====================================================
- First note that this does not really exist yet: the facilities and
- ideas described here are part of an EXPERIMENT rather than a stable
- and viable reality.
- An objective is that a single CSL image file should continue to work
- properly with any version of the system. This means that if it is run on
- a computer whose machine code has no hard-code compiler available, the
- system should run everything byte-coded. It also suggests that there should
- be support for having multiple sets of machine code in the one image.
- The first step in realising this is to arrange that there is a numeric
- code value that is available at run-time and that identifies the current
- machine architecture. Part of this is set up in the CSL sources in the
- file "machine.h" by defining a symbol HARD_CODE_TAG, which is an integer
- in the range 0 to 31 (if I ever get close to having more different sorts
- of computer than that I will think some more!). The tag found in this way
- can be qualified (in csl.c) by setting some of the 0xe0 bits to distinguish
- between systems that "machine.h" does not. E.g. this MIGHT be used to
- distinguish between 486, Pentium and Pentium-Pro optimisations in an
- Intel-targeted version, where different optimisations would be useful in
- the three cases. The value of hard_code_tag thus ends up in the range 0 to
- 255, with 0 reserved to mean "no hard code facility available for this
- architecture". The value of this tag is available at run-time on the
- features list (lispsystem!* in Standard Lisp mode) as an entry of the
- form (native . nnn).
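As a rough illustration of how the one-byte tag might be assembled, here is a C sketch assuming the 5-bit HARD_CODE_TAG occupies the low bits and the csl.c qualifier is a 3-bit variant stored in the 0xe0 bits. The function names and the example HARD_CODE_TAG value are hypothetical, not taken from machine.h or csl.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the value "machine.h" would supply for this
   architecture (0 would mean "no hard-code compiler available"). */
#ifndef HARD_CODE_TAG
#define HARD_CODE_TAG 5
#endif

/* Combine a base tag (0..31) with a variant (0..7) chosen in csl.c, e.g. to
   distinguish 486, Pentium and Pentium-Pro code generation. The variant
   lives in the 0xe0 bits, so the result always fits in one byte. */
unsigned make_hard_code_tag(unsigned base, unsigned variant)
{
    assert(base <= 31 && variant <= 7);
    if (base == 0) return 0;          /* 0 stays reserved: no native code */
    return base | (variant << 5);
}

/* The tag for the machine this image is running on. */
unsigned current_hard_code_tag(unsigned variant)
{
    return make_hard_code_tag(HARD_CODE_TAG, variant);
}

unsigned hard_code_base(unsigned tag)    { return tag & 0x1f; }
unsigned hard_code_variant(unsigned tag) { return (tag & 0xe0) >> 5; }
```

Under this assumed layout, an entry (native . 69) on the features list would decode as base tag 5 with variant 2.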
- Hard coded functions live in a set of pages (hard_code_pages). In memory
- there will be just one such set of pages, and for a FIRST version at least
- their contents will be fixed and not subject to garbage collection. I
- hope that eventually it will be possible to have a garbage collector for this
- space, though, and so the code in CSL makes provision for a second space
- of hard code that the main one could be copied across into. When generating
- machine code, bear in mind that a garbage collection might want to relocate
- it even while it is being executed. On some architectures this thought may
- be sufficiently dreadful that relocation must be avoided, even if other
- systems permit it.
- In an image file there can be many different sets of hard code pages.
- Because preserving a new image only wants to over-write one of these
- (the one for the current machine type), they will not be stored as
- part of the main initial image itself, but as items rather like "fasl"
- modules; or, more accurately, rather like the way that help data is stored
- in the image file. For tags 1 to 255 I will allow the numeric codes -1 to
- -255 to be passed to the low-level file manipulation code in "preserve.c"
- (just as special values are used for IMAGE_CODE and HELP_CODE). At startup
- the system can check whether a hard code module of the relevant sort exists
- and, if so, load it. The module will consist of a 2-byte integer giving the
- number of pages, followed by dumps of the data to go in each page. On doing a
- "preserve" to write out a new checkpoint image, the code will in effect
- preface the writing of the root portion of the image with something like
- "(faslout 'HardCode<nn>) (write-out-the-hard-code) (faslend)" to put a copy
- of what is in memory back into the module. Well, I guess the correct strategy
- will be for it to do this only if something has been done that changes the
- set of things compiled into machine code.
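The module layout (a page count followed by page dumps) can be pictured with the following C sketch, which writes into a memory buffer rather than the image file. The little-endian order of the 2-byte count, the fixed page size parameter, and all function names are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical: the image-file code under which hard code for architecture
   `tag` (1..255) would be stored, following the -1..-255 convention. */
int hard_code_module_code(int tag)
{
    assert(tag >= 1 && tag <= 255);
    return -tag;
}

/* Serialise `npages` pages of `page_bytes` each into `out`: a 2-byte page
   count (little-endian here, an assumption) followed by the raw page
   contents. Returns the number of bytes written, or 0 if `out` is too small
   or the count does not fit in 2 bytes. */
size_t write_hard_code_module(unsigned char *out, size_t out_size,
                              unsigned char *const *pages, size_t npages,
                              size_t page_bytes)
{
    if (npages > 0xffff) return 0;
    size_t need = 2 + npages * page_bytes;
    if (out_size < need) return 0;
    out[0] = (unsigned char)(npages & 0xff);
    out[1] = (unsigned char)(npages >> 8);
    for (size_t i = 0; i < npages; i++)
        memcpy(out + 2 + i * page_bytes, pages[i], page_bytes);
    return need;
}

/* Read the page count back from the start of a module. */
size_t read_hard_code_page_count(const unsigned char *in)
{
    return (size_t)in[0] | ((size_t)in[1] << 8);
}
```

On reload, the page count tells the loader how many page-sized dumps follow, so nothing else needs to be recorded in the module header.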
- When hard code is loaded two more things have to be done. Firstly the
- code may need to be relocated, both to allow for the address in memory
- that it ends up at and to fix up places where it refers to entrypoints
- into the rest of CSL and to statically allocated data. This will be done by
- making each hard code page contain the relevant relocation information, so
- that it can be scanned when first loaded and the code patched up. The
- relocation and patch-up information needs to be designed so that a single
- relocation program can deal with all the relocation modes needed by the
- various architectures, even though only a few will be implemented to start
- with. Secondly, the actual entrypoints of functions must be pointed at the
- proper bits of compiled code. This is achieved by having (in the main heap
- image) a list (hard_code) that gives such entrypoints. It has an entry for
- every function that has been hard coded for at least one architecture, so
- overall its structure is:
-    ((f-name-1 nargs . details-1) (f-name-2 nargs . details-2) ...)
- The value nargs is an integer as used in symbol_set_definition (fns2.c),
- and each "details" is a list of items
-    details = ((type-and-page offset . env) ...)
- Here type-and-page is an integer (a Lisp fixnum). The top 8 bits give the
- machine type that this entry refers to. If these bits are zero the entry
- is a byte-coded definition. Every function that is hard compiled for some
- architecture must have a byte-coded definition stored here (so that it can
- be instated on systems where no hard code is available). Non-zero values
- are used when genuine hard code is available. The next 2 bits give four
- possibilities. The obvious one is when the code pointer described here just
- goes into the natural function cell of the symbol and the other two
- function cells get filled with default values. The remaining three codes
- make it possible to put a value into just the FN1, FN2 or FNN cell of the
- function. The bottom 4 bits of the fixnum are TAG_FIXNUM, marking it as a
- small integer, so in a 32-bit word that leaves 18 bits over. These specify a
- page number in the hard code heap. Since pages are 64K or (more usually)
- 256K bytes large, 18 bits of page selection is comfortably generous for
- the moment. The Lisp integer "offset" is a byte offset within the selected
- page. "env" is an arbitrary Lisp value (but very often a vector) that will
- be placed in the environment cell when the function is defined. Note that
- it may often (I hope) be the case that the same environment vector is used
- for several different machine architectures, and in such cases the
- reference will be to a shared object, so space need not be too badly
- wasted.
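To make the bit budget concrete, here is a C sketch that packs and unpacks the type-and-page word. It assumes a 32-bit fixnum with the machine type in bits 31..24, the cell mode in bits 23..22, the page in bits 21..4 and the tag in bits 3..0. The TAG_FIXNUM value and the numbering of the four cell modes are guesses for illustration, not taken from the CSL sources.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value of TAG_FIXNUM for this sketch only. */
#define TAG_FIXNUM_GUESS 0x1u

/* Assumed numbering of the four "which cell" possibilities. */
enum cell_mode { FILL_NATURAL = 0, FILL_FN1 = 1, FILL_FN2 = 2, FILL_FNN = 3 };

uint32_t pack_type_and_page(unsigned machine, enum cell_mode mode, unsigned page)
{
    assert(machine <= 0xff && (unsigned)mode <= 3 && page < (1u << 18));
    return ((uint32_t)machine << 24) | ((uint32_t)mode << 22) |
           ((uint32_t)page << 4) | TAG_FIXNUM_GUESS;
}

unsigned       tp_machine(uint32_t w) { return w >> 24; }
enum cell_mode tp_mode(uint32_t w)    { return (enum cell_mode)((w >> 22) & 0x3); }
unsigned       tp_page(uint32_t w)    { return (w >> 4) & 0x3ffff; }

/* A zero machine type marks the byte-coded fallback definition. */
int tp_is_bytecoded(uint32_t w) { return tp_machine(w) == 0; }
```

Eighteen bits allow 2^18 = 262144 pages. At 64K bytes per page that already covers 16 gigabytes of hard code, or 64 gigabytes with 256K pages.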
- See relocate_hard_code() in restart.c and preserve_hard_code() in preserve.c
- for some more details.
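The idea of a single relocation program serving several architectures can be sketched as a loop over typed relocation records. The record layout, the three relocation kinds and the entrypoint table below are hypothetical illustrations (assuming 32-bit addresses patched in host byte order), not the format relocate_hard_code() actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical relocation kinds for this sketch. */
enum reloc_kind {
    RELOC_SELF_ABS32,   /* add the page's load address to a 32-bit word       */
    RELOC_ENTRY_ABS32,  /* store the absolute address of a CSL entrypoint     */
    RELOC_ENTRY_REL32   /* store that address relative to the end of the word */
};

struct reloc {
    uint32_t offset;        /* byte offset within the page of the word to patch */
    enum reloc_kind kind;
    uint32_t index;         /* entrypoint-table index (ignored for SELF)         */
};

static uint32_t get32(const unsigned char *p) { uint32_t v; memcpy(&v, p, 4); return v; }
static void put32(unsigned char *p, uint32_t v) { memcpy(p, &v, 4); }

/* Patch a page that has been loaded at address `base`. `entries` holds the
   run-time addresses of entrypoints and static data in the rest of the
   system. Returns 0 on success or -1 if a record is malformed. */
int relocate_page(unsigned char *page, size_t page_bytes, uint32_t base,
                  const struct reloc *r, size_t n,
                  const uint32_t *entries, size_t nentries)
{
    assert(page != NULL && (n == 0 || r != NULL));
    for (size_t i = 0; i < n; i++) {
        if (page_bytes < 4 || r[i].offset > page_bytes - 4) return -1;
        unsigned char *w = page + r[i].offset;
        switch (r[i].kind) {
        case RELOC_SELF_ABS32:
            put32(w, get32(w) + base);
            break;
        case RELOC_ENTRY_ABS32:
            if (r[i].index >= nentries) return -1;
            put32(w, entries[r[i].index]);
            break;
        case RELOC_ENTRY_REL32:
            if (r[i].index >= nentries) return -1;
            /* PC-relative: target minus the address just past the patched word */
            put32(w, entries[r[i].index] - (base + r[i].offset + 4));
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

Adding a new architecture would then mean adding relocation kinds to the one switch rather than writing a separate loader.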
- The following Lisp functions may be used:
-   (setq v (make-native n))        create a handle on n bytes of native code space
-   (putv-native v k w)             put byte w at offset k
-   (getv-native v k)               retrieve a byte (for checking)
-   (putv-native v k w 1/2/4)       as above, but store a 1, 2 or 4 byte value
-   (getv-native v k 1/2/4)         as above, but fetch a 1, 2 or 4 byte value
-   (preserve)                      dump all current native code for re-loading
-   (native-address 'lispfn nargs)  get the address of the entrypoint used when
-                                   lispfn is called with nargs arguments
-   (native-address n)              integer n selects an address to hand back
-                                   as an integer; see fns3.c for details
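A C model of the native code vector may help clarify the optional width argument of putv-native and getv-native. The struct, the function names and the least-significant-byte-first order are assumptions made for this sketch, not a description of the real implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical C model of the handle returned by (make-native n). */
struct native_vec {
    size_t size;
    unsigned char *bytes;
};

struct native_vec *make_native(size_t n)
{
    struct native_vec *v = malloc(sizeof *v);
    if (v == NULL) return NULL;
    v->bytes = calloc(n ? n : 1, 1);       /* zero-filled code space */
    if (v->bytes == NULL) { free(v); return NULL; }
    v->size = n;
    return v;
}

/* True if a `width`-byte access at offset k is legal for v. */
static int native_ok(const struct native_vec *v, size_t k, int width)
{
    return (width == 1 || width == 2 || width == 4) &&
           k <= v->size && v->size - k >= (size_t)width;
}

/* (putv-native v k w width): store the low `width` bytes of w at offset k,
   least significant byte first (an assumed byte order). */
int putv_native(struct native_vec *v, size_t k, uint32_t w, int width)
{
    assert(v != NULL);
    if (!native_ok(v, k, width)) return -1;
    for (int i = 0; i < width; i++)
        v->bytes[k + i] = (unsigned char)(w >> (8 * i));
    return 0;
}

/* (getv-native v k width): read back a 1, 2 or 4 byte value into *out. */
int getv_native(const struct native_vec *v, size_t k, int width, uint32_t *out)
{
    assert(v != NULL && out != NULL);
    if (!native_ok(v, k, width)) return -1;
    uint32_t r = 0;
    for (int i = 0; i < width; i++)
        r |= (uint32_t)v->bytes[k + i] << (8 * i);
    *out = r;
    return 0;
}
```

In this model, the three-argument forms (putv-native v k w) and (getv-native v k) correspond to a width of 1.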
- The native code as created must include relocation and similar information
- (put there by whoever generates the code).
- (symbol-set-native fname args bpsbase offset env)
- fname must be a symbol.
- (args & 0xff) is the number of args to the function. If any other bits of
- args are set, they tell the system NOT to set the other 2 function cells to
- error calls, so USUALLY just use args = 1, 2 or 3.
- bpsbase is the value returned earlier by (make-native nnn). Bytes in it
- must have been filled in by (putv-native ..) calls.
- offset is the offset within this vector that the entrypoint should be set
- to. The offset is needed because the first few bytes of the vector will
- need to hold relocation information for when the code is re-loaded. At
- present PLEASE start the contents of the vector at byte 8, leaving the
- first 8 bytes untouched. Later on, a function taking a variable number of
- args will sometimes also have several entrypoints into the same vector,
- which is another reason for having the offset.
- env is the value to put in the environment cell of the function. It will
- be passed as the first argument to any call, so it will usefully be a
- vector of the literal Lisp objects that the function wants to use.
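The rules for the args and offset arguments of symbol-set-native can be summarised by a small validation helper. This is a hypothetical C sketch, assuming an entry offset is only acceptable if it is at least 8 and lies inside the vector. The names are illustrative and not CSL's.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed size of the area reserved for relocation data ("start at byte 8"). */
#define NATIVE_RESERVED_BYTES 8

struct native_entry_spec {
    unsigned nargs;          /* args & 0xff                                   */
    int keep_other_cells;    /* any higher bit set: leave the other cells be  */
    size_t offset;           /* entrypoint offset within the native vector    */
};

/* Split `args` and check `offset` against a vector of `vec_size` bytes.
   Returns 0 and fills *out on success, or -1 if the offset is unusable. */
int decode_symbol_set_native(unsigned args, size_t offset, size_t vec_size,
                             struct native_entry_spec *out)
{
    assert(out != NULL);
    if (offset < NATIVE_RESERVED_BYTES || offset >= vec_size) return -1;
    out->nargs = args & 0xff;
    out->keep_other_cells = (args & ~0xffu) != 0;
    out->offset = offset;
    return 0;
}
```

So, in this sketch, args = 2 installs a two-argument entrypoint and resets the other cells, while args = 0x102 installs the same entrypoint but leaves the other two cells alone.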
- Problem: I may want to support cross-compilation of native code. If so,
- should the function symbol-set-native be told which architecture the
- relevant code has been created for?