hardcode.txt 8.5 KB

Support for compilation into real machine code in CSL
=====================================================

First note that this does not really exist yet, and the facilities and
ideas described here are part of an EXPERIMENT rather than stable
and viable reality.

An objective is that a single CSL image file should continue to work
properly with any version of the system. This means that if run on a
computer that uses a machine code where no hard-code compiler is available
the system should run all byte-coded. It also suggests that there should
be support for having multiple sets of machine code in the one image.

The first part of realising this is to arrange that there is a numeric
code value that is available at run-time and that identifies the current
machine architecture. Part of this is set up in the CSL sources in the
file "machine.h" by defining a symbol HARD_CODE_TAG, which is an integer
in the range 0 to 31 (if I ever get close to having more different sorts
of computer than that I will think some more!). The tag found in this way
can be qualified (in csl.c) by setting some of the 0xe0 bits to distinguish
between systems that "machine.h" does not. Eg this MIGHT be used to
distinguish between 486, Pentium and Pentium-Pro optimisations in an
Intel-targeted version, where different optimisations would be useful in
the three cases. The value of hard_code_tag ends up in the range 0 to 255
with 0 reserved to mean "no hard code facility available for this
architecture". The value of this tag is available at run-time on the
features list (lispsystem!* in Standard Lisp mode) as an entry of the
form (native . nnn).
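
The way the 5-bit tag and the 0xe0 qualifier bits combine might be sketched
as below. This is only an illustration: the value given to HARD_CODE_TAG, the
helper name make_hard_code_tag and the decision to return 0 whenever the base
tag is 0 are all assumptions, not what csl.c actually does.

```c
/* Assumed value: in CSL this symbol comes from "machine.h". The 3 used
   here is an arbitrary stand-in for illustration. */
#define HARD_CODE_TAG 3

/* Hypothetical helper: merge a base tag (0 to 31) with a 3-bit qualifier
   that lands in the 0xe0 bits. The result is in the range 0 to 255.
   Assumption: a base tag of 0 always yields 0, which is the value
   reserved for "no hard code facility available". */
static unsigned int make_hard_code_tag(unsigned int base,
                                       unsigned int qualifier)
{
    if (base == 0) return 0;
    return (base & 0x1fu) | ((qualifier & 0x7u) << 5);
}
```

With this layout the base architecture can be recovered as (tag & 0x1f) and
the qualifier as (tag >> 5).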

Hard coded functions live in a set of pages (hard_code_pages). In memory
there will be just one such set of pages, and for a FIRST version at least
their contents will be fixed and not subject to garbage collection. I
hope that eventually it will be possible to have a garbage collector for this
space though, and so the code in CSL makes provision for a second space
of hard code that the main one could be copied across into. In generating
machine code the possibility that even while machine code is being
executed a garbage collection might want to relocate it should be considered,
and for some architectures this thought may be sufficiently dreadful that
it must be avoided even if on other systems it is permitted.

In an image file there can be many different sets of hard code pages.
Because preserving a new image only wants to over-write one of these
(the one for the current machine type) these will not be stored as
part of the main initial image itself, but as items rather like "fasl"
modules. Well actually rather more like the way that help data is stored
in the image file. I will support the idea that for codes 1 to 255
the numeric code -1 to -255 can be passed in to the low-level file
manipulation code in "preserve.c" (just as special values are used for
IMAGE_CODE and HELP_CODE). At startup it can be checked whether a hard code
module of the relevant sort exists, and if so it can be loaded. The format in
the module will be a 2-byte integer showing how many pages are there followed
by dumps of the data to go in each such page. On doing a "preserve" to write
out a new checkpoint image the code will in effect preface the writing of the
root portion of the image with a "(faslout 'HardCode<nn>);
(write-out-the-hard-code)(faslend)" or whatever to put a copy of what is
in memory back into the module. Well, I guess the correct strategy will
be for it to do this if anything has been done that changes the set of
things compiled into machine code.
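
The module layout just described (a 2-byte page count followed by raw page
dumps) could look roughly like the sketch below. Several things here are
guesses: the text does not give the byte order of the count (high byte first
is assumed), PAGE_BYTES is shrunk to a toy size, and all the function names
are made up for the example rather than taken from "preserve.c".

```c
#include <stddef.h>
#include <string.h>

/* Toy page size for illustration only. Real pages are 64K or 256K. */
#define PAGE_BYTES 16

/* Hypothetical helper: map a hard code tag in 1..255 to the negative
   module code -1..-255. Returns 0 for a tag outside that range. */
static int hard_code_module_code(int tag)
{
    return (tag >= 1 && tag <= 255) ? -tag : 0;
}

/* Write a 2-byte page count (ASSUMED high byte first) and then the page
   contents. Returns the number of bytes written, or 0 if they do not fit. */
static size_t write_module(unsigned char *out, size_t out_size,
                           const unsigned char *pages, unsigned int npages)
{
    size_t need = 2 + (size_t)npages * PAGE_BYTES;
    if (npages > 0xffffu || need > out_size) return 0;
    out[0] = (unsigned char)(npages >> 8);
    out[1] = (unsigned char)(npages & 0xffu);
    memcpy(out + 2, pages, (size_t)npages * PAGE_BYTES);
    return need;
}

/* Read the page count back, checking that the module really is long
   enough to hold that many pages. Returns -1 if it is not. */
static long read_page_count(const unsigned char *in, size_t in_size)
{
    unsigned int n;
    if (in_size < 2) return -1;
    n = ((unsigned int)in[0] << 8) | in[1];
    if (in_size < 2 + (size_t)n * PAGE_BYTES) return -1;
    return (long)n;
}
```

A 2-byte count limits a module to 65535 pages, which is why write_module
refuses anything larger.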

When hard code is loaded two more things have to be done. Firstly the
code may need to be relocated, both to allow for the address in memory
that it ends up at, and then to fix up places where it refers to entrypoints
into the rest of CSL and to statically allocated data. This will be done by
making the contents of a hard code page contain relevant relocation
information so that it can be scanned when first loaded and the code patched
up. The relocation and patch-up information needs to be designed so that
a single relocation program can deal with all the possible relocation modes
needed by various architectures, even though only a few will be implemented
to start with. Secondly it will be necessary to point actual entrypoints
into code towards the proper bits of compiled code. This last is achieved by
having (in the main heap image) a list (hard_code) that gives such
entrypoints. For every function that has been hard coded for at least one
architecture this has an entry, so overall its structure is:

   ((f-name-1 nargs . details-1) (f-name-2 nargs . details-2) ...)

The value nargs is an integer as used in symbol_set_definition (fns2.c),
and each "details" is a list of items

   details = ((type-and-page offset . env) ...)

Here type-and-page is an integer (a Lisp fixnum). The top 8 bits give the
machine type that this entry refers to. If these bits are zero we are
talking here about a byte-coded definition. Every function that is hard
compiled for some architecture must have a byte-coded definition stored here
(so it can be instated on systems where no hard code is available).
Non-zero values are used when genuine hard code is available. The next
2 bits give four possibilities. The obvious one is when the code-pointer
described here just goes into the natural function cell of the symbol and the
other two function cells get filled with default values. The remaining three
codes are used to make it possible to put a value into just the FN1, FN2 or
FNN cell of the function. The bottom 4 bits of the fixnum are TAG_FIXNUM
to indicate that it is a small integer, so that leaves 18 over. These are
used to specify a page-number in the hard code heap. Since pages are
64K or (more usually) 256K bytes large having 18 bits of page selection is
comfortably generous for the moment. Lisp integer "offset" is a byte
offset within the selected page. "env" is an arbitrary Lisp value (but very
often a vector) that will be placed in the environment cell when the function
is defined. Note that it may often (I hope) be that the same environment
vector will be used for several different machine architectures, and in
such cases the reference will be to a shared object, so space might not
be too badly wasted.
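
As a rough illustration of the 8 + 2 + 18 + 4 bit split of type-and-page,
the sketch below packs and unpacks such a word and picks the entry to use
on the current machine. It is not lifted from the CSL sources: the struct
and function names are invented, the numeric value used for the low tag
bits is a placeholder (the real TAG_FIXNUM value is not given here), and
choose_entry simply takes the first entry for a machine type, ignoring
the possibility of separate FN1/FN2/FNN entries.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the 4 low tag bits. The real TAG_FIXNUM value is
   defined in the CSL sources and may well differ. */
#define TAG_BITS_ASSUMED 0x1u

typedef struct {
    unsigned int machine_type;   /* top 8 bits, 0 means byte-coded   */
    unsigned int cell_selector;  /* next 2 bits, which function cell */
    uint32_t     page;           /* 18 bits, page in hard code heap  */
} type_and_page;

static uint32_t pack_type_and_page(type_and_page f)
{
    return ((uint32_t)(f.machine_type & 0xffu) << 24) |
           ((uint32_t)(f.cell_selector & 0x3u) << 22) |
           ((f.page & 0x3ffffu) << 4) |
           TAG_BITS_ASSUMED;
}

static type_and_page unpack_type_and_page(uint32_t w)
{
    type_and_page f;
    f.machine_type  = (w >> 24) & 0xffu;
    f.cell_selector = (w >> 22) & 0x3u;
    f.page          = (w >> 4) & 0x3ffffu;
    return f;
}

/* Return the index of the first entry whose machine type matches
   current_tag. If there is none, fall back to the first byte-coded
   entry (machine type 0). Returns -1 if neither exists. */
static int choose_entry(const uint32_t *words, size_t n,
                        unsigned int current_tag)
{
    int bytecode = -1;
    size_t i;
    for (i = 0; i < n; i++) {
        unsigned int type = (words[i] >> 24) & 0xffu;
        if (type == 0) {
            if (bytecode < 0) bytecode = (int)i;
        } else if (type == current_tag) {
            return (int)i;
        }
    }
    return bytecode;
}
```

The fallback in choose_entry is the reason every hard compiled function
must also keep a byte-coded definition in its details list.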

See relocate_hard_code() in restart.c and preserve_hard_code() in preserve.c
for some more details.

The following Lisp functions may be used:

   (setq v (make-native n))       create handle on n bytes of native
                                  code space
   (putv-native v k w)            put byte w at offset k
   (getv-native v k)              retrieve byte (for checking)
   (putv-native v k w 1/2/4)      as putv-native but the trailing integer arg
   (getv-native v k 1/2/4)        .. says use 1, 2 or 4 byte value.
   (preserve)                     dumps all current native code for re-loading
   (native-address 'lispfn nargs) get address of entrypoint for function
                                  when called with nargs args
   (native-address n)             integer n selects an address to hand back
                                  as an integer. See fns3.c for details

The native code as created must include (put there by the person
generating the code) relocation etc information.

   (symbol-set-native fname args bpsbase offset env)

fname must be a symbol.

(args & 0xff) is the number of args to the function. Other bits of args,
if set, tell the system NOT to set the other 2 function cells to error
calls, so USUALLY just use args=1,2 or 3.

bpsbase is the value returned earlier by (make-native nnn). Bytes in it
must have been filled in by (putv-native ..) calls.

offset is the offset within this vector that the entrypoint should be set
to. The offset is needed because the first few bytes of the vector will need
to hold relocation information for when the code is re-loaded. At present
PLEASE start the contents of the vector at byte 8 leaving the first few bytes
untouched. Sometimes (later on) a function taking variable numbers of args
will also have several entrypoints into the same vector - another reason for
having the offset.

env is a thing to put in the environment cell of the function, and this will
be passed as the first argument to any call, so it will probably usefully
be a vector of literal Lisp objects that the function wants to use.
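
The handling of the args and offset parameters might be sketched as follows.
This is a guess at the shape of the checks, not the code in CSL: the struct,
the function names and the constant RESERVED_HEADER_BYTES are hypothetical,
and the only facts taken from the text are the 0xff mask and the request to
start code at byte 8.

```c
#include <stddef.h>

/* From the text: code should start at byte 8 of the vector, with the
   first few bytes left for relocation information. The name of this
   constant is made up for the example. */
#define RESERVED_HEADER_BYTES 8

typedef struct {
    unsigned int nargs;     /* (args & 0xff)                            */
    int keep_other_cells;   /* nonzero if any bit above 0xff is set, so */
                            /* the other 2 function cells are NOT set   */
                            /* to error calls                           */
} native_args;

static native_args decode_native_args(unsigned int args)
{
    native_args r;
    r.nargs = args & 0xffu;
    r.keep_other_cells = (args & ~0xffu) != 0;
    return r;
}

/* An entrypoint offset is plausible if it lies past the reserved header
   and inside the vector obtained from make-native. */
static int entry_offset_ok(size_t offset, size_t vector_bytes)
{
    return offset >= RESERVED_HEADER_BYTES && offset < vector_bytes;
}
```

For the usual case of args=1,2 or 3 the flag is zero, so the other two
function cells would be set to error calls.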

Problem: I maybe want to support cross-compilation of native code, and for
that the function symbol-set-native may need to be told what architecture
the relevant code has been created for?