reg-alloc.txt 2.8 KB

x86 normally has 8 32-bit GPRs
x86 + MMX has 8 32-bit GPRs and 8 64-bit MMX regs
x86 + SSE has 8 32-bit GPRs and 8 128-bit XMM regs
x64 has 16 64-bit GPRs and 16 128-bit XMM regs

note that XMM/MMX regs cannot be used to address memory, but they're
perfectly fine for ALU ops (in a sense: each one works as a bank of
lanes that operate in parallel on its subcomponents, which is a bit
odd).

they also make lovely "level-0" spill slots. consider that there are
effectively 16 (MMX: 8 x 64 bits) or 32 (SSE: 8 x 128 bits) 32-bit
GPRs' worth of data available here, and on modern Core chips they're
all single-cycle access!
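
a quick sanity check of the slot arithmetic above, as a minimal Python
sketch. the helper names (slots_32bit, pack_mmx, unpack_mmx) are made
up for illustration; they only model the bit layout of a 64-bit MMX
reg holding two spilled 32-bit values, not any real instruction
sequence.

```python
MASK32 = 0xFFFFFFFF

def slots_32bit(num_regs: int, reg_bits: int) -> int:
    """How many 32-bit values fit in a bank of num_regs registers."""
    return num_regs * reg_bits // 32

def pack_mmx(lo: int, hi: int) -> int:
    """Model one 64-bit MMX reg holding two spilled 32-bit values."""
    return ((hi & MASK32) << 32) | (lo & MASK32)

def unpack_mmx(reg: int) -> tuple:
    """Recover the (lo, hi) doublewords from the modelled register."""
    return reg & MASK32, (reg >> 32) & MASK32

mmx_slots = slots_32bit(8, 64)    # 8 MMX regs of 64 bits each
xmm_slots = slots_32bit(8, 128)   # 8 XMM regs of 128 bits each
```

here mmx_slots and xmm_slots come out to the 16 and 32 quoted above.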

So ... hmm. You can design a *really* fast fastcall with this.

MMX is 10 years old (1997). Let's assume anyone who gives a damn has
MMX. ELF showed up in the System V ABI from the same year, so
... seriously. It's supported. So you can at bare minimum treat each
MMX reg as its own independent ALU GPR and ignore the high doubleword
(PAND, POR, PXOR, PADDD, PSUBD, PCMPEQD, PCMPGTD, PSRAD, PSRLD, PSLLD).
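
why ignoring the high doubleword is safe for the packed-doubleword
ops: each 32-bit lane is computed independently, so garbage in the
high lane never leaks into the low one. a minimal Python model of
PADDD on 64-bit values (the function names are illustrative, and this
models the arithmetic only, not timing or encoding):

```python
MASK32 = 0xFFFFFFFF

def paddd(a: int, b: int) -> int:
    """Model of MMX PADDD: two independent wrapping 32-bit adds."""
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = (((a >> 32) & MASK32) + ((b >> 32) & MASK32)) & MASK32
    return (hi << 32) | lo

def low_dword(reg: int) -> int:
    """The part of the register we treat as our 32-bit 'GPR'."""
    return reg & MASK32

# High lanes hold arbitrary junk; low lanes hold the values we care about.
a = (0xDEADBEEF << 32) | 0xFFFFFFFF
b = (0x12345678 << 32) | 0x00000001
result = paddd(a, b)
```

the low lane wraps to zero and the carry is dropped rather than
propagated into the high lane, which is the behaviour that lets the
low doubleword act as a standalone 32-bit register.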

so that buys us 8 more GPRs most of the time. we can probably tolerate
simulating 64-bit ints using "2 32-bit GPRs"; it's not the most
efficient use of the hardware on your desk, but it's easier to
generate code that way, with fewer special cases. when we're actually
in 64-bit mode we can scale the assumptions up: everything doubles.
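
for reference, 64-bit addition on a pair of 32-bit registers is the
usual add-low, add-high-with-carry pattern (ADD then ADC on x86). a
minimal Python sketch of that arithmetic; add64 and its (lo, hi)
argument layout are assumptions made for illustration:

```python
MASK32 = 0xFFFFFFFF

def add64(a_lo: int, a_hi: int, b_lo: int, b_hi: int) -> tuple:
    """Add two 64-bit values, each held as a (lo, hi) pair of 32-bit words."""
    lo_sum = a_lo + b_lo
    carry = lo_sum >> 32               # 1 if the low add overflowed 32 bits
    lo = lo_sum & MASK32
    hi = (a_hi + b_hi + carry) & MASK32  # wraps like a real 64-bit add
    return lo, hi

def split64(x: int) -> tuple:
    """Split a 64-bit value into (lo, hi) 32-bit words."""
    return x & MASK32, (x >> 32) & MASK32

def join64(lo: int, hi: int) -> int:
    """Rebuild a 64-bit value from its (lo, hi) words."""
    return (hi << 32) | lo
```

usage: add64(*split64(x), *split64(y)) gives the same words as
split64((x + y) mod 2**64).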

notes on GPR constraints / uses:

  EAX - GPR w/ subregs. used for return value in most call conventions.
  EBX - GPR w/ subregs. used for GOT pointer in ELF.
  ECX - GPR w/ subregs.
  EDX - GPR w/ subregs.
  EBP - GPR, named for base pointer, not needed since we know stack sizes
  ESI - GPR.
  EDI - GPR.
  ESP - GPR but almost always stack pointer. reserved?
  MM0 .. MM7 - GPR (with restrictions)

theoretically the sky is the limit for using these. That's 16 32-bit
GPRs on *most* x86 machines we're likely to encounter. we are using
stack frames in heap segments so we always have to open-code our own
stack fiddling code. which is fine.

ok fine, we need a real register allocator and a register-heavy
calling convention for that.

what's a nice way of modelling our needs? we have a few
pseudo-variables (current process, current stack frame, current
environment closure, current yield and return addresses) we probably
need frequent access to, but do we just feed them all into a standard
reg allocator and let it do its work? let's try that.
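
to make "feed them all into a standard reg allocator" concrete, here
is a minimal linear-scan sketch in Python. linear scan is just one
standard choice picked for illustration; the notes don't commit to an
algorithm. the live ranges, variable names and the decision to hold
ESP back are all assumptions.

```python
from typing import Dict, List, Tuple

# Assumed pool: the 7 non-ESP GPRs plus MM0..MM7 treated as 32-bit regs.
REGS = ["EAX", "EBX", "ECX", "EDX", "EBP", "ESI", "EDI"] + \
       [f"MM{i}" for i in range(8)]

def linear_scan(intervals: Dict[str, Tuple[int, int]],
                regs: List[str]) -> Dict[str, str]:
    """Map each variable to a register name or 'spill'.

    intervals maps variable -> (start, end) live range, inclusive.
    """
    free = list(regs)
    active: List[Tuple[int, str]] = []   # (end, var), sorted by end
    assignment: Dict[str, str] = {}
    for var, (start, end) in sorted(intervals.items(),
                                    key=lambda kv: kv[1][0]):
        # Release registers whose intervals ended before this start.
        while active and active[0][0] < start:
            _, old = active.pop(0)
            free.append(assignment[old])
        if free:
            assignment[var] = free.pop(0)
            active.append((end, var))
        else:
            # No register free: spill whichever interval ends last.
            last_end, last_var = active[-1]
            if last_end > end:
                assignment[var] = assignment[last_var]
                assignment[last_var] = "spill"
                active[-1] = (end, var)
            else:
                assignment[var] = "spill"
        active.sort()
    return assignment

# Pseudo-variables live for the whole (hypothetical) function body,
# alongside a few short-lived temporaries.
pseudo = {name: (0, 100) for name in
          ["proc", "frame", "env", "yield_addr", "ret_addr"]}
temps = {"t0": (1, 5), "t1": (2, 6), "t2": (7, 9)}
alloc = linear_scan({**pseudo, **temps}, REGS)
```

with 15 registers in the pool nothing spills in this example; shrink
the pool to two registers and the long-lived variables are the first
to go, which is the trade-off to watch if the pseudo-variables are
treated like any other value.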

feels like we won't be using call or ret at all, just jmp. that's
fine. so a frame has a yield address and a ret address, and inside an
iterator a yield?/yield!/yield+/yield* operation passes the *caller's*
yield address up to the inner func; this is effectively a tail-yield
(temporarily) until the inner func completes yielding and returns.

Tail calls are done with become/become?/become!/become*/become+: if I
become another func/func?/func!/func*/func+ call, my frame is
destroyed and both yield address and return address are forwarded to
the callee.
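
the address forwarding described above, as a minimal Python sketch.
everything here is illustrative: frames are modelled as a list (the
real ones would live in heap segments), addresses are plain strings,
and the names Frame, tail_yield and become are hypothetical.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Frame:
    func: str
    yield_addr: str   # where a yielded value gets delivered
    ret_addr: str     # where control jumps when the func finishes

def tail_yield(frames: List[Frame], callee: str, resume_addr: str) -> None:
    """yield-style call: the inner func inherits the *caller's* yield
    address, but returns into the current frame at resume_addr."""
    cur = frames[-1]
    frames.append(Frame(callee, cur.yield_addr, resume_addr))

def become(frames: List[Frame], callee: str) -> None:
    """Tail call: destroy the current frame and forward both its
    yield address and its return address to the callee."""
    cur = frames.pop()
    frames.append(Frame(callee, cur.yield_addr, cur.ret_addr))

# An iterator frame whose consumer expects yields at 'consumer.loop'.
frames = [Frame("iter", "consumer.loop", "consumer.done")]
tail_yield(frames, "inner", "iter.after_inner")
inner = frames[-1]
become(frames, "other")
```

after tail_yield, the inner frame yields straight to consumer.loop and
returns to iter.after_inner. after become, the frame count is
unchanged and "other" carries exactly the addresses "inner" had.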