12345678910111213141516171819202122232425262728293031323334353637383940414243444546474849505152535455565758596061626364656667686970717273 |
- x86 normally has 8 32-bit GPRs
- x86 + MMX has 8 32-bit GPRs and 8 64-bit MMX regs
- x86 + SSE has 8 32-bit GPRs and 8 128-bit XMM regs
- x64 has 16 64-bit GPRs and 16 128-bit XMM regs
- note that XMM/MMX regs cannot be used to address memory, but they're
- perfectly fine for ALU ops (in a sense: they work in banks that
- operate in parallel on their subcomponents, which is a bit odd)
- they also make lovely "level-0" spill slots. consider that there are
- effectively 16 (MMX) or 32 (SSE) GPRs of data available here, and on
- modern core chips they're all single-cycle access!
- So ... hmm. You can design a *really* fast fastcall with this.
- MMX is 10 years old (1997). Let's assume anyone who gives a damn has
- MMX. ELF showed up in the system 5 ABI from the same year, so
- ... seriously. It's supported. So you can at bare minimum treat each
- MMX reg as its own independent ALU GPR and ignore the high doubleword.
- (PAND, POR, PXOR, PADDD, PSUBD, PCMPEQD, PCMPGTD, PSRAD, PSRLD, PSLLD)
- so that buys us 8 more GPRs most of the time. we can probably tolerate
- simulating 64 bit ints using "2 32-bit GPRs"; it's not the most
- efficient use of the hardware on your desk, but it's easier to
- generate code that way, fewer special cases. when you're actually in
- 64-but mode we can scale the assumptions up, everything doubles.
- notes on GPR constraints / uses:
- EAX - GPR w/ subregs. used for return value in most call conventions.
- EBX - GPR w/ subregs. used for GOT pointer in ELF.
- ECX - GPR w/ subregs.
- EDX - GPR w/ subregs.
- EBP - GPR, named for base pointer, not needed since we know stack sizes
- ESI - GPR.
- EDI - GPR.
- ESP - GPR but almost always stack pointer. reserved?
- MMX0 .. MMX7 - GPR (with restrictions)
- theoretically the sky is the limit for using these. That's 16 32-bit
- GPRs on *most* x86 machines we're likely to encounter. we are using
- stack frames in heap segments so we always have to open-code our own
- stack fiddling code. which is fine.
- ok fine, we need a real register allocator and register-heavy calling
- convention for that.
- what's a nice way of modelling our needs? we have a few
- pseudo-variables (current process, current stack frame, current
- environment closure, current yield and return addresses) we probably
- need frequent access to, but do we just feed them all into a standard
- reg allocator and let it do its work? let's try that.
- feels like we won't be using call or ret at all, just jmp. that's
- fine. so a frame has a yield address and a ret address, and inside an
- iterator a yield?/yield!/yield+/yield* operation passes the *caller's*
- yield address up to the inner func; this is effectively a tail-yield
- (temporarily) until the inner func completes yielding and returns.
- Tail calls are done with become/become?/become!/become*/become+: if I
- become another func/func?/func!/func*/func+ call, my frame is
- destroyed and both yield address and return address are forwarded to
- the callee.
|