runtime-decisions.txt

2007-Oct-07
-----------
So far we've been very antagonistic towards any "clever" requirements
in a runtime: if the semantic model doesn't *spell out* the
optimization strategy, chances are no runtime will ever manage to be
smart enough to "discover" one.

This position has failed to address one important issue, and I need to
nail it down before going on: how do you efficiently schedule and
dispatch message-receives? If we can't do that efficiently, we're
sunk. Efficient here means constant space per proc and constant
dispatching time.

So let's sketch the situation. Note this is not necessarily related to
handling dynamic alts and binding-channels-to-sets; it might be nice
if it handles those, but that's not essential. This is for the case of
a fixed set of ports and a fixed set of dispatching alts visible in
the proc. The only variable quantity is the number of queued callers.

So, naively -- worst case -- we have 1 run queue including the
sleepers, we pull people out in random order, and each sleeper points
to the target it's trying to enter, and the first one to enter a proc
wins. That works, but it's effectively busy-waiting the sleepers when
anyone is making progress; you only idle when *everyone* is blocked.

The crucial problem is devising the constant-size data structure that
maps N callers into queues in a variable partition of between 0 .. K
queues, such that you can do the following in O(1) (see the sketch
after this list):
- add a member to, or remove a member from, a queue
- figure out if there's anyone in a queue
- pick a representative from that queue at random

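A minimal sketch of one queue's worth of that, assuming a dense
doubling/halving member vector with swap-with-last removal; the names
(queue_t, proc_t, queue_index) are illustrative, nothing is settled:

    #include <stdbool.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* A dispatch queue as a dense d/h vector of waiting procs: add
     * appends, remove swaps the last element into the hole, pick is
     * a uniform random index.  Each proc remembers its slot, so
     * removal needs no search.  All three ops are O(1) (amortized,
     * for the reallocs). */

    typedef struct proc { size_t queue_index; } proc_t;

    typedef struct queue {
        proc_t **members;            /* dense vector of waiters */
        size_t   len, cap;
    } queue_t;

    static void queue_resize(queue_t *q, size_t cap) {
        q->members = realloc(q->members, cap * sizeof q->members[0]);
        q->cap = cap;
    }

    void queue_add(queue_t *q, proc_t *p) {
        if (q->len == q->cap)
            queue_resize(q, q->cap ? 2 * q->cap : 4);  /* double */
        p->queue_index = q->len;
        q->members[q->len++] = p;
    }

    void queue_remove(queue_t *q, proc_t *p) {
        size_t i = p->queue_index;
        q->members[i] = q->members[--q->len];  /* swap last into hole */
        q->members[i]->queue_index = i;
        if (q->cap > 4 && q->len <= q->cap / 4)
            queue_resize(q, q->cap / 2);               /* halve */
    }

    bool queue_nonempty(queue_t const *q) { return q->len != 0; }

    proc_t *queue_pick(queue_t const *q, uint32_t r) {
        return q->members[r % q->len];  /* caller ensures len > 0 */
    }

The halve-at-quarter-full rule is also what bounds the wasted space,
per the first fact below.
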
First, note the following facts:
- doubling/halving vectors waste no more than 75% of their space
  (halving at a quarter full means the worst case is sitting just
  above 25% occupancy).
- 2-space copying collectors waste no more than 50% of their space.

So we build a 2+10N-word structure given N processes:
- 2 4N-word vectors representing 2 spaces for d/h queue sub-vectors,
  densely packed, copied back and forth between the spaces as they
  need reallocation.
- 2 control words representing "end of current space" in each space.
- a 2N-word vector of 2-word control records that refer into
  the 2 spaces.

So a million processes require <=10 million words, or ~40MB, of
scheduling queues on a 32-bit machine. Not great, but not bad. Bounded,
at least. And really, no process is going to eat less than 10 words of
stack and control state anyway; we just don't want to wind up
charging more for the queues than we do for the process abstraction in
a neutral-ish, low-stack form (say an auto proc with small-ish ports).

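One plausible reading of that layout, in the same illustrative C as
above (word counts assume a 32-bit machine; none of these names are
settled):

    /* The 2+10N-word entry scheduler for N procs.  Each queue's
     * waiters live in a densely packed sub-vector inside the current
     * semispace; a 2-word control record per queue says where. */

    typedef struct qref {
        uint32_t off;        /* sub-vector offset in current space */
        uint32_t len;        /* number of queued waiters           */
    } qref_t;                /* N records = 2N words               */

    typedef struct entry_sched {
        uint32_t *space[2];  /* 2 semispaces of 4N words each: d/h
                              * sub-vectors waste <=75%, so N waiters
                              * need at most 4N words of capacity   */
        uint32_t  end[2];    /* "end of current space", 2 words     */
        qref_t   *queues;    /* the 2N-word control-record vector   */
    } entry_sched_t;

When a sub-vector needs reallocation you copy the live sub-vectors
into the other space, bump that space's end word, and fix up the
control records; that's the copying-collector half of the scheme.
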
The proc can get by with 1 word in its header for every queue, and can
combine all the auto ports into a single queue. So for the abundant
million-copy auto-proc sort of procs: 1 word, period, pointing to the
queue ID in the entry scheduler. Yay.

This gives us O(1) randomized scheduling on every queue, assuming we
use an O(1) PRNG (I like ISAAC). Then when you execute an alt, it
pulls 2 words from the PRNG, uses the first to pick a queue and the
second to pick a caller within that queue to wake. (Or should it pull
1 word mod the total waiter count, then dispatch to the queue that
held that waiter? Slightly slower to dispatch if you have a large
number of ports, but it gives a different model of fairness...)
Either way: simple, fast, easy to guess what will happen, and it
smooths out load lumps at every step of the system.

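The two policies, sketched side by side; prng_word() stands in for
whatever O(1) generator we end up with (ISAAC or otherwise), and the
rest is the hypothetical queue_t from above:

    uint32_t prng_word(void);   /* assumed O(1) PRNG */

    /* Policy A: per-port fairness.  Two PRNG words: one picks a
     * queue, one picks a waiter inside it.  (Rejection-samples past
     * empty queues; assumes the alt only runs when someone waits.)
     * A lone waiter on port B gets ~1/2 odds against a million
     * waiters on port A. */
    proc_t *wake_per_port(queue_t *qs, size_t k) {
        size_t q;
        do { q = prng_word() % k; } while (qs[q].len == 0);
        return queue_pick(&qs[q], prng_word());
    }

    /* Policy B: per-waiter fairness.  One PRNG word mod the total
     * waiter count, then walk the queues to find who holds that
     * waiter; the lone B waiter now gets ~1/1000001 odds.  The walk
     * is the "slightly slower with many ports" cost noted above. */
    proc_t *wake_per_waiter(queue_t *qs, size_t k, size_t total) {
        size_t n = prng_word() % total;
        for (size_t q = 0; q < k; q++) {
            if (n < qs[q].len)
                return qs[q].members[n];
            n -= qs[q].len;
        }
        return NULL;   /* unreachable if total was accurate */
    }
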
That last question is curious. Suppose you have a proc with a million
processes waiting on port A, and 1 process waiting on port B.
Should the B process get a ~1/1million chance to run, or a ~1/2 chance
to run? Which is easier to imagine the designer wanting more often?
Which is easier for the designer to work around if they mean the other
thing? I think the designer is more likely to want a guarantee of the
latter (the 1/2 chance), because they have more to count on that way;
they can always combine event streams together dynamically if they
want the other sort of behavior. It's easy enough to just have a proc
hold an internal work buffer that it mixes its own trouble into and
processes as it wishes. Also, it doesn't hurt that the 1/2 case scales
better :)

Also note that if you have dynamic ports, you may have a larger
scheduling pool.

Woo! I think this issue is (finally) solved. I chewed on it for 24+
hours solid. Bleah.

2007-Oct-10
-----------
Multithreading.

Hmm. I am generally not interested in multithreading: it's not worth
the pain. There are reasons you might want it, but they're few.

But let's just jot down "the simplest possible MT solution" and see if
it works. First, note that modern systems guarantee that word-aligned
writes are atomic, and all have an atomic compare-and-swap (CAS)
instruction. So you are never going to have to race over refcount
operations or twiddling owner IDs or whatever. So:
- Every directly limited type has a single owner, which implies a
  single reference. That much is obvious.
- If I have a single reference and limitation infects, then the path
  from the thread register that owns me to me is all
  single-referenced values.
- So: directly or indirectly limited slots make up a true
  tree. Always. There is always a true tree of process ownership,
  for example. Moving process references around only happens between
  portions of a tree-shaped data structure. If a thread can see node
  N in that tree, it is the only thread that can see the subtree
  under N.
- The only stuff that might be "visible" to multiple threads
  simultaneously is the daggy "value" stuff. That's all refcounted
  along pointer edges. And we CoW any "value" into our own thread's
  tree any time we're performing an update: we don't expect other
  threads to "see" changes we make. We always edit "our own copy".
  There is no semantic "shared visibility mutation" in this
  language; there never has been.
- So all we need to reason about in MT terms is the CoW system and
  the message queue system, and interactions with the outside world.
  Which, unfortunately, we *always* have to reason about quite
  carefully anyway.
- CoW is CAS-based (see the sketch after this list). The path from
  your thread registers/stack to a heap memory write is *always* an
  lval-lookup LHS of an assignment statement. So to perform such a
  write you go step by step, ensuring refcount=1. If you see a >1
  edge you must pause, make a shallow copy, rewrite "your" rc=1 base
  value that's holding the shared edge to instead hold your copy,
  and carry on. When you get to the leaf, you write.
  When you are forming a new reference to a shared value, you must
  increment the shared value's refcount safely. You do this with a
  CAS.
  When you are dropping a refcount due to a value you're freeing,
  you must get the refcount to 0, then free. If you CAS the value
  1->0 successfully, you're the guy who gets to free the memory. If
  you're unsuccessful, someone else beat you to it (or re-referenced
  the value) and you're off the hook.
  Aliases permit concurrent reads, but reading through them should
  always use a memory-barrier pointer-read sort of thing to ensure a
  coherent view. Or maybe it's that writes to heap pages always need
  to run a memory barrier op on the containing object? I can never
  remember...
- The message queue system is a little hairier, but not hugely. You
  have a single Big Scheduler Of Processes and a bunch -- say 64 or
  512 -- of threads. You can activate only as many procs as you have
  threads at any moment, and you may occasionally have to have one
  wait on another due to a message send or an attempted proc
  lifecycle event; in those cases the proc becomes blocked, the
  thread sticks it back in the run queue, and it's "dormant data"
  for other threads to diddle. You only possibly get stuck when you
  have an imperative like "I'm dying" and you are trying to clean up
  a proc you own. You can post the proc-death event, but it might
  take a moment to "really" die. So you have a sort of grim reaper
  pseudo-proc that you have to message synchronously to confirm the
  death of another proc.
  ... OR ... you never have 1 thread running a proc while another
  thread runs a parent proc. In order to activate a proc you have
  to block -- all the way up to the root of the proc tree -- its
  parents. So they're only likely to give you a little moment to run
  in, not a lot. That is probably a better rule. Explicit priority
  delegation: no inversion possible! The thread scheduler has to
  perform the topological allocation.
- The other hairy part is that the message-send queue, the proc run
  queue, and the idle thread queue probably all have to be some
  fancy MT-safe lockless queues, because the threads are all going
  to be independently and concurrently hammering on those
  things. But such structures exist.

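The CAS-based refcount discipline from the CoW bullet above, sketched
in C11-atomic terms; val_t and friends are stand-ins, and the
step-by-step path walk is compressed to a single edge:

    #include <stdatomic.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct val {
        atomic_uint rc;
        /* ... payload, child edges ... */
    } val_t;

    /* Forming a new reference to a shared value: bump the count
     * atomically (the atomic add is morally a CAS loop). */
    void val_retain(val_t *v) {
        atomic_fetch_add(&v->rc, 1);
    }

    /* Dropping a reference: whoever moves the count 1->0 is the guy
     * who gets to free.  If your decrement doesn't hit 0, someone
     * else still holds (or re-took) it and you're off the hook. */
    void val_release(val_t *v) {
        if (atomic_fetch_sub(&v->rc, 1) == 1)
            free(v);
    }

    /* One step of the CoW write path: before writing through an
     * edge, ensure the target is uniquely ours.  If rc > 1, make a
     * shallow copy, rewrite our rc==1 holder to point at the copy,
     * drop the shared original, and carry on down the path. */
    val_t *cow_make_unique(val_t **edge, size_t size) {
        val_t *v = *edge;
        if (atomic_load(&v->rc) == 1)
            return v;                 /* already the sole owner */
        val_t *copy = malloc(size);
        memcpy(copy, v, size);        /* shallow copy */
        atomic_store(&copy->rc, 1);
        /* (a real copy would also val_retain each child edge) */
        *edge = copy;                 /* rc==1 holder now points at
                                       * our private copy */
        val_release(v);               /* drop the shared original */
        return copy;
    }
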
---

Native pointers are limited and non-transmittable. Native functions
are statically imported and statically bound, single-entry, CDECL. A
crate defn can specify the transitive closure of acceptable native
dependencies, or * if you don't care.