==========================================================
Parallel & Spawn
==========================================================

Nim has two flavors of parallelism:

1) `Structured`:idx: parallelism via the ``parallel`` statement.
2) `Unstructured`:idx: parallelism via the standalone ``spawn`` statement.

Both need the [threadpool](threadpool.html) module to work.

Somewhat confusingly, ``spawn`` is also used in the ``parallel`` statement
with slightly different semantics. ``spawn`` always takes a call expression of
the form ``f(a, ...)``. Let ``T`` be ``f``'s return type. If ``T`` is ``void``,
then ``spawn``'s return type is also ``void``. Within a ``parallel`` section,
``spawn``'s return type is ``T``; otherwise it is ``FlowVar[T]``.

The compiler can ensure that the location in ``location = spawn f(...)`` is not
read prematurely within a ``parallel`` section, so there is no need for
the overhead of an indirection via ``FlowVar[T]`` to ensure correctness.
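To make the distinction concrete, here is a minimal sketch of both flavors. The ``double`` proc is hypothetical, and the ``{.experimental: "parallel".}`` pragma and ``--threads:on`` may be required depending on the compiler version:

```nim
{.experimental: "parallel".}
import std/threadpool

proc double(x: int): int = 2 * x

proc demo =
  # Standalone flavor: spawn yields a FlowVar[int],
  # which must be read with the blocking `^` operator.
  let fv: FlowVar[int] = spawn double(21)
  echo ^fv

  # Structured flavor: within `parallel`, spawn's result is a plain
  # `int`; the compiler proves `results[i]` is not read prematurely.
  var results = newSeq[int](4)
  parallel:
    for i in 0..results.high:
      results[i] = spawn double(i)
  echo results

demo()
```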
Spawn statement
===============

A standalone ``spawn`` statement is a simple construct. It executes
the passed expression on the thread pool and returns a `data flow variable`:idx:
``FlowVar[T]`` that can be read from. Reading with the ``^`` operator is
**blocking**. However, one can use ``blockUntilAny`` to wait on multiple flow
variables at the same time:
```nim
import std/threadpool, ...

# wait until 2 out of 3 servers received the update:
proc main =
  var responses = newSeq[FlowVarBase](3)
  for i in 0..2:
    responses[i] = spawn tellServer(Update, "key", "value")
  var index = blockUntilAny(responses)
  assert index >= 0
  responses.del(index)
  discard blockUntilAny(responses)
```
Data flow variables ensure that no data races are possible. Due to technical
limitations, not every type ``T`` is possible in a data flow variable:
``T`` has to be a ``ref``, ``string``, ``seq``, or a type that doesn't
contain any type that is garbage collected. This restriction will be removed
in the future.
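For illustration, the following sketch shows ``FlowVar`` instantiations that satisfy the restriction; the procs and the ``Vec`` type are hypothetical:

```nim
import std/threadpool

type Vec = object   # contains no garbage-collected fields, so it is allowed
  x, y: float

proc makeGreeting(name: string): string = "hello, " & name
proc makeVec(a, b: float): Vec = Vec(x: a, y: b)

proc demo =
  let s: FlowVar[string] = spawn makeGreeting("world")  # string: allowed
  let v: FlowVar[Vec] = spawn makeVec(1.0, 2.0)         # plain object: allowed
  echo ^s        # blocking reads of the data flow variables
  echo (^v).x

demo()
```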
Parallel statement
==================

Example:
```nim
# Compute PI in an inefficient way
import std/[strutils, math, threadpool]

proc term(k: float): float = 4 * math.pow(-1, k) / (2*k + 1)

proc pi(n: int): float =
  var ch = newSeq[float](n+1)
  parallel:
    for k in 0..ch.high:
      ch[k] = spawn term(float(k))
  for k in 0..ch.high:
    result += ch[k]

echo formatFloat(pi(5000))
```
The parallel statement is the preferred mechanism to introduce parallelism in a
Nim program. Only a subset of the Nim language is valid within a ``parallel``
section, and this subset is checked to be free of data races at compile time.
A sophisticated `disjoint checker`:idx: ensures that no data races are
possible, even though shared memory is extensively supported!

The subset is in fact the full language with the following
restrictions / changes:

* ``spawn`` within a ``parallel`` section has special semantics.
* Every location of the form ``a[i]``, ``a[i..j]``, and ``dest`` where
  ``dest`` is part of the pattern ``dest = spawn f(...)`` has to be
  provably disjoint. This is called the *disjoint check*.
* Every other complex location ``loc`` that is used in a spawned
  proc (``spawn f(loc)``) has to be immutable for the duration of
  the ``parallel`` section. This is called the *immutability check*. Currently
  it is not specified what exactly "complex location" means. We need to make
  this an optimization!
* Every array access has to be provably within bounds. This is called
  the *bounds check*.
* Slices are optimized so that no copy is performed. This optimization is not
  yet performed for ordinary slices outside of a ``parallel`` section.
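As a sketch of how the disjoint check and the slice optimization interact, the following hypothetical parallel sum splits a sequence into two disjoint slices; whether the checker accepts a given slicing depends on what it can prove:

```nim
{.experimental: "parallel".}
import std/threadpool

proc sum(a: openArray[int]): int =
  for x in a: result += x

proc parSum(data: seq[int]): int =
  var partial = newSeq[int](2)
  let mid = data.len div 2
  parallel:
    # `data[0 ..< mid]` and `data[mid ..< data.len]` are provably
    # disjoint, and inside `parallel` no copy is made for the slices.
    # `partial[0]` and `partial[1]` pass the disjoint check as well.
    partial[0] = spawn sum(data[0 ..< mid])
    partial[1] = spawn sum(data[mid ..< data.len])
  result = partial[0] + partial[1]

echo parSum(@[1, 2, 3, 4, 5])  # 15
```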