libitm.texi 34 KB

123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565566567568569570571572573574575576577578579580581582583584585586587588589590591592593594595596597598599600601602603604605606607608609610611612613614615616617618619620621622623624625626627628629630631632633634635636637638639640641642643644645646647648649650651652653654655656657658659660661662663664665666667668669670671672673674675676677678679680681682683684685686687688689690691692693694695696697698699700701702703704705706707708709710711712713714715716717718719720721722723724725726727728729730731732733734735736737738739740741742743744745746747748749750751752753754755756757758759760761762763764765766767768769770771772773774775
  1. \input texinfo @c -*-texinfo-*-
  2. @c %**start of header
  3. @setfilename libitm.info
  4. @settitle GNU libitm
  5. @c %**end of header
  6. @copying
  7. Copyright @copyright{} 2011-2015 Free Software Foundation, Inc.
  8. Permission is granted to copy, distribute and/or modify this document
  9. under the terms of the GNU Free Documentation License, Version 1.2 or
  10. any later version published by the Free Software Foundation; with no
  11. Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts.
  12. A copy of the license is included in the section entitled ``GNU
  13. Free Documentation License''.
  14. @end copying
  15. @ifinfo
  16. @dircategory GNU Libraries
  17. @direntry
  18. * libitm: (libitm). GNU Transactional Memory Library
  19. @end direntry
  20. This manual documents the GNU Transactional Memory Library.
  21. @insertcopying
  22. @end ifinfo
  23. @setchapternewpage odd
  24. @titlepage
  25. @title The GNU Transactional Memory Library
  26. @page
  27. @vskip 0pt plus 1filll
  28. @comment For the @value{version-GCC} Version*
  29. @sp 1
  30. @insertcopying
  31. @end titlepage
  32. @summarycontents
  33. @contents
  34. @page
  35. @node Top
  36. @top Introduction
  37. @cindex Introduction
  38. This manual documents the usage and internals of libitm, the GNU Transactional
  39. Memory Library. It provides transaction support for accesses to a process'
  40. memory, enabling easy-to-use synchronization of accesses to shared memory by
  41. several threads.
  42. @comment
  43. @comment When you add a new menu item, please keep the right hand
  44. @comment aligned to the same column. Do not use tabs. This provides
  45. @comment better formatting.
  46. @comment
  47. @menu
  48. * Enabling libitm:: How to enable libitm for your applications.
  49. * C/C++ Language Constructs for TM::
  50. Notes on the language-level interface supported
  51. by gcc.
  52. * The libitm ABI:: Notes on the external ABI provided by libitm.
  53. * Internals:: Notes on libitm's internal synchronization.
  54. * GNU Free Documentation License::
  55. How you can copy and share this manual.
  56. * Library Index:: Index of this documentation.
  57. @end menu
  58. @c ---------------------------------------------------------------------
  59. @c Enabling libitm
  60. @c ---------------------------------------------------------------------
  61. @node Enabling libitm
  62. @chapter Enabling libitm
  63. To activate support for TM in C/C++, the compile-time flag @option{-fgnu-tm}
  64. must be specified. This enables TM language-level constructs such as
  65. transaction statements (e.g., @code{__transaction_atomic}, @pxref{C/C++
  66. Language Constructs for TM} for details).
  67. @c ---------------------------------------------------------------------
  68. @c C/C++ Language Constructs for TM
  69. @c ---------------------------------------------------------------------
  70. @node C/C++ Language Constructs for TM
  71. @chapter C/C++ Language Constructs for TM
  72. Transactions are supported in C++ and C in the form of transaction statements,
  73. transaction expressions, and function transactions. In the following example,
  74. both @code{a} and @code{b} will be read and the difference will be written to
  75. @code{c}, all atomically and isolated from other transactions:
  76. @example
  77. __transaction_atomic @{ c = a - b; @}
  78. @end example
  79. Therefore, another thread can use the following code to concurrently update
  80. @code{b} without ever causing @code{c} to hold a negative value (and without
  81. having to use other synchronization constructs such as locks or C++11
  82. atomics):
  83. @example
  84. __transaction_atomic @{ if (a > b) b++; @}
  85. @end example
  86. GCC follows the @uref{https://sites.google.com/site/tmforcplusplus/, Draft
  87. Specification of Transactional Language Constructs for C++ (v1.1)} in its
  88. implementation of transactions.
  89. The precise semantics of transactions are defined in terms of the C++11/C11
  90. memory model (see the specification). Roughly, transactions provide
  91. synchronization guarantees that are similar to what would be guaranteed when
  92. using a single global lock as a guard for all transactions. Note that like
  93. other synchronization constructs in C/C++, transactions rely on a
  94. data-race-free program (e.g., a nontransactional write that is concurrent
  95. with a transactional read to the same memory location is a data race).
  96. @c ---------------------------------------------------------------------
  97. @c The libitm ABI
  98. @c ---------------------------------------------------------------------
  99. @node The libitm ABI
  100. @chapter The libitm ABI
  101. The ABI provided by libitm is basically equal to the Linux variant of Intel's
  102. current TM ABI specification document (Revision 1.1, May 6 2009) but with the
  103. differences listed in this chapter. It would be good if these changes would
  104. eventually be merged into a future version of this specification. To ease
  105. look-up, the following subsections mirror the structure of this specification.
  106. @section [No changes] Objectives
  107. @section [No changes] Non-objectives
  108. @section Library design principles
  109. @subsection [No changes] Calling conventions
  110. @subsection [No changes] TM library algorithms
  111. @subsection [No changes] Optimized load and store routines
  112. @subsection [No changes] Aligned load and store routines
  113. @subsection Data logging functions
  114. The memory locations accessed with transactional loads and stores and the
  115. memory locations whose values are logged must not overlap. This required
  116. separation only extends to the scope of the execution of one transaction
  117. including all the executions of all nested transactions.
  118. The compiler must be consistent (within the scope of a single transaction)
  119. about which memory locations are shared and which are not shared with other
  120. threads (i.e., data must be accessed either transactionally or
  121. nontransactionally). Otherwise, non-write-through TM algorithms would not work.
  122. For memory locations on the stack, this requirement extends to only the
  123. lifetime of the stack frame that the memory location belongs to (or the
  124. lifetime of the transaction, whichever is shorter). Thus, memory that is
  125. reused for several stack frames could be target of both data logging and
  126. transactional accesses; however, this is harmless because these stack frames'
  127. lifetimes will end before the transaction finishes.
  128. @subsection [No changes] Scatter/gather calls
  129. @subsection [No changes] Serial and irrevocable mode
  130. @subsection [No changes] Transaction descriptor
  131. @subsection Store allocation
  132. There is no @code{getTransaction} function.
  133. @subsection [No changes] Naming conventions
  134. @subsection Function pointer encryption
  135. Currently, this is not implemented.
  136. @section Types and macros list
  137. @code{_ITM_codeProperties} has changed, @pxref{txn-code-properties,,Starting a
  138. transaction}.
  139. @code{_ITM_srcLocation} is not used.
  140. @section Function list
  141. @subsection Initialization and finalization functions
  142. These functions are not part of the ABI.
  143. @subsection [No changes] Version checking
  144. @subsection [No changes] Error reporting
  145. @subsection [No changes] inTransaction call
  146. @subsection State manipulation functions
  147. There is no @code{getTransaction} function. Transaction identifiers for
  148. nested transactions will be ordered but not necessarily sequential (i.e., for
  149. a nested transaction's identifier @var{IN} and its enclosing transaction's
  150. identifier @var{IE}, it is guaranteed that @math{IN >= IE}).
  151. @subsection [No changes] Source locations
  152. @subsection Starting a transaction
  153. @subsubsection Transaction code properties
  154. @anchor{txn-code-properties}
  155. The bit @code{hasNoXMMUpdate} is instead called @code{hasNoVectorUpdate}.
  156. Iff it is set, vector register save/restore is not necessary for any target
  157. machine.
  158. The @code{hasNoFloatUpdate} bit (@code{0x0010}) is new. Iff it is set, floating
  159. point register save/restore is not necessary for any target machine.
  160. @code{undoLogCode} is not supported and a fatal runtime error will be raised
  161. if this bit is set. It is not properly defined in the ABI why barriers
  162. other than undo logging are not present; Are they not necessary (e.g., a
  163. transaction operating purely on thread-local data) or have they been omitted by
  164. the compiler because it thinks that some kind of global synchronization
  165. (e.g., serial mode) might perform better? The specification suggests that the
  166. latter might be the case, but the former seems to be more useful.
  167. The @code{readOnly} bit (@code{0x4000}) is new. @strong{TODO} Lexical or dynamic
  168. scope?
  169. @code{hasNoRetry} is not supported. If this bit is not set, but
  170. @code{hasNoAbort} is set, the library can assume that transaction
  171. rollback will not be requested.
  172. It would be useful if the absence of externally-triggered rollbacks would be
  173. reported for the dynamic scope as well, not just for the lexical scope
  174. (@code{hasNoAbort}). Without this, a library cannot exploit this together
  175. with flat nesting.
  176. @code{exceptionBlock} is not supported because exception blocks are not used.
  177. @subsubsection [No changes] Windows exception state
  178. @subsubsection [No changes] Other machine state
  179. @subsubsection [No changes] Results from beginTransaction
  180. @subsection Aborting a transaction
  181. @code{_ITM_rollbackTransaction} is not supported. @code{_ITM_abortTransaction}
  182. is supported but the abort reasons @code{exceptionBlockAbort},
  183. @code{TMConflict}, and @code{userRetry} are not supported. There are no
  184. exception blocks in general, so the related cases also do not have to be
  185. considered. To encode @code{__transaction_cancel [[outer]]}, compilers must
  186. set the new @code{outerAbort} bit (@code{0x10}) additionally to the
  187. @code{userAbort} bit in the abort reason.
  188. @subsection Committing a transaction
  189. The exception handling (EH) scheme is different. The Intel ABI requires the
  190. @code{_ITM_tryCommitTransaction} function that will return even when the
  191. commit failed and will have to be matched with calls to either
  192. @code{_ITM_abortTransaction} or @code{_ITM_commitTransaction}. In contrast,
  193. gcc relies on transactional wrappers for the functions of the Exception
  194. Handling ABI and on one additional commit function (shown below). This allows
  195. the TM to keep track of EH internally and thus it does not have to embed the
  196. cleanup of EH state into the existing EH code in the program.
  197. @code{_ITM_tryCommitTransaction} is not supported.
  198. @code{_ITM_commitTransactionToId} is also not supported because the
  199. propagation of thrown exceptions will not bypass commits of nested
  200. transactions.
  201. @example
  202. void _ITM_commitTransactionEH(void *exc_ptr) ITM_REGPARM;
  203. void *_ITM_cxa_allocate_exception (size_t);
  204. void _ITM_cxa_throw (void *obj, void *tinfo, void *dest);
  205. void *_ITM_cxa_begin_catch (void *exc_ptr);
  206. void _ITM_cxa_end_catch (void);
  207. @end example
  208. @code{_ITM_commitTransactionEH} must be called to commit a transaction if an
  209. exception could be in flight at this position in the code. @code{exc_ptr} is
  210. the current exception or zero if there is no current exception.
  211. The @code{_ITM_cxa...} functions are transactional wrappers for the respective
  212. @code{__cxa...} functions and must be called instead of these in transactional
  213. code.
  214. To support this EH scheme, libstdc++ needs to provide one additional function
  215. (@code{_cxa_tm_cleanup}), which is used by the TM to clean up the exception
  216. handling state while rolling back a transaction:
  217. @example
  218. void __cxa_tm_cleanup (void *unthrown_obj, void *cleanup_exc,
  219. unsigned int caught_count);
  220. @end example
  221. @code{unthrown_obj} is non-null if the program called
  222. @code{__cxa_allocate_exception} for this exception but did not yet called
  223. @code{__cxa_throw} for it. @code{cleanup_exc} is non-null if the program is
  224. currently processing a cleanup along an exception path but has not caught this
  225. exception yet. @code{caught_count} is the nesting depth of
  226. @code{__cxa_begin_catch} within the transaction (which can be counted by the TM
  227. using @code{_ITM_cxa_begin_catch} and @code{_ITM_cxa_end_catch});
  228. @code{__cxa_tm_cleanup} then performs rollback by essentially performing
  229. @code{__cxa_end_catch} that many times.
  230. @subsection Exception handling support
  231. Currently, there is no support for functionality like
  232. @code{__transaction_cancel throw} as described in the C++ TM specification.
  233. Supporting this should be possible with the EH scheme explained previously
  234. because via the transactional wrappers for the EH ABI, the TM is able to
  235. observe and intercept EH.
  236. @subsection [No changes] Transition to serial--irrevocable mode
  237. @subsection [No changes] Data transfer functions
  238. @subsection [No changes] Transactional memory copies
  239. @subsection Transactional versions of memmove
  240. If either the source or destination memory region is to be accessed
  241. nontransactionally, then source and destination regions must not be
  242. overlapping. The respective @code{_ITM_memmove} functions are still
  243. available but a fatal runtime error will be raised if such regions do overlap.
  244. To support this functionality, the ABI would have to specify how the
  245. intersection of the regions has to be accessed (i.e., transactionally or
  246. nontransactionally).
  247. @subsection [No changes] Transactional versions of memset
  248. @subsection [No changes] Logging functions
  249. @subsection User-registered commit and undo actions
  250. Commit actions will get executed in the same order in which the respective
  251. calls to @code{_ITM_addUserCommitAction} happened. Only
  252. @code{_ITM_noTransactionId} is allowed as value for the
  253. @code{resumingTransactionId} argument. Commit actions get executed after
  254. privatization safety has been ensured.
  255. Undo actions will get executed in reverse order compared to the order in which
  256. the respective calls to @code{_ITM_addUserUndoAction} happened. The ordering of
  257. undo actions w.r.t. the roll-back of other actions (e.g., data transfers or
  258. memory allocations) is undefined.
  259. @code{_ITM_getThreadnum} is not supported currently because its only purpose
  260. is to provide a thread ID that matches some assumed performance tuning output,
  261. but this output is not part of the ABI nor further defined by it.
  262. @code{_ITM_dropReferences} is not supported currently because its semantics and
  263. the intention behind it is not entirely clear. The
  264. specification suggests that this function is necessary because of certain
  265. orderings of data transfer undos and the releasing of memory regions (i.e.,
  266. privatization). However, this ordering is never defined, nor is the ordering of
  267. dropping references w.r.t. other events.
  268. @subsection [New] Transactional indirect calls
  269. Indirect calls (i.e., calls through a function pointer) within transactions
  270. should execute the transactional clone of the original function (i.e., a clone
  271. of the original that has been fully instrumented to use the TM runtime), if
  272. such a clone is available. The runtime provides two functions to
  273. register/deregister clone tables:
  274. @example
  275. struct clone_entry
  276. @{
  277. void *orig, *clone;
  278. @};
  279. void _ITM_registerTMCloneTable (clone_entry *table, size_t entries);
  280. void _ITM_deregisterTMCloneTable (clone_entry *table);
  281. @end example
  282. Registered tables must be writable by the TM runtime, and must be live
  283. throughout the life-time of the TM runtime.
  284. @strong{TODO} The intention was always to drop the registration functions
  285. entirely, and create a new ELF Phdr describing the linker-sorted table. Much
  286. like what currently happens for @code{PT_GNU_EH_FRAME}.
  287. This work kept getting bogged down in how to represent the @var{N} different
  288. code generation variants. We clearly needed at least two---SW and HW
  289. transactional clones---but there was always a suggestion of more variants for
  290. different TM assumptions/invariants.
  291. The compiler can then use two TM runtime functions to perform indirect calls in
  292. transactions:
  293. @example
  294. void *_ITM_getTMCloneOrIrrevocable (void *function) ITM_REGPARM;
  295. void *_ITM_getTMCloneSafe (void *function) ITM_REGPARM;
  296. @end example
  297. If there is a registered clone for supplied function, both will return a
  298. pointer to the clone. If not, the first runtime function will attempt to switch
  299. to serial--irrevocable mode and return the original pointer, whereas the second
  300. will raise a fatal runtime error.
  301. @subsection [New] Transactional dynamic memory management
  302. @example
  303. void *_ITM_malloc (size_t)
  304. __attribute__((__malloc__)) ITM_PURE;
  305. void *_ITM_calloc (size_t, size_t)
  306. __attribute__((__malloc__)) ITM_PURE;
  307. void _ITM_free (void *) ITM_PURE;
  308. @end example
  309. These functions are essentially transactional wrappers for @code{malloc},
  310. @code{calloc}, and @code{free}. Within transactions, the compiler should
  311. replace calls to the original functions with calls to the wrapper functions.
  312. @section [No changes] Future Enhancements to the ABI
  313. @section Sample code
  314. The code examples might not be correct w.r.t. the current version of the ABI,
  315. especially everything related to exception handling.
  316. @section [New] Memory model
  317. The ABI should define a memory model and the ordering that is guaranteed for
  318. data transfers and commit/undo actions, or at least refer to another memory
  319. model that needs to be preserved. Without that, the compiler cannot ensure the
  320. memory model specified on the level of the programming language (e.g., by the
  321. C++ TM specification).
  322. For example, if a transactional load is ordered before another load/store, then
  323. the TM runtime must also ensure this ordering when accessing shared state. If
  324. not, this might break the kind of publication safety used in the C++ TM
  325. specification. Likewise, the TM runtime must ensure privatization safety.
  326. @c ---------------------------------------------------------------------
  327. @c Internals
  328. @c ---------------------------------------------------------------------
  329. @node Internals
  330. @chapter Internals
  331. @section TM methods and method groups
  332. libitm supports several ways of synchronizing transactions with each other.
  333. These TM methods (or TM algorithms) are implemented in the form of
  334. subclasses of @code{abi_dispatch}, which provide methods for
  335. transactional loads and stores as well as callbacks for rollback and commit.
  336. All methods that are compatible with each other (i.e., that let concurrently
  337. running transactions still synchronize correctly even if different methods
  338. are used) belong to the same TM method group. Pointers to TM methods can be
  339. obtained using the factory methods prefixed with @code{dispatch_} in
  340. @file{libitm_i.h}. There are two special methods, @code{dispatch_serial} and
  341. @code{dispatch_serialirr}, that are compatible with all methods because they
  342. run transactions completely in serial mode.
  343. @subsection TM method life cycle
  344. The state of TM methods does not change after construction, but they do alter
  345. the state of transactions that use this method. However, because
  346. per-transaction data gets used by several methods, @code{gtm_thread} is
  347. responsible for setting an initial state that is useful for all methods.
  348. After that, methods are responsible for resetting/clearing this state on each
  349. rollback or commit (of outermost transactions), so that the transaction
  350. executed next is not affected by the previous transaction.
  351. There is also global state associated with each method group, which is
  352. initialized and shut down (@code{method_group::init()} and @code{fini()})
  353. when switching between method groups (see @file{retry.cc}).
  354. @subsection Selecting the default method
  355. The default method that libitm uses for freshly started transactions (but
  356. not necessarily for restarted transactions) can be set via an environment
  357. variable (@env{ITM_DEFAULT_METHOD}), whose value should be equal to the name
  358. of one of the factory methods returning abi_dispatch subclasses but without
  359. the "dispatch_" prefix (e.g., "serialirr" instead of
  360. @code{GTM::dispatch_serialirr()}).
  361. Note that this environment variable is only a hint for libitm and might not
  362. be supported in the future.
  363. @section Nesting: flat vs. closed
  364. We support two different kinds of nesting of transactions. In the case of
  365. @emph{flat nesting}, the nesting structure is flattened and all nested
  366. transactions are subsumed by the enclosing transaction. In contrast,
  367. with @emph{closed nesting}, nested transactions that have not yet committed
  368. can be rolled back separately from the enclosing transactions; when they
  369. commit, they are subsumed by the enclosing transaction, and their effects
  370. will be finally committed when the outermost transaction commits.
  371. @emph{Open nesting} (where nested transactions can commit independently of the
  372. enclosing transactions) are not supported.
  373. Flat nesting is the default nesting mode, but closed nesting is supported and
  374. used when transactions contain user-controlled aborts
  375. (@code{__transaction_cancel} statements). We assume that user-controlled
  376. aborts are rare in typical code and used mostly in exceptional situations.
  377. Thus, it makes more sense to use flat nesting by default to avoid the
  378. performance overhead of the additional checkpoints required for closed
  379. nesting. User-controlled aborts will correctly abort the innermost enclosing
  380. transaction, whereas the whole (i.e., outermost) transaction will be restarted
  381. otherwise (e.g., when a transaction encounters data conflicts during
  382. optimistic execution).
  383. @section Locking conventions
  384. This section documents the locking scheme and rules for all uses of locking
  385. in libitm. We have to support serial(-irrevocable) mode, which is implemented
  386. using a global lock as explained next (called the @emph{serial lock}). To
  387. simplify the overall design, we use the same lock as catch-all locking
  388. mechanism for other infrequent tasks such as (de)registering clone tables or
  389. threads. Besides the serial lock, there are @emph{per-method-group locks} that
  390. are managed by specific method groups (i.e., groups of similar TM concurrency
  391. control algorithms), and lock-like constructs for quiescence-based operations
  392. such as ensuring privatization safety.
  393. Thus, the actions that participate in the libitm-internal locking are either
  394. @emph{active transactions} that do not run in serial mode, @emph{serial
  395. transactions} (which (are about to) run in serial mode), and management tasks
  396. that do not execute within a transaction but have acquired the serial mode
  397. like a serial transaction would do (e.g., to be able to register threads with
  398. libitm). Transactions become active as soon as they have successfully used the
  399. serial lock to announce this globally (@pxref{serial-lock-impl,,Serial lock
  400. implementation}). Likewise, transactions become serial transactions as soon as
  401. they have acquired the exclusive rights provided by the serial lock (i.e.,
  402. serial mode, which also means that there are no other concurrent active or
  403. serial transactions). Note that active transactions can become serial
  404. transactions when they enter serial mode during the runtime of the
  405. transaction.
  406. @subsection State-to-lock mapping
  407. Application data is protected by the serial lock if there is a serial
  408. transaction and no concurrently running active transaction (i.e., non-serial).
  409. Otherwise, application data is protected by the currently selected method
  410. group, which might use per-method-group locks or other mechanisms. Also note
  411. that application data that is about to be privatized might not be allowed to be
  412. accessed by nontransactional code until privatization safety has been ensured;
  413. the details of this are handled by the current method group.
  414. libitm-internal state is either protected by the serial lock or accessed
  415. through custom concurrent code. The latter applies to the public/shared part
  416. of a transaction object and most typical method-group-specific state.
  417. The former category (protected by the serial lock) includes:
  418. @itemize @bullet
  419. @item The list of active threads that have used transactions.
  420. @item The tables that map functions to their transactional clones.
  421. @item The current selection of which method group to use.
  422. @item Some method-group-specific data, or invariants of this data. For example,
  423. resetting a method group to its initial state is handled by switching to the
  424. same method group, so the serial lock protects such resetting as well.
  425. @end itemize
  426. In general, such state is immutable whenever there exists an active
  427. (non-serial) transaction. If there is no active transaction, a serial
  428. transaction (or a thread that is not currently executing a transaction but has
  429. acquired the serial lock) is allowed to modify this state (but must of course
  430. be careful to not surprise the current method group's implementation with such
  431. modifications).
  432. @subsection Lock acquisition order
  433. To prevent deadlocks, locks acquisition must happen in a globally agreed-upon
  434. order. Note that this applies to other forms of blocking too, but does not
  435. necessarily apply to lock acquisitions that do not block (e.g., trylock()
  436. calls that do not get retried forever). Note that serial transactions are
  437. never return back to active transactions until the transaction has committed.
  438. Likewise, active transactions stay active until they have committed.
  439. Per-method-group locks are typically also not released before commit.
  440. Lock acquisition / blocking rules:
  441. @itemize @bullet
  442. @item Transactions must become active or serial before they are allowed to
  443. use method-group-specific locks or blocking (i.e., the serial lock must be
  444. acquired before those other locks, either in serial or nonserial mode).
  445. @item Any number of threads that do not currently run active transactions can
  446. block while trying to get the serial lock in exclusive mode. Note that active
  447. transactions must not block when trying to upgrade to serial mode unless there
  448. is no other transaction that is trying that (the latter is ensured by the
  449. serial lock implementation.
  450. @item Method groups must prevent deadlocks on their locks. In particular, they
  451. must also be prepared for another active transaction that has acquired
  452. method-group-specific locks but is blocked during an attempt to upgrade to
  453. being a serial transaction. See below for details.
  454. @item Serial transactions can acquire method-group-specific locks because there
  455. will be no other active nor serial transaction.
  456. @end itemize
  457. There is no single rule for per-method-group blocking because this depends on
  458. when a TM method might acquire locks. If no active transaction can upgrade to
  459. being a serial transaction after it has acquired per-method-group locks (e.g.,
  460. when those locks are only acquired during an attempt to commit), then the TM
  461. method does not need to consider a potential deadlock due to serial mode.
  462. If there can be upgrades to serial mode after the acquisition of
  463. per-method-group locks, then TM methods need to avoid those deadlocks:
  464. @itemize @bullet
  465. @item When upgrading to a serial transaction, after acquiring exclusive rights
  466. to the serial lock but before waiting for concurrent active transactions to
  467. finish (@pxref{serial-lock-impl,,Serial lock implementation} for details),
  468. we have to wake up all active transactions waiting on the upgrader's
  469. per-method-group locks.
  470. @item Active transactions blocking on per-method-group locks need to check the
  471. serial lock and abort if there is a pending serial transaction.
  472. @item Lost wake-ups have to be prevented (e.g., by changing a bit in each
  473. per-method-group lock before doing the wake-up, and only blocking on this lock
  474. using a futex if this bit is not group).
  475. @end itemize
  476. @strong{TODO}: Can reuse serial lock for gl-*? And if we can, does it make
  477. sense to introduce further complexity in the serial lock? For gl-*, we can
  478. really only avoid an abort if we do -wb and -vbv.
  479. @subsection Serial lock implementation
  480. @anchor{serial-lock-impl}
  481. The serial lock implementation is optimized towards assuming that serial
  482. transactions are infrequent and not the common case. However, the performance
  483. of entering serial mode can matter because when only few transactions are run
  484. concurrently or if there are few threads, then it can be efficient to run
  485. transactions serially.
  486. The serial lock is similar to a multi-reader-single-writer lock in that there
  487. can be several active transactions but only one serial transaction. However,
  488. we do want to avoid contention (in the lock implementation) between active
  489. transactions, so we split up the reader side of the lock into per-transaction
  490. flags that are true iff the transaction is active. The exclusive writer side
  491. remains a shared single flag, which is acquired using a CAS, for example.
  492. On the fast-path, the serial lock then works similar to Dekker's algorithm but
  493. with several reader flags that a serial transaction would have to check.
  494. A serial transaction thus requires a list of all threads with potentially
  495. active transactions; we can use the serial lock itself to protect this list
  496. (i.e., only threads that have acquired the serial lock can modify this list).
  497. We want starvation-freedom for the serial lock to allow for using it to ensure
  498. progress for potentially starved transactions (@pxref{progress-guarantees,,
  499. Progress Guarantees} for details). However, this is currently not enforced by
  500. the implementation of the serial lock.
  501. Here is pseudo-code for the read/write fast paths of acquiring the serial
  502. lock (read-to-write upgrade is similar to write_lock:
  503. @example
  504. // read_lock:
  505. tx->shared_state |= active;
  506. __sync_synchronize(); // or STLD membar, or C++0x seq-cst fence
  507. while (!serial_lock.exclusive)
  508. if (spinning_for_too_long) goto slowpath;
  509. // write_lock:
  510. if (CAS(&serial_lock.exclusive, 0, this) != 0)
  511. goto slowpath; // writer-writer contention
  512. // need a membar here, but CAS already has full membar semantics
  513. bool need_blocking = false;
  514. for (t: all txns)
  515. @{
  516. for (;t->shared_state & active;)
  517. if (spinning_for_too_long) @{ need_blocking = true; break; @}
  518. @}
  519. if (need_blocking) goto slowpath;
  520. @end example
  521. Releasing a lock in this spin-lock version then just consists of resetting
  522. @code{tx->shared_state} to inactive or clearing @code{serial_lock.exclusive}.
  523. However, we can't rely on a pure spinlock because we need to get the OS
  524. involved at some time (e.g., when there are more threads than CPUs to run on).
  525. Therefore, the real implementation falls back to a blocking slow path, either
  526. based on pthread mutexes or Linux futexes.
  527. @subsection Reentrancy
  528. libitm has to consider the following cases of reentrancy:
  529. @itemize @bullet
  530. @item Transaction calls unsafe code that starts a new transaction: The outer
  531. transaction will become a serial transaction before executing unsafe code.
  532. Therefore, nesting within serial transactions must work, even if the nested
  533. transaction is called from within uninstrumented code.
  534. @item Transaction calls either a transactional wrapper or safe code, which in
  535. turn starts a new transaction: It is not yet defined in the specification
  536. whether this is allowed. Thus, it is undefined whether libitm supports this.
  537. @item Code that starts new transactions might be called from within any part
  538. of libitm: This kind of reentrancy would likely be rather complex and can
  539. probably be avoided. Therefore, it is not supported.
  540. @end itemize
  541. @subsection Privatization safety
  542. Privatization safety is ensured by libitm using a quiescence-based approach.
  543. Basically, a privatizing transaction waits until all concurrent active
  544. transactions will either have finished (are not active anymore) or operate on
  545. a sufficiently recent snapshot to not access the privatized data anymore. This
  546. happens after the privatizing transaction has stopped being an active
  547. transaction, so waiting for quiescence does not contribute to deadlocks.
  548. In method groups that need to ensure publication safety explicitly, active
  549. transactions maintain a flag or timestamp in the public/shared part of the
  550. transaction descriptor. Before blocking, privatizers need to let the other
  551. transactions know that they should wake up the privatizer.
  552. @strong{TODO} Ho to implement the waiters? Should those flags be
  553. per-transaction or at a central place? We want to avoid one wake/wait call
  554. per active transactions, so we might want to use either a tree or combining
  555. to reduce the syscall overhead, or rather spin for a long amount of time
  556. instead of doing blocking. Also, it would be good if only the last transaction
  557. that the privatizer waits for would do the wake-up.
  558. @subsection Progress guarantees
  559. @anchor{progress-guarantees}
  560. Transactions that do not make progress when using the current TM method will
  561. eventually try to execute in serial mode. Thus, the serial lock's progress
  562. guarantees determine the progress guarantees of the whole TM. Obviously, we at
  563. least need deadlock-freedom for the serial lock, but it would also be good to
  564. provide starvation-freedom (informally, all threads will finish executing a
  565. transaction eventually iff they get enough cycles).
  566. However, the scheduling of transactions (e.g., thread scheduling by the OS)
  567. also affects the handling of progress guarantees by the TM. First, the TM
  568. can only guarantee deadlock-freedom if threads do not get stopped. Likewise,
  569. low-priority threads can starve if they do not get scheduled when other
  570. high-priority threads get those cycles instead.
  571. If all threads get scheduled eventually, correct lock implementations will
  572. provide deadlock-freedom, but might not provide starvation-freedom. We can
  573. either enforce the latter in the TM's lock implementation, or assume that
  574. the scheduling is sufficiently random to yield a probabilistic guarantee that
  575. no thread will starve (because eventually, a transaction will encounter a
  576. scheduling that will allow it to run). This can indeed work well in practice
  577. but is not necessarily guaranteed to work (e.g., simple spin locks can be
  578. pretty efficient).
  579. Because enforcing stronger progress guarantees in the TM has a higher runtime
  580. overhead, we focus on deadlock-freedom right now and assume that the threads
  581. will get scheduled eventually by the OS (but don't consider threads with
  582. different priorities). We should support starvation-freedom for serial
  583. transactions in the future. Everything beyond that is highly related to proper
  584. contention management across all of the TM (including with TM method to
  585. choose), and is future work.
  586. @strong{TODO} Handling thread priorities: We want to avoid priority inversion
  587. but it's unclear how often that actually matters in practice. Workloads that
  588. have threads with different priorities will likely also require lower latency
  589. or higher throughput for high-priority threads. Therefore, it probably makes
  590. not that much sense (except for eventual progress guarantees) to use
  591. priority inheritance until the TM has priority-aware contention management.
  592. @c ---------------------------------------------------------------------
  593. @c GNU Free Documentation License
  594. @c ---------------------------------------------------------------------
  595. @include fdl.texi
  596. @c ---------------------------------------------------------------------
  597. @c Index
  598. @c ---------------------------------------------------------------------
  599. @node Library Index
  600. @unnumbered Library Index
  601. @printindex cp
  602. @bye