This document provides an overview of the usage of PageUpdater and DerivedPageDataUpdater.

== PageUpdater ==
PageUpdater is the canonical way to create page revisions, that is, to perform edits.

PageUpdater is a stateful, handle-like object that allows new revisions to be created
on a given wiki page using the saveRevision() method. PageUpdater provides setters for
defining the new revision's content as well as metadata such as change tags. saveRevision()
stores the new revision's primary content and metadata, and triggers the necessary
updates to derived secondary data and cached artifacts, e.g. in the ParserCache and the
CDN layer, using a DerivedPageDataUpdater.

PageUpdater instances follow the life cycle below, defined by a number of
methods:

                         +----------------------------+
                         |                            |
                         |            new             |
                         |                            |
                         +------|--------------|------+
                                |              |
           grabParentRevision()-|              |
           or hasEditConflict()-|              |
                                |              |
                       +--------v-------+      |
                       |                |      |
                       |  parent known  |      |
                       |                |      |
 Enables---------------+--------|-------+      |
   safe operations based on     |              |-saveRevision()
   the parent revision, e.g.    |              |
   section replacement or       |              |
   edit conflict resolution.    |              |
                                |              |
                 saveRevision()-|              |
                                |              |
                         +------v--------------v------+
                         |                            |
                         |     creation committed     |
                         |                            |
 Enables-----------------+----------------------------+
   wasSuccess()
   isUnchanged()
   isNew()
   getState()
   getNewRevision()
   etc.

The stateful nature of PageUpdater allows it to be used to safely perform
transformations that depend on the new revision's parent revision, such as replacing
sections or applying 3-way conflict resolution, while protecting against race
conditions using a compare-and-swap (CAS) mechanism: once calling code has used the
grabParentRevision() method to access the edit's logical parent, PageUpdater
remembers that revision and ensures that it is still the page's current
revision when performing the atomic database update for the revision's primary
metadata when saveRevision() is called. If another revision was created concurrently,
saveRevision() will fail, indicating the problem with the "edit-conflict" code in the status
object.

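The compare-and-swap idea can be illustrated with a minimal, self-contained sketch. ToyPage
and ToyUpdater are hypothetical stand-ins invented for this example and are not MediaWiki
classes; the 'ok' return value is likewise an assumption of the sketch. The real PageUpdater
performs the check as part of an atomic database update, whereas this toy runs in a single
process and only models the logic of remembering the parent and comparing it on save.

```php
<?php
/** Hypothetical in-memory stand-in for a wiki page; not a MediaWiki class. */
class ToyPage {
	/** @var int ID of the page's current revision */
	public int $currentRevId = 1;
}

/** Hypothetical stand-in that models only the parent check described above. */
class ToyUpdater {
	private ToyPage $page;
	private ?int $parentRevId = null;

	public function __construct( ToyPage $page ) {
		$this->page = $page;
	}

	/** Remember the page's current revision as the edit's logical parent. */
	public function grabParentRevision(): int {
		if ( $this->parentRevId === null ) {
			$this->parentRevId = $this->page->currentRevId;
		}
		// Later calls keep returning the revision remembered first.
		return $this->parentRevId;
	}

	/** Returns 'ok' on success, or 'edit-conflict' if the parent is stale. */
	public function saveRevision(): string {
		$expected = $this->grabParentRevision();
		// Compare: the remembered parent must still be the current revision.
		if ( $this->page->currentRevId !== $expected ) {
			return 'edit-conflict';
		}
		// Swap: the new revision becomes the page's current revision.
		$this->page->currentRevId++;
		return 'ok';
	}
}

// Two updaters grab the same parent; only the first save can succeed.
$page = new ToyPage();
$first = new ToyUpdater( $page );
$second = new ToyUpdater( $page );
$first->grabParentRevision();
$second->grabParentRevision();
$firstResult = $first->saveRevision();
$secondResult = $second->saveRevision();
```

In this sketch the second save is rejected because the page moved on after its parent was
grabbed, which is the situation the "edit-conflict" code reports.
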
Typical usage for programmatic revision creation (with $page being a WikiPage as of 1.32, to be
replaced by a repository service later):

  $updater = $page->newPageUpdater( $user );
  $updater->setContent( SlotRecord::MAIN, $content );
  $updater->setRcPatrolStatus( RecentChange::PRC_PATROLLED );
  $newRev = $updater->saveRevision( $comment );

Usage with content depending on the parent revision:

  $updater = $page->newPageUpdater( $user );
  $parent = $updater->grabParentRevision();
  $content = $parent->getContent( SlotRecord::MAIN )->replaceSection( $section, $sectionContent );
  $updater->setContent( SlotRecord::MAIN, $content );
  $newRev = $updater->saveRevision( $comment, EDIT_UPDATE );

In both cases, all secondary updates will be triggered automatically.

== DerivedPageDataUpdater ==
DerivedPageDataUpdater is a stateful, handle-like object that caches derived data representing
a revision, and can trigger updates of cached copies of that data, e.g. in the links tables,
page_props, the ParserCache, and the CDN layer.

DerivedPageDataUpdater is used by PageUpdater when creating new revisions, but can also
be used independently when performing metadata updates during undeletion, import, or
when purging a page. It's a stepping stone on the way to a more complete refactoring of WikiPage.

NOTE: Avoid direct usage of DerivedPageDataUpdater. In the future, we want to define interfaces
for the different use cases of DerivedPageDataUpdater, particularly providing access to post-PST
content and ParserOutput to callbacks during revision creation, which currently use
WikiPage::prepareContentForEdit(), and allowing updates to be triggered on purge, import, and
undeletion, which currently use WikiPage::doEditUpdates() and Content::getSecondaryDataUpdates().

The primary reason for DerivedPageDataUpdater to be stateful is internal caching of state
that avoids the re-generation of ParserOutput and re-application of pre-save
transformations (PST).

DerivedPageDataUpdater instances follow the life cycle below, defined by a number of
methods:

                       +---------------------------------------------------------------------+
                       |                                                                     |
                       |                                 new                                 |
                       |                                                                     |
                       +---------------|------------------|------------------|---------------+
                                       |                  |                  |
                 grabCurrentRevision()-|                  |                  |
                                       |                  |                  |
                           +-----------v----------+       |                  |
                           |                      |       |-prepareContent() |
                           |    knows current     |       |                  |
                           |                      |       |                  |
 Enables-------------------+-----|-----|----------+       |                  |
   pageExisted()                 |     |                  |                  |
   wasRedirect()                 |     |-prepareContent() |                  |-prepareUpdate()
                                 |     |                  |                  |
                                 |     |    +-------------v------------+     |
                                 |     |    |                          |     |
                                 |     +---->       has content        |     |
                                 |          |                          |     |
 Enables-------------------------|----------+--------------------------+     |
   isChange()                    |                              |            |
   isCreation()                  |-prepareUpdate()              |            |
   getSlots()                    |              prepareUpdate()-|            |
   getTouchedSlotRoles()         |                              |            |
   getCanonicalParserOutput()    |                  +-----------v------------v-----------------+
                                 |                  |                                          |
                                 +------------------>               has revision               |
                                                    |                                          |
 Enables--------------------------------------------+------------------------|-----------------+
   updateParserCache()                                                       |
   runSecondaryDataUpdates()                                                 |-doUpdates()
                                                                             |
                                                                 +-----------v---------+
                                                                 |                     |
                                                                 |    updates done     |
                                                                 |                     |
                                                                 +---------------------+

- grabCurrentRevision() returns the logical parent revision of the target revision. It is
  guaranteed to always return the same revision for a given DerivedPageDataUpdater instance.
  If called before prepareUpdate(), this fixes the logical parent to be the page's current
  revision. If called for the first time after prepareUpdate(), it returns the revision
  passed as the 'oldrevision' option to prepareUpdate(), or, if that wasn't given, the
  parent of the $revision parameter passed to prepareUpdate().

- prepareContent() is called before the new revision is created, to apply the pre-save
  transformation (PST) and allow subsequent access to the canonical ParserOutput of the
  revision. getSlots() and getCanonicalParserOutput() as well as getSecondaryDataUpdates()
  may be used after prepareContent() was called. Calling prepareContent() with the same
  parameters again has no effect. Calling it again with mismatching parameters, or calling
  it after prepareUpdate() was called, triggers a LogicException.

- prepareUpdate() is called after the new revision has been created. This may happen
  right after the revision was created, on the same instance on which prepareContent() was
  called, or later (possibly much later), on a fresh instance in a different process,
  due to deferred or asynchronous updates, or during import, undeletion, purging, etc.
  prepareUpdate() is required before a call to doUpdates(), and it also enables calls to
  getSlots() and getCanonicalParserOutput() as well as getSecondaryDataUpdates().
  Calling prepareUpdate() with the same parameters again has no effect.
  Calling it again with mismatching parameters, or calling it with parameters mismatching
  the ones prepareContent() was called with, triggers a LogicException.

- getSecondaryDataUpdates() returns DataUpdates that represent derived data for the revision.
  These may be used to update such data, e.g. in ApiPurge, RefreshLinksJob, and the refreshLinks
  script.

- doUpdates() triggers the updates defined by getSecondaryDataUpdates(), and also causes
  updates to cached artifacts in the ParserCache, the CDN layer, etc. This is primarily
  used by PageUpdater, but also by PageArchive during undeletion, and when importing
  revisions from XML. doUpdates() can only be called after prepareUpdate() was used to
  initialize the DerivedPageDataUpdater instance for a specific revision. Calling it before
  prepareUpdate() is called raises a LogicException.

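The stage rules listed above can be sketched as a small state machine. ToyDerivedDataUpdater
is a hypothetical model written for this document, not the real class: it reduces the
parameters of prepareContent() and prepareUpdate() to a content string and a revision ID,
and the stage names are taken from the diagram. That a second doUpdates() call is rejected
is an assumption of the sketch, not something stated in the text.

```php
<?php
/** Hypothetical toy model of the life cycle; not the real DerivedPageDataUpdater. */
class ToyDerivedDataUpdater {
	private string $stage = 'new';
	private ?string $content = null;
	private ?int $revId = null;

	public function getStage(): string {
		return $this->stage;
	}

	public function grabCurrentRevision(): void {
		// Only a fresh instance moves to "knows current"; later stages are kept.
		if ( $this->stage === 'new' ) {
			$this->stage = 'knows-current';
		}
	}

	public function prepareContent( string $content ): void {
		if ( $this->revId !== null ) {
			throw new LogicException( 'prepareContent() called after prepareUpdate()' );
		}
		if ( $this->content !== null && $this->content !== $content ) {
			throw new LogicException( 'prepareContent() called with mismatching content' );
		}
		// Same content again: nothing changes.
		$this->content = $content;
		$this->stage = 'has-content';
	}

	public function prepareUpdate( int $revId, string $content ): void {
		if ( $this->content !== null && $this->content !== $content ) {
			throw new LogicException( 'prepareUpdate() does not match prepared content' );
		}
		if ( $this->revId !== null ) {
			if ( $this->revId !== $revId ) {
				throw new LogicException( 'prepareUpdate() called for another revision' );
			}
			return; // Same parameters again: no effect.
		}
		$this->content = $content;
		$this->revId = $revId;
		$this->stage = 'has-revision';
	}

	public function doUpdates(): void {
		if ( $this->stage !== 'has-revision' ) {
			throw new LogicException( 'doUpdates() is only allowed in the has-revision stage' );
		}
		$this->stage = 'updates-done';
	}
}
```

A typical pass through this model is grabCurrentRevision(), prepareContent(), prepareUpdate(),
doUpdates(); calling doUpdates() on a fresh instance throws a LogicException.
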
A DerivedPageDataUpdater instance is intended to be re-used during different stages
of complex update operations that often involve callbacks to extension code via
MediaWiki's hook mechanism, or deferred or even asynchronous execution of Jobs and
DeferredUpdates. Since these mechanisms typically do not provide a way to pass a
DerivedPageDataUpdater directly, WikiPage::getDerivedPageDataUpdater() has to be used to
obtain a DerivedPageDataUpdater for the update currently in progress. Re-using the
same DerivedPageDataUpdater if possible avoids re-generation of ParserOutput objects
and other expensively derived artifacts.

This mechanism for re-using a DerivedPageDataUpdater instance without passing it directly
requires a way to ensure that a given DerivedPageDataUpdater instance can actually be used
in the calling code's context. For this purpose, WikiPage::getDerivedPageDataUpdater()
calls the isReusableFor() method on DerivedPageDataUpdater, which checks whether the given
instance is applicable to the given parameters. In other words, isReusableFor() predicts
whether calling prepareContent() or prepareUpdate() with a given set of parameters will
trigger a LogicException. If it would, WikiPage::getDerivedPageDataUpdater() creates a
fresh DerivedPageDataUpdater instance.
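
The re-use check can be sketched as follows. ToyUpdaterHandle and ToyPageHandle are
hypothetical classes made up for this example, and the parameters are reduced to a single
revision ID, which is an assumption; the real isReusableFor() compares more than that. The
sketch only shows the pattern: keep one updater per page, ask it whether it fits the
caller's parameters, and replace it with a fresh instance when it does not.

```php
<?php
/** Hypothetical, simplified updater that is bound to one revision ID. */
class ToyUpdaterHandle {
	private ?int $revId = null;

	/** Predicts whether prepareUpdate( $revId ) would be accepted. */
	public function isReusableFor( ?int $revId ): bool {
		return $this->revId === null || $revId === null || $this->revId === $revId;
	}

	public function prepareUpdate( int $revId ): void {
		if ( !$this->isReusableFor( $revId ) ) {
			throw new LogicException( 'Updater was prepared for another revision' );
		}
		$this->revId = $revId;
	}
}

/** Hypothetical stand-in for the page object that hands out updaters. */
class ToyPageHandle {
	private ?ToyUpdaterHandle $updater = null;

	public function getDerivedPageDataUpdater( ?int $revId = null ): ToyUpdaterHandle {
		// Drop the cached instance if it cannot serve the requested revision.
		if ( $this->updater !== null && !$this->updater->isReusableFor( $revId ) ) {
			$this->updater = null;
		}
		if ( $this->updater === null ) {
			$this->updater = new ToyUpdaterHandle();
		}
		return $this->updater;
	}
}
```

With this model, asking twice for the same revision returns the same object, so anything it
has cached is kept, while asking for a different revision yields a new instance instead of
an exception.
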