Some information about database access in MediaWiki.
By Tim Starling, January 2006.

------------------------------------------------------------------------
Database layout
------------------------------------------------------------------------

For information about the MediaWiki database layout, such as a
description of the tables and their contents, please see:
  https://www.mediawiki.org/wiki/Manual:Database_layout
  https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql

------------------------------------------------------------------------
API
------------------------------------------------------------------------

To make a read query, something like this usually suffices:

    $dbr = wfGetDB( DB_REPLICA );
    $res = $dbr->select( /* ...see docs... */ );
    foreach ( $res as $row ) {
        ...
    }

For a write query, use something like:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert( /* ...see docs... */ );

We use the convention $dbr for read and $dbw for write to help you keep
track of whether the database object is a slave (read-only) or a master
(read/write). If you write to a slave, the world will explode. Or to be
precise, a subsequent write query which succeeded on the master may fail
when replicated to the slave due to a unique key collision. Replication
on the slave will stop and it may take hours to repair the database and
get it back online. Setting read_only in my.cnf on the slave will avoid
this scenario, but given the dire consequences, we prefer to have as
many checks as possible.

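As a slightly more concrete sketch of the read and write patterns above, the
helpers below take the database handle as a parameter, so the $dbr/$dbw
convention stays visible at the call site. The function names and the choice
of tables and columns are illustrative assumptions, not code from core.

```php
<?php
/**
 * Illustrative helper (not part of core): list page titles in a namespace.
 * $dbr is expected to be a replica handle, e.g. wfGetDB( DB_REPLICA ).
 */
function exampleGetTitlesInNamespace( $dbr, $namespace, $limit = 50 ) {
    $res = $dbr->select(
        'page',                                // table (prefix is added for you)
        [ 'page_id', 'page_title' ],           // fields
        [ 'page_namespace' => $namespace ],    // WHERE conditions, escaped for you
        __FUNCTION__,                          // caller name
        [ 'ORDER BY' => 'page_title', 'LIMIT' => $limit ]
    );
    $titles = [];
    foreach ( $res as $row ) {
        $titles[(int)$row->page_id] = $row->page_title;
    }
    return $titles;
}

/**
 * Illustrative helper: add one watchlist row.
 * $dbw must be a master handle, e.g. wfGetDB( DB_MASTER ).
 */
function exampleWatchPage( $dbw, $userId, $namespace, $dbKey ) {
    $dbw->insert(
        'watchlist',
        [ 'wl_user' => $userId, 'wl_namespace' => $namespace, 'wl_title' => $dbKey ],
        __FUNCTION__
    );
}
```

A caller would pass wfGetDB( DB_REPLICA ) to the first helper and
wfGetDB( DB_MASTER ) to the second.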
We provide a query() function for raw SQL, but the wrapper functions
like select() and insert() are usually more convenient. They take care
of things like table prefixes and escaping for you. If you really need
to make your own SQL, please read the documentation for tableName() and
addQuotes(). You will need both of them.

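For the rare case where raw SQL is unavoidable, the following sketch shows
how tableName() and addQuotes() fit together with query(). The function name
and the particular lookup are assumptions made for illustration; the wrapper
functions remain the preferred route.

```php
<?php
/**
 * Illustrative only: build a raw query by hand.
 * tableName() applies the table prefix and identifier quoting;
 * addQuotes() escapes a value and wraps it in quotes.
 */
function exampleRawTitleLookup( $dbr, $namespace, $dbKey ) {
    $page = $dbr->tableName( 'page' );
    $sql = "SELECT page_id FROM $page" .
        ' WHERE page_namespace = ' . (int)$namespace .
        ' AND page_title = ' . $dbr->addQuotes( $dbKey );
    return $dbr->query( $sql, __FUNCTION__ );
}
```

Every table name goes through tableName() and every string value through
addQuotes(); the integer is cast rather than quoted.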
------------------------------------------------------------------------
Basic query optimisation
------------------------------------------------------------------------

MediaWiki developers who need to write DB queries should have some
understanding of databases and the performance issues associated with
them. Patches containing unacceptably slow features will not be
accepted. Unindexed queries are generally not welcome in MediaWiki,
except in special pages derived from QueryPage. It's a common pitfall
for new developers to submit code containing SQL queries which examine
huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in a
table is like counting beans in a bucket.

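One way to sidestep an unbounded COUNT(*) is to ask a cheaper question, such
as "are there at least N rows?", and cap the work with a LIMIT. The sketch
below assumes that a threshold test is all the caller needs; the function
name is hypothetical.

```php
<?php
/**
 * Illustrative helper: does this user watch at least $threshold titles?
 * The LIMIT bounds how many rows the server has to examine, unlike COUNT(*).
 */
function exampleWatchesAtLeast( $dbr, $userId, $threshold ) {
    $res = $dbr->select(
        'watchlist',
        '1',                          // the row contents are irrelevant here
        [ 'wl_user' => $userId ],     // served by an index beginning with wl_user
        __FUNCTION__,
        [ 'LIMIT' => $threshold ]
    );
    return $res->numRows() >= $threshold;
}
```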
------------------------------------------------------------------------
Replication
------------------------------------------------------------------------

The largest installation of MediaWiki, Wikimedia, uses a large set of
slave MySQL servers replicating writes made to a master MySQL server. It
is important to understand the issues associated with this setup if you
want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task
depends on whether or not replication is in use. Due to our unabashed
Wikipedia-centrism, we often just use the replication-friendly version,
but if you like, you can use LoadBalancer::getServerCount() > 1 to
check to see if replication is in use.

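A minimal sketch of that check follows. It assumes the caller already has a
LoadBalancer object in hand (for example from wfGetLB()); the function name
is hypothetical.

```php
<?php
/**
 * Illustrative helper: is more than one database server configured?
 * $lb is expected to be a LoadBalancer instance.
 */
function exampleUsesReplication( $lb ) {
    return $lb->getServerCount() > 1;
}
```

Code that has a simpler single-server algorithm available can branch on the
result and fall back to the replication-friendly version otherwise.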
=== Lag ===

Lag primarily occurs when large write queries are sent to the master.
Writes on the master are executed in parallel, but they are executed in
serial when they are replicated to the slaves. The master writes the
query to the binlog when the transaction is committed. The slaves poll
the binlog and start executing the query as soon as it appears. They can
service reads while they are performing a write query, but will not read
anything more from the binlog and thus will perform no more writes. This
means that if the write query runs for a long time, the slaves will lag
behind the master for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will
stop sending reads to a slave when it is lagged by more than 30 seconds.
If the load ratios are set incorrectly, or if there is too much load
generally, this may lead to a slave permanently hovering around 30
seconds lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop
writing to the database. All edits and other write operations will be
refused, with an error returned to the user. This gives the slaves a
chance to catch up. Before we had this mechanism, the slaves would
regularly lag by several minutes, making review of recent edits
difficult.

In addition to this, MediaWiki attempts to ensure that the user sees
events occurring on the wiki in chronological order. A few seconds of lag
can be tolerated, as long as the user sees a consistent picture from
subsequent requests. This is done by saving the master binlog position
in the session, and then at the start of each request, waiting for the
slave to catch up to that position before doing any reads from it. If
this wait times out, reads are allowed anyway, but the request is
considered to be in "lagged slave mode". Lagged slave mode can be
checked by calling LoadBalancer::getLaggedReplicaMode(). The only
practical consequence at present is a warning displayed in the page
footer.

=== Lag avoidance ===

To avoid excessive lag, queries which write large numbers of rows should
be split up, generally to write one row at a time. Multi-row INSERT ...
SELECT queries are the worst offenders and should be avoided altogether.
Instead do the select first and then the insert.

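The advice above might look like the following sketch: the SELECT runs on its
own, and each row is then written with a short INSERT. The target table
example_page_copy, its epc_ columns, the function name and the
$waitForReplicas callback are all hypothetical; pausing every $batchSize rows
is an extra precaution assumed here, not something this document prescribes.

```php
<?php
/**
 * Illustrative helper: copy rows without a multi-row INSERT ... SELECT.
 * $waitForReplicas is a caller-supplied callback that blocks until the
 * slaves have caught up (hypothetical; supply whatever your setup uses).
 */
function exampleCopyPages( $dbr, $dbw, $namespace, callable $waitForReplicas, $batchSize = 100 ) {
    // Step 1: do the select first, as a separate read query.
    $res = $dbr->select(
        'page',
        [ 'page_id', 'page_title' ],
        [ 'page_namespace' => $namespace ],
        __FUNCTION__
    );

    $written = 0;
    foreach ( $res as $row ) {
        // Step 2: then the insert, one row at a time, so each replicated
        // write is short.
        $dbw->insert(
            'example_page_copy',
            [ 'epc_page' => $row->page_id, 'epc_title' => $row->page_title ],
            __FUNCTION__
        );
        $written++;
        if ( $written % $batchSize === 0 ) {
            $waitForReplicas();
        }
    }
    return $written;
}
```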
=== Working with lag ===

Despite our best efforts, it's not practical to guarantee a low-lag
environment. Lag will usually be less than one second, but may
occasionally be up to 30 seconds. For scalability, it's very important
to keep load on the master low, so simply sending all your queries to
the master is not the answer. So when you have a genuine need for
up-to-date data, the following approach is advised:

1) Do a quick query to the master for a sequence number or timestamp
2) Run the full query on the slave and check if it matches the data you got
   from the master
3) If it doesn't, run the full query on the master

To avoid swamping the master every time the slaves lag, use of this
approach should be kept to a minimum. In most cases you should just read
from the slave and let the user deal with the delay.

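The three steps above can be sketched as follows, using page_latest as the
sequence number. The function name and the field list are assumptions; in
real code the "full query" would normally be much heavier than the
single-field check, which is what makes the detour worthwhile.

```php
<?php
/**
 * Illustrative helper: fetch an up-to-date page row while keeping master
 * load low. Returns a row object, or false if the page does not exist.
 */
function exampleGetFreshPageRow( $dbr, $dbw, $pageId ) {
    $fields = [ 'page_id', 'page_title', 'page_latest', 'page_len' ];
    $conds = [ 'page_id' => $pageId ];

    // 1) Quick query to the master for a sequence number.
    $latest = $dbw->selectField( 'page', 'page_latest', $conds, __FUNCTION__ );
    if ( $latest === false ) {
        return false;
    }

    // 2) Full query on the slave; accept it if it matches the master.
    $row = $dbr->selectRow( 'page', $fields, $conds, __FUNCTION__ );
    if ( $row && (int)$row->page_latest === (int)$latest ) {
        return $row;
    }

    // 3) The slave is behind: run the full query on the master.
    return $dbw->selectRow( 'page', $fields, $conds, __FUNCTION__ );
}
```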
------------------------------------------------------------------------
Lock contention
------------------------------------------------------------------------

Due to the high write rate on Wikipedia (and some other wikis),
MediaWiki developers need to be very careful to structure their writes
to avoid long-lasting locks. By default, MediaWiki opens a transaction
at the first query, and commits it before the output is sent. Locks will
be held from the time when the query is done until the commit. So you
can reduce lock time by doing as much processing as possible before you
do your write queries.

Often this approach is not good enough, and it becomes necessary to
enclose small groups of queries in their own transaction. Use the
following syntax:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->begin( __METHOD__ );
    /* Do queries */
    $dbw->commit( __METHOD__ );

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
are poorly implemented in InnoDB and will cause regular deadlock errors.
It's also surprisingly easy to cripple the wiki with lock contention.

Instead of locking reads, combine your existence checks into your write
queries, by using an appropriate condition in the WHERE clause of an
UPDATE, or by using unique indexes in combination with INSERT IGNORE.
Then use the affected row count to see if the query succeeded.

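Both variants are sketched below. The example_job table, its ej_ columns and
the function names are hypothetical; the second helper relies on the unique
index covering wl_user, wl_namespace and wl_title on the watchlist table.

```php
<?php
/**
 * Illustrative helper: claim a job only if nobody else has claimed it.
 * The "is it still unclaimed?" check lives in the WHERE clause of the UPDATE.
 */
function exampleClaimJob( $dbw, $jobId, $token ) {
    $dbw->update(
        'example_job',
        [ 'ej_token' => $token ],                 // SET
        [ 'ej_id' => $jobId, 'ej_token' => '' ],  // WHERE: only if unclaimed
        __FUNCTION__
    );
    // Zero affected rows means another request got there first.
    return $dbw->affectedRows() > 0;
}

/**
 * Illustrative helper: insert a watchlist row unless it already exists.
 * The unique index plus INSERT IGNORE replaces a SELECT ... FOR UPDATE.
 */
function exampleWatchOnce( $dbw, $userId, $namespace, $dbKey ) {
    $dbw->insert(
        'watchlist',
        [ 'wl_user' => $userId, 'wl_namespace' => $namespace, 'wl_title' => $dbKey ],
        __FUNCTION__,
        [ 'IGNORE' ]
    );
    return $dbw->affectedRows() > 0;
}
```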
------------------------------------------------------------------------
Supported DBMSs
------------------------------------------------------------------------

MediaWiki is written primarily for use with MySQL. Queries are optimized
for it and its schema is considered the canonical version. However,
MediaWiki does support the following other DBMSs to varying degrees.

* PostgreSQL
* SQLite

More information can be found about each of these databases (known issues,
level of support, extra configuration) in the "databases" subdirectory in
this folder.

------------------------------------------------------------------------
Use of GROUP BY
------------------------------------------------------------------------

MySQL supports GROUP BY without checking anything in the SELECT clause.
Other DBMSs (especially Postgres) are stricter and require that all the
non-aggregate items in the SELECT clause appear in the GROUP BY. For
this reason, using SELECT * with GROUP BY queries is strongly
discouraged.
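
A portable GROUP BY query names its columns explicitly, and every
non-aggregate column in the field list also appears in the GROUP BY. The
sketch below is one possible shape; the function name and the "titles" alias
are assumptions.

```php
<?php
/**
 * Illustrative helper: count a user's watched titles per namespace.
 * wl_namespace is the only non-aggregate field, and it is in the GROUP BY.
 */
function exampleWatchedPerNamespace( $dbr, $userId ) {
    $res = $dbr->select(
        'watchlist',
        [ 'wl_namespace', 'titles' => 'COUNT(*)' ],  // alias => expression
        [ 'wl_user' => $userId ],
        __FUNCTION__,
        [ 'GROUP BY' => 'wl_namespace' ]
    );
    $counts = [];
    foreach ( $res as $row ) {
        $counts[(int)$row->wl_namespace] = (int)$row->titles;
    }
    return $counts;
}
```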