Some information about database access in MediaWiki.
By Tim Starling, January 2006.

------------------------------------------------------------------------
Database layout
------------------------------------------------------------------------

For information about the MediaWiki database layout, such as a
description of the tables and their contents, please see:

    https://www.mediawiki.org/wiki/Manual:Database_layout
    https://phabricator.wikimedia.org/diffusion/MW/browse/master/maintenance/tables.sql

------------------------------------------------------------------------
API
------------------------------------------------------------------------

To make a read query, something like this usually suffices:

    $dbr = wfGetDB( DB_REPLICA );
    $res = $dbr->select( /* ...see docs... */ );
    foreach ( $res as $row ) {
        ...
    }
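
For illustration, here is a sketch of a fully spelled-out select() call
against the core page table; the particular fields, conditions and
options are only an example, not a required pattern, and
doSomethingWith() is a hypothetical callback:

    $dbr = wfGetDB( DB_REPLICA );
    $res = $dbr->select(
        'page',                                  // table, without prefix
        [ 'page_id', 'page_title' ],             // fields to fetch
        [ 'page_namespace' => 0 ],               // WHERE conditions
        __METHOD__,                              // caller name, for profiling
        [ 'ORDER BY' => 'page_title', 'LIMIT' => 10 ]
    );
    foreach ( $res as $row ) {
        // Each row is an object with one property per selected field
        doSomethingWith( $row->page_id, $row->page_title );
    }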

For a write query, use something like:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert( /* ...see docs... */ );
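
As a sketch, insert() takes the table name, an associative array of
column values, and the calling method name. The table and columns below
are hypothetical ones belonging to an extension:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert(
        'myext_log',                             // hypothetical table name
        [
            'ml_page' => $pageId,                // hypothetical columns
            'ml_timestamp' => $dbw->timestamp(), // timestamp in the DB's native format
        ],
        __METHOD__
    );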

We use the convention $dbr for read and $dbw for write to help you keep
track of whether the database object is a slave (read-only) or a master
(read/write). If you write to a slave, the world will explode. Or to be
precise, a subsequent write query which succeeded on the master may fail
when replicated to the slave due to a unique key collision. Replication
on the slave will stop and it may take hours to repair the database and
get it back online. Setting read_only in my.cnf on the slave will avoid
this scenario, but given the dire consequences, we prefer to have as
many checks as possible.

We provide a query() function for raw SQL, but the wrapper functions
like select() and insert() are usually more convenient. They take care
of things like table prefixes and escaping for you. If you really need
to make your own SQL, please read the documentation for tableName() and
addQuotes(). You will need both of them.
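
To illustrate, a raw query might be assembled along these lines; the
query itself is only an example, but note how tableName() applies any
configured table prefix and addQuotes() escapes and quotes the value:

    $dbr = wfGetDB( DB_REPLICA );
    $table = $dbr->tableName( 'page' );          // adds the wiki's table prefix, if any
    $title = $dbr->addQuotes( $titleText );      // escapes and quotes the value
    $res = $dbr->query(
        "SELECT page_id FROM $table WHERE page_title = $title",
        __METHOD__
    );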

------------------------------------------------------------------------
Basic query optimisation
------------------------------------------------------------------------

MediaWiki developers who need to write DB queries should have some
understanding of databases and the performance issues associated with
them. Patches containing unacceptably slow features will not be
accepted. Unindexed queries are generally not welcome in MediaWiki,
except in special pages derived from QueryPage. It's a common pitfall
for new developers to submit code containing SQL queries which examine
huge numbers of rows. Remember that COUNT(*) is O(N): counting rows in
a table is like counting beans in a bucket.
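
As a sketch of the kind of substitution this implies, when all you
really need to know is whether any matching row exists, an indexed
lookup bounded to a single row is far cheaper than counting everything
(the table and conditions here are just an example):

    $dbr = wfGetDB( DB_REPLICA );
    // Fetch at most one row via the index instead of scanning and counting
    $exists = (bool)$dbr->selectField(
        'pagelinks',
        '1',
        [ 'pl_namespace' => 0, 'pl_title' => 'Example' ],
        __METHOD__,
        [ 'LIMIT' => 1 ]
    );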

------------------------------------------------------------------------
Replication
------------------------------------------------------------------------

The largest installation of MediaWiki, Wikimedia, uses a large set of
slave MySQL servers replicating writes made to a master MySQL server. It
is important to understand the issues associated with this setup if you
want to write code destined for Wikipedia.

It's often the case that the best algorithm to use for a given task
depends on whether or not replication is in use. Due to our unabashed
Wikipedia-centrism, we often just use the replication-friendly version,
but if you like, you can use LoadBalancer::getServerCount() > 1 to
check whether replication is in use.
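
A minimal sketch of that check, assuming the load balancer is obtained
with wfGetLB() as in core code of this era:

    $lb = wfGetLB();
    if ( $lb->getServerCount() > 1 ) {
        // Replication is in use: prefer the replication-friendly algorithm
    } else {
        // Single server: a simpler or more direct approach may be acceptable
    }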

=== Lag ===

Lag primarily occurs when large write queries are sent to the master.
Writes on the master are executed in parallel, but they are executed in
serial when they are replicated to the slaves. The master writes the
query to the binlog when the transaction is committed. The slaves poll
the binlog and start executing the query as soon as it appears. They can
service reads while they are performing a write query, but will not read
anything more from the binlog and thus will perform no more writes. This
means that if the write query runs for a long time, the slaves will lag
behind the master for the time it takes for the write query to complete.

Lag can be exacerbated by high read load. MediaWiki's load balancer will
stop sending reads to a slave when it is lagged by more than 30 seconds.
If the load ratios are set incorrectly, or if there is too much load
generally, this may lead to a slave permanently hovering around 30
seconds lag.

If all slaves are lagged by more than 30 seconds, MediaWiki will stop
writing to the database. All edits and other write operations will be
refused, with an error returned to the user. This gives the slaves a
chance to catch up. Before we had this mechanism, the slaves would
regularly lag by several minutes, making review of recent edits
difficult.

In addition to this, MediaWiki attempts to ensure that the user sees
events occurring on the wiki in chronological order. A few seconds of lag
can be tolerated, as long as the user sees a consistent picture from
subsequent requests. This is done by saving the master binlog position
in the session, and then at the start of each request, waiting for the
slave to catch up to that position before doing any reads from it. If
this wait times out, reads are allowed anyway, but the request is
considered to be in "lagged slave mode". Lagged slave mode can be
checked by calling LoadBalancer::getLaggedReplicaMode(). The only
practical consequence at present is a warning displayed in the page
footer.
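
As a sketch, code that wants to adjust its behaviour in lagged slave
mode could check the flag like this (again assuming wfGetLB() for
obtaining the load balancer):

    $lb = wfGetLB();
    if ( $lb->getLaggedReplicaMode() ) {
        // Data read from the slave may be stale; for example, a caller
        // might skip a freshness-dependent step or show its own notice
    }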

=== Lag avoidance ===

To avoid excessive lag, queries which write large numbers of rows should
be split up, generally to write one row at a time. Multi-row INSERT ...
SELECT queries are the worst offenders and should be avoided altogether.
Instead, do the select first and then the insert, as sketched below.
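
A rough sketch of that pattern, with hypothetical source and destination
tables and columns:

    $dbr = wfGetDB( DB_REPLICA );
    $dbw = wfGetDB( DB_MASTER );

    // Read the rows cheaply from a slave...
    $res = $dbr->select(
        'myext_source',                          // hypothetical tables and columns
        [ 'ms_page', 'ms_value' ],
        [ 'ms_flagged' => 1 ],
        __METHOD__
    );

    // ...then write them back one row at a time, so each statement
    // replicates quickly instead of stalling the slaves for a long time
    foreach ( $res as $row ) {
        $dbw->insert(
            'myext_dest',
            [ 'md_page' => $row->ms_page, 'md_value' => $row->ms_value ],
            __METHOD__
        );
    }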

=== Working with lag ===

Despite our best efforts, it's not practical to guarantee a low-lag
environment. Lag will usually be less than one second, but may
occasionally be up to 30 seconds. For scalability, it's very important
to keep load on the master low, so simply sending all your queries to
the master is not the answer. So when you have a genuine need for
up-to-date data, the following approach is advised:

1) Do a quick query to the master for a sequence number or timestamp
2) Run the full query on the slave and check if it matches the data you
   got from the master
3) If it doesn't, run the full query on the master

To avoid swamping the master every time the slaves lag, use of this
approach should be kept to a minimum. In most cases you should just read
from the slave and let the user deal with the delay.
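
To make the three steps concrete, here is a rough sketch using the page
table's page_touched timestamp as the freshness marker; the choice of
table and marker is only an example:

    $dbw = wfGetDB( DB_MASTER );
    $dbr = wfGetDB( DB_REPLICA );

    // 1) Cheap, indexed query on the master for a freshness marker
    $masterTouched = $dbw->selectField(
        'page', 'page_touched', [ 'page_id' => $pageId ], __METHOD__
    );

    // 2) Full query on the slave
    $row = $dbr->selectRow(
        'page',
        [ 'page_touched', 'page_len', 'page_latest' ],
        [ 'page_id' => $pageId ],
        __METHOD__
    );

    // 3) If the slave's copy doesn't match what the master reported,
    //    repeat the full query on the master
    if ( !$row || $row->page_touched !== $masterTouched ) {
        $row = $dbw->selectRow(
            'page',
            [ 'page_touched', 'page_len', 'page_latest' ],
            [ 'page_id' => $pageId ],
            __METHOD__
        );
    }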

------------------------------------------------------------------------
Lock contention
------------------------------------------------------------------------

Due to the high write rate on Wikipedia (and some other wikis),
MediaWiki developers need to be very careful to structure their writes
to avoid long-lasting locks. By default, MediaWiki opens a transaction
at the first query, and commits it before the output is sent. Locks will
be held from the time when the query is done until the commit. So you
can reduce lock time by doing as much processing as possible before you
do your write queries.

Often this approach is not good enough, and it becomes necessary to
enclose small groups of queries in their own transaction. Use the
following syntax:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->begin( __METHOD__ );
    /* Do queries */
    $dbw->commit( __METHOD__ );

Use of locking reads (e.g. the FOR UPDATE clause) is not advised. They
are poorly implemented in InnoDB and will cause regular deadlock errors.
It's also surprisingly easy to cripple the wiki with lock contention.

Instead of locking reads, combine your existence checks into your write
queries, by using an appropriate condition in the WHERE clause of an
UPDATE, or by using unique indexes in combination with INSERT IGNORE.
Then use the affected row count to see if the query succeeded.
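
For instance, a conditional insert could be sketched like this, assuming
a hypothetical table with a unique index on its key column:

    $dbw = wfGetDB( DB_MASTER );
    $dbw->insert(
        'myext_counter',                         // hypothetical table, unique index on mc_key
        [ 'mc_key' => $key, 'mc_value' => $value ],
        __METHOD__,
        [ 'IGNORE' ]                             // suppress the duplicate-key error
    );
    if ( $dbw->affectedRows() === 0 ) {
        // A row with this key already existed; no locking read was needed
        // to find that out, and no lock was held across a separate SELECT
    }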

------------------------------------------------------------------------
Supported DBMSs
------------------------------------------------------------------------

MediaWiki is written primarily for use with MySQL. Queries are optimized
for it and its schema is considered the canonical version. However,
MediaWiki does support the following other DBMSs to varying degrees.

* PostgreSQL
* SQLite

More information about each of these databases (known issues, level of
support, extra configuration) can be found in the "databases"
subdirectory in this folder.

------------------------------------------------------------------------
Use of GROUP BY
------------------------------------------------------------------------

MySQL supports GROUP BY without checking anything in the SELECT clause.
Other DBMSs (especially Postgres) are stricter and require that all the
non-aggregate items in the SELECT clause appear in the GROUP BY. For
this reason, using SELECT * in GROUP BY queries is highly discouraged.
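
As a sketch, list the grouped and aggregated columns explicitly instead.
The aggregate-alias-via-array-key syntax here is an assumption about the
select() wrapper, so check the Database class documentation if in doubt:

    $dbr = wfGetDB( DB_REPLICA );
    $res = $dbr->select(
        'page',
        [ 'page_namespace', 'n' => 'COUNT(*)' ], // explicit columns only, never SELECT *
        [],
        __METHOD__,
        [ 'GROUP BY' => 'page_namespace' ]
    );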