In order for replication to be resilient to unexpected halts of the server (sometimes described as crash-safe) it must be possible for the slave to recover its state before halting. This section describes the impact of an unexpected halt of a slave during replication and how to configure a slave for the best chance of recovery to continue replication.
After an unexpected halt of a slave, upon restart the I/O thread
must recover the information about which transactions have been
received, and the SQL thread must recover which transactions have
been executed already. For information on the slave logs required
for recovery, see Section 16.2.4, “Replication Relay and Status Logs”. The information
required for recovery was traditionally stored in files, which had
the risk of losing synchrony with the master depending at which
stage of processing a transaction the slave halted at, or even
corruption of the files themselves. In MySQL 5.7 you
can instead use tables to store this information. These tables are
created using InnoDB, and by using
this transactional storage engine the information is always
recoverable upon restart. To configure MySQL 5.7 to
store the replication information in tables, set
relay_log_info_repository and
master_info_repository to
TABLE. The server then stores information
required for the recovery of the I/O thread in the
mysql.slave_master_info table and information
required for the recovery of the SQL thread in the
mysql.slave_relay_log_info table.
Exactly how a replication slave recovers from an unexpected halt
is influenced by the chosen method of replication, whether the
slave is single-threaded or multi-threaded, the setting of
variables such as
relay_log_recovery, and whether
features such as MASTER_AUTO_POSITION are being
used.
The following table shows the impact of these different factors on how a single-threaded slave recovers from an unexpected halt.
Table 16.5 Factors Influencing Single-threaded Replication Slave Recovery
GTID | MASTER_AUTO_POSITION | Crash type | Recovery guaranteed | Relay log impact | ||
|---|---|---|---|---|---|---|
OFF | Any | 1 | TABLE | Any | Yes | Lost |
OFF | Any | 1 | TABLE | Server | Yes | Lost |
OFF | Any | 1 | Any | OS | No | Lost |
OFF | Any | 0 | TABLE | Server | Yes | Remains |
OFF | Any | 0 | TABLE | OS | No | Remains |
ON | ON | Any | Any | Any | Yes | Lost |
ON | OFF | 0 | TABLE | Server | Yes | Remains |
ON | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a single-threaded slave the following configurations are most resilient to unexpected halts:
When using GTIDs and
MASTER_AUTO_POSITION, setrelay_log_recovery=1. With this configuration the setting ofrelay_log_info_repositoryand other variables does not impact on recovery.When using file position based replication, set
relay_log_recovery=1andrelay_log_info_repository=TABLE.NoteDuring recovery the relay log is lost.
The following table shows the impact of these different factors on how a multi-threaded slave recovers from an unexpected halt.
Table 16.6 Factors Influencing Multi-threaded Replication Slave Recovery
GTID |
| Crash type | Recovery guaranteed | Relay log impact | |||
|---|---|---|---|---|---|---|---|
OFF | 1 | Any | 1 | TABLE | Any | Yes | Lost |
OFF | >1 | Any | 1 | TABLE | Server | Yes | Lost |
OFF | >1 | Any | 1 | Any | OS | No | Lost |
OFF | 1 | Any | 0 | TABLE | Server | Yes | Remains |
OFF | 1 | Any | 0 | TABLE | OS | No | Remains |
ON | Any | ON | Any | Any | Any | Yes | Lost |
ON | 1 | OFF | 0 | TABLE | Server | Yes | Remains |
ON | 1 | OFF | 0 | Any | OS | No | Remains |
As the table shows, when using a multi-threaded slave the following configurations are most resilient to unexpected halts:
When using GTIDs and
MASTER_AUTO_POSITION, setrelay_log_recovery=1. With this configuration the setting ofrelay_log_info_repositoryand other variables does not impact on recovery.When using file position based replication, set
relay_log_recovery=1,sync_relay_log=1, andrelay_log_info_repository=TABLE.NoteDuring recovery the relay log is lost.
It is important to note the impact of
sync_relay_log=1, which requires
a write of to the relay log per transaction. Although this setting
is the most resilient to an unexpected halt, with at most one
unwritten transaction being lost, it also has the potential to
greatly increase the load on storage. Without
sync_relay_log=1, the effect of
an unexpected halt depends on how the relay log is handled by the
operating system. Also note that when
relay_log_recovery=0, the next
time the slave is started after an unexpected halt the relay log
is processed as part of recovery. After this process completes,
the relay log is deleted.
An unexpected halt of a multi-threaded replication slave using the
recommended file position based replication configuration above
may result in a relay log with transaction inconsistencies (gaps
in the sequence of transactions) caused by the unexpected halt.
See
Section 16.4.1.34, “Replication and Transaction Inconsistencies”.
In MySQL 5.7.13 and later, if the relay log recovery process
encounters such transaction inconsistencies they are filled and
the recovery process continues automatically. In MySQL versions
prior to MySQL 5.7.13, this process is not automatic and requires
starting the server with
relay_log_recovery=0, starting
the slave with START
SLAVE UNTIL SQL_AFTER_MTS_GAPS to fix any transaction
inconsistencies and then restarting the slave with
relay_log_recovery=1.
When you are using multi-source replication and
relay_log_recovery=1, after
restarting due to an unexpected halt all replication channels go
through the relay log recovery process. Any inconsistencies found
in the relay log due to an unexpected halt of a multi-threaded
slave are filled.