In the event that the primary Cluster replication process fails, it is possible to switch over to the secondary replication channel. The following procedure describes the steps required to accomplish this.
Obtain the time of the most recent global checkpoint (GCP).
That is, you need to determine the most recent epoch from the
ndb_apply_status table on the slave
cluster, which can be found using the following query:
mysqlS'>SELECT @latest:=MAX(epoch)->FROM mysql.ndb_apply_status;
In a circular replication topology, with a master and a slave
running on each host, when you are using
ndb_log_apply_status=1, NDB
Cluster epochs are written in the slave binary logs. This
means that the ndb_apply_status table
contains information for the slave on this host as well as for
any other host which acts as a slave of the master running on
this host.
In this case, you need to determine the latest epoch on this
slave to the exclusion of any epochs from any other slaves in
this slave's binary log that were not listed in the
IGNORE_SERVER_IDS options of the
CHANGE MASTER TO statement used
to set up this slave. The reason for excluding such epochs is
that rows in the mysql.ndb_apply_status
table whose server IDs have a match in the
IGNORE_SERVER_IDS list used with the CHANGE
MASTER TO statement used to prepare this slave's master
are also considered to be from local servers, in addition to
those having the slave's own server ID. You can retrieve
this list as Replicate_Ignore_Server_Ids
from the output of SHOW SLAVE
STATUS. We assume that you have obtained this list
and are substituting it for
ignore_server_ids in the query
shown here, which like the previous version of the query,
selects the greatest epoch into a variable named
@latest:
mysqlS'>SELECT @latest:=MAX(epoch)->FROM mysql.ndb_apply_status->WHERE server_id NOT IN (ignore_server_ids);
In some cases, it may be simpler or more efficient (or both)
to use a list of the server IDs to be included and
server_id IN
in the
server_id_listWHERE condition of the preceding query.
Using the information obtained from the query shown in Step 1,
obtain the corresponding records from the
ndb_binlog_index table on the master
cluster.
Prior to NDB 7.2.6, you should use the following query to accomplish this task:
mysqlM'>SELECT->@file:=SUBSTRING_INDEX(File, '/', -1),->@pos:=Position->FROM mysql.ndb_binlog_index->WHERE epoch > @latest->ORDER BY epoch ASC LIMIT 1;
Beginning with NDB 7.2.6, you can take advantage of the
improved binary logging of DDL statements implemented in those
and later versions by using the following query to obtain the
needed records from the master's
ndb_binlog_index table:
mysqlM'>SELECT->@file:=SUBSTRING_INDEX(next_file, '/', -1),->@pos:=next_position->FROM mysql.ndb_binlog_index->WHERE epoch = @latest->ORDER BY epoch ASC LIMIT 1;
In either case, these are the records saved on the master
since the failure of the primary replication channel. We have
employed a user variable @latest here to
represent the value obtained in Step 1. Of course, it is not
possible for one mysqld instance to access
user variables set on another server instance directly. These
values must be “plugged in” to the second query
manually or in application code.
If (and only if) you use the second of the two queries just
shown against ndb_binlog_index (that is,
the query that employs the next_position
and next_file columns), you must ensure
that the slave mysqld is started with
--slave-skip-errors=ddl_exist_errors
before executing START SLAVE.
Otherwise, replication may stop with duplicate DDL errors.
Now it is possible to synchronize the secondary channel by running the following query on the secondary slave server:
mysqlS'>CHANGE MASTER TO->MASTER_LOG_FILE='@file',->MASTER_LOG_POS=@pos;
Again we have employed user variables (in this case
@file and @pos) to
represent the values obtained in Step 2 and applied in Step 3;
in practice these values must be inserted manually or using
application code that can access both of the servers involved.
@file is a string value such as
'/var/log/mysql/replication-master-bin.00001',
and so must be quoted when used in SQL or application code.
However, the value represented by @pos
must not be quoted. Although MySQL
normally attempts to convert strings to numbers, this case
is an exception.
You can now initiate replication on the secondary channel by issuing the appropriate command on the secondary slave mysqld:
mysqlS'>START SLAVE;
Once the secondary replication channel is active, you can investigate the failure of the primary and effect repairs. The precise actions required to do this will depend upon the reasons for which the primary channel failed.
The secondary replication channel is to be started only if and when the primary replication channel has failed. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replication slaves.
If the failure is limited to a single server, it should (in
theory) be possible to replicate from M
to S', or from
M' to S;
however, this has not yet been tested.