NDB Cluster supports asynchronous replication, more usually referred to simply as “replication”. This section explains how to set up and manage a configuration in which one group of computers operating as an NDB Cluster replicates to a second computer or group of computers. We assume some familiarity on the part of the reader with standard MySQL replication as discussed elsewhere in this Manual. (See Replication).
Normal (non-clustered) replication involves a “master”
server and a “slave” server, the master being the
source of the operations and data to be replicated and the slave
being the recipient of these. In NDB Cluster, replication is
conceptually very similar but can be more complex in practice, as it
may be extended to cover a number of different configurations
including replicating between two complete clusters. Although an NDB
Cluster itself depends on the NDB
storage engine for clustering functionality, it is not necessary to
use NDB as the storage engine for the
slave's copies of the replicated tables (see
Replication from NDB to other storage engines).
However, for maximum availability, it is possible (and preferable)
to replicate from one NDB Cluster to another, and it is this
scenario that we discuss, as shown in the following figure:
In this scenario, the replication process is one in which successive
states of a master cluster are logged and saved to a slave cluster.
This process is accomplished by a special thread known as the NDB
binary log injector thread, which runs on each MySQL server and
produces a binary log (binlog). This thread
ensures that all changes in the cluster producing the binary
log—and not just those changes that are effected through the
MySQL Server—are inserted into the binary log with the correct
serialization order. We refer to the MySQL replication master and
replication slave servers as replication servers or replication
nodes, and the data flow or line of communication between them as a
replication channel.
For information about performing point-in-time recovery with NDB Cluster and NDB Cluster Replication, see Section 8.9.2, “Point-In-Time Recovery Using NDB Cluster Replication”.
NDB API _slave status variables.
NDB API counters can provide enhanced monitoring capabilities on
NDB Cluster replication slaves. These are implemented as NDB
statistics _slave status variables, as seen in
the output of SHOW STATUS, or in
the results of queries against the
SESSION_STATUS or
GLOBAL_STATUS
table in a mysql client session connected to a
MySQL Server that is acting as a slave in NDB Cluster Replication.
By comparing the values of these status variables before and after
the execution of statements affecting replicated
NDB tables, you can observe the
corresponding actions taken on the NDB API level by the slave,
which can be useful when monitoring or troubleshooting NDB Cluster
Replication. Section 7.15, “NDB API Statistics Counters and Variables”,
provides additional information.
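For example, a quick way to list these counters on a slave SQL node is to match their shared naming pattern, as in this sketch (the full list of variable names appears in the section just cited):
SHOW GLOBAL STATUS LIKE 'Ndb_api%_slave';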
Replication from NDB to non-NDB tables.
It is possible to replicate NDB
tables from an NDB Cluster acting as the master to tables using
other MySQL storage engines such as
InnoDB or
MyISAM on a slave
mysqld. This is subject to a number of
conditions; see
Replication from NDB to other storage engines, and
Replication from NDB to a nontransactional storage engine,
for more information.
Throughout this section, we use the following abbreviations or symbols for referring to the master and slave clusters, and to processes and commands run on the clusters or cluster nodes:
| Symbol or Abbreviation | Description (Refers to...) |
|---|---|
| M | The cluster serving as the (primary) replication master |
| S | The cluster acting as the (primary) replication slave |
| shellM> | Shell command to be issued on the master cluster |
| mysqlM> | MySQL client command issued on a single MySQL server running as an SQL node on the master cluster |
| mysqlM*> | MySQL client command to be issued on all SQL nodes participating in the replication master cluster |
| shellS> | Shell command to be issued on the slave cluster |
| mysqlS> | MySQL client command issued on a single MySQL server running as an SQL node on the slave cluster |
| mysqlS*> | MySQL client command to be issued on all SQL nodes participating in the replication slave cluster |
| C | Primary replication channel |
| C' | Secondary replication channel |
| M' | Secondary replication master |
| S' | Secondary replication slave |
A replication channel requires two MySQL servers acting as replication servers (one each for the master and slave). For example, this means that in the case of a replication setup with two replication channels (to provide an extra channel for redundancy), there will be a total of four replication nodes, two per cluster.
Replication of an NDB Cluster as described in this section and
those following is dependent on row-based replication. This means
that the replication master MySQL server must be running with
--binlog-format=ROW or
--binlog-format=MIXED, as described
in Section 8.6, “Starting NDB Cluster Replication (Single Replication Channel)”. For
general information about row-based replication, see
Replication Formats.
If you attempt to use NDB Cluster Replication with
--binlog-format=STATEMENT,
replication fails to work properly because the
ndb_binlog_index table on the master and the
epoch column of the
ndb_apply_status table on the slave are not
updated (see
Section 8.4, “NDB Cluster Replication Schema and Tables”). Instead,
only updates on the MySQL server acting as the replication
master propagate to the slave, and no updates from any other SQL
nodes on the master cluster are replicated.
The default value for the
--binlog-format option in NDB
Cluster 7.3 and NDB Cluster 7.4 is MIXED.
Each MySQL server used for replication in either cluster must be
uniquely identified among all the MySQL replication servers
participating in either cluster (you cannot have replication
servers on both the master and slave clusters sharing the same
ID). This can be done by starting each SQL node using the
--server-id= option,
where idid is a unique integer. Although
it is not strictly necessary, we will assume for purposes of this
discussion that all NDB Cluster binaries are of the same release
version.
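For example, you can confirm the ID in use on any SQL node from the mysql client (the ID itself must be set at startup, as described above):
SELECT @@server_id;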
It is generally true in MySQL Replication that both MySQL servers (mysqld processes) involved must be compatible with one another with respect to both the version of the replication protocol used and the SQL feature sets which they support (see Replication Compatibility Between MySQL Versions). It is due to such differences between the binaries in the NDB Cluster and MySQL Server 5.6 distributions that NDB Cluster Replication has the additional requirement that both mysqld binaries come from an NDB Cluster distribution. The simplest and easiest way to assure that the mysqld servers are compatible is to use the same NDB Cluster distribution for all master and slave mysqld binaries.
We assume that the slave server or cluster is dedicated to replication of the master, and that no other data is being stored on it.
All NDB tables being replicated must be created
using a MySQL server and client. Tables and other database objects
created using the NDB API (with, for example,
Dictionary::createTable()) are
not visible to a MySQL server and so are not replicated. Updates
by NDB API applications to existing tables that were created using
a MySQL server can be replicated.
It is possible to replicate an NDB Cluster using statement-based replication. However, in this case, the following restrictions apply:
All updates to data rows on the cluster acting as the master must be directed to a single MySQL server.
It is not possible to replicate a cluster using multiple simultaneous MySQL replication processes.
Only changes made at the SQL level are replicated.
These are in addition to the other limitations of statement-based replication as opposed to row-based replication; see Advantages and Disadvantages of Statement-Based and Row-Based Replication, for more specific information concerning the differences between the two replication formats.
This section discusses known problems or issues when using replication with NDB Cluster 7.3 and NDB Cluster 7.4.
Loss of master-slave connection.
A loss of connection can occur either between the replication
master SQL node and the replication slave SQL node, or between
the replication master SQL node and the data nodes in the master
cluster. In the latter case, this can occur not only as a result
of loss of physical connection (for example, a broken network
cable), but due to the overflow of data node event buffers; if
the SQL node is too slow to respond, it may be dropped by the
cluster (this is controllable to some degree by adjusting the
MaxBufferedEpochs and
TimeBetweenEpochs
configuration parameters). If this occurs, it is
entirely possible for new data to be inserted into the master
cluster without being recorded in the replication master's
binary log. For this reason, to guarantee high
availability, it is extremely important to maintain a backup
replication channel, to monitor the primary channel, and to fail
over to the secondary replication channel when necessary to keep
the slave cluster synchronized with the master. NDB Cluster is
not designed to perform such monitoring on its own; for this, an
external application is required.
The replication master issues a “gap” event when
connecting or reconnecting to the master cluster. (A gap event is
a type of “incident event,” which indicates an
incident affecting the contents of the database that
cannot easily be represented as a set of changes. Examples of
incidents are server crashes, database resynchronization, (some)
software updates, and (some) hardware changes.) When the slave
encounters a gap in the replication log, it stops with an error
message. This message is available in the output of
SHOW SLAVE STATUS, and indicates
that the SQL thread has stopped due to an incident registered in
the replication stream, and that manual intervention is required.
See Section 8.8, “Implementing Failover with NDB Cluster Replication”, for more
information about what to do in such circumstances.
Because NDB Cluster is not designed on its own to monitor replication status or provide failover, if high availability is a requirement for the slave server or cluster, then you must set up multiple replication lines, monitor the master mysqld on the primary replication line, and be prepared to fail over to a secondary line if and as necessary. This must be done manually, or possibly by means of a third-party application. For information about implementing this type of setup, see Section 8.7, “Using Two Replication Channels for NDB Cluster Replication”, and Section 8.8, “Implementing Failover with NDB Cluster Replication”.
However, if you are replicating from a standalone MySQL server to an NDB Cluster, one channel is usually sufficient.
Circular replication. NDB Cluster Replication supports circular replication, as shown in the next example. The replication setup involves three NDB Clusters numbered 1, 2, and 3, in which Cluster 1 acts as the replication master for Cluster 2, Cluster 2 acts as the master for Cluster 3, and Cluster 3 acts as the master for Cluster 1, thus completing the circle. Each NDB Cluster has two SQL nodes, with SQL nodes A and B belonging to Cluster 1, SQL nodes C and D belonging to Cluster 2, and SQL nodes E and F belonging to Cluster 3.
Circular replication using these clusters is supported as long as the following conditions are met:
All SQL nodes used as replication masters are also used as
replication slaves (the same nodes make up the replication line)
All SQL nodes acting as replication masters and slaves are
started using the
--log-slave-updates option
This type of circular replication setup is shown in the following diagram:
In this scenario, SQL node A in Cluster 1 replicates to SQL node C in Cluster 2; SQL node C replicates to SQL node E in Cluster 3; SQL node E replicates to SQL node A. In other words, the replication line (indicated by the red arrows in the diagram) directly connects all SQL nodes used as replication masters and slaves.
It should also be possible to set up circular replication in which not all master SQL nodes are also slaves, as shown here:
In this case, different SQL nodes in each cluster are used as
replication masters and slaves. However, you must
not start any of the SQL nodes using
--log-slave-updates. This type of
circular replication scheme for NDB Cluster, in which the line of
replication (again indicated by the red arrows in the diagram) is
discontinuous, should be possible, but it should be noted that it
has not yet been thoroughly tested and must therefore still be
considered experimental.
The NDB storage engine uses
idempotent execution mode,
which suppresses duplicate-key and other errors that otherwise
break circular replication of NDB Cluster. This is equivalent to
setting the global
slave_exec_mode system variable
to IDEMPOTENT, although this is not necessary
in NDB Cluster replication, since NDB Cluster sets this variable
automatically and ignores any attempts to set it explicitly.
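You can verify this on a slave SQL node as shown here (illustrative only, since NDB Cluster manages the value itself):
SHOW VARIABLES LIKE 'slave_exec_mode';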
NDB Cluster replication and primary keys.
In the event of a node failure, errors in replication of
NDB tables without primary keys can
still occur, due to the possibility of duplicate rows being
inserted in such cases. For this reason, it is highly
recommended that all NDB tables
being replicated have primary keys.
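A minimal sketch of such a definition is shown here; the table and column names are hypothetical:
CREATE TABLE city (
    id INT NOT NULL PRIMARY KEY,  -- explicit primary key, as recommended
    name VARCHAR(64)
) ENGINE=NDB;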
NDB Cluster Replication and Unique Keys.
In older versions of NDB Cluster, operations that updated values
of unique key columns of NDB tables
could result in duplicate-key errors when replicated. This issue
is solved for replication between
NDB tables by deferring unique key
checks until after all table row updates have been performed.
Deferring constraints in this way is currently supported only by
NDB. Thus, updates of unique keys
when replicating from NDB to a
different storage engine such as
MyISAM or
InnoDB are still not supported.
The problem encountered when replicating without deferred checking
of unique key updates can be illustrated using an
NDB table such as t, which
is created and populated on the master (and
replicated to a slave that does not support deferred unique key
updates) as shown here:
CREATE TABLE t (
p INT PRIMARY KEY,
c INT,
UNIQUE KEY u (c)
) ENGINE NDB;
INSERT INTO t
VALUES (1,1), (2,2), (3,3), (4,4), (5,5);
The following UPDATE statement on
t succeeded on the master, since the rows
affected are processed in the order determined by the
ORDER BY option, performed over the entire
table:
UPDATE t SET c = c - 1 ORDER BY p;
However, the same statement failed with a duplicate key error or other constraint violation on the slave, because the ordering of the row updates was done for one partition at a time, rather than for the table as a whole.
Every NDB table is implicitly
partitioned by key when it is created. See
KEY Partitioning, for more information.
GTIDs not supported.
Replication using global transaction IDs is not compatible with
the NDB storage engine, and is not supported.
Enabling GTIDs is likely to cause NDB Cluster Replication to
fail.
Multi-threaded slaves not supported.
NDB Cluster does not support multi-threaded slaves, and setting
related system variables such as
slave_parallel_workers,
slave_checkpoint_group, and
slave_checkpoint_period (or the
equivalent mysqld startup options) has no
effect.
This is because the slave may not be able to separate transactions
occurring in one database from those in another if they are
written within the same epoch. In addition, every transaction
handled by the NDB storage engine
involves at least two databases—the target database and the
mysql system database—due to the
requirement for updating the
mysql.ndb_apply_status table (see
Section 8.4, “NDB Cluster Replication Schema and Tables”). This in turn
breaks the requirement for multi-threading that the transaction is
specific to a given database.
Restarting with --initial.
Restarting the cluster with the
--initial option causes the
sequence of GCI and epoch numbers to start over from
0. (This is generally true of NDB Cluster and
not limited to replication scenarios involving Cluster.) The
MySQL servers involved in replication should in this case be
restarted. After this, you should use the
RESET MASTER and
RESET SLAVE statements to clear
the invalid ndb_binlog_index and
ndb_apply_status tables, respectively.
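The sequence of statements might look like this (a sketch; run each statement on the appropriate server):
-- On the master SQL node, after the restart with --initial:
RESET MASTER;   -- clears the invalid ndb_binlog_index data
-- On the slave SQL node:
RESET SLAVE;    -- clears the invalid ndb_apply_status data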
Replication from NDB to other storage engines.
It is possible to replicate an NDB
table on the master to a table using a different storage engine
on the slave, taking into account the restrictions listed here:
Multi-master and circular replication are not supported
(tables on both the master and the slave must use the
NDB storage engine for this to
work).
Using a storage engine which does not perform binary logging for slave tables requires special handling.
Use of a nontransactional storage engine for slave tables also requires special handling.
The master mysqld must be started with
--ndb-log-update-as-write=0 or
--ndb-log-update-as-write=OFF.
The next few paragraphs provide additional information about each of the issues just described.
Multiple masters not supported when replicating NDB to other storage
engines.
For replication from NDB to a
different storage engine, the relationship between the two
databases must be a simple master-slave one. This means that
circular or master-master replication is not supported between
NDB Cluster and other storage engines.
In addition, it is not possible to configure more than one
replication channel when replicating between
NDB and a different storage engine.
(However, an NDB Cluster database can
simultaneously replicate to multiple slave NDB Cluster databases.)
If the master uses NDB tables, it is
still possible to have more than one MySQL Server maintain a
binary log of all changes; however, for the slave to change
masters (fail over), the new master-slave relationship must be
explicitly defined on the slave.
Replicating NDB to a slave storage engine that does not perform binary logging. If you attempt to replicate from an NDB Cluster to a slave that uses a storage engine that does not handle its own binary logging, the replication process aborts with the error Binary logging not possible ... Statement cannot be written atomically since more than one engine involved and at least one engine is self-logging (Error 1595). It is possible to work around this issue in one of the following ways:
Turn off binary logging on the slave.
This can be accomplished by setting
sql_log_bin = 0.
Change the storage engine used for the mysql.ndb_apply_status table.
Causing this table to use an engine that does not handle its
own binary logging can also eliminate the conflict. This can
be done by issuing a statement such as
ALTER TABLE
mysql.ndb_apply_status ENGINE=MyISAM on the slave.
It is safe to do this when using a
non-NDB storage engine on the
slave, since you do not then need to worry about keeping
multiple slave SQL nodes synchronized.
Filter out changes to the mysql.ndb_apply_status table on the slave.
This can be done by starting the slave SQL node with
--replicate-ignore-table=mysql.ndb_apply_status.
If you need for other tables to be ignored by replication,
you might wish to use an appropriate
--replicate-wild-ignore-table
option instead.
You should not disable replication or
binary logging of mysql.ndb_apply_status or
change the storage engine used for this table when replicating
from one NDB Cluster to another. See
Replication and binary log filtering rules with replication between NDB
Clusters,
for details.
Replication from NDB to a nontransactional storage engine.
When replicating from NDB to a
nontransactional storage engine such as
MyISAM, you may encounter
unnecessary duplicate key errors when replicating
INSERT ...
ON DUPLICATE KEY UPDATE statements. You can suppress
these by using
--ndb-log-update-as-write=0,
which forces updates to be logged as updates (rather than as
writes).
Replication and binary log filtering rules with replication between NDB
Clusters.
If you are using any of the options
--replicate-do-*,
--replicate-ignore-*,
--binlog-do-db, or
--binlog-ignore-db to filter
databases or tables being replicated, care must be taken not to
block replication or binary logging of the
mysql.ndb_apply_status table, which is required for
replication between NDB Clusters to operate properly. In
particular, you must keep in mind the following:
Using --replicate-do-db=db_name
(and no other --replicate-do-* or
--replicate-ignore-* options) means that only
tables in database db_name are
replicated. In this
case, you should also use
--replicate-do-db=mysql,
--binlog-do-db=mysql, or
--replicate-do-table=mysql.ndb_apply_status
to ensure that mysql.ndb_apply_status is
populated on slaves.
Using --binlog-do-db=db_name
(and no other --binlog-do-db
options) means that only changes to
tables in database db_name are
written to the binary log. In this case, you should also use
--replicate-do-db=mysql,
--binlog-do-db=mysql, or
--replicate-do-table=mysql.ndb_apply_status
to ensure that mysql.ndb_apply_status is
populated on slaves.
Using
--replicate-ignore-db=mysql
means that no tables in the mysql database
are replicated. In this case, you should also use
--replicate-do-table=mysql.ndb_apply_status
to ensure that mysql.ndb_apply_status is
replicated.
Using --binlog-ignore-db=mysql
means that no changes to tables in the
mysql database are written to the binary
log. In this case, you should also use
--replicate-do-table=mysql.ndb_apply_status
to ensure that mysql.ndb_apply_status is
replicated.
You should also remember that each replication rule requires the following:
Its own --replicate-do-* or
--replicate-ignore-* option, and that
multiple rules cannot be expressed in a single replication
filtering option. For information about these rules, see
Replication and Binary Logging Options and Variables.
Its own --binlog-do-db or
--binlog-ignore-db option, and
that multiple rules cannot be expressed in a single binary log
filtering option. For information about these rules, see
The Binary Log.
If you are replicating an NDB Cluster to a slave that uses a
storage engine other than NDB, the
considerations given previously may not apply, as discussed
elsewhere in this section.
NDB Cluster Replication and IPv6. Currently, the NDB API and MGM API do not support IPv6. However, MySQL Servers—including those acting as SQL nodes in an NDB Cluster—can use IPv6 to contact other MySQL Servers. This means that you can replicate between NDB Clusters using IPv6 to connect the master and slave SQL nodes as shown by the dotted arrow in the following diagram:
However, all connections originating within the NDB Cluster—represented in the preceding diagram by solid arrows—must use IPv4. In other words, all NDB Cluster data nodes, management servers, and management clients must be accessible from one another using IPv4. In addition, SQL nodes must use IPv4 to communicate with the cluster.
Since there is currently no support in the NDB and MGM APIs for IPv6, any applications written using these APIs must also make all connections using IPv4.
Attribute promotion and demotion.
NDB Cluster Replication includes support for attribute promotion
and demotion. The implementation of the latter distinguishes
between lossy and non-lossy type conversions, and their use on
the slave can be controlled by setting the
slave_type_conversions global
server system variable.
For more information about attribute promotion and demotion in NDB Cluster, see Row-based replication: attribute promotion and demotion.
Replication in NDB Cluster makes use of a number of dedicated
tables in the mysql database on each MySQL
Server instance acting as an SQL node in both the cluster being
replicated and the replication slave (whether the slave is a
single server or a cluster). These tables are created during the
MySQL installation process by the
mysql_install_db script, and include a table
for storing the binary log's indexing data. Since the
ndb_binlog_index table is local to each MySQL
server and does not participate in clustering, it uses the
MyISAM storage engine. This means that it must
be created separately on each mysqld
participating in the master cluster. (However, the binary log
itself contains updates from all MySQL servers in the cluster to
be replicated.) This table is defined as follows:
CREATE TABLE `ndb_binlog_index` (
`Position` BIGINT(20) UNSIGNED NOT NULL,
`File` VARCHAR(255) NOT NULL,
`epoch` BIGINT(20) UNSIGNED NOT NULL,
`inserts` INT(10) UNSIGNED NOT NULL,
`updates` INT(10) UNSIGNED NOT NULL,
`deletes` INT(10) UNSIGNED NOT NULL,
`schemaops` INT(10) UNSIGNED NOT NULL,
`orig_server_id` INT(10) UNSIGNED NOT NULL,
`orig_epoch` BIGINT(20) UNSIGNED NOT NULL,
`gci` INT(10) UNSIGNED NOT NULL,
`next_position` BIGINT(20) UNSIGNED NOT NULL,
`next_file` VARCHAR(255) NOT NULL,
PRIMARY KEY (`epoch`,`orig_server_id`,`orig_epoch`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
The size of this table is dependent on the number of epochs per
binary log file and the number of binary log files. The number of
epochs per binary log file normally depends on the amount of
binary log generated per epoch and the size of the binary log
file, with smaller epochs resulting in more epochs per file. You
should be aware that empty epochs produce inserts to the
ndb_binlog_index table, even when the
--ndb-log-empty-epochs option is
OFF, meaning that the number of entries per
file depends on the length of time that the file is in use; that
is,
[number of epochs per file] = [time spent per file] / TimeBetweenEpochs
A busy NDB Cluster writes to the binary log regularly and
presumably rotates binary log files more quickly than a quiet one.
This means that a “quiet” NDB Cluster with
--ndb-log-empty-epochs=ON can
actually have a much higher number of
ndb_binlog_index rows per file than one with a
great deal of activity.
When mysqld is started with the
--ndb-log-orig option, the
orig_server_id and
orig_epoch columns store, respectively, the ID
of the server on which the event originated and the epoch in which
the event took place on the originating server, which is useful in
NDB Cluster replication setups employing multiple masters. The
SELECT statement used to find the
closest binary log position to the highest applied epoch on the
slave in a multi-master setup (see
Section 8.10, “NDB Cluster Replication: Multi-Master and Circular Replication”) employs
these two columns, which are not indexed. This can lead to
performance issues when trying to fail over, since the query must
perform a table scan, especially when the master has been running
with --ndb-log-empty-epochs=ON. You
can improve multi-master failover times by adding an index to
these columns, as shown here:
ALTER TABLE mysql.ndb_binlog_index
ADD INDEX orig_lookup USING BTREE (orig_server_id, orig_epoch);
Adding this index provides no benefit when replicating from a
single master to a single slave, since the query used to get the
binary log position in such cases makes no use of
orig_server_id or
orig_epoch.
See Section 8.8, “Implementing Failover with NDB Cluster Replication”, for more
information about using the next_position and
next_file columns.
The following figure shows the relationship of the NDB Cluster
replication master server, its binary log injector thread, and the
mysql.ndb_binlog_index table.
An additional table, named ndb_apply_status, is
used to keep a record of the operations that have been replicated
from the master to the slave. Unlike the case with
ndb_binlog_index, the data in this table is not
specific to any one SQL node in the (slave) cluster, and so
ndb_apply_status can use the
NDBCLUSTER storage engine, as shown here:
CREATE TABLE `ndb_apply_status` (
`server_id` INT(10) UNSIGNED NOT NULL,
`epoch` BIGINT(20) UNSIGNED NOT NULL,
`log_name` VARCHAR(255) CHARACTER SET latin1 COLLATE latin1_bin NOT NULL,
`start_pos` BIGINT(20) UNSIGNED NOT NULL,
`end_pos` BIGINT(20) UNSIGNED NOT NULL,
PRIMARY KEY (`server_id`) USING HASH
) ENGINE=NDBCLUSTER DEFAULT CHARSET=latin1;
The ndb_apply_status table is populated only on
slaves, which means that, on the master, this table never contains
any rows; thus, there is no need to allow for
DataMemory or
IndexMemory to be allotted
to ndb_apply_status there.
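For example, replication progress can be checked on a slave SQL node by querying the table directly, as in this sketch:
SELECT server_id, epoch, log_name, start_pos, end_pos
FROM mysql.ndb_apply_status;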
Because this table is populated from data originating on the
master, it should be allowed to replicate; any replication
filtering or binary log filtering rules that inadvertently prevent
the slave from updating ndb_apply_status or the
master from writing into the binary log may prevent replication
between clusters from operating properly. For more information
about potential problems arising from such filtering rules, see
Replication and binary log filtering rules with replication between NDB
Clusters.
The ndb_binlog_index and
ndb_apply_status tables are created in the
mysql database because they should not be
explicitly replicated by the user. User intervention is normally
not required to create or maintain either of these tables, since
both ndb_binlog_index and the
ndb_apply_status are maintained by the
NDB binary log (binlog) injector
thread. This keeps the master mysqld process
updated to changes performed by the
NDB storage engine. The
NDB binlog
injector thread receives events directly from the
NDB storage engine. The
NDB injector is responsible for
capturing all the data events within the cluster, and ensures that
all events which change, insert, or delete data are recorded in
the ndb_binlog_index table. The slave I/O
thread transfers the events from the master's binary log to the
slave's relay log.
However, it is advisable to check for the existence and integrity
of these tables as an initial step in preparing an NDB Cluster for
replication. It is possible to view event data recorded in the
binary log by querying the
mysql.ndb_binlog_index table directly on the
master. This can also be accomplished using the
SHOW BINLOG EVENTS statement on
either the replication master or slave MySQL servers. (See
SHOW BINLOG EVENTS Syntax.)
You can also obtain useful information from the output of
SHOW ENGINE NDB
STATUS.
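The following statements illustrate such checks (the binary log file name used here is hypothetical):
SELECT COUNT(*) FROM mysql.ndb_binlog_index;
SHOW BINLOG EVENTS IN 'binlog.000001' LIMIT 5;
SHOW ENGINE NDB STATUS;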
The ndb_schema table is used to track schema
changes made to NDB tables. It is
defined as shown here:
CREATE TABLE ndb_schema (
`db` VARBINARY(63) NOT NULL,
`name` VARBINARY(63) NOT NULL,
`slock` BINARY(32) NOT NULL,
`query` BLOB NOT NULL,
`node_id` INT UNSIGNED NOT NULL,
`epoch` BIGINT UNSIGNED NOT NULL,
`id` INT UNSIGNED NOT NULL,
`version` INT UNSIGNED NOT NULL,
`type` INT UNSIGNED NOT NULL,
PRIMARY KEY USING HASH (db,name)
) ENGINE=NDB DEFAULT CHARSET=latin1;
Unlike the two tables previously mentioned in this section, the
ndb_schema table is not visible either to MySQL
SHOW statements, or in any
INFORMATION_SCHEMA tables; however, it can be
seen in the output of ndb_show_tables, as shown
here:
shell> ndb_show_tables -t 2
id type state logging database schema name
4 UserTable Online Yes mysql def ndb_apply_status
5 UserTable Online Yes ndbworld def city
6 UserTable Online Yes ndbworld def country
3 UserTable Online Yes mysql def NDB$BLOB_2_3
7 UserTable Online Yes ndbworld def countrylanguage
2 UserTable Online Yes mysql def ndb_schema
NDBT_ProgramExit: 0 - OK
It is also possible to SELECT from
this table in mysql and other MySQL client
applications, as shown here:
mysql> SELECT * FROM mysql.ndb_schema WHERE name='city' \G
*************************** 1. row ***************************
db: ndbworld
name: city
slock:
query: alter table City engine=ndb
node_id: 4
epoch: 0
id: 0
version: 0
type: 7
1 row in set (0.00 sec)
This can sometimes be useful when debugging applications.
When performing schema changes on
NDB tables, applications should
wait until the ALTER TABLE
statement has returned in the MySQL client connection that
issued the statement before attempting to use the updated
definition of the table.
If the ndb_apply_status table or the
ndb_schema table does not exist on the slave,
ndb_restore re-creates the missing table or
tables (Bug #14612).
Conflict resolution for NDB Cluster Replication requires the
presence of an additional mysql.ndb_replication
table. Currently, this table must be created manually. For
information about how to do this, see
Section 8.11, “NDB Cluster Replication Conflict Resolution”.
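As a sketch, creating this table looks like the following; see the section just cited for the authoritative definition and the meaning of each column:
CREATE TABLE mysql.ndb_replication (
    db VARBINARY(63),
    table_name VARBINARY(63),
    server_id INT UNSIGNED,
    binlog_type INT UNSIGNED,
    conflict_fn VARBINARY(128),
    PRIMARY KEY USING HASH (db, table_name, server_id)
) ENGINE=NDB
PARTITION BY KEY(db, table_name);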
Preparing the NDB Cluster for replication consists of the following steps:
Check all MySQL servers for version compatibility (see Section 8.2, “General Requirements for NDB Cluster Replication”).
Create a slave account on the master Cluster with the appropriate privileges:
mysqlM> GRANT REPLICATION SLAVE
     -> ON *.* TO 'slave_user'@'slave_host'
     -> IDENTIFIED BY 'slave_password';
In the previous statement,
slave_user is the slave account
user name, slave_host is the host
name or IP address of the replication slave, and
slave_password is the password to
assign to this account.
For example, to create a slave user account with the name
myslave, logging in from the host named
rep-slave, and using the password
53cr37, use the following
GRANT statement:
mysqlM> GRANT REPLICATION SLAVE
     -> ON *.* TO 'myslave'@'rep-slave'
     -> IDENTIFIED BY '53cr37';
For security reasons, it is preferable to use a unique user account—not employed for any other purpose—for the replication slave account.
Configure the slave to use the master. Using the MySQL
Monitor, this can be accomplished with the
CHANGE MASTER TO statement:
mysqlS> CHANGE MASTER TO
     -> MASTER_HOST='master_host',
     -> MASTER_PORT=master_port,
     -> MASTER_USER='slave_user',
     -> MASTER_PASSWORD='slave_password';
In the previous statement,
master_host is the host name or IP
address of the replication master,
master_port is the port for the
slave to use for connecting to the master,
slave_user is the user name set up
for the slave on the master, and
slave_password is the password set
for that user account in the previous step.
For example, to tell the slave to replicate from the MySQL
server whose host name is rep-master, using
the replication slave account created in the previous step,
use the following statement:
mysqlS> CHANGE MASTER TO
     -> MASTER_HOST='rep-master',
     -> MASTER_PORT=3306,
     -> MASTER_USER='myslave',
     -> MASTER_PASSWORD='53cr37';
For a complete list of options that can be used with this statement, see CHANGE MASTER TO Syntax.
To provide replication backup capability, you also need to add
an --ndb-connectstring option
to the slave's my.cnf file prior to
starting the replication process, as shown here:
ndb-connectstring=management_host[:port]
See Section 8.9, “NDB Cluster Backups With NDB Cluster Replication”, for
details.
For additional options that can be set in
my.cnf for replication slaves, see
Replication and Binary Logging Options and Variables.
If the master cluster is already in use, you can create a backup of the master and load this onto the slave to cut down on the amount of time required for the slave to synchronize itself with the master. If the slave is also running NDB Cluster, this can be accomplished using the backup and restore procedure described in Section 8.9, “NDB Cluster Backups With NDB Cluster Replication”.
In the event that you are not using NDB Cluster on the replication slave, you can create a backup with this command on the replication master:
shellM> mysqldump --master-data=1
Then import the resulting data dump onto the slave by copying
the dump file over to the slave. After this, you can use the
mysql client to import the data from the
dumpfile into the slave database as shown here, where
dump_file is the name of the file
that was generated using mysqldump on the
master, and db_name is the name of
the database to be replicated:
shellS> mysql -u root -p db_name < dump_file
For a complete list of options to use with mysqldump, see mysqldump — A Database Backup Program.
If you copy the data to the slave in this fashion, you
should make sure that the slave is started with the
--skip-slave-start option on
the command line, or else include
skip-slave-start in the slave's
my.cnf file to keep it from trying to
connect to the master to begin replicating before all the
data has been loaded. Once the data loading has completed,
follow the additional steps outlined in the next two
sections.
Ensure that each MySQL server acting as a replication master
is configured with a unique server ID, and with binary logging
enabled, using the row format. (See
Replication Formats.) These options can be
set either in the master server's my.cnf
file, or on the command line when starting the master
mysqld process. See
Section 8.6, “Starting NDB Cluster Replication (Single Replication Channel)”, for
information regarding the latter option.
This section outlines the procedure for starting NDB Cluster replication using a single replication channel.
Start the MySQL replication master server by issuing this command:
shellM> mysqld --ndbcluster --server-id=id \
        --log-bin &
In the previous statement, id is
this server's unique ID (see
Section 8.2, “General Requirements for NDB Cluster Replication”). This
starts the server's mysqld process with
binary logging enabled using the proper logging format.
You can also start the master with
--binlog-format=MIXED, in
which case row-based replication is used automatically when
replicating between clusters. STATEMENT
based binary logging is not supported for NDB Cluster
Replication (see
Section 8.2, “General Requirements for NDB Cluster Replication”).
Start the MySQL replication slave server as shown here:
shellS> mysqld --ndbcluster --server-id=id &
In the command just shown, id is
the slave server's unique ID. It is not necessary to enable
logging on the replication slave.
You should use the
--skip-slave-start option
with this command or else you should include
skip-slave-start in the slave server's
my.cnf file, unless you want
replication to begin immediately. With the use of this
option, the start of replication is delayed until the
appropriate START SLAVE
statement has been issued, as explained in Step 4 below.
It is necessary to synchronize the slave server with the master server's replication binary log. If binary logging has not previously been running on the master, run the following statement on the slave:
mysqlS> CHANGE MASTER TO
     -> MASTER_LOG_FILE='',
     -> MASTER_LOG_POS=4;
This instructs the slave to begin reading the master's binary
log from the log's starting point. Otherwise—that is, if
you are loading data from the master using a backup—see
Section 8.8, “Implementing Failover with NDB Cluster Replication”, for
information on how to obtain the correct values to use for
MASTER_LOG_FILE and
MASTER_LOG_POS in such cases.
Finally, you must instruct the slave to begin applying replication by issuing this command from the mysql client on the replication slave:
mysqlS> START SLAVE;
This also initiates the transmission of replication data from the master to the slave.
It is also possible to use two replication channels, in a manner similar to the procedure described in the next section; the differences between this and using a single replication channel are covered in Section 8.7, “Using Two Replication Channels for NDB Cluster Replication”.
It is also possible to improve cluster replication performance by
enabling batched updates.
This can be accomplished by setting the
slave_allow_batching system
variable on the slave mysqld processes.
Normally, updates are applied as soon as they are received.
However, the use of batching causes updates to be applied in 32 KB
batches, which can result in higher throughput and less CPU usage,
particularly where individual updates are relatively small.
Slave batching works on a per-epoch basis; updates belonging to more than one transaction can be sent as part of the same batch.
All outstanding updates are applied when the end of an epoch is reached, even if the updates total less than 32 KB.
Batching can be turned on and off at runtime. To activate it at runtime, you can use either of these two statements:
SET GLOBAL slave_allow_batching = 1;
SET GLOBAL slave_allow_batching = ON;
If a particular batch causes problems (such as a statement whose effects do not appear to be replicated correctly), slave batching can be deactivated using either of the following statements:
SET GLOBAL slave_allow_batching = 0;
SET GLOBAL slave_allow_batching = OFF;
You can check whether slave batching is currently being used by
means of an appropriate SHOW
VARIABLES statement, like this one:
mysql> SHOW VARIABLES LIKE 'slave%';
+---------------------------+-------+
| Variable_name | Value |
+---------------------------+-------+
| slave_allow_batching | ON |
| slave_compressed_protocol | OFF |
| slave_load_tmpdir | /tmp |
| slave_net_timeout | 3600 |
| slave_skip_errors | OFF |
| slave_transaction_retries | 10 |
+---------------------------+-------+
6 rows in set (0.00 sec)
In a more complete example scenario, we envision two replication channels to provide redundancy and thereby guard against possible failure of a single replication channel. This requires a total of four replication servers, two masters for the master cluster and two slave servers for the slave cluster. For purposes of the discussion that follows, we assume that unique identifiers are assigned as shown here:
| Server ID | Description |
|---|---|
| 1 | Master - primary replication channel (M) |
| 2 | Master - secondary replication channel (M') |
| 3 | Slave - primary replication channel (S) |
| 4 | Slave - secondary replication channel (S') |
Setting up replication with two channels is not radically
different from setting up a single replication channel. First, the
mysqld processes for the primary and secondary
replication masters must be started, followed by those for the
primary and secondary slaves. Then the replication processes may
be initiated by issuing the START
SLAVE statement on each of the slaves. The commands and
the order in which they need to be issued are shown here:
Start the primary replication master:
shellM> mysqld --ndbcluster --server-id=1 \
        --log-bin &
Start the secondary replication master:
shellM'> mysqld --ndbcluster --server-id=2 \
         --log-bin &
Start the primary replication slave server:
shellS> mysqld --ndbcluster --server-id=3 \
        --skip-slave-start &
Start the secondary replication slave:
shellS'> mysqld --ndbcluster --server-id=4 \
         --skip-slave-start &
Finally, initiate replication on the primary channel by
executing the START SLAVE
statement on the primary slave as shown here:
mysqlS> START SLAVE;
Only the primary channel is to be started at this point. The secondary replication channel is to be started only in the event that the primary replication channel fails, as described in Section 8.8, “Implementing Failover with NDB Cluster Replication”. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replication slaves.
As mentioned previously, it is not necessary to enable binary logging on replication slaves.
In the event that the primary Cluster replication process fails, it is possible to switch over to the secondary replication channel. The following procedure describes the steps required to accomplish this.
Obtain the time of the most recent global checkpoint (GCP).
That is, you need to determine the most recent epoch from the
ndb_apply_status table on the slave
cluster, which can be found using the following query:
mysqlS'> SELECT @latest:=MAX(epoch)
      -> FROM mysql.ndb_apply_status;
In a circular replication topology, with a master and a slave
running on each host, when you are using
ndb_log_apply_status=1, NDB
Cluster epochs are written in the slave binary logs. This
means that the ndb_apply_status table
contains information for the slave on this host as well as for
any other host which acts as a slave of the master running on
this host.
In this case, you need to determine the latest epoch on this
slave to the exclusion of any epochs from any other slaves in
this slave's binary log that were not listed in the
IGNORE_SERVER_IDS options of the
CHANGE MASTER TO statement used
to set up this slave. The reason for excluding such epochs is
that rows in the mysql.ndb_apply_status
table whose server IDs have a match in the
IGNORE_SERVER_IDS list used with the CHANGE
MASTER TO statement used to prepare this slave's master
are also considered to be from local servers, in addition to
those having the slave's own server ID. You can retrieve
this list as Replicate_Ignore_Server_Ids
from the output of SHOW SLAVE
STATUS. We assume that you have obtained this list
and are substituting it for
ignore_server_ids in the query
shown here, which, like the previous version of the query,
selects the greatest epoch into a variable named
@latest:
mysqlS'> SELECT @latest:=MAX(epoch)
      -> FROM mysql.ndb_apply_status
      -> WHERE server_id NOT IN (ignore_server_ids);
In some cases, it may be simpler or more efficient (or both)
to use a list of the server IDs to be included and
server_id IN server_id_list in the
WHERE condition of the preceding query.
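For example, if the server IDs whose epochs should be considered are 1 and 3 (hypothetical values for this sketch), the query becomes:
mysqlS'> SELECT @latest:=MAX(epoch)
      -> FROM mysql.ndb_apply_status
      -> WHERE server_id IN (1, 3);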
Using the information obtained from the query shown in Step 1,
obtain the corresponding records from the
ndb_binlog_index table on the master
cluster.
In NDB Cluster 7.3 and later, you can use the following query
to obtain the needed records from the master's
ndb_binlog_index table:
mysqlM'> SELECT
      ->     @file:=SUBSTRING_INDEX(next_file, '/', -1),
      ->     @pos:=next_position
      -> FROM mysql.ndb_binlog_index
      -> WHERE epoch = @latest
      -> ORDER BY epoch ASC LIMIT 1;
These are the records saved on the master since the failure of
the primary replication channel. We have employed a user
variable @latest here to represent the
value obtained in Step 1. Of course, it is not possible for
one mysqld instance to access user
variables set on another server instance directly. These
values must be “plugged in” to the second query
manually or in application code.
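For instance, if Step 1 returned an epoch value of 5952824756541 (a hypothetical value), the query actually issued on the secondary master would be:
mysqlM'> SELECT
      ->     @file:=SUBSTRING_INDEX(next_file, '/', -1),
      ->     @pos:=next_position
      -> FROM mysql.ndb_binlog_index
      -> WHERE epoch = 5952824756541
      -> ORDER BY epoch ASC LIMIT 1;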
You must ensure that the slave mysqld is
started with
--slave-skip-errors=ddl_exist_errors
before executing START SLAVE.
Otherwise, replication may stop with duplicate DDL errors.
Now it is possible to synchronize the secondary channel by running the following query on the secondary slave server:
mysqlS'> CHANGE MASTER TO
      -> MASTER_LOG_FILE='@file',
      -> MASTER_LOG_POS=@pos;
Again we have employed user variables (in this case
@file and @pos) to
represent the values obtained in Step 2 and applied in Step 3;
in practice these values must be inserted manually or using
application code that can access both of the servers involved.
@file is a string value such as
'/var/log/mysql/replication-master-bin.00001',
and so must be quoted when used in SQL or application code.
However, the value represented by @pos
must not be quoted. Although MySQL
normally attempts to convert strings to numbers, this case
is an exception.
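With hypothetical values substituted for the two user variables, the statement issued on the secondary slave might look like this:
mysqlS'> CHANGE MASTER TO
      -> MASTER_LOG_FILE='replication-master-bin.00001',
      -> MASTER_LOG_POS=123456789;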
You can now initiate replication on the secondary channel by issuing the appropriate command on the secondary slave mysqld:
mysqlS'> START SLAVE;
Once the secondary replication channel is active, you can investigate the failure of the primary and effect repairs. The precise actions required to do this will depend upon the reasons for which the primary channel failed.
The secondary replication channel is to be started only if and when the primary replication channel has failed. Running multiple replication channels simultaneously can result in unwanted duplicate records being created on the replication slaves.
If the failure is limited to a single server, it should (in
theory) be possible to replicate from M
to S', or from
M' to S;
however, this has not yet been tested.
This section discusses making backups and restoring from them using NDB Cluster replication. We assume that the replication servers have already been configured as covered previously (see Section 8.5, “Preparing the NDB Cluster for Replication”, and the sections immediately following). This having been done, the procedure for making a backup and then restoring from it is as follows:
There are two different methods by which the backup may be started.
Method A.
This method requires that the cluster backup process was
previously enabled on the master server, prior to
starting the replication process. This can be done by
including the following line in a
[mysql_cluster] section in the
my.cnf file, where
management_host is the IP
address or host name of the
NDB management server for
the master cluster, and port
is the management server's port number:
ndb-connectstring=management_host[:port]
The port number needs to be specified only if the default port (1186) is not being used. See Section 4.4, “Initial Configuration of NDB Cluster”, for more information about ports and port allocation in NDB Cluster.
In this case, the backup can be started by executing this statement on the replication master:
shellM> ndb_mgm -e "START BACKUP"
Method B.
If the my.cnf file does not specify
where to find the management host, you can start the
backup process by passing this information to the
NDB management client as
part of the START
BACKUP command. This can be done as shown
here, where management_host
and port are the host name
and port number of the management server:
shellM> ndb_mgm management_host:port -e "START BACKUP"
In our scenario as outlined earlier (see Section 8.5, “Preparing the NDB Cluster for Replication”), this would be executed as follows:
shellM> ndb_mgm rep-master:1186 -e "START BACKUP"
Copy the cluster backup files to the slave that is being
brought on line. Each system running an
ndbd process for the master cluster will
have cluster backup files located on it, and
all of these files must be copied to the
slave to ensure a successful restore. The backup files can be
copied into any directory on the computer where the slave
management host resides, so long as the MySQL and NDB binaries
have read permissions in that directory. In this case, we will
assume that these files have been copied into the directory
/var/BACKUPS/BACKUP-1.
It is not necessary that the slave cluster have the same
number of ndbd processes (data nodes) as
the master; however, it is highly recommended this number be
the same. It is necessary that the slave
be started with the
--skip-slave-start option, to
prevent premature startup of the replication process.
Create any databases on the slave cluster that are present on the master cluster that are to be replicated to the slave.
A CREATE DATABASE (or
CREATE
SCHEMA) statement corresponding to each database
to be replicated must be executed on each SQL node in the
slave cluster.
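For example, to prepare the slave cluster for the ndbworld database used in examples earlier in this chapter, issue this statement on each slave SQL node:
mysqlS*> CREATE DATABASE ndbworld;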
Reset the slave cluster using this statement in the MySQL Monitor:
mysqlS> RESET SLAVE;
You can now start the cluster restoration process on the
replication slave using the ndb_restore
command for each backup file in turn. For the first of these,
it is necessary to include the -m option to
restore the cluster metadata:
shellS> ndb_restore -c slave_host:port -n node-id \
        -b backup-id -m -r dir
dir is the path to the directory
where the backup files have been placed on the replication
slave. For the ndb_restore commands
corresponding to the remaining backup files, the
-m option should not be
used.
For restoring from a master cluster with four data nodes (as
shown in the figure in
Chapter 8, NDB Cluster Replication) where the backup
files have been copied to the directory
/var/BACKUPS/BACKUP-1, the proper
sequence of commands to be executed on the slave might look
like this:
shellS> ndb_restore -c rep-slave:1186 -n 2 -b 1 -m \
        -r ./var/BACKUPS/BACKUP-1
shellS> ndb_restore -c rep-slave:1186 -n 3 -b 1 \
        -r ./var/BACKUPS/BACKUP-1
shellS> ndb_restore -c rep-slave:1186 -n 4 -b 1 \
        -r ./var/BACKUPS/BACKUP-1
shellS> ndb_restore -c rep-slave:1186 -n 5 -b 1 -e \
        -r ./var/BACKUPS/BACKUP-1
The -e (or
--restore_epoch) option
in the final invocation of ndb_restore in
this example is required in order that the epoch is written
to the slave mysql.ndb_apply_status.
Without this information, the slave will not be able to
synchronize properly with the master. (See
Section 6.20, “ndb_restore — Restore an NDB Cluster Backup”.)
Now you need to obtain the most recent epoch from the
ndb_apply_status table on the slave (as
discussed in
Section 8.8, “Implementing Failover with NDB Cluster Replication”):
mysqlS> SELECT @latest:=MAX(epoch)
     -> FROM mysql.ndb_apply_status;
Using @latest as the epoch value obtained
in the previous step, you can obtain the correct starting
position @pos in the correct binary log
file @file from the master's
mysql.ndb_binlog_index table using the
query shown here:
mysqlM> SELECT
     ->     @file:=SUBSTRING_INDEX(File, '/', -1),
     ->     @pos:=Position
     -> FROM mysql.ndb_binlog_index
     -> WHERE epoch > @latest
     -> ORDER BY epoch ASC LIMIT 1;
In the event that there is currently no replication traffic,
you can get this information by running
SHOW MASTER STATUS on the
master and using the value in the Position
column for the file whose name has the suffix with the
greatest value for all files shown in the
File column. However, in this case, you
must determine this and supply it in the next step manually or
by parsing the output with a script.
Using the values obtained in the previous step, you can now
issue the appropriate CHANGE MASTER
TO statement in the slave's mysql
client:
mysqlS> CHANGE MASTER TO
     -> MASTER_LOG_FILE='@file',
     -> MASTER_LOG_POS=@pos;
Now that the slave “knows” from what point in which binary log file to start reading data from the master, you can cause the slave to begin replicating with this standard MySQL statement:
mysqlS> START SLAVE;
To perform a backup and restore on a second replication channel, it is necessary only to repeat these steps, substituting the host names and IDs of the secondary master and slave for those of the primary master and slave replication servers where appropriate, and running the preceding statements on them.
For additional information on performing Cluster backups and restoring Cluster from backups, see Section 7.3, “Online Backup of NDB Cluster”.
It is possible to automate much of the process described in the
previous section (see
Section 8.9, “NDB Cluster Backups With NDB Cluster Replication”). The
following Perl script reset-slave.pl serves
as an example of how you can do this.
#!/usr/bin/perl -w
# file: reset-slave.pl
# Copyright ©2005 MySQL AB
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to:
# Free Software Foundation, Inc.
# 59 Temple Place, Suite 330
# Boston, MA 02111-1307 USA
#
# Version 1.1
######################## Includes ###############################
use DBI;
######################## Globals ################################
my $m_host='';
my $m_port='';
my $m_user='';
my $m_pass='';
my $s_host='';
my $s_port='';
my $s_user='';
my $s_pass='';
my $dbhM='';
my $dbhS='';
# Statement handle and result values shared by the subroutines below
my $sth;
my $epoch;
my $binlog;
my $binpos;
####################### Sub Prototypes ##########################
sub CollectCommandPromptInfo;
sub ConnectToDatabases;
sub DisconnectFromDatabases;
sub GetSlaveEpoch;
sub GetMasterInfo;
sub UpdateSlave;
######################## Program Main ###########################
CollectCommandPromptInfo;
ConnectToDatabases;
GetSlaveEpoch;
GetMasterInfo;
UpdateSlave;
DisconnectFromDatabases;
################## Collect Command Prompt Info ##################
sub CollectCommandPromptInfo
{
### Check that user has supplied correct number of command line args
die "Usage:\n
reset-slave >master MySQL host< >master MySQL port< \n
>master user< >master pass< >slave MySQL host< \n
>slave MySQL port< >slave user< >slave pass< \n
All 8 arguments must be passed. Use BLANK for NULL passwords\n"
unless @ARGV == 8;
$m_host = $ARGV[0];
$m_port = $ARGV[1];
$m_user = $ARGV[2];
$m_pass = $ARGV[3];
$s_host = $ARGV[4];
$s_port = $ARGV[5];
$s_user = $ARGV[6];
$s_pass = $ARGV[7];
if ($m_pass eq "BLANK") { $m_pass = '';}
if ($s_pass eq "BLANK") { $s_pass = '';}
}
############### Make connections to both databases #############
sub ConnectToDatabases
{
### Connect to both master and slave cluster databases
### Connect to master
$dbhM
= DBI->connect(
"dbi:mysql:database=mysql;host=$m_host;port=$m_port",
"$m_user", "$m_pass")
or die "Can't connect to Master Cluster MySQL process!
Error: $DBI::errstr\n";
### Connect to slave
$dbhS
= DBI->connect(
"dbi:mysql:database=mysql;host=$s_host",
"$s_user", "$s_pass")
or die "Can't connect to Slave Cluster MySQL process!
Error: $DBI::errstr\n";
}
################ Disconnect from both databases ################
sub DisconnectFromDatabases
{
### Disconnect from master
$dbhM->disconnect
or warn " Disconnection failed: $DBI::errstr\n";
### Disconnect from slave
$dbhS->disconnect
or warn " Disconnection failed: $DBI::errstr\n";
}
###################### Find the last good GCI ##################
sub GetSlaveEpoch
{
$sth = $dbhS->prepare("SELECT MAX(epoch)
FROM mysql.ndb_apply_status;")
or die "Error while preparing to select epoch from slave: ",
$dbhS->errstr;
$sth->execute
or die "Selecting epoch from slave error: ", $sth->errstr;
$sth->bind_col (1, \$epoch);
$sth->fetch;
print "\tSlave Epoch = $epoch\n";
$sth->finish;
}
####### Find the position of the last GCI in the binary log ########
sub GetMasterInfo
{
$sth = $dbhM->prepare("SELECT
SUBSTRING_INDEX(File, '/', -1), Position
FROM mysql.ndb_binlog_index
WHERE epoch > $epoch
ORDER BY epoch ASC LIMIT 1;")
or die "Prepare to select from master error: ", $dbhM->errstr;
$sth->execute
or die "Selecting from master error: ", $sth->errstr;
$sth->bind_col (1, \$binlog);
$sth->bind_col (2, \$binpos);
$sth->fetch;
print "\tMaster binary log = $binlog\n";
print "\tMaster binary log position = $binpos\n";
$sth->finish;
}
########## Set the slave to process from that location #########
sub UpdateSlave
{
$sth = $dbhS->prepare("CHANGE MASTER TO
MASTER_LOG_FILE='$binlog',
MASTER_LOG_POS=$binpos;")
or die "Prepare to CHANGE MASTER error: ", $dbhS->errstr;
$sth->execute
or die "CHANGE MASTER on slave error: ", $sth->errstr;
$sth->finish;
print "\tSlave has been updated. You may now start the slave.\n";
}
# end reset-slave.pl
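As a usage sketch, with hypothetical host names and credentials (all eight arguments are required, in the order shown in the script's usage message), the script might be invoked like this:
shell> perl reset-slave.pl master_host 3306 root masterpass slave_host 3306 root BLANK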
Point-in-time
recovery—that is, recovery of data changes made since a
given point in time—is performed after restoring a full
backup that returns the server to its state when the backup was
made. Performing point-in-time recovery of NDB Cluster tables
with NDB Cluster and NDB Cluster Replication can be accomplished
using a native NDB data backup
(taken by issuing START
BACKUP in the ndb_mgm client) and
restoring the ndb_binlog_index table (from a
dump made using mysqldump).
To perform point-in-time recovery of NDB Cluster, it is necessary to follow the steps shown here:
1. Back up all NDB databases in the cluster,
using the START BACKUP
command in the ndb_mgm client (see
Section 7.3, “Online Backup of NDB Cluster”).
2. At some later point, prior to restoring the cluster, make a
backup of the mysql.ndb_binlog_index
table. It is probably simplest to use
mysqldump for this task. Also back up the
binary log files at this time.
This backup should be updated regularly—perhaps even hourly—depending on your needs.
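For example, assuming a hypothetical master host and output file name, the table could be dumped as shown here:
shell> mysqldump --host=master_host mysql ndb_binlog_index > ndb_binlog_index.sql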
3. (Catastrophic failure or error occurs.)
4. Locate the last known good backup.
5. Clear the data node file systems (using
ndbd
--initial or
ndbmtd
--initial).
NDB Cluster Disk Data tablespace and log files are not
removed by --initial. You
must delete these manually.
6. Use DROP TABLE or
TRUNCATE TABLE with the
mysql.ndb_binlog_index table.
7. Execute ndb_restore, restoring all data.
You must include the
--restore_epoch option
when you run ndb_restore, so that the
ndb_apply_status table is populated
correctly. (See
Section 6.20, “ndb_restore — Restore an NDB Cluster Backup”, for
more information.)
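As an illustrative sketch only, a restore of data for one data node might resemble the following, where the node ID, backup ID, and backup directory are placeholders to be replaced with your own values (run ndb_restore once per data node, adding --restore_meta on the first invocation if the table metadata must also be recreated):
shell> ndb_restore -n 3 -b 1 -r --restore_epoch \
          --backup_path=/var/lib/mysql-cluster/BACKUP/BACKUP-1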
8. Restore the ndb_binlog_index table from
the output of mysqldump and restore the
binary log files from backup, if necessary.
9. Find the epoch applied most recently—that is, the
maximum epoch column value in the
ndb_apply_status table—selecting it into the user
variable @LATEST_EPOCH:
SELECT @LATEST_EPOCH:=MAX(epoch)
FROM mysql.ndb_apply_status;
10. Find the latest binary log file
(@FIRST_FILE) and position
(Position column value) within this file
that correspond to @LATEST_EPOCH in the
ndb_binlog_index table:
SELECT Position, @FIRST_FILE:=File
FROM mysql.ndb_binlog_index
WHERE epoch > @LATEST_EPOCH
ORDER BY epoch ASC LIMIT 1;
11. Using mysqlbinlog, replay the binary log events from the given file and position up to the point of the failure. (See mysqlbinlog — Utility for Processing Binary Log Files.)
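For example, supposing that the query in the previous step returned binlog.000023 for the file and 2441 for the position (both values hypothetical), the replay might be performed as shown here:
shell> mysqlbinlog --start-position=2441 binlog.000023 | mysql -u root -p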
See also Point-in-Time (Incremental) Recovery Using the Binary Log, for more information about the binary log, replication, and incremental recovery.
It is possible to use NDB Cluster in multi-master replication, including circular replication between a number of NDB Clusters.
Circular replication example. In the next few paragraphs we consider the example of a replication setup involving three NDB Clusters numbered 1, 2, and 3, in which Cluster 1 acts as the replication master for Cluster 2, Cluster 2 acts as the master for Cluster 3, and Cluster 3 acts as the master for Cluster 1. Each cluster has two SQL nodes, with SQL nodes A and B belonging to Cluster 1, SQL nodes C and D belonging to Cluster 2, and SQL nodes E and F belonging to Cluster 3.
Circular replication using these clusters is supported as long as the following conditions are met:
The same SQL node in each cluster acts as both the replication master and the replication slave
All SQL nodes acting as replication masters and slaves are
started using the
--log-slave-updates option
This type of circular replication setup is shown in the following diagram:
In this scenario, SQL node A in Cluster 1 replicates to SQL node C in Cluster 2; SQL node C replicates to SQL node E in Cluster 3; SQL node E replicates to SQL node A. In other words, the replication line (indicated by the red arrows in the diagram) directly connects all SQL nodes used as replication masters and slaves.
It is also possible to set up circular replication in such a way that not all master SQL nodes are also slaves, as shown here:
In this case, different SQL nodes in each cluster are used as
replication masters and slaves. However, you must
not start any of the SQL nodes using
--log-slave-updates. This type of
circular replication scheme for NDB Cluster, in which the line of
replication (again indicated by the red arrows in the diagram) is
discontinuous, should be possible, but it should be noted that it
has not yet been thoroughly tested and must therefore still be
considered experimental.
Using NDB-native backup and restore to initialize a slave NDB Cluster.
When setting up circular replication, it is possible to
initialize the slave cluster by using the management client
BACKUP command on one NDB Cluster to create a
backup and then applying this backup on another NDB Cluster
using ndb_restore. However, this does not
automatically create binary logs on the second NDB
Cluster's SQL node acting as the replication slave. In
order to cause the binary logs to be created, you must issue a
SHOW TABLES statement on that SQL
node; this should be done prior to running
START SLAVE.
This is a known issue which we intend to address in a future release.
Multi-master failover example. In this section, we discuss failover in a multi-master NDB Cluster replication setup with three NDB Clusters having server IDs 1, 2, and 3. In this scenario, Cluster 1 replicates to Clusters 2 and 3; Cluster 2 also replicates to Cluster 3. This relationship is shown here:
In other words, data replicates from Cluster 1 to Cluster 3 along two different routes: directly, and by way of Cluster 2.
Not all MySQL servers taking part in multi-master replication must act as both master and slave, and a given NDB Cluster might use different SQL nodes for different replication channels. Such a case is shown here:
MySQL servers acting as replication slaves must be run with the
--log-slave-updates option. Which
mysqld processes require this option is also
shown in the preceding diagram.
Using the --log-slave-updates
option has no effect on servers not being run as replication
slaves.
The need for failover arises when one of the replicating clusters goes down. In this example, we consider the case where Cluster 1 is lost to service, and so Cluster 3 loses two sources of updates from Cluster 1. Because replication between NDB Clusters is asynchronous, there is no guarantee that Cluster 3's updates originating directly from Cluster 1 are more recent than those received through Cluster 2. You can handle this by ensuring that Cluster 3 catches up to Cluster 2 with regard to updates from Cluster 1. In terms of MySQL servers, this means that you need to replicate any outstanding updates from MySQL server C to server F.
On server C, perform the following queries:
mysqlC> SELECT @latest:=MAX(epoch)
-> FROM mysql.ndb_apply_status
-> WHERE server_id=1;
mysqlC> SELECT
-> @file:=SUBSTRING_INDEX(File, '/', -1),
-> @pos:=Position
-> FROM mysql.ndb_binlog_index
-> WHERE orig_epoch >= @latest
-> AND orig_server_id = 1
-> ORDER BY epoch ASC LIMIT 1;
You can improve the performance of this query, and thus likely
speed up failover times significantly, by adding the appropriate
index to the ndb_binlog_index table. See
Section 8.4, “NDB Cluster Replication Schema and Tables”, for more
information.
Copy over the values for @file and
@pos manually from server C to server F
(or have your application perform the equivalent). Then, on server
F, execute the following CHANGE MASTER
TO statement:
mysqlF> CHANGE MASTER TO
-> MASTER_HOST = 'serverC',
-> MASTER_LOG_FILE='@file',
-> MASTER_LOG_POS=@pos;
Once this has been done, you can issue a
START SLAVE statement on MySQL
server F, and any missing updates originating from server B will
be replicated to server F.
The CHANGE MASTER TO statement also
supports an IGNORE_SERVER_IDS option which
takes a comma-separated list of server IDs and causes events
originating from the corresponding servers to be ignored. For more
information, see CHANGE MASTER TO Syntax, and
SHOW SLAVE STATUS Syntax. For information about how
this option interacts with the
ndb_log_apply_status variable,
see Section 8.8, “Implementing Failover with NDB Cluster Replication”.
When using a replication setup involving multiple masters (including circular replication), it is possible that different masters may try to update the same row on the slave with different data. Conflict resolution in NDB Cluster Replication provides a means of resolving such conflicts by permitting a user-defined resolution column to be used to determine whether or not an update on a given master should be applied on the slave.
Some types of conflict resolution supported by NDB Cluster
(NDB$OLD(), NDB$MAX(),
NDB$MAX_DELETE_WIN()) implement this
user-defined column as a “timestamp” column (although
its type cannot be TIMESTAMP, as
explained later in this section). These types of conflict
resolution are always applied on a row-by-row rather than a
transactional basis. The epoch-based conflict resolution functions
NDB$EPOCH() and
NDB$EPOCH_TRANS() compare the order in which
epochs are replicated (and thus these functions are
transactional). Different methods can be used to compare
resolution column values on the slave when conflicts occur, as
explained later in this section; the method used can be set on a
per-table basis.
You should also keep in mind that it is the application's responsibility to ensure that the resolution column is correctly populated with relevant values, so that the resolution function can make the appropriate choice when determining whether to apply an update.
Requirements. Preparations for conflict resolution must be made on both the master and the slave. These tasks are described in the following list:
On the master writing the binary logs, you must determine
which columns are sent (all columns or only those that have
been updated). This is done for the MySQL Server as a whole by
applying the mysqld startup option
--ndb-log-updated-only
(described later in this section) or on a per-table basis by
entries in the mysql.ndb_replication table
(see The ndb_replication system table).
If you are replicating tables with very large columns (such
as TEXT or
BLOB columns),
--ndb-log-updated-only can
also be useful for reducing the size of the master and slave
binary logs and avoiding possible replication failures due
to exceeding
max_allowed_packet.
See Replication and max_allowed_packet, for more information about this issue.
On the slave, you must determine which type of conflict
resolution to apply (“latest timestamp wins”,
“same timestamp wins”, “primary
wins”, “primary wins, complete
transaction”, or none). This is done using the
mysql.ndb_replication system table, on a
per-table basis (see
The ndb_replication system table).
NDB 7.4.1 and later support read conflict detection, that is,
detecting conflicts between reads of a given row in one
cluster and updates or deletes of the same row in another
cluster. This requires exclusive read locks obtained by
setting
ndb_log_exclusive_reads equal
to 1 on the slave. All rows read by a conflicting read are
logged in the exceptions table. For more information, see
Read conflict detection and resolution.
When using the functions NDB$OLD(),
NDB$MAX(), and
NDB$MAX_DELETE_WIN() for timestamp-based
conflict resolution, we often refer to the column used for
determining updates as a “timestamp” column. However,
the data type of this column is never
TIMESTAMP; instead, its data type
should be INT
(INTEGER) or
BIGINT. The
“timestamp” column should also be
UNSIGNED and NOT NULL.
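To illustrate these requirements, a minimal sketch of a table definition whose mycol column satisfies them is shown here (the table and column names are hypothetical):
CREATE TABLE test.t1 (
    id INT UNSIGNED NOT NULL PRIMARY KEY,
    data VARCHAR(30),
    mycol BIGINT UNSIGNED NOT NULL
) ENGINE=NDB;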
The NDB$EPOCH() and
NDB$EPOCH_TRANS() functions discussed later in
this section work by comparing the relative order of replication
epochs applied on a primary and secondary NDB Cluster, and do not
make use of timestamps.
Master column control.
We can see update operations in terms of “before”
and “after” images—that is, the states of the
table before and after the update is applied. Normally, when
updating a table with a primary key, the “before”
image is not of great interest; however, when we need to
determine on a per-update basis whether or not to use the
updated values on a replication slave, we need to make sure that
both images are written to the master's binary log. This is
done with the
--ndb-log-update-as-write option
for mysqld, as described later in this
section.
Whether logging of complete rows or of updated columns only is done is decided when the MySQL server is started, and cannot be changed online; you must either restart mysqld, or start a new mysqld instance with different logging options.
| Command-Line Format | --ndb-log-updated-only |
| System Variable | ndb_log_updated_only |
| Variable Scope | Global |
| Dynamic Variable | Yes |
| Type | boolean |
| Default | ON |
For purposes of conflict resolution, there are two basic methods
of logging rows, as determined by the setting of the
--ndb-log-updated-only option for
mysqld:
Log complete rows
Log only column data that has been updated—that is, column data whose value has been set, regardless of whether or not this value was actually changed. This is the default behavior.
It is usually sufficient—and more efficient—to log
updated columns only; however, if you need to log full rows, you
can do so by setting
--ndb-log-updated-only to
0 or OFF.
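One way of doing this is with a line in the master's my.cnf file, as in this minimal sketch:
[mysqld]
ndb-log-updated-only=0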
| Command-Line Format | --ndb-log-update-as-write |
| System Variable | ndb_log_update_as_write |
| Variable Scope | Global |
| Dynamic Variable | Yes |
| Type | boolean |
| Default | ON |
The setting of the MySQL Server's
--ndb-log-update-as-write option
determines whether logging is performed with or without the
“before” image. Because conflict resolution is done
in the MySQL Server's update handler, it is necessary to
control logging on the master such that updates are updates and
not writes; that is, such that updates are treated as changes in
existing rows rather than the writing of new rows (even though
these replace existing rows). This option is turned on by default;
in other words, updates are treated as writes. (That is, updates
are by default written as write_row events in
the binary log, rather than as update_row
events.)
To turn off the option, start the master mysqld
with --ndb-log-update-as-write=0 or
--ndb-log-update-as-write=OFF. You must do this
when replicating from NDB tables to tables using a different
storage engine; see
Replication from NDB to other storage engines, and
Replication from NDB to a nontransactional storage engine,
for more information.
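Because the table shown previously lists this as a dynamic system variable, it should also be possible to change the setting on a running master, as sketched here:
mysql> SET GLOBAL ndb_log_update_as_write = OFF;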
Conflict resolution control.
Conflict resolution is usually enabled on the server where
conflicts can occur. Like logging method selection, it is
enabled by entries in the
mysql.ndb_replication table.
The ndb_replication system table.
To enable conflict resolution, it is necessary to create an
ndb_replication table in the
mysql system database on the master, the
slave, or both, depending on the conflict resolution type and
method to be employed. This table is used to control logging and
conflict resolution functions on a per-table basis, and has one
row per table involved in replication.
ndb_replication is created and filled with
control information on the server where the conflict is to be
resolved. In a simple master-slave setup where data can also be
changed locally on the slave, this will typically be the slave.
In a more complex master-master (2-way) replication scheme, this
will usually be all of the masters involved. Each row in
mysql.ndb_replication corresponds to a table
being replicated, and specifies how to log and resolve conflicts
(that is, which conflict resolution function, if any, to use)
for that table. The definition of the
mysql.ndb_replication table is shown here:
CREATE TABLE mysql.ndb_replication (
db VARBINARY(63),
table_name VARBINARY(63),
server_id INT UNSIGNED,
binlog_type INT UNSIGNED,
conflict_fn VARBINARY(128),
PRIMARY KEY USING HASH (db, table_name, server_id)
) ENGINE=NDB
PARTITION BY KEY(db,table_name);
The columns in this table are described in the next few paragraphs.
db.
The name of the database containing the table to be replicated.
You may employ either or both of the wildcards
_ and % as part of the
database name. Matching is similar to what is implemented for
the LIKE operator.
table_name.
The name of the table to be replicated. The table name may
include either or both of the wildcards _ and
%. Matching is similar to what is implemented
for the LIKE operator.
server_id. The unique server ID of the MySQL instance (SQL node) where the table resides.
binlog_type. The type of binary logging to be employed. This is determined as shown in the following table:
| Value | Internal Value | Description |
|---|---|---|
| 0 | NBT_DEFAULT | Use server default |
| 1 | NBT_NO_LOGGING | Do not log this table in the binary log |
| 2 | NBT_UPDATED_ONLY | Only updated attributes are logged |
| 3 | NBT_FULL | Log full row, even if not updated (MySQL server default behavior) |
| 4 | NBT_USE_UPDATE | (For generating NBT_UPDATED_ONLY_USE_UPDATE and NBT_FULL_USE_UPDATE values only—not intended for separate use) |
| 5 | [Not used] | --- |
| 6 | NBT_UPDATED_ONLY_USE_UPDATE (equal to NBT_UPDATED_ONLY \| NBT_USE_UPDATE) | Use updated attributes, even if values are unchanged |
| 7 | NBT_FULL_USE_UPDATE (equal to NBT_FULL \| NBT_USE_UPDATE) | Use full row, even if values are unchanged |
conflict_fn. The conflict resolution function to be applied. This function must be specified as one of those shown in the following list:
NDB$OLD(column_name)
NDB$MAX(column_name)
NDB$MAX_DELETE_WIN()
NDB$EPOCH()
NDB$EPOCH_TRANS()
NDB$EPOCH2() (NDB 7.4.2 and later)
NDB$EPOCH2_TRANS() (NDB 7.4.2 and later)
NULL: Indicates that conflict resolution is
not to be used for the corresponding table.
These functions are described in the next few paragraphs.
NDB$OLD(column_name).
If the value of column_name is the
same on both the master and the slave, then the update is
applied; otherwise, the update is not applied on the slave and
an exception is written to the log. This is illustrated by the
following pseudocode:
if (master_old_column_value == slave_current_column_value)
  apply_update();
else
  log_exception();
This function can be used for “same value wins” conflict resolution. This type of conflict resolution ensures that updates are not applied on the slave from the wrong master.
The column value from the master's “before” image is used by this function.
NDB$MAX(column_name). If the “timestamp” column value for a given row coming from the master is higher than that on the slave, it is applied; otherwise it is not applied on the slave. This is illustrated by the following pseudocode:
if (master_new_column_value > slave_current_column_value)
  apply_update();
This function can be used for “greatest timestamp wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was most recently updated is the version that persists.
The column value from the master's “after” image is used by this function.
NDB$MAX_DELETE_WIN().
This is a variation on NDB$MAX(). Due to the
fact that no timestamp is available for a delete operation, a
delete using NDB$MAX() is in fact processed
as NDB$OLD. However, for some use cases, this
is not optimal. For NDB$MAX_DELETE_WIN(), if
the “timestamp” column value for a given row adding
or updating an existing row coming from the master is higher
than that on the slave, it is applied. However, delete
operations are treated as always having the higher value. This
is illustrated in the following pseudocode:
if (master_new_column_value > slave_current_column_value
    || operation.type == "delete")
  apply_update();
This function can be used for “greatest timestamp, delete wins” conflict resolution. This type of conflict resolution ensures that, in the event of a conflict, the version of the row that was deleted or (otherwise) most recently updated is the version that persists.
As with NDB$MAX(), the column value from the
master's “after” image is the value used by
this function.
NDB$EPOCH() and NDB$EPOCH_TRANS().
The NDB$EPOCH() function tracks the order in
which replicated epochs are applied on a slave NDB Cluster
relative to changes originating on the slave. This relative
ordering is used to determine whether changes originating on the
slave are concurrent with any changes that originate locally,
and are therefore potentially in conflict.
Most of what follows in the description of
NDB$EPOCH() also applies to
NDB$EPOCH_TRANS(). Any exceptions are noted in
the text.
NDB$EPOCH() is asymmetric, operating on one NDB
Cluster in a two-cluster circular replication configuration
(sometimes referred to as “active-active”
replication). We refer here to the cluster on which it operates as the
primary, and the other as the secondary. The slave on the primary
is responsible for detecting and handling conflicts, while the
slave on the secondary is not involved in any conflict detection
or handling.
When the slave on the primary detects conflicts, it injects events into its own binary log to compensate for these; this ensures that the secondary NDB Cluster eventually realigns itself with the primary and so keeps the primary and secondary from diverging. This compensation and realignment mechanism requires that the primary NDB Cluster always wins any conflicts with the secondary—that is, that the primary's changes are always used rather than those from the secondary in event of a conflict. This “primary always wins” rule has the following implications:
Operations that change data, once committed on the primary, are fully persistent and will not be undone or rolled back by conflict detection and resolution.
Data read from the primary is fully consistent. Any changes committed on the Primary (locally or from the slave) will not be reverted later.
Operations that change data on the secondary may later be reverted if the primary determines that they are in conflict.
Individual rows read on the secondary are self-consistent at all times, each row always reflecting either a state committed by the secondary, or one committed by the primary.
Sets of rows read on the secondary may not necessarily be
consistent at a given single point in time. For
NDB$EPOCH_TRANS(), this is a transient
state; for NDB$EPOCH(), it can be a
persistent state.
Assuming a period of sufficient length without any conflicts, all data on the secondary NDB Cluster (eventually) becomes consistent with the primary's data.
NDB$EPOCH() and
NDB$EPOCH_TRANS() do not require any user
schema modifications, or application changes to provide conflict
detection. However, careful thought must be given to the schema
used, and the access patterns used, to verify that the complete
system behaves within specified limits.
Each of the NDB$EPOCH() and
NDB$EPOCH_TRANS() functions can take an
optional parameter; this is the number of bits to use to represent
the lower 32 bits of the epoch, and should be set to no less than
CEIL( LOG2(TimeBetweenGlobalCheckpoints/TimeBetweenEpochs), 1)
For the default values of these configuration parameters (2000 and
100 milliseconds, respectively), this gives a value of 5 bits, so
the default value (6) should be sufficient, unless other values
are used for
TimeBetweenGlobalCheckpoints,
TimeBetweenEpochs, or
both. A value that is too small can result in false positives,
while one that is too large could lead to excessive wasted space
in the database.
Both NDB$EPOCH() and
NDB$EPOCH_TRANS() insert entries for
conflicting rows into the relevant exceptions tables, provided
that these tables have been defined according to the same
exceptions table schema rules as described elsewhere in this
section (see NDB$OLD(column_name)).
You need to create any exceptions table before creating the table
with which it is to be used.
As with the other conflict detection functions discussed in this
section, NDB$EPOCH() and
NDB$EPOCH_TRANS() are activated by including
relevant entries in the mysql.ndb_replication
table (see The ndb_replication system table).
The roles of the primary and secondary NDB Clusters in this
scenario are fully determined by
mysql.ndb_replication table entries.
Because the conflict detection algorithms employed by
NDB$EPOCH() and
NDB$EPOCH_TRANS() are asymmetric, you must use
different values for the primary slave's and secondary
slave's server_id entries.
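As an illustrative sketch (the table name and server IDs here are hypothetical), the slave on the primary, with server ID 1, might be given the conflict detection role, while the row for the secondary's slave, with server ID 2, specifies no conflict function:
INSERT INTO mysql.ndb_replication
    VALUES ('test', 't1', 1, NULL, 'NDB$EPOCH()');
INSERT INTO mysql.ndb_replication
    VALUES ('test', 't1', 2, NULL, NULL);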
Prior to NDB 7.3.6, conflicts between DELETE
operations were handled like those for UPDATE
operations, and within the same epoch were considered in conflict.
In NDB 7.3.6 and later, a conflict between
DELETE operations alone is not sufficient to
trigger a conflict using NDB$EPOCH() or
NDB$EPOCH_TRANS(), and the relative placement
within epochs does not matter. (Bug #18459944)
Conflict detection status variables.
Several status variables can be used to monitor conflict
detection. You can see how many rows have been found in conflict
by NDB$EPOCH() since this slave was last
restarted from the current value of the
Ndb_conflict_fn_epoch system
status variable.
Ndb_conflict_fn_epoch_trans
provides the number of rows that have been found directly in
conflict by NDB$EPOCH_TRANS().
Ndb_conflict_fn_epoch2 and
Ndb_conflict_fn_epoch2_trans,
added in NDB 7.4.2, show the number of rows found in conflict by
NDB$EPOCH2() and
NDB$EPOCH2_TRANS(), respectively. The number of
rows actually realigned, including those affected due to their
membership in or dependency on the same transactions as other
conflicting rows, is given by
Ndb_conflict_trans_row_reject_count.
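All of these counters can be inspected together on a given SQL node using a statement such as the one shown here:
mysql> SHOW GLOBAL STATUS LIKE 'Ndb_conflict%';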
For more information, see Section 5.3.8.3, “NDB Cluster Status Variables”.
Limitations on NDB$EPOCH().
The following limitations currently apply when using
NDB$EPOCH() to perform conflict detection:
Conflicts are detected using NDB Cluster epoch boundaries,
with granularity proportional to
TimeBetweenEpochs
(default: 100 milliseconds). The minimum conflict window is
the minimum time during which concurrent updates to the same
data on both clusters always report a conflict. This is always
a nonzero length of time, and is roughly proportional to
2 * (latency + queueing +
TimeBetweenEpochs). This implies that—assuming
the default for
TimeBetweenEpochs and
ignoring any latency between clusters (as well as any queuing
delays)—the minimum conflict window size is
approximately 200 milliseconds. This minimum window should be
considered when looking at expected application
“race” patterns.
Additional storage is required for tables using the
NDB$EPOCH() and
NDB$EPOCH_TRANS() functions; from 1 to 32
bits extra space per row is required, depending on the value
passed to the function.
Conflicts between delete operations may result in divergence between the primary and secondary. When a row is deleted on both clusters concurrently, the conflict can be detected, but is not recorded, since the row is deleted. This means that further conflicts during the propagation of any subsequent realignment operations will not be detected, which can lead to divergence.
Deletes should be externally serialized, or routed to one cluster only. Alternatively, a separate row should be updated transactionally with such deletes and any inserts that follow them, so that conflicts can be tracked across row deletes. This may require changes in applications.
Only two NDB Clusters in a circular
“active-active” configuration are currently
supported when using NDB$EPOCH() or
NDB$EPOCH_TRANS() for conflict detection.
Tables having BLOB or
TEXT columns are not currently
supported with NDB$EPOCH() or
NDB$EPOCH_TRANS().
NDB$EPOCH_TRANS().
NDB$EPOCH_TRANS() extends the
NDB$EPOCH() function. Conflicts are detected
and handled in the same way using the “primary wins
all” rule (see
NDB$EPOCH() and NDB$EPOCH_TRANS()) but with
the extra condition that any other rows updated in the same
transaction in which the conflict occurred are also regarded as
being in conflict. In other words, where
NDB$EPOCH() realigns individual conflicting
rows on the secondary, NDB$EPOCH_TRANS()
realigns conflicting transactions.
In addition, any transactions which are detectably dependent on a conflicting transaction are also regarded as being in conflict, these dependencies being determined by the contents of the secondary cluster's binary log. Since the binary log contains only data modification operations (inserts, updates, and deletes), only overlapping data modifications are used to determine dependencies between transactions.
NDB$EPOCH_TRANS() is subject to the same
conditions and limitations as NDB$EPOCH(), and
in addition requires that Version 2 binary log row events are used
(--log-bin-use-v1-row-events equal
to 0), which adds a storage overhead of 2 bytes per event in the
binary log. In addition, all transaction IDs must be recorded in
the secondary's binary log
(--ndb-log-transaction-id option),
which adds a further variable overhead (up to 13 bytes per row).
See NDB$EPOCH() and NDB$EPOCH_TRANS().
Status information.
A server status variable
Ndb_conflict_fn_max provides a
count of the number of times that a row was not applied on the
current SQL node due to “greatest timestamp wins”
conflict resolution since the last time that
mysqld was started.
The number of times that a row was not applied as the result of
“same timestamp wins” conflict resolution on a given
mysqld since the last time it was restarted is
given by the global status variable
Ndb_conflict_fn_old. In addition
to incrementing
Ndb_conflict_fn_old, the primary
key of the row that was not used is inserted into an
exceptions table, as
explained later in this section.
NDB$EPOCH2().
The NDB$EPOCH2() function, added in NDB
7.4.2, is similar to NDB$EPOCH(), except that
NDB$EPOCH2() provides for delete-delete
handling with a circular replication
(“master-master”) topology. In this scenario,
primary and secondary roles are assigned to the two masters by
setting the
ndb_slave_conflict_role system
variable to the appropriate value on each master (usually one
each of PRIMARY,
SECONDARY). When this is done, modifications
made by the secondary are reflected by the primary back to the
secondary which then conditionally applies them.
NDB$EPOCH2_TRANS().
In NDB 7.4.2 and later, NDB$EPOCH2_TRANS()
extends the NDB$EPOCH2() function. Conflicts
are detected and handled in the same way, and assigning primary
and secondary roles to the replicating clusters, but with the
extra condition that any other rows updated in the same
transaction in which the conflict occurred are also regarded as
being in conflict. That is, NDB$EPOCH2()
realigns individual conflicting rows on the secondary, while
NDB$EPOCH2_TRANS() realigns conflicting
transactions.
NDB$EPOCH() and
NDB$EPOCH_TRANS() use metadata that is
specified per row, per last-modified epoch, to determine on the
primary whether an incoming replicated row change from the
secondary is concurrent with a locally committed change;
concurrent changes are regarded as conflicting, with subsequent
exceptions table updates and realignment of the secondary. A
problem arises when a row is deleted on the primary: there is
then no longer any last-modified epoch available to determine
whether any replicated operations conflict, which means that
conflicting delete operations are not detected. This can result
in divergence, an example being a delete on one cluster which is
concurrent with a delete and insert on the other; this is why
delete operations can be routed to only one cluster when using
NDB$EPOCH() and
NDB$EPOCH_TRANS().
NDB$EPOCH2() bypasses the issue just
described—storing information about deleted rows on the
PRIMARY—by ignoring any delete-delete conflict, and by
avoiding any potential resultant divergence as well. This is
accomplished by reflecting any operation successfully applied on
and replicated from the secondary back to the secondary. On its
return to the secondary, it can be used to reapply an operation on
the secondary which was deleted by an operation originating from
the primary.
When using NDB$EPOCH2(), you should keep in
mind that the secondary applies the delete from the primary,
removing the new row until it is restored by a reflected
operation. In theory, the subsequent insert or update on the
secondary conflicts with the delete from the primary, but in this
case, we choose to ignore this and allow the secondary to
“win”, in the interest of preventing divergence
between the clusters. In other words, after a delete, the primary
does not detect conflicts, and instead adopts the secondary's
following changes immediately. Because of this, the
secondary's state can revisit multiple previous committed
states as it progresses to a final (stable) state, and some of
these may be visible.
You should also be aware that reflecting all operations from the secondary back to the primary increases the size of the primary's binary log, as well as demands on bandwidth, CPU usage, and disk I/O.
Application of reflected operations on the secondary depends on
the state of the target row on the secondary. Whether or not
reflected changes are applied on the secondary can be tracked by
checking the
Ndb_conflict_reflected_op_prepare_count
and
Ndb_conflict_reflected_op_discard_count
status variables (both added in NDB 7.4.2). The number of changes
applied is simply the difference between these two values (note
that Ndb_conflict_reflected_op_prepare_count is
always greater than or equal to
Ndb_conflict_reflected_op_discard_count).
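As a sketch, this difference can be computed in a single query against the INFORMATION_SCHEMA.GLOBAL_STATUS table (the alias used here is arbitrary):
SELECT
    (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS
      WHERE VARIABLE_NAME = 'NDB_CONFLICT_REFLECTED_OP_PREPARE_COUNT')
  - (SELECT VARIABLE_VALUE FROM INFORMATION_SCHEMA.GLOBAL_STATUS
      WHERE VARIABLE_NAME = 'NDB_CONFLICT_REFLECTED_OP_DISCARD_COUNT')
    AS reflected_ops_applied;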
Events are applied if and only if both of the following conditions are true:
The existence of the row—that is, whether or not it exists—is in accordance with the type of event. For delete and update operations, the row must already exist; for insert operations, the row must not exist.
The row was last modified by the primary. It is possible that the modification was accomplished through the execution of a reflected operation.
If these conditions are not both met, the reflected operation is discarded by the secondary.
Conflict resolution exceptions table.
To use the NDB$OLD() conflict resolution
function, it is also necessary to create an exceptions table
corresponding to each NDB table for
which this type of conflict resolution is to be employed. This
is also true when using NDB$EPOCH() or
NDB$EPOCH_TRANS(). The name of this table is
that of the table for which conflict resolution is to be
applied, with the string $EX appended. (For
example, if the name of the original table is
mytable, the name of the corresponding
exceptions table should be mytable$EX.)
Prior to NDB 7.4.1, this table is created as shown:
CREATE TABLE original_table$EX (
    server_id INT UNSIGNED,
    master_server_id INT UNSIGNED,
    master_epoch BIGINT UNSIGNED,
    count INT UNSIGNED,
    original_table_pk_columns,
    [additional_columns,]
    PRIMARY KEY(server_id, master_server_id, master_epoch, count)
) ENGINE=NDB;
NDB 7.4.1 and later support an extended exceptions table definition that includes optional columns providing information about an exception's type, cause, and originating transaction. In these versions, the syntax for creating the exceptions table is as shown here:
CREATE TABLE original_table$EX (
    [NDB$]server_id INT UNSIGNED,
    [NDB$]master_server_id INT UNSIGNED,
    [NDB$]master_epoch BIGINT UNSIGNED,
    [NDB$]count INT UNSIGNED,
    [NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
        'REFRESH_ROW', 'READ_ROW') NOT NULL,]
    [NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
        'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,]
    [NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,]
    original_table_pk_columns,
    [orig_table_column | orig_table_column$OLD | orig_table_column$NEW,]
    [additional_columns,]
    PRIMARY KEY([NDB$]server_id, [NDB$]master_server_id,
        [NDB$]master_epoch, [NDB$]count)
) ENGINE=NDB;
The first four columns are required. The names of the first four
columns and the columns matching the original table's primary
key columns are not critical; however, we suggest for reasons of
clarity and consistency, that you use the names shown here for the
server_id, master_server_id,
master_epoch, and count
columns, and that you use the same names as in the original table
for the columns matching those in the original table's
primary key.
Starting with NDB 7.4.1, if the exceptions table uses one or more
of the optional columns NDB$OP_TYPE,
NDB$CFT_CAUSE, or
NDB$ORIG_TRANSID discussed later in this
section, then each of the required columns must also be named
using the prefix NDB$. If desired, you can use
the NDB$ prefix to name the required columns
even if you do not define any optional columns, but in this case,
all four of the required columns must be named using the prefix.
Following these columns, the columns making up the original table's primary key should be copied in the order in which they are used to define the primary key of the original table. The data types for the columns duplicating the primary key columns of the original table should be the same as (or larger than) those of the original columns. In NDB Cluster 7.3 and earlier, the exceptions table's primary key must be reproduced column for column. Beginning with NDB 7.4.1, a subset of the primary key columns may be used instead.
Regardless of the NDB Cluster version employed, the exceptions
table must use the NDB storage
engine. (An example that uses NDB$OLD() with an
exceptions table is shown later in this section.)
Additional columns may optionally be defined following the copied
primary key columns, but not before any of them; any such extra
columns cannot be NOT NULL. In NDB 7.4.1 and
later, support is provided for three additional, predefined
optional columns NDB$OP_TYPE,
NDB$CFT_CAUSE, and
NDB$ORIG_TRANSID, which are described in the
next few paragraphs.
NDB$OP_TYPE: This column can be used to obtain
the type of operation causing the conflict. If you use this
column, define it as shown here:
NDB$OP_TYPE ENUM('WRITE_ROW', 'UPDATE_ROW', 'DELETE_ROW',
'REFRESH_ROW', 'READ_ROW') NOT NULL
The WRITE_ROW, UPDATE_ROW,
and DELETE_ROW operation types represent
user-initiated operations. REFRESH_ROW
operations are operations generated by conflict resolution in
compensating transactions sent back to the originating cluster
from the cluster that detected the conflict.
READ_ROW operations are user-initiated read
tracking operations defined with exclusive row locks.
NDB$CFT_CAUSE: You can define an optional
column NDB$CFT_CAUSE which provides the cause
of the registered conflict. This column, if used, is defined as
shown here:
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL
ROW_DOES_NOT_EXIST can be reported as the cause
for UPDATE_ROW and WRITE_ROW
operations; ROW_ALREADY_EXISTS can be reported
for WRITE_ROW events.
DATA_IN_CONFLICT is reported when a row-based
conflict function detects a conflict;
TRANS_IN_CONFLICT is reported when a
transactional conflict function rejects all of the operations
belonging to a complete transaction.
NDB$ORIG_TRANSID: The
NDB$ORIG_TRANSID column, if used, contains the
ID of the originating transaction. This column should be defined
as follows:
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL
NDB$ORIG_TRANSID is a 64-bit value generated by
NDB. This value can be used to correlate
multiple exceptions table entries belonging to the same
conflicting transaction from the same or different exceptions
tables.
In NDB 7.4.1 and later, additional reference columns which are not
part of the original table's primary key can be named
colname$OLD or colname$NEW.
colname$OLD references old values in update and delete
operations—that is, operations containing
DELETE_ROW events.
colname$NEW can be used to reference new values in
insert and update operations—in other words, operations using
WRITE_ROW events, UPDATE_ROW
events, or both types of events. Where a conflicting operation
does not supply a value for a given non-primary-key reference
column, the exceptions table row contains either
NULL, or a defined default value for that
column.
The mysql.ndb_replication table is read when
a data table is set up for replication, so the row corresponding
to a table to be replicated must be inserted into
mysql.ndb_replication
before the table to be replicated is
created.
The following examples assume that you already have a working NDB Cluster replication setup, as described in Section 8.5, “Preparing the NDB Cluster for Replication”, and Section 8.6, “Starting NDB Cluster Replication (Single Replication Channel)”.
NDB$MAX() example.
Suppose you wish to enable “greatest timestamp
wins” conflict resolution on table
test.t1, using column
mycol as the “timestamp”. This
can be done using the following steps:
Make sure that you have started the master
mysqld with
--ndb-log-update-as-write=OFF.
On the master, perform this
INSERT statement:
INSERT INTO mysql.ndb_replication
VALUES ('test', 't1', 0, NULL, 'NDB$MAX(mycol)');
Inserting a 0 into the
server_id column indicates that all
SQL nodes accessing this table should use conflict resolution.
If you want to use conflict resolution on a specific
mysqld only, use the actual server ID.
Inserting NULL into the
binlog_type column has the same effect as
inserting 0 (NBT_DEFAULT); the server
default is used.
Create the test.t1 table:
CREATE TABLE test.t1 (
columns
mycol INT UNSIGNED,
columns
) ENGINE=NDB;
Now, when updates are done on this table, conflict resolution
is applied, and the version of the row having the greatest
value for mycol is written to the slave.
Other binlog_type options, such as
NBT_UPDATED_ONLY_USE_UPDATE, should be used to
control logging on the master using the
ndb_replication table rather than by using
command-line options.
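For example, a hypothetical ndb_replication row combining binlog_type 6 (NBT_UPDATED_ONLY_USE_UPDATE) with the NDB$MAX() function might be inserted as shown here:
INSERT INTO mysql.ndb_replication
    VALUES ('test', 't1', 0, 6, 'NDB$MAX(mycol)');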
NDB$OLD() example.
Suppose an NDB table such as the
one defined here is being replicated, and you wish to enable
“same timestamp wins” conflict resolution for
updates to this table:
CREATE TABLE test.t2 (
a INT UNSIGNED NOT NULL,
b CHAR(25) NOT NULL,
columns,
mycol INT UNSIGNED NOT NULL,
columns,
PRIMARY KEY pk (a, b)
) ENGINE=NDB;
The following steps are required, in the order shown:
First—and prior to creating
test.t2—you must insert a row into
the
mysql.ndb_replication
table, as shown here:
INSERT INTO mysql.ndb_replication
VALUES ('test', 't2', 0, NULL, 'NDB$OLD(mycol)');
Possible values for the binlog_type column
are shown earlier in this section. The value
'NDB$OLD(mycol)' should be inserted into
the conflict_fn column.
Create an appropriate exceptions table for
test.t2. The table creation statement shown
here includes all required columns; any additional columns
must be declared following these columns, and before the
definition of the table's primary key.
CREATE TABLE test.t2$EX (
server_id INT UNSIGNED,
master_server_id INT UNSIGNED,
master_epoch BIGINT UNSIGNED,
count INT UNSIGNED,
a INT UNSIGNED NOT NULL,
b CHAR(25) NOT NULL,
[additional_columns,]
PRIMARY KEY(server_id, master_server_id, master_epoch, count)
) ENGINE=NDB;
In NDB 7.4.1 and later, we can include additional columns for information about the type, cause, and originating transaction ID for a given conflict. We are also not required to supply matching columns for all primary key columns in the original table. In these versions, you can create the exceptions table like this:
CREATE TABLE test.t2$EX (
NDB$server_id INT UNSIGNED,
NDB$master_server_id INT UNSIGNED,
NDB$master_epoch BIGINT UNSIGNED,
NDB$count INT UNSIGNED,
a INT UNSIGNED NOT NULL,
NDB$OP_TYPE ENUM('WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
'REFRESH_ROW', 'READ_ROW') NOT NULL,
NDB$CFT_CAUSE ENUM('ROW_DOES_NOT_EXIST', 'ROW_ALREADY_EXISTS',
'DATA_IN_CONFLICT', 'TRANS_IN_CONFLICT') NOT NULL,
NDB$ORIG_TRANSID BIGINT UNSIGNED NOT NULL,
[additional_columns,]
PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)
) ENGINE=NDB;
The NDB$ prefix is required for the four
required columns since we included at least one of the
columns NDB$OP_TYPE,
NDB$CFT_CAUSE, or
NDB$ORIG_TRANSID in the table definition.
Create the table test.t2 as shown
previously.
These steps must be followed for every table for which you wish to
perform conflict resolution using NDB$OLD().
For each such table, there must be a corresponding row in
mysql.ndb_replication, and there must be an
exceptions table in the same database as the table being
replicated.
Read conflict detection and resolution.
NDB 7.4.1 and later support tracking of read operations, which
makes it possible in circular replication setups to manage
conflicts between reads of a given row in one cluster and
updates or deletes of the same row in another. This example uses
employee and department
tables to model a scenario in which an employee is moved from
one department to another on the master cluster (which we refer
to hereafter as cluster A) while the slave
cluster (hereafter B) updates the employee
count of the employee's former department in an interleaved
transaction.
The data tables have been created using the following SQL statements:
# Employee table
CREATE TABLE employee (
id INT PRIMARY KEY,
name VARCHAR(2000),
dept INT NOT NULL
) ENGINE=NDB;
# Department table
CREATE TABLE department (
id INT PRIMARY KEY,
name VARCHAR(2000),
members INT
) ENGINE=NDB;
The contents of the two tables include the rows shown in the
(partial) output of the following
SELECT statements:
mysql> SELECT id, name, dept FROM employee;
+------+--------+------+
| id   | name   | dept |
+------+--------+------+
...
|  998 | Mike   |    3 |
|  999 | Joe    |    3 |
| 1000 | Mary   |    3 |
...
+------+--------+------+

mysql> SELECT id, name, members FROM department;
+-----+-------------+---------+
| id  | name        | members |
+-----+-------------+---------+
...
|   3 | Old project |      24 |
...
+-----+-------------+---------+
We assume that we are already using an exceptions table that includes the four required columns (and these are used for this table's primary key), the optional columns for operation type and cause, and the original table's primary key column, created using the SQL statement shown here:
CREATE TABLE employee$EX (
NDB$server_id INT UNSIGNED,
NDB$master_server_id INT UNSIGNED,
NDB$master_epoch BIGINT UNSIGNED,
NDB$count INT UNSIGNED,
NDB$OP_TYPE ENUM( 'WRITE_ROW','UPDATE_ROW', 'DELETE_ROW',
'REFRESH_ROW','READ_ROW') NOT NULL,
NDB$CFT_CAUSE ENUM( 'ROW_DOES_NOT_EXIST',
'ROW_ALREADY_EXISTS',
'DATA_IN_CONFLICT',
'TRANS_IN_CONFLICT') NOT NULL,
id INT NOT NULL,
PRIMARY KEY(NDB$server_id, NDB$master_server_id, NDB$master_epoch, NDB$count)
) ENGINE=NDB;
Suppose that two simultaneous transactions occur on the two clusters. On cluster A, we create a new department, then move employee number 999 into that department, using the following SQL statements:
BEGIN;
INSERT INTO department VALUES (4, "New project", 1);
UPDATE employee SET dept = 4 WHERE id = 999;
COMMIT;
At the same time, on cluster B, another
transaction reads from employee, as shown here:
BEGIN;
SELECT name FROM employee WHERE id = 999;
UPDATE department SET members = members - 1 WHERE id = 3;
COMMIT;
The conflicting transactions are not normally detected by the
conflict resolution mechanism, since the conflict is between a
read (SELECT) and an update operation.
Beginning with NDB 7.4.1, we can circumvent this issue by
executing
SET
ndb_log_exclusive_reads
= 1 on the slave cluster. Acquiring exclusive
read locks in this way causes any rows read on the master to be
flagged as needing conflict resolution on the slave cluster. If we
enable exclusive reads in this way prior to the logging of these
transactions, the read on cluster B is
tracked and sent to cluster A for resolution;
the conflict on the employee row will be detected and the
transaction on cluster B is aborted.
The conflict is registered in the exceptions table (on cluster
A) as a READ_ROW operation
(see Conflict resolution exceptions table,
for a description of operation types), as shown here:
mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
+-------+-------------+-------------------+
| id | NDB$OP_TYPE | NDB$CFT_CAUSE |
+-------+-------------+-------------------+
...
| 999 | READ_ROW | TRANS_IN_CONFLICT |
+-------+-------------+-------------------+
Any existing rows found by the read operation are flagged. This means that multiple rows resulting from the same conflict may be logged in the exceptions table, as shown by examining the effects of a conflict between an update on cluster A and a read of multiple rows on cluster B from the same table in simultaneous transactions. The transaction executed on cluster A is shown here:
BEGIN;
INSERT INTO department VALUES (4, "New project", 0);
UPDATE employee SET dept = 4 WHERE dept = 3;
SELECT COUNT(*) INTO @count FROM employee WHERE dept = 4;
UPDATE department SET members = @count WHERE id = 4;
COMMIT;
Concurrently a transaction containing the statements shown here runs on cluster B:
SET ndb_log_exclusive_reads = 1; # Must be set if not already enabled
...
BEGIN;
SELECT COUNT(*) INTO @count FROM employee WHERE dept = 3 FOR UPDATE;
UPDATE department SET members = @count WHERE id = 3;
COMMIT;
In this case, all three rows matching the WHERE
condition in the second transaction's
SELECT are read, and are thus flagged in the
exceptions table, as shown here:
mysql> SELECT id, NDB$OP_TYPE, NDB$CFT_CAUSE FROM employee$EX;
+-------+-------------+-------------------+
| id | NDB$OP_TYPE | NDB$CFT_CAUSE |
+-------+-------------+-------------------+
...
| 998 | READ_ROW | TRANS_IN_CONFLICT |
| 999 | READ_ROW | TRANS_IN_CONFLICT |
| 1000 | READ_ROW | TRANS_IN_CONFLICT |
...
+-------+-------------+-------------------+
Read tracking is performed on the basis of existing rows only. A read based on a given condition tracks conflicts only for any rows that are found, not for any rows that are inserted in an interleaved transaction. This is similar to how exclusive row locking is performed in a single instance of NDB Cluster.