If you have followed the instructions but your replication setup is not working, the first thing to do is check the error log for messages. Many users have lost time by not doing this soon enough after encountering problems.
If you cannot tell from the error log what the problem was, try the following techniques:
Verify that the master has binary logging enabled by issuing a SHOW MASTER STATUS statement. If logging is enabled, Position is nonzero. If binary logging is not enabled, verify that you are running the master with the --log-bin option.
Verify that the master and slave both were started with the --server-id option and that the ID value is unique on each server.
Verify that the slave is running. Use SHOW SLAVE STATUS to check whether the Slave_IO_Running and Slave_SQL_Running values are both Yes. If not, verify the options that were used when starting the slave server. For example, --skip-slave-start prevents the slave threads from starting until you issue a START SLAVE statement.
If the slave is running, check whether it established a connection to the master. Use SHOW PROCESSLIST, find the I/O and SQL threads and check their State column to see what they display. See Section 16.2.2, “Replication Implementation Details”. If the I/O thread state says Connecting to master, check the following:
Verify the privileges for the user being used for replication on the master.
Check that the host name of the master is correct and that you are using the correct port to connect to the master. The port used for replication is the same as used for client network communication (the default is 3306). For the host name, ensure that the name resolves to the correct IP address.
Check that networking has not been disabled on the master or slave. Look for the skip-networking option in the configuration file. If present, comment it out or remove it.
If the master has a firewall or IP filtering configuration, ensure that the network port being used for MySQL is not being filtered.
Check that you can reach the master by using ping or traceroute/tracert to reach the host.
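The thread check described above can be scripted. The following is a minimal sketch, assuming you have captured the vertical output of SHOW SLAVE STATUS from the mysql client as text. The function names parse_vertical_status and slave_threads_running are hypothetical, and the sample text is illustrative rather than real server output.

```python
def parse_vertical_status(text):
    """Parse vertical-format mysql client output into a dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        # Skip row separators (lines of asterisks) and anything without a colon.
        if line.lstrip().startswith("*") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def slave_threads_running(status):
    """Return True only if both replication threads report Yes."""
    return (status.get("Slave_IO_Running") == "Yes"
            and status.get("Slave_SQL_Running") == "Yes")


# Illustrative sample (an assumption, not captured from a real server).
sample = """\
*************************** 1. row ***************************
        Slave_IO_Running: No
       Slave_SQL_Running: Yes
"""
status = parse_vertical_status(sample)
print(slave_threads_running(status))
```

If the function returns False, continue with the checks in the list above, starting with the options used to start the slave.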
If the slave was running previously but has stopped, the reason usually is that some statement that succeeded on the master failed on the slave. This should never happen if you have taken a proper snapshot of the master, and never modified the data on the slave outside of the slave thread. If the slave stops unexpectedly, it is a bug or you have encountered one of the known replication limitations described in Section 16.4.1, “Replication Features and Issues”. If it is a bug, see Section 16.4.5, “How to Report Replication Bugs or Problems”, for instructions on how to report it.
If a statement that succeeded on the master refuses to run on the slave, try the following procedure if it is not feasible to do a full database resynchronization by deleting the slave's databases and copying a new snapshot from the master:
Determine whether the affected table on the slave is different from the master table. Try to understand how this happened. Then make the slave's table identical to the master's and run START SLAVE.
If the preceding step does not work or does not apply, try to understand whether it would be safe to make the update manually (if needed) and then ignore the next statement from the master.
If you decide that the slave can skip the next statement from the master, issue the following statements:
mysql> SET GLOBAL sql_slave_skip_counter = N;
mysql> START SLAVE;
The value of N should be 1 if the next statement from the master does not use AUTO_INCREMENT or LAST_INSERT_ID(). Otherwise, the value should be 2. The reason for using a value of 2 for statements that use AUTO_INCREMENT or LAST_INSERT_ID() is that they take two events in the binary log of the master.
See also Section 13.4.2.5, “SET GLOBAL sql_slave_skip_counter Syntax”.
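The rule for choosing N can be written down as a small helper. This is only a sketch: the function name skip_statements is hypothetical, and the flag must be supplied by the operator after inspecting the failing statement, because the sketch does not try to analyze SQL itself.

```python
def skip_statements(uses_auto_increment_or_last_insert_id):
    """Build the statements for skipping the next statement from the master.

    Per the rule above, the counter is 2 when the statement uses
    AUTO_INCREMENT or LAST_INSERT_ID(), and 1 otherwise.
    """
    n = 2 if uses_auto_increment_or_last_insert_id else 1
    return [f"SET GLOBAL sql_slave_skip_counter = {n};", "START SLAVE;"]


for statement in skip_statements(False):
    print(statement)
```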
If you are sure that the slave started out perfectly synchronized with the master, and that no one has updated the tables involved outside of the slave thread, then presumably the discrepancy is the result of a bug. If you are running the most recent version of MySQL, please report the problem. If you are running an older version, try upgrading to the latest production release to determine whether the problem persists.
dbname", binlog-do-db=dbname will not binlog a query like "update dbname.foobar set foo=1". It looks like you explicitly have to do a USE before a query in order to have it binlogged. Replication on the slave side can do wildcard matches, but the master cannot (there is nothing like binlog-wild-do-table=dbname.%). So make sure your clients do a USE if you plan to replicate the tables they update.
When you use a certain master-user/master-password combination to connect to the MySQL master, and you later change my.cnf to connect with a new user, you must also update master.info to reflect the change. Since my master.info file only had one entry in it (the slave only has one master), I simply deleted the file. Upon restarting the slave daemon, a new master.info was automatically written.
--Curby
script as I did, and you are familiar with Perl,
look at the Log::Rotate module on CPAN rather
than reinventing the wheel.
modularized and abstracted code" and refers to safe_reader_query() and safe_writer_query(), I'd like to put forth the proposition that separate abstracted functions for reading and writing are not necessary for compliance with replication. We currently have all of our queries running through one safe-sql envelope, and we intend to keep this architecture as we move to replication by telling our envelope to send any queries that begin with /^\s*select/i to the slave, and anything else to the master. We are running under the assumption that all "read queries" are selects. I can't think of any that aren't. If there were, they would probably be nominal in performance draw (we use many complex select statements :) and wouldn't do any harm being handled by the master anyway.
As far as multi-statement queries go, our code doesn't, and won't, mix selects in the same query with non-selects, or probably even use more than one select per query, since we're not certain what results could be returned in such a circumstance. Thus reading and writing should never get mixed up in the same query, and all reads should start with the word "Select".
If you feel that my theory holds water then give it a go; if you see a flaw in my logic before I do, mail me and lemme know, eh? 10x :)
- - Jess
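The routing rule described in this comment can be sketched as follows. The function name route is hypothetical, and the pattern is anchored at the start of the query and allows zero leading whitespace, which is an assumption about what the comment intends.

```python
import re

# Matches queries whose first word is SELECT, ignoring case and leading
# whitespace. The anchoring is an assumption about the comment's intent.
READ_QUERY = re.compile(r"^\s*select\b", re.IGNORECASE)


def route(query):
    """Return "slave" for queries that start with SELECT, else "master"."""
    return "slave" if READ_QUERY.match(query) else "master"


print(route("  SELECT id FROM t"))
print(route("UPDATE t SET a = 1"))
```

One flaw to watch for: not every statement that starts with SELECT is a pure read (SELECT ... FOR UPDATE and SELECT ... INTO OUTFILE are examples), and a read sent to the slave may not yet see a row that was just written on the master.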
been using binary logging for some time on the
master. I had already removed $hostname-bin.001
long ago. The slave complained about not being
able to find the first log and would not start
replaying transactions. To fix this, I stopped
the master, made a new snapshot, moved "*-bin.*"
to another directory, and started the master
again. Then when I put the snapshot on the slave
and started it, everything worked correctly.
If you want your slaves to connect to the replication server with a unique username & password and minimal privileges, you need to grant just the FILE privilege to your replication user (as of MySQL 4.0.2, the REPLICATION SLAVE privilege is used instead). Similarly, if your slaves have no local updates made on them, just lots of selects, it's a very good idea to connect with a user that cannot update the data. This stops dead any chance of mistakenly connecting to the wrong DB and losing updates.
to work following the instructions above. Specifically, I got errors when I created the "repl" user before copying data to the slave. I could only get replication to work if I created the "repl" user after copying data to the slave - and obviously starting both servers...
have to make the changes in the my.ini file as
well. It took me a while to realize this.
raise - I have two servers doing 2-way replication. One is on Linux, and one is on Windows. There is an issue of case sensitivity: if case is not taken into consideration on the Windows machine, the slave on the Linux machine stops. I'd love to hear any fixes to this.
in my case I had not actually deleted any of the binary log files which the slave still required to update from (I knew this courtesy of the "show slave status" command). But I had deleted several log files it had finished with. I was able to fix this without making a brand new snapshot (which in my case would have taken a larger outage than I would like). What I did was edit the file $hostname-bin.index (in the master's logs directory) so that its entries show the exact filenames of my remaining binary log files (i.e., I removed the entries matching the ones I had deleted manually from the file system). I quickly did a mysql stop and mysql start after that, performed the command "slave stop; slave start;" on the slave, and it started replicating again. I am running version 3.23.41.
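The manual edit described in this comment amounts to keeping only the index entries whose files still exist. A minimal sketch follows, assuming the index file holds one log file name per line. The function name surviving_entries and the host-bin.* file names are hypothetical, and the demonstration runs against a temporary directory rather than a real data directory.

```python
import os
import tempfile


def surviving_entries(index_lines, log_dir):
    """Return index entries whose binary log file still exists on disk."""
    kept = []
    for line in index_lines:
        entry = line.strip()
        if not entry:
            continue
        # Entries may be relative (for example "./host-bin.002") or absolute.
        path = entry if os.path.isabs(entry) else os.path.join(log_dir, entry)
        if os.path.exists(path):
            kept.append(entry)
    return kept


# Demonstration with made-up file names in a throwaway directory.
with tempfile.TemporaryDirectory() as log_dir:
    for name in ("host-bin.003", "host-bin.004"):
        open(os.path.join(log_dir, name), "w").close()
    index = ["./host-bin.001\n", "./host-bin.002\n",
             "./host-bin.003\n", "./host-bin.004\n"]
    kept = surviving_entries(index, log_dir)
    print(kept)
```

The sketch only computes the list. As in the comment, any rewrite of the real index file should happen while the master is stopped, and keeping a copy of the original file first is prudent.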
When a slave is running you have a master.info file. If you change the slave to become the master you must delete the master.info file or you may get errors such as this:
030130 10:57:03 Slave thread: error connecting to master: Unknown MySQL Server Host '' (4) (107), retry in 60 sec
It will keep retrying every 60 sec. Just deleting the master.info file worked for me.
Still haven't figured out what happened.
the partition where you put your binlogs. One of two things
will happen:
1) the binlog will get truncated in the middle of an event, and the IO thread on the replication slave(s) will halt.
2) A series of transactions will go unrecorded on the master
binlog.
Either way, you'll have to reload the data from the replication master to the slave.
040308 13:38:01 Slave: reconnected to master 'repl@master:3306',replication resumed in log 'master-bin.011' at position 83792258
040308 13:38:01 Error reading packet from server: Access denied for user: 'repl@slave' (Using password: YES) (server_errno=1045)
It turns out that I'd installed MySQL, then changed the hostname of the slave server; *however*, the slave server's mysql.user table still contained the old hostname.
Changing the host for the root user in the slave's mysql.user table to the correct hostname, then restarting the slave process, fixed it all.
vittal
If you encounter the following error on the replication slave:
[ERROR] Slave I/O thread: error connecting to master '[email protected]:3306': Error: 'Can't connect to MySQL server on '192.168.7.11' (13)' errno: 2003 retry-time: 60 retries: 86400
check that you can login using:
mysql -u "replication_username" -h "replication_hostname" -p
If you can log in using the above but replication fails to connect, SELinux policies may be restricting mysqld's access to ports, sockets, and files.
Check or grep /var/log/audit/audit.log and /var/log/messages for messages pertaining to mysqld like the following:
type=AVC msg=audit(1126806889.640:14244693): avc: denied { name_connect } for pid=2561 comm="mysqld" dest=3306 scontext=system_u:system_r:mysqld_t tcontext=system_u:object_r:mysqld_port_t tclass=tcp_socket
type=SYSCALL msg=audit(1126806889.640:14244693): arch=40000003 syscall=102 success=no exit=-13 a0=3 a1=b139a130 a2=2 a3=0 items=0 pid=2561 auid=4294967295 uid=27 gid=27 euid=27 suid=27 fsuid=27 egid=27 sgid=27 fsgid=27 comm="mysqld" exe="/usr/libexec/mysqld"
To confirm SELinux is the cause, try switching SELinux to permissive mode from the root account and restarting mysqld using the following commands:
# setenforce 0
# /etc/init.d/mysqld restart
If replication then starts working you know SELinux is the cause. On FC-4, running the following can shed light on this:
# audit2allow -i /var/log/audit/audit.log -l
allow mysqld_t mysqld_port_t:tcp_socket name_connect;
allow mysqld_t self:tcp_socket connect;
allow mysqld_t var_lib_t:dir { add_name read remove_name write };
allow mysqld_t var_lib_t:file { append create getattr lock read unlink write };
allow mysqld_t var_lib_t:sock_file { create getattr };
and running
# restorecon -R -v /var/lib/mysql
can fix problems associated with changed files in the MySQL directory.
Since this post is about identifying MySQL problems and is not a tutorial on SELinux, see the following for details:
http://fedora.redhat.com/docs/selinux-apache-fc3/sn-debugging-and-customizing.html (which describes how to update the SELinux policies)
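Searching the audit log for mysqld denials can be sketched in a few lines. The function name mysqld_denials is hypothetical, and the matching rule (a line containing "avc:", "denied" and comm="mysqld") is an assumption based on the sample message quoted in this comment. The sample lines below are abbreviated from that message.

```python
def mysqld_denials(lines):
    """Return audit log lines that record an AVC denial for mysqld."""
    return [
        line.rstrip("\n")
        for line in lines
        if "avc:" in line and "denied" in line and 'comm="mysqld"' in line
    ]


# Abbreviated from the message quoted above; the last line is made up.
sample_lines = [
    'type=AVC msg=audit(1126806889.640:14244693): avc: denied '
    '{ name_connect } for pid=2561 comm="mysqld" dest=3306 '
    'tclass=tcp_socket\n',
    'type=SYSCALL msg=audit(1126806889.640:14244693): syscall=102 '
    'success=no exit=-13 comm="mysqld" exe="/usr/libexec/mysqld"\n',
    'type=AVC msg=audit(1126806890.000:1): avc: denied { read } '
    'for pid=1 comm="httpd"\n',
]
denials = mysqld_denials(sample_lines)
print(len(denials))

# On a real system the input would be the open log file, for example:
#     with open("/var/log/audit/audit.log") as log:
#         denials = mysqld_denials(log)
```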
reason usually is that some statement that succeeded on the
master failed on the slave. This should never happen if you
have taken a proper snapshot of the master, and never
modified the data on the slave outside of the slave thread.
AND you have never issued a statement on the MASTER which references a file or database that doesn't exist on the SLAVE.
For example:
INSERT INTO replicated_table SELECT stuff FROM master_only_db.table
This resulted in the slave error log having a line saying:
[ERROR] Got fatal error 1236: 'Could not find first log file name in binary log index file' from master when reading data from binary log
Don't panic. You need to manually edit the master's HOSTNAME-bin.index file and change all ./ relative paths to the absolute path (matching the ones created since the upgrade). Don't do this with the master running, for obvious reasons. Once you've restarted the master, restart the slave and the slave should catch up. Well, it did for me.
This assumes you haven't explicitly changed where binary logs are going during the upgrade, of course.
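The path rewrite described in this comment can be sketched as a pure function over the index file's lines. The function name absolutize_index is hypothetical, and /var/lib/mysql is only an assumed binary log directory used for illustration.

```python
import os


def absolutize_index(index_lines, log_dir):
    """Rewrite "./name" entries as absolute paths under log_dir.

    Entries that are already absolute are kept unchanged.
    """
    result = []
    for line in index_lines:
        entry = line.strip()
        if not entry:
            continue
        if entry.startswith("./"):
            entry = os.path.join(log_dir, entry[2:])
        result.append(entry)
    return result


# The directory below is an assumption; use the master's real log directory.
rewritten = absolutize_index(
    ["./HOSTNAME-bin.001\n", "/var/lib/mysql/HOSTNAME-bin.002\n"],
    "/var/lib/mysql",
)
for entry in rewritten:
    print(entry)
```

The sketch does not write anything back. As the comment says, the index file itself should only be changed while the master is stopped.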
re-copy the binary log file from the master to the slave, changing the name as necessary.
Shut down mysqld
Start mysqld again.
> start slave;
For some reason, the daemon held on to the old version of the binary log file, despite me telling it to refresh logs via mysqladmin.
Stop/start of the mysqld daemon followed by a "slave start" worked wonders.
You can check for this problem with the command:
show grants for "replicate"@"your.slave.host";
The problem is present if the output does NOT contain "REPLICATION SLAVE".
The fix is to run the mysql_fix_privilege_tables script, which comes with MySQL.
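The check on the grants output can be automated with a literal text search. This is a sketch only: the function name has_replication_slave is hypothetical, the sample grant lines are illustrative, and the search mirrors the comment's rule of looking for the words REPLICATION SLAVE without interpreting broader grants.

```python
def has_replication_slave(grant_lines):
    """Return True if any SHOW GRANTS output line names REPLICATION SLAVE."""
    return any("REPLICATION SLAVE" in line.upper() for line in grant_lines)


# Illustrative output lines (assumptions, not captured from a real server).
before_fix = ["GRANT USAGE ON *.* TO 'replicate'@'your.slave.host'"]
after_fix = ["GRANT REPLICATION SLAVE ON *.* TO 'replicate'@'your.slave.host'"]
print(has_replication_slave(before_fix))
print(has_replication_slave(after_fix))
```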
I could not get the replication slave to connect to the replication master, yet I could connect using the mysql client. The problem was that the password I had set was over 32 characters in length, which apparently gets truncated by the replication slave but not in the mysql server or in the mysql client, causing the connection to be refused to the replication slave but not to the mysql client. The clue I got was that inside the master.info file the password was truncated. Using a shorter password fixed it. This issue is described in the documentation for the change master command.