5.22.2009

Setup or Fix MySQL Replication Fast

So here is how to use LVM snapshots to setup or fix MySQL replication quickly.

Prerequisites:
First you need to have used LVM on the location of your MySQL store. I'm not walking you through that. And you need to have left some unused space in the volume group that your logical volume is in. If you don't know what any of that means, go read up on LVM.

Second, you're going to need to create an ssh key. Also not walking you through that. Setup an ssh key on your slave server, so that your master server can ssh to it as root without using a password.

Third, the script I've written assumes your MySQL store is in /var/lib/mysql. And I've written this for CentOS/Redhat, so I'm assuming
service mysql stop
Shuts down your mysql server. Both of these should be easy to fix for your installation.

Fourth, make sure skip_slave_start is in your my.cnf on your slave server.

Last, you'll need rsync, perl and the modules Linux::LVM and Getopt::Long. If you're using an rpm based distro, use cpan2rpm to install those modules. Everyone else, CPAN or your package manager.

The Plan

So what we are going to do is simple. We shutdown MySQL on the slave server. Then connect to your master database and get a list of all your databases, lock all the databases and get the master log position.
Make sure a replication user is setup. Then we create a snapshot, and unlock the master. Mount the snapshot and rsync the databases we got from the master to the slave. Unmount and remove the snapshot on the master. Start up the slave, set the log position and start up replication. Done.

The Script


#!/usr/bin/perl -w
# Script to fix or setup MySQL replication
# http://greg-techblog.blogspot.com

use strict;
use DBI qw{:sql_types};
use Linux::LVM;
use Getopt::Long;

# Database user. Assumes the slave user and pass on both slave and master
my $dbuser = q{USERNAME};
my $dbpass = q{PASSWORD};

# Should be an ip address, not a hostname
my $dbhost_master = q{192.168.1.2};
my $dbhost_slave = q{192.168.1.3};

# Replication user
my $replication_user = q{repl};
my $replication_pass = q{repl};

# Databases to skip when rsyncing the data director
my @skip = qw{information_schema};

# Name for our LVM snapshot
my $snapshot_name = q{dbbackup};

# Volume Group name where we are creating the snapshot
my $volgroup = q{VolGroup00};

# Logical Volume to snapshot
my $logvol = q{LogVol02};

# Location of ssh key
my $ssh_key = q{/root/.ssh/id_rsa};


my $dsn1 = qq|DBI:mysql:database=mysql;host=$dbhost_master;port=3306|;
my $dsn2 = qq|DBI:mysql:database=mysql;host=$dbhost_slave;port=3306|;
my $help;

GetOptions (
"master|m=s" => \$dbhost_master,
"slave|s=s" => \$dbhost_slave,
"ruser=s" => \$replication_user,
"rpass=s" => \$replication_pass,
"duser=s" => \$dbuser,
"dpass=s" => \$dbpass,
"key|k=s" => \$ssh_key,
"vol|v=s" => \$volgroup,
"log|l=s" => \$logvol,
"help|?|h" => \$help,
);

# Print help
if ($help) {
print "
$0 [options]\n
--master/-m IP address of master server (Default: $dbhost_master)
--slave/-s IP address of slave server (Default: $dbhost_slave)
--ruser Replication username (Default: $replication_user)
--rpass Replication password (Default: $replication_pass)
--duser Database admin user (Default: $dbuser)
--dpass Database admin password (Default: XXXXXX)
--key/-k Full path to ssh-key (Default: $ssh_key)
--vol/v Volume Group (Default: $volgroup)
--log/l Logical Volume (Default: $logvol)
--help/-?/-h This help\n
";
exit 0;
}

# Define a statement handle
my $sth;
# Check to see is snapshot volume exists already and die if it does
my %lvm = get_logical_volume_information($volgroup);
my @lvm = keys %lvm;
if (grep /\/$volgroup\/$snapshot_name$/, @lvm) {
die ("Snapshot volume already exists. Use lvmremove to remove it before running this command.\n");
}

# Shutdown Mysql on slave server
unless ( -e $ssh_key ) {
die ("Ssh key identity file missing: $ssh_key");
}
exe_cmd(
qq{ssh -i $ssh_key root\@$dbhost_slave "service mysql stop"},
qq{Couldn't ssh to slave and stop mysql}
);
print "Mysql on slave stopped\n";

#Connect to Master database
my $dbh = DBI->connect(
$dsn1,
$dbuser,
$dbpass,
{RaiseError => 0, AutoCommit => 1 }
) or die ("Error Connecting to server: ".DBI::errstr);
my $sth_dbs = $dbh->prepare(q{SHOW DATABASES});
$sth_dbs->execute or die("Problem executing query: ".$sth_dbs->errstr);

# Lock all tables on master database
$sth = $dbh->prepare(q{FLUSH TABLES WITH READ LOCK});
$sth->execute or die("Probelm executing query: ".$sth->errstr);
print "All Databases on master locked\n";

# Get the master log position and file
$sth = $dbh->prepare(q{SHOW MASTER STATUS});
unless ($sth->execute) {
my $err_msg = $sth->errstr;
my $sth_unlock = $dbh->prepare(q{UNLOCK TABLES});
$sth_unlock->execute;
die("Probelm executing query: $err_msg\n".$sth_unlock->errstr);
}
my $master_status;
unless ($master_status = $sth->fetchrow_hashref) {
my $err_msg = $sth->errstr;
my $sth_unlock = $dbh->prepare(q{UNLOCK TABLES});
$sth_unlock->execute;
die("Probelm executing query: $err_msg\n".$sth_unlock->errstr);
}

# Create the snapshot
my $lvcreate_msg = `lvcreate -L1G -s -n $snapshot_name \
/dev/$volgroup/$logvol 2>&1`
;
if ( $? ) {
my $sth_unlock = $dbh->prepare(q{UNLOCK TABLES});
$sth_unlock->execute;
die("Couldn't create snapshot: $lvcreate_msg $?\n".$sth_unlock->errstr);
}
print "Snapshot created\n";

# Unlock all tables
$sth = $dbh->prepare(q{UNLOCK TABLES});
$sth->execute or die("Probelm executing query: ".$sth->errstr);
print "All databases on master unlocked\n";

# Grant replication rights on master
$sth = $dbh->prepare(q{GRANT REPLICATION SLAVE ON *.* TO ?@? IDENTIFIED BY ?});
$sth->execute($replication_user,$dbhost_slave,$replication_pass)
or die("Problem executing query: ".$sth->errstr);

# Mount the snapshot
unless ( -d qq{/mnt/$snapshot_name}) {
mkdir qq{/mnt/$snapshot_name} or die ("Couldn't create mount point directory: $?");
}
my $mount_snapshot_msg = `mount /dev/$volgroup/$snapshot_name \
/mnt/$snapshot_name -onouuid,ro 2>&1`
;
die ("Couldn't mount snapshot: $mount_snapshot_msg $?") if $?;

# Start rsyncing the snapshot to the slave
print "Starting rsync\n";
my $out;
while ( my ($db) = $sth_dbs->fetchrow_array ) {
unless (grep /^$db$/, @skip) {
$out .= `rsync -zrav /mnt/$snapshot_name/lib/mysql/$db/ \
$dbhost_slave:/var/lib/mysql/$db/ 2>&1`
;
}
die ("rsync failed: $out $?") if $?;
}

# Diconnect from Master Database
$dbh->disconnect;

# Get rid of snapshot
print $out;
exe_cmd(
qq{umount /mnt/$snapshot_name},
qq{Couldn't umount snapshot}
);
exe_cmd(
qq{lvremove -f /dev/$volgroup/$snapshot_name},
qq{Couldn't remove snapshot volume}
);
print "Snapshot removed\n";

# Start Mysql back up on slave (skip_slave_start better be in the my.cnf)
exe_cmd(
qq{ssh -i $ssh_key root\@$dbhost_slave "service mysql start"},
qq{Couldn't ssh to slave and start mysql}
);
print "Mysql on slave started\n";

# Connect to slave and setup replication
$dbh = DBI->connect( $dsn2, $dbuser, $dbpass, { RaiseError => 0, AutoCommit => 1 } )
or die ("Error Connecting to server: ".DBI::errstr);
my $change_master_query = q{
CHANGE MASTER TO MASTER_HOST = ?,
MASTER_USER = ?,
MASTER_PASSWORD = ?,
MASTER_LOG_FILE = ?,
MASTER_LOG_POS = ?
};
$sth = $dbh->prepare($change_master_query) or die ("Problem preparing query\n $change_master_query\n".DBI::errstr);
# Make MASTER_LOG_POS an INT
$sth->bind_param(5,1,SQL_INTEGER);
$sth->execute(
$dbhost_master,
$replication_user,
$replication_pass,
$master_status->{'File'},
int($master_status->{'Position'})
) or die ("Problem changing master:".$sth->errstr);

# Slave slave
$sth = $dbh->prepare(q{START SLAVE});
$sth->execute or die ("Problem starting slave:".$sth->errstr);

# Check to see if everything worked
print "Sleep for 5 seconds...\n";
sleep 5; # this may need to be longer
$sth = $dbh->prepare(q{SHOW SLAVE STATUS});
$sth->execute;
my $slave_status = $sth->fetchall_arrayref({});
print "Slave_SQL_Running:\t".$slave_status->[0]->{'Slave_SQL_Running'}."\n";
print "Slave_IO_Running:\t".$slave_status->[0]->{'Slave_IO_Running'}."\n";
print "Slave_IO_State:\t".$slave_status->[0]->{'Slave_IO_State'}."\n";
print "Last_IO_Error:\t".$slave_status->[0]->{'Last_IO_Error'}."\n";
print "Seconds_Behind_Master:\t".$slave_status->[0]->{'Seconds_Behind_Master'}."\n";

sub exe_cmd {
my $cmd = shift;
my $fail_msg = shift;
my $msg = `$cmd 2>&1`;
chomp $msg;
die ("$fail_msg : $msg\n$?") if $?;
}
__END__

No comments:

Post a Comment