7.19.2010

Efficient Rule Based Scheduling

Recently at work we had a project to convert a number of myisam latin 1 databases to utf8. The process we settled on was to lvm snapshot the databases before we started the conversions so that if something went wrong we could mount the snapshots and restore quickly. The problem that arose from this was limited space for the snapshots. We needed to break the conversions up so that we were never converting more data than the size of the snapshots we were creating. I needed to come up with a schedule for doing the conversion, to maximize the number of conversions that could be undertaken at a time, while maintaining the rule of not allowing more data to change than the size of the snapshot.

The answer was simple enough, collect the sizes of all the databases, sort them largest to smallest and then start scheduling each round by taking the largest database off the top of the list and start adding the smallest databases from the bottom of the list until we hit our max size limit.

As per my usual, I solved this with perl. Here is what it looked like;

First lets get a list of the databases

my $dbh = DBI->connect( $dsn, $user, $password, { RaiseError => 1, AutoCommit => 0 } );
my $dbs = $dbh->selectcol_arrayref(q{SHOW DATABASES});
$dbh->disconnect;


Now lets get the sizes for each


my @dbsizes;
foreach my $db ( @$dbs ) {
next if grep /^$db$/, @skip;
opendir my $fh, "$dbpath/$db" or die $!;
my @files = readdir($fh);
# Add up the file sizes for all files in the directory
# there are no subdirectories or this would be to be recursive
my $t;
map { my @s = stat "$dbpath/$db/$_" or die $!; $t += $s[7]; } @files;
push(@dbsizes, [$t,$db]) if defined($t);
}


And now lets sort the list by database size


my @sites = sort { $a->[0] <=> $b->[0]; } @dbsizes;


Lets get to the actual scheduling. Here I'm using a recursive function that will build a reference to a hash keyed by round with an array of databases to convert.


# Recursive function, builds a schedule for database conversion so that no
# Round of conversions exceeds $max
sub build_schedule {
my ($sites,$top,$round,$total,$schedule) = @_;
$top = 1 unless defined($top);
$round = 1 unless defined($round);
$total = 0 unless defined($total);
my $next_site = $top ? shift @$sites : pop @$sites;
$top = 0;
return $schedule unless $next_site;
my ($size, $site) = @$next_site;
if (($total + $size) > $max_size_per_round || $#{$schedule->{round}} > $max_sites_per_round) {
if ($schedule->{$round}[0]) {
unshift @$sites, $next_site;
}
else {
push(@{$schedule->{$round}}, $site);
}
$total = 0;
$top = 1;
$round++;
}
else {
push(@{$schedule->{$round}}, $site);
$total += $size;
}
build_schedule($sites,$top,$round,$total,$schedule);
}



Finally we'll print the schedule

my $schedule = build_schedule(\@sites);
# Print schedule
foreach my $round ( sort { $b <=> $a } keys %$schedule ) {
print "Round $round\n\t";
print join("\n\t", @{$schedule->{$round}} );
print "\n";
}



Here is the complete script

#!/usr/bin/perl -w
use DBI;
use strict;
my $dsn = q{dbi:mysql:mysql:localhost:3306};
my $user = q{root};
my $password = q{password};
my $max_size_per_round = 4_000_000_000; # Max size for each round
my $max_sites_per_round = 5; # Max number of sites for each round
my $dbpath = q{/var/lib/mysql};
my @skip = qw{ information_schema mysql };

# Get a list of all databases on the server
my $dbh = DBI->connect( $dsn, $user, $password, { RaiseError => 1, AutoCommit => 0 } );
my $dbs = $dbh->selectcol_arrayref(q{SHOW DATABASES});
$dbh->disconnect;

# Get the size on disk for each database;
my @dbsizes;
foreach my $db ( @$dbs ) {
next if grep /^$db$/, @skip;
opendir my $fh, "$dbpath/$db" or die $!;
my @files = readdir($fh);
# Add up the file sizes for all files in the directory
# there are no subdirectories or this would be to be recursive
my $t;
map { my @s = stat "$dbpath/$db/$_" or die $!; $t += $s[7]; } @files;
push(@dbsizes, [$t,$db]) if defined($t);
}

# Sort dbs in order of largest to smallest
my @sites = sort { $a->[0] <=> $b->[0]; } @dbsizes;
my $schedule = build_schedule(\@sites);

# Print schedule
foreach my $round ( sort { $b <=> $a } keys %$schedule ) {
print "Round $round\n\t";
print join("\n\t", @{$schedule->{$round}} );
print "\n";
}


# Recursive function, builds a schedule for database conversion so that no
# Round of conversions exceeds $max
sub build_schedule {
my ($sites,$top,$round,$total,$schedule) = @_;
$top = 1 unless defined($top);
$round = 1 unless defined($round);
$total = 0 unless defined($total);
my $next_site = $top ? shift @$sites : pop @$sites;
$top = 0;
return $schedule unless $next_site;
my ($size, $site) = @$next_site;
if (($total + $size) > $max_size_per_round || $#{$schedule->{$round}} > $max_site_per_round ) {
if ($schedule->{$round}[0]) {
unshift @$sites, $next_site;
}
else {
push(@{$schedule->{$round}}, $site);
}
$total = 0;
$top = 1;
$round++;
}
else {
push(@{$schedule->{$round}}, $site);
$total += $size;
}
build_schedule($sites,$top,$round,$total,$schedule);
}

No comments:

Post a Comment