A Japanese Geek's MySQL Blog.

2009-12-30

awk and mysqldump

Awk, which has been existing for long time on UNIX like operating system, may sound legacy for people who live in modern web-based programming world. However, awk is really sophisticated and useful tool for various purposes. Today, I'll show you how to use it with mysqldump ;)

One feature which is missing on the current mysqldump command is to separate dump files per DB or table. It can store all data into a large single file only, whether we run the command with --result-file option or redirect the output to some file. Of course it is possible to run mysqldump command several times per DB or table, but it cannot produce a consistent snapshot at a certain time, each dump may be time-shifted in other words, unless all involved tables are locked during dump. So, we need to backup database using a single mysqldump instance if we need a consistent data.

Now, you can make use of awk! It is a really powerful text processor. As an output of mysqldump is text, you can process the output using awk. See the following awk script:
#!/usr/bin/awk -f

function is_new_db(db_name) {
  for (i = 1; i <= num_db; i++) {
    if (db_name == db[i]) {
      return 0;
    }
  }
  return 1;
}

BEGIN {
  num_db = 0
  num_prelines = 0
  num_postlines = 0
  current_file = "/dev/null"
}

/^\-\-/ {
  if ($2 == "Current" && $3 == "Database:") {
    close(current_file);
    db_name = $4
    gsub("`", "", db_name);
    current_file = db_name ".sql";
    if (is_new_db(db_name)) {
      db[++num_db] = db_name;
      print "--\n" $0 "\n--\n" > current_file;
      for (i = 1; i <= num_prelines; i++)
        print prelines[i] >> current_file;
    }
  } else if (num_db == 0) {
    num_prelines++;
    prelines[num_prelines] = $0;
  } else if ($2 == "Dump" && $3 == "completed") {
    num_postlines++;
    postlines[num_postlines] = "";
    num_postlines++;
    postlines[num_postlines] = $0;
  } else {
    print $0 >> current_file
  }
  next;
}

/^\/\*.+\*\/;/ {
  if (match($0, "character|collation")) {
    print $0 >> current_file;
  } else if (match($0, "SET")) {
    if (num_db == 0) {
      if (match(prelines[num_prelines], "^\-\-")) {
        num_prelines++;
        prelines[num_prelines] = "";
      }
      num_prelines++;
      prelines[num_prelines] = $0;
    } else {
      num_postlines++;
      postlines[num_postlines] = $0;
    }
  } else {
    print $0 >> current_file;
  }
  next;
}

{ print $0 >> current_file }

END {
  for (i = 1; i <= num_db; i++) {
    current_file = db[i] ".sql";
    print "" >> current_file
    for (j = 1; j <= num_postlines; j++) {
      print postlines[j] >> current_file;
    }
  }
}
Save the script into a file named separate-dump.awk etc, and make it exectable (chmod u+rx). You can separate dump files per DB by using the script like below:
shell> mysqldump -A --single-transaction --master-data=2 --flush-logs | ./separate-dump.awk
Then, dump files named like "database_name.sql" are created under your current directory! Of course, you can also process an existing dump file like below:
shell> ./separate-dump.awk < dump.sql
Making use of "good old fashioned" unixy command tools will make us happy ;)

Enjoy!!

2 comments:

  1. I've been looking to do this for a while, and seen a few solutions that looked rather .. umm .. interesting. Even bodged a bash script that would do similar by identifying every database in the server and then calling mysqldump one by one, but this is so much nicer :)

    ReplyDelete
  2. Yesterday in a programming system course test I have a item where the answer had to be this one : 1. shell> mysqldump -A --single-transaction --master-data=2 --flush-logs | ./separate-dump.awk

    ReplyDelete