Engineering for zero downtime – Advanced RDS database migrations
Having more users means putting more thought into how we minimize disruption.
Scaling successfully brings challenges of its own. For Learnosity, one of the most important of these is engineering for zero downtime. It’s something we take very seriously, which is pretty clearly reflected in how we approach the design and upgrades of our systems.
One of the most difficult parts of any high-volume system to upgrade is the Database layer, as there are two ways to approach it:
- Easy way – have an outage and do a backup and restore of databases.
- The Right way – sync up a new database and switch over to it live.
Learnosity is built on Amazon’s Web Services (AWS) and their RDS service provides great tools that solve many of the core heavy lifting that our systems require. Many of the things that were painful are now problems that have been solved:
- Backup and restore of DB’s
- Creation of Read Replicas
- Database encryption
However, there are still places where it’s necessary to dive a bit deeper, and I found one of those recently.
Migrating to encrypted volumes with zero downtime
With RDS, there is currently no built-in way to migrate from a non-encrypted to an encrypted volume so we had to go off-piste a bit.
Knowing that it is possible to migrate into RDS using MySQL/MariaDB standard replication – and that it is also possible to migrate out of RDS using the same technique – we figured it would be possible to do this internally between 2 differently configured RDS machines. It turns out we were correct, and here is how we did it.
- RDS Master DB – source-master
- RDS Read Only Slave – source-slave
- RDS New Master DB – target-master
For testing I set up the following machines:
RDS Master DB:
Name: source-master Not encrypted User: master Pwd: password DBName: test MariaDB 10.0.24
RDS Read Only Replica:
Setup via AWS console, so has the same configuration as the RDS Master DB.
The RDS Target Master DB was configured as follows:
Name: target-master Encrypted User: master Pwd: password DBName: test MariaDB 10.1.23
- Wait for source-master and source-slave to both have the Available status.
- Open a MySQL connection to source-slave and run the following command so that it retains is logs for longer while doing this.:
call mysql.rds_set_configuration('binlog retention hours', 24);
- On source-master create a replication user (with more secure passwords)
CREATE USER 'repl’@‘%’ IDENTIFIED BY 'pass'; GRANT REPLICATION SLAVE ON *.* TO 'repl’@’%';
- Open a MySQL connection to source-slave and run the commands:
call mysql.rds_stop_replication; SHOW SLAVE STATUS;
- Note down the values in Master_Log_File and Read_Master_Log_Pos as shown in the snippet:
Slave_IO_State: Master_Host: 10.0.0.189 Master_User: rdsrepladmin Master_Port: 3306 Connect_Retry: 60 Master_Log_File: mysql-bin-changelog.000013 Read_Master_Log_Pos: 1246 Relay_Log_File: relaylog.000002 Relay_Log_Pos: 918 Relay_Master_Log_File: mysql-bin-changelog.000013 Slave_IO_Running: No ..... Master_SSL_Crlpath: Using_Gtid: Slave_Pos Gtid_IO_Pos: 0-1385452505-37
- Run a mysqldump from source-slave with the following command:
mysqldump -h source-slave -u user -p password --single-transaction --routines --triggers --databases database1 database2 | gzip > source-slave-dump.sql.gz
- Import the DB dump to target-master:
zcat source-slave-dump.sql.gz | mysql -h target-master -u user -p password
- Open a MySQL connection to target-master and set the source as source-master using the following command:
CALL mysql.rds_set_external_master ( 'source-master.xxxxxxx.ap-southeast-2.rds.amazonaws.com' , 3306 , 'repl' , 'pass' , 'mysql-bin-changelog.000013' , 1246 , 0 );
- Start replication by running on target-master and monitoring progress using:
CALL mysql.rds_start_replication; SHOW SLAVE STATUS;
- You can now safely delete the source-slave machine
- You can now transition your writes over to the new master and proceed with near zero downtime (depending on your application)
- When you want to disconnect the target-master from the source-master use the following:
CALL mysql.rds_stop_replication; CALL mysql.rds_reset_external_master;
- That’s it!
Mark Lynch is co-founder and CTO of Learnosity.