At some point you will have to face data migration problem. In most cases it’s a simple SQL statement, but sometimes it can get complicated.
It’s a delicate process. Preparing for it will take hours, running it also will cost some time. It’s very easy to make small mistakes which can lead to a catastrophe.
Most of these problems can be avoided in a few simple steps. Here is the list of things that I try to make during this process.
Prepare the checklist
Whatever you have to do during migration put it on the checklist. Always. Every single step.
Migration can be stressful, it’s easy to forget about some things. Detailed checklist will lower the chance that you’ve missed something and it can save a lot of time too.
Keep this list updated. Test it – make sure that every step is necessary for successful migration. Get rid of unnecessary things – they will only waste your time.
Use smaller scripts
In most cases it’s very tempting to create all-in-one upgrade script. It quickly gets bigger and bigger making it hard to debug and maintain.
Split your migration into separate scripts. Make them responsible for changing only one thing. This way your scripts will be smaller, and easier to maintain.
After dividing the migration into separate steps don’t forget to put them on your checklist! Between each step you can make a backup and if something goes wrong you will be able to return to the previous step.
Optimize
In migration process performance factor is really important. Before running the actual migration it has to go through series of iterations (development – testing – fixes). Slow scripts (and waiting for them all to finish) will waste your time.
During development you probably work with a fairly small database. On dev machine migration can take minutes but on production server this can get even few times longer (lately my 30 minutes migration took almost 4 hours on production).
It’s important to make migration as short as possible. The longer it works the longer your application is unavailable. Also take in mind that waiting to finish the whole process will make you more tired. This can lead later to some problems.
Test on production data
In a perfect world your data is perfectly correct and migration will work perfectly fine. However in userland (production) you get surprised about what can happen with the data (users sometimes can be really creative).
Let me give you an example. In one of our web apps users could specify a source of data. It could be anything – from URI to a word describing the source. During migration I had to check if this value is a valid URI, or just a string. For this I decided to use regular expressions. Everything worked just fine. However on production some users decided to put in the source whole sentences. The regexp returned false-positives which marked them as URI. This caused an error during migration.
Before the actual migration make a test run on a production data. Make sure that you are prepared for every case.
Prepare for a failure
No matter how good you are prepared some things can go wrong (memory leaks, server failure etc.). It happens.
If possible, make sure that your scripts can resume the work they started. This way you can simply restart them if something goes wrong.
TL;DR be prepared, write down every little step, don’t waste your time (optimize). These simple tips should help you save your sanity during data migration.