Migrating Data - Rails Migrations or a Rake Task?

Posted by John Wood on Feb 25, 2015 11:41:00 AM

I’ve always thought that Migrations were one of Rails’ best features. In one of the very first projects I worked on as a n00b software engineer straight out of college, schema and data migrations were a very large, and painful, part of that project’s deployment process. Being able to specify how the schema should change, and being able to check in those changes along with the code that depends on those changes, was a huge step forward. The ability to specify those changes in an easy to understand DSL, and having the ability to run arbitrary code in the migration to help make those changes, was simply icing on the cake.

But, the question of how to best migrate data, not the schema, often comes up. Should data migrations be handled by Rails migrations as well? Should they be handled by a script, or a rake task instead? There are pros and cons to both approaches, and it depends on the situation.

Using Rails Migrations

One of the features of Rails Migrations is that the app will fail to start up if you have a migration that has not been run. This is good, because running your app with an out-of-date schema could lead to all sorts of problems. Because of this, most deployment processes will run the migrations after the code has been deployed, but before the server is started. If your data migration needs to be run before the app starts up, then you can use this to your advantage by using a migration to migrate your data. In addition, if your data migration can be reversed, then that code can be placed in the migration’s down method, fitting nicely into the “migration way” of doing things.

However, there are some pitfalls to this approach. It is bad practice to use code that exists elsewhere in your application inside of a Rails Migration. Application code evolves over time. Classes come and go, and their interfaces can change at any time. Migrations on the other hand are intended to be written once, and never touched again. You should not have to update a migration you wrote three months ago to account for the fact that one of your models no longer exists. So, if migrating your data requires the use of your models, or any other application code, it’s probably best that you not use a migration. But, if you can migrate your data by simply using SQL statements, then this is a perfectly valid approach.

Using a Rake Task

Another way to migrate production data is to write a rake task to perform the migration. Using a rake task to perform the migration provides several clear advantages over using a Rails Migration.

First, you are free to use application code to help with the data migration. Since the rake task is essentially “throw away”, it can easily be deleted after it has been run in production. There is no need to change the rake task in response to application code changing. Should you ever need to view the rake task after it has been deleted, it is always available via your source control system. If you’d like to keep it around, that’s fine too. Since the rake task won’t be run again once it has been run in production, it can continue to reference classes that no longer exist, or use APIs that have changed.

Second, it is easier to perform test runs of the rake task. We usually wrap the rake task code within an ActiveRecord transaction, to ensure that if something bad happens, any changes will be rolled back. We can take advantage of this design by conditionally raising an exception at the end of the rake task, rolling back all of the changes, if we are in “dry run” mode (usually determined by an environment variable we pass to the task). This allows us to perform dry runs of the rake task, and use logging to see exactly what it will do, before allowing it to modify any data. With Rails Migrations, this is more difficult, as you need to roll back the migration as a separate step, and this is only possible for migrations that are reversible.
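To make the dry run idea concrete, here is a minimal sketch of what such a task might look like. It is not the exact code we use. The file name, the DataMigration module, the DRY_RUN variable, and the User model with its display_name, first_name and last_name columns are all made up for illustration. The sketch also assumes a choice the paragraph above does not make: the task defaults to a dry run, so data is only changed when DRY_RUN=false is passed explicitly.

```ruby
# lib/tasks/backfill_display_names.rake (hypothetical file name)
require "rake" # already loaded when run through rake; harmless to repeat

module DataMigration
  # Dry run is the default in this sketch, so data only changes when
  # DRY_RUN=false is passed. The variable name is an assumption.
  def self.dry_run?(env = ENV)
    env.fetch("DRY_RUN", "true") != "false"
  end
end

namespace :data do
  desc "Backfill users.display_name (hypothetical example)"
  task backfill_display_names: :environment do
    dry_run = DataMigration.dry_run?

    ActiveRecord::Base.transaction do
      # User and its columns are placeholders for your own models.
      User.where(display_name: nil).find_each do |user|
        new_name = "#{user.first_name} #{user.last_name}".strip
        # Log every change so a dry run shows exactly what would happen.
        puts "User #{user.id}: display_name -> #{new_name.inspect}"
        user.update!(display_name: new_name)
      end

      if dry_run
        puts "Dry run: rolling back all changes"
        # ActiveRecord::Rollback aborts the transaction and is swallowed
        # by the transaction block, so the task still exits cleanly.
        raise ActiveRecord::Rollback
      end
    end
  end
end
```

With this sketch, running rake data:backfill_display_names would log the changes and roll them back, while DRY_RUN=false rake data:backfill_display_names would commit them. Any other exception raised inside the block, such as a validation failure from update!, also rolls back the transaction, but unlike ActiveRecord::Rollback it propagates and fails the task.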

Finally, you can easily run the rake task whenever you want. It does not need to happen as a part of the deployment, and you do not need to alter your existing deployment process to push the code without running the migrations or restarting the server. This gives you some flexibility, and lets you pick the best time to perform the migration.

Our Approach

Generally, we use Rails Migrations to migrate our application's schema, and Rake tasks to migrate our production data. There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment. In all other cases, using a Rake task provides us with more flexibility, and less maintenance.

Another Approach?

Do you have another approach for migrating production data? If so, we’d love to hear it. Feel free to drop it in the comments.

Topics: rails, Development & Deployment
