Boris Stumm Consulting

When it comes to database migrations, often the solution is to do the necessary migrations together with application deployment. However, in continuous delivery scenarios, that is more complicated, because of rolling deployments. So, one has to make sure that when migrating, both the currently deployed and the to-be-deployed version of the application can handle the new schema version.

Migration before deployment

Lets think about the necessary steps when the database is migrated before the deployment of the new code. As an example, take the replacement of a non-null-column because of a type change from integer to varchar. The starting application version is A1, the initial schema is S1.

We define a new migration step leading to S2 that has a new, nullable varchar column. Also, we change the application to use that column in version A2, but still also use the old column for backwards compatibility.
Deployment 1: we can deploy A2 together with S2, because A1 will still work with S2 and S2 is applied before the deployment of A2
Now, we can change the application to A3 and the schema to S3, where A3 only uses the new column. The migration to S3 transfers the data to the new column, makes it not-nullable, and deletes the old column.
Deployment 2: again, we can deploy A3 together with S3, for the same reason as before.

There are some disadvantages to this approach:

When the data migration takes a long time, the deployment also will take as long.
Not all databases and migration tools can handle concurrency: If two instances are deployed at the same time, both starting the migration, migration may fail or produce corrupt data.
When deploying new code together with the new schema, developers are more likely to forget that the currently deployed code must still work with the new schema. Ensuring this is not always trivial. In our example, if the application processes db metadata, it already might break when adding an unused new column.

Migration after deployment

Another approach is to do the migration after application deployment.

We define a new migration step leading to S2 that has a new, nullable varchar column. Also, we change the application to optionally use that column in version A2.
Deployment 1: we can deploy A2 together with S2, because A1 will still work with S2. A2 can work with both S1 and S2.
Now, we can change the schema to S3. The migration to S3 transfers the data to the new column, makes it not-nullable, and deletes the old column. Note that we cannot yet go to A3, because migration comes after deployment, and until the migration finishes, there still might be data in the old column.
Deployment 2: We deploy A2 with S3, because migration is coupled to deployment.
Now we can change the code to A3, not using the old column anymore.
Deployment 3: We deploy A3, which is the cleaned up application version only using the new column.

Compared to migrating before deployment, things feel more complicated with this approach. We need one deployment more than before. On the pro side, it makes developers more aware of the migrations and the needed backwards compatibility. All other disadvantages still exist here.

Decoupling migration from deployment

I think that in scenarios with rolling deployments, the migration step has to be decoupled from deployment. Migrations should happen between full deployments, so that during a migration, there is only one application version deployed. Lets see what the procedure would be here:

We define a new migration step leading to S2 that has a new, nullable varchar column.
Migration 1: we can migrate to S2 without a new application version.
Now we implement A2, that makes use of S2.
Deployment 1: we deploy A2 on all instances.
We define S3, as before.
Migration 2: we migrate to S3, since A2 supports both S2 and S3
The last step is cleaning up application code leading to A3
Deployment 2: We deploy A3 on all instances.

On first sight, there is more to do: besides two deployments, we also have to trigger two migrations. But both deployment and migration should be at least semi-automated and be triggered by a "click". I feel it is more clear what is actually happening, and also, the disadvantages mentioned before do not exist anymore:

Since migration and deployment is decoupled, long running migrations do not hinder deployment. Also, the time window for migration can be chosen independently.
Migration can be done once (and not per instance), preventing any concurrency issues.
Because of explicit deployment and migration steps, developers are aware of the migrations and breaking code is less likely. Schema versions must only work for the current application version, not for two versions at a time (as it is the case when migration and deployment is coupled).

There are other advantages of an explicit decoupling:

Easier testing of migrations: since a schema version must always only work for the current application version, unit tests can be written that test the code with the current and the next schema version. When migrations are coupled, there are always two application versions that need to be tested, which is problematic especially in continuous delivery scenarios.
Rollbacks: if during a normal deployment, no migration is done, it becomes easier to roll back back to a previous version because the existing mechanisms can be used without changes. Be aware that after a migration, rollbacks are not always possible, depending on the migration path chosen.
Using feature toggles instead of deployments: Instead of a new deployment, feature toggles could be used to switch code after a migration. That would potentially be faster than a deployment.

Decoupling migrations from deployments

Ordering the necessary actions explicitly

Migration before deployment

Migration after deployment

Decoupling migration from deployment