The Road to Deploy When Ready

Posted by John Wood on Jul 7, 2015 8:52:13 AM

Our deployment process at UrbanBound has matured considerably over the past year. In this blog post, I’d like to describe how we moved from prescribed deployment windows with downtime, to a deploy-when-ready process that could be executed at any point in time.

The Early Days

About a year ago, UrbanBound was in the middle of transitioning from the “get it up and running quickly to validate the idea” stage to the “the idea works, let’s make sure we can continue to push it forward reliably” stage. Up until that point, we had accrued a significant amount of technical debt, and we didn’t have much in the way of a test suite. As a result, deploys were unpredictable. Sometimes new code would deploy cleanly, sometimes not. Sometimes we would introduce regressions in other areas of the application, sometimes not. Sometimes deploys would interfere with users currently using the app, sometimes not. Our deployment process was simply not reliable.

Stopping the Bleeding

The first order of business was to stop the bleeding. Before we could focus on improving the process, we needed to stop it from being a problem. We accomplished this with a few process changes.

First, we decided to limit the number of releases we did. We would deploy at the end of each two-week sprint, and otherwise only to push out critical bug fixes. That’s it. We made some changes to our branching strategy in git to support this workflow, which looked something like this:

  • All feature branches would be based off of an integration branch. When features were completed, reviewed, and had passed QA, they would be merged into this integration branch.

  • At the end of every two-week sprint, we would cut a new release branch off of the integration branch. Our QA team would spend the next few days regression testing the release branch to make sure everything looked good. From this point on, any changes made to the code being released, a fix for a bug QA found, for example, would be made on this release branch, and then cherry-picked over to the integration branch.

  • When QA was finished testing, they would merge the release branch into master, and deploy master to production.

  • Emergency hotfixes would be done on a branch off of master, and then merged into master and deployed when ready. The change would then have to be merged upstream into the integration branch, and possibly into a release branch if one was in progress.
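
In terms of raw git commands, the flow above looked roughly like the sketch below. The branch names and commit messages are illustrative, not our actual history:

    # Feature work branches off of the integration branch
    git checkout integration
    git checkout -b feature/expense-report-export
    # ...develop, code review, QA...
    git checkout integration
    git merge --no-ff feature/expense-report-export

    # At the end of the sprint, cut a release branch for regression testing
    git checkout -b release/sprint-42 integration

    # A fix for a bug found during regression testing lands on the release
    # branch, then gets cherry-picked back to integration
    git commit -am "Fix broken expense report totals"
    git checkout integration
    git cherry-pick <sha-of-the-fix>

    # Once QA signs off, the release branch is merged into master and deployed
    git checkout master
    git merge --no-ff release/sprint-42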

A very similar workflow to the one described above can be found at http://nvie.com/posts/a-successful-git-branching-model/

This process change helped us keep development moving forward while ensuring that we were releasing changes that would not break production. But, it did introduce a significant amount of overhead. Managing all of the branches proved challenging. And, it was not a great use of QA’s time to spend 2-3 days regression testing the release branch when they had already tested each of the features individually before they were merged into the integration branch.

Automated Acceptance Testing

Manually testing for regressions is a crappy business to be in. But, at the time, there was no other way for us to make sure that what we were shipping would work. We knew that we had to get in front of this. So, we worked to identify the critical paths through the application...the minimum amount of functionality that we would want covered by automated tests in order to feel comfortable cutting out the manual regression testing step of the deployment process.

Once we had identified the critical paths through the application, we started writing Capybara tests to cover those paths. This step took a fair amount of time, because we had to do this while continuing to test new features and performing regression testing for new releases every two weeks. We also had to flesh out how we wanted to do integration tests, as integration testing was not a large part of our testing strategy at that point in time.
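
To give a feel for what these look like, here is a minimal sketch of a critical-path Capybara test. The route, page content, and factory are hypothetical placeholders, not our actual code:

    # spec/features/sign_in_spec.rb
    require "rails_helper"

    feature "Signing in", js: true do
      let(:user) { create(:user, password: "secret123") }

      scenario "a user signs in and lands on their dashboard" do
        visit new_user_session_path

        fill_in "Email", with: user.email
        fill_in "Password", with: "secret123"
        click_button "Sign in"

        # Capybara waits for the page to update, which helps guard against
        # timing issues with asynchronous requests
        expect(page).to have_content("Welcome back")
        expect(current_path).to eq(dashboard_path)
      end
    end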

Eventually, we had enough tests in place, and passing, that we felt comfortable ditching the manual regression testing effort. From that point on, after QA had passed a feature, all we needed to see was a successful build in our continuous integration environment to deem the code ready for deployment.

Zero Downtime Deploys

We deploy the UrbanBound application to Heroku. Personally, I love Heroku as a deployment platform. It is a great solution for those applications that can work within the limitations of the platform. However, one thing that is annoying with Heroku is that, by default, your application becomes totally unresponsive while it reboots after a deploy. The amount of time it is down depends on the application, and how long it takes to boot. But, this window was large enough for us that we felt it would be disruptive to our users if we were deploying multiple times per day.

Thankfully, Heroku offers a rolling reboot feature called preboot. Instead of stopping the web dynos and then starting the new ones, preboot changes the order so that it first starts the new web dynos, and makes sure they have started successfully and are receiving traffic before shutting down the old dynos. This means that the application stays responsive during the deploy.
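
Preboot is toggled per application from the Heroku toolbelt; it looks something like the following (the app name is a placeholder, and on older toolbelt versions the feature lived under the labs commands):

    heroku features:enable preboot --app my-app

    # ...and disabled again for the rare deploy that can't tolerate it
    heroku features:disable preboot --app my-app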

However, preboot adds a fair amount of complexity to the deployment process. With preboot, you will have the old version of the application running side-by-side with the new version of the application, the new worker dynos, and the newly migrated database, for at least a few minutes. If any of your changes are not backwards compatible with the older version of the application (a deleted or renamed column in the database, for example), the old version of the application will begin experiencing problems during the deploy. There are also a few potential gotchas with some of the add-ons.

In our case, the backwards compatibility issue can be worked around fairly easily. When we have changes that are not backwards compatible, we simply deploy these changes off hours with the preboot feature disabled. The challenge then becomes recognizing when this is necessary (when there are backwards incompatible changes going out). We place the responsibility for identifying this on the author of the change and the person who performs the code review. Both of these people should be familiar enough with the change to know if it will be backwards compatible with the version of the application currently running in production.
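
As a hypothetical illustration, a migration like the one below is the kind of change we would flag for an off-hours deploy with preboot disabled, because the old dynos are still selecting the original column name while the two versions overlap:

    class RenameUsersFullNameToName < ActiveRecord::Migration
      def change
        # Fine for a cold deploy, but not while old code is still running:
        # the previous release reads `full_name` and will raise errors as
        # soon as this migration runs.
        rename_column :users, :full_name, :name
      end
    end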

The End Result

With the automated acceptance testing and zero downtime deploys in place, we were finally ready to move to a true “deploy when ready” process. Today, we deploy several times a day, all without the application missing a step. No more big integration efforts, or massive releases. We keep the deploys small, because doing so makes it much easier to diagnose problems when they happen. This deployment process also allows us to be much more responsive to the needs of the business. In the past, it could be up to two weeks before a minor change made it into production. Today, we can deploy that change as soon as it is done, and that’s the way it should be.

Topics: rails, Development & Deployment

Testing at UrbanBound

Posted by John Wood on Mar 26, 2015 9:57:46 AM

Testing is a very large part of how we build UrbanBound. We test at various phases in our software development lifecycle, and we have tests that target different levels of the application’s stack. Testing reassures us that the application will behave as we expect it to, ensures that it continues to behave as we expect it to as we change it, and catches problems early, making them easier and cheaper to fix.

User Testing Our Designs

The product team here at UrbanBound strives to design a product that is intuitive, easy to use, and solves the problem at hand. This is very easy to get wrong. Often, teams will make assumptions about the knowledge that the user has of the problem, or about how they will interact with the product. Many times, these assumptions are incorrect.

By testing our designs with people who are, or potentially could be, users of our application, we find out if we’re wrong at the earliest possible phase in the process. This lets us quickly correct our mistakes while still in the design phase of a feature, instead of spending time and money developing the solution, only to find out after the feature has been launched that the design is a failure.

How we conduct user testing, and the people we recruit as testers, both differ depending on the feature being designed. Often, the prototype being tested can simply consist of a series of static screens that are linked together. This sort of prototype is cheap to build, and is great for testing a user’s understanding of the product and some simple flows through the product. For features that involve a large amount of user interaction, we’ll build a fully interactive prototype in JavaScript. This sort of prototype lets us test the user’s interaction with the product, in addition to their understanding of it and the product’s flow. While it’s much more expensive to build, the costs pale in comparison to the costs of building and delivering a bad solution, and having to go back to the drawing board after a feature has launched and failed.

Backend Unit Tests

UrbanBound is powered by a Rails application on the backend. Like other Rails apps, the application consists of controllers, models, and a host of support classes (such as service objects, policies, and serializers). We write unit tests to verify that each of these classes behaves as expected in isolation. Unit tests are great to write when creating or modifying a class, as they let you know immediately if your code is working properly. They also give you feedback with regards to how other modules will interact with your code. This feedback helps you define a public interface that is clear and simple. In addition, unit tests act as a great safety net for catching regressions introduced into the application.

We use rspec as the framework for our unit tests, factory_girl for setting up test data, and database_cleaner to help us clean up after each test runs.
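
As a small, hypothetical example of what one of these unit tests looks like (the Relocation model, its attributes, and the factory are stand-ins, not our real domain objects):

    # spec/models/relocation_spec.rb
    require "rails_helper"

    RSpec.describe Relocation do
      describe "#complete?" do
        it "is true once every task is finished" do
          relocation = build(:relocation, tasks_remaining: 0)
          expect(relocation).to be_complete
        end

        it "is false while tasks remain" do
          relocation = build(:relocation, tasks_remaining: 3)
          expect(relocation).not_to be_complete
        end
      end
    end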

Frontend Unit Tests

Like many modern web applications, UrbanBound’s frontend is a single page JavaScript application. Because our frontend is an app itself with a wide array of classes and components, it is important that we test it like we test our backend application. We do this for the same reasons we unit test our backend code.

Our frontend tests are driven by karma, and written using the Jasmine testing framework. We also use the great set of extensions to Jasmine provided by jasmine-jquery.
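
A minimal sketch of what one of these specs looks like, assuming a hypothetical formatCurrency helper in our frontend codebase:

    describe("formatCurrency", function() {
      it("formats whole dollar amounts", function() {
        expect(formatCurrency(1500)).toEqual("$1,500.00");
      });

      it("formats zero", function() {
        expect(formatCurrency(0)).toEqual("$0.00");
      });
    });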

Feature Tests

Unit tests are great for making sure that modules work in isolation. But, simply making sure the wheels turn and the engine starts doesn’t mean the car will drive. We feel that, especially with a single page JavaScript app on the frontend, it is important to have a suite of tests that exercise the entire stack. This is where our feature tests come in.

Feature tests can be difficult to write and maintain. Because feature tests usually interact with a web server running in a different process or thread, issues related to timing can pop up. If you’re not careful, your test could be checking for the existence of an element on the page before the server has had time to process the response and insert that element into the DOM. This is especially problematic in frontends that make a lot of asynchronous requests to the server. However, there are some very mature tools out there that help with this problem, so you just need to invest some time into learning how to use them properly.

We use rspec + capybara to write our feature tests, in combination with either Selenium WebDriver or Poltergeist as the test driver. We prefer to use Poltergeist because it runs the tests in a headless browser, but we sometimes need to fall back to Selenium for specific tests that don’t run quite right in Poltergeist. We also use SitePrism to model the pages in our application as Page Objects, which makes tests easier to read and write.
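
For example, a SitePrism page object might look something like the sketch below (the URL and selectors are hypothetical), and the feature test then reads in terms of the page rather than raw CSS selectors:

    # spec/support/pages/dashboard_page.rb
    class DashboardPage < SitePrism::Page
      set_url "/dashboard"

      element :welcome_banner, ".welcome-banner"
      sections :tasks, ".task-list .task" do
        element :title, ".task-title"
      end
    end

    # In a feature spec:
    #   dashboard = DashboardPage.new
    #   dashboard.load
    #   expect(dashboard).to have_welcome_banner
    #   expect(dashboard.tasks.size).to eq(3)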

Continuous Integration

What’s the point in having an automated test suite if you don’t run it every chance you get? We use a continuous integration service to run our entire suite each time a change is pushed to GitHub. If any tests fail, we are notified immediately so we can investigate why the test failed and fix the issue.

We use CircleCI for our CI server. Although it can be a bit pricey for more than a few workers, it really is a fantastic service. It is simple to set up, incredibly flexible, and supports all sorts of testing setups. They also provide the ability to SSH into the machine that is running your tests, which has proven to be incredibly useful for tracking down that odd case where a test passes locally, but fails consistently on the CI server.

Manual Testing

Automated testing can never completely replace manual testing. It can only help to keep the manual testing focused on the things that can’t be easily automated. We have a team of Quality Assurance Engineers that, in addition to writing feature tests, will manually test features before they are cleared for deployment and merged into the master branch. Our QA team, with their attention to detail, are able to find issues that automated testing alone would never catch. Issues with the visual layout of a page, or the “feel” of the application, are prime examples of where manual testing shines.

Our Testing Philosophy

Our goal is to make sure our application works, and continues to work as we change it. Our approach to testing helps us make this happen. But, we try to never forget that testing has a cost. It takes time and money to maintain a test suite. Therefore, there must always be a return on that investment to the business. Though it doesn’t happen often, there are certainly times where automating the testing of a certain feature just doesn’t make sense. We’re not dogmatic about making sure that 100% of our codebase is covered by an automated test. But, that being said, we automate as much as we practically can.

Topics: testing, Development & Deployment

Migrating Data - Rails Migrations or a Rake Task?

Posted by John Wood on Feb 25, 2015 11:41:00 AM

I’ve always thought that Migrations were one of Rails’ best features. In one of the very first projects I worked on as a n00b software engineer straight out of college, schema and data migrations were a very large, and painful, part of that project’s deployment process. Being able to specify how the schema should change, and being able to check in those changes along with the code that depends on those changes, was a huge step forward. The ability to specify those changes in an easy to understand DSL, and having the ability to run arbitrary code in the migration to help make those changes, was simply icing on the cake.

But, the question of how to best migrate data, not the schema, often comes up. Should data migrations be handled by Rails migrations as well? Should they be handled by a script, or a rake task instead? There are pros and cons to both approaches, and it depends on the situation.

Using Rails Migrations

One of the features of Rails Migrations is that the app will fail to start up if you have a migration that has not been run. This is good, because running your app with an out-of-date schema could lead to all sorts of problems. Because of this, most deployment processes will run the migrations after the code has been deployed, but before the server is started. If your data migration needs to be run before the app starts up, then you can use this to your advantage by using a migration to migrate your data. In addition, if your data migration can be reversed, then that code can be placed in the migration’s down method, fitting nicely into the “migration way” of doing things.

However, there are some pitfalls to this approach. It is bad practice to use code that exists elsewhere in your application inside of a Rails Migration. Application code evolves over time. Classes come and go, and their interfaces can change at any time. Migrations on the other hand are intended to be written once, and never touched again. You should not have to update a migration you wrote three months ago to account for the fact that one of your models no longer exists. So, if migrating your data requires the use of your models, or any other application code, it’s probably best that you not use a migration. But, if you can migrate your data by simply using SQL statements, then this is a perfectly valid approach.
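
For example, a data migration that can be expressed entirely in SQL might look something like this sketch (the table, column, and default value are hypothetical):

    class BackfillUserTimeZones < ActiveRecord::Migration
      def up
        # Plain SQL only; no models or other application code involved
        execute <<-SQL
          UPDATE users
          SET time_zone = 'Central Time (US & Canada)'
          WHERE time_zone IS NULL
        SQL
      end

      def down
        # The backfill is not meaningfully reversible
        raise ActiveRecord::IrreversibleMigration
      end
    end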

Using a Rake Task

Another way to migrate production data is to write a rake task to perform the migration. Using a rake task to perform the migration provides several clear advantages over using a Rails Migration.

First, you are free to use application code to help with the data migration. Since the rake task is essentially “throw away”, it can easily be deleted after it has been run in production. There is no need to change the rake task in response to application code changing. Should you ever need to view the rake task after it has been deleted, it is always available via your source control system. If you’d like to keep it around, that’s fine too. Since the rake task won’t be run again once it has been run in production, it can continue to reference classes that no longer exist, or use APIs that have changed.

Second, it is easier to perform test runs of the rake task. We usually wrap the rake task code within an ActiveRecord transaction, to ensure that if something bad happens, any changes will be rolled back. We can take advantage of this design by conditionally raising an exception at the end of the rake task, rolling back all of the changes, if we are in “dry run” mode (usually determined by an environment variable we pass to the task). This allows us to perform dry runs of the rake task, and use logging to see exactly what it will do, before allowing it to modify any data. With Rails Migrations, this is more difficult, as you need to roll back the migration as a separate step, and this is only possible for migrations that are reversible.
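
A stripped-down sketch of this pattern is below; the task name, model, and backfill logic are illustrative, but the transaction and dry-run mechanics are the important part:

    # lib/tasks/backfill_time_zones.rake
    namespace :data do
      desc "Backfill missing user time zones (set DRY_RUN=1 to preview)"
      task backfill_time_zones: :environment do
        dry_run = ENV["DRY_RUN"].present?

        ActiveRecord::Base.transaction do
          User.where(time_zone: nil).find_each do |user|
            user.update!(time_zone: "Central Time (US & Canada)")
            puts "Updated user ##{user.id}"
          end

          # Raising ActiveRecord::Rollback unwinds everything the task just
          # did, so a dry run logs the changes without persisting any of them
          raise ActiveRecord::Rollback if dry_run
        end
      end
    end

Running bundle exec rake data:backfill_time_zones DRY_RUN=1 shows exactly what the task would do; dropping the environment variable performs the migration for real.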

Finally, you can easily run the rake task whenever you want. It does not need to happen as a part of the deployment, nor do you need to alter your existing deployment process to push the code without running the migrations or restarting the server. This gives you some flexibility, and lets you pick the best time to perform the migration.

Our Approach

Generally, we use Rails Migrations to migrate our application's schema, and Rake tasks to migrate our production data. There have only been a few cases where we have used Rails Migrations to ensure that a data migration took place as a part of the deployment. In all other cases, using a Rake task provides us with more flexibility, and less maintenance.

Another Approach?

Do you have another approach for migrating production data? If so, we’d love to hear it. Feel free to drop it in the comments.

Topics: rails, Development & Deployment

Why we're sticking with Heroku as long as possible

Posted by John Wood on Feb 25, 2015 11:33:00 AM

These days companies have a lot of choices when it comes to where to host their web applications. Not only are there many different providers to choose from, there are many different types of hosting to choose from. Do you need the raw performance of running on bare metal? Do you need total customization or control of your environment? Or, do you simply need to deploy your application somewhere, and let somebody else manage all of the finer details?

There is no one right answer for everybody. It all depends on the needs of your application. However, at UrbanBound, we’re going to stay on Heroku as long as humanly possible.

There are many benefits to using a Platform as a Service (PaaS) provider such as Heroku or OpenShift. You give up a lot of control over how your environment is configured. However, in many cases, this is a very, very good thing.

Security

I’m a nerd. I have an old-as-dirt dual Pentium III box in my basement, with a whopping 512MB of RAM, running the latest Ubuntu Server LTS release. I also have a VPS (Virtual Private Server) that I use to host a couple of applications that I have built over the years. There is nothing special on either of these machines. No credit card numbers. No proprietary information. However, that does not stop the flood of vulnerability scans and brute force login attempts that hit these machines daily. I know for a fact that, should I lapse on keeping my machines patched, up-to-date, and locked down, it would only be a matter of time before somebody gained access and did who knows what to my data or my websites/webapps.

The nice thing about using a PaaS is that I don’t have to worry about security as much. I still worry about it, but I worry about the things that are in my control… like making sure our application doesn’t have any security vulnerabilities, or isn’t using versions of any libraries or frameworks with known security holes. But, I don’t have to worry about stuff like the recent Ghost or Shellshock vulnerabilities. The great team at Heroku is all over these issues. We have come to rely on Heroku to quickly address new security vulnerabilities at all levels of the tech stack below our application.

I also don’t need to worry about firewall configuration, intrusion detection systems, malware scanning, network configuration, configuration of other services running on the servers, etc. The list goes on and on.

Scalability

Scaling out is getting easier and easier as time goes on. The days of provisioning a new server from scratch, making sure all of the right versions of the right packages are installed, and manually adding the server to the load balancer are long gone. Even the hosting services that require you to manage the servers yourself provide an easy way to clone an existing server and have the new machine up and handling requests in minutes. And tools like Chef and Puppet are great for keeping you from ending up with a batch of snowflake servers.

But, simply using a slider in a dashboard to tell Heroku how many instances I’d like running, and then poof? Are you kidding? Short of reading my mind (or implementing a fully automated scaling solution with an unlimited budget), I don’t know how much easier it could be.

Granted, this does not come cheap. Heroku can be a very expensive option when you start scaling out. But for the time being, it is stupid-simple to handle occasional spikes in traffic.

Add-ons

One potential reason for wanting to manage your own servers is to have the ability to deploy the services you need to get the job done. When you have access to the machine itself, you are free to run whatever database, queuing system, or cache you wish.

Heroku supports a wide variety of add-ons. The list includes data stores, caching utilities, error reporting tools, logging and monitoring tools, notification services, network services, queuing systems, and more. It is very diverse, and likely includes at least one plugin to handle whatever need you have.

The best part is that these add-ons are simple to use. There is nothing to install, and most simply require running a single command, and maybe setting an environment variable or two, before you can begin using them. That’s it.
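
For instance, provisioning an add-on usually boils down to something like this (the add-on, plan, and app names are just examples, and the exact CLI syntax has shifted over the years):

    heroku addons:create memcachier:dev --app my-app

    # The add-on exposes its credentials as config vars on the app
    heroku config --app my-app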

Focus

At UrbanBound, we have a team of people who specialize in building software. Though we all know our way around the command line, and a few of us administer our own personal servers, none of us do it full time. Using a PaaS, we can leave all of the server administration to somebody else...somebody who knows much more about server administration than we do.

The focus on DevOps tooling over the past few years has resulted in tools that make it much easier to administer clusters of servers. However, creating and maintaining Chef recipes and Puppet resource files takes a fair amount of work. It is also not easy to keep up to date with the latest versions of all the software you’re running on the server. Nor is it trivial to ensure that all of the software you are running on your server is configured, tuned, and optimized properly.

Leaving the server administration in the hands of the PaaS provider doesn’t only mean that we will end up with a better-configured and more secure server. It also means we can focus on what we do best: building our app.

Summary

Not everybody can get away with using a PaaS like Heroku. Some may need tighter control over how things are configured. Some may need to run in a private data center for legal reasons. And for those who need to scale out wide, it can become very expensive. But, if you don’t need the raw performance of running on bare metal, or the ultimate flexibility to configure every last thing in your environment, then I would encourage you to seriously consider a PaaS provider such as Heroku. Focus on what you do best. For us, that’s continuing to make the UrbanBound application great.

Topics: Development & Deployment
