Deploying new code quickly and automatically is fundamental to how we build product at Intercom.
We’ve been doing it since the earliest days of the company. At Intercom’s recent event, On Product, I gave a talk about the unexpected benefits of continuous deployment – what follows is a lightly edited version of it.
One of our core values at Intercom is that we think big and start small. This means we take on big, ambitious, impactful projects and tackle them as a series of small, safe steps. When building a new product feature we build the simplest version possible and deploy it behind a feature flag so only Intercom staff can see it. Based on feedback we’ll fix bugs and make improvements before launching it to a series of beta customers. After another round of iterations, fixes and improvements, we launch the feature to everyone.
This feedback loop is invaluable and we try to use it for everything we build. At each stage in this process we learn more about how people use the feature, how the code performs in production, even whether it’s a good idea at all. For most new features we’ll deploy dozens of iterations before we launch them to all of our customers. With many features being developed at the same time, this means we’re deploying pretty much constantly, every single day.
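The staged rollout described above (staff first, then beta customers, then everyone) can be sketched as a small feature flag check. This is a minimal illustration, not Intercom's actual implementation: the `Stage`, `User` and `FeatureFlag` names, the `new_inbox` feature and the user ids are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    STAFF = 1      # only internal staff can see the feature
    BETA = 2       # staff plus an explicit list of beta customers
    EVERYONE = 3   # launched to all customers


@dataclass(frozen=True)
class User:
    id: str
    is_staff: bool = False


@dataclass
class FeatureFlag:
    """Hypothetical flag that widens its audience one stage at a time."""
    name: str
    stage: Stage = Stage.STAFF
    beta_user_ids: set = field(default_factory=set)

    def enabled_for(self, user: User) -> bool:
        if self.stage is Stage.EVERYONE:
            return True
        if user.is_staff:
            # Staff see the feature at every stage.
            return True
        # Non-staff users only see it during beta, and only if listed.
        return self.stage is Stage.BETA and user.id in self.beta_user_ids


flag = FeatureFlag("new_inbox")      # hypothetical feature name
staff = User("u1", is_staff=True)
beta_customer = User("u2")
other_customer = User("u3")

print(flag.enabled_for(staff))          # True
print(flag.enabled_for(beta_customer))  # False
```

Moving a feature forward is then a matter of changing `flag.stage` (and filling `beta_user_ids`), with no change to the feature code itself, which is what makes each step small and safe.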
From our internal deployment dashboard we can see just how many times we’ve deployed Intercom every day over the last three years. In the middle of 2012 it was about 10 times a day; today it’s over 80 times a day, and I predict that by the end of 2015 we’ll be over 100 times a day.
The main driver for this massive increase in deployment rate is that we’ve grown our team significantly over that time. With this many people working together on a product that changes 80 times a day, our deployment process needs to be smooth, reliable and fast.
Back when we were six people, we started working on an automated deployment system for Intercom.
Here’s a quick overview of how it works:
After a code review on GitHub, engineers merge their features into the master branch. GitHub sends a webhook to Codeship, which runs our test suite for us to make sure there are no regressions in existing behaviour. GitHub also sends a webhook to a tool we built called Muster, which prepares the latest version of the code for release.
Once the tests have run successfully, Codeship sends a webhook to Muster, and the code is pushed out to our production environment of about 200 EC2 instances.
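The flow above boils down to a coordinator that waits for two signals about the same commit: a "merged" webhook that triggers build preparation, and a "tests passed" webhook that triggers the push to production. Here is a toy sketch of that logic. It is not Muster's real code; the `DeployCoordinator` class, its method names and the commit id are hypothetical, and the actual build and deploy steps are replaced by simple bookkeeping.

```python
from dataclasses import dataclass, field


@dataclass
class DeployCoordinator:
    """Toy stand-in for a release tool that reacts to two webhooks."""
    prepared: set = field(default_factory=set)    # commits with a build ready
    passed: set = field(default_factory=set)      # commits whose tests passed
    deployed: list = field(default_factory=list)  # commits released, in order

    def on_merge(self, commit: str) -> None:
        # Webhook from the source host: code was merged, prepare a release.
        self.prepared.add(commit)
        self._maybe_deploy(commit)

    def on_tests_passed(self, commit: str) -> None:
        # Webhook from the CI service: the test suite passed for this commit.
        self.passed.add(commit)
        self._maybe_deploy(commit)

    def _maybe_deploy(self, commit: str) -> None:
        # Release only when the build is prepared AND the tests are green,
        # and never release the same commit twice.
        ready = commit in self.prepared and commit in self.passed
        if ready and commit not in self.deployed:
            self.deployed.append(commit)


coordinator = DeployCoordinator()
coordinator.on_merge("abc123")
print(coordinator.deployed)   # []
coordinator.on_tests_passed("abc123")
print(coordinator.deployed)   # ['abc123']
```

Because preparation starts at merge time rather than after the tests finish, the two steps can overlap, and a commit whose tests never pass is simply never released.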
The whole process generally takes less than 10 minutes end-to-end. This is fast enough that engineers should never be blocked waiting for a deployment; if I’m working on a feature that’s in beta and I have a dozen bugs to fix before release, I could easily deploy them all in a single day – assuming I can write the code that fast.
Having done this every day for the last three years, we’ve noticed that continuous deployment has some other interesting benefits for how we work.
Helping new engineers
When a new engineer joins Intercom we set them two initial goals in their first week: They should ship some code to production on their first day and ship a feature in their first week. This lets them feel like a productive member of the team right away. We have a demo session on Fridays (pictured above) where we show the rest of the company what we built that week; standing up and showing your teammates a feature that’s already live for customers at the end of your first week is an incredibly empowering feeling.
But these are challenging goals for a new engineer. You need to do a bunch of stuff before you can get to this stage: set up your laptop, meet your team, figure out what you’re going to be working on. You need to write the code, and you may be unfamiliar with the language you’re going to be working with at Intercom. So there are plenty of reasons you might not be able to get a full feature done in a week.
The one thing that doesn’t slow you down at Intercom is figuring out how to deploy your changes. Having a fully automated deployment system removes one significant barrier to success – new engineers can deploy their changes as soon as they’re ready. If they’re curious later they can learn how it works, check out the code, even improve the deployment system. But during their first week, not having to learn rules and processes around deployment is really helpful.
Cut out bad behavior
Fixing a bug in production pic.twitter.com/saD82hLacz
— Practical Developer (@ThePracticalDev) August 11, 2015
The other interesting benefit is that it cuts out certain kinds of bad behavior. When you have a serious bug in production, it’s tempting to do whatever you can to fix the problem as quickly as possible.
If your deployment system is slow – if it’s going to be 20 minutes from merging a fix to having it deployed to production – you’ll be tempted to route around it and fix the bug in any way possible. This sounds kind of reckless, but it might be the right choice. You don’t want to sit there for 20 minutes watching your app crash and burn. The trick is to make your deployment system fast and reliable enough that it’s always easier to use than to hack away in production.
Optimising our deployment rate
This all takes quite a lot of work. We know that our current deployment process takes eight or nine minutes – that’s not bad, but it could be better. A lot of my team’s work right now is on optimizing deployments to be even faster. Ideally we’ll get down to 3 or 4 minutes, which will give us plenty of headroom for when we hit 100 deployments a day, and beyond.
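To get a rough feel for what that headroom means, here is a back-of-the-envelope calculation. It assumes, purely for illustration, that deployments run one after another and never overlap; the text does not say whether that is the case, and the function name is made up for the example.

```python
def pipeline_hours_per_day(deploys_per_day: int, minutes_per_deploy: float) -> float:
    """Total pipeline time per day if deployments are strictly sequential."""
    return deploys_per_day * minutes_per_deploy / 60


# 100 deployments a day at today's ~9 minutes versus the ~4 minute target.
for minutes in (9, 4):
    hours = pipeline_hours_per_day(100, minutes)
    print(f"{minutes} min per deploy: {hours:.1f} hours of pipeline time per day")
```

Under that assumption, 100 deployments at 9 minutes each would occupy the pipeline for 15 hours a day, while at 4 minutes each they would need under 7 hours.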
If your company is successful, over time a lot of things are going to grow. You’re going to hire more people, so your team is going to grow. You’re going to write more code, so your codebase is going to grow. You’re going to have to introduce new processes (the kinds of process that work for six people do not work for 100). Hopefully you’re going to write a lot of automated tests to make sure you don’t break your initial features as you add new ones. All this conspires to push up your deploy time, and that means slowing down the feedback loop mentioned earlier.
It also makes it less fun to work on your product. You can be less effective and you’re learning less every day, which is pretty frustrating for engineers, but it’s actually worse for your product: you’re much more likely to make mistakes, build the wrong feature or invest in something you don’t need to be investing in.
So, early on at Intercom we identified fast deployment as a really great way of shortening the feedback loop and keeping it short – which means we can build the right features for our customers in the fastest time possible.