Main illustration: Beth Walrond
This is a time of great uncertainty, with everyone suddenly having to adjust to social distancing restrictions and widespread industry upheaval.
Our customers are relying on Intercom more than ever, and often in a host of new and unforeseen ways. Therefore, it’s incumbent on us to make sure Intercom is stable and robust at this time, and that we keep shipping new features.
But how do you try to maintain “business as usual” at a time when everything is seriously unusual? What changes when you suddenly become an all-remote team overnight, having spent years building and shipping alongside one another?
We’d like to share some insights from what we’ve learned so far about software delivery and operations over the past few weeks of working from home during COVID-19, and how we as an engineering team have adjusted to this unprecedented, unpredictable situation.
New situation, new processes?
As the situation escalated a few weeks ago, we consulted our business continuity plans and thought hard about the challenges our business would face. Of course, the vast majority of businesses didn’t have plans to deal with a global pandemic, and we certainly didn’t have a simple runbook to follow.
We asked ourselves these questions: What is the impact on our engineering team of suddenly becoming distributed? How will this affect our productivity and our ability to keep core business activities going? What do we need to change? What information do we need to figure this out? Above all, how do we support each other as colleagues, as friends, as people trying to get through this incredibly stressful situation?
At a time of great change like this, the strong temptation is to introduce new processes and bureaucracy. For instance, one understandable reaction that many companies probably considered was to reduce risk by slowing things down and being more careful with building and shipping.
Intuitively, this makes some sense. After all, we were almost certainly going to be working with reduced engineering capacity and would be less capable of responding to problems. Surely at this time, going slower and adding situation-specific processes would keep us safer?
A time for the tried and tested
However, that wasn’t the approach we decided to take. On the contrary, we decided that relying on the tried and tested principles and approaches to building and shipping Intercom would be more important than ever.
Shipping is our heartbeat, and we’ve built up a solid system to get software safely and quickly to production. Intercom’s entire software delivery system is built on the ability to move fast, react to change, and roll back or roll forward rapidly when something unexpected happens.
We decided that now is not the time to experiment with slower ways of shipping. For one thing, as the Accelerate State of DevOps Report has shown, adding more processes and policies to software deployment is actually likely to increase the risk to software delivery.
For another, making fundamental changes to an already well-running system without causing a lot of friction takes care, and that care would inevitably add to the workload of an engineering leadership team that’s trying to cope with very new and changed circumstances.
Essentially, we decided that at a moment when all around us was strange and unfamiliar, leaving our existing structures and best practices in place would be the best approach not just for our shipping and our product, but also for our people. And that’s really not a surprise: we’re a principles-led company, so we’re sticking to our beliefs around how we build and how we ship.
Assessing the reality
That said, we needed to assess how this was working in practice and to understand any potential impact on our capacity to deliver software. To be confident in the health of our deployment pipelines and processes, we started to monitor critical metrics more carefully: the rate of software deployments, the rate of rollbacks, and the number of deployment-related incidents.
We’ve been working from home for a few weeks, and the data so far indicates that we haven’t seen any meaningful change to the pace of software delivery. Looking closely at the rate of code changes, number of PRs opened, number of merges to master, time from PR open to merge, deployment frequency, and so on, we can see that everything has remained steady.
However, we do suspect that there is a risk of very real changes in our velocity as this situation goes on, because we are suddenly working under very changed circumstances and having to collaborate with one another in a very new way.
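To make that kind of comparison concrete, here is a minimal sketch of how two of these metrics, deployment frequency and time from PR open to merge, could be compared across two periods. It is an illustration only, not a description of Intercom’s actual tooling: the `PullRequest` record, the function names, and the sample dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from statistics import median


@dataclass
class PullRequest:
    """Hypothetical record of a merged pull request."""
    opened_at: datetime
    merged_at: datetime


def deploys_per_day(deploy_dates, start, end):
    """Average deploys per calendar day in the inclusive range [start, end]."""
    days = (end - start).days + 1
    count = sum(1 for d in deploy_dates if start <= d <= end)
    return count / days


def median_hours_to_merge(prs):
    """Median time, in hours, from a PR being opened to being merged."""
    hours = [(pr.merged_at - pr.opened_at).total_seconds() / 3600 for pr in prs]
    return median(hours)


def relative_change(before, after):
    """Fractional change from `before` to `after` (0.1 means +10%)."""
    return (after - before) / before


if __name__ == "__main__":
    # Made-up sample data: three deploys over a two-day window.
    deploys = [date(2020, 3, 2), date(2020, 3, 2), date(2020, 3, 3)]
    rate = deploys_per_day(deploys, date(2020, 3, 2), date(2020, 3, 3))
    print(f"deploys per day: {rate}")

    # Made-up sample data: one PR merged after 2 hours, one after 4 hours.
    prs = [
        PullRequest(datetime(2020, 3, 2, 9), datetime(2020, 3, 2, 11)),
        PullRequest(datetime(2020, 3, 2, 9), datetime(2020, 3, 2, 13)),
    ]
    print(f"median hours to merge: {median_hours_to_merge(prs)}")
```

Computing each metric for the weeks before and after the move to remote work, then passing the two values to `relative_change`, gives a simple signal of whether the pace has shifted. What counts as a “meaningful” change is a judgment call that this sketch leaves open.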
To understand the impact on our organization as we all adapted to suddenly working from home, we surveyed our engineering team directly about the changes to their productivity and ability to collaborate. The results showed that many senior engineers, engineering leaders and technical program managers felt that their productivity was negatively impacted.
On the other hand, a lot of engineers did feel more productive working from home, but a small number had inadequate working setups that couldn’t be quickly fixed. Depending on how this crisis pans out, we may need to change what types of work we do to more effectively utilize our senior contributors. We’ve already postponed some non-critical work to free up engineering capacity, and at the very least we’ll be considering longer timelines for projects over the next while.
Oncall together, in isolation
We also examined how our oncall teams were adjusting to the changed circumstances. We definitely didn’t want to see an increased amount of downtime, especially as many of our customers were also suddenly a lot busier and some were responding directly to the crisis.
We already have a long-established, distributed virtual team of volunteers to staff our out-of-office oncall. Our paging metrics were good going into this event, and when we looked at the volume of alarms firing over recent weeks, we saw no real change.
As with our other work, it was the people we wanted to prioritize, so we chatted to the team to discover how they felt about remaining on 24-hour oncall for a week while being largely stationary and working from home. Being stuck inside might seem conducive to responding quickly to pages, but we recognized that general stress levels are considerably higher than usual. So we agreed as a team to cover for each other more aggressively if oncall shifts ended up being busy, without fundamentally changing the processes or shift patterns.
The value of the familiar
So far we’re confident that, despite having to suddenly become an all-remote team, we have adapted well.
Our software delivery systems and operations are robust and well understood internally. “Good enough” is a perfectly reasonable bar for now, as we are optimizing for “function” over “effectiveness”.
Resisting the temptation to create new processes in reaction to this period of flux has proven the right choice – changing these systems now would cause a lot more work at a time when we need to be reducing the amount of work we’re doing, allowing our focus to remain on supporting our teams, customers and core business.
What we’ve found is that leaning on our principles and reliable processes has actually been the best thing for our customers and for us as people – we have realized that at a time of sudden, disorienting change, the familiar and reliable becomes more valuable and more necessary than ever.