In engineering, you want to move fast, ship often and solve real customer problems. Yet competition and the exponential rate of change in software are pushing against that mission.
Enter our philosophy of Run Less Software. It means reducing choices amongst engineering teams and standardizing technology, so our team can spend as much time as possible delivering value to customers.
Rich Archbold, Senior Director of Engineering, has been at the forefront of codifying and scaling this philosophy over the past few years. The concept was the foundation for his popular talk and ultimately his classic, long-form blog post.
To extend that conversation, I hosted Rich on our podcast. Our chat covers the origins of Run Less Software, how it has evolved at scale, and how it differs from the equally valid approaches of other engineering teams. If you find this valuable, check out more episodes of our podcast. You can subscribe on iTunes, stream on Spotify or grab the RSS feed in your player of choice.
What follows is a lightly edited transcript of our chat.
Geoffrey: Rich, thanks for joining us. Since it’s your debut on the podcast, can you give us a feel for your career to date and your current role here at Intercom?
Rich: Today I’m the Senior Director for Foundations Engineering at Intercom. I’ve been here for about four years. Our team’s mission is to help Intercom evolve, scale and be trusted by every internet business in the world. We take care of most of Intercom’s backend technologies. That includes all of our cloud strategy, cloud operations, backend engineering, IT and security.
Prior to Intercom I spent about 10 years working for big tech companies. I spent a year and a half doing site reliability at Facebook. For about eight years before that, I was doing systems engineering work at Amazon. Over time I had eight different jobs there and progressed from an engineer, to technical project manager, to technical program manager, to manager, to a manager of managers. I started at Amazon before they had Amazon Web Services (AWS) so I was lucky enough to see AWS born out of the guts of all of the great operations work done for the amazon.com retail website.
What it means to Run Less Software
Geoffrey: Today we’re talking about a philosophy that you’re extremely passionate about called Run Less Software. This was a team value that became a talk, which led to a blog post and ultimately this conversation. In short, what does it mean to run less software, and what are the benefits of this approach?
Rich: In the world we work in today, in this business, there are some harsh realities. They include that time is short, opportunities are fleeting, competition is fierce, and engineering resources are scarce. If we are actually trying to win our market and be the absolute best customer communications company, we need to make exquisitely good use of our precious engineering resources. That means we spend as little time as possible on “undifferentiated heavy lifting”, a phrase that Jeff Bezos coined many moons ago when he first talked about the founding of AWS.
We need to spend as little time as possible on undifferentiated heavy lifting problems that are already solved, and we need to spend as much time as possible creating enduring competitive advantage. We need to make sure the vast majority of our engineering time is spent building things that provide unique value and a great user experience for our customers – helping them solve the problems that they face.
Geoffrey: A lot of this is about choosing standard, or maybe even “boring”, technologies. Can you describe our standard technologies? Why did we choose them?
Rich: We have a variety of standard technologies. You can think of them beginning from very base infrastructural technologies, and in our case we’re betting exclusively on AWS as a cloud vendor. When you come up the stack a little bit, even inside of AWS, we use a specific set of EC2 instances and only a specific set of AWS data stores. Coming up the stack a bit more, there is a set of programming languages and frameworks that we strongly recommend people use: Ruby on Rails; Ember, primarily for our web app; and React for our messenger.
We chose these particular standard technologies, or boring technologies, because over time they are the safest and fastest for us to use. They’re battle-tested, well understood and well supported, and our engineers are well trained in them. They allow us to make very fast engineering decisions and to front-load and quantify the cost and risk associated with those decisions.
There is a fantastic engineer, Dan McKinley, who previously worked at Etsy. He first publicized the idea of choosing boring technologies. He describes it as a way of enabling fast decisions that create really strong velocity benefits and carry really low operational risk or cost over time. This is exactly how we think about choosing standard technologies. It limits the number of choices we make to ones that are well supported and well understood within the company. We can make fast decisions that are cheap and easy to maintain over time.
Born from an engineering offsite
Geoffrey: This idea was first documented at a very small offsite three to four years ago. Put us in that room. How did this whole concept come about?
Rich: That’s a fantastic question, and probably my favorite one to answer, because it brings in so many elements of not just Intercom’s engineering philosophy, but also philosophies behind organizational health and organizational alignment.
I’d come from Amazon, and Amazon was an extremely values-, principles- and tenets-based company. I had seen how each team, my own included, had a really strong set of engineering tenets and philosophies. It enabled them to make really good decisions that were well aligned and easily understood and accepted by teams above them and teams around them.
After coming to Intercom, it was very clear to me that we had this incredibly smart set of people, who had really interesting, opinionated and different ideas. Three months after I joined we had a team offsite, where we tried to flesh out and consolidate a small, non-exhaustive set of principles or tenets that might unify the team. Doing so would allow us all to share our diverging perspectives and commit to this small, non-exhaustive set.
Some of the things we talked about were operational overhead and cost of various data stores we were running, along with whether or not we needed to hire dedicated database administrators for various different technologies. Ciaran, our CTO, came up with this magical line. He said, “I want to run less software, not more software. I don’t want to hire more people for this stuff, because they won’t be building product. They’ll be operating the database.”
My job on that day was primarily facilitating the conversation and writing everything down. As we sifted through the notes in the days afterwards, this run less software sentence and the fierce debate that came out of it felt like the cornerstone of something really important for everybody to understand. It was one of four things that actually made the cut into our final set of engineering principles.
A testament to the team is that everybody bought into it and immediately figured out how to make it actionable. We made some good tactical decisions afterwards, where we slimmed down the number of database technologies that we used. We invested more heavily in particular types of training in order to build even more muscle in those core technologies, and we built some interesting tooling to support those core technologies. It was far more advanced than the types of tooling I had seen to support those technologies in Amazon and Facebook.
I thought, wow, the system works. We actually made a strategic decision, we invested in it, and very quickly over the subsequent three to six months we really saw that decision pay off.
Recodifying Run Less Software at scale
Geoffrey: Fast forward a few years from that offsite to today, and the engineering team has grown exponentially. How has the Run Less Software philosophy scaled over time?
Rich: It scaled well for the first year and then pretty terribly for the next two and a half years after that. It scaled well for the first year because although our team and org grew, it was still pretty easy for all of us to get in a room every three to six months and re-debate all of these core values and principles. We could redescribe them and reaffirm them from first principles, which kept that organizational clarity.
After about a year, we just had too many people and we couldn’t all get in the room and re-debate these things. Previously, this one sentence tenet and small supporting paragraph was enough to provide organizational clarity, because everybody had talked it out. Then people started to use it, deploy it and make decisions based on it, having only the context of that one liner and a small little paragraph. Meanwhile, Intercom’s infrastructure grew, the architectures evolved, our product got way bigger, and we hired a bunch of new people who had different engineering skills.
We found that this principle was being used almost politically. It started to be used to justify one person’s decision over another’s: “I think we should do X, because X aligns with Run Less Software,” said by somebody who, with the best intentions, didn’t understand what Run Less Software meant.
I got to the stage where I had heard so many of these things that I felt terribly responsible. Something that was created for good had become a tool for politics. We needed to take some time to properly define it, and that’s actually where the blog post and everything else came from.
I interviewed Ciaran, and I interviewed Darragh, our VP of Engineering. I interviewed a bunch of our principal engineers, and even a bunch of our less tenured engineers, to get everybody’s thoughts and feelings on what Run Less Software should mean. I researched it on the internet and found Dan McKinley’s blog post, which seemed to align with a lot of what we were thinking. Then we spent some time articulating the philosophy in a little more detail. We found there were some more nuanced things that we hadn’t realized back in the day, when we were building smaller systems. The more evolved version of Run Less Software is to start off by really understanding the problem you are trying to solve for your customers. Figure out which building blocks can be provided by our existing standard technologies and which parts can’t.
For the parts that can’t be solved easily today by our standard technologies, can we actually further break down those problems into smaller and smaller problems, which start to look like ones that can be solved by our standard technologies? That allows us to really figure out what parts are risky, what parts are difficult, and what parts we need to experiment with. That helps us orientate our tech plans for the things that we ship, and we are always shipping to root out ambiguity or add customer value as early as possible.
It also helps us when we start thinking about outsourcing undifferentiated heavy lifting. Who do we outsource to? Do we really want to bet so heavily on AWS, or do we want to hedge our bets a little with a cloud-agnostic strategy? How do we think about the companies we outsource to? We realized that we think about them primarily from a security and privacy perspective. We also think about them from a reliability and operational perspective. Intercom is pretty big right now. Not every startup is able to handle our throughput or reliability needs.
We also think about it in terms of dollar cost. How much money are we going to have to spend versus how many engineers could we hire ourselves? We developed a much more sophisticated, actionable and usable set of guidelines and principles that helps people make decisions that aren’t based on gut feeling.
This spurred a bunch of great work by Ciaran, our CTO, who was able to codify a bunch of these standard technologies into centralized documentation, which is now part of the onboarding and training for all of our engineers. It is a live document, which can be updated, and it has a bunch of guidelines.
Now it’s a much more usable, actionable, examinable, editable and evolvable framework, and I’m happy with that.
Geoffrey: You joined when the engineering team was probably 10-20 people. Now there’s more than 100. How do you continue to move fast and keep everyone aligned at the same time? It must be a challenge.
Rich: Inside of that same tenets and principles creation meeting, Ciaran actually came up with another wonderful principle, which has held true: faster, safer, easier shipping. At the time he said, “Intercom is a place where we ship ambitious changes as a series of small safe steps and we are never afraid to deploy.”
Shipping changes as a series of small, safe steps plays into the evolved Run Less Software mentality. We figure out the problem we’re trying to solve, and how we can organize the work so that when we ship, we are removing ambiguity and creating learning and insight as early as possible in the process. That’s definitely how we move fast. Organizing work this way gives us more confidence that the rest of our plan is the right one.
The other part of that is we are never afraid to deploy. We have a team dedicated to continuous delivery and developer productivity. We’ve shipped over 1,000 changes to production in one week. In general, we are doing 100 to 200 changes every day. Our average test time is four minutes or less, and our average end-to-end deploy time is about 10 minutes. So we have a team of people who do nothing other than work really hard, making lots of large and small changes and nurturing a culture that maintains that level of ease and speed of shipping. It’s such a core part of our culture that we would absolutely interrupt product work to defend that ability, rather than let it slide.
A philosophy, not the philosophy
Geoffrey: I know you speak frequently with engineering leaders at other tech companies, and you’ve mentioned the Run Less Software philosophy is different than how the teams at Stripe or Slack operate. What’s their point of view on this, and why is it different to Intercom’s?
Rich: That is a really fun question. Anytime I go to conferences or I’m visiting San Francisco, one thing I love to do is meet with people who are my peers or are in a similar role at companies I really admire. Stripe, Slack, Datadog and Fastly are some of the companies whose engineering leaders I’ve been lucky enough to talk with.
So far, I’ve found nobody who has a strategy like Run Less Software. I have, however, found two different strategies at play across a bunch of those companies. The first one is a cloud-agnostic strategy. What the leaders at some engineering companies seem to focus on most is minimizing their long-term hosting costs. One of the ways they can do that is by playing AWS and Google against each other in a bidding war for their business. The way they can make that a true negotiating strategy is by only using cloud-agnostic technologies.
That means they will only use technologies provided by AWS or Google where the cost of switching between the two appears to be low. Running a container in AWS feels very much like running a container in Google. The interfaces are very similar, the APIs are similar, and so the threat is, “If you don’t give me a good price on this, I will pick it up and move it over to another company.”
This is a super valid strategy, but one of the tradeoffs is you’re going to have to hire more engineers and you are going to have to incur more human operational expense. I think one of the things that’s less scarce is money and one of the things that’s more scarce is engineers. That’s why while I think that cloud-agnostic strategy is a valid one, we are optimizing for getting more customer value out of the limited engineering resources that we have, so it’s not one that we choose.
One of the other valid engineering strategies I’ve seen is where there are pretty much no constraints on the technologies you can use. Some companies give their engineers huge control over the technologies they choose to build their own products. The companies who I think do this best are the ones that are remote-friendly and have an incredibly high hiring bar. They will hire some of the best technologists in the world in a given technology and basically give them free rein to use that technology.
At Intercom so much of the magic of our product development requires really, really close interaction between product, design, engineering, research and analytics. There are really tight face-to-face feedback loops and really high collaborative bandwidth between people. So that no-constraints approach is not completely appropriate for us right now. Also, there are a bunch of security headaches that come with it. Our security team would kill us if we told them they had to support, battle-test and give great guidance for pretty much any technology people wanted to use.
It all depends on the constraints that you fix. If you’re relatively well funded and you’re willing to play a little bit of a longer game, you can get just as much value by betting heavily on AWS and letting them know that so you can partner in a bunch of ways.
Depending on your constraints, there’s no right or wrong answer, and it’s super interesting to hear how different people think about it.
Making an impact outside the codebase
Geoffrey: Let’s talk about hiring. How does the philosophy impact how or who we hire as an engineer at Intercom?
Rich: It definitely influences how we hire. That very first part of Run Less Software, which is about really understanding the problem you are trying to solve and being really pragmatic about how you solve it, is not unique to Run Less Software at Intercom. It’s part of our whole product philosophy to really understand the problems that customers are facing and to be super pragmatic about providing the smallest, simplest, fastest thing as early as possible to solve their problem.
Our interviews are set up so that when people tell us about the work they’ve done in the past, they’re not just telling us about technology implementation steps. We check that they really understood the problem they were solving and that they made the most pragmatic decisions at the time relative to their own set of constraints. They absolutely don’t need to have worked at a company with a Run Less Software philosophy, but they should have understood the constraints they worked within and always made the right pragmatic decisions inside that framework.
We’re looking for high-level problem solvers: people who have a lot of curiosity and inquisitiveness, self-awareness and situational awareness, and who make good decisions.
From a technology perspective, while you can have a favorite technology and be very highly skilled in it, we look for people who are not dogmatic and won’t say, “I only work in Ruby on Rails.” I love Ruby on Rails. It’s a happy coincidence that Intercom heavily uses it, but should we ever decide to move away from it, or ask someone to work on a team that is more heavily invested in a different stack, that wouldn’t become a deal breaker for them.
Geoffrey: So much of the Run Less Software philosophy is about reducing choice and avoiding complexity. Do you find that Run Less Software has broader applications outside of engineering? It strikes me as something that would be valuable for a product team or even for a marketing team.
Rich: It’s resonated a lot with our design team. What I see there is that we look to use common, intuitive, learnable patterns for solving particular problems. We are very deliberate and nuanced about when we decide to solve something from first principles versus when we see there is an easy and obvious way of doing it. So yes, I do think Run Less Software is an engineering phrasing of some broader, product-wide and company-wide thoughts and feelings.
Geoffrey: Since the blog post was published you’ve gotten a lot of feedback, particularly from the Hacker News community, which is always interesting. What has that been like, and how has it further shaped your thinking?
Rich: My role at Intercom is obviously Senior Director of Engineering, but so much of what I think about is organizational health. When I think about organizational health I think about how hard it is to assume good intent. I think of a thing called the fundamental attribution error, which is where you say one thing and people naturally will think the worst of you. Their fight-or-flight response kicks in. I saw a little bit of that at times in some of the responses on Hacker News.
Part of the way we phrased the blog post is, “Here’s what we believe.” There is a risk that it can come across as dogma rather than strong opinions weakly held. Obviously there’s no right or wrong answer. As I’ve said, there are a bunch of different engineering philosophies that are equally valid. It just reminded me that when you’re talking in a large community environment, the more people there are, the more likely someone’s response is to give you their opposing opinion rather than to ask for further clarification or give you the benefit of the doubt.
If I were doing it all over again I’d probably put some sort of disclaimer at the top that says, “Hey, there are absolutely no right or wrong answers here. We’re not telling you what you should do. We’re not telling you that you should do what we do. We’re not even saying a thousand percent that what we’re doing is the right thing for us. We’re telling you that, to the best of our experience and reflection, this is what we’ve been doing and here is why we think it’s working for us, not why the world should go do it.”