Main illustration: Franco Zacharzewski
From premature optimization to over-engineering solutions for your product, it’s easy to get caught up in making technology decisions that slow you down instead of speeding you up.
So when it comes to building your technical strategy, you need to assess each component in relation to what success will look like for your business.
This post is an adaptation of a talk I recently gave at the Amazon Web Services (AWS) community day event in Dublin about the technical strategies I’ve experienced that don’t work and the ones that have helped us to grow and scale at Intercom.
Many of these approaches are intended as reasonable defaults; they are my opinions, they aren’t rules, and they certainly won’t fit every situation.
They’re based on my experiences working in technology, the practical application of methods in varied use cases, and speaking with peers about their strategies and successes. Although they may seem like strong opinions, many of these tips echo the main tenets of software engineering: work with what you’ve got, design solutions as needed, don’t repeat yourself, and keep it simple, stupid!
The top ten technical strategies to avoid
1. Multi-cloud architectures
If you’ve been paying attention to any loud marketing efforts over the last few years, you have definitely heard about multi-cloud. If you’re unfamiliar with the term “multi-cloud,” it means deploying your application to a heterogeneous cloud-based platform spread across multiple cloud providers.
While that doesn’t sound very bad, according to Corey Quinn, the world’s most notorious Cloud Economist, it goes against best practices or “sensible defaults” and is “the worst practice to be avoided by default.” Corey works with his customers on reducing their AWS bills, and he’s seen large numbers of cloud architectures in practice, so I think he’s a pretty good source on this.
Even thinking about implementing a multi-cloud architecture is prematurely optimizing for practically all businesses – especially startups – and not a trap you want to fall into. Your company probably has enough problems worth solving that are far more valuable than any of the mythical benefits of multi-cloud deployment.
A common misconception is that a multi-cloud strategy will help you avoid vendor lock-in, but this is mostly an illusion stemming from vague future business needs. It can also be a drain on resources, as abstracting away the value of any particular cloud provider is time-consuming and will hinder your ability to leverage the cloud for your business.
Look, there are situations where a multi-cloud strategy will be of benefit to you. Maybe you’re Netflix or Apple and account for a large percentage of total internet traffic. But for the rest of us? Pick one cloud provider and don’t even think about moving workloads between clouds. Going all-in on one cloud provider is where the magic of cloud platforms comes to life: that is, ease of use, simplicity of the platform, and efficiency.
2. Adopting the “best tools”
Don’t use the best tools for the job. Sounds counterintuitive, right? In AWS, the best tool for a highly available key-value data access store is probably DynamoDB, and the best tool for a bunch of time-series data is probably Timestream. However, if you already have a fully operational MySQL Aurora installation in place, can’t you just put the data there instead?
“You should optimize globally, and that means using the tools you’re already using”
Even in the cloud, adding new technology to your stack can be a distraction. You should optimize globally, and that means using the tools you’re already using. Don’t add to your stack unless you’re certain that your use case will not be satisfied by existing software.
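As a toy illustration of the idea – a key-value access pattern served by a relational database you already run – here’s a sketch in Python. It uses the built-in sqlite3 module as a stand-in for an existing MySQL/Aurora installation, and the table and function names are made up for the example:

```python
import sqlite3

# Stand-in for an existing relational database (MySQL/Aurora in practice);
# sqlite3 keeps this sketch self-contained and runnable.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS kv_store ("
    "  k TEXT PRIMARY KEY,"
    "  v TEXT NOT NULL"
    ")"
)

def kv_put(key, value):
    # Upsert: the same access pattern a DynamoDB PutItem would cover.
    conn.execute(
        "INSERT INTO kv_store (k, v) VALUES (?, ?) "
        "ON CONFLICT(k) DO UPDATE SET v = excluded.v",
        (key, value),
    )

def kv_get(key):
    row = conn.execute(
        "SELECT v FROM kv_store WHERE k = ?", (key,)
    ).fetchone()
    return row[0] if row else None

kv_put("user:42:theme", "dark")
print(kv_get("user:42:theme"))  # dark
```

Against Aurora MySQL the equivalent upsert would be `INSERT ... ON DUPLICATE KEY UPDATE`; the point is that a plain indexed table often covers the key-value use case without a new datastore.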
At Intercom, we call this “Run Less Software,” and it’s part of our technical strategy of being technically conservative. We think it works for us: it’s helped us avoid building and maintaining a lot of stuff that would have slowed us down over time.
3. Containers vs. serverless host environments
Day one of your startup is probably not the time to be learning Kubernetes. Maybe it is if you have a multi-year runway and significant infrastructure to build, or if you’re in the infrastructure space selling to Kubernetes users. But unless you are already quite proficient at Kubernetes, the quickest way to get a service up and running is to use the simplest, most flexible, and most common building blocks available, such as a bunch of EC2 hosts in an autoscaling group behind a load-balancer.
At Intercom, we’ve found success running Lambda as glue code between AWS services. I think Lambda is an amazing piece of technology, but it has its place. It’s great at performing simple tasks triggered by events, such as resizing images uploaded into an S3 bucket. I like to think of Lambda functions as stored procedures for the cloud. But I would not like to run a large, complex application using Lambda, as the limitations are significant, and areas like observability still don’t feel fully mature.
Written by a couple of ex-AWS engineers, the book The Good Parts of AWS by Daniel Vassallo and Josh Pschorr is highly opinionated about which parts of AWS to use and includes a good discussion of Lambda: “We think Lambda is great—definitely one of the good parts of AWS—as long as you treat it as the simple code runner that it is. A problem we often see is that people sometimes mistake Lambda for a general-purpose application host.”
If you think it’s the right addition to your stack, use it for what it’s good for – it’s not, yet, a general-purpose computing platform, but it does work really well with many other parts of the AWS ecosystem, and the Lambda team is adding great features all the time.
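To make the “simple code runner” idea concrete, here’s a hedged sketch of a glue-style Lambda handler for the S3 upload example above. The event shape follows S3’s ObjectCreated notification format, and the resize step is a placeholder rather than real image processing:

```python
# Sketch of a Lambda "glue" handler: triggered by an S3 ObjectCreated
# event, it extracts the bucket and key and hands off to a stubbed
# resize step. In a real deployment, boto3 would fetch the object,
# resize it (e.g. with Pillow), and upload the result.

def resize_image(bucket, key):
    # Placeholder for the actual image-processing work.
    return f"resized {key} from {bucket}"

def handler(event, context):
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        results.append(resize_image(bucket, key))
    return results
```

The handler does one small, event-triggered job and nothing else – which is exactly the scope where Lambda shines.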
4. Microservices cause undifferentiated heavy lifting
Similar to Kubernetes, unless your team already has a lot of experience with microservices, most startups shouldn’t go near them. Using microservices adds complexity, increases the number of things that can go wrong, and it’s a lot more work to set up many services well compared to one or two.
“Our teams wanted to build products, not maintain services”
About 6 years ago at Intercom, we thought it was inevitable that significant new functionality should be developed as a standalone service. So we built new features like our webhook and event processing as small services that talked back to our Ruby on Rails monolith. But over time, we noticed that teams hated working on these services.
There was so much overhead and undifferentiated heavy lifting involved in maintaining these services, and adding new functionality seemed to take longer, compared to doing similar work in our majestic monolith. Our teams wanted to build products, not maintain services. In the last few years, we’ve been folding services back into our Ruby on Rails monolith. I suspect a similar experience could apply to many other service-orientated architectures.
5. Configuring the AWS Console
I regret almost every time I configure something in the AWS Console. “Click ops” can be fast and effective, but the advantage of having a version-controlled, peer-reviewed definition of your infrastructure is significant. It doesn’t matter much whether you’re using CloudFormation, Terraform, or higher-level tools like the AWS Cloud Development Kit (CDK). Anything is better than clicking around the AWS Console.
Most of the time, infrastructure defined in code or configuration is easier to maintain. Having infrastructure defined in code doesn’t mean you need to make it complex though. Abstractions here using modules can be very powerful but can cause unexpected side effects, so I prefer to avoid DRYing (Don’t repeat yourself) things in favor of simple declarative instructions.
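To illustrate the plain declarative style, here’s a minimal Terraform fragment (the resource names are invented for the example). A module could DRY up the repetition, but spelling each resource out keeps every setting visible at review time:

```hcl
# Two SQS queues written out declaratively. A shared module could
# generate both, but the repetition keeps each resource's settings
# easy to read and easy to change independently.
resource "aws_sqs_queue" "email_events" {
  name                       = "email-events"
  visibility_timeout_seconds = 60
}

resource "aws_sqs_queue" "webhook_events" {
  name                       = "webhook-events"
  visibility_timeout_seconds = 120
}
```

Note how the two queues can diverge (different visibility timeouts) without touching any abstraction – the kind of flexibility that heavily DRYed module hierarchies tend to fight against.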
6. Building for scale
The cloud is a great place to build for scale, but that doesn’t mean you have to. In his 1974 paper, “Structured Programming with go to Statements,” Donald Knuth noted that, “Premature optimization is the root of all evil.”
“Adding extreme scalability before it’s really necessary easily leads you down the road of technical debt and other inefficiencies”
Sure, you get unfathomable scale at your fingertips by using the likes of S3, Amazon Simple Queue Service (SQS), and DynamoDB, and these days computers are really fast. But, as Ron Jeffries, one of the founders of Extreme Programming (XP), observed, there’s a high chance that, “You aren’t gonna need it.” Adding extreme scalability before it’s really necessary easily leads you down the road of technical debt and other inefficiencies. You really can do a lot these days with a very small number of computers talking to a single database.
7. Optimizing costs
Speaking of premature optimization: nobody likes to waste money, and there sure are many ways to waste money in the cloud. AWS billing and optimization is a hard problem, though it’s getting easier thanks to vastly improved native tooling and new ways of purchasing capacity, like Savings Plans.
I think it’s best to be reactive with costs. Ship whatever it is you’re building, then set a calendar reminder to check the costs later on down the line. It can be hard to predict exactly what something will cost – if you’re building an entirely new service, how much effort is it going to take to figure out the related bandwidth charges, Aurora storage I/O, and Amazon Simple Queue Service (SQS) costs? These are not always easy to estimate.
I also find it’s easy to “snack” on cost reduction projects. Removing a few unused Elastic IPs or EBS volumes can save a good few dollars per month, and it’s so satisfying to clean things up. But will it really change the future outcome of your business? I sometimes try to justify these cleanups as making our infrastructure easier to understand, which can be a problem when you have a 9-year-old AWS account. But, most of the time, it’s better to focus on big picture problems rather than snacking on small cost reductions.
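As a sketch of what one of these “snacks” looks like in practice, here’s a small Python helper that picks out unattached Elastic IPs from a DescribeAddresses-style response. The boto3 call is left as a comment so the snippet stays self-contained, and the sample data is invented:

```python
# Find unattached Elastic IPs in a DescribeAddresses-style response.
# An EIP not associated with an instance or network interface has no
# "AssociationId" field - and still incurs a small hourly charge.

def unattached_eips(addresses):
    return [a["AllocationId"] for a in addresses if "AssociationId" not in a]

# In practice the list would come from boto3 (requires AWS credentials):
#   addresses = boto3.client("ec2").describe_addresses()["Addresses"]
sample = [
    {"AllocationId": "eipalloc-1", "AssociationId": "eipassoc-1"},
    {"AllocationId": "eipalloc-2"},  # unattached - candidate for release
]
print(unattached_eips(sample))  # ['eipalloc-2']
```

Satisfying to run, cheap to write – and, as argued above, rarely the highest-leverage use of an afternoon.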
Having said that, I do actually spend a good bit of my time optimizing costs at Intercom. For a mature business with a significant AWS spend and clear business requirements here, this is work that is definitely worth doing, with real business impacting outcomes. We’re certainly not alone in having achieved significant improvements in our bill through cost optimization.
8. Copying hugely successful companies
Reading the engineering blogs of hugely successful companies that used to be startups, like Netflix, Uber, or Airbnb, is a great way to get completely distracted and over-engineer solutions to problems you probably don’t yet have. Also, the information you really need from these successful companies typically isn’t revealed in a blog or a conference talk. These things are usually artifacts of some engineer’s Objectives and Key Results (OKRs). Instead, look to peer relationships with engineers at similarly sized startups. In my experience, this can be really effective.
9. Copying hyperscalers
It’s great to take inspiration from successful startups, but you absolutely should not be looking at enormous cloud providers like Amazon, Google, and Microsoft. Some companies may benefit from a monorepo, five-nines availability, microservices, or site reliability engineering (SRE). But these are mostly solving problems that huge organizations have. Instead of a startup worrying about, say, their chaos engineering strategy, it’s best to build entirely on a small set of well-understood managed services with great redundancy built in, where somebody else is worrying about how to use chaos engineering to improve their managed service.
10. Listening to me
One surefire terrible technical strategy for your startup is to do whatever you hear at a conference. Only you can understand your business context and technical challenges. The difference between core competencies and undifferentiated heavy lifting isn’t always clear-cut. There are myriad human factors to take into account when defining and implementing a technical strategy. So don’t blindly do any of these things. They’re just my honest opinions, and they’re what work for me in my current role.
Top 5 technical strategies to follow
Now that we’ve reviewed the bad and the downright ugly of technical strategies, it’s time to turn our attention to the good. Here are five effective practices that can create long-term positive impact:
1. Build in security
Security is job number zero of, well, anything on the internet these days. Not only are consumer expectations higher than ever, but regulations like GDPR also require a reasonable level of security to be built into your product.
“Burning customers through security problems is a sure way to lose customer confidence”
Burning customers through security problems is a sure way to lose customer confidence. I’ve been on both sides of this, and it is real. Building security into every product and feature from day zero is far easier than adding it afterwards. As your startup moves upmarket, conversations with larger purchasers will become more detailed and require more rigor in your product. Thankfully, this is easier than ever to do with good secure options in the cloud, and constant improvements – like those around S3 bucket configuration – that can prevent you from running into classic problems while using the cloud.
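For example, S3’s Block Public Access settings guard against the classic accidentally-public-bucket mistakes. Here’s a hedged Python sketch: the helper just builds the standard PublicAccessBlockConfiguration, and the boto3 call that would apply it is commented out since it needs real AWS credentials (the bucket name is illustrative):

```python
# The four S3 Block Public Access settings; enabling all of them
# guards against the accidentally-public-bucket class of incidents.

def full_public_access_block():
    return {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }

# Applied with boto3 it would look like (requires AWS credentials):
#   boto3.client("s3").put_public_access_block(
#       Bucket="my-bucket",  # illustrative bucket name
#       PublicAccessBlockConfiguration=full_public_access_block(),
#   )
print(full_public_access_block()["BlockPublicAcls"])  # True
```

Turning settings like these on by default from day zero is exactly the kind of cheap, early security investment the section above argues for.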
2. Ship, a lot
At Intercom, we say that “Shipping is your company’s heartbeat.” I don’t think it’s a coincidence that companies that focus on shipping are successful. In fact, this has been shown to be the case with a great deal of rigor.
I consider Accelerate: The Science of Lean Software and DevOps: Building and Scaling High Performing Technology Organizations by Nicole Forsgren, Jez Humble, and Gene Kim to be the bible of high-performance technology organizations. The authors applied rigorous research methods to uncover the practices that real companies use to be successful. If you care about your organization’s success, no matter what the industry, the knowledge in this book will help you a lot.
3. Hire for potential
Ideally, you’ll aim to hire generalists who really want to grow. The growth mindset alone encourages growth, and boy are you going to need it in a fast-growing environment. Subject-matter experts (SMEs) or specialists offer seductive productivity, but ultimately they might turn into silos that can slow you down, limiting the advantages of collaboration and team-owned problems.
“If you do hire people with deep expertise, you should ensure they are set up to share that expertise to help develop your team”
You won’t get the best solution if the entire team can’t work on your biggest problems. You want people to grow towards owning and deeply understanding your organization’s main problems – rather than being siloed by gatekeepers. If you do hire people with deep expertise, you should ensure they are set up to share that expertise to help develop your team.
4. Bias towards high-level services
As I’ve mentioned, it’s wise to pick a small number of well-understood services to use. ElastiCache, SQS, and Amazon Relational Database Service (RDS) are far better defaults than running your own Memcached, RabbitMQ, or self-managed clustered MySQL setup.
Similarly, I think some of the managed cloud security and AI/ML services are looking really great, and I’d kick their tyres before building something along the same lines. When you do need to solve problems that go beyond your current tech stack, I’d recommend that you first avoid building anything and simply use something that’s available from your existing cloud provider.
5. Focus on the customer
Is this a cliché? No – put into practice, it means world-class observability, monitoring, operational best practices, good uptime, performance, and solid security.
I think Jeff Bezos was on the right track when he talked about always working backwards from the customer. I first learned this while working at Amazon, and it’s truer than ever working at Intercom. If you don’t know what your customers are doing, experiencing, and thinking, you are not focused on them. Great tools like Intercom and Honeycomb can definitely help you out a lot to understand your customers.