
Platform Engineering: Easy to Use, Hard to Mess Up

We used to have a saying on Recharge's Platform Services team: It should be really easy to use and really hard to fuck up. We were building a platform that let not only us but also other engineering teams quickly spin up services, so it's easy to see why we said this so often.

This team didn't really start as a Capital-P "Platform" team in the general sense, though. It started as a team that built an eventing and serverless platform (the Event Bus), then built first-party services on top, and exposed those services for other teams to consume. After each service we built on the Event Bus, we updated the underlying patterns, defined new golden paths, improved observability, and exposed more utilities (like API clients, storage abstractions, and base Cloud Function classes). These services were so successful (in terms of launching quickly, enabling quick iteration, and scaling fast) that other teams wanted to start building services on the Event Bus too.

At first, other teams would look at what was done previously, recreate the setup of a repo, and go forward. This worked great early on. Other teams were getting a lot of the benefits. A lot, but not all. Many guardrails to keep us on the golden paths were conventions based on internal team knowledge. Also, our team didn't stop evolving the Event Bus platform. So, drift set in between repos. And once drift sets in, it compounds.

If you have to set up CI/CD, API clients, error handling, and logging for each new service repo, you will almost certainly have drift between service repos. This means that when someone moves from one service repo to another, there are nuances that will inevitably bite them. The same goes for incident response. The more "sameness" between services, the less context people need to solve things. They aren't looking in different places for different things. They know what they're looking for because it's the same for every service.

You see, there's a difference between having a single team that owns their own platform and a Platform that is set up in a way to enable other teams. We had to evolve into a Capital-P "Platform" team.

What I Mean by Platform Engineering

I tend to think of Platform Engineering as building internal platforms for the delivery and lifecycle management of services. It's about codifying the golden paths and making the right way the easy way. Or, making things really easy to use and really hard to fuck up.

But Platform Engineering is a multiplier. It amplifies whatever patterns and practices you already have. If your service design patterns are solid, Platform Engineering makes them consistently excellent across all teams. If your patterns are poor, Platform Engineering just makes it easier to build bad services faster.

In practice, this means:

  • Standardization – Every service starts from the same baseline.
  • Self-Service – Spinning up a new service takes minutes, not weeks.
  • Guardrails, Not Gates – You can move fast, but you stay safe.
  • Enablement Over Enforcement – Developers spend their time on product logic, not boilerplate.
  • Ongoing Updates – Each service can easily be kept updated to get the latest improvements shared across the platform.

Without Platform Engineering, you're reinventing CI/CD, IAM, observability, and more for every new service. As anyone who's ever set up a CI/CD pipeline knows, even when you have a reference point, it takes iteration to get it right. The same goes for API clients, error handling and logging layers, and normalized request/response structures. All of those are decisions and configuration that take time to set up and get working correctly. With Platform Engineering, you run something like "service new object-cache" and start writing business logic immediately.
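
To make that concrete, here is a minimal Python sketch of what a scaffolding command could do under the hood. It is an illustration, not the actual tool: the BASELINE file set, TEMPLATE_VERSION, and the scaffold_service function are all hypothetical names, and a real baseline would carry CI/CD config, API clients, and logging setup rather than three tiny files.

```python
from pathlib import Path
from string import Template

# Hypothetical baseline. Every file name and body here is illustrative only.
TEMPLATE_VERSION = "1.0.0"
BASELINE = {
    "README.md": "# $name\n\nGenerated from the platform baseline.\n",
    "src/handler.py": 'SERVICE_NAME = "$name"\n',
    # Recording the baseline version lets later tooling spot stale repos.
    ".platform-version": "$template_version\n",
}


def scaffold_service(name: str, root: Path) -> list[Path]:
    """Create a new service directory under root from the shared baseline."""
    target = root / name
    if target.exists():
        # Refuse to overwrite: hard to mess up an existing service by accident.
        raise FileExistsError(f"{target} already exists")

    created = []
    for relative_path, body in BASELINE.items():
        path = target / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        rendered = Template(body).substitute(
            name=name, template_version=TEMPLATE_VERSION
        )
        path.write_text(rendered)
        created.append(path)
    return created
```

Calling scaffold_service("object-cache", Path(".")) would create an object-cache directory whose files all start from the same baseline, which is the whole point: nobody copies an older repo by hand.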

The Options

What happened with the Platform Services team was not only typical, but healthy. It's how pragmatic teams in pragmatic engineering organizations evolve. Like that team, most teams I’ve seen in this situation face the same three choices:

  1. Do nothing – Keep shipping, share one-off improvements between service repos, and fix problems as they come. Short-term velocity, long-term drag.
  2. Temporary Monorepo – Pull everything together into a single monorepo, but organize features into macro services. Then, after things are standardized and well-defined, break them back out later with your Platform Engineering approach. A structured path, but risky if your monorepo isn't working well (more on this later).
  3. Eventual Consistency – Keep repos separate, but invest in your Platform Engineering approach iteratively to make them eventually consistent: templates, shared modules, automation. This allows for an incremental approach but requires discipline.
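
For the third option, the "automation" piece can start very small. The sketch below, in Python, compares each repo's copy of the shared files against a baseline and reports what is missing or modified. It is an assumption-laden illustration: the Drift type, find_drift, and the idea of passing file contents as in-memory dictionaries are all hypothetical simplifications. A real version would read from checked-out repos.

```python
import hashlib
from dataclasses import dataclass


def fingerprint(content: str) -> str:
    """Stable hash of a file body, used to compare copies cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class Drift:
    repo: str
    path: str
    reason: str  # either "missing" or "modified"


def find_drift(
    baseline: dict[str, str], repos: dict[str, dict[str, str]]
) -> list[Drift]:
    """Report every baseline file that a repo lacks or has changed.

    baseline maps path -> expected content.
    repos maps repo name -> {path -> actual content}.
    """
    expected = {path: fingerprint(body) for path, body in baseline.items()}
    findings = []
    for repo, files in sorted(repos.items()):
        for path, digest in sorted(expected.items()):
            if path not in files:
                findings.append(Drift(repo, path, "missing"))
            elif fingerprint(files[path]) != digest:
                findings.append(Drift(repo, path, "modified"))
    return findings
```

Run on a schedule, a report like this turns "eventually consistent" from a hope into a list of concrete follow-ups per repo.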

The Spectrum

Whether you want to go with a temporary monorepo or stick with multiple service repos, you still need to put in the work. Both monorepo and multi-repo approaches can work, but they can both fail too.

  • A bad monorepo: slow CI, flaky tests, poor abstractions.
  • A good monorepo: fast pipelines, great patterns, consistent experience.
  • A bad multi-repo setup: drift and duplication.
  • A good multi-repo setup: standardized templates, automated propagation of improvements.

The difference? Good design patterns as the foundation. A bad monorepo amplifies poor abstractions and flaky patterns across all services, while a good one amplifies solid patterns consistently.

We saw this drift firsthand when someone created a function handler without using the existing base class and reimplemented a number of things incorrectly. This was found during an incident call and took time to understand. We went back and aligned it with the platform and made it harder to fuck up in the future with generated templates.
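
A base class of this kind might look roughly like the Python sketch below. This is not the Event Bus code: BaseHandler, CacheLookup, and the response shape are hypothetical, chosen only to show the pattern. The base class owns logging, error handling, and the normalized response, so a service author only writes the handle method.

```python
import logging
from abc import ABC, abstractmethod
from typing import Any


class BaseHandler(ABC):
    """Hypothetical base class that owns the cross-cutting behavior."""

    name = "unnamed"

    def __init__(self) -> None:
        self.log = logging.getLogger(self.name)

    def __call__(self, event: dict[str, Any]) -> dict[str, Any]:
        """Entry point: wraps handle() with logging and error handling."""
        self.log.info("event received")
        try:
            result = self.handle(event)
        except ValueError as exc:
            # Treated here as a bad request: report the message to the caller.
            self.log.warning("rejected event: %s", exc)
            return {"ok": False, "error": str(exc)}
        except Exception:
            # Anything else is logged with a traceback and hidden from callers.
            self.log.exception("unhandled error")
            return {"ok": False, "error": "internal error"}
        return {"ok": True, "data": result}

    @abstractmethod
    def handle(self, event: dict[str, Any]) -> Any:
        """Business logic only. Subclasses must implement this."""


class CacheLookup(BaseHandler):
    """Example service handler: only product logic lives here."""

    name = "cache-lookup"

    def handle(self, event: dict[str, Any]) -> dict[str, Any]:
        if "key" not in event:
            raise ValueError("missing 'key'")
        return {"key": event["key"], "hit": False}
```

Because handle is abstract, a handler that skips it cannot even be instantiated, and every handler that does use the base class returns errors in the same shape. That sameness is what shortens incident calls.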

When a team spun up a service, we would take their questions and points of confusion, solve for them, then bring the improvements back to the platform to benefit future services. This feedback loop is what turns either organizational approach from messy to manageable.

The Real Question

The question isn't whether to invest in Platform Engineering. It's how to get there. Consolidate first, or align incrementally? The answer depends on your commitment, team size, growth rate, and tolerance for disruption.

That said, a Platform Engineering approach isn't always the right answer. If your team is small enough that everyone can maintain shared context about patterns and practices, and you're not seeing people struggle with inconsistent implementations or step on each other during development, you might not need the formal separation and standardization yet.

More importantly, if you don't have solid service design patterns established, Platform Engineering might not be your first priority. You need that foundation first – otherwise you're just making it easier to consistently build the wrong thing.

For a sufficiently large organization, though, the Platform Engineering destination is the same: paved roads, golden paths, and engineers focused on building products instead of plumbing.