Monorepos are an organisational decision, not a technical one
You do not adopt Nx because it is cool. You adopt it when the pain of coordinating shared libraries across separate repos outweighs the simplicity of isolation. The tooling is good, but the real win is making team boundaries explicit through code structure.
Infrastructure-as-Code is a contract
When I write a CDK stack, I am not just provisioning resources. I am writing a contract: this is what the system looks like, reproducibly, forever. Drift from that contract is technical debt with real operational consequences. Treat it like application code — review it, test it, version it.
The best architecture decisions happen in boring meetings
Not in design docs or ADRs (though those matter). The real decisions happen in a 30-minute Slack call where someone says ‘should we really add another service for this?’ and the team actually stops to think. Slow down those conversations. They are cheaper than the refactor later.
Running 200 TPS on Lambda for an hour and learning humility
We load-tested our middleware at 200 transactions per second for a full hour. Lambda held. But the SQS queue depth graph looked like a heartbeat monitor, and I spent three hours convinced we had a bug. It was just auto-scaling warming up. Observability without context is noise.
Converting SOAP to JSON at the government’s pace
Legacy government systems speak SOAP. Modern microservices speak JSON. Building the translator layer in between — with WAF, private API Gateway, and Route 53 inbound resolvers — was the most unglamorous, most impactful work I have ever done. Nobody sees the middleware until it breaks.
What DLQs taught me about trust
The first time a dead-letter queue caught a batch of 3,000 failed messages in production, I felt sick. Then I realised: this is the system working. The DLQ is proof that we planned for failure. Reliability is not the absence of errors — it is knowing where they go.
Playwright + Python: bridging two worlds
Writing E2E tests for a TypeScript frontend using Python felt wrong at first. But the QA team owned Python, and the goal was automation, not purity. Two weeks in, we had 60% regression coverage on critical flows. Pragmatism beats elegance when deadlines are real.
The Teams webhook that became the most important endpoint
We built a Lambda that posts CloudWatch alarm summaries to a Teams channel. Took half a day. It became the first thing the on-call engineer checks in the morning. Sometimes the highest-ROI engineering is a webhook and a well-formatted message.
At what scale does serverless stop being the right answer?
I keep telling teams that serverless is good for bursty, unpredictable workloads. But where is the real inflection point? At 10,000 RPS? At sustained constant load? I have heard very different answers from very credible people and I still do not have mine.
How do you actually measure ‘code quality’?
Code coverage? Cyclomatic complexity? Pull request review time? I have used all of these. None of them capture the thing I actually care about: can a new engineer understand and safely modify this code in six months? I am still looking for a metric that gets close.