Writing

16 essays on evals, agents & AI platforms

22 Jun 2026
The Token Reckoning

The default today is to start with a frontier model and build agents on top. That's the correct way to start. It is not where high-volume, policy-bound work ends up. As agentic token volume compounds, the workloads that have a fixed action space and a hard conformance requirement land on fine-tuned open-weight models. This post is about why that migration is an economic inevitability, and what you actually have to build first to make it — the harness and the eval set, in that order.
06 Jun 2026
Your Eval Suite Is Already a Loss Function

Everyone tuning an agent runs the same manual loop: change a knob, re-run the evals, squint at the score, repeat. But the moment you wrote an eval suite, you defined a loss function. You just weren't optimizing against it. This post walks through the theory, and how I turned it into `holodeck test optimize`: a coordinate-descent optimizer that tunes the numbers with Optuna and the prompt with an LLM standing in for the gradient.
25 May 2026
Production Considerations for the Claude Agent SDK, Part 2: Security Hardening

Part 2 of 2. Permission posture, container runtime hardening, prompt-injection defenses, and an opt-in credential boundary — what I changed after I stopped the OOMs.
19 May 2026
Production Considerations for the Claude Agent SDK, Part 1: Performance & Sizing

Part 1 of 2. What I learned shipping a Claude Agent SDK service to Azure Container Apps — memory math, OOMs, concurrency caps, and why subprocess pooling isn't a thing.
30 Apr 2026
Agent Workflows: A Solved Problem, Reinvented

Every major agent framework now ships its own workflow engine. But workflows are a solved problem in software: finite state machines, distributed sagas, durable execution. So why are we reinventing them? A look at where graph dataflow actually fits, where the frontier labs sit, and why bundling orchestration into an agent SDK might age badly.
02 Mar 2026
I Built an OpenTelemetry Instrumentor for Claude Agent SDK

I needed observability for my Claude agents, so I built a drop-in OTel instrumentation package. Here's how it works and why I went with a hook-driven approach.
25 Feb 2026
You Don't Need Any Other Agent Framework, You Only Need Claude Agents SDK

I built a multi-backend agent platform. Then the Claude Agent SDK shipped and I realized it was the only backend I actually needed.
08 Feb 2026
Take Back the Stack. Your Cloud Provider Doesn't Want You To.

Cloud providers want to host your AI agents. I think it's time to stop letting them.
07 Feb 2026
RAG Is Dead. Long Live RAG. Or Is It?

The hype cycle churned through RAG, GraphRAG, and vector-everything. Meanwhile, a quiet Anthropic blog post from 2024 showed us what actually works — and why most organisations are still getting information retrieval wrong.
28 Jan 2026
From YAML to Production: Deploying HoloDeck Agents to Azure Container Apps

A step-by-step walkthrough of building and deploying a customer support agent to Azure Container Apps using HoloDeck's new deploy command. No Kubernetes required.
16 Jan 2026
Building a Filesystem + Bash Based Agentic Memory System (Part 1)

Part 1 of a 3-part series exploring how to give agents filesystem and bash access. Research, patterns, and design goals for building a sandboxed execution environment.
16 Jan 2026
How I Reduced My Agent's Token Consumption by 83%

MCP servers are great until you realize you're burning tokens on 16 tool definitions for a simple "hi there". Here's how I implemented Anthropic's tool search pattern in HoloDeck.
10 Jan 2026
HoloDeck Samples

Sample agents, prerequisites, and more for getting started with HoloDeck.
15 Nov 2024
HoloDeck Part 2: What's Out There for AI Agents

A look at the current landscape of AI agent platforms: LangSmith, MLflow, PromptFlow, and the major cloud providers. What they do well, what's missing.
15 Nov 2024
HoloDeck Part 1: Why Building AI Agents Feels So Broken

We're building AI agents with ad-hoc tools, fragmented frameworks, and no real methodology. I've been thinking about what's wrong with this picture.
15 Nov 2024
HoloDeck Part 3: How I'm Approaching Agent Development

HoloDeck applies ML principles to agent engineering: YAML configuration, systematic evaluation, CI/CD integration. Here's how it works.