Writing

16 essays on evals, agents & AI platforms

  • The Token Reckoning

    The default today is to start with a frontier model and build agents on top. That's the correct way to start. It is not where high-volume, policy-bound work ends up. As agentic token volume compounds, the workloads that have a fixed action space and a hard conformance requirement land on fine-tuned open-weight models. This post is about why that migration is an economic inevitability, and what you actually have to build first to make it — the harness and the eval set, in that order.

  • Your Eval Suite Is Already a Loss Function

    Everyone tuning an agent runs the same manual loop: change a knob, re-run the evals, squint at the score, repeat. But the moment you wrote an eval suite, you defined a loss function. You just weren't optimizing against it. This post walks the theory, and how I turned it into `holodeck test optimize`: a coordinate-descent optimizer that tunes the numbers with Optuna and the prompt with an LLM standing in for the gradient.

  • Production Considerations for the Claude Agent SDK, Part 2: Security Hardening

    Part 2 of 2. Permission posture, container runtime hardening, prompt-injection defenses, and an opt-in credential boundary — what I changed after I stopped the OOMs.

  • Production Considerations for the Claude Agent SDK, Part 1: Performance & Sizing

    Part 1 of 2. What I learned shipping a Claude Agent SDK service to Azure Container Apps — memory math, OOMs, concurrency caps, and why subprocess pooling isn't a thing.

  • Agent Workflows: A Solved Problem, Reinvented

    Every major agent framework now ships its own workflow engine. But workflows are a solved problem in software: finite state machines, distributed sagas, durable execution. So why are we reinventing them? A look at where graph dataflow actually fits, where the frontier labs sit, and why bundling orchestration into an agent SDK might age badly.

  • I Built an OpenTelemetry Instrumentor for Claude Agent SDK

    I needed observability for my Claude agents, so I built a drop-in OTel instrumentation package. Here's how it works and why I went with a hook-driven approach.

  • You Don't Need Any Other Agent Framework, You Only Need Claude Agents SDK

    I built a multi-backend agent platform. Then the Claude Agent SDK shipped and I realized it was the only backend I actually needed.

  • Take Back the Stack. Your Cloud Provider Doesn't Want You To.

    Cloud providers want to host your AI agents. I think it's time to stop letting them.

  • RAG Is Dead. Long Live RAG. Or Is It?

    The hype cycle churned through RAG, GraphRAG, and vector-everything. Meanwhile, a quiet Anthropic blog post from 2024 showed us what actually works — and why most organisations are still getting information retrieval wrong.

  • From YAML to Production: Deploying HoloDeck Agents to Azure Container Apps

    A step-by-step walkthrough of building and deploying a customer support agent to Azure Container Apps using HoloDeck's new deploy command. No Kubernetes required.

  • Building a Filesystem + Bash Based Agentic Memory System (Part 1)

    Part 1 of a 3-part series exploring how to give agents filesystem and bash access. Research, patterns, and design goals for building a sandboxed execution environment.

  • How I Reduced My Agent's Token Consumption by 83%

    MCP servers are great until you realize you're burning tokens on 16 tool definitions for a simple "hi there". Here's how I implemented Anthropic's tool search pattern in HoloDeck.

  • HoloDeck Samples

    Sample agents, prerequisites, and more for getting started with HoloDeck.

  • HoloDeck Part 2: What's Out There for AI Agents

    A look at the current landscape of AI agent platforms: LangSmith, MLflow, PromptFlow, and the major cloud providers. What they do well, and what's missing.

  • HoloDeck Part 1: Why Building AI Agents Feels So Broken

    We're building AI agents with ad-hoc tools, fragmented frameworks, and no real methodology. I've been thinking about what's wrong with this picture.

  • HoloDeck Part 3: How I'm Approaching Agent Development

    HoloDeck applies ML principles to agent engineering: YAML configuration, systematic evaluation, CI/CD integration. Here's how it works.