In Part 1, I talked about why agent development feels broken. Before building something myself, I spent time looking at what’s already out there. Here’s what I found.
This is Part 2 of a 3-part series:
- Part 1: Why It Feels Broken - What's wrong with agent development
- Part 2: What's Out There (you are here)
- Part 3: What I'm Building - HoloDeck's approach and how it works
The Landscape
A bunch of platforms tackle parts of this problem. I wanted something open-source, self-hosted, and config-driven—something that fits into existing CI/CD workflows without vendor lock-in. That shaped how I evaluated these tools.
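To make "fits into existing CI/CD workflows" concrete, here's the shape of integration I was after: a plain CLI step that any pipeline can run. This is a hypothetical GitHub Actions fragment; the `agent-cli` commands are illustrative placeholders, not a real tool's interface.

```yaml
# Hypothetical CI step: agent tests and evaluation as plain CLI calls,
# no vendor-specific actions or cloud connectivity required.
# `agent-cli` is an illustrative placeholder, not a real tool.
- name: Test and evaluate agent
  run: |
    agent-cli test --config agents/support.yaml
    agent-cli eval --config agents/support.yaml --report eval-report.json
```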
Developer Tools & Frameworks
LangSmith (LangChain Team)
LangSmith is really good at what it does—production observability and tracing for LangChain apps. If you’re already in the LangChain ecosystem and need monitoring, it’s solid.
| Aspect | HoloDeck | LangSmith |
|---|---|---|
| Deployment Model | Self-hosted (open-source) | SaaS (self-hosting only on enterprise plans) |
| CI/CD Integration | CLI-based, works in any pipeline | API-based, needs cloud connectivity |
| Agent Definition | Pure YAML | Python code + LangChain SDK |
| Primary Focus | Agent experimentation & deployment | Production observability & tracing |
| Agent Orchestration | Multi-agent patterns | Not designed for multi-agent workflows |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP metrics (BLEU, METEOR, ROUGE, F1) | LLM-as-judge, custom evaluators |
| Self-Hosted LLMs | Native support (Ollama, vLLM, OpenAI-compatible) | Via LangChain integrations |
Different tools for different problems. LangSmith is about monitoring production apps; I was looking for something to help with the build-and-test loop.
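For a sense of what LangSmith instrumentation looks like, here's a minimal tracing sketch with the langsmith Python SDK. It assumes `LANGSMITH_API_KEY` is set and tracing is enabled; the function body is a stand-in for a real model call.

```python
# Minimal LangSmith tracing sketch. Assumes LANGSMITH_TRACING=true and
# LANGSMITH_API_KEY are set; traces go to LangSmith's cloud backend,
# which is why CI integration needs cloud connectivity.
from langsmith import traceable

@traceable  # records this call's inputs, outputs, and latency
def answer(question: str) -> str:
    # stand-in for a real LLM call
    return f"You asked: {question}"

print(answer("What does LangSmith trace?"))
```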
MLflow GenAI (Databricks)
MLflow is a beast for ML experiment tracking. Its GenAI additions are interesting, but they're designed for model comparison rather than agent workflows. If you're already using MLflow for ML ops, the GenAI features slot in nicely.
| Aspect | HoloDeck | MLflow GenAI |
|---|---|---|
| CI/CD Integration | CLI-native | Python SDK + REST API |
| Infrastructure | Lightweight, portable | Heavy (ML tracking server, often Databricks) |
| Agent Support | Purpose-built for agents | Focused on model evaluation |
| Multi-Agent | Native orchestration patterns | Single model/variant comparison |
| Complexity | Minimal (YAML) | Higher (ML engineering mindset) |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP metrics | LLM-as-judge, custom scorers |
The infrastructure overhead was the main thing that put me off. I wanted something lighter.
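For comparison, this is roughly what an evaluation run looks like with MLflow's LLM evaluate API on a static dataset. It's a sketch: it assumes a reachable tracking server (the infrastructure overhead in question) and follows the documented `mlflow.evaluate` static-dataset pattern.

```python
# Sketch of MLflow's LLM evaluation on a static dataset. Assumes an MLflow
# tracking server is reachable via MLFLOW_TRACKING_URI, which is exactly
# the infrastructure overhead mentioned above.
import mlflow
import pandas as pd

eval_data = pd.DataFrame({
    "inputs": ["What is MLflow?"],
    "outputs": ["MLflow is an open-source platform for the ML lifecycle."],
    "ground_truth": ["An open-source platform for managing the ML lifecycle."],
})

with mlflow.start_run():
    results = mlflow.evaluate(
        data=eval_data,
        predictions="outputs",            # column holding model answers
        targets="ground_truth",           # column holding references
        model_type="question-answering",  # enables built-in QA metrics
    )
    print(results.metrics)
```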
Microsoft PromptFlow
PromptFlow has a nice visual approach—you can see your flows as graphs, which is great for understanding what’s happening. But it’s really about individual functions and tools, not full agent orchestration.
| Aspect | HoloDeck | PromptFlow |
|---|---|---|
| CI/CD Integration | CLI-first | Python SDK, Azure-centric |
| Scope | Full agent lifecycle | Individual tools & functions |
| Design Target | Multi-agent workflows | Single tool/AI function development |
| Configuration | Pure YAML | Visual flow graphs + low-code Python |
| Agent Orchestration | Multi-agent patterns | Not designed for multi-agent |
| Self-Hosted | Yes | Limited (designed for Azure) |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP metrics | LLM-as-judge (GPT-based), F1/BLEU/ROUGE |
If you’re building individual AI functions and live in Azure, PromptFlow makes sense. For agent-level work, it’s not quite there.
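To illustrate the scope difference: a PromptFlow flow is defined as a DAG of individual tool nodes in a `flow.dag.yaml`. This is a simplified sketch of that format (node and file names are illustrative); each node is a single function, which is what I mean by tool-level rather than agent-level.

```yaml
# Simplified sketch of a PromptFlow flow.dag.yaml: a DAG of individual
# tool/function nodes, not a multi-agent workflow.
inputs:
  question:
    type: string
outputs:
  answer:
    type: string
    reference: ${answer_question.output}
nodes:
  - name: answer_question
    type: python
    source:
      type: code
      path: answer_question.py   # a single Python tool function
    inputs:
      question: ${inputs.question}
```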
The Cloud Providers
All three major clouds have agent platforms now. They’re impressive, but they come with the obvious trade-off: you’re locked into their ecosystem.
Azure AI Foundry (Microsoft)
Azure AI Foundry is Microsoft’s enterprise play. It integrates with the whole Microsoft stack—Teams, Copilot, etc. If you’re already a Microsoft shop, there’s a lot to like.
| Aspect | HoloDeck | Azure AI Foundry |
|---|---|---|
| Deployment Model | Self-hosted (open-source) | SaaS (Azure-dependent) |
| CI/CD Integration | CLI, works anywhere | Azure DevOps/GitHub Actions |
| Agent Definition | Pure YAML | Semantic Kernel SDK + Logic Apps |
| Primary Focus | Experimentation & deployment | Enterprise agent orchestration |
| Agent Orchestration | Multi-agent patterns | Multi-agent via Semantic Kernel |
| Self-Hosted | Yes | No (Azure required) |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP | LLM-as-judge, NLP metrics |
The Semantic Kernel framework is interesting, but the Azure dependency is real.
Amazon Bedrock AgentCore (AWS)
Bedrock AgentCore is AWS’s managed agent service. Good for running agents at scale if you’re already on AWS and using their model offerings.
| Aspect | HoloDeck | Amazon Bedrock AgentCore |
|---|---|---|
| Deployment Model | Self-hosted (open-source) | SaaS (AWS-managed) |
| CI/CD Integration | CLI, works anywhere | AWS CodePipeline/API-based |
| Agent Definition | Pure YAML | Code (SDK + LangGraph, CrewAI, etc.) |
| Primary Focus | Experimentation & deployment | Enterprise agent operations at scale |
| Agent Orchestration | Multi-agent patterns | Multi-agent collaboration (supervisor modes) |
| Self-Hosted | Yes | No (AWS required) |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP | LLM-as-judge, custom metrics, RAG eval |
| Self-Hosted LLMs | Native support (Ollama, vLLM) | Bedrock models only |
If you want to use local models or run outside AWS, this isn’t really an option.
Vertex AI Agent Engine (Google Cloud)
Vertex AI Agent Engine is Google's entry into the agent space. The A2A protocol for multi-agent communication is interesting, but like the others, you're tied to GCP.
| Aspect | HoloDeck | Vertex AI Agent Engine |
|---|---|---|
| Deployment Model | Self-hosted (open-source) | SaaS (GCP-managed) |
| CI/CD Integration | CLI, works anywhere | Cloud Build/GitHub Actions |
| Agent Definition | Pure YAML | Code (ADK, LangChain, LangGraph) |
| Primary Focus | Experimentation & deployment | Production agent runtime |
| Agent Orchestration | Multi-agent patterns | Multi-agent via A2A protocol |
| Self-Hosted | Yes | No (GCP required) |
| Agent Evaluation | Custom criteria, LLM-as-judge, NLP | LLM-as-judge (Gemini), ROUGE/BLEU |
| Self-Hosted LLMs | Native support (Ollama, vLLM) | vLLM in Model Garden (complex setup) |
Similar story—great if you’re committed to GCP, but not portable.
What’s Missing
After looking at all of these, here’s what I couldn’t find:
- Self-hosted and cloud-agnostic - Everything is either SaaS or tied to a specific cloud
- Declarative agent definition - Most require SDK code, not just config (see the sketch after this list)
- Vendor-neutral CI/CD - The integrations assume you’re using their ecosystem
- Testing + evaluation + deployment in one place - Usually you’re stitching together multiple tools
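To show what "declarative" means here, the sketch below is a hypothetical config-only agent definition. The schema is illustrative, not HoloDeck's actual format; Part 3 covers the real one.

```yaml
# Hypothetical declarative agent definition. The schema is illustrative
# only and is not HoloDeck's actual config format.
agent:
  name: support-bot
  model:
    provider: ollama            # self-hosted, OpenAI-compatible endpoint
    name: llama3.1
  instructions: >
    Answer customer questions using the knowledge base tool.
  tools:
    - knowledge_base_search
evaluation:
  judge:
    model: gpt-4o               # LLM-as-judge
    criteria: "Answer is grounded in retrieved context."
  metrics: [rouge, f1]          # NLP metrics alongside the judge
```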
This is the gap I’m trying to fill with HoloDeck. Not saying it’s better than these tools—they’re solving different problems. But if you care about portability and owning your workflow, there wasn’t much out there.
Quick Reference
| If you need… | Look at… |
|---|---|
| Production observability for LangChain | LangSmith |
| ML experiment tracking at scale | MLflow |
| Visual prompt flow design on Azure | PromptFlow |
| Enterprise agents in Microsoft ecosystem | Azure AI Foundry |
| Managed agents on AWS | Bedrock AgentCore |
| Production runtime on GCP | Vertex AI Agent Engine |
| Self-hosted, config-driven, CI/CD-native | HoloDeck |
Next Up
In Part 3, I’ll walk through how HoloDeck works—the design decisions, the YAML config approach, the SDK, and what’s actually built vs. what’s still on the roadmap.