This post walks through the complete Claude agent stack in six layers, from the developer's prompt to the private network target. Every architectural decision has a reason. We will get to all of them.
The single most important insight
Anthropic's infrastructure handles one thing: running the model. It produces text. When that text contains a tool_use block, the block is serialized as JSON and returned to whoever called the API. Your code runs the tool. Not Anthropic's.
This is not a minor implementation detail. It determines your entire security posture, network topology, and compliance story. Your database credentials, your internal API endpoints, your filesystem — none of it touches Anthropic's servers.
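Concretely, the boundary looks like this. Below is a sketch of the shape of a `tool_use` content block as the Messages API returns it (the `id` value and the `get_weather` tool are illustrative), and the kind of local dispatch your code, not Anthropic's, performs on it:

```python
# Shape of a tool_use content block as returned by the Messages API.
# The id value and tool name here are illustrative.
tool_use_block = {
    "type": "tool_use",
    "id": "toolu_01A09q90qw90lq917835lq9",
    "name": "get_weather",           # hypothetical tool
    "input": {"city": "Berlin"},     # arguments the model chose
}

def dispatch(block, executor):
    """Look up the tool locally and run it with the model's arguments."""
    fn = executor[block["name"]]
    return fn(**block["input"])

# Your code runs the tool; Anthropic only ever sees the string you return.
result = dispatch(tool_use_block, {"get_weather": lambda city: f"Sunny in {city}"})
```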
Layer 1–2: Entry surfaces
Developers reach Claude through three surfaces: the Claude.ai product (web, desktop, mobile, browser extension, Office integrations), first-party integrations like Slack or GitHub Actions, and direct API access via SDK or Claude Code CLI. All three converge on the same control plane underneath.
The practical difference is context: Claude.ai manages the system prompt and conversation for you. The API gives you full control. Claude Code is the API with a pre-built tool executor baked in — it is the fastest path from "I want an agent" to running code on your machine.
Layer 3: The control plane — what Anthropic actually does
Three subsystems run inside Anthropic's cloud, none of which execute your tools:
Model Router selects and runs the model. Haiku for latency-sensitive tasks, Sonnet as the default workhorse, Opus when extended thinking is needed. Prompt caching here is significant: cache hits reduce cost by up to 90% and latency by up to 85%. If your system prompt is long and stable — a common pattern in agentic setups — caching pays for itself immediately.
Project / Context Manager maintains the system prompt, attached knowledge, and the MCP tool schema registry. Claude Projects act as a persistent memory layer across sessions. Remote MCP servers registered here are proxied via HTTP/SSE — the model knows about the tools without the tool servers needing to be exposed to the internet directly.
Safety Gateway filters every response through Constitutional AI before it leaves Anthropic's network. It also handles routing to AWS Bedrock and GCP Vertex AI for customers with data residency requirements. When the model decides to call a tool, the gateway serializes the decision as a tool_use content block and returns it. That is the boundary.
Layer 4: Executor runtime — where tool calls land
This is the layer most teams underspecify. Four modes, each with different tradeoffs:
| Mode | Setup | Best for | Data residency |
|---|---|---|---|
| A · Claude Code | `npx @anthropic-ai/claude-code` | Individual devs | Your machine |
| B · SDK Agent | Python/TS client loop | Custom workflows | Your server |
| C · MCP Fleet | HTTP/SSE MCP servers | Team tool sharing | Your infra |
| D · Bedrock/Vertex | Cloud-managed | Enterprise / regulated | Your AWS/GCP region |
Decision tree: Is this a single developer on a local machine? → Mode A. Custom business logic, running on a server? → Mode B. Team that needs shared private tools (internal DBs, ticketing, code review)? → Mode C. Regulated industry or data residency requirement? → Mode D.
All four modes share the same security property: outbound-only to api.anthropic.com. No inbound connections, no public IP required on the executor.
Layer 5: The tool_use loop
The agent loop is a simple while-loop that most developers write incorrectly the first time.
❌ The naive version — breaks on multi-step tasks:
```python
# Wrong: single-shot, ignores tool_use
response = client.messages.create(
    model="claude-opus-4-5",
    max_tokens=1024,
    tools=tools,
    messages=[{"role": "user", "content": user_prompt}],
)
# This only handles the first response — if the model returns
# tool_use, you never execute it and the task is abandoned.
```
✅ The correct loop:
```python
import anthropic

client = anthropic.Anthropic()

def run_agent(user_prompt: str, tools: list, tool_executor: dict) -> str:
    messages = [{"role": "user", "content": user_prompt}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        # Append the full assistant response (may contain mixed blocks)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Extract final text response
            for block in response.content:
                if block.type == "text":
                    return block.text
            return ""

        if response.stop_reason == "tool_use":
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    fn = tool_executor.get(block.name)
                    result = fn(**block.input) if fn else f"Unknown tool: {block.name}"
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": str(result),
                    })
            # Feed results back — the loop continues
            messages.append({"role": "user", "content": tool_results})
            continue

        # Unexpected stop_reason (max_tokens, stop_sequence, etc.)
        break
    return ""
```
Three things to note about the correct version:
- The entire `response.content` list is appended as-is to the message history. Stripping non-text blocks breaks the conversation state.
- Tool results are returned in a `user` turn, wrapped in `tool_result` blocks keyed by `tool_use_id`. Missing this ID causes a validation error.
- The loop exits on `end_turn`, not on "no more tool calls." The model decides when it is done.
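The `tools` list and `tool_executor` dict passed to `run_agent` can be sketched as follows, assuming the documented JSON Schema `input_schema` format for API tools (the `query_orders` tool and its logic are illustrative):

```python
# A tool definition the model sees, and the local function that
# actually runs when the model calls it. Tool name and behavior
# are hypothetical.
tools = [
    {
        "name": "query_orders",
        "description": "Count open orders for a customer in the internal DB.",
        "input_schema": {
            "type": "object",
            "properties": {
                "customer_id": {
                    "type": "string",
                    "description": "Internal customer ID",
                },
            },
            "required": ["customer_id"],
        },
    }
]

# The executor maps tool names to local functions — this code runs on
# YOUR infrastructure when the model emits a tool_use block.
tool_executor = {
    "query_orders": lambda customer_id: f"Customer {customer_id} has 3 open orders.",
}

# answer = run_agent("How many open orders does C-42 have?", tools, tool_executor)
```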
Layer 6: Targets — what the tools actually touch
Depending on executor mode, the agent's tools reach into three categories of targets:
Private network (modes B, C, D): Git repositories, build caches, internal APIs, databases, staging environments. This data never leaves your infrastructure. The only thing that crosses to Anthropic is the tool's return value — a string or JSON blob. Design your tools to return summaries, not raw data dumps, when working with sensitive content.
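A minimal sketch of that "summaries, not dumps" principle (the job records and the summarizer are illustrative; in practice the rows would come from your internal query layer):

```python
# Collapse raw records into a compact aggregate before it crosses the
# API boundary — the model gets the answer, not the result set.
def summarize_failed_jobs(rows):
    """Return a short summary string instead of the raw rows."""
    failed = [r for r in rows if r["status"] == "failed"]
    hosts = sorted({r["host"] for r in failed})
    return f"{len(failed)} of {len(rows)} jobs failed, on hosts: {', '.join(hosts) or 'none'}"

rows = [
    {"job": "etl-1", "status": "failed", "host": "db-2"},
    {"job": "etl-2", "status": "ok",     "host": "db-1"},
    {"job": "etl-3", "status": "failed", "host": "db-1"},
]
# The tool returns ~40 bytes instead of the full result set.
```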
VCS: The agent pushes a branch, opens a PR, and optionally triggers a webhook that loops back into the agent. GitHub, GitLab, Azure DevOps, Bitbucket all work. The agent does not merge — that decision belongs to a human reviewer.
Artifacts and outputs: Claude.ai renders HTML and React artifacts in a sandboxed iframe. File outputs — documents, spreadsheets, PDFs — are written to /outputs and served for download. Computer Use screenshots flow back as base64 image content blocks.
The MCP protocol layer
Model Context Protocol is the transport that connects the executor (Layer 4) to tool providers (Layer 5 and beyond). It deserves its own callout because it changes how you think about tool architecture.
Without MCP, every tool is a function in your executor's codebase. Adding a tool means a code change and a redeploy. With MCP, tools are services. A team's internal database tool, code review tool, and ticketing integration each run as independent MCP servers. The executor discovers them via the schema registry. Adding a tool means deploying a new MCP server and registering it in Claude Projects — no changes to the executor itself.
Two transports exist: stdio (subprocess, used by Claude Code for local tools) and HTTP/SSE (network, used for remote MCP servers and the cloud proxy in Claude Projects). For team deployments, HTTP/SSE with OAuth 2.1 is the right choice. The auth token never passes through Anthropic's systems — it is negotiated directly between the executor and the MCP server.
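For the stdio case, registration is a config entry rather than a code change. A sketch, assuming the `mcpServers` configuration shape used by Claude's local clients (the server module `my_org.mcp_db_server` is hypothetical):

```json
{
  "mcpServers": {
    "internal-db": {
      "command": "python",
      "args": ["-m", "my_org.mcp_db_server"],
      "env": { "DB_URL": "postgres://localhost/internal" }
    }
  }
}
```

The executor launches the server as a subprocess and speaks MCP over its stdin/stdout; nothing here is exposed to the network.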
Zero Data Retention and the compliance story
ZDR means Anthropic does not log, store, or train on your prompts and responses. It is available on the API by agreement, and on AWS Bedrock and GCP Vertex AI by architecture (your data stays in your cloud account). For regulated industries — healthcare, finance, legal — Mode D (Bedrock/Vertex) with ZDR is the standard configuration.
The important nuance: ZDR governs what Anthropic stores. It says nothing about what your executor stores. Audit logging, request/response archiving, and PII scrubbing in tool results are your responsibility at Layer 4.
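A minimal sketch of that Layer 4 responsibility: wrapping every tool call with audit logging and a naive PII scrub before the result crosses the API boundary. The regex covers only email addresses and is illustrative; real scrubbing needs a proper policy.

```python
import re
import time

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def scrub(text: str) -> str:
    """Redact obvious email addresses from a tool result (illustrative only)."""
    return EMAIL.sub("[REDACTED_EMAIL]", text)

def audited(name, fn, log):
    """Wrap a tool so every call is logged and its result scrubbed."""
    def wrapper(**kwargs):
        result = scrub(str(fn(**kwargs)))
        log.append({"tool": name, "args": kwargs, "result": result, "ts": time.time()})
        return result
    return wrapper

# Hypothetical tool whose raw output contains PII:
log = []
lookup = audited("lookup_user", lambda uid: f"uid={uid} email=jane@example.com", log)
```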
What this means in practice
When you debug a broken agent, the failure is almost always in one of three places: the tool_use loop not cycling correctly, tool results not being formatted as tool_result blocks with the right ID, or the system prompt not giving the model enough context to pick the right tool. The architecture above is stable. The loop logic and tool design are where the work lives.
Start with Mode A (Claude Code) to validate your tool design without ops overhead. Graduate to Mode B once you need to serve multiple users or run unattended. Add Mode C when multiple teams need access to the same internal tools. Mode D when legal says so.
The model is the easy part. The executor is where you earn your salary.