Shape
Cloudbox has one important idea: one agent run gets one real Cloudflare computer.
Your agent
│
│ POST /api/runs { repo, commands, verify, artifact }
▼
Cloudbox Worker (Astro + Hono)
│
├─ auth, validation, API/docs
│
▼
CloudboxRunner Durable Object
│
├─ boots / reuses the container
├─ records runnerReceipts (boot, request, error)
│
▼
Cloudflare Container (cloudbox-runner)
│
├─ git clone repo
├─ run commands
├─ run verification
├─ collect artifact
└─ return { receipts, artifact, diff }
The long-lived control plane is the Worker. The real execution backend is a Cloudflare Container running Linux tools like git, node, bun, and pnpm. The CloudboxRunner Durable Object (DO) owns the container lifecycle, so a hot container can serve repeated requests; lifecycle events (boot, request, error) are returned with every response as runnerReceipts.
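As a rough illustration of the request at the top of the diagram, the sketch below builds a `POST /api/runs` call. Only the route and the four field names (`repo`, `commands`, `verify`, `artifact`) come from the diagram; the field types, the `buildRunCall` helper, and the example host are assumptions made for illustration. Authentication is left out because this page does not describe the scheme.

```typescript
// Assumed shape of the run request body. Field names follow the diagram;
// the types are guesses for illustration only.
interface RunRequest {
  repo: string;       // repository URL for the container to clone
  commands: string[]; // commands to run after cloning
  verify: string[];   // verification commands run after the main commands
  artifact: string;   // path of the artifact to collect
}

interface RunCall {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: turns a run description into fetch-style arguments.
function buildRunCall(baseUrl: string, run: RunRequest): RunCall {
  // Drop trailing slashes so the route is joined with exactly one "/".
  const url = baseUrl.replace(/\/+$/, "") + "/api/runs";
  return {
    url,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(run),
    },
  };
}

const exampleCall = buildRunCall("https://cloudbox.example", {
  repo: "https://github.com/example/repo.git",
  commands: ["pnpm install", "pnpm build"],
  verify: ["pnpm test"],
  artifact: "dist/report.md",
});
```

The result can be passed to `fetch(exampleCall.url, exampleCall.init)`; per the diagram, the response carries `receipts`, `artifact`, and `diff`, plus `runnerReceipts`.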
Runner size is a deploy-time choice. Set CLOUDBOX_RUNNER_INSTANCE_TYPE to the largest instance type your account supports and CLOUDBOX_RUNNER_MAX_INSTANCES for concurrency. Heavy repos need headroom.
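A deploy script might read the two settings along these lines. This is a minimal sketch: only the two variable names come from this page, while `readRunnerConfig`, its validation rules, and the example values are hypothetical.

```typescript
interface RunnerConfig {
  instanceType: string;
  maxInstances: number;
}

// Hypothetical helper: validates the two deploy-time settings named above.
// The instance type is passed through untouched because valid values depend
// on the account and are not listed on this page.
function readRunnerConfig(env: Record<string, string | undefined>): RunnerConfig {
  const instanceType = env["CLOUDBOX_RUNNER_INSTANCE_TYPE"];
  if (!instanceType) {
    throw new Error("CLOUDBOX_RUNNER_INSTANCE_TYPE is not set");
  }
  // Number("") is 0 and Number("abc") is NaN, so both fail the check below.
  const maxInstances = Number(env["CLOUDBOX_RUNNER_MAX_INSTANCES"] ?? "");
  if (!Number.isInteger(maxInstances) || maxInstances < 1) {
    throw new Error("CLOUDBOX_RUNNER_MAX_INSTANCES must be a positive integer");
  }
  return { instanceType, maxInstances };
}
```

Failing fast at deploy time keeps a misconfigured runner from surfacing later as a run error.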
The Worker serves the API and docs site. It accepts run requests from agents, browsers, scripts, and CI.
The CloudboxRunner Durable Object fronts a Cloudflare Container, which clones repos, runs commands, verifies work, and returns logs, diff, artifacts, and runnerReceipts.
A per-workspace Durable Object holds the spec, receipts, and file index. R2 holds artifact bytes for inspectable proof.
D1, Queues, Workers AI, Cron, and Access fit around this core; they are not the execution primitive.
Receipt-first design
Cloudbox is designed for agents I supervise. I want more than their final answer: I want durable evidence of how they got there.
Every protocol action appends a receipt:
- init: workspace was materialized
- read: agent inspected a file
- write: agent produced or changed an artifact
- ask: agent consulted a collaborator
- submit: agent committed to an objective outcome
- grade: rubric was replayed against the trail
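One way to picture the receipt trail is as an ordered, append-only log. The sketch below is an in-memory model, not the Cloudbox implementation: the six kinds come from the list above, while the `seq`, `target`, and `at` fields and the `ReceiptLog` class are assumptions.

```typescript
// The six receipt kinds listed above.
type ReceiptKind = "init" | "read" | "write" | "ask" | "submit" | "grade";

// Assumed receipt shape: the field names are illustrative.
interface Receipt {
  readonly seq: number;   // 1-based position in the trail
  readonly kind: ReceiptKind;
  readonly target: string; // path, collaborator, or objective the action touched
  readonly at: string;     // ISO timestamp of when the receipt was appended
}

// Hypothetical in-memory log that only ever grows.
class ReceiptLog {
  private readonly entries: Receipt[] = [];

  append(kind: ReceiptKind, target: string): Receipt {
    const receipt: Receipt = Object.freeze({
      seq: this.entries.length + 1,
      kind,
      target,
      at: new Date().toISOString(),
    });
    this.entries.push(receipt);
    return receipt;
  }

  // Returns a copy so callers cannot reorder or drop entries in the log.
  all(): Receipt[] {
    return this.entries.slice();
  }
}
```

Because there is no update or delete method, the order of receipts is the order of the agent's actions.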
The grader is intentionally structural in v0. It checks facts like:
- read this path
- wrote this path
- read A before B
- asked this person and not that person
- submitted this objective
That keeps the demo loop deterministic and cheap.
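The structural checks listed above can be sketched as pure functions over the receipt trail. The `Check` and `Receipt` shapes and the `grade` function are hypothetical, and "read A before B" is interpreted here as "the first read of A precedes the first read of B", which is an assumption.

```typescript
type ReceiptKind = "init" | "read" | "write" | "ask" | "submit" | "grade";

// Assumed minimal receipt shape for grading purposes.
interface Receipt {
  kind: ReceiptKind;
  target: string;
}

// One variant per kind of fact listed above.
type Check =
  | { type: "read"; path: string }
  | { type: "wrote"; path: string }
  | { type: "readBefore"; first: string; then: string }
  | { type: "asked"; person: string }
  | { type: "notAsked"; person: string }
  | { type: "submitted"; objective: string };

function passes(trail: Receipt[], check: Check): boolean {
  const has = (kind: ReceiptKind, target: string): boolean =>
    trail.some((r) => r.kind === kind && r.target === target);
  const firstRead = (path: string): number =>
    trail.findIndex((r) => r.kind === "read" && r.target === path);

  switch (check.type) {
    case "read":
      return has("read", check.path);
    case "wrote":
      return has("write", check.path);
    case "readBefore": {
      const a = firstRead(check.first);
      const b = firstRead(check.then);
      // Both files must have been read, and A must come first.
      return a !== -1 && b !== -1 && a < b;
    }
    case "asked":
      return has("ask", check.person);
    case "notAsked":
      return !has("ask", check.person);
    case "submitted":
      return has("submit", check.objective);
  }
}

// Replays every check against the trail and counts how many pass.
function grade(trail: Receipt[], rubric: Check[]): { passed: number; total: number } {
  const passed = rubric.filter((check) => passes(trail, check)).length;
  return { passed, total: rubric.length };
}
```

Nothing here calls a model or touches the network, so replaying the same rubric against the same trail always gives the same score.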
Durable Object per workspace
A materialized Cloudbox maps to one Durable Object instance. The id is a stable hash of the spec, so repeated materialization is idempotent.
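To illustrate why a spec-derived id makes materialization idempotent, the sketch below canonicalizes a spec (sorted object keys) and hashes the result. The hash is 32-bit FNV-1a, chosen only because it needs no imports; it stands in for whatever hash Cloudbox actually uses, and a real deployment would more likely want a cryptographic digest. The `workspaceId` and `canonicalize` names and the `cbx-` prefix are hypothetical.

```typescript
// Serializes a JSON-like value with object keys sorted, so that two specs
// with the same content always produce the same string.
function canonicalize(value: unknown): string {
  if (value === null || typeof value !== "object") {
    return JSON.stringify(value);
  }
  if (Array.isArray(value)) {
    return "[" + value.map(canonicalize).join(",") + "]";
  }
  const obj = value as Record<string, unknown>;
  const keys = Object.keys(obj).sort();
  return "{" + keys.map((k) => JSON.stringify(k) + ":" + canonicalize(obj[k])).join(",") + "}";
}

// 32-bit FNV-1a over UTF-16 code units (a stand-in, not a secure hash).
function fnv1a(input: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  // Convert to unsigned and left-pad to eight hex digits.
  return ("00000000" + (hash >>> 0).toString(16)).slice(-8);
}

// Hypothetical id derivation: same spec content, same id.
function workspaceId(spec: unknown): string {
  return "cbx-" + fnv1a(canonicalize(spec));
}
```

In a Worker, a string like this could be passed to the Durable Object namespace's `idFromName`, so that materializing the same spec twice reaches the same instance.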
The DO owns three SQLite tables:
- state: immutable spec and materialization metadata
- files: file index, kind, state, dependency edges, R2 key
- receipts: ordered, append-only evidence log
This gives every agent run an isolated world with its own durable trail.
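A possible layout for the three tables is sketched below. Only the table names and their rough contents come from the description above; every column name and type is an assumption. The `SqlRunner` interface is a hypothetical stand-in with a single `exec` method, loosely modeled on the SQL storage handle available inside a SQLite-backed Durable Object.

```typescript
// Assumed schema: column names and types are illustrative only.
const SCHEMA: string[] = [
  // Immutable spec and materialization metadata, stored as key/value rows.
  `CREATE TABLE IF NOT EXISTS state (
     key   TEXT PRIMARY KEY,
     value TEXT NOT NULL
   )`,
  // File index: kind, state, dependency edges (JSON array), and R2 key.
  `CREATE TABLE IF NOT EXISTS files (
     path       TEXT PRIMARY KEY,
     kind       TEXT NOT NULL,
     state      TEXT NOT NULL,
     depends_on TEXT NOT NULL DEFAULT '[]',
     r2_key     TEXT
   )`,
  // Ordered evidence log: seq gives the order, and rows are only inserted.
  `CREATE TABLE IF NOT EXISTS receipts (
     seq        INTEGER PRIMARY KEY AUTOINCREMENT,
     kind       TEXT NOT NULL,
     target     TEXT NOT NULL,
     created_at TEXT NOT NULL
   )`,
];

// Minimal stand-in for a SQL handle with an exec method.
interface SqlRunner {
  exec(query: string): unknown;
}

// Applies the schema; IF NOT EXISTS makes repeated calls harmless.
function migrate(sql: SqlRunner): void {
  for (const statement of SCHEMA) {
    sql.exec(statement);
  }
}
```

Append-only behavior is a convention in this sketch: the code that writes receipts would simply never issue UPDATE or DELETE against that table.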
R2 for bytes
The DO stores file metadata. File bytes and artifacts live in R2 under:
<computer-id>/<path>
Local/no-binding development still works with placeholder content. When R2 is bound, reads and writes persist real bytes.
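The key layout and the no-binding fallback might look roughly like this. `BucketLike` is a hypothetical interface that mirrors only the `get` and `put` calls this sketch needs from an R2 bucket binding; `objectKey`, `readFile`, the placeholder text, and the choice to strip leading slashes from paths are all assumptions.

```typescript
// Minimal subset of a bucket binding used by this sketch.
interface BucketLike {
  get(key: string): Promise<{ text(): Promise<string> } | null>;
  put(key: string, value: string): Promise<unknown>;
}

// Builds "<computer-id>/<path>". Leading slashes on the path are stripped
// (an assumption) so "/docs/quickstart.md" does not yield a double slash.
function objectKey(computerId: string, path: string): string {
  return computerId + "/" + path.replace(/^\/+/, "");
}

// Reads a file's content, falling back to placeholder text when no bucket
// is bound (local development). Returns null if the object is missing.
async function readFile(
  bucket: BucketLike | undefined,
  computerId: string,
  path: string,
): Promise<string | null> {
  if (!bucket) {
    return "placeholder content for " + path;
  }
  const object = await bucket.get(objectKey(computerId, path));
  return object ? await object.text() : null;
}

// Persists content only when a bucket is bound; reports whether it did.
async function writeFile(
  bucket: BucketLike | undefined,
  computerId: string,
  path: string,
  content: string,
): Promise<boolean> {
  if (!bucket) {
    return false;
  }
  await bucket.put(objectKey(computerId, path), content);
  return true;
}
```

Keeping the binding optional lets the same code path run locally and in production; only the presence of the bucket decides whether bytes persist.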
Optional Cloudflare pieces
Cloudbox’s core loop is Worker + Durable Objects + R2. Other Cloudflare services fit naturally around it:
- D1 for cross-workspace indexes, sweeps, history, leaderboards
- Queues for bulk materialization or asynchronous grading
- Workers AI for brief-to-spec generation and future judge fallbacks
- Workflows / Cron Triggers for cleanup and long-running sweeps
- Access for private agent workspaces
Demo path
The demo is not separate from the product. It is a Cloudbox spec about Cloudbox:
- README.md: positioning
- /docs/quickstart.md: seven-minute path
- /docs/architecture.md: infra shape
- skeptic: release reviewer
- artifacts/launch-note.md: generated handoff
The agent earns points only by leaving the right receipts. That is the product: constrain the workspace, run the agent, inspect the trail, grade the behavior.