Shape

Cloudbox has one important idea: one agent run gets one real Cloudflare computer.

Your agent
   │
   │ POST /api/runs { repo, commands, verify, artifact }
   ▼
Cloudbox Worker (Astro + Hono)
   │
   ├─ auth, validation, API/docs
   │
   ▼
CloudboxRunner Durable Object
   │
   ├─ boots / reuses the container
   ├─ records runnerReceipts (boot, request, error)
   │
   ▼
Cloudflare Container (cloudbox-runner)
   │
   ├─ git clone repo
   ├─ run commands
   ├─ run verification
   ├─ collect artifact
   └─ return { receipts, artifact, diff }

The long-lived control plane is the Worker. The real execution backend is a Cloudflare Container running Linux tools like git, node, bun, and pnpm. The CloudboxRunner DO owns container lifecycle so a hot container can serve repeated requests; lifecycle events ride back on every response as runnerReceipts.
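As a rough illustration of the request at the top of that flow, the sketch below validates a run body before a Worker would hand it to the runner. Only the four field names (repo, commands, verify, artifact) come from the diagram; their types, the parseRunRequest helper, and the example values are assumptions, not the actual Cloudbox schema.

```typescript
// Hypothetical request shape. Only the four top-level field names
// (repo, commands, verify, artifact) appear in the Cloudbox docs.
interface RunRequest {
  repo: string;       // assumed: a git clone URL
  commands: string[]; // assumed: shell commands, run in order
  verify: string[];   // assumed: commands whose success counts as verification
  artifact: string;   // assumed: path of the file to collect after the run
}

// True when the value is an array containing only strings.
function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((item) => typeof item === "string");
}

// Validate an untyped JSON body and return a typed RunRequest, or throw.
function parseRunRequest(body: unknown): RunRequest {
  if (typeof body !== "object" || body === null) {
    throw new Error("body must be a JSON object");
  }
  const { repo, commands, verify, artifact } = body as Record<string, unknown>;
  if (typeof repo !== "string" || repo.length === 0) {
    throw new Error("repo must be a non-empty string");
  }
  if (!isStringArray(commands)) {
    throw new Error("commands must be an array of strings");
  }
  if (!isStringArray(verify)) {
    throw new Error("verify must be an array of strings");
  }
  if (typeof artifact !== "string" || artifact.length === 0) {
    throw new Error("artifact must be a non-empty string");
  }
  return { repo, commands, verify, artifact };
}

// Example body an agent might send to POST /api/runs (values are made up).
const exampleRun = parseRunRequest({
  repo: "https://example.com/org/repo.git",
  commands: ["pnpm install", "pnpm build"],
  verify: ["pnpm test"],
  artifact: "dist/report.md",
});
```

Rejecting malformed bodies in the Worker keeps the container from booting for requests that could never succeed.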

Runner size is a deploy-time choice. Set CLOUDBOX_RUNNER_INSTANCE_TYPE to the largest instance type your account supports, and set CLOUDBOX_RUNNER_MAX_INSTANCES to the number of runners you want available concurrently. Heavy repos need the headroom.

Worker

The API and docs site. It accepts run requests from agents, browsers, scripts, and CI.

CloudboxRunner + Container

A Durable Object fronts a Cloudflare Container. Together they clone repos, run commands, verify work, and return logs, a diff, artifacts, and runnerReceipts.

ComputerDO + R2

A per-workspace Durable Object holds the spec, the receipts, and the file index. R2 holds artifact bytes for inspectable proof.

D1, Queues, Workers AI, Cron, and Access fit around this core; they are not the execution primitive.

Receipt-first design

Cloudbox is designed for agents I supervise. Their final answer is not enough; I want durable evidence of how they got there.

Every protocol action appends a receipt:

  • init — workspace was materialized
  • read — agent inspected a file
  • write — agent produced or changed an artifact
  • ask — agent consulted a collaborator
  • submit — agent committed to an objective outcome
  • grade — rubric was replayed against the trail
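One way to picture the trail is as an append-only list of typed records. The sketch below is only that: the six kinds come from the list above, while the Receipt fields (seq, subject, at) and the Trail class are hypothetical names chosen for illustration.

```typescript
// The six receipt kinds listed in the docs.
type ReceiptKind = "init" | "read" | "write" | "ask" | "submit" | "grade";

// Hypothetical receipt record. The field names are assumptions.
interface Receipt {
  seq: number;     // position in the trail, starting at 1
  kind: ReceiptKind;
  subject: string; // a path, a collaborator name, or an objective
  at: string;      // ISO-8601 timestamp
}

// Append-only trail: receipts can be added and listed, never edited.
class Trail {
  private readonly receipts: Receipt[] = [];

  // Record one protocol action and return the frozen receipt.
  append(kind: ReceiptKind, subject: string, at: Date = new Date()): Receipt {
    const receipt: Receipt = Object.freeze({
      seq: this.receipts.length + 1,
      kind,
      subject,
      at: at.toISOString(),
    });
    this.receipts.push(receipt);
    return receipt;
  }

  // Return a copy so callers cannot reorder or drop entries in the log.
  list(): readonly Receipt[] {
    return this.receipts.slice();
  }
}

// Example: a short trail using paths from the demo spec.
const demoTrail = new Trail();
demoTrail.append("init", "cloudbox-demo");
demoTrail.append("read", "README.md");
demoTrail.append("write", "artifacts/launch-note.md");
```

Numbering receipts at append time gives the grader a total order to replay, independent of clock skew in the timestamps.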

The grader is intentionally structural in v0. It checks facts like:

  • read this path
  • wrote this path
  • read A before B
  • asked this person and not that person
  • submitted this objective

That keeps the demo loop deterministic and cheap.
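A structural grader of this kind can be sketched as a pure function over the receipt list. The Check shapes, the grade function, and the choice to compare first occurrences for "read A before B" are all assumptions made for the example, not the v0 grader itself.

```typescript
type Kind = "init" | "read" | "write" | "ask" | "submit" | "grade";

// Minimal receipt for grading: what happened, and to what.
interface Receipt {
  kind: Kind;
  subject: string;
}

// Hypothetical rubric checks mirroring the facts listed in the docs.
type Check =
  | { type: "did"; kind: Kind; subject: string }      // e.g. wrote this path
  | { type: "didNot"; kind: Kind; subject: string }   // e.g. did not ask that person
  | { type: "readBefore"; first: string; then: string };

// Index of the first matching receipt, or -1 when there is none.
function firstIndex(trail: readonly Receipt[], kind: Kind, subject: string): number {
  return trail.findIndex((r) => r.kind === kind && r.subject === subject);
}

// Evaluate one check against the ordered trail.
function passes(trail: readonly Receipt[], check: Check): boolean {
  switch (check.type) {
    case "did":
      return firstIndex(trail, check.kind, check.subject) !== -1;
    case "didNot":
      return firstIndex(trail, check.kind, check.subject) === -1;
    case "readBefore": {
      // Assumption: compare the first read of each path.
      const a = firstIndex(trail, "read", check.first);
      const b = firstIndex(trail, "read", check.then);
      return a !== -1 && b !== -1 && a < b;
    }
  }
}

// Replay the whole rubric; the score is the count of satisfied checks.
function grade(trail: readonly Receipt[], rubric: readonly Check[]): { passed: number; total: number } {
  const passed = rubric.filter((check) => passes(trail, check)).length;
  return { passed, total: rubric.length };
}

// Example trail and rubric built from the demo spec's names.
const trail: Receipt[] = [
  { kind: "read", subject: "README.md" },
  { kind: "read", subject: "/docs/architecture.md" },
  { kind: "ask", subject: "skeptic" },
  { kind: "write", subject: "artifacts/launch-note.md" },
];
const rubric: Check[] = [
  { type: "readBefore", first: "README.md", then: "/docs/architecture.md" },
  { type: "did", kind: "ask", subject: "skeptic" },
  { type: "did", kind: "write", subject: "artifacts/launch-note.md" },
  { type: "did", kind: "submit", subject: "launch" },
];
const score = grade(trail, rubric); // { passed: 3, total: 4 }: nothing was submitted
```

Because every check is a lookup over the recorded order, the same trail always produces the same score and no model call is needed.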

Durable Object per workspace

A materialized Cloudbox maps to one Durable Object instance. The id is a stable hash of the spec, so repeated materialization is idempotent.
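To show why a spec-derived id makes materialization idempotent, here is a minimal sketch: serialize the spec with sorted keys, then hash the result. The canonicalize and computerId helpers are hypothetical, and FNV-1a is swapped in only to keep the example dependency-free; a real deployment would more likely use a cryptographic hash such as SHA-256 and pass the result to the Durable Object namespace's idFromName.

```typescript
// Serialize a JSON-like value with object keys sorted, so two specs that
// differ only in key order produce the same string.
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) {
    return `[${value.map((item) => canonicalize(item)).join(",")}]`;
  }
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const entries = Object.keys(obj)
      .sort()
      .map((key) => `${JSON.stringify(key)}:${canonicalize(obj[key])}`);
    return `{${entries.join(",")}}`;
  }
  return JSON.stringify(value);
}

// 32-bit FNV-1a over UTF-16 code units, returned as 8 hex characters.
// Not collision-resistant; used here only as a stand-in for a real hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return (hash >>> 0).toString(16).padStart(8, "0");
}

// Hypothetical helper: the same spec always maps to the same id.
function computerId(spec: unknown): string {
  return fnv1a(canonicalize(spec));
}

// Key order does not matter; content does.
const idA = computerId({ name: "demo", files: ["README.md"] });
const idB = computerId({ files: ["README.md"], name: "demo" });
const idC = computerId({ name: "demo", files: ["README.mx"] });
```

Here idA and idB are equal, so materializing the same spec twice lands on the same Durable Object, while idC differs because the spec's content changed.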

The DO owns three SQLite tables:

  • state — immutable spec and materialization metadata
  • files — file index, kind, state, dependency edges, R2 key
  • receipts — ordered, append-only evidence log

This gives every agent run an isolated world with its own durable trail.

R2 for bytes

The DO stores file metadata. File bytes and artifacts live in R2 under:

<computer-id>/<path>

Local development without an R2 binding still works, using placeholder content. When R2 is bound, reads and writes persist real bytes.
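The key layout and the no-binding fallback can be sketched with an in-memory stand-in for the bucket. Everything named here is hypothetical: ByteStore is a simplified synchronous interface (the real R2 binding is asynchronous), and PLACEHOLDER, objectKey, readFile, and writeFile are illustration only. Stripping a leading slash from the path is also an assumption.

```typescript
// Simplified, synchronous stand-in for the parts of a bucket used below.
interface ByteStore {
  get(key: string): string | undefined;
  put(key: string, body: string): void;
}

// Hypothetical placeholder returned when no bucket is bound.
const PLACEHOLDER = "(placeholder: no R2 binding)";

// Build the <computer-id>/<path> key. Assumption: leading slashes are
// dropped so "/docs/x.md" and "docs/x.md" map to the same object.
function objectKey(id: string, path: string): string {
  return `${id}/${path.replace(/^\/+/, "")}`;
}

// Persist bytes when a store is bound; otherwise do nothing.
function writeFile(store: ByteStore | undefined, id: string, path: string, body: string): void {
  if (store) store.put(objectKey(id, path), body);
}

// Read real bytes when a store is bound; fall back to placeholder content.
function readFile(store: ByteStore | undefined, id: string, path: string): string | undefined {
  if (!store) return PLACEHOLDER;
  return store.get(objectKey(id, path));
}

// In-memory store for local experiments and tests.
function memoryStore(): ByteStore {
  const objects = new Map<string, string>();
  return {
    get: (key) => objects.get(key),
    put: (key, body) => {
      objects.set(key, body);
    },
  };
}

const bucket = memoryStore();
writeFile(bucket, "abc123", "artifacts/launch-note.md", "# Launch note");
const stored = readFile(bucket, "abc123", "/artifacts/launch-note.md");
const unbound = readFile(undefined, "abc123", "artifacts/launch-note.md");
```

Prefixing every key with the computer id keeps each workspace's bytes in its own namespace, which matches the isolation the Durable Object provides for metadata.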

Optional Cloudflare pieces

Cloudbox’s core loop is Worker + Durable Objects + R2. Other Cloudflare services fit naturally around it:

  • D1 for cross-workspace indexes, sweeps, history, leaderboards
  • Queues for bulk materialization or asynchronous grading
  • Workers AI for brief-to-spec generation and future judge fallbacks
  • Workflows / Cron Triggers for cleanup and long-running sweeps
  • Access for private agent workspaces

Demo path

The demo is not separate from the product. It is a Cloudbox spec about Cloudbox:

README.md                 positioning
/docs/quickstart.md       seven-minute path
/docs/architecture.md     infra shape
skeptic                   release reviewer
artifacts/launch-note.md  generated handoff

The agent earns points only by leaving the right receipts. That is the product: constrain the workspace, run the agent, inspect the trail, grade the behavior.