cairn

Open source · self-hosted · early access

Turn YAML and Markdown into governed AI commands.

Define a workflow as a pack — a few files of YAML and Markdown — and cairn compiles it into a single command that's eval-gated, policy-gated, and cost-capped. Reproducible, audited, on your infra. Fine-tune a model, gate a dataset, triage a CI/CD failure, run an incident RCA, ship a cited postmortem — the same governed command.

$ curl -fsSL https://cairndev.sh/install.sh | sh
$ cairn run finetune --model mistral-7b --dataset ./sft.jsonl --max-cost 40
› resolving · gpu runpod:a100 · method qlora · backend unsloth
› training · step 600/600 · loss 0.71 · spend $31.40 / $40.00 cap
› eval · accuracy 0.93 vs 0.91 baseline · +2.2%
› promote · staging → awaiting approval in Slack
✓ run complete · reproducible (cairn replay run-4f2a) · audited · $31.40
Built on LangGraph + Temporal · Open source · permissive-licensed · Python 3.12+ · Self-hosted: your keys, your infra

The work behind AI has two bad options.

Stand up a heavyweight ML platform (SageMaker, Vertex, the internal one) and wait a quarter to onboard, file a ticket for every GPU, and accept the lock-in. Or wire it by hand: a notebook, a pile of bash, a training script that breaks the moment a library bumps. Either way, when someone asks what produced this model, what it cost, and who approved shipping it, you don't have a clean answer. The platform is too heavy. The scripts are too fragile. Neither is reproducible, and neither is yours.

Not a black box.

Cairn is not an AI that guesses your workflow for you. It's a deterministic, reproducible execution layer you control. You declare the outcome — model, dataset, method, budget — and Cairn drives the tools underneath. Every run is content-addressed, so cairn replay reruns it exactly; cost-capped, so it stops before it overspends; gated, so nothing ships without a human; and recorded, so the trail of what happened already exists when you need it.
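To make "content-addressed" concrete, here is a minimal Python sketch of one way a run could be keyed by a hash of its declared inputs. It is an illustration under stated assumptions, not Cairn's implementation: the `run_id` helper, the spec fields, and the `run-` plus 12 hex characters format are all hypothetical.

```python
import hashlib
import json


def run_id(spec: dict) -> str:
    """Derive an ID from a run spec's content (hypothetical scheme, not Cairn's)."""
    # Sorted keys and fixed separators give one canonical byte string per spec,
    # so key order and whitespace cannot change the hash.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return "run-" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hash the dataset bytes rather than the path, so edits to the file change the ID.
dataset_digest = hashlib.sha256(b'{"prompt": "hi", "completion": "hello"}\n').hexdigest()

spec = {"model": "mistral-7b", "dataset_sha256": dataset_digest, "method": "qlora", "max_cost": 40}
reordered = dict(reversed(list(spec.items())))  # same content, different key order

print(run_id(spec) == run_id(reordered))                 # same declaration, same ID
print(run_id(spec) == run_id({**spec, "max_cost": 41}))  # any change yields a new ID
```

Keying a run by what was declared, rather than by when it ran, is what lets a replay look up the exact inputs again.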

What can you run?

Today, Cairn spans the work behind AI — fine-tuning, evals, datasets, promotion, and incident response. The command surface stays the same as it grows.

cairn run finetune · Now

A trained adapter — cost-capped before it starts, reproducible after.

cairn run eval · Now

Your model scored against a baseline you actually trust.

cairn run dataset-eval · Now

PII, dedup, and leakage caught before you spend a single GPU-hour.

cairn promote · Now

A human approval in Slack — who, what, and when, on the record.

cairn run rca · Now

An incident triaged — cited root cause, blast radius, and a postmortem draft, gated for sign-off.

cairn replay · Now

Any past run, rerun exactly. “It worked last week” stops being a mystery.

cairn audit export · Now

What ran, what it cost, who approved it — as markdown you can paste into the ticket.

Next on the same rail: inference cost optimization · CI / error triage · IaC change review · synthetic data. One command surface for the work behind AI.

How it works

Declarative in. Reproducible, governed run out — like terraform apply for the work behind AI.

  1. Declare the outcome

    A short YAML/Markdown spec, or just flags on the CLI. No DSL to learn, just the languages you already use.

  2. Cairn compiles a governed run

    It drives the operators underneath: GPU provider, training framework, eval suite, registry. Content-addressed, so it's reproducible from the first byte.

  3. Governance at the chokepoint

    Cost ceiling enforced before the spend. Promotion paused for a human. Every step hash-chained into the audit log. No side door: not for a teammate, not for an agent.

  4. Runs on your infra

    Your keys, your GPUs, your logs. Nothing leaves your perimeter. cairn replay reruns any of it, exactly.

Built for teams that move fast but need control.

Speed and control are usually a trade. Cairn doesn't make you pick — control isn't bolted on top, it's enforced on the same command the developer already runs.

Cost ceilings, before the spend

--max-cost is a hard stop that kills the run, not a dashboard you read after the bill lands.
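As a rough sketch of what "a hard stop, before the spend" can mean in code, the meter below refuses a charge that would cross the cap instead of reporting the overrun afterwards. `CostMeter`, `CostCapExceeded`, and `train` are hypothetical names for illustration. How Cairn actually meters spend is not described here.

```python
from dataclasses import dataclass


class CostCapExceeded(RuntimeError):
    """Raised when the next charge would push spend past the cap."""


@dataclass
class CostMeter:
    cap_cents: int        # integer cents avoid floating-point drift at the boundary
    spent_cents: int = 0

    def reserve(self, amount_cents: int) -> None:
        # Check before committing, so recorded spend never exceeds the cap.
        if self.spent_cents + amount_cents > self.cap_cents:
            raise CostCapExceeded(
                f"{self.spent_cents + amount_cents} cents would exceed cap of {self.cap_cents}"
            )
        self.spent_cents += amount_cents


def train(meter: CostMeter, steps: int, cents_per_step: int) -> int:
    """Run up to `steps` steps, reserving budget before each one."""
    completed = 0
    for _ in range(steps):
        meter.reserve(cents_per_step)  # raises before an over-budget step starts
        completed += 1
    return completed


meter = CostMeter(cap_cents=4000)  # mirrors --max-cost 40
try:
    train(meter, steps=600, cents_per_step=10)
except CostCapExceeded as exc:
    print(f"stopped: {exc}")
print(f"spent {meter.spent_cents} of {meter.cap_cents} cents")
```

The design choice worth noting is that the check sits in front of the charge: the step that would overspend never runs.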

Approval gates that can't be bypassed

Promotion pauses for a human; the decision is durable and logged — enforced where operators are invoked, so there's no route around it.
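One way to picture "enforced where operators are invoked" is a single call site that every operator passes through, with the approval check sitting next to the call. The sketch below is an assumption-laden illustration: `Chokepoint`, `ApprovalRequired`, `promote`, and the approver name are hypothetical, and an in-memory dict stands in for durable storage and the Slack step.

```python
from typing import Callable


class ApprovalRequired(PermissionError):
    """Raised when a gated operator is invoked without a recorded approval."""


class Chokepoint:
    """Single place where operators get invoked (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.approvals: dict[str, str] = {}  # run ID -> approver

    def approve(self, run_id: str, approver: str) -> None:
        # A real system would persist this decision durably; a dict stands in here.
        self.approvals[run_id] = approver

    def invoke(self, run_id: str, operator: Callable[[], str], *, gated: bool) -> str:
        # The check lives next to the call, so every caller passes through it.
        if gated and run_id not in self.approvals:
            raise ApprovalRequired(f"{run_id} is awaiting approval")
        return operator()


def promote() -> str:
    return "promoted to staging"


gate = Chokepoint()
try:
    gate.invoke("run-4f2a", promote, gated=True)
except ApprovalRequired as exc:
    print(f"blocked: {exc}")

gate.approve("run-4f2a", approver="alice")
print(gate.invoke("run-4f2a", promote, gated=True))
```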

A hash-chained audit trail, by default

Every run answers what ran, what it cost, and who approved it — without anyone remembering to switch it on.
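For readers unfamiliar with the term, a hash-chained log stores, in each entry, a hash that covers the previous entry's hash, so editing any earlier record invalidates everything after it. The following is a minimal sketch of that general technique. The entry layout, the `append` and `verify` helpers, and the sample events are assumptions, not Cairn's log format.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev: str, event: dict) -> str:
    # Canonical JSON so the same entry always hashes to the same value.
    body = json.dumps({"prev": prev, "event": event}, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append(log: list[dict], event: dict) -> dict:
    """Append an event, linking it to the hash of the previous entry."""
    prev = log[-1]["hash"] if log else GENESIS
    entry = {"prev": prev, "event": event, "hash": _digest(prev, event)}
    log.append(entry)
    return entry


def verify(log: list[dict]) -> bool:
    """Recompute every link; any edited or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["event"]):
            return False
        prev = entry["hash"]
    return True


log: list[dict] = []
append(log, {"step": "train", "cost_usd": "31.40"})
append(log, {"step": "eval", "accuracy": "0.93"})
append(log, {"step": "promote", "approved_by": "alice"})
print(verify(log))

log[0]["event"]["cost_usd"] = "3.14"  # tamper with an early record
print(verify(log))
```

Verification needs only the log itself, which is why such a trail can be checked by someone who did not produce it.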

Policy you write once

It applies to every operator — including the ones your own team builds.

The developer gets cairn run. The org gets the receipts. Same command.

One command. The team behind it gets the guarantees.

Same engine, same guards — pointed at the work each role can't afford to get wrong.

ML & AI engineers

Stop babysitting training scripts. One command fine-tunes, evals against your baseline, and stops the moment it blows the budget. Swap Unsloth for a Modal job or a RunPod A100 by changing a flag, not rewriting your pipeline. cairn replay reproduces any run bit-for-bit.

Developers

The work behind AI, as commands — not glue code. If you can run git, you can run a fine-tune, an eval, a cost check. YAML and Markdown in, a clean run record out. No new platform to learn, no UI to click through.

Platform engineers

Give your team one command and keep the control. Every run inherits cost caps, an approval gate, and a hash-chained audit trail — enforced at the chokepoint, not bolted on. Define policy once; it covers every operator. Self-hosted, so nothing leaves your perimeter.

SREs

Reproducible runs, durable execution, an audit trail by default. Long jobs survive restarts. When something ships that shouldn't have, the log already says who approved it and what it cost — RCA without the archaeology.

It's terraform plan/apply for the work behind AI.

You declare the outcome. You get a reviewable plan before anything runs — the approval gate is your plan step. apply reconciles the toolchain for you. The result is reproducible. The model you already think in, pointed at fine-tuning and the rest.

vs. Zapier

wires SaaS in someone else's cloud — no audit, no reproducibility. Cairn runs on your infra and proves every run.

vs. Heroku

abstracts by hosting it for you. Cairn abstracts while staying yours — no lock-in, no black box.

vs. GitHub Actions

CI glue bolted to one repo and one vendor. Cairn is portable, governed, and built for ML & infra work.

vs. Datadog

watches what happened. Cairn runs it — and the audit trail comes for free.

The operators aren't the moat. The contract is.

A catalog of integrations is the least defensible thing you can build. Cairn defines one contract every operator satisfies — and the payoff is that any operator, including one a stranger publishes, automatically inherits durable execution, cost ceilings, the approval gate, and the audit log, without its author writing a line of that. Operators are open and sandboxed by trust level; what they plug into is governed the same way, by construction.

  • Operators discovered via Python entry points — the engine never imports a pack
  • Durable execution + human-in-the-loop (approve in Slack), from an in-process runner up to Temporal
  • Profiles bind capabilities to providers — no vendor lock-in in the core
  • Your policies, audit history, and replayable runs accrue inside Cairn
Read the docs →
# discover what's installed
$ cairn ls

# preflight: operators, bindings, env, MCP
$ cairn doctor

# show the plan without running it
$ cairn run finetune \
    --model mistral-7b \
    --dataset ./sft.jsonl --dry-run

The CLI is open source. Start now.

Want release updates — and early access to the hosted team tier as it ships? Join the list.

No spam — we'll only email about releases and early access.