Open source · self-hosted · early access
Define a workflow as a pack (a few files of YAML and Markdown) and Cairn compiles it into a single command that's eval-gated, policy-gated, and cost-capped. Reproducible, audited, on your infra. Fine-tune a model, gate a dataset, triage a CI/CD failure, run an incident RCA, ship a cited postmortem: the same governed command.
curl -fsSL https://cairndev.sh/install.sh | sh

$ cairn run finetune --model mistral-7b --dataset ./sft.jsonl --max-cost 40
› resolving · gpu runpod:a100 · method qlora · backend unsloth
› training · step 600/600 · loss 0.71 · spend $31.40 / $40.00 cap
› eval · accuracy 0.93 vs 0.91 baseline · +2.1%
› promote · staging → awaiting approval in Slack
✓ run complete · reproducible (cairn replay run-4f2a) · audited · $31.40

$ cairn run dataset-eval --dataset ./data.jsonl --checks pii,dedup,leakage
› checking · format · dedup · train/eval leakage · pii
› report · 3 findings cited
› gate · FAIL · 12.4% train/eval overlap > 5% threshold
✓ run complete · blocked before a GPU spun up · audited · $0.02

$ cairn run eval --model ./out/adapter --suite ./evals/*.yaml --baseline prod
› evaluating · 200 cases · 4 suites
› scoring · accuracy 0.93 · safety -2.3% vs prod
› gate · REGRESSION on safety suite → awaiting approval
✓ run complete · audited · $0.11

$ cairn promote run-4f2a --to staging --approve-in slack
› approval · posted to #ml-releases · waiting on @owner
› approved · by alex@ · 14:22 UTC · recorded
› registry · mistral-7b-sft@v3 → staging
✓ promoted · who / what / when on the record · audited

$ cairn run inference-cost-optimize --service chatbot --target -30%
› profiling · 7d traffic · 2 models · batch + cache analysis
› proposal · route 64% to llama-3.1-8b · est -34% spend
› gate · production routing change → awaiting approval
✓ run complete · audited · $0.05

$ cairn run rca --incident pd-4821 --service payments
› gather · logs (clickstack) · recent deploys · blast radius
› hypothesis · connection-pool exhausted after deploy v2.3.1
› evidence · 3 findings cited · trace ids attached
› verdict · likely-cause WARN · gate · awaiting sign-off in Slack
✓ postmortem drafted · cited · audited (cairn replay run-9c2e)
Stand up a heavyweight ML platform (SageMaker, Vertex, the internal one) and wait a quarter to onboard, file a ticket for every GPU, and accept the lock-in. Or wire it by hand: a notebook, a pile of bash, a training script that breaks the moment a library version bumps. Either way, when someone asks what produced this model, what it cost, and who approved shipping it, you don't have a clean answer. The platform is too heavy. The scripts are too fragile. Neither is reproducible, and neither is yours.
Cairn is not an AI that guesses your workflow for you. It's a deterministic, reproducible execution layer you control. You declare the outcome — model, dataset, method, budget — and Cairn drives the tools underneath. Every run is content-addressed, so cairn replay reruns it exactly; cost-capped, so it stops before it overspends; gated, so nothing ships without a human; and recorded, so the trail of what happened already exists when you need it.
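To make "content-addressed" concrete, the sketch below hashes a canonicalized run spec so that identical declared inputs always map to the same identifier. It is an illustration in Python, not Cairn's actual scheme: the `run_id` function, the field names, and the choice of SHA-256 over canonical JSON are all assumptions.

```python
import hashlib
import json


def run_id(spec: dict) -> str:
    """Derive an identifier from the declared inputs alone (hypothetical scheme)."""
    # Canonical form: sorted keys and fixed separators, so key order and
    # whitespace in the spec cannot change the digest.
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Stand-in for the dataset file's bytes; a real run would hash the file itself.
dataset_digest = hashlib.sha256(b'{"prompt": "hi", "completion": "hello"}\n').hexdigest()

# Hypothetical field names, chosen only for this example.
spec = {
    "model": "mistral-7b",
    "dataset_sha256": dataset_digest,
    "method": "qlora",
    "max_cost": 40,
}

# Same content in a different key order yields the same id.
reordered = dict(reversed(list(spec.items())))
same_id = run_id(spec) == run_id(reordered)

# Changing any declared input yields a different id.
different_id = run_id({**spec, "max_cost": 50}) != run_id(spec)
```

The design point is that the identifier depends only on what was declared, so two people with the same spec and the same dataset bytes are talking about the same run.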
Today, Cairn spans the work behind AI — fine-tuning, evals, datasets, promotion, and incident response. The command surface stays the same as it grows.
cairn run finetune · Now · A trained adapter, cost-capped before it starts and reproducible after.
cairn run eval · Now · Your model scored against a baseline you actually trust.
cairn run dataset-eval · Now · PII, dedup, and leakage caught before you spend a single GPU-hour.
cairn promote · Now · A human approval in Slack: who, what, and when, on the record.
cairn run rca · Now · An incident triaged: cited root cause, blast radius, and a postmortem draft, gated for sign-off.
cairn replay · Now · Any past run, rerun exactly. “It worked last week” stops being a mystery.
cairn audit export · Now · What ran, what it cost, and who approved it, as Markdown you can paste into the ticket.
Next on the same rail: inference cost optimization · CI / error triage · IaC change review · synthetic data. One command surface for the work behind AI.
Declarative in. Reproducible, governed run out — like terraform apply for the work behind AI.
A short YAML/Markdown spec, or just flags on the CLI. No DSL to learn — the languages you already use.
It drives the operators underneath — GPU provider, training framework, eval suite, registry. Content-addressed, so it's reproducible from the first byte.
Cost ceiling enforced before the spend. Promotion paused for a human. Every step hash-chained into the audit log. No side door — not for a teammate, not for an agent.
Your keys, your GPUs, your logs. Nothing leaves your perimeter. cairn replay reruns any of it, exactly.
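The hash-chained audit log mentioned in the steps above can be pictured roughly as follows. This is a minimal Python sketch under stated assumptions, not Cairn's implementation: the `append` and `verify` helpers, the entry layout, and the all-zero genesis value are hypothetical. Each entry commits to the hash of the one before it, so editing any past entry breaks verification.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first entry's "prev" field


def _digest(prev_hash: str, event: dict) -> str:
    # Hash the previous entry's hash together with this event's canonical JSON.
    payload = json.dumps({"prev": prev_hash, "event": event},
                         sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append(log: list, event: dict) -> None:
    """Add an event, linking it to the hash of the last entry."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    log.append({"event": event, "prev": prev_hash, "hash": _digest(prev_hash, event)})


def verify(log: list) -> bool:
    """Recompute every link; any edited or reordered entry fails the check."""
    prev_hash = GENESIS
    for entry in log:
        if entry["prev"] != prev_hash or entry["hash"] != _digest(prev_hash, entry["event"]):
            return False
        prev_hash = entry["hash"]
    return True


log = []
append(log, {"step": "train", "spend_usd": 31.40})
append(log, {"step": "eval", "accuracy": 0.93})
append(log, {"step": "promote", "approved_by": "alex@"})

intact = verify(log)

# Rewriting history after the fact is detectable.
log[1]["event"]["accuracy"] = 0.99
tamper_detected = not verify(log)
```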
Speed and control are usually a trade. Cairn doesn't make you pick — control isn't bolted on top, it's enforced on the same command the developer already runs.
--max-cost is a hard stop that kills the run, not a dashboard you read after the bill lands.
Promotion pauses for a human; the decision is durable and logged — enforced where operators are invoked, so there's no route around it.
Every run answers what ran, what it cost, and who approved it — without anyone remembering to switch it on.
It applies to every operator — including the ones your own team builds.
The developer gets cairn run. The org gets the receipts. Same command.
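A hard cost ceiling, as opposed to a dashboard read after the fact, amounts to checking the budget before each unit of spend is committed. The Python sketch below shows one way to express that idea; `CostCap`, `BudgetExceeded`, `run_steps`, and the per-step prices are hypothetical names and numbers, and nothing here is taken from Cairn's code.

```python
class BudgetExceeded(RuntimeError):
    """Raised when the next charge would push spend past the cap."""


class CostCap:
    def __init__(self, max_cents: int):
        # Integer cents avoid floating-point drift when summing prices.
        self.max_cents = max_cents
        self.spent_cents = 0

    def charge(self, cents: int) -> None:
        # Check first, spend second: the cap is enforced before the money moves.
        if self.spent_cents + cents > self.max_cents:
            raise BudgetExceeded(
                f"charge of {cents} would exceed cap "
                f"({self.spent_cents}/{self.max_cents} cents spent)"
            )
        self.spent_cents += cents


def run_steps(cap: CostCap, step_costs: list) -> int:
    """Run steps in order; stop at the first one the budget cannot cover."""
    completed = 0
    for cost in step_costs:
        cap.charge(cost)  # raises before the step would run
        completed += 1
    return completed


cap = CostCap(max_cents=4000)  # a $40.00 cap
try:
    run_steps(cap, [1500, 1500, 1500])
    stopped = False
except BudgetExceeded:
    stopped = True
```

With a $40.00 cap and three $15.00 steps, the third charge is refused and recorded spend stays at $30.00.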
Same engine, same guards — pointed at the work each role can't afford to get wrong.
Stop babysitting training scripts. One command fine-tunes, evals against your baseline, and stops the moment it blows the budget. Swap Unsloth for a Modal job or a RunPod A100 by changing a flag, not rewriting your pipeline. cairn replay reproduces any run bit-for-bit.
The work behind AI, as commands — not glue code. If you can run git, you can run a fine-tune, an eval, a cost check. YAML and Markdown in, a clean run record out. No new platform to learn, no UI to click through.
Give your team one command and keep the control. Every run inherits cost caps, an approval gate, and a hash-chained audit trail — enforced at the chokepoint, not bolted on. Define policy once; it covers every operator. Self-hosted, so nothing leaves your perimeter.
Reproducible runs, durable execution, an audit trail by default. Long jobs survive restarts. When something ships that shouldn't have, the log already says who approved it and what it cost — RCA without the archaeology.
You declare the outcome. You get a reviewable plan before anything runs — the approval gate is your plan step. apply reconciles the toolchain for you. The result is reproducible. The model you already think in, pointed at fine-tuning and the rest.
wires SaaS in someone else's cloud — no audit, no reproducibility. Cairn runs on your infra and proves every run.
abstracts by hosting it for you. Cairn abstracts while staying yours — no lock-in, no black box.
CI glue bolted to one repo and one vendor. Cairn is portable, governed, and built for ML & infra work.
watches what happened. Cairn runs it — and the audit trail comes for free.
A catalog of integrations is the least defensible thing you can build. Cairn defines one contract every operator satisfies — and the payoff is that any operator, including one a stranger publishes, automatically inherits durable execution, cost ceilings, the approval gate, and the audit log, without its author writing a line of that. Operators are open and sandboxed by trust level; what they plug into is governed the same way, by construction.
# discover what's installed
$ cairn ls

# preflight: operators, bindings, env, MCP
$ cairn doctor

# show the plan without running it
$ cairn run finetune \
    --model mistral-7b \
    --dataset ./sft.jsonl --dry-run
Want release updates — and early access to the hosted team tier as it ships? Join the list.