The Standard for
Software Factories


We define how AI agent factories operate at scale. How they consume context, follow rules, and ship production code.

~/standra axiombench --report
════════════════════════════════════
AXIOM COMPLIANCE BENCHMARK v2
(using Karpathy's autoresearch)
────────────────────────────────────
MODEL          COMPLIANCE      TOKENS
────────────────────────────────────
Opus 4.6       █████████▏ 91.6%   239
Sonnet 4.6     █████████  90.7%   239
qwen3-coder    ████████▊  87.5%   239
────────────────────────────────────
Markdown ref.  █████████  90.5%  1195
────────────────────────────────────
SAVINGS: 87%   RUNS: 1247   RULES: 10
════════════════════════════════════
~/standra
METRICS
1,247 Validated Research Runs
7 Published Standards
5 Models Benchmarked
87% Token Cost Reduction
THE PROBLEM
-- Status Quo --

AI coding tools are everywhere. Standards are nowhere.

Every AI coding agent — Claude, Cursor, Codex — consumes project rules differently. There is no shared format for how agents read context, follow instructions, or report results. Teams waste tokens, lose compliance, and can’t switch tools without rewriting everything.

PRINCIPLES

Open Standards

We publish the protocols AI agents speak. Axiom for rules. CACP for communication. Adopted, not imposed.

Protocol-First

Empirical Research

Every standard is validated with real data. 1,200+ runs across proprietary and open-source models. Published, reproducible, honest.

Data-Driven

Production-Proven

Our standards come from building real products, not committees. If it doesn’t work in production, we don’t publish it.

Ready for Scale
ECOSYSTEM

Standards & Benchmarks

Axiom

Problem: Rules are verbose prose. Agents waste 87% of tokens reading them.

Compiles rules into a compact tabular format. Same compliance, fraction of the cost.
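
A hypothetical before/after sketch, written only to illustrate the idea; the actual Axiom syntax is defined in the spec on GitHub:

  Prose rule (roughly 30 tokens):
    "Every new function must ship with a unit test. Tests live in
    tests/ and must use the shared fixtures, never ad-hoc mocks."

  Compiled tabular rule (roughly 10 tokens):
    ID  | SCOPE  | REQUIRE
    R01 | new fn | unit test in tests/, shared fixtures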

GitHub →

CACP

Problem: Agents return free-form prose. Parsing is fragile and expensive.

Structured I/O protocol. Typed fields replace 2,000-token prose with 200 tokens.
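
As a sketch, a structured agent result might look like the record below; the field names are hypothetical, not the published CACP schema:

  status: ok
  files_changed: [src/auth.py, tests/test_auth.py]
  tests: {passed: 14, failed: 0}
  tokens_used: 212

A consumer reads typed fields directly instead of regexing free-form prose.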

GitHub →

PawBench

Problem: No standard way to benchmark LLM inference with tool-calling.

4-dimensional benchmark: multi-turn, multi-agent, parallel, with tools.
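
A hypothetical scenario descriptor (illustrative names, not PawBench's actual config format) shows how the four dimensions compose:

  scenario: booking-flow
  turns: 6                                  # multi-turn
  agents: [planner, executor]               # multi-agent
  concurrency: 8                            # parallel load
  tools: [search_flights, create_booking]   # tool-calling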

GitHub →

AxiomBench

Problem: Nobody measures if agents actually follow project rules.

Compliance benchmark. 10 rules × 8 tasks × 5 models. 1,247 validated runs.

GitHub →

ServingCard

Problem: Model serving configs are tribal knowledge, not portable metadata.

Open spec for quantization, serving params, and deployment config.
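
A hypothetical card (field names are illustrative; the real schema lives at servingcard.dev):

  model: qwen3-coder
  quantization: awq-int4
  context_window: 32768
  serving:
    engine: vllm
    tensor_parallel: 2
    max_batch_tokens: 8192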

servingcard.dev →
RESEARCH
“Compressed rule formats achieve the same compliance as verbose instructions at 87% lower cost — validated across Claude Opus, Claude Sonnet, and open‑source models.”
91.6% Opus · 90.7% Sonnet · 87.5% qwen3-coder
Read the paper →
ABOUT

The Consultancy

Zen Process

AI engineering consultancy. We help enterprises adopt AI-assisted development with the right standards, architecture, and implementation.

zen-process.com →

The Factory

Zen Labs

Product factory. We build real products using AI-powered engineering. Our standards come from production, not theory.

zp.digital →
AXIOM v0.6.0 ░░ CACP v1.0 ░░ PAWBENCH v1.0 ░░ SERVINGCARD v1.0 ░░ AXIOMBENCH 1247 RUNS ░░ 87% TOKEN SAVINGS ░░ 5 MODELS BENCHMARKED ░░ RESEARCH-FIRST ░░ HONEST CLAIMS ░░ OPEN STANDARDS ░░ standra.ai