// task-graph runtime LIVE

The Runtime

Models propose. Code decides. Every write passes a gate.

A task-graph execution engine that fans agent work out across the project portfolio and refuses to trust the results. Each audit node is a full plan/execute/review pipeline; a deterministic verifier re-checks every finding against the actual source before it is accepted; an advisory challenger can raise cautions on top; and the only path to a real file write is a replay of a saved, verified report through a chain of code-enforced gates. The whole thing runs locally, survives provider outages via model failover routes, and records every run in a durable ledger.

04 executable workflows
04 verifier failure domains
2,092 offline smoke checks
$0.33 verified portfolio audit (USD)

why_it_exists [01]

Scale and trust

The linear pipeline proved one loop could run unattended. The next problem was scale and trust: run agent work across every project at once, and stop believing model output just because it arrived.

The runtime answers both. A graph engine wraps the proven pipeline as one node type and fans it out over the portfolio with bounded concurrency and a budget circuit breaker. Then a verification layer, written in code rather than prompted into a model, decides what is accepted downstream.

The design rule is constant across every layer: the graph structure, sequencing, and safety enforcement are deterministic. Only the inside of a node is allowed to be creative.

No model output is ever applied without passing deterministic validation gates. A model can propose a fix; it cannot reach the file.

What it adds

01 Task-graph engine: fan-out, gather, budgets
02 Deterministic verifier node (code, not model)
03 Advisory challenger (cautions only)
04 Replay-only apply engine behind gates
05 Model failover routes per stage
06 SQLite run ledger and queue

Status

Live. Four executable workflows run on the graph. Portfolio audit runs green with verification across all enrolled projects. Two real gated applies have landed in production code. Operated from the Studio dashboard inside VS Code.

the_engine [02]

One Graph, Many Pipelines

A workflow builds a DAG of nodes. Each fan-out child wraps the full plan/execute/review pipeline for one project. The graph is validated before any node runs, so a malformed graph fails before any model spend. Bounded concurrency and a per-run token budget act as circuit breakers: when the budget trips, in-flight nodes finish, nothing new is scheduled, and the run is marked rather than crashed. A failed node skips its descendants while independent branches continue.

flowchart TD
    P["Workflow profile<br/>(committed JSON catalogue)"] --> B["Build + validate graph<br/>DAG check, budget breaker"]
    B --> F["Fan-out: one audit node<br/>per enrolled project"]
    F --> A1["audit/agentos-dashboard"]
    F --> A2["audit/salesforce-cicd-blueprint"]
    F --> A3["audit/salesforce-dev-tools"]
    A1 --> V1["verify"]
    A2 --> V2["verify"]
    A3 --> V3["verify"]
    V1 -.->|"fail: one bounded re-run<br/>at reduced scope"| A1
    V1 --> C1["challenge (opt-in,<br/>advisory only)"]
    V2 --> G
    V3 --> G
    C1 --> G["gather: consolidated report<br/>in declared order"]
    %% Blueprint tokens as literal hex (Mermaid cannot parse CSS vars or color-mix)
    classDef code fill:#62d99a1f,stroke:#62d99a,color:#dfe4ee,stroke-width:1.5px;
    classDef model fill:#8b93e81f,stroke:#8b93e8,color:#dfe4ee,stroke-width:1.5px;
    classDef entry fill:#e0be621f,stroke:#e0be62,color:#dfe4ee,stroke-width:1.5px;
    class P entry;
    class B,F,V1,V2,V3,G code;
    class A1,A2,A3,C1 model;

Purple nodes run models. Green nodes are deterministic code. The verifier and the gather never call a model; the challenger is model-backed but can only add caution, never change a verdict.
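
The pre-run validation and the skip-descendants rule can be sketched in a few lines. This is a minimal illustration, not the runtime's code: the `GraphNode` shape and the function names `validateGraph` and `descendantsOf` are assumptions made for the example.

```typescript
// Hypothetical node shape: an id plus the ids of the nodes it depends on.
interface GraphNode {
  id: string;
  dependsOn: string[];
}

// Returns a topological order, or throws if the graph has duplicate ids,
// an unknown dependency, or a cycle (Kahn's algorithm). Running this before
// any node executes means a malformed graph costs no model spend.
function validateGraph(nodes: GraphNode[]): string[] {
  const indegree = new Map<string, number>();
  const children = new Map<string, string[]>();
  for (const n of nodes) {
    if (indegree.has(n.id)) throw new Error(`duplicate node: ${n.id}`);
    indegree.set(n.id, 0);
    children.set(n.id, []);
  }
  for (const n of nodes) {
    for (const dep of n.dependsOn) {
      const kids = children.get(dep);
      if (kids === undefined) throw new Error(`unknown dependency: ${dep}`);
      kids.push(n.id);
      indegree.set(n.id, (indegree.get(n.id) ?? 0) + 1);
    }
  }
  const ready = nodes.filter((n) => n.dependsOn.length === 0).map((n) => n.id);
  const order: string[] = [];
  while (ready.length > 0) {
    const id = ready.shift() as string;
    order.push(id);
    for (const kid of children.get(id) ?? []) {
      const remaining = (indegree.get(kid) ?? 0) - 1;
      indegree.set(kid, remaining);
      if (remaining === 0) ready.push(kid);
    }
  }
  if (order.length !== nodes.length) throw new Error("graph contains a cycle");
  return order;
}

// Collects every transitive descendant of a failed node so a scheduler can
// skip them while independent branches keep running.
function descendantsOf(failedId: string, nodes: GraphNode[]): Set<string> {
  const skipped = new Set<string>();
  const stack = [failedId];
  while (stack.length > 0) {
    const current = stack.pop() as string;
    for (const n of nodes) {
      if (n.dependsOn.includes(current) && !skipped.has(n.id)) {
        skipped.add(n.id);
        stack.push(n.id);
      }
    }
  }
  return skipped;
}
```

In this sketch a failed audit node takes its verify node and anything gathered from it out of the schedule, while audits for other projects are untouched.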

quality_gate [03]

The Verifier Is Code, Not a Model

Each audit node feeds a verifier node that re-checks the output deterministically: did the run actually complete and emit final JSON, does the report validate against the schema, and does every finding's anchor text resolve verbatim against the file on disk. A run that silently ran out of budget is a verification failure routed to a bounded re-run at reduced scope, not a clean pass. A clean zero-findings run with completion proof passes honestly.

output

Result exists and has the expected shape

completion

Final findings JSON was actually emitted

schema

Report passes the write-gate validation

anchor

Every finding resolves verbatim on disk

The four failure domains are a locked taxonomy, exported and smoke-tested. A finding the model cannot anchor to a real line of source is rejected before it reaches a report. The completion check in particular killed the runtime's worst failure mode: audits that looked clean only because the model never finished.
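
A verifier of this kind can be written as a plain function that returns the first failing domain. The sketch below assumes a result object with a `finalJsonEmitted` flag and a `findings` array of `{ file, anchor }` pairs; those field names, `verifyAudit`, and the `readSource` callback are hypothetical, chosen only to make the four domains concrete.

```typescript
type FailureDomain = "output" | "completion" | "schema" | "anchor";

type Verdict =
  | { ok: true }
  | { ok: false; domain: FailureDomain; detail: string };

// Assumed finding shape: the file it points at and the verbatim anchor text.
interface Finding {
  file: string;
  anchor: string;
}

function isFinding(value: unknown): value is Finding {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  const file = v["file"];
  const anchor = v["anchor"];
  return typeof file === "string" && typeof anchor === "string" && anchor.length > 0;
}

// Checks the domains in order and stops at the first failure. No model is
// consulted: every decision is a comparison against data or the file on disk.
function verifyAudit(
  result: unknown,
  readSource: (file: string) => string | undefined,
): Verdict {
  if (typeof result !== "object" || result === null) {
    return { ok: false, domain: "output", detail: "result missing or not an object" };
  }
  const r = result as Record<string, unknown>;
  if (r["finalJsonEmitted"] !== true) {
    return { ok: false, domain: "completion", detail: "final findings JSON was never emitted" };
  }
  const findings = r["findings"];
  if (!Array.isArray(findings) || !findings.every(isFinding)) {
    return { ok: false, domain: "schema", detail: "findings do not match the expected shape" };
  }
  for (const finding of findings as Finding[]) {
    const source = readSource(finding.file);
    if (source === undefined || !source.includes(finding.anchor)) {
      return { ok: false, domain: "anchor", detail: `anchor not found in ${finding.file}` };
    }
  }
  return { ok: true };
}
```

A run with an empty `findings` array and completion proof returns `{ ok: true }`, while a run that stopped before emitting JSON fails in the `completion` domain even if nothing else is wrong.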

reliability [04]

Failover Routes, Not Single Models

Every pipeline stage resolves to an ordered model route: preferred first, then failovers. A transient provider error (429, 5xx, network) restarts the stage on the next model in the route; a real error (schema, validation, safety gate) is rethrown immediately rather than papered over. Stage telemetry records the chain of attempted models, so a run that survived an outage says so. Routes were born from a real incident: free-tier rate limits blocked the first portfolio run, and the fix was architecture, not retries.
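
The split between transient and real errors is the core of a failover route. A minimal sketch follows, assuming provider errors carry a numeric `status` or a `code` of `"NETWORK"`; that error shape and the names `isTransient` and `runWithFailover` are illustrative. The stage call is synchronous here for brevity, where a real stage would be async.

```typescript
// Assumed error convention: HTTP-style `status`, or `code: "NETWORK"` for
// connection failures. Anything else is treated as a real error.
function isTransient(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as { status?: unknown; code?: unknown };
  if (e.code === "NETWORK") return true;
  const status = e.status;
  return typeof status === "number" && (status === 429 || (status >= 500 && status <= 599));
}

// Tries each model in route order. A transient error moves on to the next
// model; any other error is rethrown immediately. The list of attempted
// models is returned so telemetry can show that a failover happened.
function runWithFailover<T>(
  route: string[],
  call: (model: string) => T,
): { value: T; attempted: string[] } {
  const attempted: string[] = [];
  let lastError: unknown = new Error("empty model route");
  for (const model of route) {
    attempted.push(model);
    try {
      return { value: call(model), attempted };
    } catch (err) {
      if (!isTransient(err)) throw err;
      lastError = err;
    }
  }
  // Every model in the route failed transiently.
  throw lastError;
}
```

Rethrowing non-transient errors keeps a schema or safety-gate failure visible instead of hiding it behind a second model's attempt.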

Focused audit mode

A deterministic selector caps each audit at the highest-signal surfaces (entry points first, tests last, deduplicated by file) and injects a hard finalisation instruction naming the exact turn by which findings JSON must be emitted. Ending a run without JSON is a failure, not a pass.
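
One way such a selector could work is a stable sort by surface kind followed by a dedupe and a cap. The `Surface` shape, the three kinds, and the wording of the finalisation instruction below are assumptions for illustration; the source only specifies the ordering, the dedupe, and that a turn number is named.

```typescript
type SurfaceKind = "entry" | "source" | "test";

// Hypothetical candidate shape: a file path and a coarse classification.
interface Surface {
  file: string;
  kind: SurfaceKind;
}

// Orders candidates entry points first and tests last, keeps the first
// occurrence of each file, and stops at the cap. Array.prototype.sort is
// stable, so input order is preserved within each kind.
function selectSurfaces(candidates: Surface[], cap: number): string[] {
  const rank: Record<SurfaceKind, number> = { entry: 0, source: 1, test: 2 };
  const ordered = [...candidates].sort((a, b) => rank[a.kind] - rank[b.kind]);
  const seen = new Set<string>();
  const picked: string[] = [];
  for (const surface of ordered) {
    if (picked.length >= cap) break;
    if (seen.has(surface.file)) continue;
    seen.add(surface.file);
    picked.push(surface.file);
  }
  return picked;
}

// Illustrative wording only: the point is that the deadline is an exact turn.
function finalisationInstruction(finalTurn: number): string {
  return `Emit the final findings JSON no later than turn ${finalTurn}. A run that ends without it is a failure.`;
}
```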

Per-stage model choice

Five-layer precedence per stage: UI override, project, pipeline, workspace, stage default. Each layer can name a single model or a full route. Planning and review lean on reasoning models; execution leans on tool-reliable ones; every stage is swappable without touching the engine.
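
The precedence order reduces to a first-defined-wins lookup. In the sketch below the layer property names and the model names in the usage note are placeholders, not the runtime's configuration keys.

```typescript
// A layer may name one model or a full ordered route.
type ModelChoice = string | string[];

// Hypothetical layer bag for a single stage, highest precedence first.
interface StageLayers {
  uiOverride?: ModelChoice;
  project?: ModelChoice;
  pipeline?: ModelChoice;
  workspace?: ModelChoice;
  stageDefault: ModelChoice; // always present, so resolution cannot fail
}

// Picks the first layer that is set and normalises it to an ordered route.
function resolveRoute(layers: StageLayers): string[] {
  const choice =
    layers.uiOverride ??
    layers.project ??
    layers.pipeline ??
    layers.workspace ??
    layers.stageDefault;
  return typeof choice === "string" ? [choice] : [...choice];
}
```

For example, `resolveRoute({ project: ["model-a", "model-b"], workspace: "model-w", stageDefault: "model-d" })` yields the project's two-model route, because the project layer outranks the workspace layer.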

the_write_path [05]

Apply Is a Replay, Never an Improvisation

The runtime never writes source during an audit. A fix is applied later, by replaying a saved, verifier-passed report through a deterministic apply engine. Dry-run is the default; a real write requires an explicit flag and then survives a chain of gates enforced in code. Anchor matching is CRLF-aware (content is compared LF-normalised and written back in the file's native line endings), because the first real apply candidate was blocked by exactly that mismatch.
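
CRLF-aware matching can be sketched as normalise, match, patch, restore. The function name `applyAnchoredFix` is hypothetical, and the sketch assumes a file uses one line-ending style throughout; a file with mixed endings would need more care.

```typescript
// Replaces the first occurrence of `anchor` with `replacement`, comparing in
// LF-normalised form and returning content in the file's native line endings.
// Returns undefined when the anchor does not match, so the caller can reject
// the apply without writing anything.
function applyAnchoredFix(
  content: string,
  anchor: string,
  replacement: string,
): string | undefined {
  const usesCrlf = content.includes("\r\n");
  const toLf = (s: string): string => s.replace(/\r\n/g, "\n");

  const normalised = toLf(content);
  const lfAnchor = toLf(anchor);
  if (lfAnchor.length === 0) return undefined; // an empty anchor matches nothing

  const index = normalised.indexOf(lfAnchor);
  if (index === -1) return undefined;

  const patched =
    normalised.slice(0, index) +
    toLf(replacement) +
    normalised.slice(index + lfAnchor.length);

  // Restore CRLF only if the original file used it.
  return usesCrlf ? patched.replace(/\n/g, "\r\n") : patched;
}
```

Without the normalisation step, an anchor recorded with LF endings never matches a CRLF file on disk, which is the mismatch the paragraph above describes.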

flowchart TD
    R["Saved audit report<br/>(verifier-passed findings)"] --> RP["Replay:<br/>--apply-from-code-report"]
    RP --> D{"Mode?"}
    D -->|"default"| PV["Dry-run preview<br/>no write possible"]
    D -->|"--apply-write"| G1{"Inside FS_ROOTS allow-list?"}
    G1 -->|"no"| X["Rejected: no write"]
    G1 -->|"yes"| G2{"Working tree clean?"}
    G2 -->|"no"| X
    G2 -->|"yes"| G3{"Anchor matches source?<br/>(CRLF-aware)"}
    G3 -->|"no"| X
    G3 -->|"yes"| W["Apply the fix"]
    W --> G4{"Declared build gate passes?"}
    G4 -->|"no"| RV["Revert, native EOLs preserved"]
    G4 -->|"yes"| L["Append-only apply ledger"]
    %% Blueprint tokens as literal hex (Mermaid cannot parse CSS vars or color-mix)
    classDef entry fill:#e0be621f,stroke:#e0be62,color:#dfe4ee,stroke-width:1.5px;
    classDef gate fill:#e0be621f,stroke:#e0be62,color:#dfe4ee,stroke-width:1.5px;
    classDef code fill:#62d99a1f,stroke:#62d99a,color:#dfe4ee,stroke-width:1.5px;
    classDef fail fill:#e697521f,stroke:#e69752,color:#dfe4ee,stroke-width:1.5px;
    class R entry;
    class D,G1,G2,G3,G4 gate;
    class RP,PV,W,L code;
    class X,RV fail;

The gates are mandatory

A target with no declared build gate yields a gate error for every unit and no tree mutation. Apply cannot be combined with the graph or pipeline paths: one report, one target, one gated write path. Every apply is recorded in an append-only ledger.
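
The pre-write part of the chain can be pictured as an ordered list of predicates where the first failure rejects the apply. The `ApplyContext` fields and gate names here are assumptions. The path check is deliberately naive: it assumes paths are already absolute and normalised, with no `..` segments.

```typescript
// Hypothetical snapshot of everything the gates need to decide.
interface ApplyContext {
  targetPath: string;
  fsRoots: string[];
  workingTreeClean: boolean;
  anchorMatches: boolean;
  buildGate: string | null; // declared build command, or null if none
}

type GateResult = { allowed: true } | { allowed: false; gate: string };

// Assumes both paths are absolute and already normalised.
function isInsideRoot(target: string, root: string): boolean {
  const prefix = root.endsWith("/") ? root : root + "/";
  return target === root || target.startsWith(prefix);
}

// Evaluates the gates in order. A missing build gate is checked first, so a
// target without one is rejected before anything could touch the tree.
function checkPreWriteGates(ctx: ApplyContext): GateResult {
  const gates: Array<[string, (c: ApplyContext) => boolean]> = [
    ["build-gate-declared", (c) => c.buildGate !== null && c.buildGate.length > 0],
    ["fs-roots", (c) => c.fsRoots.some((root) => isInsideRoot(c.targetPath, root))],
    ["clean-tree", (c) => c.workingTreeClean],
    ["anchor", (c) => c.anchorMatches],
  ];
  for (const [name, passes] of gates) {
    if (!passes(ctx)) return { allowed: false, gate: name };
  }
  return { allowed: true };
}
```

The build gate itself runs after the write; this sketch only covers the checks that decide whether a write may happen at all.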

Approval spine

For any future queued apply, an approval boundary already exists: grants bind to the exact apply fingerprint, are issued out of band via a CLI (grant, list, revoke), and fail closed. Approval grants no authority by itself, and queued apply remains impossible by design until it is deliberately switched on.
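
Fail-closed approval comes down to two conditions that must both hold, with every uncertain case resolving to "no". The `Grant` shape, the `QUEUED_APPLY_ENABLED` constant, and both function names are hypothetical; the real grant store and fingerprint format are not described in the source.

```typescript
// Hypothetical grant record: bound to one exact apply fingerprint.
interface Grant {
  fingerprint: string;
  revoked: boolean;
}

// Illustrative master switch; off by default in this sketch.
const QUEUED_APPLY_ENABLED: boolean = false;

// Fail closed: a missing grant list, an empty fingerprint, a revoked grant,
// or a fingerprint that differs in any way all mean "not approved".
function isApproved(applyFingerprint: string, grants: Grant[] | undefined): boolean {
  if (applyFingerprint.length === 0 || grants === undefined) return false;
  return grants.some((g) => g.fingerprint === applyFingerprint && !g.revoked);
}

// Approval alone grants nothing: the feature switch must also be on.
function mayRunQueuedApply(applyFingerprint: string, grants: Grant[] | undefined): boolean {
  return QUEUED_APPLY_ENABLED && isApproved(applyFingerprint, grants);
}
```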

workflow_profiles [06]

Four Workflows on One Engine

Workflows are catalogued in a committed JSON registry with a hardened contract: profiles are descriptive metadata, an executable profile must map to a real workflow defined in code, and no JSON field can grant apply or execution authority. The registry is validated before any model call; a field that even looks like an authority grant throws at load. Project enrolment lives in a second committed registry, so adding a project to the portfolio is a reviewed git change.
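
A load-time check along these lines would reject suspicious fields before any model call. The profile fields (`id`, `executable`, `description`), the key pattern, and the `loadProfile` name are assumptions for the sketch, not the registry's actual contract.

```typescript
// Hypothetical profile shape: descriptive metadata only.
interface WorkflowProfile {
  id: string;
  executable: boolean;
  description: string;
}

// Any key that merely resembles an authority grant is rejected outright.
const SUSPICIOUS_KEY = /(apply|write|authority|grant|permission|allow)/i;

// Validates one raw registry entry. Throws on the first problem so a bad
// registry stops the run at load time.
function loadProfile(raw: unknown, workflowsInCode: ReadonlySet<string>): WorkflowProfile {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("profile must be an object");
  }
  const record = raw as Record<string, unknown>;
  for (const key of Object.keys(record)) {
    if (SUSPICIOUS_KEY.test(key)) {
      throw new Error(`field "${key}" looks like an authority grant`);
    }
  }
  const id = record["id"];
  const executable = record["executable"];
  const description = record["description"];
  if (typeof id !== "string" || typeof executable !== "boolean" || typeof description !== "string") {
    throw new Error("profile is missing required metadata");
  }
  // An executable profile must name a workflow that exists in code.
  if (executable && !workflowsInCode.has(id)) {
    throw new Error(`executable profile "${id}" has no workflow in code`);
  }
  return { id, executable, description };
}
```

Because authority is matched by key name rather than value, `"canApply": false` is rejected just as firmly as `"canApply": true`.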

Model-backed
code-audit

The reference workflow: fan-out code audit over the enrolled targets with optional verify and challenge stages. The only workflow with an apply path, and that path is always gated.

Model-backed
website-audit

The proposal-first sibling: fans the website content audit out over the enrolled projects and consolidates. Read-only with no apply path; it writes nothing to the website.

Deterministic detector
documentation-drift

No model at all: inspects a bounded set of doc surfaces and reports candidate drift against known project state. Proof the graph layer carries more than audits.

Deterministic detector
changelog-coverage

Evidence source is git history: commits that landed after the newest changelog entry are reported as candidate coverage gaps. Detection only; drafting prose stays a planned, separate profile.
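
The changelog-coverage idea can be sketched as a date comparison. This assumes commits and changelog entries both carry ISO 8601 timestamps; the detector may key on something else, such as tags or commit ancestry. The `Commit` shape and `uncoveredCommits` name are hypothetical.

```typescript
// Hypothetical commit record with an ISO 8601 timestamp.
interface Commit {
  sha: string;
  committedAt: string;
}

// Returns commits that landed after the newest changelog entry. With no
// changelog entries at all, every commit is a candidate gap.
function uncoveredCommits(commits: Commit[], changelogDates: string[]): Commit[] {
  if (changelogDates.length === 0) return [...commits];
  const times = changelogDates.map((d) => Date.parse(d));
  if (times.some((t) => Number.isNaN(t))) {
    // Refuse to guess: an unparseable date would otherwise hide real gaps.
    throw new Error("changelog contains an unparseable date");
  }
  const newest = Math.max(...times);
  return commits.filter((c) => Date.parse(c.committedAt) > newest);
}
```

The result is a list of candidates only; nothing here drafts changelog prose.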

platform_spine [07]

Durable, Local, Honest About It

Underneath the runtime sits a local platform layer: a SQLite run ledger and queue (better-sqlite3, WAL) dual-written alongside the canonical JSONL telemetry, behind an async interface that could later point at a hosted database without touching callers. Store reads are opt-in with strict JSONL fallback. Queue payloads are secret-free by construction; the worker injects credentials at drain time. Orphaned store rows are retained and reported, never fabricated into fake history.
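
Strict JSONL parsing and orphan reporting can be sketched without a database. This assumes an orphan is a store row whose run id is absent from the canonical JSONL. The `runId` and `status` field names and both function names are illustrative.

```typescript
// Hypothetical run record shared by the JSONL file and the store.
interface RunRecord {
  runId: string;
  status: string;
}

// Strict parse: a malformed or mis-shaped line throws instead of being
// skipped, so a damaged canonical file is never silently truncated.
function parseJsonl(text: string): RunRecord[] {
  const records: RunRecord[] = [];
  const lines = text.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = (lines[i] ?? "").trim();
    if (line === "") continue;
    const parsed: unknown = JSON.parse(line); // throws on malformed JSON
    if (typeof parsed !== "object" || parsed === null) {
      throw new Error(`line ${i + 1}: not an object`);
    }
    const rec = parsed as Record<string, unknown>;
    const runId = rec["runId"];
    const status = rec["status"];
    if (typeof runId !== "string" || typeof status !== "string") {
      throw new Error(`line ${i + 1}: missing runId or status`);
    }
    records.push({ runId, status });
  }
  return records;
}

// Store rows with no canonical counterpart are returned for reporting. They
// are not deleted and not merged into the canonical history.
function findOrphans(canonical: RunRecord[], storeRows: RunRecord[]): RunRecord[] {
  const known = new Set(canonical.map((r) => r.runId));
  return storeRows.filter((row) => !known.has(row.runId));
}
```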

run telemetry
├── runs.jsonl / graph-runs.jsonl ── canonical, append-only
└── SQLite ledger + queue ── dual-written, WAL, opt-in reads
gateway (pm2, bearer-authed, loopback only)
├── run-control API: workflows, runs, live node events (SSE)
├── read-only: run history, queue status, profiles, projects
└── pinned non-secret env contract ── secrets never in config
proof_not_promises [08]

Validated Live

01
Portfolio audit, verified, for cents

The first portfolio-scale focused-plus-verify run came back green across all enrolled projects: every audit passed the deterministic verifier, for about US$0.33 in total model spend. Later paid runs confirmed the result was reproducible, not lucky.

02
Real applies, gated end to end

Two real fixes found by the audit have been applied through the full replay path (dry-run, gates, write, build gate, ledger) and committed: a dead write removed and a dead field removed. Small on purpose: the point was proving the write path, not the diff.

03
The failure modes are tested, not assumed

2,092 offline smoke checks run with zero token spend: DAG validation, budget trips, verifier mechanics, failover routes, CRLF apply matching, approval fail-closed behaviour, and the report parser against malformed model output. Runtime contracts are written down and locked in RUNTIME-CONTRACTS.md.

04
Reliability incidents became architecture

Free-tier rate limits became failover routes. A silently incomplete audit became the completion failure domain. A loose anchor became the verbatim anchor gate. A line-ending mismatch became CRLF-aware apply. Each hardening traces to a named, recorded incident.

See it operated

The runtime is driven from Studio, a hub inside the VS Code dashboard: discover workflows, price a run before launching it, and watch the graph execute live.
