Presentation deck
Shipping a self-validating research pipeline
How a coding agent produces structured, evidence-backed company research at scale — and why the website never calls an LLM.
Yingting Huang · May 2026
What harness engineering buys you
Roughly 9,500 lines of harness — scripts, schemas, internal docs — produced 46 complete reports, 9,400 cited sources, and 14,400 atomic claims. Fully automated. Fully reproducible.
Every claim resolves to a quoted excerpt and a canonical URL.
The agent writes the content; the website never calls an LLM.
Static Astro build over typed YAML — no SSR, no inference at runtime.
Don't ask the agent to write a report
Ask it to run a pipeline. That distinction does most of the work.
Eight chapters, one tight loop
flowchart TB
Input[user input] --> Ctx[load runtime context]
Ctx --> Run[create report run]
Run --> Loop{for each<br/>of 8 chapters}
Loop --> Gather[gather sources]
Gather --> Draft[draft chapter YAML]
Draft --> Gate[chapter gate]
Gate -->|fail| Draft
Gate -->|pass| Loop
Loop --> Final[finalize:<br/>meta · ledger ·<br/>cross-chapter]
Final --> Out[YAML artifacts]
Out --> Site[Astro static build]
- Every step is bound to a specific command, schema, and failure mode.
- The instruction surface is under a hundred lines; policy lives in config.
- The agent never has to remember the rules — it re-reads them every loop.
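The loop in the flowchart can be sketched in a few lines. This is a minimal illustration, not the actual harness: the chapter names, `run_pipeline`, and the stub `fake_draft` / `fake_gate` callables are all hypothetical, and the deck only states that there are eight chapters and that a failed gate sends the chapter back to drafting.

```python
# Hypothetical chapter names; the deck only says there are 8 chapters.
CHAPTERS = [f"chapter-{i}" for i in range(1, 9)]


def run_pipeline(draft, gate, max_retries=3):
    """Draft each chapter until its gate passes, then return the accepted drafts."""
    accepted = {}
    for chapter in CHAPTERS:
        failures = []
        for _attempt in range(max_retries + 1):
            # The drafter sees the previous gate failures, so it can fix them.
            text = draft(chapter, failures)
            failures = gate(chapter, text)
            if not failures:
                accepted[chapter] = text
                break
        else:
            # The loop never passed the gate within the retry budget.
            raise RuntimeError(f"{chapter} still failing: {failures}")
    return accepted


# Stubs standing in for the agent and the validator (assumptions for the demo).
def fake_draft(chapter, failures):
    return f"{chapter}: v2" if failures else f"{chapter}: v1"


def fake_gate(chapter, text):
    return [] if text.endswith("v2") else ["claimRefMissing"]


reports = run_pipeline(fake_draft, fake_gate)
```

The point of the shape is that the agent never decides when a chapter is done; the gate does.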
YAML is the world model
Every report is a typed YAML dataset, not a markdown document.
Why typed YAML pays off
A schema defines the envelope, ID grammar, controlled vocabularies, figure contracts, and a shared regex for inline claim refs.
- The agent cannot bullshit silently — every enum miss fails the gate with a one-line fix.
- The renderer needs no defensive layer: figure shapes are guaranteed valid at validation time.
- All 46 reports are isomorphic, so sector indexes and 'top-rated' rollups are trivial.
- A CI step replays every historical report through the current schema — no quiet corruption.
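A rough sketch of what such a schema check might look like on an already-parsed chapter. The ID grammar (`C001`), the inline ref syntax (`[C001]`), the tier vocabulary, and the dimension names `idGrammar` and `enumMiss` are assumptions made for illustration; only `claimRefMissing` is a name taken from this deck.

```python
import re

CLAIM_ID = re.compile(r"^C\d{3}$")        # hypothetical ID grammar
CLAIM_REF = re.compile(r"\[(C\d{3})\]")   # hypothetical inline claim-ref syntax
SOURCE_TIERS = {"primary", "secondary", "tertiary"}  # hypothetical vocabulary


def validate_chapter(chapter: dict) -> list:
    """Return structured failures, each carrying a one-line fix."""
    failures = []
    claim_ids = set()
    for claim in chapter.get("claims", []):
        cid = claim.get("id", "")
        if not CLAIM_ID.match(cid):
            failures.append({"dimension": "idGrammar", "object": cid,
                             "fix": "use an ID like C001"})
        claim_ids.add(cid)
        if claim.get("tier") not in SOURCE_TIERS:
            failures.append({"dimension": "enumMiss", "object": cid,
                             "fix": f"set tier to one of {sorted(SOURCE_TIERS)}"})
    # Every inline reference in the body must resolve to a declared claim.
    for ref in CLAIM_REF.findall(chapter.get("body", "")):
        if ref not in claim_ids:
            failures.append({"dimension": "claimRefMissing", "object": ref,
                             "fix": f"define claim {ref} or remove the reference"})
    return failures


chapter = {
    "body": "Revenue grew [C001] while margins fell [C002].",
    "claims": [{"id": "C001", "tier": "primary"}, {"id": "c2", "tier": "rumor"}],
}
failures = validate_chapter(chapter)
```

Here the malformed ID, the out-of-vocabulary tier, and the dangling `[C002]` reference each surface as a separate entry with its own fix.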
Workflow is configuration, not prompt
One config file is the agent's brief, the validator's rulebook, and the policy layer — all at once.
What that single config file owns
Edit one line; the next agent loop sees the new rule. The Markdown the agent reads contains zero policy.
- Chapter briefs: descriptions, planned tables and figures, evidence strategy, quality bar.
- Per-chapter and per-report gates: minimum sources, distinct domains, paywall caps.
- Agent policies: research rules, hard rules, retry policy, treatment of volatile facts.
- The agent reads the projection of these rules — there is no second copy to drift against.
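One way to picture "one config, two readers" is a single data structure that is projected into the agent's brief and also consulted by the gate. The config shape, the key names, and both functions below are hypothetical; the deck only says that gates cover minimum sources, distinct domains, and paywall caps.

```python
# Hypothetical config shape; in the real harness this lives in one config file.
CONFIG = {
    "chapters": {
        "market": {
            "description": "Market size and competitors",
            "gates": {"min_sources": 5, "min_domains": 3, "max_paywalled": 2},
        }
    }
}


def project_brief(config, chapter):
    """Render the agent-facing brief from the same values the gate enforces."""
    spec = config["chapters"][chapter]
    g = spec["gates"]
    return (f"{chapter}: {spec['description']}. "
            f"Cite at least {g['min_sources']} sources from "
            f"{g['min_domains']} distinct domains; "
            f"at most {g['max_paywalled']} may be paywalled.")


def check_gates(config, chapter, sources):
    """Validator-side check driven by the identical config entries."""
    g = config["chapters"][chapter]["gates"]
    failures = []
    if len(sources) < g["min_sources"]:
        failures.append(f"add {g['min_sources'] - len(sources)} more source(s)")
    domains = {s["domain"] for s in sources}
    if len(domains) < g["min_domains"]:
        failures.append(f"add {g['min_domains'] - len(domains)} more distinct domain(s)")
    paywalled = sum(1 for s in sources if s.get("paywalled"))
    if paywalled > g["max_paywalled"]:
        failures.append(f"replace {paywalled - g['max_paywalled']} paywalled source(s)")
    return failures


brief = project_brief(CONFIG, "market")
gate_failures = check_gates(CONFIG, "market", [
    {"domain": "sec.gov"},
    {"domain": "sec.gov"},
    {"domain": "example.com", "paywalled": True},
])
```

Because both functions read the same numbers, changing `min_sources` changes the brief and the gate in the same edit.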
Why I rewrote the URL fetcher
Off-the-shelf 'fetch a URL' tools handle blogs. They don't handle SEC filings, paywalled news, anti-bot shields, or PDFs.
One default command, a few escape hatches
An agent's attention is a token budget. Every parallel path is a chance to choose wrong.
- Browser fingerprint rotation across TLS / HTTP-2 identity profiles, picked per host.
- Per-host fast-paths plus reader-proxy and Wayback Machine fallbacks when origins refuse.
- First-class PDF handling: magic-byte detect, pipe through extractor, return clean text.
- Boilerplate stripping by default; a flag for product and pricing pages where chrome is the content.
- Disk cache with TTL and structured JSON output, not a blob to re-parse.
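Two of these ideas are easy to show without touching the network: magic-byte PDF detection (PDF files begin with `%PDF-`) and an ordered fallback chain that returns structured output. The function names, the result fields, and the stub strategies are assumptions for illustration; the real fetcher's interface is not shown in this deck.

```python
def is_pdf(body: bytes) -> bool:
    """Detect PDFs by their leading magic bytes rather than by URL or headers."""
    return body.startswith(b"%PDF-")


def fetch_with_fallbacks(url, strategies):
    """Try each (name, fetch) pair in order; return a structured result dict."""
    errors = []
    for name, fetch in strategies:
        try:
            body = fetch(url)
        except Exception as exc:
            # Record why this strategy failed and move on to the next one.
            errors.append({"strategy": name, "error": str(exc)})
            continue
        return {"url": url, "via": name,
                "kind": "pdf" if is_pdf(body) else "html",
                "bytes": len(body), "errors": errors}
    return {"url": url, "via": None, "kind": None, "bytes": 0, "errors": errors}


# Stub strategies standing in for the origin request and an archive fallback.
def origin(url):
    raise PermissionError("403 from origin")


def archive(url):
    return b"%PDF-1.7 stub payload"


result = fetch_with_fallbacks(
    "https://example.com/filing.pdf",
    [("origin", origin), ("archive", archive)],
)
```

The caller gets one dict back either way, including which path succeeded and why the earlier ones did not.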
Putting the agent in a well
An LLM is a strong but easily-distracted executor. Surround it with deterministic code that makes its mistakes legible and its corrections cheap.
The chapter gate is the agent's CI
flowchart LR
Draft[chapter draft] --> Check[chapter gate]
Check --> Fail{pass?}
Fail -->|no| Out[structured output:<br/>failures · warnings ·<br/>cascade-suppressed ·<br/>retry order]
Out --> Fix[agent fixes top<br/>of retry order]
Fix --> Draft
Fail -->|yes| Next[next chapter]
- The validator returns structured entries, not prose.
- The agent switches on a stable enum, never NLP-parses a message.
- Failures are sorted upstream-first so fixes don't re-break what just resolved.
Four levers turn the loop into CI
- Stable failure dimensions — ~50 enums like tableShape or claimRefMissing. No NLP-parsing.
- One-line fixes per dimension, back-filled with specifics: 'add 2 more sources, one primary-tier'.
- Cascade suppression: mark root causes; hide derivative failures until the root cause is fixed, then retest them automatically.
- Retry precedence: upstream causes first, then corroboration, depth, and references.
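Cascade suppression and retry precedence can be sketched together. The stage order follows the last bullet above; the mapping from dimension to stage, the `CAUSED_BY` table, and the `sourceCount` dimension are hypothetical, and this simplified version suppresses by dimension rather than per object.

```python
# Stage order taken from the slide: upstream, corroboration, depth, references.
PRECEDENCE = ["upstream", "corroboration", "depth", "references"]

# Hypothetical mappings; the real harness has roughly 50 dimensions.
STAGE = {
    "tableShape": "upstream",
    "sourceCount": "corroboration",
    "claimRefMissing": "references",
}
CAUSED_BY = {"claimRefMissing": "tableShape"}  # assumed root-cause relation


def plan_retry(failures):
    """Hide failures whose root cause is also present; sort the rest upstream-first."""
    present = {f["dimension"] for f in failures}
    visible, suppressed = [], []
    for f in failures:
        root = CAUSED_BY.get(f["dimension"])
        (suppressed if root in present else visible).append(f)
    visible.sort(key=lambda f: PRECEDENCE.index(STAGE[f["dimension"]]))
    return {"retryOrder": visible, "cascadeSuppressed": suppressed}


plan = plan_retry([
    {"dimension": "claimRefMissing", "object": "T102"},
    {"dimension": "sourceCount", "object": "chapter"},
    {"dimension": "tableShape", "object": "T102"},
])
```

The agent is told to fix `tableShape` first; the `claimRefMissing` entry stays hidden until the table is valid and the check reruns.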
Smaller, high-leverage additions
- Object-level aggregation: 'table T102 has 2 problems' beats two free-floating entries.
- Global hints when a dimension fires on three or more objects — fix the pattern, not the symptom.
- Acknowledged warnings: dismiss with a written justification instead of inventing content.
- Multiple output formats: text for humans, grep-friendly for shells, JSON for programs.
- Fail-safe retry budget: each retry must strictly reduce the failure count or stop.
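Three of these additions fit in a short sketch: grouping failures by object, raising a global hint when a dimension hits three or more objects, and stopping when a retry does not strictly reduce the failure count. All function names and the sample IDs other than `T102` are assumptions.

```python
from collections import defaultdict


def aggregate_by_object(failures):
    """Group dimensions under the object they belong to."""
    grouped = defaultdict(list)
    for f in failures:
        grouped[f["object"]].append(f["dimension"])
    return dict(grouped)


def global_hints(failures, threshold=3):
    """Return dimensions that fire on at least `threshold` distinct objects."""
    objects_per_dim = defaultdict(set)
    for f in failures:
        objects_per_dim[f["dimension"]].add(f["object"])
    return sorted(d for d, objs in objects_per_dim.items() if len(objs) >= threshold)


def should_retry(history):
    """history holds the failure count after each attempt, oldest first."""
    if not history or history[-1] == 0:
        return False  # nothing attempted yet, or the gate already passes
    # Continue only if the latest attempt strictly reduced the failure count.
    return len(history) == 1 or history[-1] < history[-2]


sample = [
    {"object": "T102", "dimension": "tableShape"},
    {"object": "T102", "dimension": "claimRefMissing"},
    {"object": "T103", "dimension": "tableShape"},
    {"object": "T104", "dimension": "tableShape"},
]
by_object = aggregate_by_object(sample)
hints = global_hints(sample)
```

With this sample, `T102` reports two problems under one heading and `tableShape` is flagged as a pattern worth fixing once.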
The website is intentionally boring
Static HTML, no database, no SSR, no runtime LLM. The semantic content is decided at build time.
Renderer and validator share their contracts
flowchart LR
YAML[YAML reports] --> Loader[Astro content loader]
Loader --> Pages[run pages · sectors ·<br/>top-rated · search]
Pages --> Renderers[diligence renderer ·<br/>figure renderer ·<br/>table renderer]
Contracts[shared figure<br/>contracts] --> Renderers
Contracts --> Validator[chapter gate]
Renderers --> HTML[static HTML +<br/>print stylesheet]
- The renderer contract file is also imported by the validator — they cannot drift.
- Inline claim refs share their regex with the validator; one toggle hides every reference.
- Native window.print() plus CSS Paged Media gives clean A4 PDFs without Puppeteer.
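The shared-regex idea is language-agnostic, so here is a small Python illustration rather than the site's actual Astro code. The ref syntax `[C001]`, the `<sup>` rendering, and the function names are assumptions; the point is only that one compiled pattern serves both the validator and the renderer.

```python
import re

# One pattern, imported by both sides (hypothetical inline ref syntax).
CLAIM_REF = re.compile(r"\[(C\d{3})\]")


def refs_in(body):
    """Validator side: list every claim ID referenced inline."""
    return CLAIM_REF.findall(body)


def render(body, show_refs=True):
    """Renderer side: show refs as superscripts, or hide them all with one toggle."""
    if show_refs:
        return CLAIM_REF.sub(lambda m: f"<sup>{m.group(1)}</sup>", body)
    stripped = CLAIM_REF.sub("", body)
    # Remove the space left behind before punctuation.
    return re.sub(r"\s+([.,;:])", r"\1", stripped)


body = "Revenue grew 12% [C001]. Margins fell [C002]."
with_refs = render(body)
without_refs = render(body, show_refs=False)
```

If the ref grammar changes, both functions change with it, because neither owns its own copy of the pattern.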
Three sentences to take away
- YAML is the world model, not a rendering format. Every bug has a single home.
- Workflow is configuration, not prompt. The agent reads a projection of the rules, never a second copy that could drift.
- The validator is the agent's feedback channel: stable dimensions, one-line fixes, cascade suppression, retry precedence.
- 9,500 lines of harness, 46 reports, 14,400 cited claims. Invest in the well; the agent does the digging.