Evaluation mode · Experimental
M3-AC — All-cards mode
The model receives every card in a document, concatenated into one prompt. Tests retrieval without an oracle.
In one paragraph
All 35 NOAA cards (or all 186 V27 cards) are concatenated and served to the model in one prompt. This tests whether models can locate the relevant card and answer from a document-scale card bundle without pre-selection. The NOAA bundle fits in a frontier-tier context window; the V27/V35 bundles overflow most context windows.
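The fit-or-overflow distinction above can be sketched as a simple budget check. This is a minimal illustration, not the harness's own logic: the function names, the reserve parameter, and the 4-characters-per-token heuristic are all assumptions, and a real tokenizer would give exact counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate (assumed heuristic, not a real tokenizer)."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(bundle: str, context_tokens: int, reserve_tokens: int = 0,
                 chars_per_token: float = 4.0) -> bool:
    """True if the bundle plus a reserve for question and answer fits the window."""
    needed = estimate_tokens(bundle, chars_per_token) + reserve_tokens
    return needed <= context_tokens


if __name__ == "__main__":
    bundle = "x" * 400  # stand-in for a concatenated card bundle
    print(fits_context(bundle, context_tokens=128, reserve_tokens=20))
    print(fits_context(bundle, context_tokens=110, reserve_tokens=20))
```

A check like this could be run per document before inference, so that cells whose bundle overflows the window are skipped or flagged rather than silently truncated.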
How the inputs are generated
Generation · 01
Generator script
evaluation_runs/harness/core.py:load_all_cards
Input sources
- All cards in a document (active variant)
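As a rough picture of the bundling step, the sketch below concatenates every card of one document into a single string. It is not the code of load_all_cards: the Card fields, the delimiter line, and the sort by card identifier are hypothetical choices made for illustration, and the real harness may keep document order or use a different separator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Card:
    card_id: str  # hypothetical identifier field
    text: str     # hypothetical card body


def bundle_cards(cards: list[Card]) -> str:
    """Concatenate all cards of one document into a single prompt string."""
    # Sorting makes the output independent of the order cards were loaded in.
    ordered = sorted(cards, key=lambda c: c.card_id)
    parts = [f"=== CARD {c.card_id} ===\n{c.text.strip()}" for c in ordered]
    return "\n\n".join(parts) + "\n"
```

The explicit delimiter line gives the model a visible boundary between cards, which matters here because no retrieval step has narrowed the bundle down.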
AI use
No — pure deterministic transformation
OCR / re-OCR
Inherits from the upstream pipeline variant
Approximate processing time
Negligible bundling time; model inference: ~10-30 seconds per cell on bundles that fit context.
Resource intensity
Medium — model inference or moderate I/O
Determinism
Deterministic (same input → same output, byte-identical)
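One way to check the byte-identical claim for the bundling step is to build the bundle several times and compare digests. This is a sketch under assumptions: bundle_digest and is_byte_identical are hypothetical helper names, and the build callable stands in for whatever produces the bundle. It says nothing about model inference, which may not be deterministic.

```python
import hashlib
from typing import Callable


def bundle_digest(bundle: str) -> str:
    """SHA-256 hex digest of the UTF-8 encoded bundle."""
    return hashlib.sha256(bundle.encode("utf-8")).hexdigest()


def is_byte_identical(build: Callable[[], str], runs: int = 3) -> bool:
    """Build the bundle `runs` times and report whether every digest matches."""
    digests = {bundle_digest(build()) for _ in range(runs)}
    return len(digests) == 1
```

Storing the digest alongside each evaluation run would also make it possible to confirm later that two runs saw exactly the same input.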
Introduced
Cycle 4, 2026-05-21.
Related variants
Cross-reference · 06
- Evaluation mode
  M3-L4 — Oracle retrieval mode
  Model receives exactly one pre-selected card per question. Isolates 'can the model answer given perfect retrieval?'
- Document-level map structure
  Document table-of-contents map
  One card per document listing every detected table with caption, page, and dimensions. Enables two-shot retrieval.