Document-level map structure · Experimental
Document table-of-contents map
One card per document listing every detected table with caption, page, and dimensions. Enables two-shot retrieval.
In one paragraph
One map per document. Each map lists every detected table with its ID, caption, PDF page, document page label, rows × columns, and table type. Maps are small (about 3 KB for NOAA, 17 to 18 KB for V27 and V35), so they fit the smallest open-model context. The map is designed for two-shot retrieval: the model reads the index, decides which table to request, then receives that table's card on a second call.
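The two-shot flow described above can be sketched as follows. This is a minimal illustration, not the harness code: `two_shot_answer`, the `ask_model` callable, the prompt wording, and the `cards` dictionary are all hypothetical names introduced here.

```python
from typing import Callable, Dict, Optional, Tuple


def two_shot_answer(
    question: str,
    index_map: str,
    cards: Dict[str, str],
    ask_model: Callable[[str], str],
) -> Tuple[str, Optional[str]]:
    """Run one two-shot retrieval round.

    cards maps a table ID to its card text; ask_model is any
    prompt -> reply callable (hypothetical, supplied by the caller).
    """
    # Shot 1: the model sees only the document index and names one table.
    pick_prompt = f"{index_map}\n\nQuestion: {question}\nReply with one table ID."
    table_id = ask_model(pick_prompt).strip()

    # If the model names a table that does not exist, there is no second shot.
    card = cards.get(table_id)
    if card is None:
        return table_id, None

    # Shot 2: the model receives the requested card and answers from it.
    answer = ask_model(f"{card}\n\nQuestion: {question}")
    return table_id, answer
```

Returning the picked table ID alongside the answer lets a caller score retrieval and reading separately.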
How the inputs are generated
Generation · 01
Generator script
evaluation_runs/generate_map_variants.py:render_doc_index
Input sources
- All pipeline-v0.6.1 cards in a document
- Per-card frontmatter (caption, page, dims)
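As a rough sketch of how per-card frontmatter could be aggregated into one index, assuming each card's frontmatter is already parsed into a dictionary: the function name `render_doc_index_sketch`, the field names, and the Markdown table layout are assumptions made for illustration and may not match what `render_doc_index` actually does.

```python
from typing import Dict, List


def render_doc_index_sketch(doc_name: str, cards: List[Dict]) -> str:
    """Render one index map from per-card frontmatter dictionaries.

    Field names (table_id, caption, pdf_page, page_label, rows, cols,
    table_type) are hypothetical.
    """
    lines = [
        f"# Table index: {doc_name}",
        "",
        "| id | caption | pdf page | page label | rows × cols | type |",
        "|---|---|---|---|---|---|",
    ]
    # Sort explicitly so the output does not depend on input order.
    for c in sorted(cards, key=lambda c: (c["pdf_page"], c["table_id"])):
        # Fall back to the placeholder text when the caption is missing or empty.
        caption = c.get("caption") or "No caption detected."
        lines.append(
            f"| {c['table_id']} | {caption} | {c['pdf_page']} | "
            f"{c['page_label']} | {c['rows']} × {c['cols']} | {c['table_type']} |"
        )
    return "\n".join(lines) + "\n"
```

No model call is involved, so the same set of cards always yields the same text.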
AI use
No — pure deterministic transformation
OCR / re-OCR
Inherits from the upstream pipeline variant
Approximate processing time
<1 second for all 3 documents (407 cards aggregated into 3 maps).
Resource intensity
Low: CPU-only post-processing that finishes in under a second
Determinism
Deterministic (same input → same output, byte-identical)
Output location
card_sets/pipeline-v0.7-doc-index/
Cards produced
3 maps (one per document)
Introduced
v0.7 map structures, 2026-05-22. Evaluated via M3-IDX mode, 2026-05-23.
Evaluation results
Diagnostic · 02
Best open-model score
5/13 (Qwen2.5-7B and Granite-3.3-8B, M3-IDX two-shot mode)
Avg open-tier pass rate
~25% across the 7 open models (vs ~55% under M3-L4 oracle)
Typical card size
V27: 17 KB index map | V35: 18 KB | NOAA: 3 KB
Evaluation cycle
Cycle 31
Relative to v0.6.1 baseline
Evaluated in cycle 31 via the new M3-IDX two-shot retrieval harness mode. Result: every model loses 2-6 cells relative to the cycle 17 M3-L4 oracle ceiling. The strongest open models (Qwen-7B, Granite-8B) lose the most (-6 each), because their open-tier breakthrough came from the oracle removing the retrieval step. Three structural failure modes surfaced:
- Caption-quality bottleneck: Q-NAT-INT-001 had 0% retrieval because all V27 phosphorus-table captions read 'No caption detected.'
- Pipeline mis-label propagation: on Q-NAT-012, 5/8 models picked `table_124` instead of `table_125` because the pipeline mis-labels nauplii data as 'cyprid'.
- Negative-control retrieval immunity: Q-NOAA-NEG-001 had 0% retrieval but 88% pass because no table has 1948 data.
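The gap between retrieval accuracy and verdict pass rate is easier to see when the two are computed separately. The sketch below is illustrative only: `score_question`, the tuple layout, and the picked table IDs are hypothetical, and treating a negative control as having no expected table is an assumption about the harness. The counts are chosen to mirror the Q-NOAA-NEG-001 shape, with 7 of 8 passes standing in for the reported 88%.

```python
from typing import List, Optional, Tuple

# One run per model: (picked_table_id, expected_table_id, verdict_pass)
Run = Tuple[str, Optional[str], bool]


def score_question(runs: List[Run]) -> Tuple[float, float]:
    """Return (retrieval accuracy, verdict pass rate) for one question."""
    n = len(runs)
    retrieval = sum(picked == expected for picked, expected, _ in runs) / n
    passed = sum(verdict for _, _, verdict in runs) / n
    return retrieval, passed


# Hypothetical negative-control runs: no table holds the answer, so no pick
# can match (expected is None), yet most models still refuse correctly.
negative_control = [(f"table_{i:03d}", None, i != 0) for i in range(8)]
retrieval, passed = score_question(negative_control)
print(f"retrieval={retrieval:.0%} pass={passed:.1%}")  # retrieval=0% pass=87.5%
```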
Caveats and known limitations
Scope · 05
- Many V27/V35 entries read `'No caption detected.'` because Docling couldn't OCR a caption for that table, so the model has to guess from the `rows × cols` and `type` columns. This is the dominant bottleneck that cycle 31 exposed.
- Pipeline mis-labels (a wrong caption assigned to a table) cause retrieval failures across every model. Q-NAT-012's 0% retrieval is entirely due to the table_124 caption saying 'cyprid' when the data is actually nauplii, the same mis-label first caught in cycle 2.
- Negative-control questions are retrieval-immune: when no table has the answer, picking the wrong table still produces a correct refusal. Q-NOAA-NEG-001 had 0% correct retrieval but an 88% verdict pass rate.
- Recommended next variant: an enriched doc-index with 2-3 sample rows, column headers, and scope info per table entry. Index size grows ~3× but stays well under any context limit.
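One way the recommended enrichment could look, as a minimal sketch: it assumes each table's data is available as CSV text whose first row holds the column headers. `enrich_entry` and the `columns` and `sample_rows` keys are hypothetical names, and scope detection is left out because the source does not describe how it would work.

```python
import csv
import io
from typing import Dict


def enrich_entry(entry: Dict, table_csv: str, sample_rows: int = 3) -> Dict:
    """Attach column headers and a few sample rows to one index entry.

    Assumes the first CSV row is the header row (an assumption, not a
    documented property of the cards).
    """
    rows = list(csv.reader(io.StringIO(table_csv)))
    header = rows[0] if rows else []
    body = rows[1:]
    # Return a new dict so the original index entry is left unchanged.
    return {**entry, "columns": header, "sample_rows": body[:sample_rows]}


# Usage with a made-up table: the caption is missing, but the headers and
# sample rows still tell the model what the table contains.
entry = {"table_id": "table_001", "caption": "No caption detected."}
enriched = enrich_entry(entry, "stage,count\nA,1\nB,2\nC,3\nD,4\n")
```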
Related variants
Cross-reference · 06
- Document-level map structure · Multi-page continuation map: Per-document map of which tables continue onto the next PDF page.
- Per-table card variant · CSV-only card: Table data rendered as raw CSV inside a Markdown code block. The most effective open-tier variant.
- Evaluation mode · M3-IDX, two-shot retrieval mode: Model picks a table from a per-document index, then receives that table. Tests retrieval and reading together; isolates the cost of removing the oracle.
- Document-level map structure · Enriched document-index map: Per-document TOC with column headers, sample row labels, and auto-detected scope. Designed to close the retrieval gap on tables with missing or uninformative captions.