Document-level map structure · Experimental

Document table-of-contents map

One card per document listing every detected table with caption, page, and dimensions. Enables two-shot retrieval.

In one paragraph

Single map per document. Lists every detected table: ID, caption, PDF page, doc page label, rows × columns, table type. Size ranges from ~3 KB (NOAA) to ~18 KB (V35) per document, small enough to fit the context windows of the open models evaluated. Designed for two-shot retrieval: the model reads the index and decides which table to request, then receives that specific card on a second call.
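A minimal sketch of the two-shot flow, assuming a generic `ask_model(prompt) -> str` callable. The function name, prompt wording, and the empty-card fallback for unknown IDs are hypothetical illustrations, not the M3-IDX harness code.

```python
from typing import Callable, Dict, Tuple

# Hypothetical sketch of the two-shot flow; prompt wording, function names,
# and the ask_model interface are assumptions, not the M3-IDX harness code.
AskModel = Callable[[str], str]


def two_shot_answer(
    question: str,
    doc_index: str,
    cards: Dict[str, str],
    ask_model: AskModel,
) -> Tuple[str, str]:
    """Return (picked_table_id, final_answer) for one question."""
    # Shot 1: the model sees only the compact index and names one table ID.
    pick_prompt = (
        f"{doc_index}\n\nQuestion: {question}\n"
        "Reply with only the ID of the table most likely to contain the answer."
    )
    table_id = ask_model(pick_prompt).strip()

    # Shot 2: the requested card is sent back. An unknown ID yields an empty
    # card, so the model can still answer "not found" (a refusal).
    card = cards.get(table_id, "")
    answer_prompt = (
        f"{card}\n\nQuestion: {question}\n"
        "Answer using only this table, or say the answer is not in it."
    )
    return table_id, ask_model(answer_prompt)
```

Returning the picked ID alongside the answer keeps retrieval scorable on its own, separate from whether the final answer passes.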

How the inputs are generated

Generation · 01

Generator script

evaluation_runs/generate_map_variants.py:render_doc_index

Input sources

• All pipeline-v0.6.1 cards in a document
• Per-card frontmatter (caption, page, dims)

AI use

No — pure deterministic transformation

OCR / re-OCR

Inherits from the upstream pipeline variant

Approximate processing time

<1 second for all 3 documents (407 cards aggregated into 3 maps).

Resource intensity

Low: CPU-only post-processing that finishes in under a second

Determinism

Deterministic (same input → same output, byte-identical)

Output location

card_sets/pipeline-v0.7-doc-index/

Cards produced

3 maps (one per document)

Introduced

v0.7 map structures, 2026-05-22. Evaluated via M3-IDX mode, 2026-05-23.
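The aggregation step is simple enough to sketch. The following is a minimal illustration, not the actual `render_doc_index`: it assumes each card's frontmatter has already been parsed into a dict with hypothetical keys `id`, `caption`, `pdf_page`, `page_label`, `rows`, `cols`, and `type`. Sorting on stable keys and fixing the encoding is one way to get the byte-identical output described above.

```python
import hashlib
from typing import Dict, List

# Hypothetical frontmatter keys; the real pipeline-v0.6.1 card schema may differ.
HEADER = ["ID", "Caption", "PDF page", "Doc page", "Rows × cols", "Type"]


def render_index_sketch(doc_name: str, cards: List[Dict]) -> str:
    """Render one index map (a Markdown table) from per-card frontmatter dicts.

    Illustrative only: not the actual render_doc_index implementation.
    """
    # Sort on stable keys so the output never depends on file-listing order.
    ordered = sorted(cards, key=lambda c: (c["pdf_page"], c["id"]))
    lines = [
        f"# Table index: {doc_name}",
        "",
        "| " + " | ".join(HEADER) + " |",
        "|" + "---|" * len(HEADER),
    ]
    for c in ordered:
        # Assumed fallback text, mirroring what missing captions look like in the maps.
        caption = (c.get("caption") or "No caption detected.").replace("|", "\\|")
        cells = [
            c["id"],
            caption,
            str(c["pdf_page"]),
            str(c.get("page_label", "")),
            f"{c['rows']} × {c['cols']}",
            c.get("type", ""),
        ]
        lines.append("| " + " | ".join(cells) + " |")
    # Fixed trailing newline and UTF-8 encoding: equal inputs give equal bytes.
    return "\n".join(lines) + "\n"


def digest(text: str) -> str:
    """SHA-256 of the UTF-8 bytes, for checking byte-identical re-runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

Rendering the same cards twice, in any input order, should produce the same digest.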

Evaluation results

Diagnostic · 02

Best open-model score

5/13 (Qwen2.5-7B and Granite-3.3-8B, M3-IDX two-shot mode)

Avg open-tier pass rate

~25% across the 7 open models (vs ~55% under M3-L4 oracle)

Typical card size

V27: 17 KB | V35: 18 KB | NOAA: 3 KB (one index map per document)

Evaluation cycle

Cycle 31

Relative to v0.6.1 baseline

Evaluated in cycle 31 via the new M3-IDX two-shot retrieval mode of the harness. Every model loses 2-6 cells relative to the cycle 17 M3-L4 oracle ceiling. The strongest open models (Qwen-7B, Granite-8B) lose the most (-6 each) because their open-tier breakthrough came from the oracle removing the retrieval step, which M3-IDX puts back on the model. Three structural failure modes surfaced:

• Caption-quality bottleneck: Q-NAT-INT-001 had 0% correct retrieval because every V27 phosphorus-table caption reads 'No caption detected.'
• Pipeline mis-label propagation: on Q-NAT-012, 5/8 models picked `table_124` instead of `table_125` because the pipeline mis-labels nauplii data as 'cyprid'.
• Negative-control retrieval immunity: Q-NOAA-NEG-001 had 0% correct retrieval but an 88% pass rate, because no table contains 1948 data and a wrong pick still leads to a correct refusal.
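The gap between retrieval rate and pass rate can be made concrete with a small scoring sketch. The `Run` record and `question_rates` function are hypothetical, assuming each run logs the picked table ID and the grader's verdict; a negative control is modeled as having no gold table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    """One model's attempt at one question (hypothetical record layout)."""
    model: str
    picked_table: str
    passed: bool  # final verdict from the grader


def question_rates(runs: List[Run], gold_table: Optional[str]) -> Tuple[float, float]:
    """Return (retrieval_rate, pass_rate) for one question across models.

    gold_table is None for a negative control: no table holds the answer,
    so no pick can count as correct retrieval, yet a refusal can still pass.
    """
    if not runs:
        raise ValueError("no runs to score")
    retrieved = sum(
        1 for r in runs if gold_table is not None and r.picked_table == gold_table
    )
    passed = sum(1 for r in runs if r.passed)
    return retrieved / len(runs), passed / len(runs)
```

As an illustration only: eight runs on a negative control in which seven models refuse correctly score (0.0, 0.875), the same 0%-retrieval, ~88%-pass shape reported for Q-NOAA-NEG-001.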

Caveats and known limitations

Scope · 05

• Many V27/V35 entries read `'No caption detected.'` because Docling did not extract a caption for that table, so the model has to guess the table's content from the `rows × cols` and `type` columns alone. This is the dominant bottleneck exposed in cycle 31.
• Pipeline mis-labels (a wrong caption assigned to a table) cause retrieval failures for every model. Q-NAT-012's 0% retrieval is entirely due to the `table_124` caption saying 'cyprid' when the data is actually nauplii, the same mis-label first caught in cycle 2.
• Negative-control questions are retrieval-immune: when no table has the answer, picking the wrong table still produces a correct refusal. Q-NOAA-NEG-001 had 0% correct retrieval but an 88% verdict pass rate.
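• Recommended next variant: an enriched doc-index that adds column headers, 2-3 sample rows, and scope information to each table entry. The index grows roughly 3× (about 54 KB for the current 18 KB V35 map), which is still modest but should be checked against the smallest open-model context window.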
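The recommended enriched entry might be rendered along these lines. This is a sketch under assumptions: the card keys `headers`, `data`, and `scope` are hypothetical, and the layout is illustrative rather than a specified format. The point is that column headers and a few sample rows give the model something to match on even when the caption is missing or wrong.

```python
from typing import Any, Dict


def render_enriched_entry(card: Dict[str, Any], max_samples: int = 3) -> str:
    """One hypothetical enriched index entry.

    Assumed card keys: id, caption, pdf_page, rows, cols, type, headers (list of
    column names), data (list of row lists), and optional scope.
    """
    caption = card.get("caption") or "No caption detected."
    lines = [
        f"## {card['id']} (PDF p. {card['pdf_page']}, "
        f"{card['rows']} × {card['cols']}, {card['type']})",
        f"Caption: {caption}",
        "Columns: " + " | ".join(card["headers"]),
    ]
    # A few sample rows let the model tell tables apart even without a caption.
    for row in card["data"][:max_samples]:
        lines.append("Sample: " + " | ".join(str(v) for v in row))
    if card.get("scope"):
        lines.append(f"Scope: {card['scope']}")
    return "\n".join(lines)
```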

Related variants

Cross-reference · 06
