M3-HYDE — Vector retrieval (HyDE) mode
Vector similarity over pre-embedded cards. A small independent model writes a hypothetical answer; that answer is embedded and matched against the corpus. Works for any model size — no TOC navigation required.
Three-stage retrieval flow. (1) A small independent generator model (default qwen2.5:3b) writes a 1-3 sentence hypothetical answer in the style of a real archival answer, naming the value, row, and column. The generator does not know the real answer; the hypothetical serves purely as a search query. (2) The hypothetical answer is embedded with `nomic-embed-text` (open-weight, 768-dim, ~270 MB, runs locally via Ollama). (3) The embedding is compared by cosine similarity against the pre-built card index (`evaluation_runs/hyde/card_index.jsonl`, 407 entries, ~6.5 MB), and the top-1 card is served to the evaluation model under the existing M3-L4 prompt. The evaluation model never sees the hypothetical answer; only the retrieved card reaches it. Key advantage over M3-IDX two-shot: a 3B local model can handle retrieval for an evaluator of any size, including ClimateGPT-13B, whose 4K context can't hold the V27/V35 indexes.
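The three stages above can be sketched roughly as follows. This is a minimal illustration, not the code in `retrieve.py`: the function names, the prompt wording, and the `card_id` / `embedding` field names are assumptions made for the example, and the Ollama endpoint is assumed to be the default local one. The generator and embedder are passed in as callables so the ranking step can be exercised without a running model.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumption: default local Ollama endpoint


def _post(path, payload):
    """POST JSON to the local Ollama server and decode the JSON reply."""
    req = urllib.request.Request(
        OLLAMA_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def generate_hypothetical(question, model="qwen2.5:3b"):
    """Stage 1: ask the small generator for a short hypothetical answer."""
    # Hypothetical prompt wording; the project's real prompt is not shown here.
    prompt = (
        "Write a 1-3 sentence hypothetical answer to the question below, "
        "naming the value, row, and column.\n\nQuestion: " + question
    )
    reply = _post("/api/generate", {"model": model, "prompt": prompt, "stream": False})
    return reply["response"]


def embed(text, model="nomic-embed-text"):
    """Stage 2: embed text with the local embedding model."""
    return _post("/api/embeddings", {"model": model, "prompt": text})["embedding"]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top1(question, index, generate=generate_hypothetical, embed_fn=embed):
    """Stage 3: rank cards by cosine similarity and return the best card id.

    `index` is a list of dicts with assumed keys "card_id" and "embedding".
    """
    hypothetical = generate(question)   # used only as a search query
    query_vec = embed_fn(hypothetical)  # same embedder as the card side
    best = max(index, key=lambda card: cosine(query_vec, card["embedding"]))
    return best["card_id"]
```

Only the returned card id leaves this function, which mirrors the rule that the evaluation model sees the retrieved card and never the hypothetical answer.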
How the inputs are generated
Generation · 01
evaluation_runs/cycle_runner.py:run_cell_hyde + evaluation_runs/hyde/retrieve.py:retrieve
- Pre-built card embedding index at evaluation_runs/hyde/card_index.jsonl (built once via build_card_embeddings.py)
- Per-query hypothetical answer generated by qwen2.5:3b (HyDE generator)
- nomic-embed-text (768-dim) for both card-side and query-side embedding
Related variants
Cross-reference · 06
- Evaluation mode: M3-IDX — Two-shot retrieval mode. Model picks a table from a per-document index, then receives that table. Tests retrieval + reading together; isolates the cost of removing the oracle.
- Evaluation mode: M3-L4 — Oracle retrieval mode. Model receives exactly one pre-selected card per question. Isolates 'can the model answer given perfect retrieval?'
- Document-level map structure: Document table-of-contents map. One card per document listing every detected table with caption, page, and dimensions. Enables two-shot retrieval.