textbook grounding for course generation by shrey-Bish · Pull Request #13 · DaRL-GenAI/instructional_agents

shrey-Bish · 2026-05-19T23:24:12Z

Adds an optional --use-textbook <pdf> mode. Instead of writing slides from the model's memory, it pulls relevant passages from a real textbook and writes each slide from that evidence. Without the flag nothing changes — output is identical to the original pipeline.

What it does

Reads the PDF into chapters and sections, turns equation images into real LaTeX, and indexes everything for keyword + embedding search.
For each course chapter, works out which textbook sections it should draw from — and if nothing fits well, skips grounding for that chapter instead of making things up.
Gives the slide writer the matched passages and tells it to teach in its own words from them, keep worked examples and math, and not invent claims.
After each chapter, a checker reads the slides back against the evidence and logs how many claims were actually supported.

What you get

Decks that trace to the book, using the textbook's own figures and captions.
A simple Grounding Fidelity % in evaluate.py --rigorous (supported claims ÷ total) that's easy to compare from run to run.

Notes

Early drafts inserted inline citation tokens in the slides; we dropped those — we don't grade attribution, so the deck stays clean and grounding is checked by the fidelity score plus a page-by-page render review instead.
All grounding code is gated behind the flag, so the default (no-textbook) pipeline is untouched. ~750 tests cover it.

toc.py + ingest_md.py + 38 tests + labeled fixture; markdown-it-py dep.

PyMuPDF-backed ingester that parses textbook PDFs into the shared textbook IR (Chapter / Section / Paragraph). Handles both whole-book PDFs and one-chapter-per-file directories. - font-size-gated heading detection with pattern + size tiers - split-number/title and wrapped-title heading merge passes - back-matter suppression and size-aware header/footer filtering - parser_quality scoring - shared chapter-builder reused from the markdown path (ingest_md now propagates real page numbers instead of hardcoding 0) Tests: labeled mini-PDF fixture plus skip-if-absent smoke tests against real eval PDFs.

When the new --use-textbook PATH flag is set, the system loads a PDF or markdown textbook, retrieves relevant passages per chapter, injects them into the writing prompts, and emits inline citation tokens like [han_data_mining_3e:ch6.s3:p15] in slides, scripts, and assessments. A new verifier inside evaluate.py scores each citation against the source on a 1-5 faithfulness scale. Three interfaces, all opt-in: CLI flag, API field on POST /api/course/generate, and a file-picker on the Web UI. When the flag is absent every code path falls back to the existing vanilla behavior -- byte-comparable output, zero citation tokens. Measured on two real textbooks (Han Data Mining 3rd ed., Agentic Design Patterns): - Agentic: faithfulness 4.33/5, citation precision 86.7 %, attribution rubric +65 % vs vanilla on slide content - Han: faithfulness 3.87/5, precision 71 %, attribution +111 % - Overall content quality unchanged within LLM-judge noise on both - Vanilla preservation invariant holds end-to-end New code: src/grounding/ (knowledge_base, retriever, contract, reranker), src/textbook/ (schema, ingest_pdf, ingest_md), GroundingAgent in evaluate.py, --use-textbook plumbed through run.py + ADDIE + SlidesDeliberation, textbook_path field + /api/textbooks/upload + /api/textbooks/list in api_server.py, file picker in frontend. 280 tests passing.

Before this change, the syllabus stage of the course generator was textbook-blind: it guessed at course structure from the topic title alone, and the per-chapter retrieval downstream had to paper over the mismatch. This shows up dramatically on narrow textbooks where the topic-title prior doesn't match the source -- the generated syllabus references invented "Course Reader" articles instead of actual textbook chapters. The fix is small and orthogonal. When --use-textbook is set, the textbook's table of contents (chapter + section list, ~400 word budget with graceful degradation) is rendered once and prepended to every foundation deliberation prompt (instructional_goals, learner_analysis, prereq_analysis, content_sequencing, assessment_design, syllabus). The agents now see the actual source before deciding what the course is about. The syllabus's "Required Readings" block lists real textbook chapter numbers instead of inventing article titles. Vanilla path is byte-identical -- the TOC injection is gated on self.knowledge_base is not None, and Deliberation.run() defaults textbook_context=None. When no textbook is in play the prompt assembly is unchanged from before. The copilot retry path threads the same TOC through _check_for_retry so first-call and retry don't drift. Measured impact (sequential matrix, seed=42, gpt-4o-mini judge): Agentic AI faithfulness 4.33 -> 4.41 (+0.08) citation precision 86.7% -> 91.13% (+4.4 pp) retrieval_bad 4.9% -> 2.0% (cut 60%) hallucination 4.5% -> 2.0% (cut 55%) attribution lift on slide_content vs vanilla: +125% (1.33 -> 3.00) Data Mining (Han) faithfulness 3.87 -> 3.86 (flat) citation precision 71.0% -> 70.26% (flat) attribution lift on slide_content vs vanilla: +75% (1.33 -> 2.33) Han is essentially flat because Han's bottleneck isn't syllabus alignment -- it's PDF extraction quality on math-heavy chapters. The residual 18.5% retrieval_bad slice is the cross-encoder reranker's target in the follow-up. Both grounded cells stay within +/-0.3 LLM-judge noise on overall_score vs their vanilla baselines, confirming the architectural fix does not measurably degrade overall content quality. Files: src/textbook/schema.py - Textbook.toc(word_budget=400) method src/grounding/knowledge_base.py - thin TextbookKnowledgeBase.toc() pass-through src/agents.py - Deliberation.run() accepts textbook_context kwarg src/ADDIE.py - ADDIERunner._textbook_toc_context() helper, threaded through run_foundation_deliberations() and _check_for_retry() retry path Tests (16 new, 296 total passing): tests/test_textbook_toc.py tests/test_foundation_deliberation_toc_injection.py The opt-in invariant is locked by TestDeliberationOptInInvariant (passing textbook_context=None must produce a byte-identical prompt to the pre-change behavior) and TestAddieRunnerTocHelper (the helper returns None when no knowledge base is attached).

xander-Huang-ASU · 2026-06-04T01:45:34Z

The results convincingly demonstrate increased attribution coverage and textbook grounding. However, the evaluation does not yet demonstrate strong citation reliability: citation precision remains around 72–80% on larger benchmarks, suggesting that a substantial fraction of generated citations are still unsupported or incorrectly retrieved. The PR appears to solve the coverage problem more than the accuracy problem.

Bumps EMBED_MODEL from text-embedding-3-small (1536-dim) to text-embedding-3-large (3072-dim) for stronger disambiguation between semantically similar chunks. Disk cache is keyed on the model name so existing small caches don't collide with large re-embeds. Vanilla path untouched.

Adds a generic pollution filter to Textbook.format_toc_outline that drops front-matter (preface, foreword, acknowledgments), back-matter (appendices, glossary, bibliography, index), and short boilerplate chapters (< 5 paragraphs across all sections) before the TOC is injected into foundation deliberations. Title-pattern matching is case-insensitive and anchored at start so "Preface" matches but "Chapter 1: Introduction to Preprocessing" does not. No per-textbook rules. Falls back to the unfiltered chapter list if filtering would leave the TOC empty.

Two grounded-path additions to ADDIERunner. Both no-op when textbook grounding is off. 1. Cross-encoder reranker: when knowledge_base is set, construct a CrossEncoderReranker on the default ms-marco-MiniLM-L-6-v2 model and attach it to the HybridRetriever for second-stage scoring on top-K RRF candidates. Falls back to a non-reranking retriever if the optional sentence-transformers dep or model load fails. 2. Admin scaffolding: after foundation deliberations finish but before chapter extraction, append a "Course Policies" section (instructor contact, grading, attendance, accessibility, academic integrity) to the syllabus output file via a separate LLM call that reads the produced syllabus and appends. Idempotent across --resume via a .pre_admin_scaffolding.bak sentinel sibling file. Recovers rubric metrics (transparency_of_policies, accessibility) that competed for prompt budget with the grounding directive in the syllabus deliberation. New test file tests/test_addie_grounding_runtime.py covers both paths and the vanilla no-op guarantee.

GroundingAgent gains an n_samples parameter (default 1, preserves existing behavior). When n_samples > 1, each (claim, chunk) pair is scored multiple times and the aggregate is taken: median of the numeric scores, majority-vote of the failure_mode bucket, and the rationale from the sample whose score is closest to the median. New --verifier-samples N flag on evaluate.py threads the value through. Trades N× verifier API spend for a tighter judge-noise floor (the ±0.16 per-call variance is the dominant source of small per-cell metric drift across re-evals). 8 new tests under TestSelfConsistencyVoting cover the n_samples == 1 passthrough invariant, majority-vote tie-breaking, median-score selection, and fallback-sample exclusion from the vote.

Two changes to make the reranker robust against environment drift: 1. Defensive construction in ADDIERunner: wrap the CrossEncoderReranker() call in try/except. On any failure (missing optional dep, model-load error, ABI mismatch), log a warning and fall back to first-stage retrieval (BM25 + dense + RRF) with reranker=None. The rest of the grounding pipeline is unaffected. 2. Floor + major-version upper-bound pins in requirements.txt for sentence-transformers, torch, and transformers. Locks out the next major (e.g. torch 3.x) which may ship breaking ABI changes while keeping the floor at versions verified to work with Python 3.13 and the default cross-encoder model. Together: even if a future pip install pulls in an incompatible combination, the reranker degrades quietly and grounded generation keeps running.

The chapter-extraction LLM was using textbook chapter numbers from "Readings: Chapter X.Y" references as the COURSE chapter numbers, producing duplicate labels like: Chapter 1: Introduction to Data Mining (was week 1) Chapter 1: Understanding Your Data (was week 2) Chapter 2: Data Preprocessing Overview (was week 3) Chapter 2: Data Cleaning Techniques (was week 4) ... when the syllabus's actual schedule used "Week 1: ...", "Week 2: ..." with textbook readings as a separate field. Latent for a while — the prompt's example showed "title": "Chapter 1: Introduction to Machine Learning" with no instruction to preserve the syllabus's own numbering. Surfaced on grounded runs where the syllabus contains many textbook chapter references; the LLM mimics the example and inherits the chapter numbers. Fix: rewrite the prompt to (1) use a "Week 1: ..." example, (2) explicitly require the LLM to preserve the syllabus's own week/chapter numbering, (3) explicitly warn against using textbook chapter numbers from readings as course chapter numbers. 3 new regression tests in tests/test_syllabus_processor_prompt.py assert each property of the updated prompt.

shrey-Bish · 2026-06-05T05:45:06Z

The results convincingly demonstrate increased attribution coverage and textbook grounding. However, the evaluation does not yet demonstrate strong citation reliability: citation precision remains around 72–80% on larger benchmarks, suggesting that a substantial fraction of generated citations are still unsupported or incorrectly retrieved. The PR appears to solve the coverage problem more than the accuracy problem.

I do agree this PR solves coverage more than accuracy problem, as a next step I am working on that !

Pure-Python module that classifies each page of a PDF as 'prose' or 'complex' based on PyMuPDF object metadata (image count + vector drawing count). Pages flagged 'complex' are candidates for VLM-based extraction; pages flagged 'prose' use the standard text-extraction path. Routing is cheap — inspects PDF object metadata, not text content — so it can be applied to every page before any expensive extraction runs. Default thresholds (any image OR drawings > 40) are empirically generic across textbooks: produce 21.4 % complex on Han Data Mining, 13.3 % on Agentic Design Patterns. No per-source tuning. 10 unit tests cover threshold boundaries, image-triggered complex classification, custom thresholds, and aggregation helpers. First module of the hybrid extraction pipeline. Subsequent modules (VLM adapter, hybrid ingester, cross-page stitching) build on this routing layer.

New module ingest_pdf_paged.py with ingest_pdf_file_paged() and ingest_pdf_directory_paged() that use pymupdf4llm.to_markdown(..., page_chunks=True) to get one markdown chunk per source page, then build the Textbook IR with REAL per-paragraph page numbers (the synthetic word-count-based pagination used elsewhere is bypassed). PyMuPDF4LLM as the markdown workhorse cleanly handles headings, tables, and code blocks that plain-text PyMuPDF flattens. Per-page extraction prevents the chunk-coarseness regression that the prior page_chunks=False attempt produced. Falls back to the plain-text ingester when pymupdf4llm is absent or produces an unparseable result. 9 unit tests cover per-page page- number tagging, the seen_chapter cross-page state, page-span aggregation, and the fallback path. First half of the hybrid extraction stack. Subsequent module (vlm_adapter) will augment complex pages with structured figure / equation descriptions before the chapter builder runs.

Renders a PDF page to PNG, sends to GPT-4o-mini vision with OpenAI Structured Outputs schema, returns parsed components (figures with descriptions, equations as LaTeX, tables with headers/rows, algorithms with steps). Pydantic models with discriminated unions (FigureComponent, EquationComponent, TableComponent, AlgorithmComponent) map cleanly to the API's response_format requirement and validate at the API boundary. Cropped page PNGs are saved to disk when figures_dir is configured — the downstream slide generator can reference them via includegraphics so the actual figures from the source PDF appear in the final materials, not just textual descriptions. Lazy OpenAI client construction so the extractor can be built without an API key (useful for tests). Every failure mode is defensive: render errors, network errors, schema rejection — all return an empty ExtractedPage with a logged warning rather than raising. The hybrid ingester (next phase) treats empty extractions as "use PyMuPDF4LLM output only" so a VLM outage cannot break a run. Smoke-tested on Han page 476 (OPTICS terminology): VLM correctly identified Figure 10.16 with caption, generated a concrete visual description, and extracted the Euclidean distance equation as LaTeX. 14 unit tests cover schema validation, lazy client construction, PNG-save behavior, and the defensive error paths.

New module ingest_pdf_hybrid.py combines three pieces: - spatial_router classifies each page as prose or complex - ingest_pdf_paged extracts clean markdown for every page (workhorse) - vlm_adapter augments complex pages with structured figure/equation/ table/algorithm content from GPT-4o-mini vision VLM-derived components flow into the Textbook IR as additional Paragraphs (kind=figure_cap, equation, or example) with the structured information encoded inline via parseable markers ([IMAGE_PATH: ...], [LATEX: ...], [TABLE: ...], [ALGORITHM_STEPS: ...]) for the downstream slide generator to consume in the next phase. Wired into knowledge_base.from_path via an opt-in vlm_extractor kwarg, into ADDIE.__init__ via a vlm_extraction bool kwarg, and exposed on the CLI as --vlm-extraction. All paths gated on vlm_extraction=True AND textbook_path being set — vanilla behavior is byte-identical without the flag. ADDIE is defensive: if VLM construction fails the run falls back to text-only extraction with a logged warning rather than crashing. Cropped page PNGs saved to .grounding_cache/figures/<textbook_id>/ so the slide generator can include them via \includegraphics in final materials. 13 unit tests cover the block-formatting helpers for each component type, the inline-marker output format, the vanilla preservation invariant (no extractor → delegates to paged ingester), and end-to-end IR construction with a mocked VLM. Full suite: 393 passing.

When the evidence excerpts contain inline markers from the hybrid ingester's VLM augmentation ([IMAGE_PATH:], [LATEX:], [TABLE:], [ALGORITHM_STEPS:], [DESCRIPTION:], [INSIGHT:]), _build_evidence_block now appends an extra VISUAL CONTENT RULES block instructing the LLM how to consume each marker for the artifact at hand. Artifact-conditioned rules: - slide / assessment: include figures via includegraphics, render equations via display math, render tables as LaTeX tabular, render algorithms as enumerated lists. - script: narrate the figure / table verbally using the adjacent description and insight markers; do NOT read raw LaTeX aloud. Critically, when no markers are present (vanilla and v2 evidence text), _build_visual_content_rules returns an empty string and the evidence block is byte-identical to the prior behavior. Vanilla preservation invariant holds. This closes the OTHER half of the v3 contribution: without these prompt-side rules, the LLM would happily ignore the VLM-extracted figures and equations and write text-only slides as before. With them, the LLM is steered to reproduce the source figures (\includegraphics), equations (\\[ ... \\]), tables (\\begin{tabular}), and algorithms (\\begin{enumerate}) in the final materials. 12 unit tests cover: empty rule block on plain evidence, each marker triggering its corresponding rule line for slides, script artifacts getting narration-flavored rules instead of LaTeX-emission ones, multi-marker evidence surfacing all relevant rules, and end-to-end integration via _build_evidence_block with a mocked retriever. Full suite: 405 passing.

When the hybrid extraction path routes some pages through a VLM, the parsed Textbook IR depends on what the VLM returns. The VLM is not strictly deterministic across runs (OpenAI seed is best-effort, and even at temperature=0 small variations occur). Without caching, the chunks built at generation time would NOT match the chunks built at verification time — citation tokens emitted during generation would fail to resolve during eval. Two changes that together produce deterministic ingestion: 1. New src/grounding/ir_cache.py — saves the parsed Textbook IR to disk as JSON after first ingestion, loads on subsequent calls. Cache lives under .grounding_cache/ir/<textbook_id>.json. Cache invalidation is manual (delete the file to force re-ingestion). Graceful fall-through on missing / corrupt / schema-invalid cache. 2. VlmExtractor pins temperature=0 and seed=42 on every API call so the VLM output itself is as deterministic as the API allows. TextbookKnowledgeBase.from_path gains two kwargs (ir_cache_dir, use_ir_cache=True) that default to cache-enabled at <project>/.grounding_cache/. The verifier and the generator share the same cache by default → tokens emitted at gen time resolve cleanly at eval time. 10 new unit tests cover save/load round-trip, parent-dir creation, cache miss returning None, corrupt-JSON returning None, schema- invalid-JSON returning None, end-to-end from_path cache hit short-circuiting the ingester, and use_ir_cache=False bypassing the cache entirely. Full suite: 415 passing.

The hybrid ingester puts each VLM-extracted figure / equation / table / algorithm into its own Paragraph carrying inline markers ([IMAGE_PATH:, [LATEX:, [TABLE:, [ALGORITHM_STEPS:). Until now the chunker bundled those paragraphs with up to ~500 tokens of surrounding prose into a single chunk — queries about a figure or equation would rank the chunk based mostly on the prose content, leaving the visual element effectively invisible to retrieval. This commit makes the chunker recognize paragraphs carrying any of the four visual markers and emit each as its own standalone chunk (typically 30-150 tokens). Prose paragraphs continue to be packed greedily up to TARGET_TOKENS as before; the only structural change is that visual paragraphs interrupt the prose stream — a prose chunk ends at the boundary, the visual chunk fires, then a new prose chunk starts after the visual paragraph. Overlap between adjacent prose chunks no longer crosses a visual paragraph, preventing figure / equation / table content from bleeding into the next prose chunk's overlap window. For visual queries this should rank the right chunk directly instead of burying it in a 500-token prose chunk — the chief lever for closing the silent-skip gap on figure-heavy pages. 6 new unit tests cover: figure paragraph becoming its own chunk, equation paragraph becoming its own chunk, table paragraph becoming its own chunk, three consecutive visual paragraphs each getting their own chunk, pure-prose sections behaving exactly as before, and prose-chunk overlap correctly stopping at visual paragraph boundaries. Full suite: 421 passing.

The chunker emits OVERLAP_TOKENS of overlap between adjacent prose chunks so adjacent chunks share context. As a side effect, the retriever can rank two neighboring overlapping chunks both in the top-K — the LLM then sees the SAME content twice in the evidence block and occasionally cites the wrong instance, which the verifier flags as wrong_chunk_cited / loose_paraphrase. New _dedupe_results helper in src/slides.py drops later occurrences whose chunk is byte-identical to a kept earlier chunk OR whose first 40 words match a kept chunk's first 40 words (catches the common overlap case where chunk N+1 starts with the last ~64 tokens of chunk N). Applied before evidence-block formatting; preserves the retriever's rank order — first occurrence of each cluster wins. 8 new unit tests in tests/test_evidence_dedupe.py cover empty input, unique chunks all kept, byte-identical dedup, overlap-prefix dedup, mid-content overlap NOT triggering dedup (different starts must be kept), rank-order preservation, empty-text chunks handled, and the short-chunk fall-through to full-text equality. Full suite: 429 passing.

…nces, empty bullets, image sizing Visual review of the generated PPTX surfaced six writer-side or converter-side rendering bugs that look ugly on the rendered slide but were invisible to the LaTeX-only path. Each is small but visible to the reader; together they were ~3-4 cosmetic defects on most content slides. Converter (src/latex_to_pptx.py + src/build_pptx.js): * Backtick / apostrophe LaTeX quotes ``…'' and `…' now convert to straight ASCII quotes "…" and '…' in unescape_latex(). Beamer writers emit these as a paired quote convention; the PPTX path was displaying the raw delimiters. * Markdown bold / italic that the writer sometimes emits even inside .tex output is now stripped in strip_latex_formatting(): **bold** / __bold__ / *italic* → plain text. LaTeX itself would ignore the asterisks, so they leaked unchanged. * Bare $…$ math fences without a real math environment now collapse to just the inner content. The writer used $\geq 30$ as shorthand for "≥ 30"; without a math renderer the PPTX path showed "$\geq 30$" verbatim. * Empty \item entries that produce a lone bullet on the slide are now dropped. Filter checks for whitespace-only and punctuation-only items and skips them so the slide doesn't render a hanging dot. * The catch-all unknown-command stripper now also consumes optional [opt] arguments before any {arg} group. Previously \includegraphics[width=...]{path} that fell through to the catch-all left "[width=...]{path}" as visible text on the slide. * \includegraphics image rendering box now respects the slide's available height. Caps at 3.2" or the remaining vertical space minus a small buffer, whichever is smaller, so figures never bleed past the slide edge. Generation-side cleanup (src/slides.py): * Markdown **bold** that survives into the writer's final .tex output now becomes \textbf{...} during the cleanup pass. Prevents the same bug from reappearing on future runs even if downstream consumers aren't running the latest converter. * A defensive fallback regex catches UNCLOSED [DESCRIPTION:/ [INSIGHT:/[IMAGE_PATH:/etc. markers. The strict regex requires the closing ']', but writers occasionally drop it, leaving the marker (and its prose) visible. Fallback strips from the opening marker up to the next "\ or newline. Test coverage: 8 new tests for backtick quote conversion / markdown stripping / bare math fences / empty item filtering, 3 new tests for the upstream markdown-bold-in-tex fix. 65 polish-related tests total, full suite stays green. Vanilla preservation invariant is unaffected — none of these touch the writer's prompt or the grounded retrieval path.

- parser: replace non-greedy itemize/enumerate match with a depth-aware helper so the outer environment doesn't get cut at the inner \end{itemize}. Previously nested bullets leaked into the parent item as raw text and rendered as phantom blank bullets. - pptx renderer: pre-fit \includegraphics by reading the PNG header and computing aspect-preserved box dimensions, instead of relying on pptxgenjs sizing:"contain" (which LibreOffice's renderer does not always honour). Wide figures now centre cleanly inside the content column without bleeding off the slide. - tests: three new regression cases covering nested itemize, enumerate-inside-itemize, and sibling itemize blocks.

Stacking text and bullets above a figure was squeezing every \includegraphics into <3" of vertical space on a 13" slide. The cropped figures from the PyMuPDF crop pass deserved more room. - new _stackElements helper centralises image-first ordering and a trailingH estimate for each child. Every standard layout (renderStandard, renderListOnly, renderBlocks, renderCodeSlide) now goes through it. - addPicture takes a trailingH so it can reserve vertical space for bullets/captions rendered after the image; the figure's box height is L.maxY - y - buffer - trailingH, capped at 4.5".

EroNinja · 2026-06-08T00:42:03Z

Thanks for adding me as a reviewer — I pulled the branch down, read through the grounding/ingestion code, and ran the suite locally (267 passing on my machine; the rest skip without the real textbooks). Nice work overall — the ingestion/retrieval split is clean and the opt-in gating is solid. A few substantive notes:

1. PDF extraction is the main quality limiter — worth supporting EPUB

The writeup already traces Han's elevated retrieval_bad to PDF extraction on equations/tables, and the plain-text path in src/textbook/ingest_pdf.py does a lot of heuristic work (font-size tiers, margin-band header/footer stripping, split/wrapped-heading merges) precisely because PDF carries no structure. EPUB sidesteps this — it's structured XHTML with real headings and a native nav/TOC, which is exactly what the new TOC-injection step depends on.

The groundwork is mostly there: schema.py already lists epub in source_format, and an ingest_epub.py could flatten the spine XHTML into the block format and reuse ingest_md._extract_blocks + _blocks_to_chapters — the same pattern ingest_pdf_file_via_markdown already follows. Looks like a modest lift for a real quality gain on any book available as EPUB. Happy to take this on if useful.

2. Citation tokens can collide in the verifier index (may skew the headline faithfulness numbers)

In evaluate.py, GroundingAgent builds:

self._chunk_by_token = {c.citation_token(): c for c in knowledge_base.chunks}

But Chunk.citation_token() is [textbook_id:section_id:p{page_start:02d}] — section + page only, no chunk index. Two chunks in the same section that share a page_start (easy to hit with markdown's ~250-word synthetic pages, and with short sections) produce the same token, so the dict silently keeps only the last one. A citation can then be scored against a different chunk than the writer actually drew from.

Since the faithfulness/precision figures are the main result, it'd be worth confirming this isn't shifting them. Disambiguating the token (include the chunk index, or map token → list of chunks and pick the best match) would close the gap.

3. `sentence-transformers` (→ torch) is a hard dependency for a path that's off by default

requirements.txt requires sentence-transformers>=2.7.0, which pulls in torch on every install — but per the comment in ADDIE.py the reranker isn't wired into the default runtime path. Moving it (and arguably pymupdf / rank-bm25 / markdown-it-py, only needed when grounding is on) into an optional extra would keep vanilla installs light and match the opt-in framing.

None of these are blockers — solid first version. Thanks!

shrey-Bish · 2026-06-08T02:47:55Z

Thanks for adding me as a reviewer — I pulled the branch down, read through the grounding/ingestion code, and ran the suite locally (267 passing on my machine; the rest skip without the real textbooks). Nice work overall — the ingestion/retrieval split is clean and the opt-in gating is solid. A few substantive notes:

1. PDF extraction is the main quality limiter — worth supporting EPUB

The writeup already traces Han's elevated retrieval_bad to PDF extraction on equations/tables, and the plain-text path in src/textbook/ingest_pdf.py does a lot of heuristic work (font-size tiers, margin-band header/footer stripping, split/wrapped-heading merges) precisely because PDF carries no structure. EPUB sidesteps this — it's structured XHTML with real headings and a native nav/TOC, which is exactly what the new TOC-injection step depends on.

The groundwork is mostly there: schema.py already lists epub in source_format, and an ingest_epub.py could flatten the spine XHTML into the block format and reuse ingest_md._extract_blocks + _blocks_to_chapters — the same pattern ingest_pdf_file_via_markdown already follows. Looks like a modest lift for a real quality gain on any book available as EPUB. Happy to take this on if useful.

2. Citation tokens can collide in the verifier index (may skew the headline faithfulness numbers)

In evaluate.py, GroundingAgent builds:
self._chunk_by_token = {c.citation_token(): c for c in knowledge_base.chunks}
But Chunk.citation_token() is [textbook_id:section_id:p{page_start:02d}] — section + page only, no chunk index. Two chunks in the same section that share a page_start (easy to hit with markdown's ~250-word synthetic pages, and with short sections) produce the same token, so the dict silently keeps only the last one. A citation can then be scored against a different chunk than the writer actually drew from.

Since the faithfulness/precision figures are the main result, it'd be worth confirming this isn't shifting them. Disambiguating the token (include the chunk index, or map token → list of chunks and pick the best match) would close the gap.

3. sentence-transformers (→ torch) is a hard dependency for a path that's off by default

requirements.txt requires sentence-transformers>=2.7.0, which pulls in torch on every install — but per the comment in ADDIE.py the reranker isn't wired into the default runtime path. Moving it (and arguably pymupdf / rank-bm25 / markdown-it-py, only needed when grounding is on) into an optional extra would keep vanilla installs light and match the opt-in framing.

None of these are blockers — solid first version. Thanks!

Thanks for the careful read. Addressing your points :

While I agree that it is a clean lift , but we want to keep PDF ingestion efficiency as the priority.
This has been addressed in the followup work.
The dep is no longer purely off-by-default in the followup work.

The grounded path (textbook ingestion, retrieval, semantic gates, optional reranker) only fires when --use-textbook is passed to run.py or evaluate.py. Its dependencies were previously declared as base deps, so every install pulled in pymupdf, sentence-transformers, torch, transformers, rank-bm25, and markdown-it-py (~400 MB of torch + transitives) even when the user never planned to use them. - pyproject.toml: new [grounding] extras group covering all six deps. Install with: pip install -e ".[grounding]" - requirements.txt: same deps grouped under a clearly marked "Optional: grounding" section, preserved at the bottom so `pip install -r requirements.txt` keeps every supported feature working (backward compatibility). - README: install section documents the three paths (vanilla / grounding extras / kitchen-sink requirements.txt). Verified: - A simulated vanilla install (blocking all six grounding imports at module load) successfully imports run, api_server, evaluate, and src.ADDIE.ADDIE — vanilla code path doesn't touch the grounding deps. - pip installable: importlib.metadata exposes both 'vector-db' and 'grounding' as extras. - Full pytest suite still 693 passed.

The semantic gates (bi-encoder cosine filter) and the cross-encoder reranker were the only consumers of sentence-transformers + torch in the runtime path. Both now load their respective MiniLM models through fastembed, which runs the same ONNX-exported weights via onnxruntime — no torch dependency. Numerical scores are unchanged (verified on a fixture): - Semantic gate cosine on "K-means clustering partitions data" vs "K-means is a centroid-based clustering algorithm": sentence-transformers: 0.692239 fastembed: 0.692239 - Cross-encoder reranker scores on the K-means probe set: sentence-transformers: +8.0968, -11.3451, -1.3409 fastembed: +8.0968, -11.3451, -1.3409 (delta ≤ 1e-6) Why both backends agree to 6 decimals: fastembed loads from the same HuggingFace repo (Xenova hosts the ONNX export of the original cross-encoder repo), so the weights are bit-identical. The two runtimes round their fused-op outputs slightly differently but the delta stays well below the gate thresholds (0.32 / 0.30) and the ranking-only contract of the reranker. Install footprint dropped: - vanilla: unchanged at ~50 MB (grounding deps are gated behind the [grounding] extras group; no run-time imports) - grounded: ~100 MB (was ~550 MB with torch + sentence-transformers + transformers) Edits: - src/grounding/semantic_gate.py: _ensure_encoder() now imports fastembed.TextEmbedding; _embed() materialises the iterator and L2-normalises (fastembed returns raw vectors, unlike sentence-transformers' normalize_embeddings=True). - src/grounding/reranker.py: CrossEncoderReranker uses fastembed.rerank.cross_encoder.TextCrossEncoder. The model name switched from cross-encoder/ms-marco-MiniLM-L-6-v2 to Xenova/ms-marco-MiniLM-L-6-v2 (same weights, ONNX-exported). Score path went from .predict([(q,p),…]) to .rerank(q, [p,…]). - pyproject.toml + requirements.txt: [grounding] extras now lists fastembed in place of sentence-transformers + torch + transformers. - README install section updated to reflect the lighter footprint. - tests/test_semantic_gate.py: stub encoder now mirrors fastembed's .embed(iterable) generator-of-arrays contract instead of the old sentence-transformers .encode(text, normalize_embeddings=True). - tests/test_grounding_reranker.py: lazy-load assertion now checks that fastembed/onnxruntime stay out of sys.modules until .score() is called (was: torch / sentence_transformers). Verified: 693 passed, same count as before.

Code shipped upstream should not reference internal iteration labels ("v6 Lever A", "v7 Step 9", "v3 hybrid extraction") that aren't defined anywhere in the codebase or its docs. Each affected comment has been rewritten to describe what the code does without the versioning prefix; nothing about runtime behaviour changes. Affected docstrings and comments: - src/grounding/semantic_gate.py: module docstring + Gate A / Gate B inline labels. - src/grounding/reranker.py: header retained (it was already cleaned in the prior commit). - src/grounding/write_time_verifier.py: module docstring. - src/grounding/usage_tracker.py: module docstring. - src/grounding/contract.py: constant comments (SECTIONS_PER_TOPIC, SUBTOPICS_PER_CHAPTER, smart-intro detection, meta-chapter abstain), helper docstrings, build-loop comments. - src/slides.py: constructor diversity-cap / gate / verifier comments, evidence-block coverage diversification + Gate A + diversity cap + visual-chunk-inclusion comments, visual-content rules block, evidence-strip block (cleanup + Gate B + write-time verifier), cross-chapter assessment context, per-slide grounding hooks, draft picker docstring. - src/ADDIE.py: lazy-construction comments for tracker, gates, verifier. - src/textbook/ingest_pdf_paged.py: module docstring + page-flag comment. - src/textbook/vlm_adapter.py: VLM retry docstring. - evaluate.py: ambiguous-token-rescue comments + docstring. Verified: 693 passed (same count as before), no behaviour change.

Three small, backwards-compatible changes targeting the residual failure modes after the v7 baseline (retrieval_bad 3.9-6.6%, loose_paraphrase 2.8-6.4%, ambiguous-chunk picks). None add a single LLM API call to the runtime path. 1. Sentence-bounded claim window (semantic gate + write-time verifier). Both consumers previously walked backward with ``rfind()`` for ``". "`` / ``"! "`` / ``"? "`` / ``"\n"`` and took the trailing 25-30 words. That heuristic split on common abbreviations (``e.g.``, ``i.e.``, ``Fig.``, ``Eq.``, ``Mr.``, etc.) and produced truncated or mid-sentence claim windows. Both call sites then scored / verified the wrong text. New ``src/grounding/claim_window.py`` factors out a regex sentence tokeniser with an abbreviation-suppression list. Both ``SemanticGate._extract_claim_window`` and ``WriteTimeVerifier._extract_claim_window`` now delegate to it. 16 new tests pin down the new behaviour. Expected impact: tighter Gate B strip decisions and verifier prompts, which should mostly reduce ``loose_paraphrase`` (the second-largest residual failure mode on Agentic at 6.4%). 2. Cross-encoder reranker warmup at ADDIE init (src/ADDIE.py). ``CrossEncoderReranker()``'s constructor is intentionally lazy — the ONNX model is downloaded and loaded on first ``.score()`` call, not at __init__. Before this commit, that meant init-time try/except caught import errors only; an actual model-load failure would silently fall back per-query, printing the same warning hundreds of times in one run. The init now does a tiny ``score("warmup", ["warmup"])`` call inside the try block so the ONNX model is materialised once, upfront. Failures get a single clear message and the reranker is set to None for the rest of the run. 3. Stopword-filtered Jaccard in ambiguous-token rescue (evaluate.py). ``_resolve_best_chunk`` picks the best of look-alike chunks by word-overlap with the claim. The earlier filter just dropped tokens shorter than 3 characters. Common filler ("the", "of", "in") still dominated the overlap, making the score brittle for topically similar siblings. We now reuse the lightweight stopword set the retriever already maintains at ``src.grounding.retriever._STOP`` (lazy import, defensive fallback to an empty frozenset if the import path shifts). The dominant content words now drive the rescue decision. This change is eval-side only — it can't move which citations the writer produces, only which sibling chunk the judge scores them against. The largest expected effect is a small drop in ``wrong_chunk_cited`` reports on multi-chunk-spanning tokens (the prior baseline showed ~75% of tokens were ambiguous). Tests: 708 passed (was 693; 15 new tests across claim_window).

The full Han 3rd-edition PDF (740 pages) exposed a latent bug: a small number of chunks emitted by the IR build exceeded OpenAI's 8192-token per-input limit on the embedding-3 models, almost certainly long visual captions or bibliography-style runs that the paragraph-aware chunker left whole because they were already "complete paragraphs." Every retrieval call on those chunks failed with a 400, the whole batch was rejected, and the writer silently fell back to vanilla prompts. The run then kept retrying for hours (~$8-10 of failed embedding API spend) while emitting zero citations. Three defensive layers, none of which trade off precision, recall, or information completeness: * Layer 1 — Sentence-aware chunk splitting at ingest. New `MAX_CHUNK_CHARS = 24000` (≈6000 tokens; ~25% headroom under the 8192 ceiling). `_split_chunk_if_oversized` in knowledge_base.py walks each emitted chunk; if the text exceeds the ceiling, it splits on real sentence boundaries (reusing the regex + abbreviation suppression list from `claim_window`) into sub-chunks that inherit the parent's section, chapter, page span, and `kinds`. Sub-chunks therefore EMIT THE SAME CITATION TOKEN as their siblings, so the existing ambiguous-token rescue in evaluate.py picks the best at score time — no downstream changes needed. * Layer 2 — Embedder-level batch guard. `OpenAIEmbedder.embed()` now checks each input's length before calling the API; oversized inputs are sentence-split, embedded as multiple pieces, then mean-pooled into one vector with L2 re-normalisation. The output shape (one vector per input) is preserved so the retriever's contract is unchanged. This catches anything Layer 1 missed (manual chunk injection in tests, future ingester paths, etc.). * Layer 3 — Fail-fast on consecutive retrieval errors. The silent-fallback path in `_build_evidence_block` now tracks consecutive failures of the same error class on the class object; after 10 same-class failures in a row, raise a RuntimeError with the last error attached. The counter resets on any successful retrieval so transient blips (brief rate limits, network hiccups) don't accumulate spuriously. Cost protection: writer + verifier calls otherwise keep running for the whole course generation even though no grounded evidence reaches the prompts. Plus a one-line operator-facing diagnostic at IR-build time: ``[grounding] N sub-chunks emitted from oversized parent chunks (max chunk size after split: M chars, ceiling: K).`` Surfaces the edge case in the run log without forcing future operators to dig. Public helper `split_into_sentences()` factored out of `claim_window.py` so both the chunker and the embedder can reuse the same boundary detector (regex + ~30-entry abbreviation suppression list). Tests: 14 new in tests/test_chunk_size_cap.py covering (a) Layer 1's metadata inheritance, citation token stability, sub-chunk id uniqueness, information preservation across split, and last-resort hard-slice fallback when a single sentence exceeds the ceiling; (b) Layer 2's output-shape invariance under mixed-size batches and oversized-input mean-pooling via a mocked OpenAI client; (c) Layer 3's threshold behaviour, error-class-aware counter, and counter reset on a successful retrieval. Full suite: 722 passed (was 708; +14 new). The broken IR cache for the full Han PDF at .grounding_cache/ir/the_morgan_kaufmann_...2011.json was deleted so the next ingest rebuilds with the corrected chunker.

The sentence-bounded claim window helped Agentic (+2.92 pp precision) but regressed the Han 6-chapter subset (-3.84 pp), with only ~7 % citation overlap between runs suggesting the divergence reaches far upstream. Keep the rfind heuristic in semantic_gate + write_time_verifier until the cross-textbook effect is isolated. The sentence-end regex stays in claim_window.py — the chunker and embedder size guard still use it.

Two independent bugs were blocking VLM-extracted figures from reaching the final slides.tex. After both fixes, a single-chapter clustering run on Han 3e (page 481-534) produced 2 real \includegraphics tokens pointing to verified PNGs (Figure 10.1, DBSCAN density-reachability) — the first iteration where any figure landed in the deck despite 129 PNGs sitting on disk. (1) Visual chunk hoisted to FRONT of injected results. The block-builder loop walks results in rank order with a fixed word budget; large prose chunks (~450 words each in cluster chapters) exhausted the 1800-word budget in four iterations, so the appended-at-tail visual chunk never reached evidence_text. Hoisting it to position 0 guarantees its IMAGE_PATH / LATEX / TABLE / ALGORITHM_STEPS marker survives into the prompt, which is what triggers the visual-content rule block. (2) Teaching Assistant prompt now lists \includegraphics and tabular as available LaTeX features alongside itemize / enumerate / block / lstlisting / equation. Adds an explicit PRESERVE-FIGURES instruction so \includegraphics commands from the Faculty's slide draft don't get silently dropped when the TA reformats. Without this, the TA had no permission to emit figures even when the Faculty drafted them. test_force_visual_chunk assertions updated from out[-1] to out[0] to match the new front-insert position.

The default PDF ingestion path used to call ingest_pdf_file (plain PyMuPDF text extraction with no image handling) and image cropping was only available via the opt-in --vlm-extraction flag, which rendered whole pages as PNGs through gpt-4o vision. Move the default path through ingest_pdf_file_paged with pymupdf4llm.to_markdown(write_images=True). pymupdf4llm writes embedded image XObjects from the PDF as tight cropped PNGs — the actual figure region, not a full-page screenshot. The ingester walks the figures directory after extraction, renames each saved file from pymupdf4llm's {stem}.pdf-{page:04d}-{idx:02d}.png convention to {textbook_id}_p{page:04d}_{idx:02d}.png, and emits one figure_cap paragraph with an [IMAGE_PATH: ...] marker per image so the downstream visual-content rules path still triggers. Measured on Han 3rd ed (740 pages): 464 cropped figures landed in 80 s with no API spend. Figure 10.3 (K-Means iteration) is now a 32 KB tight crop showing just the 3-panel diagram — the prior VLM pass produced a 230 KB full-page render with running headers and body text bleed. The hybrid VLM path stays available via --vlm-extraction for runs that want the gpt-4o DESCRIPTION / INSIGHT markers alongside images. The test_ir_cache plumbing patch swap targets the new dispatch hook (_ingest in knowledge_base) so it stays robust to future routing changes.

Both code paths were tried, measured, and abandoned but their source plus tests stayed in the tree as documentation. The CLAUDE.md note covers the same ground without leaving the modules in place, so drop: * src/grounding/reranker.py — LLMReranker (137-line class plus the json / os imports that only it used). Production retrieval uses CrossEncoderReranker; the LLM variant measured no improvement (89.3 % vs 90.2 % precision) on its trial run. * src/slides.py — _generate_best_of_n_draft and the _decrement_tracker_for_text helper that rolls back loser citations. The multi-draft slide-best-pick path scored drafts by citation count which rewarded volume over quality and was disabled in favour of the semantic-gate stack. The DISABLED comment in _generate_slide_draft referenced a --enable-best-of-n flag that did not exist. * src/grounding/__init__.py — drop the LLMReranker re-export. * src/ADDIE.py — drop the comment paragraph about the removed reranker; CrossEncoderReranker construction is unchanged. * tests/test_multi_draft_best_pick.py — whole file (~146 lines). * tests/test_grounding_reranker.py — TestLLMReranker class plus the _mock_openai_client helper used only by it (~118 lines).

…topic-coherence guard

…delity fixes

xander-Huang-ASU · 2026-06-24T23:10:19Z

Bug: Incorrect `generate_response()` input type

_maybe_augment_syllabus_with_admin (ADDIE.py:349) passes a plain string to LLM.generate_response(), which expects a List[Dict[str, str]].

Current:

response = self.addie.llm.generate_response(prompt)

Suggested fix:

response = self.addie.llm.generate_response(
    [{"role": "user", "content": prompt}]
)

Impact

The SDK rejects the call, the exception is silently caught, and the function returns early. Consequently:

Admin scaffolding (e.g., Course Policies) is never appended to the syllabus during grounded runs.
result_syllabus_design.md.pre_admin_scaffolding.bak is never created, so --resume retries the same failing call every time.

Reproduced: Ran with --use-textbook and confirmed the .bak file is never created.

Why tests missed it

test_addie_grounding_runtime.py mocks addie.llm with MagicMock, which accepts any argument type. Consider asserting the message format:

call_args = runner.addie.llm.generate_response.call_args[0][0]
assert isinstance(call_args, list)
assert call_args[0]["role"] == "user"

…list

shrey-Bish · 2026-06-24T23:21:20Z

Bug: Incorrect generate_response() input type

_maybe_augment_syllabus_with_admin (ADDIE.py:349) passes a plain string to LLM.generate_response(), which expects a List[Dict[str, str]].

Current:
response = self.addie.llm.generate_response(prompt)
Suggested fix:
response = self.addie.llm.generate_response(
    [{"role": "user", "content": prompt}]
)
Impact

The SDK rejects the call, the exception is silently caught, and the function returns early. Consequently:

Admin scaffolding (e.g., Course Policies) is never appended to the syllabus during grounded runs.

result_syllabus_design.md.pre_admin_scaffolding.bak is never created, so --resume retries the same failing call every time.

Reproduced: Ran with --use-textbook and confirmed the .bak file is never created.

Why tests missed it

test_addie_grounding_runtime.py mocks addie.llm with MagicMock, which accepts any argument type. Consider asserting the message format:
call_args = runner.addie.llm.generate_response.call_args[0][0]
assert isinstance(call_args, list)
assert call_args[0]["role"] == "user"

Good catch : I fixed it in 1304e80. _maybe_augment_syllabus_with_admin now passes [{"role": "user", "content": prompt}] instead of the bare string. You were right about the test, too: the existing grounded test asserted "…" in call_prompt against the string, so it silently encoded the bug. I've updated it to check the message list and added a dedicated assertion on the arg format. Thanks!

shrey-Bish added 3 commits May 16, 2026 18:38

stop tracking .DS_Store (already gitignored)

e0b0dd7

scaffold textbook IR (pydantic schema, 10 models)

a64dae5

add markdown textbook ingester

aac27a1

toc.py + ingest_md.py + 38 tests + labeled fixture; markdown-it-py dep.

shrey-Bish marked this pull request as ready for review May 19, 2026 23:27

shrey-Bish marked this pull request as draft May 22, 2026 21:53

shrey-Bish changed the title ~~Markdown textbook ingester~~ textbook grounding for course generation May 28, 2026

shrey-Bish force-pushed the feature/textbook-grounding branch from 05a6deb to de531da Compare May 28, 2026 22:12

wingsweihua requested a review from xander-Huang-ASU May 28, 2026 22:20

shrey-Bish marked this pull request as ready for review May 29, 2026 23:44

wingsweihua requested a review from EroNinja June 3, 2026 17:58

shrey-Bish added 6 commits June 4, 2026 20:34

shrey-Bish added 5 commits June 5, 2026 08:32

shrey-Bish marked this pull request as draft June 5, 2026 20:34

shrey-Bish added 3 commits June 5, 2026 13:52

shrey-Bish added 3 commits June 7, 2026 11:13

shrey-Bish added 3 commits June 8, 2026 14:24

shrey-Bish marked this pull request as ready for review June 8, 2026 22:47

shrey-Bish added 4 commits June 11, 2026 12:36

shrey-Bish marked this pull request as draft June 14, 2026 01:40

shrey-Bish added 13 commits June 13, 2026 23:42

strip citation tokens from final saved artifacts

125fb97

add textbook-chapter catalog for depth-first single-chapter delivery

73e66d4

inject up to four visual chunks per slide

7649e5d

implement gaps 1+3+8+9+10+11+13 in the slide pipeline

fbadd92

tag math-dense paragraphs as kind=equation at ingest

3bda460

drop textbook-specific jargon from production code and tests

3d698aa

drop post-hoc grounding scorer from evaluate.py

f1e045e

preserve faculty-drafted figures + normalize section titles

130a6ed

add figure-slide embedding match, render-quality fixes, and contract …

8cb0dca

…topic-coherence guard

citation-free grounding: advisory verifier, equation-VLM, render + fi…

9e7cc27

…delity fixes

fix: eval summary print crashed on the grounding-fidelity aggregate

5a13b5d

fix: pass admin-scaffolding prompt to generate_response as a message …

1304e80

…list

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

textbook grounding for course generation#13

textbook grounding for course generation#13
shrey-Bish wants to merge 57 commits into
DaRL-GenAI:mainfrom
shrey-Bish:feature/textbook-grounding

shrey-Bish commented May 19, 2026 •

edited

Loading

Uh oh!

xander-Huang-ASU commented Jun 4, 2026

Uh oh!

shrey-Bish commented Jun 5, 2026

Uh oh!

EroNinja commented Jun 8, 2026

Uh oh!

shrey-Bish commented Jun 8, 2026

1. PDF extraction is the main quality limiter — worth supporting EPUB

2. Citation tokens can collide in the verifier index (may skew the headline faithfulness numbers)

3. `sentence-transformers` (→ torch) is a hard dependency for a path that's off by default

Uh oh!

xander-Huang-ASU commented Jun 24, 2026

Uh oh!

shrey-Bish commented Jun 24, 2026

Bug: Incorrect `generate_response()` input type

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

shrey-Bish commented May 19, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

xander-Huang-ASU commented Jun 4, 2026

Uh oh!

shrey-Bish commented Jun 5, 2026

Uh oh!

EroNinja commented Jun 8, 2026

1. PDF extraction is the main quality limiter — worth supporting EPUB

2. Citation tokens can collide in the verifier index (may skew the headline faithfulness numbers)

3. sentence-transformers (→ torch) is a hard dependency for a path that's off by default

Uh oh!

shrey-Bish commented Jun 8, 2026

1. PDF extraction is the main quality limiter — worth supporting EPUB

2. Citation tokens can collide in the verifier index (may skew the headline faithfulness numbers)

3. sentence-transformers (→ torch) is a hard dependency for a path that's off by default

Uh oh!

xander-Huang-ASU commented Jun 24, 2026

Bug: Incorrect generate_response() input type

Uh oh!

shrey-Bish commented Jun 24, 2026

Bug: Incorrect generate_response() input type

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

shrey-Bish commented May 19, 2026 •

edited

Loading

3. `sentence-transformers` (→ torch) is a hard dependency for a path that's off by default

3. `sentence-transformers` (→ torch) is a hard dependency for a path that's off by default

Bug: Incorrect `generate_response()` input type

Bug: Incorrect `generate_response()` input type