Commit 43ef7e1

cscheid and claude committed
Fix llms-txt anchor links and breadcrumb leaking
Links with anchors (e.g. about.html#section) were not being converted to .llms.md because the Lua pattern only matched .html at end-of-string. Now also matches .html followed by # and rewrites both cases.

Breadcrumbs from sidebar navigation were leaking into .llms.md output. Add quarto-page-breadcrumbs to droppable_classes in the Lua filter.

Adds test coverage for both fixes: anchor link conversion in both directions, .html# negative matches, sidebar config to trigger breadcrumbs, and breadcrumb text negative match.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
1 parent c792ec2 commit 43ef7e1

4 files changed

Lines changed: 22 additions & 4 deletions

src/resources/filters/llms/llms.lua

Lines changed: 3 additions & 1 deletion
@@ -25,6 +25,7 @@ local droppable_classes = {
   ["listing-categories"] = true,
   ["quarto-listing-category"] = true, -- category filter sidebar
   ["listing-category"] = true, -- individual category badges
+  ["quarto-page-breadcrumbs"] = true, -- breadcrumb navigation
 }
 local droppable_ids = {
   ["quarto-header"] = true,
@@ -140,7 +141,8 @@ function Link(link)
     return link.content
   end
 
-  if link.target and link.target:match("%.html$") then
+  if link.target and (link.target:match("%.html$") or link.target:match("%.html#")) then
+    link.target = link.target:gsub("%.html#", ".llms.md#")
     link.target = link.target:gsub("%.html$", ".llms.md")
     if link.classes:includes("btn") then
       link.attr = pandoc.Attr()

tests/docs/smoke-all/website/llms-txt/_quarto.yml

Lines changed: 5 additions & 0 deletions
@@ -11,6 +11,11 @@ website:
       - href: index.qmd
         text: Home
       - about.qmd
+  sidebar:
+    contents:
+      - section: Info
+        contents:
+          - about.qmd
 
 format:
   html:
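Per the commit message, this sidebar section exists to trigger breadcrumbs on about.qmd, which the new `droppable_classes` entry is meant to remove. How the filter consults that table is not shown in this commit; the sketch below assumes a simple lookup over an element's class list. `has_droppable_class` and the sample class lists are hypothetical.

```lua
-- Subset of the table from llms.lua, including the new entry.
local droppable_classes = {
  ["listing-category"] = true,
  ["quarto-page-breadcrumbs"] = true,
}

-- Hypothetical helper: true if any class in the list is droppable.
local function has_droppable_class(classes)
  for _, class in ipairs(classes) do
    if droppable_classes[class] then
      return true
    end
  end
  return false
end

print(has_droppable_class({ "nav", "quarto-page-breadcrumbs" }))  -- true
print(has_droppable_class({ "callout-note" }))                    -- false
```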

tests/docs/smoke-all/website/llms-txt/about.qmd

Lines changed: 5 additions & 3 deletions
@@ -6,9 +6,9 @@ _quarto:
       ensureLlmsMdExists: true
       ensureLlmsMdRegexMatches:
         # First array: patterns that MUST match
-        - ["^# About", "> \\*\\*NOTE:\\*\\*", "> \\*\\*WARNING:\\*\\*", "This is a note", "``` python", "def hello", "\\| Feature", "\\|[-]+\\|", "\\[home page\\]\\(.*\\.llms\\.md\\)"]
-        # Second array: patterns that must NOT match (no .html links in llms.md)
-        - ["\\.html\\)"]
+        - ["^# About", "> \\*\\*NOTE:\\*\\*", "> \\*\\*WARNING:\\*\\*", "This is a note", "``` python", "def hello", "\\| Feature", "\\|[-]+\\|", "\\[home page\\]\\(.*\\.llms\\.md\\)", "\\[test site intro\\]\\(index\\.llms\\.md#test-content\\)"]
+        # Second array: patterns that must NOT match (no .html links, no breadcrumbs)
+        - ["\\.html\\)", "\\.html#", "\\[Info\\]"]
 ---
 
 About this test site.
@@ -42,3 +42,5 @@ def hello():
 ## Link Example
 
 Go back to the [home page](index.qmd).
+
+Go to the [test site intro](index.qmd#test-content).
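The two arrays in `ensureLlmsMdRegexMatches` read as a pair of assertions over the generated `.llms.md` text: every pattern in the first must match and none in the second may. The sketch below restates the three negative checks using Lua patterns rather than the regular expressions the test harness uses; `violations` and both sample strings are hypothetical.

```lua
-- Lua-pattern equivalents of the must-NOT-match regexes in about.qmd.
local must_not_match = { "%.html%)", "%.html#", "%[Info%]" }

-- Hypothetical checker: returns the forbidden patterns found in text.
local function violations(text)
  local found = {}
  for _, pattern in ipairs(must_not_match) do
    if text:find(pattern) then
      found[#found + 1] = pattern
    end
  end
  return found
end

-- Made-up samples of converted and unconverted output.
local good = "Go to the [test site intro](index.llms.md#test-content)."
local bad = "[Info](#) Go to the [test site intro](index.html#test-content)."

print(#violations(good))  -- 0
print(#violations(bad))   -- 2
```

The `\\.html#` pattern is the one that would have caught the original bug, since an anchored link never ends in `.html)`.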

tests/docs/smoke-all/website/llms-txt/index.qmd

Lines changed: 9 additions & 0 deletions
@@ -11,8 +11,17 @@ _quarto:
         - ["^# llms-txt Test Site", "^## Pages", "\\[.*\\]\\(.*\\.llms\\.md\\)"]
         # Second array: patterns that must NOT match (empty)
         - []
+      ensureLlmsMdRegexMatches:
+        # First array: patterns that MUST match - verify anchor links are converted
+        - ["\\[callout examples\\]\\(about\\.llms\\.md#callout-examples\\)"]
+        # Second array: patterns that must NOT match (no .html or .html# links)
+        - ["\\.html\\)", "\\.html#"]
 ---
 
+## Test Content
+
 This is a test website for the llms-txt feature.
 
 See the [about page](about.qmd) for more information.
+
+Also see the [callout examples](about.qmd#callout-examples).
