Curate Docs For AI (with Claude Code)

Curate and index documentation from any website into collections like tailwind/, horses/, etc. Reference collection indexes in your AI chats (e.g. @tailwind/INDEX.xml what's a utility?) so that only relevant docs are analysed. Much cleaner than a web-fetch and more focussed than a web-search. Keep your AI context sharp.

Terminal showing three-step workflow: (1) Running /curate-doc biome command, (2) Curation success output showing scraped documentation and generated INDEX.xml entry, (3) Use /ask-docs to query docs. Handwritten annotations highlight each step.

Complete workflow: curate → auto scrape → "/ask-docs biome Validate my config file please"

📦 Repo Collections

Available collections in this repo:

Collection	Collection Index	Description	Scraped	Source
📦 `biome/`	📄 `INDEX.xml`	Fast linter/formatter	2025-11-04	Official
📦 `claudecode/`	📄 `INDEX.xml`	Anthropic Claude Code	2026-02-05	Official
📦 `claudeplat/`	📄 `INDEX.xml`	Anthropic Claude Platform	2026-01-07	Official
📦 `clerk/`	📄 `INDEX.xml`	Authentication	2025-12-03	Official
📦 `convex/`	📄 `INDEX.xml`	Reactive database	2026-01-07	Official
🪝 `lefthook/`	📄 `INDEX.xml`	Git hooks manager	2025-11-24	Official
📦 `marimo/`	📄 `INDEX.xml`	Reactive Python notebooks	2025-11-11	Official
📦 `nextjs/`	📄 `INDEX.xml`	React framework	2025-12-02	Official
📦 `playwright/`	📄 `INDEX.xml`	Browser testing	2025-11-07	Official
📦 `shadcn/`	📄 `INDEX.xml`	React UI components	2025-12-16	Official, Guide
📦 `shiny/`	📄 `INDEX.xml`	Python web apps	2025-11-02	Official
📦 `tailwind/`	📄 `INDEX.xml`	CSS framework	2025-10-15	Official
📦 `tailwindplus/`	📄 `INDEX.xml`	Paid UI Components	2025-11-16	Official
📦 `uv/`	📄 `INDEX.xml`	Python projects	2026-05-19	Official
📦 `vercel/`	📄 `INDEX.xml`	Deployment platform	2025-10-20	Official
📦 `vitest/`	📄 `INDEX.xml`	Testing framework	2025-11-05	Official
📦 `zustand/`	📄 `INDEX.xml`	State management	2026-01-03	Official

Curate your own collections. The lefthook collection is non-standard — docs are downloaded directly from GitHub. For Anthropic docs use this tool.

🚀 Setup

# 1. Install UV
# 👉 https://docs.astral.sh/uv/getting-started/installation/

# 2. Clone repository
git clone https://github.com/michellepace/docs-for-ai.git
cd docs-for-ai

# 3. Get a free FireCrawl API key (needed for all non-GitHub sources; only GitHub sources are downloaded directly)
# Visit: https://www.firecrawl.dev/app/api-keys

# 4. Add to your shell profile
echo 'export API_KEY_MCP_FIRECRAWL=your-api-key-here' >> ~/.zshrc
source ~/.zshrc  # Use ~/.bashrc if that's your shell

📖 Usage via Slash Commands

Important

Edit the paths in .claude/commands/ask-docs.md to match your local setup. To use the command from anywhere, move the file to ~/.claude/commands/.

Slash Command	Purpose	.md Files	INDEX `<source>`
`/curate-doc <collection> <url>`	Add new or re-scrape	✅ Write	✅ Add/update INDEX.xml
`/rescrape-docs <collection>`	Re-scrape all docs	✅ Write all	✅ Selective update INDEX.xml
`/improve-index-xml <collection>`	Batch improve descriptions	📖 Read	✅ Update INDEX.xml
`/ask-docs <collection> <question>`	Query any collection	Docs analysed	Relevant docs identified

💡 Usage Example

Assume tailwind was not already a collection in this repo:

# Start a new collection
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Creates tailwind/ collection directory, with README.md + INDEX.xml, and first curated doc

# Re-scrape existing doc (refresh content from same URL)
/curate-doc tailwind https://tailwindcss.com/docs/customizing-colors
# → Re-scrapes, writes .md file, replaces source in INDEX.xml

# Curate a new doc into collection
/curate-doc tailwind https://tailwindcss.com/docs/styling-with-utility-classes
# → Scrapes page into collection, writes .md file, adds source to INDEX.xml

# Re-scrape all docs in collection
/rescrape-docs tailwind
# → Re-scrapes all URLs in INDEX.xml, writes all .md files, updates descriptions for changed content

# ✨ Use the docs
/ask-docs tailwind Please evaluate my project for correct usage of utility classes
# → Searches tailwind/INDEX.xml for relevant docs, analyses these, gives you an answer

🏗️ How This Repo Works

Workflow: Python script fetches from source URL → writes .md file → creates INDEX.xml entry with PLACEHOLDER description → Claude Code generates semantic description. The /curate-doc command always regenerates the description, whereas /rescrape-docs only regenerates descriptions for files with content changes.
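The "create or replace an entry with a PLACEHOLDER description" step can be sketched as below. This is a minimal illustration, not the repo's actual script: the function name `upsert_source` and the choice to match existing entries on `source_url` are assumptions, and the field names are taken from the INDEX.xml schema shown in this README.

```python
import xml.etree.ElementTree as ET

# Field order follows the INDEX.xml schema in this README.
FIELDS = ("title", "description", "source_url", "local_file", "scraped_at")


def upsert_source(root, title, source_url, local_file, scraped_at):
    """Add a <source> entry, replacing any entry with the same source_url.

    Hypothetical helper: the description is always written as PLACEHOLDER,
    to be filled in later by Claude Code.
    """
    # findall() returns a list, so removing while looping is safe.
    for existing in root.findall("source"):
        if existing.findtext("source_url") == source_url:
            root.remove(existing)
    source = ET.SubElement(root, "source")
    values = (title, "PLACEHOLDER", source_url, local_file, scraped_at)
    for tag, value in zip(FIELDS, values):
        ET.SubElement(source, tag).text = value
    return source


root = ET.Element("docs_index")
url = "https://docs.example.com/hello"
upsert_source(root, "Hello", url, "hello.md", "2025-10-15")
upsert_source(root, "Hello", url, "hello.md", "2025-10-16")  # re-scrape
print(len(root.findall("source")))
print(root.find("source").findtext("scraped_at"))
```

Matching on the URL means a re-scrape of the same page replaces its entry rather than appending a duplicate, which mirrors the "replaces source in INDEX.xml" behaviour described in the usage example.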

Source routing: If the source URL is on GitHub, a direct fetch is used instead of FireCrawl.
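As a rough sketch of that routing decision, assuming it is made on the URL's hostname: the host list and the function name `pick_fetcher` are hypothetical, and the repo's real rule may differ.

```python
from urllib.parse import urlparse

# Hypothetical host list; the repo's actual routing rule may differ.
GITHUB_HOSTS = {"github.com", "www.github.com", "raw.githubusercontent.com"}


def pick_fetcher(source_url: str) -> str:
    """Return 'direct' for GitHub URLs and 'firecrawl' for everything else."""
    host = (urlparse(source_url).hostname or "").lower()
    return "direct" if host in GITHUB_HOSTS else "firecrawl"


print(pick_fetcher("https://github.com/michellepace/docs-for-ai"))
print(pick_fetcher("https://tailwindcss.com/docs/customizing-colors"))
```

Checking the parsed hostname, rather than searching the whole URL for "github", avoids misrouting a page that merely mentions GitHub in its path.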

Directory Structure:

uv/
├── INDEX.xml               # Index of all docs
├── README.md
├── api-reference.md        # Scraped doc
├── getting-started.md      # Scraped doc
└── ...

INDEX.xml Schema:

<docs_index>
  <source>
    <title>Hello Document Title</title>
    <description>20-30 word dense summary optimised for semantic search...</description>
    <source_url>https://docs.example.com/hello</source_url>
    <local_file>hello-document-title.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
  <!-- Multiple <source> entries, one per .md file -->
</docs_index>
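To show how the schema above can be consumed, here is a minimal sketch that reads an index with the Python stdlib and returns one dictionary per source. The helper name `load_sources` is an assumption for illustration; it is not part of the repo.

```python
import xml.etree.ElementTree as ET

# Sample index mirroring the schema above.
SAMPLE_INDEX = """<docs_index>
  <source>
    <title>Hello Document Title</title>
    <description>20-30 word dense summary...</description>
    <source_url>https://docs.example.com/hello</source_url>
    <local_file>hello-document-title.md</local_file>
    <scraped_at>2025-10-15</scraped_at>
  </source>
</docs_index>"""


def load_sources(xml_text: str) -> list[dict[str, str]]:
    """Parse INDEX.xml text into a list of {tag: text} dicts, one per <source>."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: (child.text or "").strip() for child in source}
        for source in root.findall("source")
    ]


sources = load_sources(SAMPLE_INDEX)
for entry in sources:
    print(entry["local_file"], "<-", entry["source_url"])
```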

Scripts use the FireCrawl Python SDK for general web sources and Python stdlib (urllib.request) for GitHub raw markdown.

👉 Notes to Improve later

LLM Routing (2026-05-22)

"Semantic search" isn't the right term. The examples also need improving — very keyword-heavy, with redundant starting words. Index should say <summary> instead of <description>.

Each description is a routing signal for an LLM reader — Claude reads INDEX.xml to pick which files answer a question. Optimise for that, not human readability:

Discriminative descriptions

Bottom line: keep the descriptions, drop the "semantic search" framing. What you want is discriminative descriptions — meaningful and keyword-anchored and written to stand apart from their neighbours — read by an LLM, not matched by vectors.

Also, extract all duplicated examples into one .claude/commands/references/examples.md file.

Then regenerate all the PLACEHOLDER descriptions.

Remove Scripts? (2026-05-27)

Remove the scripts in scripts/ that I don't need. For example, what about a bash for-loop over curate_doc.py, then allocating files to subagents by size (run wc --chars *.md | sort -n, or use token counts)? Then read the diff on each local file and assess whether the description needs refining (using head).

Pick a cap so no agent is overloaded, on two axes:

Input size — a target like "~X total words per agent" so context stays comfortable.
File count — a soft cap (e.g. ≤5 files) so no single agent does too much serial reading, which hurts both speed and summary quality.

The honest rule: isolation matters in proportion to file size and similarity. Give a large or highly-similar file its own agent (where contamination/dilution is real). Batch small, distinct files 3–5 together — the interference there is negligible, and you save the overhead. It's a quality-vs-efficiency trade-off, not a flat "always isolate".

For this task I'd set the budget as "≤5 files AND ≤~12k words per agent, whichever binds first". To measure size, I'd rather use the Anthropic tokeniser script in ~/projects/python/TEMP-token-counts/.
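The budget rule can be sketched as a greedy packer. This is only an illustration under stated assumptions: the function name `batch_files` and the sample word counts are made up, and the sketch handles size only, not the similarity between files mentioned above.

```python
def batch_files(
    word_counts: dict[str, int],
    max_files: int = 5,
    max_words: int = 12_000,
) -> list[list[str]]:
    """Group files into agent batches of <=max_files and <=max_words each.

    A file larger than max_words on its own still gets a batch to itself.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    current_words = 0
    # Smallest first, so small files pack together and large ones end up alone.
    for name, words in sorted(word_counts.items(), key=lambda item: item[1]):
        full = len(current) >= max_files or current_words + words > max_words
        if current and full:
            batches.append(current)
            current, current_words = [], 0
        current.append(name)
        current_words += words
    if current:
        batches.append(current)
    return batches


# Made-up word counts for illustration.
counts = {"a.md": 1_000, "b.md": 2_000, "c.md": 3_000, "d.md": 9_000, "e.md": 15_000}
batches = batch_files(counts)
for batch in batches:
    print(batch)
```

With these sample counts the three small files share one agent, while the 9k-word and 15k-word files each get their own, which is the "whichever binds first" behaviour.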

But if there's an API cost to running these, use a multiplier (approximated across curated files): tokens ≈ characters × 0.37 (I don't think my sampling was wonky across 49 files).
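A small sketch of that estimate, using the 0.37 multiplier quoted above. The function names are hypothetical, and the result is an approximation, not a real tokeniser count.

```python
from pathlib import Path

# Multiplier quoted in the note above; an approximation from sampled files.
TOKENS_PER_CHAR = 0.37


def estimate_tokens(text: str) -> int:
    """Estimate the token count of a string from its character count."""
    return round(len(text) * TOKENS_PER_CHAR)


def estimate_collection(directory: str) -> dict[str, int]:
    """Estimate tokens for every .md file directly inside a collection directory."""
    return {
        path.name: estimate_tokens(path.read_text(encoding="utf-8"))
        for path in sorted(Path(directory).glob("*.md"))
    }


print(estimate_tokens("x" * 1000))
```

For example, `estimate_collection("tailwind")` would return a filename-to-estimate mapping for that collection, which could feed a per-agent budget without any API calls.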
