perf: highlight Markdown/MDX with tree-sitter (diff viewer, file viewer, content search) #148
Merged
syntect's Sublime Markdown grammar is pathologically slow: ~1 ms/line, roughly 20x slower than the Rust grammar on the same bytes and ~2000x slower than plain text. The diff viewer pre-highlights the full old *and* new file on every selection, so a 715-line .mdx with a 3-line diff spent ~1.4 s (release) / ~3 s (dev) just colouring, regardless of diff size.

Profiling showed ~100% of that cost is syntect's stateful parse; styling is free, so "highlight only displayed lines" does not help: reaching a mid-file hunk still requires parsing the whole prefix.

Switch Markdown/MDX to tree-sitter, which parses the whole document in ~30 ms and sidesteps the mid-file-state problem entirely.

Hybrid design: tree-sitter handles the Markdown structure (headings, emphasis, links, inline code, fence delimiters) while fenced code blocks keep going through the existing syntect path for their embedded language (graphql/json/ts/…). Those grammars are fast; only Markdown was the problem. Output mirrors the existing `line -> spans` map, so the rest of the diff pipeline is unchanged.

Measured end-to-end on the same file: ~41 ms (old+new ≈ 82 ms) vs ~1.4 s before, i.e. ~17x faster.

tree-sitter 0.26 and streaming-iterator are already in the workspace via gpui-component, so this only adds the tree-sitter-md grammar crate.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Claude-Session: https://claude.ai/code/session_019JZRfqVneyJnKjdrtSp3ro
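The point that a mid-file hunk still needs the whole prefix parsed can be illustrated with a toy line-stateful highlighter. This is only a sketch under stated assumptions, not syntect's implementation: `LineState`, `advance`, and `state_at_line` are hypothetical names, and the only state tracked is whether the current line sits inside a `~~~` fenced block (real syntect state is a full scope stack).

```rust
/// Toy parse state carried from one line to the next (hypothetical;
/// a real stateful highlighter carries far more than one flag).
#[derive(Clone, Copy, Debug, PartialEq)]
enum LineState {
    Prose,
    InFence,
}

/// Advance the state across one line: a line starting with `~~~`
/// opens a fence when in prose and closes it when inside one.
fn advance(state: LineState, line: &str) -> LineState {
    if line.trim_start().starts_with("~~~") {
        match state {
            LineState::Prose => LineState::InFence,
            LineState::InFence => LineState::Prose,
        }
    } else {
        state
    }
}

/// Returns the state in effect at the start of line `target` (0-based)
/// and the number of lines that had to be scanned to learn it.
/// The count always equals `target`: the prefix cannot be skipped.
fn state_at_line(lines: &[&str], target: usize) -> (LineState, usize) {
    let mut state = LineState::Prose;
    let mut scanned = 0;
    for line in lines.iter().take(target) {
        state = advance(state, line);
        scanned += 1;
    }
    (state, scanned)
}
```

Even in this toy model, colouring a hunk at line N costs N state transitions before the first visible line can be styled, which is why limiting highlighting to displayed lines saves nothing when the per-line parse is the expensive part.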
…-sitter too

Unify the highlighting path: `syntax::highlight_content` (used by the file viewer and the content-search preview) now routes Markdown/MDX through the same tree-sitter module as the diff viewer, instead of syntect's slow Markdown grammar.

Extract a shared `markdown_line_spans` core that produces ordered per-line spans, with two thin adapters: `highlight_markdown_file` (diff viewer, `line -> spans` map, all lines) and `highlight_markdown_content` (file viewer, ordered `HighlightedLine`s with plain text, capped at `max_lines`). The whole document is always parsed (tree-sitter is cheap); only the emitted line count is capped.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Claude-Session: https://claude.ai/code/session_019JZRfqVneyJnKjdrtSp3ro
…t theme

Markdown element colours were a hand-picked dark/light palette. Resolve them from the active syntect theme instead (via the TextMate scope each element maps to), so Markdown matches how that theme renders it elsewhere and follows the theme if it ever becomes configurable.

Themes vary in coverage: Dracula defines markup.heading/bold/italic/link etc., while GitHub defines almost no markup.* rules. So each element falls back to the previous hand-picked colour when the theme has no rule for its scope (detected by the highlighter returning the default foreground).

Result: Dracula now uses its real Markdown colours; GitHub is unchanged.

Co-Authored-By: Claude Opus 4.8 (1M context) <[email protected]>
Claude-Session: https://claude.ai/code/session_019JZRfqVneyJnKjdrtSp3ro
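The theme-with-fallback rule described in this commit can be sketched as follows. Everything here is an illustrative assumption rather than the PR's code: `Rgb`, `Theme`, `color_for_scope`, and `resolve` are hypothetical names, and the stand-in theme does an exact-match lookup in a map, whereas a real syntect highlighter matches scope selectors.

```rust
use std::collections::HashMap;

/// Hypothetical colour type for the sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Rgb(u8, u8, u8);

/// Stand-in for a theme: a default foreground plus per-scope rules.
struct Theme {
    default_fg: Rgb,
    rules: HashMap<&'static str, Rgb>,
}

impl Theme {
    /// Mimics a highlighter lookup: scopes without a rule
    /// come back as the default foreground.
    fn color_for_scope(&self, scope: &str) -> Rgb {
        self.rules.get(scope).copied().unwrap_or(self.default_fg)
    }
}

/// Resolve an element's colour from the theme. If the lookup yields the
/// default foreground, treat that as "no rule" and use the hand-picked colour.
fn resolve(theme: &Theme, scope: &str, hand_picked: Rgb) -> Rgb {
    let themed = theme.color_for_scope(scope);
    if themed == theme.default_fg {
        hand_picked
    } else {
        themed
    }
}
```

One consequence of detecting "no rule" this way is that a theme which deliberately styles a scope with its default foreground is indistinguishable from a theme with no rule, so that element would also receive the hand-picked colour.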
d9952f6 to dcbb6dd
Problem
Opening a git diff for some Markdown/MDX files was painfully slow: for example, a 715-line `.mdx` with only a 3-line change took ~1.4 s (release) / ~3 s (dev) to render. The same slow highlighter also backs the file viewer and the content-search preview.

Profiling isolated the layer (it is not git, rendering, or diff size):
- plain text: ~0.3 ms
- Rust grammar: ~35 ms
- Markdown: ~700 ms (≈20× slower than Rust, ≈2000× slower than plain text)

The diff viewer does it twice (old + new) ⇒ ~1.4 s.

Fix
Highlight Markdown/MDX with tree-sitter (`tree-sitter-md`), which parses the whole document in ~30 ms and, because it parses the entire file cheaply, sidesteps the "carry parse state to a mid-file hunk" problem entirely.

Hybrid design so there's no regression on embedded code:
- tree-sitter handles the Markdown structure (headings, emphasis, links, inline code, fence delimiters).
- Fenced code blocks keep going through the existing syntect path for their embedded language (graphql/json/ts/…); those grammars are fast, only Markdown was the problem.

Unified across all viewers. A shared `markdown_line_spans` core feeds two thin adapters with the existing output shapes, so callers are unchanged:
- `highlight_markdown_file` → diff viewer (`line -> spans` map, all lines).
- `highlight_markdown_content` → file viewer & content-search preview (ordered `HighlightedLine`s, capped at `max_lines`), wired in via `syntax::highlight_content`.

Non-Markdown files are completely unaffected (still syntect).
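The shape of "one shared core, two thin adapters" might look roughly like the sketch below. The function names come from this PR, but the signatures, the `Span` and `HighlightedLine` fields, and the 0-based line keys are assumptions made for illustration. The core here is a stand-in that only tags lines starting with `#` as headings; the PR's core uses tree-sitter.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Hypothetical span: a byte range within one line plus a style tag.
#[derive(Clone, Debug, PartialEq)]
struct Span {
    range: Range<usize>,
    style: &'static str,
}

/// Hypothetical file-viewer line: plain text plus its spans.
#[derive(Clone, Debug, PartialEq)]
struct HighlightedLine {
    text: String,
    spans: Vec<Span>,
}

/// Shared core: ordered per-line spans for the whole document.
/// Stand-in logic only: a leading '#' marks a heading.
fn markdown_line_spans(source: &str) -> Vec<Vec<Span>> {
    source
        .lines()
        .map(|line| {
            let style = if line.starts_with('#') { "heading" } else { "text" };
            vec![Span { range: 0..line.len(), style }]
        })
        .collect()
}

/// Diff-viewer adapter: `line -> spans` map covering all lines
/// (0-based keys are an assumption of this sketch).
fn highlight_markdown_file(source: &str) -> HashMap<usize, Vec<Span>> {
    markdown_line_spans(source).into_iter().enumerate().collect()
}

/// File-viewer adapter: ordered lines with plain text, capped at `max_lines`.
/// The whole document is still processed; only the emitted lines are capped.
fn highlight_markdown_content(source: &str, max_lines: usize) -> Vec<HighlightedLine> {
    markdown_line_spans(source)
        .into_iter()
        .zip(source.lines())
        .take(max_lines)
        .map(|(spans, text)| HighlightedLine { text: text.to_string(), spans })
        .collect()
}
```

Because both adapters only reshape the core's output, the two viewers cannot drift apart in how they colour the same line, and the cap in the file-viewer adapter affects output size without changing what is parsed.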
Result
Measured end-to-end on the same file (release), including embedded graphql/json highlighting: ~41 ms per file (old + new ≈ 82 ms) vs ~1.4 s before.

~17× faster, and now the file viewer / content-search preview get the same win.
Notes
- `tree-sitter` 0.26 and `streaming-iterator` are already in the workspace (via `gpui-component`), so this only adds the `tree-sitter-md` grammar crate.
- The grammar is built with the `cc` crate: fine on Linux/macOS; on Windows it needs the MSVC toolchain already used for this repo (x64 Native Tools).
- `crates/okena-files/src/markdown_highlight.rs` with unit tests (text reconstruction, heading colouring, embedded-language colouring, `HighlightedLine` shape + `max_lines`, path matching). No `unwrap`/`expect` in non-test code (crate lint).

Manual check suggested
Open a large `.mdx`/`.md` file (e.g. a Contember docs page with graphql code fences) in both the diff viewer and the file viewer; confirm it renders instantly with headings/links/code coloured.

🤖 Generated with Claude Code
https://claude.ai/code/session_019JZRfqVneyJnKjdrtSp3ro