Skip to content

Commit 83ccd8d

Browse files
NullVoxPopuliclaude
andcommitted
Address review: drop runtime simple-html-tokenizer, fix PathExpression original setter
Two review comments from NullVoxPopuli on the cleanup commit: 1. rollup.config.mjs: "remove this. it's internal, and we don't want it. this grammar work should negate the need for simple-html-tokenizer entirely" - Vendor the HTML5 named-entity table as packages/@glimmer/syntax/lib/parser/html5-named-char-refs.js (the ~2200-entry map from simple-html-tokenizer's generated data, now internal to @glimmer/syntax). - Inline the entity decoder in tokenizer-event-handlers.ts: handles hex (&#xHH;), decimal (&#dd;), and named entities via the vendored table. - Drop simple-html-tokenizer from @glimmer/syntax/package.json and revert the rollup.config resolvePackages central externalization. (simple-html-tokenizer is still a testDependency because internal-test-helpers and integration-tests use it, but no runtime code does anymore.) - .prettierignore: exclude the vendored char-ref file (huge one-liner). 2. auto-import-builtins.ts: "this should not need changing. changing `original` should mean that the other nodes updates" The Peggy grammar was emitting PathExpression nodes as plain objects with `original` as a data field, bypassing buildLegacyPath (which installs `original` as a getter/setter tied to head/tail). That's why the plugin was having to manually rewrite head/tail after assigning to `.original`. - convertLocations now upgrades every PathExpression through buildLegacyPath, so `node.path.original = 'foo.bar'` correctly updates head/tail via the setter. - Revert the manual head/tail mutation in auto-import-builtins.ts back to the original one-line `node.path.original = rewritten`. Verified: pnpm lint, pnpm build, pnpm vite build --mode=development, pnpm test (9138/9138 browser tests pass), and Prettier handlebars smoke test (163/163 pass, 151 snapshots). Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
1 parent 4b69af0 commit 83ccd8d

6 files changed

Lines changed: 33 additions & 31 deletions

File tree

.prettierignore

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -12,5 +12,6 @@ package.json
1212
pnpm-lock.yaml
1313
internal-docs/**/*.md
1414
packages/@glimmer/syntax/lib/hbs-parser/parser.js
15+
packages/@glimmer/syntax/lib/parser/html5-named-char-refs.js
1516
*.peggy
1617
tracerbench-testing/

packages/@ember/template-compiler/lib/plugins/auto-import-builtins.ts

Lines changed: 1 addition & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -51,21 +51,7 @@ function rewriteKeyword(
5151
name,
5252
});
5353
} else if (env.meta?.emberRuntime) {
54-
const rewritten = env.meta.emberRuntime.lookupKeyword(name);
55-
node.path.original = rewritten;
56-
// Also update head and tail so the v2 normalizer resolves correctly.
57-
// e.g., "__ember_keywords__.on" → head="__ember_keywords__", tail=["on"]
58-
const parts = rewritten.split('.');
59-
if (parts.length >= 2) {
60-
const headName = parts[0] as string;
61-
node.path.head = {
62-
type: 'VarHead',
63-
name: headName,
64-
original: headName,
65-
loc: node.path.head.loc,
66-
} as AST.VarHead;
67-
node.path.tail = parts.slice(1);
68-
}
54+
node.path.original = env.meta.emberRuntime.lookupKeyword(name);
6955
}
7056
}
7157

packages/@glimmer/syntax/lib/parser/html5-named-char-refs.js

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

packages/@glimmer/syntax/lib/parser/tokenizer-event-handlers.ts

Lines changed: 26 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
import type { Nullable } from '@glimmer/interfaces';
22
import { assign } from '@glimmer/util';
3-
import { EntityParser, HTML5NamedCharRefs } from 'simple-html-tokenizer';
43
import { parseTemplate } from '../hbs-parser/index.js';
4+
import HTML5NamedCharRefs from './html5-named-char-refs.js';
55

66
import type { NodeVisitor } from '../traversal/visitor';
77
import type * as ASTv1 from '../v1/api';
@@ -11,20 +11,25 @@ import * as src from '../source/api';
1111
import { generateSyntaxError } from '../syntax-error';
1212
import traverse from '../traversal/traverse';
1313
import Walker from '../traversal/walker';
14+
import { buildLegacyPath } from '../v1/legacy-interop';
1415
import publicBuilder from '../v1/public-builders';
1516

1617
// ============================================================================
1718
// Entity decoding helper
1819
// ============================================================================
1920

20-
// Full HTML5 entity decoding via simple-html-tokenizer's EntityParser so we
21-
// match the behavior of the previous tokenizer-based pipeline (~2200 named
22-
// entities plus numeric &#dd; and &#xHH; forms).
23-
const entityParser = new EntityParser(HTML5NamedCharRefs);
24-
const ENTITY_RE = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g;
21+
// Full HTML5 entity decoding, covering ~2200 named entities plus numeric
22+
// &#dd; and &#xHH; forms. The named-entity table is vendored from the HTML5
23+
// spec (see ./html5-named-char-refs.js).
24+
const ENTITY_RE = /&(#[xX]([0-9a-fA-F]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));/g;
25+
const NAMED: Record<string, string> = HTML5NamedCharRefs as Record<string, string>;
2526

2627
function decodeEntities(text: string): string {
27-
return text.replace(ENTITY_RE, (match, body) => entityParser.parse(body) ?? match);
28+
return text.replace(ENTITY_RE, (match, _body, hex, dec, name) => {
29+
if (hex !== undefined) return String.fromCodePoint(parseInt(hex, 16));
30+
if (dec !== undefined) return String.fromCodePoint(parseInt(dec, 10));
31+
return NAMED[name] ?? match;
32+
});
2833
}
2934

3035
// ============================================================================
@@ -469,6 +474,12 @@ function isPeggySpan(
469474

470475
/**
471476
* Recursively convert plain {start, end} location objects to SourceSpan instances.
477+
*
478+
* PathExpression nodes are additionally upgraded via `buildLegacyPath` so that
479+
* `node.path.original` is a live getter/setter backed by `head` and `tail`.
480+
* This way downstream AST plugins (e.g. auto-import-builtins) can assign
481+
* `node.path.original = 'foo.bar'` and have `head`/`tail` update automatically,
482+
* matching the behavior of the legacy parser-builders-produced nodes.
472483
*/
473484
// eslint-disable-next-line @typescript-eslint/no-explicit-any
474485
function convertLocations(node: any, source: src.Source): any {
@@ -482,6 +493,10 @@ function convertLocations(node: any, source: src.Source): any {
482493
let result: any = {};
483494

484495
for (let key of Object.keys(node)) {
496+
// PathExpression owns its own `original` via buildLegacyPath's getter.
497+
// Drop the grammar-emitted data field so the getter wins.
498+
if (node.type === 'PathExpression' && key === 'original') continue;
499+
485500
let val = node[key];
486501
if ((key === 'loc' || key === 'openTag' || key === 'closeTag') && isPeggySpan(val)) {
487502
result[key] = peggySpanToSourceSpan(source, val);
@@ -492,6 +507,10 @@ function convertLocations(node: any, source: src.Source): any {
492507
}
493508
}
494509

510+
if (result.type === 'PathExpression') {
511+
return buildLegacyPath({ head: result.head, tail: result.tail, loc: result.loc });
512+
}
513+
495514
return result;
496515
}
497516

packages/@glimmer/syntax/package.json

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -34,8 +34,7 @@
3434
"dependencies": {
3535
"@glimmer/interfaces": "workspace:*",
3636
"@glimmer/util": "workspace:*",
37-
"@glimmer/wire-format": "workspace:*",
38-
"simple-html-tokenizer": "^0.5.11"
37+
"@glimmer/wire-format": "workspace:*"
3938
},
4039
"devDependencies": {
4140
"@glimmer/debug-util": "workspace:*",

rollup.config.mjs

Lines changed: 1 addition & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -23,6 +23,7 @@ const testDependencies = [
2323
'@simple-dom/serializer',
2424
'@simple-dom/void-map',
2525
'expect-type',
26+
'simple-html-tokenizer',
2627
];
2728

2829
let configs = [
@@ -427,13 +428,6 @@ export function resolvePackages(deps, params) {
427428
return { external: true, id: pkgName };
428429
}
429430

430-
// simple-html-tokenizer is a real runtime dep (listed in ember-source
431-
// package.json and @glimmer/syntax package.json). Keep it external so
432-
// consumers resolve it from their own node_modules.
433-
if (pkgName === 'simple-html-tokenizer') {
434-
return { external: true, id: source };
435-
}
436-
437431
if (isExternal?.(source)) {
438432
return { external: true, id: source };
439433
}

0 commit comments

Comments
 (0)