Commit 8d18b40

refactor(parsers): switch fixture format from JSON to YAML
Convert the 7 PARSER.batch fixture files from JSON to YAML so the multi-line `model_text` field reads as the actual wire format instead of a `\n`-escaped one-liner.

Same data, more readable during PR review:

- XML-style families (qwen3_coder, kimi_k2, glm47, harmony) get proper line breaks via YAML literal block scalars (`|-`).
- DeepSeek embedded-JSON args lose the escaped-quote noise (`{\"location\":\"NYC\"}` -> `{"location":"NYC"}`).
- Special tokens (`｜` U+FF5C, `▁` U+2581) still round-trip literally via `allow_unicode=True`.

Mechanical changes:

- `regenerate_fixtures.py`: dump via PyYAML with a custom string presenter that picks block-scalar style for multi-line strings. Test pass: 111 passed / 90 skipped / 9 xfailed (unchanged from the JSON baseline). `--overwrite-if-exists` round-trip is byte-stable.
- `test_parity_parser.py`: load via `yaml.safe_load`; glob shifts from `*.json` to `*.yaml`.
- README: schema example reframed in YAML; mentions JSON only where it refers to wire-format JSON (tool-call args), not the fixture format.
- Fix a pre-existing bug in `regenerate_fixtures.py` where `FIXTURES_ROOT = Path(__file__).parent` wrote outputs one level above the canonical `fixtures/` tree. Now writes back to `fixtures/`.

PyYAML is already an ambient runtime dep; no new top-level requirement. Eventual Rust harness can use `serde_yaml` (already present transitively via `kube-client`).

Signed-off-by: Keiven Chang <[email protected]>
1 parent 55eab8e commit 8d18b40
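The commit message describes the `regenerate_fixtures.py` string presenter only in prose. Below is a minimal sketch of such a presenter. It assumes PyYAML 5.1 or later, which added `sort_keys`. `FixtureDumper`, `_str_presenter` and `dump_fixture` are hypothetical names, not necessarily the ones the commit uses, and the DeepSeek-style token in the check is illustrative.

```python
import yaml


def _str_presenter(dumper: yaml.SafeDumper, data: str) -> yaml.ScalarNode:
    # Multi-line strings request literal block style; PyYAML adds the "-"
    # chomping indicator itself when the text has no trailing newline.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    # Everything else keeps PyYAML's default plain/quoted choice.
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)


class FixtureDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass so the presenter doesn't affect global yaml.dump."""


FixtureDumper.add_representer(str, _str_presenter)


def dump_fixture(doc: dict) -> str:
    return yaml.dump(
        doc,
        Dumper=FixtureDumper,
        allow_unicode=True,  # keep U+FF5C / U+2581 as literal characters
        sort_keys=False,     # keep family/mode/cases in insertion order
    )
```

Registering the presenter on a `SafeDumper` subclass leaves other `yaml.dump` callers untouched. PyYAML honors the requested `|` style only when the string can legally be a literal block. A line with trailing spaces, for example, makes it fall back to double quotes, so such values would still appear escaped.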

17 files changed

Lines changed: 1112 additions & 2024 deletions

tests/parity/README.md

Lines changed: 36 additions & 34 deletions
````diff
@@ -12,8 +12,8 @@ tests/parity/
 ├── conftest.py ← session-scoped fixtures (server boots, etc.)
 ├── common.py ← ParseResult, canonical-JSON diff, decode_arguments
 └── parser/
-    ├── fixtures/ ← static JSON, generated from Dynamo as oracle
-    │   └── <family>/PARSER.batch.json
+    ├── fixtures/ ← static YAML, generated from Dynamo as oracle
+    │   └── <family>/PARSER.batch.yaml
     ├── regenerate_fixtures.py ← (re-)build fixtures by running Dynamo's parser
 
     ├── dynamo.py ← M2 in-process wrapper (PyO3 binding)
````
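The commit message also fixes `FIXTURES_ROOT = Path(__file__).parent`, which wrote outputs one level above `fixtures/`. In the tree above, `regenerate_fixtures.py` sits next to `fixtures/`, so the corrected root is presumably `parent / "fixtures"`. The sketch below uses `fixtures_root` and `fixture_file`, which are hypothetical helpers. They take the script path as a parameter so they can be checked without running the real script.

```python
from pathlib import Path
from typing import Union


def fixtures_root(script_path: Union[str, Path]) -> Path:
    # Pre-fix behavior was Path(__file__).parent, i.e. tests/parity/parser/,
    # one level above the canonical fixtures/ tree.
    return Path(script_path).parent / "fixtures"


def fixture_file(script_path: Union[str, Path], family: str) -> Path:
    # <family>/PARSER.batch.yaml under fixtures/, matching the tree above.
    return fixtures_root(script_path) / family / "PARSER.batch.yaml"
```

Inside the script itself, the call would be `fixture_file(__file__, "kimi_k2")`.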
````diff
@@ -117,37 +117,39 @@ They're stacked diagnostics:
 
 ## Fixture file schema
 
-Each `<family>/PARSER.batch.json`:
-
-```json
-{
-  "family": "kimi_k2",
-  "mode": "batch",
-  "cases": {
-    "1": {
-      "description": "Single tool call (happy path)",
-      "model_text": "<|tool_calls_section_begin|>...",
-      "tools": [{"name": "...", "parameters": {...}}],
-      "expected": {
-        "calls": [{"name": "...", "arguments": {...}}],
-        "normal_text": ""
-      }
-    },
-    "2": { ... },
-    ...
-  }
-}
+Each `<family>/PARSER.batch.yaml`:
+
+```yaml
+family: kimi_k2
+mode: batch
+cases:
+  '1':
+    description: Single tool call (happy path)
+    model_text: |-
+      <|tool_calls_section_begin|>...
+    tools:
+    - name: ...
+      parameters: {...}
+    expected:
+      calls:
+      - name: ...
+        arguments: {...}
+      normal_text: ''
+  '2': ...
 ```
 
-Case keys are `"1"`–`"10"` (string-typed because JSON object keys
-are strings); the harness reconstructs the full case ID
-`PARSER.batch.<n>` for test IDs and the `KNOWN_DIVERGENCES` keys.
+Case keys are `'1'`–`'10'` (quoted so YAML doesn't treat them as
+ints, which would also reorder them); the harness reconstructs the
+full case ID `PARSER.batch.<n>` for test IDs and the
+`KNOWN_DIVERGENCES` keys.
 
-UTF-8 encoding with `ensure_ascii=False`, so DeepSeek special
-tokens (`｜` U+FF5C, `▁` U+2581) appear as literal characters
-rather than `\uXXXX` escapes.
+`model_text` uses YAML's literal block scalar (`|-`) so multi-line
+wire formats (XML-style families, harmony) read as the actual text
+the model would emit, not a `\n`-escaped one-liner. UTF-8 with
+`allow_unicode=True`, so DeepSeek special tokens (`｜` U+FF5C, `▁`
+U+2581) appear as literal characters rather than escape sequences.
 
-## Why families' JSONs look so similar (and why that's the point)
+## Why families' YAMLs look so similar (and why that's the point)
 
 Open any two family files side-by-side and the case shells look
 nearly identical: same `description` strings, same `tools` schemas,
````
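According to the commit message, `test_parity_parser.py` now loads fixtures with `yaml.safe_load` and globs `*.yaml`. Below is a minimal loader sketch built on those two facts. The following are illustrative choices, not the harness's actual code:

- the names `cases_from_text` and `iter_fixture_cases`;
- the `*/PARSER.batch.yaml` pattern;
- the numeric sort.

The check also shows why case keys are quoted. `safe_load` returns a quoted `'1'` as a string, whereas an unquoted `1:` loads as the integer `1`.

```python
from pathlib import Path
from typing import Iterator, Tuple

import yaml

Case = Tuple[str, str, dict]  # (family, case_id, case body)


def cases_from_text(text: str) -> Iterator[Case]:
    doc = yaml.safe_load(text)
    # Quoted keys ('1', '10') load as str; sort numerically so '10' comes
    # after '9' rather than after '1'.
    for n in sorted(doc["cases"], key=int):
        yield doc["family"], f"PARSER.batch.{n}", doc["cases"][n]


def iter_fixture_cases(fixtures_root: Path) -> Iterator[Case]:
    # Hypothetical layout assumption: one PARSER.batch.yaml per family dir.
    for path in sorted(fixtures_root.glob("*/PARSER.batch.yaml")):
        yield from cases_from_text(path.read_text(encoding="utf-8"))
```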
````diff
@@ -259,7 +261,7 @@ scope today; see `lib/parsers/PARSER_CASES.md`,
 `lib/parsers/PIPELINE_CASES.md` for the surrounding taxonomy that
 will guide which stages are worth adding when.
 
-## Eventual goal: JSON fixtures as the single source of truth
+## Eventual goal: YAML fixtures as the single source of truth
 
 Today there's overlap between this harness's fixtures and the
 hand-written Rust unit tests under `lib/parsers/src/tool_calling/*`
````
````diff
@@ -271,7 +273,7 @@ The intended end state is **one set of fixtures, multiple thin
 harnesses**, in subsequent PRs:
 
 ```
-tests/parity/parser/fixtures/<family>/PARSER.batch.json
+tests/parity/parser/fixtures/<family>/PARSER.batch.yaml
 
 ├── Python harness (M2 / M3) — already reads it
 └── Rust harness (future) — would read it too,
````
````diff
@@ -281,7 +283,7 @@ tests/parity/parser/fixtures/<family>/PARSER.batch.json
 
 What that buys:
 
-- **No duplicated test data.** Adding a case in JSON immediately
+- **No duplicated test data.** Adding a case in YAML immediately
   covers Dynamo (Rust harness), Dynamo-via-PyO3 (M2), and
   vLLM/SGLang servers (M3). Today, adding a Rust test means
   hand-mirroring the case into M2's `INPUTS` if you want
````
````diff
@@ -302,7 +304,7 @@ What stays in Rust-only tests after the migration:
 
 Effort sketch (separate PRs after M2 + M3 land):
 
-- **PR-X:** Rust harness that reads `PARSER.batch.json`, dispatches
+- **PR-X:** Rust harness that reads `PARSER.batch.yaml`, dispatches
   to `try_tool_call_parse_<family>(...)`, asserts on `expected`.
   ~1-2 days.
 - **PR-Y:** Mechanical migration — delete the ~70 hand-written
````
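PR-X is described as a Rust harness that reads `PARSER.batch.yaml`, dispatches to `try_tool_call_parse_<family>(...)`, and asserts on `expected`. To make that shape concrete, here is the same dispatch-and-assert step sketched in Python, the language of the existing harness. `ParseFn`, the `parsers` table and `_decode` are hypothetical. They stand in for Dynamo's real entry points and for `decode_arguments` in `common.py`, whose actual signatures are not shown here.

```python
import json
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical parser signature: (model_text, tools) -> (calls, normal_text),
# where each call is a dict with "name" and "arguments".
ParseFn = Callable[[str, list], Tuple[List[dict], str]]


def _decode(arguments: Any) -> Any:
    # Stand-in for a decode_arguments-style helper: parsers may emit the
    # arguments as a JSON string, so compare parsed objects instead.
    return json.loads(arguments) if isinstance(arguments, str) else arguments


def check_case(parsers: Dict[str, ParseFn], family: str, case: dict) -> List[str]:
    """Run one fixture case through its family's parser; return mismatches."""
    calls, normal_text = parsers[family](case["model_text"], case.get("tools", []))
    expected = case["expected"]
    problems: List[str] = []

    got = [(c["name"], _decode(c["arguments"])) for c in calls]
    want = [(c["name"], _decode(c["arguments"])) for c in expected["calls"]]
    if got != want:
        problems.append(f"calls: got {got!r}, expected {want!r}")
    if normal_text != expected["normal_text"]:
        problems.append(
            f"normal_text: got {normal_text!r}, expected {expected['normal_text']!r}"
        )
    return problems
```

Because the sketch compares decoded arguments rather than raw strings, differences in key order or whitespace in the emitted JSON are not reported as failures.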
````diff
@@ -326,7 +328,7 @@ real value-add is the cross-impl half (vLLM and SGLang).
 3. Add a section to `INPUTS` in `regenerate_fixtures.py` for every
    `(family, "PARSER.batch.<n>")` you want to cover (mirror the
    case shape from an existing family).
-4. Run the regenerator to materialize `<family>/PARSER.batch.json`.
+4. Run the regenerator to materialize `<family>/PARSER.batch.yaml`.
 5. Add the family's vLLM and SGLang dispatch entries to
    `_FAMILY_TO_VLLM_KEY` (`vllm.py`) and
    `_FAMILY_TO_SGLANG_DETECTOR` (`sglang.py`).
````
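After step 4, a quick structural check of the new `<family>/PARSER.batch.yaml` can catch an unquoted case key or a missing field before the parity tests run. Below is a minimal sketch. It assumes every case carries the fields shown in the README's schema example. `validate_fixture` is a hypothetical helper, not part of the harness.

```python
from typing import Any, Dict, List

# Field names from the README schema example; treating all of them as
# required is an assumption of this sketch.
CASE_FIELDS = ("description", "model_text", "tools", "expected")
EXPECTED_FIELDS = ("calls", "normal_text")


def validate_fixture(doc: Dict[str, Any], family: str) -> List[str]:
    """Return schema problems in a loaded PARSER.batch.yaml (empty if none)."""
    errors: List[str] = []
    if doc.get("family") != family:
        errors.append(f"family is {doc.get('family')!r}, expected {family!r}")
    if doc.get("mode") != "batch":
        errors.append(f"mode is {doc.get('mode')!r}, expected 'batch'")
    cases = doc.get("cases")
    if not isinstance(cases, dict) or not cases:
        errors.append("cases is missing or empty")
        return errors
    for key, case in cases.items():
        # An unquoted key such as 1 loads as int; the schema wants '1'.
        if not (isinstance(key, str) and key.isdigit()):
            errors.append(f"case key {key!r} is not a quoted number")
        for field in CASE_FIELDS:
            if field not in case:
                errors.append(f"case {key}: missing {field!r}")
        for field in EXPECTED_FIELDS:
            if field not in case.get("expected", {}):
                errors.append(f"case {key}: missing expected.{field}")
    return errors
```

It operates on the dict that `yaml.safe_load` returns, so it can run on a freshly regenerated file before any parser is invoked.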
