Skip to content

Commit bfd181f

Browse files
committed
Document Playwright best practices from axe test development
Add llm-docs/playwright-best-practices.md with patterns for writing reliable Playwright tests, derived from comprehensive axe accessibility test development (PR #14125). Covers web-first assertions, role-based selectors, handling non-unique selectors, completion signals, and parameterized testing. Refactor .claude/rules/testing/playwright-tests.md to brief reference format with link to detailed documentation, reducing auto-loaded context while preserving knowledge for explicit access.
1 parent 4affb36 commit bfd181f

2 files changed

Lines changed: 378 additions & 0 deletions

File tree

.claude/rules/testing/playwright-tests.md

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,6 +111,17 @@ uv run python -m http.server 8080
111111
# Serves from tests/docs/playwright/
112112
```
113113

114+
## Best Practices
115+
116+
**For detailed examples and patterns, see [llm-docs/playwright-best-practices.md](../../llm-docs/playwright-best-practices.md)**
117+
118+
Key patterns for reliable tests:
119+
120+
- **Web-first assertions:** Use `expect(el).toContainText()`, `toBeAttached()`, `toHaveCSS()` instead of imperative DOM queries
121+
- **Role-based selectors:** Prefer `getByRole('tab', { name: 'Page 2' })` over `locator('a[data-bs-target]')`
122+
- **Non-unique selectors:** When using `.first()`, add comment explaining why selector may be non-unique and what you're testing
123+
- **Completion signals:** Use `data-feature-complete` attributes in finally blocks instead of arbitrary delays
124+
114125
## Utilities
115126

116127
From `src/utils.ts`:
Lines changed: 367 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,367 @@
1+
---
2+
main_commit: ee0f68be1
3+
analyzed_date: 2026-02-27
4+
key_files:
5+
- tests/integration/playwright/tests/axe-accessibility.spec.ts
6+
- tests/integration/playwright/tests/html-math-katex.spec.ts
7+
---
8+
9+
# Playwright Testing Best Practices
10+
11+
Best practices for writing reliable, maintainable Playwright tests in Quarto CLI, derived from comprehensive test development (axe-accessibility.spec.ts, 431 lines, 75 test cases across 3 formats).
12+
13+
## Web-First Assertions
14+
15+
**Always use Playwright's web-first assertions** - they auto-retry and are more reliable than imperative DOM queries.
16+
17+
### Text Content
18+
19+
```typescript
20+
// ✅ Good - auto-retrying, declarative
21+
await expect(element).toContainText('expected text');
22+
23+
// ❌ Bad - imperative, no auto-retry
24+
const text = await element.textContent();
25+
expect(text).toContain('expected text');
26+
```
27+
28+
### Element Presence
29+
30+
```typescript
31+
// ✅ Good - built-in waiting
32+
await expect(element).toBeAttached();
33+
await expect(element).toBeVisible();
34+
35+
// ❌ Bad - manual DOM check
36+
const isAttached = await page.evaluate(() =>
37+
document.querySelector('.element') !== null
38+
);
39+
expect(isAttached).toBe(true);
40+
```
41+
42+
### CSS Properties
43+
44+
```typescript
45+
// ✅ Good - direct assertion with auto-retry
46+
await expect(element).toHaveCSS('background-color', 'rgb(255, 0, 0)');
47+
await expect(element).not.toHaveCSS('background-color', 'rgba(0, 0, 0, 0)');
48+
49+
// ❌ Bad - manual getComputedStyle
50+
const bgColor = await element.evaluate(el =>
51+
window.getComputedStyle(el).backgroundColor
52+
);
53+
expect(bgColor).toBe('rgb(255, 0, 0)');
54+
```
55+
56+
### Waiting for Elements
57+
58+
```typescript
59+
// ✅ Good - use locator methods
60+
await element.waitFor({ state: 'visible' });
61+
await element.waitFor({ state: 'attached', timeout: 10000 });
62+
63+
// ❌ Bad - page-level selector waiting
64+
await page.waitForSelector('.element');
65+
```
66+
67+
### Why Web-First Assertions?
68+
69+
Web-first assertions automatically retry until:
70+
- The condition is met (test passes)
71+
- Timeout occurs (test fails with clear error)
72+
73+
This handles timing issues gracefully without manual `waitFor()` calls or fixed delays.
74+
75+
**Real-world impact from PR #14125:** Converting from imperative checks to web-first assertions eliminated multiple race conditions in cross-format testing where different formats loaded at different speeds.
76+
77+
## Role-Based Selectors
78+
79+
**Prefer semantic role-based selectors** over CSS attribute selectors.
80+
81+
### Interactive Elements
82+
83+
```typescript
84+
// ✅ Good - semantic, resilient to markup changes
85+
await page.getByRole('tab', { name: 'Page 2' }).click();
86+
await page.getByRole('button', { name: 'Toggle sidebar' }).click();
87+
await page.getByRole('heading', { name: 'Title' }).isVisible();
88+
await page.getByRole('navigation').locator('a', { hasText: 'Home' }).click();
89+
90+
// ❌ Bad - fragile, coupled to implementation
91+
await page.locator('a[data-bs-target="#page-2"]').click();
92+
await page.locator('.collapse-toggle').click();
93+
await page.locator('h1.title').isVisible();
94+
await page.locator('nav a:has-text("Home")').click();
95+
```
96+
97+
### When to Use CSS Selectors
98+
99+
Role-based selectors aren't always possible. Use CSS selectors for:
100+
- Custom components without ARIA roles
101+
- Testing implementation-specific classes (e.g., `.quarto-axe-report`)
102+
- Dynamic content where role/name combinations are too generic
103+
104+
```typescript
105+
// Acceptable - testing specific implementation class
106+
await page.locator('.quarto-axe-report').toBeVisible();
107+
108+
// Acceptable - no semantic role exists
109+
await page.locator('section.quarto-axe-report-slide').toBeAttached();
110+
```
111+
112+
### Why Role-Based Selectors?
113+
114+
1. **Readability:** `getByRole('tab', { name: 'Page 2' })` is self-documenting
115+
2. **Accessibility:** If the selector works, the element is accessible
116+
3. **Resilience:** CSS classes and attributes change; roles and labels are stable contracts
117+
4. **Maintenance:** Easier to understand intent when reviewing tests
118+
119+
**Real-world impact from PR #14125:** Dashboard rescan tests (8 test cases) initially used CSS attribute selectors like `a[data-bs-target="#page-2"]`. Refactoring to role-based selectors made tests readable without opening the HTML fixtures.
120+
121+
## Handling Non-Unique Selectors
122+
123+
When external tools produce selectors you don't control (e.g., axe-core returning generic "span"), use `.first()` with explanatory comments.
124+
125+
### Pattern
126+
127+
```typescript
128+
// ✅ Good - explicit about why .first() is used
129+
// Use .first() since axe-core may produce non-unique selectors (e.g., "span").
130+
// This tests integration (hover triggers highlight) not selector uniqueness.
131+
const element = page.locator(selector).first();
132+
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);
133+
134+
// ❌ Bad - no explanation, unclear intent
135+
const element = page.locator(selector).first();
136+
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);
137+
```
138+
139+
### When to Use `.first()`
140+
141+
Use when:
142+
- External tools generate selectors you don't control
143+
- Test focuses on interaction/integration, not selector precision
144+
- Selector is known to match multiple elements, but you only care about one
145+
146+
**Always add comments** explaining:
147+
1. Why the selector may be non-unique (e.g., "axe-core produces generic selectors")
148+
2. What the test is actually verifying (e.g., "hover triggers highlight, not selector uniqueness")
149+
150+
### Why Not Fix the Selector?
151+
152+
In integration tests, you're often verifying end-to-end behavior with third-party libraries. The test validates that your integration code works correctly, not that the third-party library produces optimal selectors.
153+
154+
**Real-world example from PR #14125:**
155+
156+
```typescript
157+
// axe-core returns CSS selectors like "span" for violations
158+
// We test: "when hovering violation target, does page element get highlighted?"
159+
// We don't test: "does axe produce unique selectors?" (that's axe's job)
160+
161+
const target = reportSlide.locator('.quarto-axe-violation-target').first();
162+
await target.hover();
163+
164+
// Use .first() since axe-core may produce non-unique selectors (e.g., "span").
165+
// This tests integration (hover triggers highlight) not selector uniqueness.
166+
const element = page.locator(selector).first();
167+
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);
168+
```
169+
170+
## Async Completion Signals
171+
172+
For tests that wait for async operations (network requests, library initialization, processing), add deterministic completion signals instead of arbitrary delays or polling.
173+
174+
### Pattern
175+
176+
```typescript
177+
// In application code:
178+
async function init() {
179+
try {
180+
await loadLibrary();
181+
await processContent();
182+
} catch (error) {
183+
console.error('Initialization failed:', error);
184+
} finally {
185+
// Always set completion signal, even if work fails
186+
document.body.setAttribute('data-feature-complete', 'true');
187+
}
188+
}
189+
190+
// In test:
191+
await page.goto('/page.html');
192+
await page.waitForSelector('[data-feature-complete]', { timeout: 15000 });
193+
194+
// Now you can safely assert on the results
195+
await expect(page.locator('.result')).toBeVisible();
196+
```
197+
198+
### Error Handling
199+
200+
The signal should **always** be set, even on failure:
201+
202+
```typescript
203+
// ✅ Good - completion signal set in finally block
204+
async function processData() {
205+
try {
206+
await doAsyncWork();
207+
} catch (error) {
208+
console.error('Processing failed:', error);
209+
} finally {
210+
document.body.setAttribute('data-processing-complete', 'true');
211+
}
212+
}
213+
214+
// ❌ Bad - signal only set on success
215+
async function processData() {
216+
try {
217+
await doAsyncWork();
218+
document.body.setAttribute('data-processing-complete', 'true');
219+
} catch (error) {
220+
console.error('Processing failed:', error);
221+
}
222+
}
223+
```
224+
225+
### Why Not Use Fixed Delays?
226+
227+
```typescript
228+
// ❌ Bad - arbitrary delay, may be too short or unnecessarily long
229+
await page.goto('/page.html');
230+
await page.waitForTimeout(5000); // Hope 5 seconds is enough?
231+
232+
// ✅ Good - deterministic, completes as soon as ready
233+
await page.goto('/page.html');
234+
await page.waitForSelector('[data-feature-complete]', { timeout: 15000 });
235+
```
236+
237+
### Why Completion Signals?
238+
239+
1. **Deterministic:** Test knows exactly when async work is done
240+
2. **Fast:** No waiting longer than necessary
241+
3. **Clear failures:** Timeout means "work never completed" not "maybe we didn't wait long enough"
242+
4. **Debuggable:** Missing attribute = work didn't finish or crashed
243+
244+
**Real-world impact from PR #14125:** The axe accessibility tests initially had race conditions where tests would sometimes pass/fail depending on axe-core's CDN load speed. Adding `data-quarto-axe-complete` in a finally block made tests deterministic - they wait exactly as long as needed and fail clearly if axe never initializes.
245+
246+
### Advanced: Generation Counters for Rescanning
247+
248+
When operations can be triggered multiple times (e.g., rescanning on navigation), use generation counters to discard stale results:
249+
250+
```typescript
251+
let scanGeneration = 0;
252+
253+
async function rescan() {
254+
const currentGeneration = ++scanGeneration;
255+
256+
try {
257+
const results = await performScan();
258+
259+
// Discard stale results if user triggered another scan
260+
if (currentGeneration !== scanGeneration) {
261+
return;
262+
}
263+
264+
updateUI(results);
265+
} finally {
266+
// Only set completion for latest scan
267+
if (currentGeneration === scanGeneration) {
268+
document.body.setAttribute('data-scan-complete', 'true');
269+
}
270+
}
271+
}
272+
```
273+
274+
**From PR #14125 dashboard rescan:** Users can switch tabs/pages faster than axe scans complete. Generation counters ensure old scans don't overwrite newer results.
275+
276+
## Parameterized Tests
277+
278+
When testing the same behavior across multiple formats or configurations, use `test.describe` with a test cases array instead of separate spec files.
279+
280+
### Pattern
281+
282+
```typescript
283+
interface TestCase {
284+
format: string;
285+
url: string;
286+
expectedViolation?: string;
287+
shouldFail?: string; // Reason for expected failure
288+
}
289+
290+
const testCases: TestCase[] = [
291+
{ format: 'html', url: '/html/feature.html', expectedViolation: 'color-contrast' },
292+
{ format: 'revealjs', url: '/revealjs/feature.html', expectedViolation: 'link-name' },
293+
{
294+
format: 'dashboard',
295+
url: '/dashboard/feature.html',
296+
shouldFail: 'Dashboard has no <main> element (#13781)'
297+
},
298+
];
299+
300+
test.describe('Feature across formats', () => {
301+
for (const { format, url, expectedViolation, shouldFail } of testCases) {
302+
test(`${format} — feature detects ${expectedViolation}`, async ({ page }) => {
303+
if (shouldFail) {
304+
test.fail(); // Mark as expected failure
305+
}
306+
307+
await page.goto(url);
308+
// Shared test logic
309+
});
310+
}
311+
});
312+
```
313+
314+
### When to Use Parameterized Tests
315+
316+
Use when:
317+
- Same assertion logic applied to multiple formats (html, revealjs, dashboard, pdf)
318+
- Testing multiple output modes (console, json, document)
319+
- Testing across configurations (themes, options, feature flags)
320+
321+
**Benefits:**
322+
- Reduces file count (1 spec file instead of 3-10)
323+
- Centralizes shared helpers
324+
- Easy to add new test cases
325+
- Clear comparison of format differences
326+
327+
**From PR #14125:** axe-accessibility.spec.ts tests 3 formats × 3 output modes = 9 base cases in a single 431-line file instead of 9 separate spec files.
328+
329+
### Expected Failures with test.fail()
330+
331+
Mark known failures explicitly:
332+
333+
```typescript
334+
test('Feature that is broken in revealjs', async ({ page }) => {
335+
// RevealJS doesn't support this yet (#13781)
336+
test.fail();
337+
338+
// Normal test logic - if this unexpectedly passes, Playwright will flag it
339+
await page.goto('/revealjs/feature.html');
340+
await expect(page.locator('.feature')).toBeVisible();
341+
});
342+
```
343+
344+
**Why use test.fail():**
345+
- Documents known issues in test suite
346+
- Test passes when it fails (expected behavior)
347+
- Test **fails** if it unexpectedly passes (signals the bug is fixed)
348+
- Better than commenting out tests or skipping with test.skip()
349+
350+
## Summary
351+
352+
**Four key patterns for reliable Playwright tests:**
353+
354+
1. **Web-first assertions** - `expect(el).toContainText()` not `expect(await el.textContent())`
355+
2. **Role-based selectors** - `getByRole('tab', { name: 'Page 2' })` not `locator('a[data-bs-target]')`
356+
3. **Explicit .first() comments** - Explain why and what you're testing
357+
4. **Completion signals** - `data-feature-complete` in finally blocks, not arbitrary delays
358+
359+
These patterns emerged from building comprehensive cross-format test coverage and debugging race conditions. They make tests:
360+
- More reliable (fewer flaky failures)
361+
- More readable (intent is clear)
362+
- Easier to maintain (resilient to markup changes)
363+
- Faster to debug (clear failure modes)
364+
365+
**Reference implementations:**
366+
- `tests/integration/playwright/tests/axe-accessibility.spec.ts` - 431 lines, 75 test cases
367+
- `tests/integration/playwright/tests/html-math-katex.spec.ts` - Parameterized format testing

0 commit comments

Comments
 (0)