---
main_commit: ee0f68be1
analyzed_date: 2026-02-27
key_files:
  - tests/integration/playwright/tests/axe-accessibility.spec.ts
  - tests/integration/playwright/tests/html-math-katex.spec.ts
---

# Playwright Testing Best Practices

Best practices for writing reliable, maintainable Playwright tests in Quarto CLI, derived from comprehensive test development (axe-accessibility.spec.ts, 431 lines, 75 test cases across 3 formats).

## Web-First Assertions

**Always use Playwright's web-first assertions** - they auto-retry and are more reliable than imperative DOM queries.

### Text Content

```typescript
// ✅ Good - auto-retrying, declarative
await expect(element).toContainText('expected text');

// ❌ Bad - imperative, no auto-retry
const text = await element.textContent();
expect(text).toContain('expected text');
```

### Element Presence

```typescript
// ✅ Good - built-in waiting
await expect(element).toBeAttached();
await expect(element).toBeVisible();

// ❌ Bad - manual DOM check
const isAttached = await page.evaluate(() =>
  document.querySelector('.element') !== null
);
expect(isAttached).toBe(true);
```

### CSS Properties

```typescript
// ✅ Good - direct assertion with auto-retry
await expect(element).toHaveCSS('background-color', 'rgb(255, 0, 0)');
await expect(element).not.toHaveCSS('background-color', 'rgba(0, 0, 0, 0)');

// ❌ Bad - manual getComputedStyle
const bgColor = await element.evaluate(el =>
  window.getComputedStyle(el).backgroundColor
);
expect(bgColor).toBe('rgb(255, 0, 0)');
```

### Waiting for Elements

```typescript
// ✅ Good - use locator methods
await element.waitFor({ state: 'visible' });
await element.waitFor({ state: 'attached', timeout: 10000 });

// ❌ Bad - page-level selector waiting
await page.waitForSelector('.element');
```

### Why Web-First Assertions?

Web-first assertions automatically retry until:
- The condition is met (test passes)
- Timeout occurs (test fails with clear error)

This handles timing issues gracefully without manual `waitFor()` calls or fixed delays.
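Under the hood, an auto-retrying assertion is just a poll loop with a deadline. The sketch below is framework-free and purely illustrative: `retryUntil` and its attempt budget are invented names, not Playwright API, and real assertions re-poll on a timer until a timeout rather than counting attempts.

```typescript
// Minimal sketch of retry-until-success semantics (illustrative, not Playwright API).
// A real web-first assertion re-polls on a timer until a deadline; here we count
// attempts so the behavior can be demonstrated synchronously.
function retryUntil<T>(check: () => T | null, maxAttempts: number): T {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = check();
    if (result !== null) return result; // condition met: "test passes"
  }
  // budget exhausted: "test fails with clear error"
  throw new Error(`condition not met after ${maxAttempts} attempts`);
}

// Simulate content that only "appears" on the third poll.
let polls = 0;
const text = retryUntil(() => (++polls >= 3 ? 'expected text' : null), 10);
```

When the budget runs out, the failure is explicit rather than a silent stale read, which is the difference from a one-shot `textContent()` check.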

**Real-world impact from PR #14125:** Converting from imperative checks to web-first assertions eliminated multiple race conditions in cross-format testing where different formats loaded at different speeds.

## Role-Based Selectors

**Prefer semantic role-based selectors** over CSS attribute selectors.

### Interactive Elements

```typescript
// ✅ Good - semantic, resilient to markup changes
await page.getByRole('tab', { name: 'Page 2' }).click();
await page.getByRole('button', { name: 'Toggle sidebar' }).click();
await page.getByRole('heading', { name: 'Title' }).isVisible();
await page.getByRole('navigation').locator('a', { hasText: 'Home' }).click();

// ❌ Bad - fragile, coupled to implementation
await page.locator('a[data-bs-target="#page-2"]').click();
await page.locator('.collapse-toggle').click();
await page.locator('h1.title').isVisible();
await page.locator('nav a:has-text("Home")').click();
```

### When to Use CSS Selectors

Role-based selectors aren't always possible. Use CSS selectors for:
- Custom components without ARIA roles
- Testing implementation-specific classes (e.g., `.quarto-axe-report`)
- Dynamic content where role/name combinations are too generic

```typescript
// Acceptable - testing specific implementation class
await expect(page.locator('.quarto-axe-report')).toBeVisible();

// Acceptable - no semantic role exists
await expect(page.locator('section.quarto-axe-report-slide')).toBeAttached();
```

### Why Role-Based Selectors?

1. **Readability:** `getByRole('tab', { name: 'Page 2' })` is self-documenting
2. **Accessibility:** If the selector works, the element is accessible
3. **Resilience:** CSS classes and attributes change; roles and labels are stable contracts
4. **Maintenance:** Easier to understand intent when reviewing tests

**Real-world impact from PR #14125:** Dashboard rescan tests (8 test cases) initially used CSS attribute selectors like `a[data-bs-target="#page-2"]`. Refactoring to role-based selectors made tests readable without opening the HTML fixtures.

## Handling Non-Unique Selectors

When external tools produce selectors you don't control (e.g., axe-core returning generic "span"), use `.first()` with explanatory comments.

### Pattern

```typescript
// ✅ Good - explicit about why .first() is used
// Use .first() since axe-core may produce non-unique selectors (e.g., "span").
// This tests integration (hover triggers highlight) not selector uniqueness.
const element = page.locator(selector).first();
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);

// ❌ Bad - no explanation, unclear intent
const element = page.locator(selector).first();
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);
```

### When to Use `.first()`

Use when:
- External tools generate selectors you don't control
- Test focuses on interaction/integration, not selector precision
- Selector is known to match multiple elements, but you only care about one

**Always add comments** explaining:
1. Why the selector may be non-unique (e.g., "axe-core produces generic selectors")
2. What the test is actually verifying (e.g., "hover triggers highlight, not selector uniqueness")

### Why Not Fix the Selector?

In integration tests, you're often verifying end-to-end behavior with third-party libraries. The test validates that your integration code works correctly, not that the third-party library produces optimal selectors.

**Real-world example from PR #14125:**

```typescript
// axe-core returns CSS selectors like "span" for violations
// We test: "when hovering violation target, does page element get highlighted?"
// We don't test: "does axe produce unique selectors?" (that's axe's job)

const target = reportSlide.locator('.quarto-axe-violation-target').first();
await target.hover();

// Use .first() since axe-core may produce non-unique selectors (e.g., "span").
// This tests integration (hover triggers highlight) not selector uniqueness.
const element = page.locator(selector).first();
await expect(element).toHaveClass(/quarto-axe-hover-highlight/);
```

## Async Completion Signals

For tests that wait for async operations (network requests, library initialization, processing), add deterministic completion signals instead of arbitrary delays or polling.

### Pattern

```typescript
// In application code:
async function init() {
  try {
    await loadLibrary();
    await processContent();
  } catch (error) {
    console.error('Initialization failed:', error);
  } finally {
    // Always set completion signal, even if work fails
    document.body.setAttribute('data-feature-complete', 'true');
  }
}

// In test:
await page.goto('/page.html');
await page.locator('[data-feature-complete]').waitFor({ state: 'attached', timeout: 15000 });

// Now you can safely assert on the results
await expect(page.locator('.result')).toBeVisible();
```

### Error Handling

The signal should **always** be set, even on failure:

```typescript
// ✅ Good - completion signal set in finally block
async function processData() {
  try {
    await doAsyncWork();
  } catch (error) {
    console.error('Processing failed:', error);
  } finally {
    document.body.setAttribute('data-processing-complete', 'true');
  }
}

// ❌ Bad - signal only set on success
async function processData() {
  try {
    await doAsyncWork();
    document.body.setAttribute('data-processing-complete', 'true');
  } catch (error) {
    console.error('Processing failed:', error);
  }
}
```

### Why Not Use Fixed Delays?

```typescript
// ❌ Bad - arbitrary delay, may be too short or unnecessarily long
await page.goto('/page.html');
await page.waitForTimeout(5000); // Hope 5 seconds is enough?

// ✅ Good - deterministic, completes as soon as ready
await page.goto('/page.html');
await page.locator('[data-feature-complete]').waitFor({ state: 'attached', timeout: 15000 });
```

### Why Completion Signals?

1. **Deterministic:** Test knows exactly when async work is done
2. **Fast:** No waiting longer than necessary
3. **Clear failures:** Timeout means "work never completed" not "maybe we didn't wait long enough"
4. **Debuggable:** Missing attribute = work didn't finish or crashed

**Real-world impact from PR #14125:** The axe accessibility tests initially had race conditions where tests would sometimes pass/fail depending on axe-core's CDN load speed. Adding `data-quarto-axe-complete` in a finally block made tests deterministic - they wait exactly as long as needed and fail clearly if axe never initializes.

### Advanced: Generation Counters for Rescanning

When operations can be triggered multiple times (e.g., rescanning on navigation), use generation counters to discard stale results:

```typescript
let scanGeneration = 0;

async function rescan() {
  const currentGeneration = ++scanGeneration;

  try {
    const results = await performScan();

    // Discard stale results if user triggered another scan
    if (currentGeneration !== scanGeneration) {
      return;
    }

    updateUI(results);
  } finally {
    // Only set completion for latest scan
    if (currentGeneration === scanGeneration) {
      document.body.setAttribute('data-scan-complete', 'true');
    }
  }
}
```

**From PR #14125 dashboard rescan:** Users can switch tabs/pages faster than axe scans complete. Generation counters ensure old scans don't overwrite newer results.
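The stale-result guard can be exercised without a browser or axe. In this illustrative sketch, `startScan`, `uiState`, and `completionSignaled` are invented names standing in for the scan, the report UI, and the `data-scan-complete` attribute; the point is that a scan started first but finishing last must not win.

```typescript
// Framework-free sketch of the generation-counter guard (illustrative names only).
// Each startScan() bumps the generation; the returned callback delivers results
// later, and stale deliveries (superseded by a newer scan) are discarded.
let generation = 0;
let uiState = 'initial';
let completionSignaled = false;

function startScan(results: string): () => void {
  const myGeneration = ++generation;
  return () => {
    if (myGeneration === generation) {
      uiState = results;        // latest scan: update the UI
      completionSignaled = true; // stands in for the data-scan-complete attribute
    }
    // otherwise: stale result, silently dropped
  };
}

// User triggers scan A, then navigates and triggers scan B before A finishes.
const deliverA = startScan('scan A results');
const deliverB = startScan('scan B results');
deliverA(); // stale: generation moved on, discarded
deliverB(); // current: wins
```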

## Parameterized Tests

When testing the same behavior across multiple formats or configurations, use `test.describe` with a test cases array instead of separate spec files.

### Pattern

```typescript
interface TestCase {
  format: string;
  url: string;
  expectedViolation?: string;
  shouldFail?: string; // Reason for expected failure
}

const testCases: TestCase[] = [
  { format: 'html', url: '/html/feature.html', expectedViolation: 'color-contrast' },
  { format: 'revealjs', url: '/revealjs/feature.html', expectedViolation: 'link-name' },
  {
    format: 'dashboard',
    url: '/dashboard/feature.html',
    shouldFail: 'Dashboard has no <main> element (#13781)'
  },
];

test.describe('Feature across formats', () => {
  for (const { format, url, expectedViolation, shouldFail } of testCases) {
    test(`${format} — ${expectedViolation ? `detects ${expectedViolation}` : 'known failure'}`, async ({ page }) => {
      if (shouldFail) {
        test.fail(); // Mark as expected failure
      }

      await page.goto(url);
      // Shared test logic
    });
  }
});
```

### When to Use Parameterized Tests

Use when:
- Same assertion logic applied to multiple formats (html, revealjs, dashboard, pdf)
- Testing multiple output modes (console, json, document)
- Testing across configurations (themes, options, feature flags)

**Benefits:**
- Reduces file count (1 spec file instead of 3-10)
- Centralizes shared helpers
- Easy to add new test cases
- Clear comparison of format differences
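One payoff of the shared cases array is that test titles and expectations come from data, including the expected-failure case. A framework-free sketch (the `TestCase` shape mirrors the pattern above; `buildTitle` is an invented helper, not Playwright API):

```typescript
// Deriving one test title per case from a shared array (illustrative helper).
interface TestCase {
  format: string;
  expectedViolation?: string;
  shouldFail?: string; // reason for expected failure
}

const cases: TestCase[] = [
  { format: 'html', expectedViolation: 'color-contrast' },
  { format: 'revealjs', expectedViolation: 'link-name' },
  { format: 'dashboard', shouldFail: 'no <main> element' },
];

function buildTitle(c: TestCase): string {
  // Avoid "detects undefined" titles for expected-failure cases.
  return c.expectedViolation
    ? `${c.format} - detects ${c.expectedViolation}`
    : `${c.format} - known failure (${c.shouldFail})`;
}

const titles = cases.map(buildTitle);
```

In a real spec, the loop body calls `test(buildTitle(c), ...)`, so adding a format is a one-line change to the array.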

**From PR #14125:** axe-accessibility.spec.ts tests 3 formats × 3 output modes = 9 base cases in a single 431-line file instead of 9 separate spec files.

### Expected Failures with test.fail()

Mark known failures explicitly:

```typescript
test('Feature that is broken in revealjs', async ({ page }) => {
  // RevealJS doesn't support this yet (#13781)
  test.fail();

  // Normal test logic - if this unexpectedly passes, Playwright will flag it
  await page.goto('/revealjs/feature.html');
  await expect(page.locator('.feature')).toBeVisible();
});
```

**Why use test.fail():**
- Documents known issues in the test suite
- Test passes when it fails (expected behavior)
- Test **fails** if it unexpectedly passes (signals the bug is fixed)
- Better than commenting out tests or skipping with test.skip()

## Summary

**Five key patterns for reliable Playwright tests:**

1. **Web-first assertions** - `expect(el).toContainText()` not `expect(await el.textContent())`
2. **Role-based selectors** - `getByRole('tab', { name: 'Page 2' })` not `locator('a[data-bs-target]')`
3. **Explicit .first() comments** - Explain why and what you're testing
4. **Completion signals** - `data-feature-complete` in finally blocks, not arbitrary delays
5. **Parameterized tests** - one cases array per spec file, with `test.fail()` documenting known issues

These patterns emerged from building comprehensive cross-format test coverage and debugging race conditions. They make tests:
- More reliable (fewer flaky failures)
- More readable (intent is clear)
- Easier to maintain (resilient to markup changes)
- Faster to debug (clear failure modes)

**Reference implementations:**
- `tests/integration/playwright/tests/axe-accessibility.spec.ts` - 431 lines, 75 test cases
- `tests/integration/playwright/tests/html-math-katex.spec.ts` - Parameterized format testing