feat(users): add GDPR data provider registry#3894
Conversation
Adds a config-free, import-safe leaf registry that optional modules can use to self-register GDPR data providers. This is the foundation for the GDPR export and erasure endpoints. Key features: - Map-keyed by stable string key (prevents double-registration) - Validates key, axis, retention, export, and erase parameters - Sequential execution with error propagation (fail-closed) - _reset() helper for testing Closes pierreb-devkit#3883
WalkthroughA new Map-keyed GDPR data provider registry module is created at ChangesGDPR Data Provider Registry
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related issues
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modules/users/lib/dataProvider.registry.js`:
- Around line 24-25: The validation check in the registerDataProvider function
currently only checks if the key is a non-empty string, but it allows
whitespace-only strings like ' ' to pass through. Update the validation
condition to also reject keys that consist only of whitespace characters by
using the trim() method to remove leading and trailing whitespace and then
checking if the resulting string is empty, ensuring that provider keys are truly
meaningful and non-whitespace.
- Line 86: The `getProviders()` function creates a shallow copy of the Map, but
the provider objects inside remain mutable references that can be altered by
callers, allowing modification of `axis`, `retention`, `export`, or `erase`
properties outside of the intended registry control. Modify `getProviders()` to
return a deeply immutable version of the Map by either cloning each provider
object and freezing them, or by using Object.freeze() on cloned copies of the
provider objects before returning them in the new Map. This ensures that callers
cannot mutate the provider objects and bypass the registration control flow.
In `@modules/users/tests/dataProvider.registry.test.js`:
- Around line 10-293: Add JSDoc headers with `@param` and `@returns` documentation
to all functions introduced or modified in this test file to comply with
repository documentation standards. For each test callback function, inline
async function (such as exportFn, eraseFn, tasksExport, uploadsExport,
tasksErase, uploadsErase, eraseFn1, eraseFn2, etc.), and helper function like
_reset, registerDataProvider, getProviders, runDataExport, and runDataErasure,
include JSDoc blocks that clearly document what parameters the function accepts
and what it returns, even for test files.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: bcf10f0b-6273-46ae-ac17-c70b86abb31a
📒 Files selected for processing (2)
modules/users/lib/dataProvider.registry.jsmodules/users/tests/dataProvider.registry.test.js
| if (typeof key !== 'string' || !key) { | ||
| throw new TypeError('registerDataProvider: key must be a non-empty string'); |
There was a problem hiding this comment.
Reject whitespace-only provider keys.
At Line 24, ' ' passes validation and becomes a valid registry key. This makes module manifests/audit output ambiguous and defeats the “non-empty string” intent.
Suggested fix
- if (typeof key !== 'string' || !key) {
+ if (typeof key !== 'string' || !key.trim()) {
throw new TypeError('registerDataProvider: key must be a non-empty string');
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if (typeof key !== 'string' || !key) { | |
| throw new TypeError('registerDataProvider: key must be a non-empty string'); | |
| if (typeof key !== 'string' || !key.trim()) { | |
| throw new TypeError('registerDataProvider: key must be a non-empty string'); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/users/lib/dataProvider.registry.js` around lines 24 - 25, The
validation check in the registerDataProvider function currently only checks if
the key is a non-empty string, but it allows whitespace-only strings like ' '
to pass through. Update the validation condition to also reject keys that
consist only of whitespace characters by using the trim() method to remove
leading and trailing whitespace and then checking if the resulting string is
empty, ensuring that provider keys are truly meaningful and non-whitespace.
| * @description Get all registered providers (read-only view for testing/inspection). | ||
| * @returns {Map<string, Object>} | ||
| */ | ||
| export const getProviders = () => new Map(providers); |
There was a problem hiding this comment.
getProviders() leaks mutable provider objects despite “read-only view”.
At Line 86, new Map(providers) only copies the map container. The provider objects are still shared references, so callers can mutate axis, retention, export, or erase and silently alter runtime behavior outside registerDataProvider.
Suggested fix
-export const getProviders = () => new Map(providers);
+export const getProviders = () => new Map(
+ Array.from(providers.entries(), ([key, provider]) => [key, Object.freeze({ ...provider })]),
+);🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/users/lib/dataProvider.registry.js` at line 86, The `getProviders()`
function creates a shallow copy of the Map, but the provider objects inside
remain mutable references that can be altered by callers, allowing modification
of `axis`, `retention`, `export`, or `erase` properties outside of the intended
registry control. Modify `getProviders()` to return a deeply immutable version
of the Map by either cloning each provider object and freezing them, or by using
Object.freeze() on cloned copies of the provider objects before returning them
in the new Map. This ensures that callers cannot mutate the provider objects and
bypass the registration control flow.
| describe('DataProvider Registry', () => { | ||
| beforeEach(() => { | ||
| _reset(); | ||
| }); | ||
|
|
||
| describe('registerDataProvider', () => { | ||
| it('should register a valid provider', () => { | ||
| const exportFn = async () => ({}); | ||
| const eraseFn = async () => ({}); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: exportFn, | ||
| erase: eraseFn, | ||
| }); | ||
|
|
||
| const providers = getProviders(); | ||
| expect(providers.size).toBe(1); | ||
| expect(providers.has('tasks')).toBe(true); | ||
| }); | ||
|
|
||
| it('should throw TypeError for empty key', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: '', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: async () => ({}), | ||
| })).toThrow('registerDataProvider: key must be a non-empty string'); | ||
| }); | ||
|
|
||
| it('should throw TypeError for non-string key', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: 123, | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: async () => ({}), | ||
| })).toThrow('registerDataProvider: key must be a non-empty string'); | ||
| }); | ||
|
|
||
| it('should throw TypeError for invalid axis', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'invalid', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: async () => ({}), | ||
| })).toThrow('registerDataProvider: axis must be "user" or "org"'); | ||
| }); | ||
|
|
||
| it('should throw TypeError for invalid retention', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'invalid', | ||
| export: async () => ({}), | ||
| erase: async () => ({}), | ||
| })).toThrow('registerDataProvider: retention must be "delete" or "anonymize"'); | ||
| }); | ||
|
|
||
| it('should throw TypeError for non-function export', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: 'not a function', | ||
| erase: async () => ({}), | ||
| })).toThrow('registerDataProvider: export must be a function'); | ||
| }); | ||
|
|
||
| it('should throw TypeError for non-function erase', () => { | ||
| expect(() => registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: 'not a function', | ||
| })).toThrow('registerDataProvider: erase must be a function'); | ||
| }); | ||
|
|
||
| it('should overwrite provider with same key (key-dedup)', () => { | ||
| const exportFn1 = async () => ({ version: 1 }); | ||
| const eraseFn1 = async () => ({}); | ||
| const exportFn2 = async () => ({ version: 2 }); | ||
| const eraseFn2 = async () => ({}); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: exportFn1, | ||
| erase: eraseFn1, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'org', | ||
| retention: 'anonymize', | ||
| export: exportFn2, | ||
| erase: eraseFn2, | ||
| }); | ||
|
|
||
| const providers = getProviders(); | ||
| expect(providers.size).toBe(1); | ||
| expect(providers.get('tasks').axis).toBe('org'); | ||
| expect(providers.get('tasks').retention).toBe('anonymize'); | ||
| }); | ||
| }); | ||
|
|
||
| describe('runDataExport', () => { | ||
| it('should run single provider export', async () => { | ||
| const exportFn = async (payload) => ({ | ||
| tasks: [{ id: 1, title: 'Test Task' }], | ||
| userId: payload.userId, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: exportFn, | ||
| erase: async () => ({}), | ||
| }); | ||
|
|
||
| const result = await runDataExport({ userId: 'user123' }); | ||
|
|
||
| expect(result.data.tasks).toEqual({ | ||
| tasks: [{ id: 1, title: 'Test Task' }], | ||
| userId: 'user123', | ||
| }); | ||
| expect(result.modules).toEqual(['tasks']); | ||
| }); | ||
|
|
||
| it('should run multiple providers sequentially', async () => { | ||
| const order = []; | ||
|
|
||
| const tasksExport = async () => { | ||
| order.push('tasks'); | ||
| return { tasks: [] }; | ||
| }; | ||
|
|
||
| const uploadsExport = async () => { | ||
| order.push('uploads'); | ||
| return { uploads: [] }; | ||
| }; | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: tasksExport, | ||
| erase: async () => ({}), | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'uploads', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: uploadsExport, | ||
| erase: async () => ({}), | ||
| }); | ||
|
|
||
| const result = await runDataExport({ userId: 'user123' }); | ||
|
|
||
| expect(order).toEqual(['tasks', 'uploads']); | ||
| expect(result.modules).toEqual(['tasks', 'uploads']); | ||
| }); | ||
|
|
||
| it('should return empty data for zero providers', async () => { | ||
| const result = await runDataExport({ userId: 'user123' }); | ||
|
|
||
| expect(result.data).toEqual({}); | ||
| expect(result.modules).toEqual([]); | ||
| }); | ||
| }); | ||
|
|
||
| describe('runDataErasure', () => { | ||
| it('should run single provider erase', async () => { | ||
| const eraseFn = async (payload) => ({ | ||
| deleted: 5, | ||
| userId: payload.userId, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: eraseFn, | ||
| }); | ||
|
|
||
| const result = await runDataErasure({ userId: 'user123' }); | ||
|
|
||
| expect(result.results.tasks).toEqual({ | ||
| deleted: 5, | ||
| userId: 'user123', | ||
| }); | ||
| }); | ||
|
|
||
| it('should propagate errors (fail-closed)', async () => { | ||
| const eraseFn1 = async () => { | ||
| throw new Error('Provider 1 failed'); | ||
| }; | ||
|
|
||
| const eraseFn2 = async () => ({ | ||
| deleted: 3, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'failing', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: eraseFn1, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'success', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: eraseFn2, | ||
| }); | ||
|
|
||
| await expect(runDataErasure({ userId: 'user123' })) | ||
| .rejects.toThrow('Provider 1 failed'); | ||
| }); | ||
|
|
||
| it('should run providers sequentially', async () => { | ||
| const order = []; | ||
|
|
||
| const tasksErase = async () => { | ||
| order.push('tasks'); | ||
| return { deleted: 1 }; | ||
| }; | ||
|
|
||
| const uploadsErase = async () => { | ||
| order.push('uploads'); | ||
| return { deleted: 2 }; | ||
| }; | ||
|
|
||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: tasksErase, | ||
| }); | ||
|
|
||
| registerDataProvider({ | ||
| key: 'uploads', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: uploadsErase, | ||
| }); | ||
|
|
||
| await runDataErasure({ userId: 'user123' }); | ||
|
|
||
| expect(order).toEqual(['tasks', 'uploads']); | ||
| }); | ||
| }); | ||
|
|
||
| describe('_reset', () => { | ||
| it('should clear all registered providers', () => { | ||
| registerDataProvider({ | ||
| key: 'tasks', | ||
| axis: 'user', | ||
| retention: 'delete', | ||
| export: async () => ({}), | ||
| erase: async () => ({}), | ||
| }); | ||
|
|
||
| expect(getProviders().size).toBe(1); | ||
|
|
||
| _reset(); | ||
|
|
||
| expect(getProviders().size).toBe(0); | ||
| }); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift
Add JSDoc headers for all new/modified functions in this test file.
Multiple functions introduced in this file (test callbacks and helper functions) do not include JSDoc blocks with @param and @returns, which violates the repository JS documentation rule.
As per coding guidelines, "**/*.js: Every function must have a JSDoc header with @param and @returns" and "**/*.js: Every new or modified function must have a JSDoc header...".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@modules/users/tests/dataProvider.registry.test.js` around lines 10 - 293, Add
JSDoc headers with `@param` and `@returns` documentation to all functions introduced
or modified in this test file to comply with repository documentation standards.
For each test callback function, inline async function (such as exportFn,
eraseFn, tasksExport, uploadsExport, tasksErase, uploadsErase, eraseFn1,
eraseFn2, etc.), and helper function like _reset, registerDataProvider,
getProviders, runDataExport, and runDataErasure, include JSDoc blocks that
clearly document what parameters the function accepts and what it returns, even
for test files.
Source: Coding guidelines
What
Adds a config-free, import-safe leaf registry that optional modules can use to self-register GDPR data providers. This is the foundation piece needed for the GDPR export and erasure endpoints (issue #3884).
Why
Following the same pattern as
organizations/lib/orgRemoval.registry.js, but using aMapkeyed by stable string keys instead of aSetof function identities. This prevents double-registration when inline arrows are used in*.init.jsfiles.Changes
New file:
modules/users/lib/dataProvider.registry.jsregisterDataProvider({ key, axis, retention, export, erase })— validates all inputs, throwsTypeErroron bad paramsrunDataExport(payload)— runs all providers sequentially, returns{ data, modules }runDataErasure(payload)— runs all providers sequentially, errors propagate (fail-closed)getProviders()— returns a read-onlyMapfor inspection_reset()— test helper to clear all registrationsNew file:
modules/users/tests/dataProvider.registry.test.jsTesting
All 15 tests pass. You can run them with:
npm test -- --testPathPattern=dataProvider.registryNotes
This registry intentionally imports no services — it follows the same pattern as
orgRemoval.registry.jsto avoid import cycles. Individual modules (tasks, uploads, etc.) will self-register their providers in their respective*.init.jsfiles once this is merged.Closes #3883