36 changes: 36 additions & 0 deletions assistant-output-disclosure-guard/README.md
@@ -0,0 +1,36 @@
# Assistant Output Disclosure Guard

This self-contained slice of the AI-Powered Research Assistant Suite validates synthetic AI assistant outputs before they are shown to researchers, collaborators, reviewers, or public discovery feeds.

The guard is distinct from broad assistant suites, citation recency checks, model-assumption diagnostics, dependency guards, reviewer calibration, external-validity transfer, sample custody, structured abstracts, and preregistration consistency. It focuses on output release safety: what an AI assistant is about to reveal.

## What it checks

- PHI, participant IDs, email addresses, and direct identifiers
- private storage links and unpublished dataset locations
- embargoed project details in public assistant output
- double-blind reviewer identity leakage
- prompt, system, or hidden-instruction leakage
- unsupported high-impact claims in review or gap-finder prose
- missing human approval for restricted outputs
- missing de-identification, redaction, consent, or data-use evidence
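
The direct-identifier check in the list above can be pictured as a small pattern scan over the generated text. The following is a minimal sketch, not the guard's actual rule set: the `IDENTIFIER_PATTERNS` table, the `findDirectIdentifiers` helper, and both regular expressions (including the assumed "MRN plus 6 to 10 digits" format) are hypothetical. Only the codes `EMAIL_ADDRESS` and `MEDICAL_RECORD_NUMBER` are taken from the sample results.

```javascript
// Hypothetical pattern table; the guard's real rules are not shown in this README.
const IDENTIFIER_PATTERNS = [
  { code: "EMAIL_ADDRESS", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/ },
  // Assumed format: "MRN", an optional separator, then 6 to 10 digits.
  { code: "MEDICAL_RECORD_NUMBER", pattern: /\bMRN[-:\s]?\d{6,10}\b/i },
];

// Return the code of every pattern that matches the generated text.
function findDirectIdentifiers(text) {
  return IDENTIFIER_PATTERNS
    .filter(({ pattern }) => pattern.test(text))
    .map(({ code }) => code);
}

const sample = "Contact reviewer.a@example.org about MRN-00412345.";
console.log(findDirectIdentifiers(sample));
console.log(findDirectIdentifiers("No identifiers here."));
```

The patterns deliberately omit the `g` flag so that repeated `test` calls stay stateless.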

## Run locally

```sh
npm test
npm run demo
swift scripts/make-demo-video.swift artifacts/assistant-disclosure-demo.mp4
```

The demo writes reviewer artifacts under `artifacts/`:

- `assistant-disclosure-results.json`
- `assistant-disclosure-report.md`
- `assistant-disclosure-summary.svg`
- `assistant-disclosure-demo.mp4`
- `demo-transcript.md`

## Boundaries

All packets are synthetic. The module does not call external AI APIs and does not touch private manuscript stores, publisher systems, patient databases, credentials, payment systems, or live repositories.
16 changes: 16 additions & 0 deletions assistant-output-disclosure-guard/REQUIREMENT_MAP.md
@@ -0,0 +1,16 @@
# Requirement Map

| Issue requirement | Implementation |
| --- | --- |
| Auto peer review reports | Checks peer-review assistant drafts for direct identifiers, reviewer leaks, unsupported claims, and missing redaction evidence before release. |
| Claims vs. evidence alignment | Reviews high-impact assistant claims for citation anchors and source support markers. |
| Reproducibility checker | Blocks reproducibility assistant output that exposes private storage links, internal run IDs, or non-redacted environment details. |
| Research gap finder | Reviews opportunity-feed text for embargoed project leakage and private lab capability exposure. |
| Real-time insights with rigor | Produces deterministic RELEASE, REVIEW, and HOLD decisions with remediation actions. |
| Project safety before public release | Enforces audience, embargo, double-blind, and human-approval gates before assistant output is visible. |
| Reviewer-ready evidence | Demo script generates JSON, Markdown, SVG, transcript, and MP4 artifacts for local replay. |
| Safe contribution boundary | Uses only synthetic packets and no external APIs, credentials, private manuscripts, or live data. |
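
One way to read the "deterministic RELEASE, REVIEW, and HOLD decisions" row is that the most severe reason attached to a packet determines its decision. The sketch below assumes exactly that rule. The `decide` function and `SEVERITY_RANK` table are hypothetical names, while the `HOLD_OUTPUT`, `REVIEW_BEFORE_RELEASE`, and `RELEASE_OUTPUT` labels come from the sample results file.

```javascript
// Severity labels as they appear in the sample results; the ranks are an assumption.
const SEVERITY_RANK = { HOLD_OUTPUT: 2, REVIEW_BEFORE_RELEASE: 1 };

// Assumed rule: the most severe reason wins, and no reasons means release.
function decide(reasons) {
  const worst = reasons.reduce(
    (max, reason) => Math.max(max, SEVERITY_RANK[reason.severity] ?? 0),
    0
  );
  if (worst === 2) return "HOLD_OUTPUT";
  if (worst === 1) return "REVIEW_BEFORE_RELEASE";
  return "RELEASE_OUTPUT";
}

console.log(decide([]));
console.log(decide([{ severity: "REVIEW_BEFORE_RELEASE" }]));
console.log(decide([{ severity: "REVIEW_BEFORE_RELEASE" }, { severity: "HOLD_OUTPUT" }]));
```

Because the function depends only on its input, the same packet always yields the same decision, which is what makes the output replayable by a reviewer.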

## Distinct slice statement

This contribution focuses only on disclosure safety for generated assistant output. It does not implement a general AI assistant, citation retraction checks, dependency reproducibility, model assumptions, reviewer calibration, sample custody, structured abstracts, preregistration consistency, or external-validity transfer.
Binary file not shown.
@@ -0,0 +1,46 @@
# Assistant Output Disclosure Report

As of: 2026-06-18

## Summary

- Total packets: 5
- Release: 1
- Review: 1
- Hold: 3
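
The counts above can be reproduced by tallying the `decision` field of each result. This is a minimal sketch under that assumption; the `summarize` helper is hypothetical, and the five decisions in the example are copied from the packet table in this report.

```javascript
// Tally decisions into the same shape as the "summary" object in the results file.
function summarize(results) {
  const summary = { totalPackets: results.length, release: 0, review: 0, hold: 0 };
  for (const { decision } of results) {
    if (decision === "RELEASE_OUTPUT") summary.release += 1;
    else if (decision === "REVIEW_BEFORE_RELEASE") summary.review += 1;
    else if (decision === "HOLD_OUTPUT") summary.hold += 1;
  }
  return summary;
}

// Decisions of the five sample packets, in report order.
const decisions = [
  { decision: "RELEASE_OUTPUT" },
  { decision: "HOLD_OUTPUT" },
  { decision: "HOLD_OUTPUT" },
  { decision: "REVIEW_BEFORE_RELEASE" },
  { decision: "HOLD_OUTPUT" },
];
console.log(summarize(decisions));
// prints { totalPackets: 5, release: 1, review: 1, hold: 3 }
```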

## Packet Decisions

| Packet | Mode | Visibility | Decision | Primary reason |
| --- | --- | --- | --- | --- |
| Sanitized internal peer-review note | auto-peer-review | team | RELEASE_OUTPUT | All disclosure checks passed |
| Peer-review note with direct identifiers | auto-peer-review | external-review | HOLD_OUTPUT | DOUBLE_BLIND_IDENTITY_LEAK: Double-blind output appears to reveal reviewer identity. |
| Reproducibility note with private storage | reproducibility-checker | public | HOLD_OUTPUT | DATA_USE_EVIDENCE_MISSING: Restricted assistant output lacks data-use evidence. |
| Research gap feed with missing source support | research-gap-finder | team | REVIEW_BEFORE_RELEASE | UNSUPPORTED_HIGH_IMPACT_CLAIM: Found 1 high-impact claim(s) without source support. |
| Assistant note with instruction leakage | auto-peer-review | team | HOLD_OUTPUT | PROMPT_OR_INSTRUCTION_LEAK: Generated text appears to expose prompt, system, or instruction content. |
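
In the sample data, each packet's primary reason is the first entry of its `reasons` array, and those arrays appear to be ordered by severity and then alphabetically by code. The sketch below assumes that ordering; it is inferred from the sample output rather than confirmed by the source, and `sortReasons` and `primaryReason` are hypothetical helper names.

```javascript
// Assumed ordering: HOLD_OUTPUT before REVIEW_BEFORE_RELEASE, then code A to Z.
const SEVERITY_ORDER = ["HOLD_OUTPUT", "REVIEW_BEFORE_RELEASE"];

function sortReasons(reasons) {
  return [...reasons].sort((a, b) => {
    const bySeverity =
      SEVERITY_ORDER.indexOf(a.severity) - SEVERITY_ORDER.indexOf(b.severity);
    if (bySeverity !== 0) return bySeverity;
    return a.code < b.code ? -1 : a.code > b.code ? 1 : 0;
  });
}

// Format the first sorted reason the way the "Primary reason" column shows it.
function primaryReason(reasons) {
  const [first] = sortReasons(reasons);
  return first ? `${first.code}: ${first.message}` : "All disclosure checks passed";
}

const reasons = [
  { severity: "REVIEW_BEFORE_RELEASE", code: "UNSUPPORTED_HIGH_IMPACT_CLAIM",
    message: "Found 1 high-impact claim(s) without source support." },
  { severity: "HOLD_OUTPUT", code: "EMAIL_ADDRESS",
    message: "Generated text matches email address pattern." },
  { severity: "HOLD_OUTPUT", code: "DOUBLE_BLIND_IDENTITY_LEAK",
    message: "Double-blind output appears to reveal reviewer identity." },
];
console.log(primaryReason(reasons));
console.log(primaryReason([]));
```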

## Remediation Actions

### Sanitized internal peer-review note
- Release assistant output to the configured audience.

### Peer-review note with direct identifiers
- Redact direct identifiers and rerun disclosure review before release.
- Replace reviewer names with role-neutral labels before release.
- Attach human approval before releasing restricted output.
- Run and attach redaction review evidence.
- Add a de-identification summary for reviewer replay.
- Rewrite claims with source support or downgrade the language.

### Reproducibility note with private storage
- Redact direct identifiers and rerun disclosure review before release.
- Keep assistant output private until the embargo expires.
- Attach human approval before releasing restricted output.
- Run and attach redaction review evidence.
- Attach data-use agreement or mark the output non-releasable.

### Research gap feed with missing source support
- Rewrite claims with source support or downgrade the language.

### Assistant note with instruction leakage
- Suppress output and inspect the assistant prompt chain.
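
The embargo hold on the reproducibility packet can be sketched as date arithmetic against the report's `As of` date. This is an illustration only: `embargoUntil` is a hypothetical packet field, the date `2026-07-30` is chosen here to reproduce the 42 days reported in the results file, and treating `embargoBufferDays` as extra days added to the remaining embargo is an assumption about what that policy field means.

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Parse YYYY-MM-DD as a UTC midnight timestamp to avoid timezone drift.
function utcDay(isoDate) {
  const [year, month, day] = isoDate.split("-").map(Number);
  return Date.UTC(year, month - 1, day);
}

// Days of embargo left as of the report date, never negative.
function embargoDaysRemaining(asOf, embargoUntil, embargoBufferDays = 0) {
  const days = Math.round((utcDay(embargoUntil) - utcDay(asOf)) / MS_PER_DAY);
  return Math.max(0, days + embargoBufferDays);
}

// Return a HOLD reason for public output under embargo, or null otherwise.
function checkEmbargo(packet, asOf, policy) {
  const remaining = embargoDaysRemaining(asOf, packet.embargoUntil, policy.embargoBufferDays);
  if (packet.visibility === "public" && remaining > 0) {
    return {
      severity: "HOLD_OUTPUT",
      code: "EMBARGOED_PUBLIC_OUTPUT",
      message: `Output is public while project remains embargoed for ${remaining} days.`,
    };
  }
  return null;
}

// embargoUntil below is a made-up value, not taken from the source packets.
const packet = { visibility: "public", embargoUntil: "2026-07-30" };
console.log(checkEmbargo(packet, "2026-06-18", { embargoBufferDays: 0 }));
```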
@@ -0,0 +1,288 @@
{
"asOf": "2026-06-18",
"policy": {
"embargoBufferDays": 0,
"unsupportedClaimVerbs": [
"proves",
"cures",
"guarantees",
"eliminates"
],
"restrictedAudienceRequiresApproval": true
},
"summary": {
"totalPackets": 5,
"release": 1,
"review": 1,
"hold": 3,
"heldPacketIds": [
"packet-phi-hold",
"packet-private-repro",
"packet-prompt-leak"
],
"reviewPacketIds": [
"packet-gap-review"
],
"topRisks": [
{
"packetId": "packet-phi-hold",
"severity": "HOLD_OUTPUT",
"code": "DOUBLE_BLIND_IDENTITY_LEAK"
},
{
"packetId": "packet-phi-hold",
"severity": "HOLD_OUTPUT",
"code": "EMAIL_ADDRESS"
},
{
"packetId": "packet-phi-hold",
"severity": "HOLD_OUTPUT",
"code": "MEDICAL_RECORD_NUMBER"
},
{
"packetId": "packet-phi-hold",
"severity": "HOLD_OUTPUT",
"code": "REDACTION_REVIEW_MISSING"
},
{
"packetId": "packet-phi-hold",
"severity": "HOLD_OUTPUT",
"code": "RESTRICTED_OUTPUT_LACKS_APPROVAL"
},
{
"packetId": "packet-private-repro",
"severity": "HOLD_OUTPUT",
"code": "DATA_USE_EVIDENCE_MISSING"
},
{
"packetId": "packet-private-repro",
"severity": "HOLD_OUTPUT",
"code": "EMBARGOED_PUBLIC_OUTPUT"
},
{
"packetId": "packet-private-repro",
"severity": "HOLD_OUTPUT",
"code": "INTERNAL_TOKEN_REFERENCE"
}
]
},
"results": [
{
"packetId": "packet-clean-review",
"title": "Sanitized internal peer-review note",
"assistantMode": "auto-peer-review",
"visibility": "team",
"audience": "project-authors",
"decision": "RELEASE_OUTPUT",
"disclosureSignals": {
"directIdentifierMatches": [],
"promptLeak": false,
"unsupportedClaimCount": 0,
"citationCount": 1
},
"evidence": {
"humanApproval": "review-lead-17",
"redactionReview": "passed",
"deidentificationSummary": "direct identifiers removed",
"dataUseAgreement": "dua-synthetic-2026"
},
"reasons": [],
"actions": [],
"riskScore": 0
},
{
"packetId": "packet-phi-hold",
"title": "Peer-review note with direct identifiers",
"assistantMode": "auto-peer-review",
"visibility": "external-review",
"audience": "double-blind-reviewers",
"decision": "HOLD_OUTPUT",
"disclosureSignals": {
"directIdentifierMatches": [
"EMAIL_ADDRESS",
"MEDICAL_RECORD_NUMBER"
],
"promptLeak": false,
"unsupportedClaimCount": 1,
"citationCount": 0
},
"evidence": {
"humanApproval": null,
"redactionReview": "missing",
"deidentificationSummary": null,
"dataUseAgreement": "dua-synthetic-2026"
},
"reasons": [
{
"severity": "HOLD_OUTPUT",
"code": "DOUBLE_BLIND_IDENTITY_LEAK",
"message": "Double-blind output appears to reveal reviewer identity."
},
{
"severity": "HOLD_OUTPUT",
"code": "EMAIL_ADDRESS",
"message": "Generated text matches email address pattern."
},
{
"severity": "HOLD_OUTPUT",
"code": "MEDICAL_RECORD_NUMBER",
"message": "Generated text matches medical record number pattern."
},
{
"severity": "HOLD_OUTPUT",
"code": "REDACTION_REVIEW_MISSING",
"message": "No passed redaction review is attached."
},
{
"severity": "HOLD_OUTPUT",
"code": "RESTRICTED_OUTPUT_LACKS_APPROVAL",
"message": "Restricted assistant output has no human approval evidence."
},
{
"severity": "REVIEW_BEFORE_RELEASE",
"code": "DEIDENTIFICATION_SUMMARY_MISSING",
"message": "No de-identification summary is attached."
},
{
"severity": "REVIEW_BEFORE_RELEASE",
"code": "UNSUPPORTED_HIGH_IMPACT_CLAIM",
"message": "Found 1 high-impact claim(s) without source support."
}
],
"actions": [
"Redact direct identifiers and rerun disclosure review before release.",
"Replace reviewer names with role-neutral labels before release.",
"Attach human approval before releasing restricted output.",
"Run and attach redaction review evidence.",
"Add a de-identification summary for reviewer replay.",
"Rewrite claims with source support or downgrade the language."
],
"riskScore": 12
},
{
"packetId": "packet-private-repro",
"title": "Reproducibility note with private storage",
"assistantMode": "reproducibility-checker",
"visibility": "public",
"audience": "public-project-page",
"decision": "HOLD_OUTPUT",
"disclosureSignals": {
"directIdentifierMatches": [
"PRIVATE_STORAGE_LINK",
"INTERNAL_TOKEN_REFERENCE"
],
"promptLeak": false,
"unsupportedClaimCount": 0,
"citationCount": 1
},
"evidence": {
"humanApproval": null,
"redactionReview": "missing",
"deidentificationSummary": "pending",
"dataUseAgreement": null
},
"reasons": [
{
"severity": "HOLD_OUTPUT",
"code": "DATA_USE_EVIDENCE_MISSING",
"message": "Restricted assistant output lacks data-use evidence."
},
{
"severity": "HOLD_OUTPUT",
"code": "EMBARGOED_PUBLIC_OUTPUT",
"message": "Output is public while project remains embargoed for 42 days."
},
{
"severity": "HOLD_OUTPUT",
"code": "INTERNAL_TOKEN_REFERENCE",
"message": "Generated text matches internal token reference pattern."
},
{
"severity": "HOLD_OUTPUT",
"code": "PRIVATE_STORAGE_LINK",
"message": "Generated text matches private storage link pattern."
},
{
"severity": "HOLD_OUTPUT",
"code": "REDACTION_REVIEW_MISSING",
"message": "No passed redaction review is attached."
},
{
"severity": "HOLD_OUTPUT",
"code": "RESTRICTED_OUTPUT_LACKS_APPROVAL",
"message": "Restricted assistant output has no human approval evidence."
}
],
"actions": [
"Redact direct identifiers and rerun disclosure review before release.",
"Keep assistant output private until the embargo expires.",
"Attach human approval before releasing restricted output.",
"Run and attach redaction review evidence.",
"Attach data-use agreement or mark the output non-releasable."
],
"riskScore": 12
},
{
"packetId": "packet-gap-review",
"title": "Research gap feed with missing source support",
"assistantMode": "research-gap-finder",
"visibility": "team",
"audience": "lab-members",
"decision": "REVIEW_BEFORE_RELEASE",
"disclosureSignals": {
"directIdentifierMatches": [],
"promptLeak": false,
"unsupportedClaimCount": 1,
"citationCount": 1
},
"evidence": {
"humanApproval": "gap-reviewer-4",
"redactionReview": "passed",
"deidentificationSummary": "no human subjects",
"dataUseAgreement": "not-required"
},
"reasons": [
{
"severity": "REVIEW_BEFORE_RELEASE",
"code": "UNSUPPORTED_HIGH_IMPACT_CLAIM",
"message": "Found 1 high-impact claim(s) without source support."
}
],
"actions": [
"Rewrite claims with source support or downgrade the language."
],
"riskScore": 1
},
{
"packetId": "packet-prompt-leak",
"title": "Assistant note with instruction leakage",
"assistantMode": "auto-peer-review",
"visibility": "team",
"audience": "project-authors",
"decision": "HOLD_OUTPUT",
"disclosureSignals": {
"directIdentifierMatches": [],
"promptLeak": true,
"unsupportedClaimCount": 0,
"citationCount": 0
},
"evidence": {
"humanApproval": "review-lead-19",
"redactionReview": "passed",
"deidentificationSummary": "not applicable",
"dataUseAgreement": "not-required"
},
"reasons": [
{
"severity": "HOLD_OUTPUT",
"code": "PROMPT_OR_INSTRUCTION_LEAK",
"message": "Generated text appears to expose prompt, system, or instruction content."
}
],
"actions": [
"Suppress output and inspect the assistant prompt chain."
],
"riskScore": 2
}
]
}
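
The `riskScore` values in the results above are consistent with a simple weighted count: 2 points per `HOLD_OUTPUT` reason and 1 point per `REVIEW_BEFORE_RELEASE` reason (for example, 5 holds and 2 reviews give 12). The sketch below encodes that reading. The weights are inferred from the five sample packets and are not confirmed by the source, and `RISK_WEIGHTS` and `riskScore` are hypothetical names.

```javascript
// Weights inferred from the sample packets; the real scoring rule is not shown.
const RISK_WEIGHTS = { HOLD_OUTPUT: 2, REVIEW_BEFORE_RELEASE: 1 };

// Sum the weight of every reason; unknown severities contribute nothing.
function riskScore(reasons) {
  return reasons.reduce(
    (total, reason) => total + (RISK_WEIGHTS[reason.severity] ?? 0),
    0
  );
}

// Build a list of n reasons with the given severity, for illustration.
const repeat = (severity, n) => Array.from({ length: n }, () => ({ severity }));

console.log(riskScore([])); // packet-clean-review: 0
console.log(riskScore([...repeat("HOLD_OUTPUT", 5), ...repeat("REVIEW_BEFORE_RELEASE", 2)])); // packet-phi-hold: 12
console.log(riskScore(repeat("REVIEW_BEFORE_RELEASE", 1))); // packet-gap-review: 1
```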