Skip to content

Add documentation for Google Drive connector OAuth scope strategy and verification deliverables#1979

Open
ricofurtado wants to merge 1 commit into
mainfrom
google-microsoft-auth-app-verification
Open

Add documentation for Google Drive connector OAuth scope strategy and verification deliverables#1979
ricofurtado wants to merge 1 commit into
mainfrom
google-microsoft-auth-app-verification

Conversation

@ricofurtado

@ricofurtado ricofurtado commented Jun 29, 2026

Copy link
Copy Markdown
Collaborator
  • Introduced a new document outlining the final OAuth scope strategy for Google Drive connector authentication.
  • Added detailed deliverables for Google verification items related to protecting ingested Drive data and implementing OAuth flow and connector account model.
  • Created a guide for verifying authorized domains for the Google OAuth app, ensuring compliance with Google's requirements.

Summary by CodeRabbit

  • Documentation
    • Added new guidance for Google OAuth domain verification, including required domains, ownership verification steps, and acceptance criteria.
    • Added detailed verification checklists and evidence requirements for Google Drive connector approval, including consent, privacy, security, and rollout materials.
    • Added documentation for authorization metadata and access-control expectations for Google Drive–ingested content.

… verification deliverables

- Introduced a new document outlining the final OAuth scope strategy for Google Drive connector authentication.
- Added detailed deliverables for Google verification items related to protecting ingested Drive data and implementing OAuth flow and connector account model.
- Created a guide for verifying authorized domains for the Google OAuth app, ensuring compliance with Google's requirements.
@coderabbitai

coderabbitai Bot commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

Four new Markdown documents are added under auth-app-verification/: a full Google Drive connector-only OAuth release plan, a verification deliverables guide for Items 5 and 6 (OAuth flow and connector account model), a deliverables specification for Item 8 (authorization metadata and retrieval filtering), and a domain verification guide for Google OAuth authorized domains.

Changes

Google Drive OAuth Verification Documentation

Layer / File(s) Summary
Authentication-only release plan
auth-app-verification/google_drive_authentication_only_plan.md
Defines the connector-only drive.readonly scope boundary, consent screen wording, backend OAuth endpoints, token storage/refresh/revoke requirements, connector account data model, Drive sync/reconcile behavior, per-document authorization metadata, privacy policy update requirements, restricted-scope verification package outline, production submission checklist, rollout tasks, and a clean release statement.
Verification Items 5 and 6 deliverables
auth-app-verification/google_verification_items_5_and_6_deliverables.md
Specifies evidence to provide for the OAuth flow (Item 5) and connector account model (Item 6): scope justification, demo video script, privacy policy requirements, data-flow diagram outline, security controls summary, artifact checklist, and a suggested internal submission message.
Verification Item 8 deliverables
auth-app-verification/google_verification_item_8_deliverables.md
Defines required authorization metadata fields for Drive-derived documents/chunks, OpenSearch index mapping evidence, server-side retrieval filter semantics for tenant and ACL isolation, and cross-user/tenant isolation test scenarios with acceptance criteria.
Authorized domains verification guide
auth-app-verification/verify_authorized_domains_openrag.md
Step-by-step instructions for identifying domains to verify, Google Search Console ownership verification methods, root domain selection, redirect URI and JavaScript origin requirements, pre-submission cleanup steps, OpenRAG-specific examples, and acceptance criteria.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~5 minutes

Suggested labels

documentation

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main documentation additions about OAuth scope strategy and Google verification deliverables.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch google-microsoft-auth-app-verification

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

🧹 Nitpick comments (3)
auth-app-verification/google_verification_item_8_deliverables.md (2)

217-226: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low value

Align ACL field description with earlier section.

The index mapping table here describes ACL/user/group fields as "Terms-query compatible" only, but the "Fields to verify" table on Line 79 specified "Filterable exact-match or terms-query compatible fields." Use consistent language: "Exact-match or terms-query compatible" to accurately describe both single-value and multi-value ACL filtering.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/google_verification_item_8_deliverables.md` around
lines 217 - 226, The ACL/user/group field description in the index mapping table
is inconsistent with the earlier “Fields to verify” wording. Update the table
entry in this deliverable section to use the same phrasing as the verification
section, specifically referencing the ACL/user/group fields description so it
reads “Exact-match or terms-query compatible” instead of only “Terms-query
compatible.”

128-140: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick win

Add explicit source_type verification to test scenarios.

The cross-user and tenant test scenarios verify isolation but do not explicitly include a step to confirm that retrieved results are tagged with source_type = google_drive. Without this, passing tests could theoretically match non-Drive content. Add a verification step:

 6. Assert that user B receives no chunks, sources, answers, or citations from user A's Drive document.
+6a. Assert that any results returned do not have source_type = google_drive.

Also applies to: 158-170

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/google_verification_item_8_deliverables.md` around
lines 128 - 140, The minimum test scenario currently checks cross-user isolation
but does not explicitly verify that matched results are Google Drive content.
Update the relevant scenario(s) to add a step that asserts retrieved chunks,
sources, answers, or citations are tagged with source_type = google_drive when
user A’s Drive document is returned, using the same test flow in the
deliverables section so the assertion is tied to the Drive connector behavior.
auth-app-verification/google_verification_items_5_and_6_deliverables.md (1)

26-36: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick win

Document state parameter validation as a security control.

The actual implementation at src/connectors/google_drive/oauth.py:168-200 enforces self._flow_state != state before token exchange, which prevents CSRF attacks. This security control is not mentioned in the deliverables table but should be included in the OAuth flow description or security controls summary.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`
around lines 26 - 36, Add the missing OAuth state validation security control to
the deliverables content by updating the OAuth flow description or security
controls summary in the Google Drive verification document. Reference the
existing Google Drive OAuth implementation in the oauth flow/state handling
logic (the state check before token exchange) and explicitly note that the flow
validates the returned state against the stored flow state to prevent CSRF
before exchanging the code for tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`:
- Line 34: The documentation currently claims refresh tokens are encrypted or
securely stored, but the visible implementation in revoke_credentials and the
token_file handling does not show encryption-at-rest. Update the affected
sections of google_verification_items_5_and_6_deliverables.md to describe only
what the code actually does for token storage and revocation, and remove any
unsupported encryption language unless it is backed by a corresponding
implementation in the OAuth flow or storage layer.
- Around line 103-124: The scope declaration in Google verification docs does
not match the actual GoogleDriveOAuth scope set, so align them by updating
either the docs or the implementation to be consistent. Use GoogleDriveOAuth and
its SCOPES definition as the source of truth: either remove the extra auth
scopes from the code or revise the documentation to include all requested
scopes, including drive.metadata.readonly, cloud-identity.groups.readonly,
admin.directory.group.readonly, and the user identity scopes if they remain in
use.

In `@auth-app-verification/verify_authorized_domains_openrag.md`:
- Around line 43-48: The example callback URLs in the table are inconsistent
with the documented Google auth endpoints. Update the examples in
verify_authorized_domains_openrag.md to use one consistent callback path, and
align it with the connector-specific route from the authentication plan rather
than the current /oauth/callback example; make sure the URL examples and the
authorized domain guidance match the same endpoint used by the auth flow.
- Line 16: The OAuth redirect URI documented here conflicts with the upstream
authentication plan and the Google Drive OAuth implementation. Update the
redirect URI path in this document so it matches the production callback used by
the Google Drive connector and the `google_drive_authentication_only_plan`
reference, keeping the `redirect_uri` path consistent with
`src/connectors/google_drive/oauth.py` for domain verification.
- Around line 64-71: The redirect URI example in the Google Drive OAuth setup
introduces a conflicting path variant. Update the documentation to use one
canonical redirect URI consistently across the auth plan and this guide, using
the same path in the relevant OAuth setup sections and any related references.
Check the Google Drive callback example and the authentication plan text
together so only the correct redirect URI format remains.

---

Nitpick comments:
In `@auth-app-verification/google_verification_item_8_deliverables.md`:
- Around line 217-226: The ACL/user/group field description in the index mapping
table is inconsistent with the earlier “Fields to verify” wording. Update the
table entry in this deliverable section to use the same phrasing as the
verification section, specifically referencing the ACL/user/group fields
description so it reads “Exact-match or terms-query compatible” instead of only
“Terms-query compatible.”
- Around line 128-140: The minimum test scenario currently checks cross-user
isolation but does not explicitly verify that matched results are Google Drive
content. Update the relevant scenario(s) to add a step that asserts retrieved
chunks, sources, answers, or citations are tagged with source_type =
google_drive when user A’s Drive document is returned, using the same test flow
in the deliverables section so the assertion is tied to the Drive connector
behavior.

In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`:
- Around line 26-36: Add the missing OAuth state validation security control to
the deliverables content by updating the OAuth flow description or security
controls summary in the Google Drive verification document. Reference the
existing Google Drive OAuth implementation in the oauth flow/state handling
logic (the state check before token exchange) and explicitly note that the flow
validates the returned state against the stored flow state to prevent CSRF
before exchanging the code for tokens.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: fc999f93-091f-4b51-b3ef-b941e5b723fd

📥 Commits

Reviewing files that changed from the base of the PR and between 74fe6be and 8e81cae.

⛔ Files ignored due to path filters (1)
  • auth-app-verification/Item 2-Final OAuth Scope Strategy for Google Drive Connector Authentication.docx is excluded by !**/*.docx
📒 Files selected for processing (4)
  • auth-app-verification/google_drive_authentication_only_plan.md
  • auth-app-verification/google_verification_item_8_deliverables.md
  • auth-app-verification/google_verification_items_5_and_6_deliverables.md
  • auth-app-verification/verify_authorized_domains_openrag.md

| Scope used | Only `https://www.googleapis.com/auth/drive.readonly` |
| Scope justification | Explain why OpenRAG needs read-only Drive access |
| Disconnect flow | Show or explain how users disconnect Google Drive |
| Token handling summary | Explain that refresh tokens are stored securely and not logged |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift

Token storage encryption claims are unsupported by implementation evidence.

The documentation repeatedly states refresh tokens are "encrypted or securely stored" (lines 34, 72, 168, 218), but the actual revoke_credentials implementation at src/connectors/google_drive/oauth.py:228-238 shows a plain on-disk file (self.token_file) removed via os.remove(). There is no visible encryption-at-rest in the provided implementation. Submitting encryption claims to Google without implementation evidence creates compliance risk if audited.

Also applies to: 72-72, 168-176, 218-226

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md` at
line 34, The documentation currently claims refresh tokens are encrypted or
securely stored, but the visible implementation in revoke_credentials and the
token_file handling does not show encryption-at-rest. Update the affected
sections of google_verification_items_5_and_6_deliverables.md to describe only
what the code actually does for token storage and revocation, and remove any
unsupported encryption language unless it is backed by a corresponding
implementation in the OAuth flow or storage layer.

Comment on lines +103 to +124
## 2. Scope declaration

Declare only this scope:

```text
https://www.googleapis.com/auth/drive.readonly
```

Do **not** include these if this release is connector-only:

```text
openid
email
profile
```

Do **not** include full Drive write scope:

```text
https://www.googleapis.com/auth/drive
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift

Scope declaration contradicts actual implementation.

The documentation instructs Google to declare only drive.readonly and explicitly excludes openid, email, profile. However, the actual GoogleDriveOAuth implementation requests all of these scopes:

  • openid, email, profile — listed here as "Do not include" but present in SCOPES
  • drive.metadata.readonly — required by implementation but omitted from documentation
  • cloud-identity.groups.readonly, admin.directory.group.readonly — present in SCOPES

This cross-file inconsistency between auth-app-verification/google_verification_items_5_and_6_deliverables.md and src/connectors/google_drive/oauth.py:21-33 will cause Google verification rejection if submitted as-is. Either the implementation must be narrowed to match this documentation, or the documentation must accurately describe the full scope set.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`
around lines 103 - 124, The scope declaration in Google verification docs does
not match the actual GoogleDriveOAuth scope set, so align them by updating
either the docs or the implementation to be consistent. Use GoogleDriveOAuth and
its SCOPES definition as the source of truth: either remove the extra auth
scopes from the code or revise the documentation to include all requested
scopes, including drive.metadata.readonly, cloud-identity.groups.readonly,
admin.directory.group.readonly, and the user identity scopes if they remain in
use.

| App/homepage URL | `https://openr.ag` or `https://docs.openr.ag` | Yes |
| Privacy policy URL | `https://openr.ag/privacy` | Yes |
| Terms of service URL | `https://openr.ag/terms` | Yes, if provided |
| OAuth redirect URI | `https://app.openr.ag/api/auth/google/callback` | Yes |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win

Redirect URI path contradicts upstream authentication plan.

This document shows the OAuth redirect URI as https://app.openr.ag/api/auth/google/callback, but the upstream google_drive_authentication_only_plan.md:200 specifies /connectors/google-drive/oauth/callback. The implementation at src/connectors/google_drive/oauth.py:168-200 uses a parameterized redirect_uri without hardcoding a path. These documents must agree on the production redirect URI path for domain verification to succeed.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/verify_authorized_domains_openrag.md` at line 16, The
OAuth redirect URI documented here conflicts with the upstream authentication
plan and the Google Drive OAuth implementation. Update the redirect URI path in
this document so it matches the production callback used by the Google Drive
connector and the `google_drive_authentication_only_plan` reference, keeping the
`redirect_uri` path consistent with `src/connectors/google_drive/oauth.py` for
domain verification.

Comment on lines +43 to +48
| URL used | Authorized domain to add |
|---|---|
| `https://app.openr.ag/oauth/callback` | `openr.ag` |
| `https://docs.openr.ag/privacy` | `openr.ag` |
| `https://openrag.ibm.com/callback` | `ibm.com` |
| `https://openrag.cloud.ibm.com/callback` | `ibm.com` or possibly `cloud.ibm.com`, depending on Google Console validation |

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Example callback paths are inconsistent with documented endpoints.

Line 45 shows https://app.openr.ag/oauth/callback which doesn't match either:

  • The /api/auth/google/callback path on Line 16, or
  • The /connectors/google-drive/oauth/callback path from the upstream authentication plan

Use a single, consistent callback path across all documentation. Prefer the connector-specific path from the authentication plan.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/verify_authorized_domains_openrag.md` around lines 43 -
48, The example callback URLs in the table are inconsistent with the documented
Google auth endpoints. Update the examples in
verify_authorized_domains_openrag.md to use one consistent callback path, and
align it with the connector-specific route from the authentication plan rather
than the current /oauth/callback example; make sure the URL examples and the
authorized domain guidance match the same endpoint used by the auth flow.

Comment on lines +64 to +71
A cleaner setup is:

```text
Homepage: https://openr.ag
Privacy policy: https://openr.ag/privacy
Redirect URI: https://app.openr.ag/api/connectors/google-drive/oauth/callback
Authorized domain: openr.ag
```

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win

Redirect URI example introduces another path variant.

Line 69 shows https://app.openr.ag/api/connectors/google-drive/oauth/callback which adds an /api/ prefix not present in the upstream authentication plan's /connectors/google-drive/oauth/callback. Consolidate all documentation to use one canonical redirect URI path. If /api/ is the correct API prefix, update the authentication plan to match; otherwise, remove it here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@auth-app-verification/verify_authorized_domains_openrag.md` around lines 64 -
71, The redirect URI example in the Google Drive OAuth setup introduces a
conflicting path variant. Update the documentation to use one canonical redirect
URI consistently across the auth plan and this guide, using the same path in the
relevant OAuth setup sections and any related references. Check the Google Drive
callback example and the authentication plan text together so only the correct
redirect URI format remains.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant