Add documentation for Google Drive connector OAuth scope strategy and verification deliverables#1979
Add documentation for Google Drive connector OAuth scope strategy and verification deliverables#1979ricofurtado wants to merge 1 commit into
Conversation
… verification deliverables - Introduced a new document outlining the final OAuth scope strategy for Google Drive connector authentication. - Added detailed deliverables for Google verification items related to protecting ingested Drive data and implementing OAuth flow and connector account model. - Created a guide for verifying authorized domains for the Google OAuth app, ensuring compliance with Google's requirements.
WalkthroughFour new Markdown documents are added under ChangesGoogle Drive OAuth Verification Documentation
Estimated code review effort🎯 1 (Trivial) | ⏱️ ~5 minutes Suggested labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 5
🧹 Nitpick comments (3)
auth-app-verification/google_verification_item_8_deliverables.md (2)
217-226: 📐 Maintainability & Code Quality | 🔵 Trivial | 💤 Low valueAlign ACL field description with earlier section.
The index mapping table here describes ACL/user/group fields as "Terms-query compatible" only, but the "Fields to verify" table on Line 79 specified "Filterable exact-match or terms-query compatible fields." Use consistent language: "Exact-match or terms-query compatible" to accurately describe both single-value and multi-value ACL filtering.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth-app-verification/google_verification_item_8_deliverables.md` around lines 217 - 226, The ACL/user/group field description in the index mapping table is inconsistent with the earlier “Fields to verify” wording. Update the table entry in this deliverable section to use the same phrasing as the verification section, specifically referencing the ACL/user/group fields description so it reads “Exact-match or terms-query compatible” instead of only “Terms-query compatible.”
128-140: 🎯 Functional Correctness | 🔵 Trivial | ⚡ Quick winAdd explicit
source_typeverification to test scenarios.The cross-user and tenant test scenarios verify isolation but do not explicitly include a step to confirm that retrieved results are tagged with
source_type = google_drive. Without this, passing tests could theoretically match non-Drive content. Add a verification step:6. Assert that user B receives no chunks, sources, answers, or citations from user A's Drive document. +6a. Assert that any results returned do not have source_type = google_drive.Also applies to: 158-170
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth-app-verification/google_verification_item_8_deliverables.md` around lines 128 - 140, The minimum test scenario currently checks cross-user isolation but does not explicitly verify that matched results are Google Drive content. Update the relevant scenario(s) to add a step that asserts retrieved chunks, sources, answers, or citations are tagged with source_type = google_drive when user A’s Drive document is returned, using the same test flow in the deliverables section so the assertion is tied to the Drive connector behavior.auth-app-verification/google_verification_items_5_and_6_deliverables.md (1)
26-36: 🔒 Security & Privacy | 🔵 Trivial | ⚡ Quick winDocument state parameter validation as a security control.
The actual implementation at
src/connectors/google_drive/oauth.py:168-200enforcesself._flow_state != statebefore token exchange, which prevents CSRF attacks. This security control is not mentioned in the deliverables table but should be included in the OAuth flow description or security controls summary.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md` around lines 26 - 36, Add the missing OAuth state validation security control to the deliverables content by updating the OAuth flow description or security controls summary in the Google Drive verification document. Reference the existing Google Drive OAuth implementation in the oauth flow/state handling logic (the state check before token exchange) and explicitly note that the flow validates the returned state against the stored flow state to prevent CSRF before exchanging the code for tokens.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`:
- Line 34: The documentation currently claims refresh tokens are encrypted or
securely stored, but the visible implementation in revoke_credentials and the
token_file handling does not show encryption-at-rest. Update the affected
sections of google_verification_items_5_and_6_deliverables.md to describe only
what the code actually does for token storage and revocation, and remove any
unsupported encryption language unless it is backed by a corresponding
implementation in the OAuth flow or storage layer.
- Around line 103-124: The scope declaration in Google verification docs does
not match the actual GoogleDriveOAuth scope set, so align them by updating
either the docs or the implementation to be consistent. Use GoogleDriveOAuth and
its SCOPES definition as the source of truth: either remove the extra auth
scopes from the code or revise the documentation to include all requested
scopes, including drive.metadata.readonly, cloud-identity.groups.readonly,
admin.directory.group.readonly, and the user identity scopes if they remain in
use.
In `@auth-app-verification/verify_authorized_domains_openrag.md`:
- Around line 43-48: The example callback URLs in the table are inconsistent
with the documented Google auth endpoints. Update the examples in
verify_authorized_domains_openrag.md to use one consistent callback path, and
align it with the connector-specific route from the authentication plan rather
than the current /oauth/callback example; make sure the URL examples and the
authorized domain guidance match the same endpoint used by the auth flow.
- Line 16: The OAuth redirect URI documented here conflicts with the upstream
authentication plan and the Google Drive OAuth implementation. Update the
redirect URI path in this document so it matches the production callback used by
the Google Drive connector and the `google_drive_authentication_only_plan`
reference, keeping the `redirect_uri` path consistent with
`src/connectors/google_drive/oauth.py` for domain verification.
- Around line 64-71: The redirect URI example in the Google Drive OAuth setup
introduces a conflicting path variant. Update the documentation to use one
canonical redirect URI consistently across the auth plan and this guide, using
the same path in the relevant OAuth setup sections and any related references.
Check the Google Drive callback example and the authentication plan text
together so only the correct redirect URI format remains.
---
Nitpick comments:
In `@auth-app-verification/google_verification_item_8_deliverables.md`:
- Around line 217-226: The ACL/user/group field description in the index mapping
table is inconsistent with the earlier “Fields to verify” wording. Update the
table entry in this deliverable section to use the same phrasing as the
verification section, specifically referencing the ACL/user/group fields
description so it reads “Exact-match or terms-query compatible” instead of only
“Terms-query compatible.”
- Around line 128-140: The minimum test scenario currently checks cross-user
isolation but does not explicitly verify that matched results are Google Drive
content. Update the relevant scenario(s) to add a step that asserts retrieved
chunks, sources, answers, or citations are tagged with source_type =
google_drive when user A’s Drive document is returned, using the same test flow
in the deliverables section so the assertion is tied to the Drive connector
behavior.
In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`:
- Around line 26-36: Add the missing OAuth state validation security control to
the deliverables content by updating the OAuth flow description or security
controls summary in the Google Drive verification document. Reference the
existing Google Drive OAuth implementation in the oauth flow/state handling
logic (the state check before token exchange) and explicitly note that the flow
validates the returned state against the stored flow state to prevent CSRF
before exchanging the code for tokens.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: fc999f93-091f-4b51-b3ef-b941e5b723fd
⛔ Files ignored due to path filters (1)
auth-app-verification/Item 2-Final OAuth Scope Strategy for Google Drive Connector Authentication.docxis excluded by!**/*.docx
📒 Files selected for processing (4)
auth-app-verification/google_drive_authentication_only_plan.mdauth-app-verification/google_verification_item_8_deliverables.mdauth-app-verification/google_verification_items_5_and_6_deliverables.mdauth-app-verification/verify_authorized_domains_openrag.md
| | Scope used | Only `https://www.googleapis.com/auth/drive.readonly` | | ||
| | Scope justification | Explain why OpenRAG needs read-only Drive access | | ||
| | Disconnect flow | Show or explain how users disconnect Google Drive | | ||
| | Token handling summary | Explain that refresh tokens are stored securely and not logged | |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | 🏗️ Heavy lift
Token storage encryption claims are unsupported by implementation evidence.
The documentation repeatedly states refresh tokens are "encrypted or securely stored" (lines 34, 72, 168, 218), but the actual revoke_credentials implementation at src/connectors/google_drive/oauth.py:228-238 shows a plain on-disk file (self.token_file) removed via os.remove(). There is no visible encryption-at-rest in the provided implementation. Submitting encryption claims to Google without implementation evidence creates compliance risk if audited.
Also applies to: 72-72, 168-176, 218-226
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md` at
line 34, The documentation currently claims refresh tokens are encrypted or
securely stored, but the visible implementation in revoke_credentials and the
token_file handling does not show encryption-at-rest. Update the affected
sections of google_verification_items_5_and_6_deliverables.md to describe only
what the code actually does for token storage and revocation, and remove any
unsupported encryption language unless it is backed by a corresponding
implementation in the OAuth flow or storage layer.
| ## 2. Scope declaration | ||
|
|
||
| Declare only this scope: | ||
|
|
||
| ```text | ||
| https://www.googleapis.com/auth/drive.readonly | ||
| ``` | ||
|
|
||
| Do **not** include these if this release is connector-only: | ||
|
|
||
| ```text | ||
| openid | ||
| profile | ||
| ``` | ||
|
|
||
| Do **not** include full Drive write scope: | ||
|
|
||
| ```text | ||
| https://www.googleapis.com/auth/drive | ||
| ``` | ||
|
|
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical | 🏗️ Heavy lift
Scope declaration contradicts actual implementation.
The documentation instructs Google to declare only drive.readonly and explicitly excludes openid, email, profile. However, the actual GoogleDriveOAuth implementation requests all of these scopes:
openid,email,profile— listed here as "Do not include" but present inSCOPESdrive.metadata.readonly— required by implementation but omitted from documentationcloud-identity.groups.readonly,admin.directory.group.readonly— present inSCOPES
This cross-file inconsistency between auth-app-verification/google_verification_items_5_and_6_deliverables.md and src/connectors/google_drive/oauth.py:21-33 will cause Google verification rejection if submitted as-is. Either the implementation must be narrowed to match this documentation, or the documentation must accurately describe the full scope set.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth-app-verification/google_verification_items_5_and_6_deliverables.md`
around lines 103 - 124, The scope declaration in Google verification docs does
not match the actual GoogleDriveOAuth scope set, so align them by updating
either the docs or the implementation to be consistent. Use GoogleDriveOAuth and
its SCOPES definition as the source of truth: either remove the extra auth
scopes from the code or revise the documentation to include all requested
scopes, including drive.metadata.readonly, cloud-identity.groups.readonly,
admin.directory.group.readonly, and the user identity scopes if they remain in
use.
| | App/homepage URL | `https://openr.ag` or `https://docs.openr.ag` | Yes | | ||
| | Privacy policy URL | `https://openr.ag/privacy` | Yes | | ||
| | Terms of service URL | `https://openr.ag/terms` | Yes, if provided | | ||
| | OAuth redirect URI | `https://app.openr.ag/api/auth/google/callback` | Yes | |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🔴 Critical | ⚡ Quick win
Redirect URI path contradicts upstream authentication plan.
This document shows the OAuth redirect URI as https://app.openr.ag/api/auth/google/callback, but the upstream google_drive_authentication_only_plan.md:200 specifies /connectors/google-drive/oauth/callback. The implementation at src/connectors/google_drive/oauth.py:168-200 uses a parameterized redirect_uri without hardcoding a path. These documents must agree on the production redirect URI path for domain verification to succeed.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth-app-verification/verify_authorized_domains_openrag.md` at line 16, The
OAuth redirect URI documented here conflicts with the upstream authentication
plan and the Google Drive OAuth implementation. Update the redirect URI path in
this document so it matches the production callback used by the Google Drive
connector and the `google_drive_authentication_only_plan` reference, keeping the
`redirect_uri` path consistent with `src/connectors/google_drive/oauth.py` for
domain verification.
| | URL used | Authorized domain to add | | ||
| |---|---| | ||
| | `https://app.openr.ag/oauth/callback` | `openr.ag` | | ||
| | `https://docs.openr.ag/privacy` | `openr.ag` | | ||
| | `https://openrag.ibm.com/callback` | `ibm.com` | | ||
| | `https://openrag.cloud.ibm.com/callback` | `ibm.com` or possibly `cloud.ibm.com`, depending on Google Console validation | |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Example callback paths are inconsistent with documented endpoints.
Line 45 shows https://app.openr.ag/oauth/callback which doesn't match either:
- The
/api/auth/google/callbackpath on Line 16, or - The
/connectors/google-drive/oauth/callbackpath from the upstream authentication plan
Use a single, consistent callback path across all documentation. Prefer the connector-specific path from the authentication plan.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth-app-verification/verify_authorized_domains_openrag.md` around lines 43 -
48, The example callback URLs in the table are inconsistent with the documented
Google auth endpoints. Update the examples in
verify_authorized_domains_openrag.md to use one consistent callback path, and
align it with the connector-specific route from the authentication plan rather
than the current /oauth/callback example; make sure the URL examples and the
authorized domain guidance match the same endpoint used by the auth flow.
| A cleaner setup is: | ||
|
|
||
| ```text | ||
| Homepage: https://openr.ag | ||
| Privacy policy: https://openr.ag/privacy | ||
| Redirect URI: https://app.openr.ag/api/connectors/google-drive/oauth/callback | ||
| Authorized domain: openr.ag | ||
| ``` |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Redirect URI example introduces another path variant.
Line 69 shows https://app.openr.ag/api/connectors/google-drive/oauth/callback which adds an /api/ prefix not present in the upstream authentication plan's /connectors/google-drive/oauth/callback. Consolidate all documentation to use one canonical redirect URI path. If /api/ is the correct API prefix, update the authentication plan to match; otherwise, remove it here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@auth-app-verification/verify_authorized_domains_openrag.md` around lines 64 -
71, The redirect URI example in the Google Drive OAuth setup introduces a
conflicting path variant. Update the documentation to use one canonical redirect
URI consistently across the auth plan and this guide, using the same path in the
relevant OAuth setup sections and any related references. Check the Google Drive
callback example and the authentication plan text together so only the correct
redirect URI format remains.
Summary by CodeRabbit