feat: gdrive export and encryption service integration #5250

Open
Sentiaus wants to merge 11 commits into apache:main from Sentiaus:gdrive/backend

@Sentiaus
Contributor

What changes were proposed in this PR?

Adds the backend required for Google Drive OAuth integration.

Schema changes: Adds a new user_oauth_token table (sql/updates/23.sql) to store encrypted OAuth tokens per provider. The provider column (google_drive, etc.) is intentionally generic so future integrations (AWS, Microsoft) can reuse the same table without a schema change. The auth blob is stored as a JWE-encrypted JSON string rather than a raw token.

Token encryption: Adds TokenEncryptionService using jose4j AES-256-GCM (DIRECT key management) to encrypt auth blobs at rest. The encryption key is read from auth.encryption.256-bit-secret in auth.conf, with AUTH_ENCRYPTION_SECRET as the env-var override. This follows the same pattern as the existing JWT secret key.
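For readers unfamiliar with the primitive, here is a minimal sketch of an AES-256-GCM encrypt/decrypt round trip. It is not the PR's TokenEncryptionService: it uses the JDK's javax.crypto instead of jose4j, and it emits a plain base64url string rather than a JWE compact serialization. The object name, the resolveKey helper, and the choice to treat the secret's UTF-8 bytes as the key are all assumptions made for illustration.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.security.SecureRandom
import java.util.Base64
import javax.crypto.Cipher
import javax.crypto.spec.{GCMParameterSpec, SecretKeySpec}

// Hypothetical stand-in, NOT the PR's TokenEncryptionService.
object BlobCipherSketch {
  private val IvBytes = 12  // 96-bit nonce, the usual size for GCM
  private val TagBits = 128 // authentication tag length
  private val random = new SecureRandom()

  // Assumed lookup order: the env var, when present, overrides the config value.
  def resolveKey(env: Map[String, String], configValue: String): Array[Byte] = {
    val raw = env.getOrElse("AUTH_ENCRYPTION_SECRET", configValue).getBytes(UTF_8)
    require(raw.length == 32, s"AES-256 needs a 32-byte key, got ${raw.length}")
    raw
  }

  // Returns base64url(iv || ciphertext || tag); a fresh random IV is used per call.
  def encrypt(plaintext: String, key: Array[Byte]): String = {
    val iv = new Array[Byte](IvBytes)
    random.nextBytes(iv)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TagBits, iv))
    val ct = cipher.doFinal(plaintext.getBytes(UTF_8))
    Base64.getUrlEncoder.withoutPadding.encodeToString(iv ++ ct)
  }

  // Throws if the key is wrong or the ciphertext was modified (GCM tag check fails).
  def decrypt(token: String, key: Array[Byte]): String = {
    val (iv, ct) = Base64.getUrlDecoder.decode(token).splitAt(IvBytes)
    val cipher = Cipher.getInstance("AES/GCM/NoPadding")
    cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TagBits, iv))
    new String(cipher.doFinal(ct), UTF_8)
  }
}
```

Because GCM is authenticated, decrypting with the wrong key or a modified blob fails with an exception instead of returning garbage. That is presumably the behaviour the "invalid-input error case" unit test exercises.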

New endpoints — GoogleDriveAuthResource:

GET /api/auth/google/drive/connect — Returns a Google OAuth authorization URL for the frontend to open in a popup. Accepts a reauth query param; when true, sets prompt=consent to force Google to re-issue a refresh token (used when a previous token has returned invalid_grant). Requires REGULAR or ADMIN role.

GET /api/auth/google/drive/callback — Called by Google's OAuth redirect. Not role-gated (no Authorization header is present on a browser redirect). Authenticates the user via a short-lived JWT in the state query parameter, exchanges the code for tokens, encrypts the auth blob, and upserts into user_oauth_token.

GET /api/auth/google/drive/token — Decrypts the stored auth blob, uses the refresh token to fetch a short-lived access token from Google, and returns it to the frontend. Returns no_refresh_token if no record exists, or invalid_grant if Google rejects the refresh token. Requires REGULAR or ADMIN role.

GET /api/auth/google/config — Exposes clientId and redirectUri to the frontend so the Drive service doesn't need to hardcode them.
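The three outcomes of the /token endpoint described above can be sketched as a small pure function. This is an illustrative model, not the code in GoogleDriveAuthResource: the type names and the injected decrypt, extractRefreshToken, and refresh functions are hypothetical placeholders for the database lookup, TokenEncryptionService, and the call to Google.

```scala
// Hypothetical model of the /token decision flow; all names are illustrative.
object DriveTokenFlowSketch {
  sealed trait RefreshResult
  final case class Refreshed(accessToken: String) extends RefreshResult
  case object InvalidGrant extends RefreshResult

  sealed trait TokenResponse
  final case class AccessToken(value: String) extends TokenResponse
  final case class ErrorCode(code: String) extends TokenResponse

  def issue(
      storedBlob: Option[String],            // encrypted auth blob, if a record exists
      decrypt: String => String,             // stands in for the encryption service
      extractRefreshToken: String => String, // pulls the refresh token out of the JSON
      refresh: String => RefreshResult       // stands in for the request to Google
  ): TokenResponse =
    storedBlob match {
      case None => ErrorCode("no_refresh_token")
      case Some(blob) =>
        refresh(extractRefreshToken(decrypt(blob))) match {
          case Refreshed(token) => AccessToken(token)
          case InvalidGrant     => ErrorCode("invalid_grant")
        }
    }
}
```

On invalid_grant, the frontend would be expected to call /connect with reauth=true so that Google issues a new refresh token.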

Config: Adds google.client-id, google.client-secret, and app-domain to UserSystemConfig and user-system.conf. These must be configured on the Texera GCP project before Drive integration will work.
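To show how the configured client ID and redirect URI could feed the /connect endpoint, here is a minimal sketch of building the Google authorization URL. The endpoint and query parameter names are Google's documented OAuth 2.0 parameters. The object name and the drive.file scope are assumptions, since the PR description does not say which scope is requested.

```scala
import java.net.URLEncoder

// Hypothetical helper; not the PR's implementation.
object DriveAuthUrlSketch {
  private val AuthEndpoint = "https://accounts.google.com/o/oauth2/v2/auth"
  // Assumed scope: the PR text does not name the Drive scope it requests.
  private val Scope = "https://www.googleapis.com/auth/drive.file"

  def build(clientId: String, redirectUri: String, state: String, reauth: Boolean): String = {
    val base = Seq(
      "client_id" -> clientId,
      "redirect_uri" -> redirectUri,
      "response_type" -> "code",
      "scope" -> Scope,
      "access_type" -> "offline", // asks Google to return a refresh token
      "state" -> state
    )
    // prompt=consent forces the consent screen so a refresh token is re-issued.
    val params = if (reauth) base :+ ("prompt" -> "consent") else base
    val query = params.map { case (k, v) => s"$k=${URLEncoder.encode(v, "UTF-8")}" }.mkString("&")
    s"$AuthEndpoint?$query"
  }
}
```

Keeping prompt=consent behind the reauth flag means the consent screen is only forced when a stored refresh token has already been rejected.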

Any related issues, documentation, discussions?

Closes #4240 (partial — frontend in follow-up PRs)

Google Documentation to enable Google Picker: https://developers.google.com/workspace/drive/picker/guides/overview

How was this PR tested?

  • sbt "Auth/testOnly org.apache.texera.auth.TokenEncryptionServiceSpec" — 2 unit tests covering encrypt/decrypt round-trip and invalid-input error case
  • Backend compiles cleanly: sbt amber/compile
  • The /callback endpoint was tested manually via the full OAuth flow in a local dev environment

Was this PR authored or co-authored using generative AI tooling?

Commit messages and some implementation co-authored with Claude Sonnet 4.6

Sentiaus and others added 3 commits May 26, 2026 23:59
… DB schema

- Add user_oauth_token table to store encrypted OAuth refresh tokens per provider
- Add TokenEncryptionService using jose4j AES-256-GCM for encrypting auth blobs
- Add AuthConfig.encryptionSecretKey reading from auth.encryption.256-bit-secret
- Add GoogleDriveAuthResource with /connect, /callback, and /token endpoints
- Add GoogleAuthResource config endpoint exposing client ID and redirect URI
- Add DriveTokenIssueResponse and GoogleAuthConfigResponse HTTP models
- Wire GoogleDriveAuthResource into TexeraWebApplication and GuestAuthFilter
- Add google.client-id, client-secret, and app-domain to UserSystemConfig
- Update k8s values with new config keys

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@codecov-commenter

codecov-commenter commented May 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 49.07%. Comparing base (ca829ec) to head (dacd149).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #5250      +/-   ##
============================================
- Coverage     49.11%   49.07%   -0.04%     
- Complexity     2378     2380       +2     
============================================
  Files          1051     1050       -1     
  Lines         40342    40301      -41     
  Branches       4277     4266      -11     
============================================
- Hits          19812    19777      -35     
+ Misses        19373    19368       -5     
+ Partials       1157     1156       -1     
| Flag | Coverage Δ | *Carryforward flag |
|---|---|---|
| access-control-service | 41.89% <ø> (+2.36%) ⬆️ | |
| agent-service | 33.76% <ø> (ø) | Carriedforward from 8a3b777 |
| amber | 51.58% <ø> (+<0.01%) ⬆️ | Carriedforward from 8a3b777 |
| computing-unit-managing-service | 0.00% <ø> (ø) | |
| config-service | 0.00% <ø> (ø) | |
| file-service | 38.42% <ø> (+0.42%) ⬆️ | |
| frontend | 40.90% <ø> (-0.16%) ⬇️ | Carriedforward from 8a3b777 |
| python | 90.79% <ø> (ø) | Carriedforward from 8a3b777 |
| workflow-compiling-service | 56.81% <ø> (ø) | |

*This pull request uses carry forward flags.

@Sentiaus changed the title from "feat: GDrive Backend integration" to "feat: gdrive export and encryption service integration" on May 27, 2026
@Sentiaus Sentiaus mentioned this pull request May 27, 2026
5 tasks
@chenlica chenlica requested a review from xuang7 May 28, 2026 00:12
Contributor

@xuang7 xuang7 left a comment

Thanks for the PR! Left a few comments. Please follow the formatting instructions in the contributing guide and fix the formatting issues.

Comment thread amber/src/main/scala/org/apache/texera/web/auth/GuestAuthFilter.scala Outdated
logger.error("Google token exchange failed in callback", e)
Response.status(Response.Status.BAD_GATEWAY).build()
case e: Exception =>
logger.error("Unexpected error in OAuth callback", e)
Contributor

We could consider returning an error message to the opener window and closing the OAuth popup.

@QueryParam("reauth") @DefaultValue("false") reauth: Boolean
): Response = {
val user = sessionUser.getUser
val state = JwtAuth.jwtToken(jwtClaims(user, TOKEN_EXPIRE_TIME_IN_MINUTES))
Contributor

We should avoid using the normal session JWT as the OAuth state. Since it is still a valid login token before expiration, it may be safer to use a dedicated short-lived OAuth state token instead.

Comment thread bin/k8s/values-development.yaml Outdated
Comment thread bin/k8s/values.yaml Outdated
Comment thread common/config/src/main/resources/auth.conf Outdated
Comment thread common/config/src/main/scala/org/apache/texera/config/AuthConfig.scala Outdated
Comment thread sql/updates/23.sql

try {
val blob = mapper.readTree(TokenEncryptionService.decrypt(record.getAuthBlob))
val refreshToken = blob.get("refreshToken").asText()
Contributor

Suggest using path("refreshToken").asText("") here to avoid a possible NPE when the field is missing.

Contributor Author

@xuang7 I can add this, but since this is already wrapped in a try-catch, I think the current error is fine and better defined than getting "", sending a request to Google, and getting an error there.
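For context on the two options discussed in this thread, here is a minimal sketch of how Jackson's get and path behave when the field is absent. It assumes jackson-databind is on the classpath; the object and method names are hypothetical and not taken from the PR.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

// Hypothetical comparison of the two lookups; not code from the PR.
object RefreshTokenLookupSketch {
  private val mapper = new ObjectMapper()

  // get(...) returns null for an absent field, so .asText() throws a
  // NullPointerException, which Try captures as a Failure here.
  def viaGet(json: String): Try[String] =
    Try(mapper.readTree(json).get("refreshToken").asText())

  // path(...) returns a MissingNode for an absent field, and asText("")
  // then yields the supplied default instead of throwing.
  def viaPath(json: String): String =
    mapper.readTree(json).path("refreshToken").asText("")
}
```

With the path variant, the caller would still need to check for an empty string before contacting Google, which addresses the concern about sending a request that is bound to fail.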

Sentiaus and others added 2 commits May 28, 2026 00:31
…ogleDriveAuthResource

OAuth state is now a UUID stored in a ConcurrentHashMap with a 10-minute TTL,
consumed exactly once on callback. Removes JwtParser/JwtAuth dependency from
the Drive resource and avoids encoding user info in the callback URL.

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
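The one-time state mechanism described in this commit message can be sketched as follows. This is an illustrative reconstruction from the description only (UUID state, ConcurrentHashMap, 10-minute TTL, consumed once). The class name, the Int user id, and the injectable clock are assumptions, and the actual code in GoogleDriveAuthResource may differ.

```scala
import java.util.UUID
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch of a single-use OAuth state store with a TTL.
final class OAuthStateStoreSketch(
    ttlMillis: Long = 10 * 60 * 1000L,                  // 10-minute TTL, as in the commit message
    now: () => Long = () => System.currentTimeMillis()  // injectable clock, for testing
) {
  private case class Entry(uid: Int, expiresAt: Long)
  private val pending = new ConcurrentHashMap[String, Entry]()

  // Called from /connect: remember which user started the flow.
  def issue(uid: Int): String = {
    val state = UUID.randomUUID().toString
    pending.put(state, Entry(uid, now() + ttlMillis))
    state
  }

  // Called from /callback: remove() is atomic, so a state can be redeemed only once.
  def consume(state: String): Option[Int] =
    Option(pending.remove(state)).filter(_.expiresAt > now()).map(_.uid)

  // Drops abandoned flows so the map does not grow without bound.
  def evictExpired(): Unit = {
    val t = now()
    val it = pending.values().iterator()
    while (it.hasNext) if (it.next().expiresAt <= t) it.remove()
  }
}
```

An in-memory map like this only works when /connect and /callback are served by the same process. A deployment with several replicas would need a shared store.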
@github-actions github-actions Bot removed the dev label May 28, 2026
xuang7 and others added 5 commits May 28, 2026 13:32
Removed random secret key for eSecretKey
Added default asText("") to avoid NPE
…_token

- Add DELETE /api/auth/google/drive/disconnect to remove stored OAuth token
- Add created_at and updated_at columns to user_oauth_token table
- Set updated_at on token refresh in callback

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>

Labels

common ddl-change Changes to the TexeraDB DDL engine

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Export to external storage

3 participants