Cio shared files 3 by ricofurtado · Pull Request #2012 · langflow-ai/openrag

ricofurtado · 2026-07-03T06:18:09Z

s

Summary by CodeRabbit

New Features
- Added support for shared bucket imports in the file browser flow.
- Shared status is now carried through during ingestion so selected files can be handled appropriately.
Bug Fixes
- Improved file sync handling so previously imported items are classified more accurately when shared buckets are involved.

* feat: add SaaS workspace connector permissions with RBAC gates
  Introduce admin Connectors Permission settings (cloud-only) backed by workspace_config, enforce availability on connector APIs and OAuth init, and sync the new connectors:manage:access permission at startup without an Alembic migration. OSS/dev OSS brand bypasses policy; add dev role toggle and brand-aware proxy headers for local RBAC testing.
* style: ruff autofix (auto)
* address CodeRabbit comments and fix backend lint unused imports
* fix backend lint
* feat: workspace connector permissions with aligned SaaS policy guards
  Add admin Connectors Permission (workspace_config-backed) with RBAC connectors:manage:access, enforce policy on connector APIs and OAuth init, and keep the permission list independent of the live connectors tab. Consolidate frontend connector-access hooks into useGetConnectorsQuery and settings tab helpers into brand.ts. Fix Connectors Permission tab redirects (RSC dev-brand default, wait for permissionsResolved). Let explicitly enabled connectors override deployment visibility filters; add OPENRAG_DEV_CONNECTOR_POLICY for local OSS dev backend enforcement.
  Snapshot and restore all cached connector query keys in connect/disconnect mutations so optimistic updates stay correct when policy context changes mid-mutation (brand toggle, permissions resolving).
* style: ruff autofix (auto)
* addressed coderabbit comments
* remove mentions of dev-only envs in the env.example
---------
Co-authored-by: Olfa Maslah <[email protected]>
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
Co-authored-by: Edwin Jose <[email protected]>

* Update connector.py
* update env in settings
* fix webhook URL of SharePoint

…bscriptions, change discovery) (#1867)
* fix: alias legacy /connectors/google/webhook to google_drive and reload connections store on webhook channel lookup miss
* Fix SharePoint and Google Drive
* fix: propagate deleted items from SharePoint/OneDrive webhook delta query
  Main's delete-event coverage (#1852) expects handle_webhook to return deleted file ids so sync_specific_files can run its deleted-at-source cleanup (get_file_content -> 404 -> delete indexed chunks). The delta implementation skipped deleted items; now deleted files propagate and deleted folders stay excluded.
* use logging _config
* Update test_webhook_type_alias.py
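The delete-propagation rule described in this commit (deleted files flow through, deleted folders do not) can be sketched as a small filter over delta items. This is a hypothetical helper, not the connector's code: it assumes items shaped like Microsoft Graph `driveItem` JSON, with a `deleted` facet on removed items and a `folder` facet on folders, and it assumes deleted folders still carry that facet. Verify this against real webhook payloads, since deleted items may come back with few properties.

```python
from typing import Any


def split_delta_items(items: list[dict[str, Any]]) -> tuple[list[str], list[str]]:
    """Split delta-query items into (changed_file_ids, deleted_file_ids).

    Hypothetical helper: folders are skipped whether or not they were deleted,
    so only file ids reach the downstream sync and cleanup steps.
    """
    changed: list[str] = []
    deleted: list[str] = []
    for item in items:
        if "folder" in item:
            # Folders (live or deleted) are never synced as documents.
            continue
        if "deleted" in item:
            # Propagate deleted files so the caller can run deleted-at-source cleanup.
            deleted.append(item["id"])
        else:
            changed.append(item["id"])
    return changed, deleted
```

The caller would feed `deleted` into the same cleanup path that a 404 from `get_file_content` triggers.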

…nRAG-API-JWT header (#1874)
* use fallback header
* Notes on new header
* raises http exception for missing jwt

…xed without an owner

Improving error return Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

…eries

…ails for shared documents

… related functionality

… documents

…or_files method

* no derived state
* coderabbit comments

What changed:
- Renamed the env var read by the backend to OPENSEARCH_NODE_COUNT_CHECK_ENABLED in src/config/settings.py.
- Updated Docker Compose to pass OPENSEARCH_NODE_COUNT_CHECK_ENABLED in docker-compose.yml.
- Added the canonical flag to the example env file in .env.example.
- Documented the flag in the OpenSearch config table in docs/docs/reference/configuration.mdx.
- Updated the existing readiness test wording in tests/unit/test_opensearch_wait_node_count.py.
- Added a regression test that loads config.settings with OPENSEARCH_NODE_COUNT_CHECK_ENABLED=false and verifies the canonical flag is what the code reads in tests/unit/config/test_opensearch_node_count_check_enabled.py.
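A minimal sketch of how the renamed flag might be read and regression-tested. It assumes a hypothetical `env_flag` helper, that the check defaults to enabled when the variable is unset, and that `1`, `true`, `yes`, and `on` count as true; the actual parsing in `src/config/settings.py` may differ.

```python
import os


def env_flag(name: str, default: bool = True) -> bool:
    """Parse a boolean environment variable (hypothetical helper).

    Unset or blank values fall back to `default`; anything else is compared
    case-insensitively against common "true" spellings.
    """
    raw = os.getenv(name)
    if raw is None or raw.strip() == "":
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Assumption: the node-count check stays enabled unless explicitly turned off.
OPENSEARCH_NODE_COUNT_CHECK_ENABLED = env_flag("OPENSEARCH_NODE_COUNT_CHECK_ENABLED", default=True)
```

Because the value is read at import time, a regression test has to set the variable before (re)loading the settings module.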

* chore: Segment improvements (#1963)
* Update static properties
* and knowledge/setting actions
* avoid duplicate events
* coderabbit comments

* fix the cleanup timeout issue for the embedding step
* final step fix

… for permission checks

… status reporting

…OPENSEARCH_NODE_COUNT_CHECK

* fix the cleanup timeout issue for the embedding step
* final step fix

* fix max tokens when splitting for langflow-less ingestion
* fix coderabbit picks
* style: ruff autofix (auto)
* added non lf ingestion test
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>

* upgrade OpenSearch to 3.6.0
* fixed PR review comments
* fix nitpicks

What changed:
- Renamed the env var read by the backend to OPENSEARCH_NODE_COUNT_CHECK_ENABLED in src/config/settings.py.
- Updated Docker Compose to pass OPENSEARCH_NODE_COUNT_CHECK_ENABLED in docker-compose.yml.
- Added the canonical flag to the example env file in .env.example.
- Documented the flag in the OpenSearch config table in docs/docs/reference/configuration.mdx.
- Updated the existing readiness test wording in tests/unit/test_opensearch_wait_node_count.py.
- Added a regression test that loads config.settings with OPENSEARCH_NODE_COUNT_CHECK_ENABLED=false and verifies the canonical flag is what the code reads in tests/unit/config/test_opensearch_node_count_check_enabled.py.

* fix the cleanup timeout issue for the embedding step
* final step fix

* ascii support for ingest files
* update the flow json
* style: ruff autofix (auto)
* Update langflow_headers.py
* Update test_langflow_ingest_callback.py
* Update test-ci.yml
* Update test_onboarding_sample_docs.py
* style: ruff autofix (auto)
* Update test_ascii_safe_header_value.py
* Update langflow_file_service.py
* re trigger commit
* fix: purge config_manager singleton in non-langflow ingestion test
  The _purge_modules() list was missing "config.config_manager", so the ConfigManager singleton retained its cached _config from a previous test. Subsequent calls to get_openrag_config() returned the old config where disable_ingest_with_langflow=False, causing the router to use the Langflow path even though DISABLE_INGEST_WITH_LANGFLOW=true was set in the environment. Adding config.config_manager to the purge list forces a fresh load on the next import, picking up the correct env-var values.
* fix: use hash_id to derive expected document_id in CSV ingestion test
  DocumentFileProcessor ignores Docling's binary_hash field and computes document_id = hash_id(file_path) from the actual file content. The test was asserting against the literal mock value "sha-csv-integration-123" which never appears in the indexed documents, giving 0 hits and a false failure. Now the test derives expected_document_id via hash_id(csv_path) to match what the production code actually indexes.
---------
Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
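The stale-singleton failure described in the `_purge_modules()` fix can be reproduced in isolation. In this sketch `demo_config_manager` is a throwaway stand-in for `config.config_manager`, and `purge_modules` is a hypothetical equivalent of the test's `_purge_modules()`. The point is only that a module cached in `sys.modules` keeps the env values it read at first import until it is removed from the cache.

```python
import importlib
import os
import sys
import tempfile

MODULE_NAME = "demo_config_manager"  # hypothetical stand-in for config.config_manager


def purge_modules(names: list[str]) -> None:
    """Drop modules from the import cache so the next import re-executes them."""
    for name in names:
        sys.modules.pop(name, None)


def write_demo_module() -> None:
    """Write a throwaway module that, like a config singleton, reads env at import time."""
    pkg_dir = tempfile.mkdtemp()
    with open(os.path.join(pkg_dir, MODULE_NAME + ".py"), "w") as fh:
        fh.write(
            "import os\n"
            "# Evaluated once per import; later env changes are not seen.\n"
            "DISABLE_INGEST_WITH_LANGFLOW = "
            "os.getenv('DISABLE_INGEST_WITH_LANGFLOW', 'false') == 'true'\n"
        )
    sys.path.insert(0, pkg_dir)
    importlib.invalidate_caches()


def read_flag_with_and_without_purge() -> tuple[bool, bool, bool]:
    write_demo_module()
    purge_modules([MODULE_NAME])
    os.environ["DISABLE_INGEST_WITH_LANGFLOW"] = "false"
    first = importlib.import_module(MODULE_NAME).DISABLE_INGEST_WITH_LANGFLOW
    os.environ["DISABLE_INGEST_WITH_LANGFLOW"] = "true"
    # Without a purge, import_module returns the cached module and its stale value.
    stale = importlib.import_module(MODULE_NAME).DISABLE_INGEST_WITH_LANGFLOW
    purge_modules([MODULE_NAME])
    fresh = importlib.import_module(MODULE_NAME).DISABLE_INGEST_WITH_LANGFLOW
    return first, stale, fresh
```

Calling `read_flag_with_and_without_purge()` yields `(False, False, True)`: the env change is invisible until the module is purged, which mirrors why the missing list entry left the Langflow path active.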

coderabbitai · 2026-07-03T06:18:32Z

Walkthrough

This PR threads an optional shared boolean through the connector sync flow: FileBrowserDialog accepts a shared prop and conditionally includes it in the sync mutation payload, SharedBucketView forwards it based on ingest settings, and the backend connector_sync endpoint passes ingest_settings and shared to get_synced_file_ids_for_connector.

Changes

Shared flag propagation for connector sync

Layer / File(s)	Summary
FileBrowserDialog shared prop and mutation payload `frontend/components/file-browser-dialog.tsx`, `frontend/components/connectors/shared-bucket-view.tsx`	Adds optional `shared?: boolean` to `FileBrowserDialogProps`, destructures it, conditionally adds it to the `syncMutation.mutateAsync` payload, and `SharedBucketView` forwards `ingestSettings.shared ?? false` (or `undefined`) based on `showShared`.
Backend connector_sync forwards shared flag `src/api/connectors.py`	`connector_sync` passes `ingest_settings=body.settings` and `shared=body.shared` into `get_synced_file_ids_for_connector(...)` during bucket-filter reconciliation.
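For context on how the forwarded `shared` flag could change which files count as already synced, here is a hypothetical query builder. It relies on one fact stated later in this review (shared connector chunks are written with `owner=None`) and assumes the field names `connector_type` and `owner`; the body of `get_synced_file_ids_for_connector` itself is not shown on this PR page.

```python
from typing import Any, Optional


def build_synced_ids_query(
    connector_type: str, owner_user_id: str, shared: Optional[bool]
) -> dict[str, Any]:
    """Hypothetical OpenSearch query for already-synced chunks of one connector.

    Assumes shared imports are indexed without an `owner` value, while
    private imports carry `owner == owner_user_id`.
    """
    filters: list[dict[str, Any]] = [{"term": {"connector_type": connector_type}}]
    if shared:
        # Shared bucket imports: match only chunks that have no owner.
        return {"bool": {"filter": filters, "must_not": [{"exists": {"field": "owner"}}]}}
    # Private (or unspecified) imports: match the caller's own chunks.
    filters.append({"term": {"owner": owner_user_id}})
    return {"bool": {"filter": filters}}
```

Treating `shared=None` like `False` matches the frontend only sending the flag when the shared view is active, but that mapping is an assumption.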

Estimated code review effort: 1 (Trivial) | ~5 minutes

Suggested labels: enhancement

Suggested reviewers: edwinjosechittilappilly, lucaseduoli, mfortman11

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name	Status	Explanation	Resolution
Title check	❓ Inconclusive	The title is related to shared-file work, but it is too vague and branch-like to clearly describe the change.	Rename it to a concise description of the main change, such as supporting shared files in connector sync and the file browser.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


github-actions · 2026-07-03T06:27:49Z

React Doctor found 1 new issue in 1 file · 1 warning · score 87 / 100 (Great) · 0 fixed · vs main

1 warning

components/file-browser-dialog.tsx

⚠️ L157 Missing effect dependencies exhaustive-deps

Reviewed by React Doctor for commit 6cd0631. See inline comments for fixes.

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

src/models/processors.py (1)
268-315: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

_delete_connector_chunks should match both shared and private ownership states for the same stable file id.

resolve_shared_owner_fields writes connector chunks with owner=None when shared=True, but this cleanup only targets one ownership shape at a time. If the same connector file is re-synced after its shared flag changes, deleted-at-source and rename cleanup can leave stale chunks behind under the previous ownership state, causing duplicate hits.

Use the same owner == user OR owner missing filter here, as in build_replace_filename_query, so the stable file_id is cleaned up regardless of the prior mode.
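A sketch of the suggested ownership filter as an OpenSearch query body. The function name is hypothetical, and matching the stable id against either `document_id` or `connector_file_id` is an assumption about how the existing matching works. The `should` clause with `minimum_should_match: 1` is what makes the query cover both the private (`owner == user`) and shared (no `owner`) shapes.

```python
from typing import Any


def build_connector_cleanup_query(file_id: str, owner_user_id: str) -> dict[str, Any]:
    """Hypothetical delete-by-query body covering both ownership shapes.

    Matches chunks for one stable connector file id whether they were written
    privately (owner == owner_user_id) or as shared (no owner field).
    """
    owner_scope = {
        "bool": {
            "should": [
                {"term": {"owner": owner_user_id}},
                {"bool": {"must_not": [{"exists": {"field": "owner"}}]}},
            ],
            "minimum_should_match": 1,
        }
    }
    file_match = {
        "bool": {
            "should": [
                {"term": {"document_id": file_id}},
                {"term": {"connector_file_id": file_id}},
            ],
            "minimum_should_match": 1,
        }
    }
    # Both conditions must hold: right file AND either ownership shape.
    return {"query": {"bool": {"filter": [file_match, owner_scope]}}}
```

The returned dict could be sent as the body of a delete-by-query request, so one call removes stale chunks left by either mode.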
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/models/processors.py` around lines 268 - 315, The ownership filter in
_delete_connector_chunks only matches one state at a time, so stale chunks can
survive when a connector file switches between shared and private modes. Update
the query-building logic in _delete_connector_chunks to use the same
owner-matching behavior as build_replace_filename_query: for a given stable
file_id, match both owner == owner_user_id and owner missing/None so cleanup
removes chunks regardless of the prior shared flag. Keep the existing
document_id/connector_file_id matching, but make the owner filter cover both
ownership shapes.

🧹 Nitpick comments (2)

src/api/v1/documents.py (1)
185-212: 🗄️ Data Integrity & Integration | 🔵 Trivial | 🏗️ Heavy lift

Type the new aggregate response outside the route handler.

This /v1 handler now constructs a new per-file response envelope inline. Please move the aggregation shape into a core/service helper and expose it through a Pydantic response model so the stable API schema/SDK contract cannot drift from the implementation. As per path instructions, "src/api/**/*.py: FastAPI routes. Public/stable endpoints belong under src/api/v1/. Verify dependency injection via src/dependencies.py, response model typing, and correct HTTP status codes. No business logic in route handlers."
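One way to move the envelope out of the handler, sketched with hypothetical names. The review asks for a Pydantic response model declared through FastAPI's `response_model`; stdlib dataclasses stand in here so the sketch runs without dependencies, and the per-file fields are assumptions rather than the current response shape.

```python
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class FileDeleteResult:
    """Outcome for one filename (hypothetical shape)."""
    filename: str
    deleted_chunks: int = 0
    error: Optional[str] = None


@dataclass
class DeleteByFilenameResponse:
    """Aggregate envelope the route would declare as its response model."""
    files: list[FileDeleteResult] = field(default_factory=list)
    total_deleted_chunks: int = 0
    failed_filenames: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        return asdict(self)


def aggregate_delete_results(results: list[FileDeleteResult]) -> DeleteByFilenameResponse:
    """Service-layer helper: builds the envelope so the route only returns it."""
    return DeleteByFilenameResponse(
        files=list(results),
        total_deleted_chunks=sum(r.deleted_chunks for r in results),
        failed_filenames=[r.filename for r in results if r.error is not None],
    )
```

With the shape defined once in the service layer, the route's only job is to call the helper and pick the status code, and the OpenAPI schema is generated from the same model the code returns.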
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/api/v1/documents.py` around lines 185 - 212, The /v1 documents route is
building a new aggregate response inline in the handler, which should be moved
out of the FastAPI route and into a core/service helper with a Pydantic response
model. Update the delete-by-filename flow around
delete_documents_by_filename_core and the aggregation in the route to return a
typed response object instead of assembling JSONResponse manually, so the public
schema and SDK contract stay stable. Keep the route thin by delegating the
per-file aggregation and status calculation to the service layer, and ensure the
handler uses the declared response model and correct HTTP status code.
Source: Path instructions
tests/unit/test_document_delete_by_id.py (1)
140-179: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick win

Assert the exact combined-scope delete IDs.

This test proves the query and count, but not that both returned IDs are actually passed to delete. Add the same delete_calls assertion used in the anonymous-only test to lock down the owned + ownerless path.
Proposed test assertion
     assert backend_opensearch_client.search_calls[0]["body"]["query"] == {
         "bool": {
             "filter": [
                 {"term": {"filename": "shared.pdf"}},
@@
             ]
         }
     }
+    assert backend_opensearch_client.delete_calls == [
+        {"index": "documents", "id": "owned-chunk", "refresh": True},
+        {"index": "documents", "id": "anonymous-chunk", "refresh": True},
+    ]
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_document_delete_by_id.py` around lines 140 - 179, The test
for delete_documents_by_filename_core should also verify that both matched
document IDs are actually deleted, not just counted. Extend
test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert
backend_opensearch_client.delete_calls contains the expected owned and anonymous
chunk IDs, using the same delete_calls pattern already used in the
anonymous-only test, so the combined-scope path is fully covered.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@flows/ingestion_flow.json`:
- Line 3754: The embedding model list now only includes text-embedding-3-large,
which makes search() skip existing text-embedding-3-small vectors during query
planning. Update the embedding configuration in the ingestion flow to keep
text-embedding-3-small available alongside text-embedding-3-large until old
vectors are migrated, so the matching embedding object still exists for
previously indexed documents. Keep the change localized to the embedding/model
mapping used by search() and the ingestion flow config.
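Why dropping a model hides old vectors can be shown with a toy planner. The `chunk_embedding_` prefix is taken from the reviewer's grep pattern, but the exact field-naming scheme and both helper functions are assumptions for illustration.

```python
import re


def embedding_field(model: str) -> str:
    """Hypothetical field naming: 'chunk_embedding_' + sanitized model name."""
    return "chunk_embedding_" + re.sub(r"[^0-9a-zA-Z]+", "_", model)


def plan_vector_fields(configured_models: list[str], indexed_fields: set[str]) -> list[str]:
    """Return the indexed vector fields a query can actually embed against.

    A field is only searchable if some configured model produces it; any
    indexed field without a matching model is silently skipped.
    """
    return [embedding_field(m) for m in configured_models if embedding_field(m) in indexed_fields]
```

With both fields indexed, configuring only `text-embedding-3-large` plans a single field, so documents embedded with `text-embedding-3-small` drop out of vector search until the smaller model is listed again or the vectors are migrated.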

In `@tests/unit/test_shared_flag.py`:
- Around line 181-237: The two shared-flag tests are failing before the guard is
exercised because `connector_sync` is invoked without its required `request`
argument. Update both calls in `test_non_cos_connector_rejects_shared_true` and
`test_ibm_cos_shared_true_does_not_hit_guard` to pass a mock `Request` object,
using the same `connector_sync` setup already in the tests so the shared-flag
logic can be reached.
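A sketch of the fix the comment asks for: supply a stand-in request object so execution reaches the shared-flag guard. The `connector_sync` signature, the `SharedNotSupported` error, and the `"ibm_cos"` connector id are hypothetical; the real tests would more likely use `MagicMock(spec=Request)` and assert on FastAPI's `HTTPException`.

```python
import asyncio
from types import SimpleNamespace
from unittest.mock import MagicMock


class SharedNotSupported(Exception):
    """Stand-in for the HTTP error the real guard raises (hypothetical)."""


async def connector_sync(connector_type: str, body: SimpleNamespace, request: object) -> dict:
    """Hypothetical handler: `request` is required, so tests must pass one."""
    if body.shared and connector_type != "ibm_cos":
        raise SharedNotSupported(f"shared=True is not supported for {connector_type}")
    return {"status": "ok", "shared": bool(body.shared)}


def run_sync(connector_type: str, shared: bool) -> dict:
    # A MagicMock stands in for starlette's Request; this guard never reads it.
    request = MagicMock(name="Request")
    body = SimpleNamespace(shared=shared, settings=None)
    return asyncio.run(connector_sync(connector_type, body, request))
```

Omitting the `request` argument would raise a `TypeError` at call time, before the guard runs, which is the failure mode the comment describes.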

---

Outside diff comments:
In `@src/models/processors.py`:
- Around line 268-315: The ownership filter in _delete_connector_chunks only
matches one state at a time, so stale chunks can survive when a connector file
switches between shared and private modes. Update the query-building logic in
_delete_connector_chunks to use the same owner-matching behavior as
build_replace_filename_query: for a given stable file_id, match both owner ==
owner_user_id and owner missing/None so cleanup removes chunks regardless of the
prior shared flag. Keep the existing document_id/connector_file_id matching, but
make the owner filter cover both ownership shapes.

---

Nitpick comments:
In `@src/api/v1/documents.py`:
- Around line 185-212: The /v1 documents route is building a new aggregate
response inline in the handler, which should be moved out of the FastAPI route
and into a core/service helper with a Pydantic response model. Update the
delete-by-filename flow around delete_documents_by_filename_core and the
aggregation in the route to return a typed response object instead of assembling
JSONResponse manually, so the public schema and SDK contract stay stable. Keep
the route thin by delegating the per-file aggregation and status calculation to
the service layer, and ensure the handler uses the declared response model and
correct HTTP status code.

In `@tests/unit/test_document_delete_by_id.py`:
- Around line 140-179: The test for delete_documents_by_filename_core should
also verify that both matched document IDs are actually deleted, not just
counted. Extend
test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert
backend_opensearch_client.delete_calls contains the expected owned and anonymous
chunk IDs, using the same delete_calls pattern already used in the
anonymous-only test, so the combined-scope path is fully covered.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 8915c68e-2a40-4582-99bd-88f0ee53fd78

📥 Commits

Reviewing files that changed from the base of the PR and between 4f104ec and 65c2c56.

📒 Files selected for processing (35)

.github/workflows/test-ci.yml
alembic/versions/0007_add_knowledge_delete_anonymous.py
flows/ingestion_flow.json
frontend/app/api/mutations/useSyncConnector.ts
frontend/components/cloud-picker/ingest-settings.tsx
frontend/components/cloud-picker/types.ts
frontend/components/connectors/shared-bucket-view.tsx
frontend/components/file-browser-dialog.tsx
frontend/components/knowledge-actions-dropdown.tsx
frontend/components/knowledge-batch-actions-bar.tsx
frontend/enhancements/connectors/ibm-cos/components/bucket-view.tsx
src/api/connectors.py
src/api/documents.py
src/api/v1/documents.py
src/config/settings.py
src/connectors/service.py
src/db/seed.py
src/dependencies.py
src/models/processors.py
src/services/document_index_writer.py
src/services/langflow_file_service.py
src/utils/langflow_headers.py
src/utils/opensearch_queries.py
tests/integration/core/test_non_langflow_ingestion.py
tests/integration/core/test_onboarding_sample_docs.py
tests/integration/core/test_shared_flag_dls.py
tests/unit/db/migrations/test_add_anonymous_delete_permission.py
tests/unit/db/test_rbac_seed_idempotency.py
tests/unit/dependencies/test_rbac_kill_switch.py
tests/unit/dependencies/test_require_api_key_permission.py
tests/unit/dependencies/test_require_permission.py
tests/unit/test_ascii_safe_header_value.py
tests/unit/test_document_delete_by_id.py
tests/unit/test_langflow_ingest_callback.py
tests/unit/test_shared_flag.py

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.


🛑 Comments failed to post (2)

flows/ingestion_flow.json (1)

3754-3754: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Map the relevant file and locate the referenced nodes/strings.
git ls-files flows/ingestion_flow.json
rg -n '"name": "text-embedding-3-(small|large)"|EmbeddingModel-|chunk_embedding_|OpenSearch' flows/ingestion_flow.json

# Inspect the relevant line ranges around the three embedding model nodes.
sed -n '3720,3785p' flows/ingestion_flow.json
sed -n '4250,4325p' flows/ingestion_flow.json
sed -n '4775,4845p' flows/ingestion_flow.json