Cio shared files 3#2012
Conversation
* feat: add SaaS workspace connector permissions with RBAC gates Introduce admin Connectors Permission settings (cloud-only) backed by workspace_config, enforce availability on connector APIs and OAuth init, and sync the new connectors:manage:access permission at startup without an Alembic migration. OSS/dev OSS brand bypasses policy; add dev role toggle and brand-aware proxy headers for local RBAC testing. * style: ruff autofix (auto) * address CodeRabbit comments and fix backend lint unused imports * fix backend lint * feat: workspace connector permissions with aligned SaaS policy guards Add admin Connectors Permission (workspace_config-backed) with RBAC connectors:manage:access, enforce policy on connector APIs and OAuth init, and keep the permission list independent of the live connectors tab. Consolidate frontend connector-access hooks into useGetConnectorsQuery and settings tab helpers into brand.ts. Fix Connectors Permission tab redirects (RSC dev-brand default, wait for permissionsResolved). Let explicitly enabled connectors override deployment visibility filters; add OPENRAG_DEV_CONNECTOR_POLICY for local OSS dev backend enforcement Snapshot and restore all cached connector query keys in connect/disconnect mutations so optimistic updates stay correct when policy context changes mid-mutation (brand toggle, permissions resolving). * style: ruff autofix (auto) * addressed coderabbit comments * remove mentions of dev only envs in the env.example --------- Co-authored-by: Olfa Maslah <[email protected]> Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com> Co-authored-by: Edwin Jose <[email protected]>
* Update connector.py * update env in settings * fix webhook url of sharepoint
…bscriptions, change discovery) (#1867) * fix: alias legacy /connectors/google/webhook to google_drive and reload connections store on webhook channel lookup miss * Fix Sharepoint and Google Drive * fix: propagate deleted items from SharePoint/OneDrive webhook delta query Main's delete-event coverage (#1852) expects handle_webhook to return deleted file ids so sync_specific_files can run its deleted-at-source cleanup (get_file_content -> 404 -> delete indexed chunks). The delta implementation skipped deleted items; now deleted files propagate and deleted folders stay excluded. * use logging _config * Update test_webhook_type_alias.py
…nRAG-API-JWT header (#1874) * use fallback header * Notes on new header * raises http exception for missing jwt
…xed without an owner
Improving error return Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ails for shared documents
… related functionality
* no derivied state * coderabbit comments
What changed:
- Renamed the env var read by the backend to OPENSEARCH_NODE_COUNT_CHECK_ENABLED in src/config/settings.py.
- Updated Docker Compose to pass OPENSEARCH_NODE_COUNT_CHECK_ENABLED in docker-compose.yml.
- Added the canonical flag to the example env file in .env.example.
- Documented the flag in the OpenSearch config table in docs/docs/reference/configuration.mdx.
- Updated the existing readiness test wording in tests/unit/test_opensearch_wait_node_count.py.
- Added a regression test that loads config.settings with OPENSEARCH_NODE_COUNT_CHECK_ENABLED=false and verifies the canonical flag is what the code reads in tests/unit/config/
test_opensearch_node_count_check_enabled.py.
* chore: Segment improvements (#1963) * Update static properties * and knowledge/setting actions * avoid duplicate events * coderabbit comments
* fix the cleanup timeout issue for the embedding step * final step fix
… for permission checks
… status reporting
…OPENSEARCH_NODE_COUNT_CHECK
* fix the cleanup timeout issue for the embedding step * final step fix
* fix max tokens when splitting for langflow-less ingestion * fix coderabbit picks * style: ruff autofix (auto) * added non lf ingestion test --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
* upgrade OpenSearch to 3.6.0 * fixed PR review comments * fix nitpicks
What changed:
- Renamed the env var read by the backend to OPENSEARCH_NODE_COUNT_CHECK_ENABLED in src/config/settings.py.
- Updated Docker Compose to pass OPENSEARCH_NODE_COUNT_CHECK_ENABLED in docker-compose.yml.
- Added the canonical flag to the example env file in .env.example.
- Documented the flag in the OpenSearch config table in docs/docs/reference/configuration.mdx.
- Updated the existing readiness test wording in tests/unit/test_opensearch_wait_node_count.py.
- Added a regression test that loads config.settings with OPENSEARCH_NODE_COUNT_CHECK_ENABLED=false and verifies the canonical flag is what the code reads in tests/unit/config/
test_opensearch_node_count_check_enabled.py.
* fix the cleanup timeout issue for the embedding step * final step fix
* ascii support for ingest files * update the flow json * style: ruff autofix (auto) * Update langflow_headers.py * Update test_langflow_ingest_callback.py * Update test-ci.yml * Update test_onboarding_sample_docs.py * style: ruff autofix (auto) * Update test_ascii_safe_header_value.py * Update langflow_file_service.py * re trigger commit * fix: purge config_manager singleton in non-langflow ingestion test The _purge_modules() list was missing "config.config_manager", so the ConfigManager singleton retained its cached _config from a previous test. Subsequent calls to get_openrag_config() returned the old config where disable_ingest_with_langflow=False, causing the router to use the Langflow path even though DISABLE_INGEST_WITH_LANGFLOW=true was set in the environment. Adding config.config_manager to the purge list forces a fresh load on the next import, picking up the correct env-var values. * fix: use hash_id to derive expected document_id in CSV ingestion test DocumentFileProcessor ignores Docling's binary_hash field and computes document_id = hash_id(file_path) from the actual file content. The test was asserting against the literal mock value "sha-csv-integration-123" which never appears in the indexed documents, giving 0 hits and a false failure. Now the test derives expected_document_id via hash_id(csv_path) to match what the production code actually indexes. --------- Co-authored-by: autofix-ci[bot] <114827586+autofix-ci[bot]@users.noreply.github.com>
WalkthroughThis PR threads an optional ChangesShared flag propagation for connector sync
Estimated code review effort: 1 (Trivial) | ~5 minutes Suggested labels: Suggested reviewers: 🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
React Doctor found 1 new issue in 1 file · 1 warning · score 87 / 100 (Great) · 0 fixed · vs Reviewed by React Doctor for commit |
There was a problem hiding this comment.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/models/processors.py (1)
268-315: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
_delete_connector_chunksshould match both shared and private ownership states for the same stable file id.
resolve_shared_owner_fieldswrites connector chunks withowner=Nonewhenshared=True, but this cleanup only targets one ownership shape at a time. If the same connector file is re-synced after itssharedflag changes, deleted-at-source and rename cleanup can leave stale chunks behind under the previous ownership state, causing duplicate hits.Use the same
owner == user OR owner missingfilter here, as inbuild_replace_filename_query, so the stablefile_idis cleaned up regardless of the prior mode.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/processors.py` around lines 268 - 315, The ownership filter in _delete_connector_chunks only matches one state at a time, so stale chunks can survive when a connector file switches between shared and private modes. Update the query-building logic in _delete_connector_chunks to use the same owner-matching behavior as build_replace_filename_query: for a given stable file_id, match both owner == owner_user_id and owner missing/None so cleanup removes chunks regardless of the prior shared flag. Keep the existing document_id/connector_file_id matching, but make the owner filter cover both ownership shapes.
🧹 Nitpick comments (2)
src/api/v1/documents.py (1)
185-212: 🗄️ Data Integrity & Integration | 🔵 Trivial | 🏗️ Heavy liftType the new aggregate response outside the route handler.
This
/v1handler now constructs a new per-file response envelope inline. Please move the aggregation shape into a core/service helper and expose it through a Pydantic response model so the stable API schema/SDK contract cannot drift from the implementation. As per path instructions, "src/api/**/*.py: FastAPI routes. Public/stable endpoints belong under src/api/v1/. Verify dependency injection via src/dependencies.py, response model typing, and correct HTTP status codes. No business logic in route handlers."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/documents.py` around lines 185 - 212, The /v1 documents route is building a new aggregate response inline in the handler, which should be moved out of the FastAPI route and into a core/service helper with a Pydantic response model. Update the delete-by-filename flow around delete_documents_by_filename_core and the aggregation in the route to return a typed response object instead of assembling JSONResponse manually, so the public schema and SDK contract stay stable. Keep the route thin by delegating the per-file aggregation and status calculation to the service layer, and ensure the handler uses the declared response model and correct HTTP status code.Source: Path instructions
tests/unit/test_document_delete_by_id.py (1)
140-179: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the exact combined-scope delete IDs.
This test proves the query and count, but not that both returned IDs are actually passed to delete. Add the same
delete_callsassertion used in the anonymous-only test to lock down the owned + ownerless path.Proposed test assertion
assert backend_opensearch_client.search_calls[0]["body"]["query"] == { "bool": { "filter": [ {"term": {"filename": "shared.pdf"}}, @@ ] } } + assert backend_opensearch_client.delete_calls == [ + {"index": "documents", "id": "owned-chunk", "refresh": True}, + {"index": "documents", "id": "anonymous-chunk", "refresh": True}, + ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_document_delete_by_id.py` around lines 140 - 179, The test for delete_documents_by_filename_core should also verify that both matched document IDs are actually deleted, not just counted. Extend test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert backend_opensearch_client.delete_calls contains the expected owned and anonymous chunk IDs, using the same delete_calls pattern already used in the anonymous-only test, so the combined-scope path is fully covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flows/ingestion_flow.json`:
- Line 3754: The embedding model list now only includes text-embedding-3-large,
which makes search() skip existing text-embedding-3-small vectors during query
planning. Update the embedding configuration in the ingestion flow to keep
text-embedding-3-small available alongside text-embedding-3-large until old
vectors are migrated, so the matching embedding object still exists for
previously indexed documents. Keep the change localized to the embedding/model
mapping used by search() and the ingestion flow config.
In `@tests/unit/test_shared_flag.py`:
- Around line 181-237: The two shared-flag tests are failing before the guard is
exercised because `connector_sync` is invoked without its required `request`
argument. Update both calls in `test_non_cos_connector_rejects_shared_true` and
`test_ibm_cos_shared_true_does_not_hit_guard` to pass a mock `Request` object,
using the same `connector_sync` setup already in the tests so the shared-flag
logic can be reached.
---
Outside diff comments:
In `@src/models/processors.py`:
- Around line 268-315: The ownership filter in _delete_connector_chunks only
matches one state at a time, so stale chunks can survive when a connector file
switches between shared and private modes. Update the query-building logic in
_delete_connector_chunks to use the same owner-matching behavior as
build_replace_filename_query: for a given stable file_id, match both owner ==
owner_user_id and owner missing/None so cleanup removes chunks regardless of the
prior shared flag. Keep the existing document_id/connector_file_id matching, but
make the owner filter cover both ownership shapes.
---
Nitpick comments:
In `@src/api/v1/documents.py`:
- Around line 185-212: The /v1 documents route is building a new aggregate
response inline in the handler, which should be moved out of the FastAPI route
and into a core/service helper with a Pydantic response model. Update the
delete-by-filename flow around delete_documents_by_filename_core and the
aggregation in the route to return a typed response object instead of assembling
JSONResponse manually, so the public schema and SDK contract stay stable. Keep
the route thin by delegating the per-file aggregation and status calculation to
the service layer, and ensure the handler uses the declared response model and
correct HTTP status code.
In `@tests/unit/test_document_delete_by_id.py`:
- Around line 140-179: The test for delete_documents_by_filename_core should
also verify that both matched document IDs are actually deleted, not just
counted. Extend
test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert
backend_opensearch_client.delete_calls contains the expected owned and anonymous
chunk IDs, using the same delete_calls pattern already used in the
anonymous-only test, so the combined-scope path is fully covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8915c68e-2a40-4582-99bd-88f0ee53fd78
📒 Files selected for processing (35)
.github/workflows/test-ci.ymlalembic/versions/0007_add_knowledge_delete_anonymous.pyflows/ingestion_flow.jsonfrontend/app/api/mutations/useSyncConnector.tsfrontend/components/cloud-picker/ingest-settings.tsxfrontend/components/cloud-picker/types.tsfrontend/components/connectors/shared-bucket-view.tsxfrontend/components/file-browser-dialog.tsxfrontend/components/knowledge-actions-dropdown.tsxfrontend/components/knowledge-batch-actions-bar.tsxfrontend/enhancements/connectors/ibm-cos/components/bucket-view.tsxsrc/api/connectors.pysrc/api/documents.pysrc/api/v1/documents.pysrc/config/settings.pysrc/connectors/service.pysrc/db/seed.pysrc/dependencies.pysrc/models/processors.pysrc/services/document_index_writer.pysrc/services/langflow_file_service.pysrc/utils/langflow_headers.pysrc/utils/opensearch_queries.pytests/integration/core/test_non_langflow_ingestion.pytests/integration/core/test_onboarding_sample_docs.pytests/integration/core/test_shared_flag_dls.pytests/unit/db/migrations/test_add_anonymous_delete_permission.pytests/unit/db/test_rbac_seed_idempotency.pytests/unit/dependencies/test_rbac_kill_switch.pytests/unit/dependencies/test_require_api_key_permission.pytests/unit/dependencies/test_require_permission.pytests/unit/test_ascii_safe_header_value.pytests/unit/test_document_delete_by_id.pytests/unit/test_langflow_ingest_callback.pytests/unit/test_shared_flag.py
There was a problem hiding this comment.
Caution
Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/models/processors.py (1)
268-315: 🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
_delete_connector_chunksshould match both shared and private ownership states for the same stable file id.
resolve_shared_owner_fieldswrites connector chunks withowner=Nonewhenshared=True, but this cleanup only targets one ownership shape at a time. If the same connector file is re-synced after itssharedflag changes, deleted-at-source and rename cleanup can leave stale chunks behind under the previous ownership state, causing duplicate hits.Use the same
owner == user OR owner missingfilter here, as inbuild_replace_filename_query, so the stablefile_idis cleaned up regardless of the prior mode.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/models/processors.py` around lines 268 - 315, The ownership filter in _delete_connector_chunks only matches one state at a time, so stale chunks can survive when a connector file switches between shared and private modes. Update the query-building logic in _delete_connector_chunks to use the same owner-matching behavior as build_replace_filename_query: for a given stable file_id, match both owner == owner_user_id and owner missing/None so cleanup removes chunks regardless of the prior shared flag. Keep the existing document_id/connector_file_id matching, but make the owner filter cover both ownership shapes.
🧹 Nitpick comments (2)
src/api/v1/documents.py (1)
185-212: 🗄️ Data Integrity & Integration | 🔵 Trivial | 🏗️ Heavy liftType the new aggregate response outside the route handler.
This
/v1handler now constructs a new per-file response envelope inline. Please move the aggregation shape into a core/service helper and expose it through a Pydantic response model so the stable API schema/SDK contract cannot drift from the implementation. As per path instructions, "src/api/**/*.py: FastAPI routes. Public/stable endpoints belong under src/api/v1/. Verify dependency injection via src/dependencies.py, response model typing, and correct HTTP status codes. No business logic in route handlers."🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/v1/documents.py` around lines 185 - 212, The /v1 documents route is building a new aggregate response inline in the handler, which should be moved out of the FastAPI route and into a core/service helper with a Pydantic response model. Update the delete-by-filename flow around delete_documents_by_filename_core and the aggregation in the route to return a typed response object instead of assembling JSONResponse manually, so the public schema and SDK contract stay stable. Keep the route thin by delegating the per-file aggregation and status calculation to the service layer, and ensure the handler uses the declared response model and correct HTTP status code.Source: Path instructions
tests/unit/test_document_delete_by_id.py (1)
140-179: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winAssert the exact combined-scope delete IDs.
This test proves the query and count, but not that both returned IDs are actually passed to delete. Add the same
delete_callsassertion used in the anonymous-only test to lock down the owned + ownerless path.Proposed test assertion
assert backend_opensearch_client.search_calls[0]["body"]["query"] == { "bool": { "filter": [ {"term": {"filename": "shared.pdf"}}, @@ ] } } + assert backend_opensearch_client.delete_calls == [ + {"index": "documents", "id": "owned-chunk", "refresh": True}, + {"index": "documents", "id": "anonymous-chunk", "refresh": True}, + ]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_document_delete_by_id.py` around lines 140 - 179, The test for delete_documents_by_filename_core should also verify that both matched document IDs are actually deleted, not just counted. Extend test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert backend_opensearch_client.delete_calls contains the expected owned and anonymous chunk IDs, using the same delete_calls pattern already used in the anonymous-only test, so the combined-scope path is fully covered.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@flows/ingestion_flow.json`:
- Line 3754: The embedding model list now only includes text-embedding-3-large,
which makes search() skip existing text-embedding-3-small vectors during query
planning. Update the embedding configuration in the ingestion flow to keep
text-embedding-3-small available alongside text-embedding-3-large until old
vectors are migrated, so the matching embedding object still exists for
previously indexed documents. Keep the change localized to the embedding/model
mapping used by search() and the ingestion flow config.
In `@tests/unit/test_shared_flag.py`:
- Around line 181-237: The two shared-flag tests are failing before the guard is
exercised because `connector_sync` is invoked without its required `request`
argument. Update both calls in `test_non_cos_connector_rejects_shared_true` and
`test_ibm_cos_shared_true_does_not_hit_guard` to pass a mock `Request` object,
using the same `connector_sync` setup already in the tests so the shared-flag
logic can be reached.
---
Outside diff comments:
In `@src/models/processors.py`:
- Around line 268-315: The ownership filter in _delete_connector_chunks only
matches one state at a time, so stale chunks can survive when a connector file
switches between shared and private modes. Update the query-building logic in
_delete_connector_chunks to use the same owner-matching behavior as
build_replace_filename_query: for a given stable file_id, match both owner ==
owner_user_id and owner missing/None so cleanup removes chunks regardless of the
prior shared flag. Keep the existing document_id/connector_file_id matching, but
make the owner filter cover both ownership shapes.
---
Nitpick comments:
In `@src/api/v1/documents.py`:
- Around line 185-212: The /v1 documents route is building a new aggregate
response inline in the handler, which should be moved out of the FastAPI route
and into a core/service helper with a Pydantic response model. Update the
delete-by-filename flow around delete_documents_by_filename_core and the
aggregation in the route to return a typed response object instead of assembling
JSONResponse manually, so the public schema and SDK contract stay stable. Keep
the route thin by delegating the per-file aggregation and status calculation to
the service layer, and ensure the handler uses the declared response model and
correct HTTP status code.
In `@tests/unit/test_document_delete_by_id.py`:
- Around line 140-179: The test for delete_documents_by_filename_core should
also verify that both matched document IDs are actually deleted, not just
counted. Extend
test_delete_documents_by_filename_combines_owned_and_anonymous_scopes to assert
backend_opensearch_client.delete_calls contains the expected owned and anonymous
chunk IDs, using the same delete_calls pattern already used in the
anonymous-only test, so the combined-scope path is fully covered.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 8915c68e-2a40-4582-99bd-88f0ee53fd78
📒 Files selected for processing (35)
.github/workflows/test-ci.ymlalembic/versions/0007_add_knowledge_delete_anonymous.pyflows/ingestion_flow.jsonfrontend/app/api/mutations/useSyncConnector.tsfrontend/components/cloud-picker/ingest-settings.tsxfrontend/components/cloud-picker/types.tsfrontend/components/connectors/shared-bucket-view.tsxfrontend/components/file-browser-dialog.tsxfrontend/components/knowledge-actions-dropdown.tsxfrontend/components/knowledge-batch-actions-bar.tsxfrontend/enhancements/connectors/ibm-cos/components/bucket-view.tsxsrc/api/connectors.pysrc/api/documents.pysrc/api/v1/documents.pysrc/config/settings.pysrc/connectors/service.pysrc/db/seed.pysrc/dependencies.pysrc/models/processors.pysrc/services/document_index_writer.pysrc/services/langflow_file_service.pysrc/utils/langflow_headers.pysrc/utils/opensearch_queries.pytests/integration/core/test_non_langflow_ingestion.pytests/integration/core/test_onboarding_sample_docs.pytests/integration/core/test_shared_flag_dls.pytests/unit/db/migrations/test_add_anonymous_delete_permission.pytests/unit/db/test_rbac_seed_idempotency.pytests/unit/dependencies/test_rbac_kill_switch.pytests/unit/dependencies/test_require_api_key_permission.pytests/unit/dependencies/test_require_permission.pytests/unit/test_ascii_safe_header_value.pytests/unit/test_document_delete_by_id.pytests/unit/test_langflow_ingest_callback.pytests/unit/test_shared_flag.py
🛑 Comments failed to post (2)
flows/ingestion_flow.json (1)
3754-3754: 🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash set -euo pipefail # Map the relevant file and locate the referenced nodes/strings. git ls-files flows/ingestion_flow.json rg -n '"name": "text-embedding-3-(small|large)"|EmbeddingModel-|chunk_embedding_|OpenSearch' flows/ingestion_flow.json # Inspect the relevant line ranges around the three embedding model nodes. sed -n '3720,3785p' flows/ingestion_flow.json sed -n '4250,4325p' flows/ingestion_flow.json sed -n '4775,4845p' flows/ingestion_flow.jsonRepository: langflow-ai/openrag
Length of output: 50376
🏁 Script executed:
#!/bin/bash set -euo pipefail # Find the search-time logic that decides which embedding fields are queried. rg -n "chunk_embedding_|get_embedding_field_name|_get_index_properties|_is_knn_vector_field|search_documents|raw_search|embedding_model_name|detect.*model|available models|multi-model" flows/ingestion_flow.json # Read the search-related methods in smaller slices. sed -n '3000,3450p' flows/ingestion_flow.json sed -n '3450,3950p' flows/ingestion_flow.jsonRepository: langflow-ai/openrag
Length of output: 50376
🏁 Script executed:
#!/bin/bash set -euo pipefail # Locate the search implementation and index-mapping helpers. rg -n "def search_documents|def _get_index_properties|def _get_field_dimension|def _is_knn_vector_field|chunk_embedding_" flows/ingestion_flow.json # Read only the relevant function bodies/slices. sed -n '3950,4700p' flows/ingestion_flow.json sed -n '4700,5400p' flows/ingestion_flow.jsonRepository: langflow-ai/openrag
Length of output: 50376
🏁 Script executed:
#!/bin/bash set -euo pipefail # Tight, context-rich search for the exact helper/function definitions. rg -n -C 40 "def search_documents|def _get_index_properties|def _get_field_dimension|def _is_knn_vector_field|def _get_available_embedding_fields|def _resolve_embedding_field|chunk_embedding_" flows/ingestion_flow.json # If search_documents exists, print a slightly larger window around it using line numbers from the file. python3 - <<'PY' from pathlib import Path text = Path("flows/ingestion_flow.json").read_text() for needle in ["def search_documents", "def _get_index_properties", "def _get_field_dimension", "def _is_knn_vector_field", "def _get_available_embedding_fields", "def _resolve_embedding_field"]: idx = text.find(needle) if idx != -1: start = max(0, idx - 1500) end = min(len(text), idx + 7000) print(f"\n===== {needle} =====\n") print(text[start:end]) PYRepository: langflow-ai/openrag
Length of output: 50376
🏁 Script executed:
#!/bin/bash set -euo pipefail python3 - <<'PY' import json from pathlib import Path obj = json.loads(Path("flows/ingestion_flow.json").read_text()) code = None # Find the embedded code string in the JSON tree. def walk(x): global code if code is not None: return if isinstance(x, dict): for k, v in x.items(): if k == "code" and isinstance(v, str) and "def search_documents" in v: code = v return walk(v) elif isinstance(x, list): for v in x: walk(v) walk(obj) if code is None: raise SystemExit("embedded code not found") needles = [ "def search_documents", "def _get_index_properties", "def _get_field_dimension", "def _is_knn_vector_field", "def _get_available_embedding_fields", "def _resolve_embedding_field", "chunk_embedding_", ] for needle in needles: idx = code.find(needle) print(f"\n===== {needle} =====") if idx == -1: print("NOT FOUND") continue start = max(0, idx - 1500) end = min(len(code), idx + 5000) print(code[start:end]) PYRepository: langflow-ai/openrag
Length of output: 181
🏁 Script executed:
python3 - <<'PY' import json from pathlib import Path obj = json.loads(Path("flows/ingestion_flow.json").read_text()) # Show the top-level shape so we can target the embedded code path. print(type(obj).__name__) if isinstance(obj, dict): print(sorted(obj.keys())) # Recursively find any string values containing these markers and print their JSON path. needles = ["search_documents", "_get_index_properties", "_get_field_dimension", "chunk_embedding_"] hits = [] def walk(x, path="$"): if isinstance(x, dict): for k, v in x.items(): walk(v, f"{path}.{k}") elif isinstance(x, list): for i, v in enumerate(x): walk(v, f"{path}[{i}]") elif isinstance(x, str): for needle in needles: if needle in x: hits.append((path, needle, x[:120].replace("\n", "\\n"))) walk(obj) for path, needle, preview in hits[:100]: print(f"{needle} @ {path} :: {preview}") print("hits:", len(hits)) PYRepository: langflow-ai/openrag
Length of output: 1703
🏁 Script executed:
python3 - <<'PY' import json from pathlib import Path obj = json.loads(Path("flows/ingestion_flow.json").read_text()) code = obj["data"]["nodes"][6]["data"]["node"]["template"]["code"]["value"] needles = [ "def search_documents", "def _get_index_properties", "def _get_field_dimension", "def _is_knn_vector_field", "def _get_available_embedding_fields", "def _resolve_embedding_field", "def _get_embedding_model_name", "get_embedding_field_name", "chunk_embedding_", ] lines = code.splitlines() for needle in needles: matches = [i for i, line in enumerate(lines, 1) if needle in line] print(f"\n===== {needle} =====") if not matches: print("NOT FOUND") continue for lineno in matches[:5]: start = max(1, lineno - 20) end = min(len(lines), lineno + 140) print(f"\n--- lines {start}-{end} ---") for i in range(start, end + 1): print(f"{i:5d}: {lines[i-1]}") PYRepository: langflow-ai/openrag
Length of output: 50375
🏁 Script executed:
python3 - <<'PY' import json from pathlib import Path obj = json.loads(Path("flows/ingestion_flow.json").read_text()) code = obj["data"]["nodes"][6]["data"]["node"]["template"]["code"]["value"] lines = code.splitlines() # Print focused windows around the relevant ingestion and search logic. windows = [ (1370, 1515), # ingestion field selection / dimensions (1680, 1845), # model detection (1992, 2325), # query embedding / field selection / dis_max search (2325, 2405), # result handling ] for start, end in windows: print(f"\n===== lines {start}-{end} =====") for i in range(start, min(end, len(lines)) + 1): print(f"{i:5d}: {lines[i-1]}") PYRepository: langflow-ai/openrag
Length of output: 39125
Keep
text-embedding-3-smallavailable until old vectors are migrated
search()only builds KNN queries for models that still have a matching embedding object. With onlytext-embedding-3-largewired here, documents already indexed astext-embedding-3-smallwill be detected in the corpus but skipped at query time, so they become effectively unsearchable unless you backfill/reindex or keep the small model available during the transition.text-embedding-3-largealso doubles vector width, so the index footprint and embedding cost go up.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@flows/ingestion_flow.json` at line 3754, The embedding model list now only includes text-embedding-3-large, which makes search() skip existing text-embedding-3-small vectors during query planning. Update the embedding configuration in the ingestion flow to keep text-embedding-3-small available alongside text-embedding-3-large until old vectors are migrated, so the matching embedding object still exists for previously indexed documents. Keep the change localized to the embedding/model mapping used by search() and the ingestion flow config.tests/unit/test_shared_flag.py (1)
181-237: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash rg -n "async def connector_sync" -A 12 src/api/connectors.py rg -n "await connector_sync(" -A 8 tests/unit/test_shared_flag.pyRepository: langflow-ai/openrag
Length of output: 1534
🏁 Script executed:
#!/bin/bash sed -n '1,220p' src/api/connectors.py | sed -n '1,220p'Repository: langflow-ai/openrag
Length of output: 8063
🏁 Script executed:
#!/bin/bash rg -n "def _connector_access_denied|async def _connector_access_denied|_connector_access_denied\(" src/api/connectors.py -A 40 -B 20Repository: langflow-ai/openrag
Length of output: 23339
Pass
requestinto bothconnector_synccalls
tests/unit/test_shared_flag.py:181-237—connector_syncrequires arequestargument, so these tests raiseTypeErrorbefore reaching the shared-flag branch. Add a mockRequestto both calls.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_shared_flag.py` around lines 181 - 237, The two shared-flag tests are failing before the guard is exercised because `connector_sync` is invoked without its required `request` argument. Update both calls in `test_non_cos_connector_rejects_shared_true` and `test_ibm_cos_shared_true_does_not_hit_guard` to pass a mock `Request` object, using the same `connector_sync` setup already in the tests so the shared-flag logic can be reached.
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/api/connectors.py (1)
72-77: 🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
get_synced_file_ids_for_connectormust accept the new kwargs here
src/api/connectors.py:72-77still definesget_synced_file_ids_for_connector()withoutingest_settingsorshared, so the bucket-filter sync path atsrc/api/connectors.py:984-991raisesTypeErroron every request and falls through to the generic 500 handler. Update the function signature (and any downstream filtering if needed) to match the caller.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/api/connectors.py` around lines 72 - 77, `get_synced_file_ids_for_connector` is missing the new `ingest_settings` and `shared` kwargs, which causes the bucket-filter sync path to fail with a TypeError when called from the sync flow. Update the function signature in `get_synced_file_ids_for_connector` to accept these parameters, and thread them through any related filtering logic or helper calls so the caller in the sync path can pass them without error.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In `@src/api/connectors.py`:
- Around line 72-77: `get_synced_file_ids_for_connector` is missing the new
`ingest_settings` and `shared` kwargs, which causes the bucket-filter sync path
to fail with a TypeError when called from the sync flow. Update the function
signature in `get_synced_file_ids_for_connector` to accept these parameters, and
thread them through any related filtering logic or helper calls so the caller in
the sync path can pass them without error.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: f97ac426-86db-4987-8f34-43c1a0b1fd3d
📒 Files selected for processing (3)
frontend/components/connectors/shared-bucket-view.tsxfrontend/components/file-browser-dialog.tsxsrc/api/connectors.py
🚧 Files skipped from review as they are similar to previous changes (1)
- frontend/components/file-browser-dialog.tsx
s
Summary by CodeRabbit