
Commit af1a6dd

Merge pull request #314557 from ftrichardson1/add/files-for-ai
Add/files for ai
2 parents 2dda89d + 37707b1 commit af1a6dd

18 files changed

Lines changed: 2209 additions & 0 deletions

File tree

articles/storage/files/TOC.yml

Lines changed: 40 additions & 0 deletions
@@ -287,6 +287,46 @@
    href: storage-java-how-to-use-file-storage.md
  - name: Python
    href: storage-python-how-to-use-file-storage.md
  - name: Files for artificial intelligence (AI)
    items:
      - name: Retrieval-augmented generation (RAG)
        items:
          - name: What is retrieval-augmented generation?
            href: artificial-intelligence/retrieval-augmented-generation/overview.md
          - name: LangChain
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/orchestrations/langchain.md
          - name: LlamaIndex
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/orchestrations/llamaindex.md
          - name: Haystack
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/orchestrations/haystack.md
          - name: Pinecone
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/vector-databases/pinecone.md
          - name: Weaviate
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/vector-databases/weaviate.md
          - name: Qdrant
            href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/vector-databases/qdrant.md
          - name: Tutorials
            items:
              - name: Get started
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/setup.md
              - name: Use LangChain with Pinecone
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/langchain-pinecone/tutorial-langchain-pinecone.md
              - name: Use LangChain with Weaviate
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/langchain-weaviate/tutorial-langchain-weaviate.md
              - name: Use LangChain with Qdrant
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/langchain-qdrant/tutorial-langchain-qdrant.md
              - name: Use LlamaIndex with Pinecone
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/llamaindex-pinecone/tutorial-llamaindex-pinecone.md
              - name: Use LlamaIndex with Weaviate
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/llamaindex-weaviate/tutorial-llamaindex-weaviate.md
              - name: Use LlamaIndex with Qdrant
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/llamaindex-qdrant/tutorial-llamaindex-qdrant.md
              - name: Use Haystack with Pinecone
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/haystack-pinecone/tutorial-haystack-pinecone.md
              - name: Use Haystack with Weaviate
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/haystack-weaviate/tutorial-haystack-weaviate.md
              - name: Use Haystack with Qdrant
                href: artificial-intelligence/retrieval-augmented-generation/open-source-frameworks/tutorials/haystack-qdrant/tutorial-haystack-qdrant.md
  - name: Troubleshooting
    items:
      - name: Troubleshoot Azure Files
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
title: Build RAG pipelines with Haystack and Azure Files
description: Use Haystack as an orchestration framework to build retrieval-augmented generation (RAG) pipelines using data stored in Azure Files.
author: ftrichardson1
ms.service: azure-file-storage
ms.topic: overview
ms.date: 04/09/2026
ms.author: t-flynnr
---

# Haystack with Azure Files

Haystack is an open-source framework that models every pipeline as a directed acyclic graph (DAG) of typed components. By using Haystack with Azure Files, you can build retrieval-augmented generation (RAG) pipelines that use your existing file shares as a primary data source.

Haystack separates indexing (embed and write) from querying (embed, retrieve, prompt, and generate) into distinct pipeline objects, making each independently testable and deployable.

## Why use Haystack with Azure Files?

* **Explicit pipeline DAGs:** Every component is a separate, typed node with named input and output sockets. You can visualize the pipeline, validate connections at build time, and trace data through each stage.
* **Separate indexing and query pipelines:** Because ingestion and retrieval are distinct pipeline objects, you can test, version, and deploy each one on its own schedule.
* **Custom components via `@component`:** Any Python class decorated with `@component` becomes a pipeline node with typed sockets, making it straightforward to add custom filtering or domain-specific logic as a first-class pipeline stage.
* **Built-in evaluation tools:** Haystack includes evaluation components for measuring retrieval and generation quality, so you can quantify the impact of changes to your pipeline.
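
To make the custom-component idea concrete, the following sketch shows the shape of a filtering stage in plain Python (concept only; the `Document` class and `keep_markdown` function are illustrative, and Haystack's real API additionally requires the `@component` decorator and declared output types):

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Minimal stand-in for a pipeline document with content and metadata."""
    content: str
    meta: dict = field(default_factory=dict)

def keep_markdown(documents: list[Document]) -> list[Document]:
    """Concept-only filter stage: keep documents whose source path ends in .md."""
    return [d for d in documents if d.meta.get("path", "").endswith(".md")]
```

In Haystack itself, the equivalent class would be decorated with `@component` and declare its output sockets, so the framework can validate connections when the pipeline is built.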

## Tutorials

The following tutorials demonstrate how to build RAG pipelines over documents stored in Azure Files using Haystack with different vector databases:

| Vector database | Tutorial |
| :--- | :--- |
| Pinecone | [Haystack + Pinecone](../tutorials/haystack-pinecone/tutorial-haystack-pinecone.md) |
| Weaviate | [Haystack + Weaviate](../tutorials/haystack-weaviate/tutorial-haystack-weaviate.md) |
| Qdrant | [Haystack + Qdrant](../tutorials/haystack-qdrant/tutorial-haystack-qdrant.md) |

> [!NOTE]
> All tutorials require the same [project setup and prerequisites](../setup.md).

## Next steps

* [Azure Storage documentation](/azure/storage/)
* [Haystack documentation](https://docs.haystack.deepset.ai/)
* [Haystack GitHub](https://github.com/deepset-ai/haystack)
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
---
title: Build RAG pipelines with LangChain and Azure Files
description: Use LangChain as an orchestration framework to build retrieval-augmented generation (RAG) pipelines using data stored in Azure Files.
author: ftrichardson1
ms.service: azure-file-storage
ms.topic: overview
ms.date: 04/09/2026
ms.author: t-flynnr
---

# LangChain with Azure Files

LangChain is an open-source framework designed to simplify the creation of applications powered by large language models (LLMs). By using LangChain with Azure Files, you can build robust retrieval-augmented generation (RAG) pipelines that use your existing file shares as a primary data source.

LangChain's modular architecture and **LangChain Expression Language (LCEL)** allow you to swap components, such as document loaders, retrievers, and vector stores, with minimal code changes.

## Why use LangChain with Azure Files?

Integrating LangChain with Azure Files offers several advantages for AI workflows:

* **Modular integrations:** Connect Azure Files to a wide array of vector databases and LLMs without rewriting core logic.
* **Streamlined orchestration:** Use LCEL to build composable, testable pipelines that support asynchronous execution and real-time streaming.
* **Optional observability:** Integrate with tools like LangSmith to trace execution, evaluate retrieval quality, and debug latency.
* **Direct data access:** Ingest unstructured data straight from Azure Files, keeping your existing storage hierarchy as the system of record.
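
LCEL's pipe-style composition can be pictured with ordinary function chaining. This is a concept sketch in plain Python, not the LangChain API, and the stage names are illustrative:

```python
from functools import reduce

def pipe(*stages):
    """Compose stages left to right, like chaining LCEL runnables with |."""
    return lambda value: reduce(lambda acc, stage: stage(acc), stages, value)

# Illustrative stages standing in for a retriever and a prompt formatter.
retrieve = lambda query: [f"doc about {query}"]
format_prompt = lambda docs: "Context: " + "; ".join(docs)

chain = pipe(retrieve, format_prompt)
```

Because adjacent stages agree only on input and output shape, swapping the retriever for one backed by a different vector database leaves the rest of the chain untouched, which is the property described in the bullets above.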

## Tutorials

The following tutorials demonstrate how to build RAG pipelines over documents stored in Azure Files using LangChain with different vector databases:

| Vector database | Tutorial |
| :--- | :--- |
| Pinecone | [LangChain + Pinecone](../tutorials/langchain-pinecone/tutorial-langchain-pinecone.md) |
| Weaviate | [LangChain + Weaviate](../tutorials/langchain-weaviate/tutorial-langchain-weaviate.md) |
| Qdrant | [LangChain + Qdrant](../tutorials/langchain-qdrant/tutorial-langchain-qdrant.md) |

> [!NOTE]
> All tutorials require the same [project setup and prerequisites](../setup.md).

## Next steps

* [Azure Storage documentation](/azure/storage/)
* [LangChain documentation](https://python.langchain.com/docs/introduction/)
* [LangChain GitHub](https://github.com/langchain-ai/langchain)
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
---
title: Build RAG pipelines with LlamaIndex and Azure Files
description: Use LlamaIndex as an orchestration framework to build retrieval-augmented generation (RAG) pipelines using data stored in Azure Files.
author: ftrichardson1
ms.service: azure-file-storage
ms.topic: overview
ms.date: 04/09/2026
ms.author: t-flynnr
---

# LlamaIndex with Azure Files

LlamaIndex is an open-source framework designed for building retrieval-augmented generation (RAG) applications. By using LlamaIndex with Azure Files, you can build RAG pipelines that use your existing file shares as a primary data source.

LlamaIndex provides fine-grained control over each stage of the pipeline through abstractions such as `SentenceSplitter` for chunking, `VectorStoreIndex` for indexing, and `RetrieverQueryEngine` for query-time retrieval and response synthesis.
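
As a rough, concept-only illustration of what sentence-based chunking does (plain Python; LlamaIndex's `SentenceSplitter` is considerably more sophisticated, with token-aware sizing and configurable overlap):

```python
import re

def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    """Naive sentence-based chunking: split on sentence boundaries,
    then pack consecutive sentences into chunks of at most max_chars."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Chunking on sentence boundaries keeps each retrieved passage semantically coherent, which is why retrieval frameworks prefer it over fixed-size character splits.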

## Why use LlamaIndex with Azure Files?

* **Retrieval-focused abstractions:** LlamaIndex provides specialized index types (`VectorStoreIndex`, `KeywordTableIndex`, `KnowledgeGraphIndex`) and query engines that give you control over retrieval strategies without restructuring your pipeline.
* **Node-based document model:** Documents are parsed into typed nodes that carry metadata and parent-child relationships, enabling filtering and source citation at query time.
* **Broad connector ecosystem:** LlamaHub provides connectors for data sources beyond file systems, so the same retrieval patterns you build for Azure Files extend to databases, APIs, and SaaS tools.
* **Multimodal support:** LlamaIndex handles text, tables, images, and structured data within a single index, which is useful for Azure Files shares that contain mixed document types.

## Tutorials

The following tutorials demonstrate how to build RAG pipelines over documents stored in Azure Files using LlamaIndex with different vector databases:

| Vector database | Tutorial |
| :--- | :--- |
| Pinecone | [LlamaIndex + Pinecone](../tutorials/llamaindex-pinecone/tutorial-llamaindex-pinecone.md) |
| Weaviate | [LlamaIndex + Weaviate](../tutorials/llamaindex-weaviate/tutorial-llamaindex-weaviate.md) |
| Qdrant | [LlamaIndex + Qdrant](../tutorials/llamaindex-qdrant/tutorial-llamaindex-qdrant.md) |

> [!NOTE]
> All tutorials require the same [project setup and prerequisites](../setup.md).

## Next steps

* [Azure Storage documentation](/azure/storage/)
* [LlamaIndex documentation](https://docs.llamaindex.ai/)
* [LlamaIndex GitHub](https://github.com/run-llama/llama_index)
Lines changed: 155 additions & 0 deletions
@@ -0,0 +1,155 @@
---
title: Prepare Azure Files data for document-based RAG applications with open-source frameworks
description: Learn how to authenticate to an Azure file share and download files for ingestion into a document-based RAG application using open-source frameworks.
author: ftrichardson1
ms.service: azure-file-storage
ms.topic: how-to
ms.date: 04/08/2026
ms.author: t-flynnr
ms.custom: devx-track-python
---

# Prepare Azure Files data for document-based RAG applications using open-source AI tooling

This article describes how to authenticate to an Azure file share and download its contents for use with open-source retrieval-augmented generation (RAG) tooling.

## Prerequisites

- An [Azure file share](/azure/storage/files/create-classic-file-share?tabs=azure-portal) containing the documents you want to query. If you don't have an Azure subscription, [create one for free](https://azure.microsoft.com/free/).
- [Python 3.12.10](https://www.python.org/downloads/release/python-31210/). On Windows, install the **x64** version.
- [Azure CLI](/cli/azure/install-azure-cli).

## Grant access to an Azure file share

This article uses Microsoft Entra ID authentication via [`DefaultAzureCredential`](/azure/developer/python/sdk/authentication/credential-chains?tabs=dac#defaultazurecredential-overview), the recommended credential pattern for Azure software development kits (SDKs). This approach avoids storage account keys and provides a portable authentication mechanism that works across development and production environments.
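
`DefaultAzureCredential` tries a chain of credential sources in order (environment variables, managed identity, Azure CLI sign-in, and others) and uses the first one that succeeds. The chain idea can be sketched in plain Python (concept only, not the azure-identity API; the provider lambdas are illustrative):

```python
def first_available(*providers):
    """Return the first non-None result from a chain of credential providers."""
    for provider in providers:
        credential = provider()
        if credential is not None:
            return credential
    raise RuntimeError("No credential source in the chain succeeded")

# Illustrative providers: the first (e.g. environment) yields nothing,
# so the chain falls through to the next (e.g. Azure CLI).
token = first_available(lambda: None, lambda: "cli-token")
```

This fall-through behavior is what makes the same code work unchanged on a developer workstation (CLI sign-in) and in production (managed identity).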

> [!TIP]
> Receiving a `403 Forbidden` error typically indicates missing authorization rather than failed authentication.

Assign the [**Storage File Data Privileged Reader**](/azure/storage/files/authorize-oauth-rest?tabs=portal#privileged-access-and-access-permissions-for-data-operations) role to your user account on the storage account that hosts the file share.

> [!NOTE]
> This role is required because the code accesses Azure Files using [`token_intent="backup"`](/python/api/azure-storage-file-share/azure.storage.fileshare.shareclient#keyword-only-parameters). This access pattern bypasses file-level permissions, so Azure requires a privileged role. The **Storage File Data Privileged Reader** role is sufficient because the code performs only read operations and doesn't modify file contents.

#### Azure portal

1. Sign in to the [Azure portal](https://portal.azure.com) and navigate to your storage account.
2. Select **Access Control (IAM)** > **Add** > **Add role assignment**.
3. Search for **Storage File Data Privileged Reader**, select it, and select **Next**.
4. Select **Select members**, search for your user account, and select it.
5. Select **Review + assign**.

#### Azure CLI

```bash
az login

az role assignment create \
  --assignee $(az ad signed-in-user show --query id -o tsv) \
  --role "Storage File Data Privileged Reader" \
  --scope $(az storage account show \
    --name <your-storage-account-name> \
    --query id -o tsv)
```

#### Azure PowerShell

```powershell
Connect-AzAccount

New-AzRoleAssignment `
  -SignInName (Get-AzADUser -SignedIn).UserPrincipalName `
  -RoleDefinitionName "Storage File Data Privileged Reader" `
  -Scope (Get-AzStorageAccount `
    -ResourceGroupName <your-resource-group> `
    -Name <your-storage-account-name>).Id
```

## Set environment variables

Create a `.env` file in your project directory with your Azure Files connection details:

```text
AZURE_STORAGE_ACCOUNT_NAME=<your-storage-account-name>
AZURE_STORAGE_SHARE_NAME=<your-share-name>
```

| Variable | Description |
| :--- | :--- |
| `AZURE_STORAGE_ACCOUNT_NAME` | The name of your Azure Storage account |
| `AZURE_STORAGE_SHARE_NAME` | The name of your Azure file share |
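
The sample code later in this article reads these values from `os.environ`, so they must be loaded into the environment first. A library such as python-dotenv is the usual choice; the following stdlib-only sketch shows what such a loader does (`load_env` is an illustrative helper, not part of any package):

```python
import os

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: apply KEY=VALUE lines to os.environ,
    skipping blanks and comments, without overriding existing values."""
    with open(path) as env_file:
        for line in env_file:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            os.environ.setdefault(key.strip(), value.strip())
```

Keeping connection details in `.env` (and out of source control) lets the same code run against different storage accounts without modification.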

## Download files from an Azure file share

1. Install the required packages:

    - `azure-identity` provides `DefaultAzureCredential` for passwordless authentication.
    - `azure-storage-file-share` provides the [`ShareClient`](/python/api/azure-storage-file-share/azure.storage.fileshare.shareclient) used to connect to and download files from the share.

    ```bash
    pip install azure-identity azure-storage-file-share
    ```

2. Connect to an Azure file share, recursively enumerate its directory structure, and collect the details required to locate and download each file. The `ShareClient` requires `token_intent="backup"` when using Microsoft Entra ID–based authentication.

    ```python
    import os
    import posixpath
    import tempfile

    from azure.identity import DefaultAzureCredential
    from azure.storage.fileshare import ShareClient

    account_name = os.environ["AZURE_STORAGE_ACCOUNT_NAME"]
    share_name = os.environ["AZURE_STORAGE_SHARE_NAME"]

    share = ShareClient(
        account_url=f"https://{account_name}.file.core.windows.net",
        share_name=share_name,
        credential=DefaultAzureCredential(),
        token_intent="backup",
    )

    root = share.get_directory_client("")
    file_references = []
    directories_to_traverse = [root]

    # Walk the share depth-first using an explicit stack.
    while directories_to_traverse:
        current = directories_to_traverse.pop()
        for item in current.list_directories_and_files():
            if item.is_directory:
                directories_to_traverse.append(current.get_subdirectory_client(item.name))
            else:
                # Azure Files paths use posix-style separators.
                relative_path = posixpath.join(current.directory_path or "", item.name)
                file_references.append((item.name, relative_path, current))
    ```
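
The same explicit-stack traversal pattern applies to any hierarchical listing. As a local, runnable illustration using only the standard library (the helper name is illustrative):

```python
import os

def walk_with_stack(root: str) -> list[str]:
    """Enumerate all files under root iteratively, mirroring the
    explicit-stack traversal used against the file share above."""
    files, stack = [], [root]
    while stack:
        current = stack.pop()
        for entry in os.scandir(current):
            if entry.is_dir(follow_symlinks=False):
                stack.append(entry.path)
            else:
                files.append(os.path.relpath(entry.path, root))
    return files
```

An explicit stack avoids recursion-depth limits on deeply nested directory trees and makes the traversal order easy to change (a deque gives breadth-first instead).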

3. Download the files. Before writing files to disk, the code validates each resolved file path to ensure it remains within the destination directory. This validation prevents files from being written outside the intended location when processing directory structures from an external source.

    ```python
    with tempfile.TemporaryDirectory() as destination:
        for name, relative_path, parent_directory in file_references:
            file_client = parent_directory.get_file_client(name)
            local_path = os.path.join(destination, relative_path)

            # Path traversal guard: the resolved path must stay inside destination.
            real_dest = os.path.realpath(destination) + os.sep
            if not os.path.realpath(local_path).startswith(real_dest):
                raise ValueError(f"Path traversal detected: {relative_path}")

            os.makedirs(os.path.dirname(local_path), exist_ok=True)
            with open(local_path, "wb") as f:
                for chunk in file_client.download_file().chunks():
                    f.write(chunk)
    ```
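
The traversal check in step 3 can be factored into a small helper so it's easy to unit test in isolation (the helper name is illustrative, not part of the sample):

```python
import os

def is_within(destination: str, relative_path: str) -> bool:
    """Return True if destination/relative_path resolves to a location
    inside destination, i.e. it can't escape via .. segments or symlinks."""
    real_dest = os.path.realpath(destination) + os.sep
    candidate = os.path.realpath(os.path.join(destination, relative_path))
    return candidate.startswith(real_dest)
```

Appending `os.sep` before the prefix check matters: without it, a sibling directory such as `destination-evil` would incorrectly pass the `startswith` test.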

## Next steps

Choose a tutorial to continue with parsing, chunking, embedding, and querying:

- [LangChain](orchestrations/langchain.md): LangChain + Pinecone, Weaviate, Qdrant
- [LlamaIndex](orchestrations/llamaindex.md): LlamaIndex + Pinecone, Weaviate, Qdrant
- [Haystack](orchestrations/haystack.md): Haystack + Pinecone, Weaviate, Qdrant
