Commit 3baebe3

Author: Mukesh Dua
Update links and add new Bookshelf & Knowledge Bases article; enhance documentation with images and toc updates which were removed with previous PR merge
1 parent ffb4d38 commit 3baebe3

25 files changed

Lines changed: 187 additions & 18 deletions

articles/microsoft-discovery/concept-azure-container-registry.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -221,5 +221,5 @@ When you develop and publish tools for Microsoft Discovery, the containerized to
221221
- [What is Microsoft Discovery?](overview-what-is-microsoft-discovery.md)
222222
- [Virtual networks and subnets in Microsoft Discovery](concept-virtual-networks.md)
223223
- [Role assignments in Microsoft Discovery](concept-role-assignments.md)
224-
- [Azure Container Registry documentation](https://learn.microsoft.com/azure/container-registry/)
224+
- [Azure Container Registry documentation](/azure/container-registry/)
225225
- [Azure Private Endpoint overview](../private-link/private-endpoint-overview.md)
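
The change in this hunk, replacing an absolute `https://learn.microsoft.com/...` link target with a site-relative path, recurs in several files in this commit. A minimal sketch of how such a rewrite could be scripted is shown below. The function name `relativize_learn_links` and the regular expression are illustrative assumptions, not tooling used by the author; the sketch only handles inline Markdown links and leaves every other host (and any locale-prefixed path) untouched.

```python
import re

# Assumption: only inline Markdown links of the form [text](https://learn.microsoft.com/path)
# are rewritten. Reference-style links and bare URLs are not handled.
_ABSOLUTE_LEARN_LINK = re.compile(r"\]\(https://learn\.microsoft\.com(/[^)\s]*)\)")


def relativize_learn_links(markdown: str) -> str:
    """Return the text with learn.microsoft.com link targets made site-relative."""
    # Keep the captured path (group 1) and drop the scheme and host.
    return _ABSOLUTE_LEARN_LINK.sub(r"](\1)", markdown)


if __name__ == "__main__":
    before = "- [Azure CLI installed](https://learn.microsoft.com/cli/azure/install-azure-cli)"
    print(relativize_learn_links(before))
```

Links to other hosts pass through unchanged, so a pass like this would not touch the SharePoint URL that appears elsewhere in this commit.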

articles/microsoft-discovery/concept-bookshelf-and-knowledgebases.md

Lines changed: 1 addition & 1 deletion
@@ -35,7 +35,7 @@ Currently, the Bookshelf supports indexing unstructured (text-based) file format
3535
* JSON
3636
* CSV
3737

38-
The Bookshelf uses Azure AI Search Enrichment to process supported file formats. Images embedded in supported file formats are processed using Azure AI Search's built-in [Vision skill](https://learn.microsoft.com/azure/ai-services/computer-vision/overview), which automatically generates alt-text for embedded images. See [Azure AI Search's documentation](https://microsoftapc.sharepoint.com/:x:/t/ProjectParagon/IQDSrYORrkMDSa9OME93rcyYAc2EV_jDqr9aD3jYTThB7Cs?e=DfOv2t)for the full list of supported file formats.
38+
The Bookshelf uses Azure AI Search Enrichment to process supported file formats. Images embedded in supported file formats are processed using Azure AI Search's built-in [Vision skill](/azure/ai-services/computer-vision/overview), which automatically generates alt-text for embedded images. See [Azure AI Search's documentation](https://microsoftapc.sharepoint.com/:x:/t/ProjectParagon/IQDSrYORrkMDSa9OME93rcyYAc2EV_jDqr9aD3jYTThB7Cs?e=DfOv2t)for the full list of supported file formats.
3939

4040
The knowledge graph and vector database that results from indexing, collectively known as a Knowledge Base (KB), are stored in an Azure SQL DB in your subscription.
4141

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
---
2+
ms.service: azure
3+
ms.author: reburkea
4+
author: reburkea
5+
title: Microsoft Discovery Bookshelf & Knowledge Bases
6+
description: Conceptual overview of Microsoft Discovery Bookshelf service and Knowledge Bases.
7+
ms.topic: concept-article
8+
ms.date: 03/23/2026
9+
---
10+
11+
# Microsoft Discovery Bookshelf
12+
Microsoft Discovery includes the Bookshelf, a service that enables customers to convert their data into curated graphs known as Knowledge Bases (KBs). The key components of the Bookshelf service are the Bookshelf resource and Knowledge Bases within each Bookshelf. A Knowledge Base contains a vector database and knowledge graph of your indexed artifacts. KBs can be used by Discovery agents as grounding skills and queried by Discovery agents for various use cases, including answering questions, summarization, and reasoning.
13+
14+
## When to use the Bookshelf
15+
The Bookshelf is best for reasoning over your curated, proprietary data. Knowledge Bases are especially effective when their scoped contents are thematically related and directly applicable to your Discovery workflow. For example, an Application-Specific Integrated Circuit (ASIC) design team could create a Knowledge Base with their project's hardware specifications, simulation result reports, and the latest relevant literature from the field. Querying this Knowledge Base during design workflows ensures Discovery's reasoning is grounded with previous engineering content and scientific literature.
16+
17+
For using data in a tool call or otherwise directly using data in Discovery, creating a Knowledge Base is often not necessary. Similarly, to search over vast repositories of data or to find resources that might be relevant to your workflow, we suggest using Azure AI Search, SharePoint Search, or similar general purpose search tools. Once you have identified the data that is most relevant to your workflow, a Knowledge Base including this curated data can help ground your Discovery workflows and derive new insights in context.
18+
19+
## Features
20+
At a high level, the Bookshelf works by converting diverse file formats to text, then generating a graphical representation of that text, which can be queried using natural language.
21+
22+
The Bookshelf uses an advanced technique developed by Microsoft Research called Graph Retrieval-Augmented Generation (GraphRAG) to transform customer data into graph-based representations and generate responses to queries. Unlike traditional RAG methods, GraphRAG-based algorithms not only create an indexed vector database of the source content but also constructs a knowledge graph that captures entity relationships within the data. Research from Microsoft demonstrates that GraphRAG delivers more accurate and comprehensive grounding information than standard RAG or vector-based techniques, leading to higher-quality responses.
23+
24+
### Indexing
25+
Currently, the Bookshelf supports indexing unstructured (text-based) file formats stored in Azure Blob Storage. Supported file formats include:
26+
27+
* Text (.txt)
28+
* PDF (.pdf)
29+
* Word (.docx)
30+
* PowerPoint (.pptx)
31+
* Excel (.xlsx)
32+
* Markdown
33+
* XML
34+
* HTML
35+
* JSON
36+
* CSV
37+
38+
The Bookshelf uses Azure AI Search Enrichment to process supported file formats. Images embedded in supported file formats are processed using Azure AI Search's built-in [Vision skill](/azure/ai-services/computer-vision/overview), which automatically generates alt-text for embedded images. See [Azure AI Search's documentation](/azure/search/cognitive-search-skill-document-intelligence-layout#supported-file-formats) for the full list of supported file formats.
39+
40+
The knowledge graph and vector database that results from indexing, collectively known as a Knowledge Base (KB), are stored in an Azure SQL DB in your subscription.
41+
42+
### Query
43+
The Bookshelf provides the query function that can be invoked by any agent running on the Microsoft Discovery platform, including your own agent.
44+
45+
## Known limitations
46+
47+
### Unsupported file types
48+
Encrypted, password-protected, or sensitivity-labeled files aren't supported for indexing. Any unsupported file types are skipped during indexing.
49+
50+
### Cross-project sharing
51+
52+
Bookshelves can't be shared across projects. Each project must have its own dedicated Bookshelves and Knowledge Bases.
53+
54+
> [!NOTE]
55+
> The ability to share Bookshelves across projects is a planned feature for future releases.
56+
57+
### One knowledge base per Bookshelf
58+
59+
Each Bookshelf can only contain one Knowledge Base. However, Projects can contain many Bookshelves.
60+
61+
> [!NOTE]
62+
> The ability to create multiple Knowledge Bases within the same Bookshelf is a planned feature for future releases.
63+
64+
### Incremental indexing
65+
66+
Incremental indexing isn't currently supported. To update Knowledge Bases, you must delete them and re-index.
67+
68+
> [!NOTE]
69+
> Incremental indexing is a planned feature for future releases.
70+
71+
### Scale
72+
73+
The Bookshelf currently supports Small (<200 MB of text), Medium (<500 MB of text, default size), and Large (<1 GB of text)-sized deployments. For more information on supported index sizes and the resources required to support each size, see the Bookshelf creation How To guide.
74+
75+
### Best practices
76+
77+
The Bookshelf is an evolving feature. Over the course of future releases, we'll improve the costs and time associated with creating Bookshelf deployments and indexing and searching over KBs. We'll also support incremental indexing and we'll take advantage of newer GPT models for search. Currently, for the best performance and to minimize costs of re-deployment, re-indexing, re-enrichment, or search, we recommend the following best practices:
78+
79+
* Limit each Knowledge Base to Small or Medium (default)-sized deployments
80+
* Ensure each KB's content is thematically coherent and directly applicable to your Discovery workflow.
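
The size limits in the Scale section of the added article (Small under 200 MB of text, Medium under 500 MB, Large under 1 GB) can be illustrated with a small helper. This is a sketch under stated assumptions: `pick_deployment_size` is a hypothetical function, not part of any Microsoft Discovery API, and treating 1 MB as 1024 × 1024 bytes is an assumption, since the article does not say how the limits are measured.

```python
from typing import Optional

MB = 1024 * 1024  # assumption: binary megabytes; the article does not specify

# Upper bounds (exclusive) taken from the article's Scale section.
SIZE_LIMITS = [
    ("Small", 200 * MB),
    ("Medium", 500 * MB),  # the article names Medium as the default size
    ("Large", 1024 * MB),
]


def pick_deployment_size(text_bytes: int) -> Optional[str]:
    """Return the smallest listed size whose limit exceeds text_bytes, or None."""
    if text_bytes < 0:
        raise ValueError("text_bytes must be non-negative")
    for name, limit in SIZE_LIMITS:
        if text_bytes < limit:
            return name
    return None  # larger than any size the article lists


if __name__ == "__main__":
    print(pick_deployment_size(150 * MB))
```

A result of `"Large"` or `None` would be a signal to narrow the Knowledge Base's scope, in line with the article's recommendation to stay within Small or Medium deployments.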

articles/microsoft-discovery/concept-discovery-engine.md

Lines changed: 1 addition & 1 deletion
@@ -119,7 +119,7 @@ When cognition executes tasks, it draws on the full Microsoft Discovery platform
119119

120120
- **[Agents](concept-discovery-agent.md)**: Specialized AI systems that execute specific types of work. Cognition selects the agent whose capabilities best match each task. An agent is associated to the best model for the type of work required.
121121
- **Tools**: Containerized executables that run on the [supercomputer](how-to-manage-supercomputers.md) for computation, data processing, and analysis. Tools handle work that requires specialized software or significant compute resources.
122-
- **[Bookshelf](concept-bookshelf-and-knowledgebases.md)**: Knowledge bases built from your documents and scientific literature. Agents query bookshelves to ground their reasoning in relevant context.
122+
- **[Bookshelf](concept-bookshelf-knowledge-bases.md)**: Knowledge bases built from your documents and scientific literature. Agents query bookshelves to ground their reasoning in relevant context.
123123

124124
You configure these resources when you set up your workspace and project. Cognition then orchestrates them automatically based on what each task requires.
125125

articles/microsoft-discovery/concept-projects-investigations.md

Lines changed: 1 addition & 1 deletion
@@ -102,4 +102,4 @@ Subscription
102102
- [Add agents using bundles](quickstart-agents-bundles.md)
103103
- [Microsoft Discovery agents](concept-discovery-agent.md)
104104
- [Agent types in Microsoft Discovery](concept-discovery-agent-types.md)
105-
- [Bookshelf & Knowledge Bases](concept-bookshelf-and-knowledgebases.md)
105+
- [Bookshelf & Knowledge Bases](concept-bookshelf-knowledge-bases.md)

articles/microsoft-discovery/concept-resource-provider-registration.md

Lines changed: 2 additions & 2 deletions
@@ -86,7 +86,7 @@ Refresh the **Resource providers** page and confirm that all the Resource Provid
8686

8787
#### Prerequisites
8888

89-
- [Azure CLI installed](https://learn.microsoft.com/cli/azure/install-azure-cli)
89+
- [Azure CLI installed](/cli/azure/install-azure-cli)
9090
- Authenticated to your Azure account (`az login`)
9191

9292
#### Register the resource provider
@@ -113,7 +113,7 @@ az provider list --query "[].{Provider:namespace, Status:registrationState}" --o
113113

114114
#### Prerequisites
115115

116-
- [Azure PowerShell module installed](https://learn.microsoft.com/powershell/azure/install-azure-powershell)
116+
- [Azure PowerShell module installed](/powershell/azure/install-azure-powershell)
117117
- Authenticated to your Azure account (`Connect-AzAccount`)
118118

119119
#### Register the resource provider

articles/microsoft-discovery/how-to-data-handling-with-tools-agents.md

Lines changed: 1 addition & 1 deletion
@@ -409,5 +409,5 @@ If a resource doesn't appear in your conversation, use the following steps:
409409

410410
- [Microsoft Discovery agents](concept-discovery-agent.md)
411411
- [Agent types in Microsoft Discovery](concept-discovery-agent-types.md)
412-
- [Bookshelf and knowledge bases](concept-bookshelf-and-knowledgebases.md)
412+
- [Bookshelf and knowledge bases](concept-bookshelf-knowledge-bases.md)
413413
- [Storage assets and storage containers in Microsoft Discovery](concept-storage-account.md)

articles/microsoft-discovery/how-to-deploy-network-hardened-stack.md

Lines changed: 1 addition & 1 deletion
@@ -246,4 +246,4 @@ When you complete this deployment, you have:
246246
## Next steps
247247

248248
- [Configure network security](how-to-configure-network-security.md) - detailed network hardening and PE setup
249-
- [Bookshelf and Knowledge Bases](concept-bookshelf-and-knowledgebases.md)
249+
- [Bookshelf and Knowledge Bases](concept-bookshelf-knowledge-bases.md)

articles/microsoft-discovery/how-to-manage-supercomputers.md

Lines changed: 4 additions & 4 deletions
@@ -99,7 +99,7 @@ Nodepools define the compute capacity (VMs) attached to a Supercomputer. You can
9999
2. Under **Settings**, select **Nodepools**.
100100
3. Select **Create**.
101101

102-
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-create.jpg" alt-text="Screenshot of Azure portal showing Supercomputer create nodepool page." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-create.jpg":::
102+
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-create.jpg" alt-text="Screenshot of Azure portal showing Supercomputer create nodepool page." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-create.jpg":::
103103

104104
### Configure basic settings
105105

@@ -123,7 +123,7 @@ Nodepools define the compute capacity (VMs) attached to a Supercomputer. You can
123123

124124
1. Choose a **Virtual Machine type** for the Node Pool.
125125

126-
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-vm-configuration.jpg" alt-text="SCreenshot of Azure portal showing Nodepool select VM type." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-vm-configuration.jpg":::
126+
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-vm-configuration.jpg" alt-text="SCreenshot of Azure portal showing Nodepool select VM type." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-vm-configuration.jpg":::
127127

128128
> [!NOTE]
129129
> The selected Virtual Machine type must be available and quota-approved in the selected region.
@@ -134,7 +134,7 @@ Nodepools define the compute capacity (VMs) attached to a Supercomputer. You can
134134

135135
Specify the **maximum node count**, which defines the upper bound for automatically scaling.
136136

137-
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-scaling.jpg" alt-text="Screenshot of Azure portal showing Nodepool scaling options." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-nodepool-scaling.jpg":::
137+
:::image type="content" source="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-scaling.jpg" alt-text="Screenshot of Azure portal showing Nodepool scaling options." lightbox="./media/how-to-manage-supercomputers/create-supercomputer-node-pool-scaling.jpg":::
138138

139139
### Create the Nodepool
140140

@@ -158,7 +158,7 @@ To delete the nodepools, follow these steps:
158158
- Select the Supercomputer that owns the nodepool.
159159
3. Select the **Nodepool** under **Settings** in the left pane.
160160

161-
:::image type="content" source="./media/how-to-manage-supercomputers/delete-nodepool.jpg" alt-text="Screenshot of Azure portal showing nodepools." lightbox="./media/how-to-manage-supercomputers/delete-nodepool.jpg":::
161+
:::image type="content" source="./media/how-to-manage-supercomputers/delete-node-pool.jpg" alt-text="Screenshot of Azure portal showing nodepools." lightbox="./media/how-to-manage-supercomputers/delete-node-pool.jpg":::
162162

163163
4. Select the nodepool or nodepools that you want to delete and select **Delete**
164164
1. Wait for all the nodepools to get deleted, then navigate to the supercomputer and select the **Overview** section in the left pane

articles/microsoft-discovery/media/how-to-manage-supercomputers/create-supercomputer-nodepool-create.jpg renamed to articles/microsoft-discovery/media/how-to-manage-supercomputers/create-supercomputer-node-pool-create.jpg

File renamed without changes.
