From 5fde6440740b7ad121c1d86ec8eff3badb0e3b6c Mon Sep 17 00:00:00 2001
From: FarreltinF <111532171+FarreltinF@users.noreply.github.com>
Date: Wed, 26 Mar 2025 10:47:36 -0700
Subject: [PATCH] Update ocr-overview.md

Product update: OCR expands support to Office files.
---
 microsoft-365/syntex/ocr-overview.md | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/microsoft-365/syntex/ocr-overview.md b/microsoft-365/syntex/ocr-overview.md
index bc365d62fc6..6294ada997c 100644
--- a/microsoft-365/syntex/ocr-overview.md
+++ b/microsoft-365/syntex/ocr-overview.md
@@ -35,13 +35,17 @@ For example, you enable the OCR service and then add image files to your documen
 
 |Endpoint |Supported file types |
 |---------|---------|
-|SharePoint and OneDrive |`.bmp, .png, .jpeg, .jpg, .jfif, .arw, .cr2, .crw, .erf, .gif, .mef, .mrw, .nef, .nrw, .orf, .pef, .raw, .rw2, .rw1, .sr2, .tif, .tiff, .heic, .heif, .ari, .bay, .cap, .cr3, .dcs, .dcr, .drf, .eip, .fff, .iiq, .k25, .kdc, .mef, .mos, .ptx, .pxn, .raf, .rwl, .sr2, .srf, .srw, .x3f, .dng, .tiff, and .pdf` |
+|SharePoint and OneDrive |`.docx, .pptx, .xlsx, .bmp, .png, .jpeg, .jpg, .jfif, .arw, .cr2, .crw, .erf, .gif, .mef, .mrw, .nef, .nrw, .orf, .pef, .raw, .rw2, .rw1, .sr2, .tif, .tiff, .heic, .heif, .ari, .bay, .cap, .cr3, .dcs, .dcr, .drf, .eip, .fff, .iiq, .k25, .kdc, .mos, .ptx, .pxn, .raf, .rwl, .srf, .srw, .x3f, .dng, and .pdf` |
 |Teams, Exchange, and Windows devices |`.bmp, .png, .jpeg, .jpg, .tiff, and .pdf` |
 
 In addition to image-based PDF, SharePoint OCR supports hybrid PDF (text plus image PDF). Newly uploaded hybrid PDFs will be processed by the OCR service.
 
 > [!NOTE]
 > When you apply OCR to an image file, the text is stored in the **Extracted text** metadata column. When you apply OCR to a PDF or TIFF file, the extracted text is indexed in search but not available in the metadata column.
+>
+
+### Office file support in SharePoint
+SharePoint supports OCR for Microsoft 365 Office files, including Word (.docx), PowerPoint (.pptx), and Excel (.xlsx) documents. Images in these files are automatically scanned with OCR, and the extracted text is indexed for search and made available to compliance solutions. SharePoint also de-duplicates images, so the same image isn't charged more than once.
 
 ### Supported languages
 