| title | How to configure ORC format in the pipeline of Data Factory in Microsoft Fabric | |
|---|---|---|
| description | This article explains how to configure ORC format in the pipeline of Data Factory in Microsoft Fabric. | |
| ms.reviewer | jianleishen | |
| ms.topic | how-to | |
| ms.date | 06/25/2024 | |
| ms.custom | | |

ORC format in Data Factory in [!INCLUDE product-name]
This article outlines how to configure ORC format in the pipeline of Data Factory in [!INCLUDE product-name].
ORC format is supported for the following activities and connectors as a source and destination.
| Category | Connector/Activity |
|---|---|
| Supported connector | Amazon S3 |
| | Amazon S3 Compatible |
| | Azure Blob Storage |
| | Azure Data Lake Storage Gen2 |
| | Azure Files |
| | File system |
| | FTP |
| | Google Cloud Storage |
| | HTTP |
| | Lakehouse Files |
| | Oracle Cloud Storage |
| | SFTP |
| Supported activity | Copy activity (source/destination) |
| | Lookup activity |
| | GetMetadata activity |
| | Delete data activity |

To configure ORC format, choose your connection in the source or destination of a pipeline copy activity, and then select ORC in the drop-down list of File format. Select Settings for further configuration of this format.
:::image type="content" source="./media/format-common/file-settings.png" alt-text="Screenshot showing file format settings.":::
When ORC is the source format, selecting Settings in the File format section opens the File format settings dialog box, which shows the following properties.
:::image type="content" source="./media/format-orc/file-settings.png" alt-text="Screenshot showing ORC file format source.":::
- Compression type: From the drop-down list, choose the compression codec used to read ORC files. You can choose None, zlib, or snappy.
When ORC is the destination format, selecting Settings in the File format section opens the File format settings dialog box, which shows the following properties.
:::image type="content" source="./media/format-orc/file-settings.png" alt-text="Screenshot showing ORC file format destination.":::
- Compression type: From the drop-down list, choose the compression codec used to write ORC files. You can choose None, zlib, or snappy.
Under Advanced settings in the Destination tab, the following properties related to ORC format are displayed.
- Max rows per file: When writing data into a folder, you can split the output across multiple files. Specify the maximum number of rows to write to each file.
- File name prefix: Applicable when Max rows per file is configured. Specify the file name prefix to use when writing data to multiple files, which results in this pattern: <fileNamePrefix>_00000.<fileExtension>. If not specified, the file name prefix is generated automatically. This property doesn't apply when the source is a file-based store or a data store with a partition option enabled.
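To make the interaction between Max rows per file and File name prefix more concrete, here is a minimal Python sketch that lists the file names the pattern above would produce. It is an illustration only: the helper name `expected_file_names` is hypothetical, and the sketch assumes that the sequence number starts at 0, is zero-padded to five digits, and that a new file starts each time the row limit is reached. The service may behave differently, for example when it writes in parallel.

```python
import math


def expected_file_names(total_rows, max_rows_per_file, file_name_prefix, file_extension="orc"):
    """List file names that follow <fileNamePrefix>_00000.<fileExtension>.

    Assumptions (not confirmed by this article): the sequence number is
    zero-based, padded to five digits, and increases by one per file.
    """
    if max_rows_per_file <= 0:
        raise ValueError("max_rows_per_file must be a positive integer")
    # Each file holds at most max_rows_per_file rows, so round up.
    file_count = math.ceil(total_rows / max_rows_per_file)
    return [f"{file_name_prefix}_{i:05d}.{file_extension}" for i in range(file_count)]


# 2.5 million rows with a limit of 1 million rows per file gives three files.
print(expected_file_names(2_500_000, 1_000_000, "sales"))
```

Under these assumptions, the call prints `['sales_00000.orc', 'sales_00001.orc', 'sales_00002.orc']`.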
The following properties are supported in the copy activity Source section when using ORC format.
| Name | Description | Value | Required | JSON script property |
|---|---|---|---|---|
| File format | The file format that you want to use. | ORC | Yes | type (under datasetSettings): Orc |
| Compression type | The compression codec used to read ORC files. | None, zlib, snappy | No | orcCompressionCodec: none, zlib, snappy |
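
As a rough illustration of how the source properties map to the JSON script, the following Python sketch builds a `datasetSettings` fragment for ORC format. The `type` value and the `orcCompressionCodec` name and values come from the table above. The helper name `build_orc_dataset_settings` is hypothetical, and nesting `orcCompressionCodec` under a `typeProperties` object is an assumption, so compare the result with the JSON that your own pipeline shows before relying on it.

```python
import json

# Codec values listed in the table above (the JSON script values are lowercase).
ALLOWED_CODECS = {"none", "zlib", "snappy"}


def build_orc_dataset_settings(compression_codec=None):
    """Return a hypothetical datasetSettings fragment for ORC format.

    The "typeProperties" nesting is an assumption, not something this
    article confirms.
    """
    settings = {"type": "Orc", "typeProperties": {}}
    if compression_codec is not None:
        codec = compression_codec.lower()
        if codec not in ALLOWED_CODECS:
            raise ValueError(f"Unsupported ORC compression codec: {compression_codec!r}")
        settings["typeProperties"]["orcCompressionCodec"] = codec
    return settings


print(json.dumps(build_orc_dataset_settings("snappy"), indent=2))
```

Because Compression type is not required, calling the helper without an argument leaves `orcCompressionCodec` out of the fragment.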
The following properties are supported in the copy activity Destination section when using ORC format.
| Name | Description | Value | Required | JSON script property |
|---|---|---|---|---|
| File format | The file format that you want to use. | ORC | Yes | type (under datasetSettings): Orc |
| Compression type | The compression codec used to write ORC files. | None, zlib, snappy | No | orcCompressionCodec: none, zlib, snappy |
| Max rows per file | When writing data into a folder, you can split the output across multiple files. Specify the maximum number of rows to write to each file. | <your max rows per file> | No | maxRowsPerFile |
| File name prefix | Applicable when Max rows per file is configured. Specify the file name prefix to use when writing data to multiple files, which results in this pattern: <fileNamePrefix>_00000.<fileExtension>. If not specified, the file name prefix is generated automatically. This property doesn't apply when the source is a file-based store or a data store with a partition option enabled. | <your file name prefix> | No | fileNamePrefix |
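
The destination properties depend on one another: File name prefix applies only when Max rows per file is configured. The Python sketch below collects the destination values in a flat dictionary keyed by the JSON script property names from the table and checks that dependency. The helper name `orc_destination_properties` is hypothetical, and the sketch makes no claim about where each key sits in the full pipeline JSON.

```python
ALLOWED_CODECS = ("none", "zlib", "snappy")


def orc_destination_properties(compression_codec=None, max_rows_per_file=None, file_name_prefix=None):
    """Collect ORC destination properties keyed by JSON script property name.

    This is a flat, illustrative dictionary; it does not reproduce the
    nesting of the real pipeline JSON.
    """
    props = {"type": "Orc"}  # File format is the only required property.
    if compression_codec is not None:
        if compression_codec not in ALLOWED_CODECS:
            raise ValueError(f"Unsupported ORC compression codec: {compression_codec!r}")
        props["orcCompressionCodec"] = compression_codec
    if max_rows_per_file is not None:
        if max_rows_per_file <= 0:
            raise ValueError("maxRowsPerFile must be a positive integer")
        props["maxRowsPerFile"] = max_rows_per_file
    if file_name_prefix is not None:
        # The table states that the prefix applies only with Max rows per file.
        if max_rows_per_file is None:
            raise ValueError("fileNamePrefix applies only when maxRowsPerFile is configured")
        props["fileNamePrefix"] = file_name_prefix
    return props


print(orc_destination_properties("zlib", max_rows_per_file=1_000_000, file_name_prefix="sales"))
```

Raising an error for a prefix without a row limit is a design choice of this sketch; the article says only that the prefix is applicable when Max rows per file is configured.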