
Commit 56596c4

Merge pull request #54251 from weslbo/fix-bug-161307
Clarify Avro file description in file-storage.md
2 parents c44c2e2 + d9f6183 commit 56596c4

1 file changed

Lines changed: 1 addition & 1 deletion


learn-pr/wwl-data-ai/explore-core-data-concepts/includes/3-file-storage.md

@@ -94,7 +94,7 @@ While human-readable formats for structured and semi-structured data can be usef
 
 Some common optimized file formats you might see include *Avro*, *ORC*, and *Parquet*:
 
-- *Avro* is a row-based format. It was created by Apache. Each record contains a header that describes the structure of the data in the record. This header is stored as JSON. The data is stored as binary information. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
+- *Avro* is a row-based format. It was created by Apache. Each file contains a header that describes the structure of the data in the file. This header is stored as JSON. The data is stored as binary information in one or more blocks of records. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
 
 - *ORC* (Optimized Row Columnar format) organizes data into columns rather than rows. It is an Apache project, originally developed as a Hadoop-native format for optimizing read and write operations in Apache Hive (Hive is a data warehouse system that supports fast data summarization and querying over large datasets). An ORC file contains *stripes* of data. Each stripe holds the data for a column or set of columns. A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on) for each column.
 
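The corrected wording says an Avro file has one JSON header describing the schema, followed by binary data grouped into blocks of records. The minimal Python sketch below illustrates that layout, assuming a record schema with only `long` and `string` fields and no compression (`avro.codec` set to `null`). It follows the object container layout described in the Apache Avro specification: 4 magic bytes, a metadata map holding the JSON schema under `avro.schema`, a 16-byte sync marker, then blocks that each store a record count, a byte size, the encoded records, and the sync marker again. The function names and the sample `Customer` schema are hypothetical; real projects should use an official Avro library.

```python
import io
import json
import os

MAGIC = b"Obj\x01"  # every Avro object container file starts with these 4 bytes


def write_long(buf, n):
    """Avro long: zigzag-encode, then write 7 bits per byte, low bits first."""
    n = (n << 1) ^ (n >> 63)  # zigzag: small negative numbers become small unsigned ones
    while n & ~0x7F:
        buf.write(bytes([(n & 0x7F) | 0x80]))  # high bit set means "more bytes follow"
        n >>= 7
    buf.write(bytes([n]))


def read_long(buf):
    shift = result = 0
    while True:
        b = buf.read(1)[0]
        result |= (b & 0x7F) << shift
        shift += 7
        if not b & 0x80:
            break
    return (result >> 1) ^ -(result & 1)  # undo zigzag


def write_bytes(buf, data):
    write_long(buf, len(data))  # strings and bytes are length-prefixed
    buf.write(data)


def read_bytes(buf):
    return buf.read(read_long(buf))


def encode_record(buf, schema, record):
    # Only values are written, in schema order; field names live in the header.
    for f in schema["fields"]:
        if f["type"] == "long":
            write_long(buf, record[f["name"]])
        elif f["type"] == "string":
            write_bytes(buf, record[f["name"]].encode("utf-8"))
        else:
            raise ValueError(f"type not supported in this sketch: {f['type']}")


def decode_record(buf, schema):
    record = {}
    for f in schema["fields"]:
        if f["type"] == "long":
            record[f["name"]] = read_long(buf)
        elif f["type"] == "string":
            record[f["name"]] = read_bytes(buf).decode("utf-8")
        else:
            raise ValueError(f"type not supported in this sketch: {f['type']}")
    return record


def write_container(schema, records, block_size=2):
    out, sync = io.BytesIO(), os.urandom(16)
    out.write(MAGIC)
    meta = {"avro.schema": json.dumps(schema).encode("utf-8"), "avro.codec": b"null"}
    write_long(out, len(meta))  # metadata map: one block of key/value pairs...
    for key, value in meta.items():
        write_bytes(out, key.encode("utf-8"))
        write_bytes(out, value)
    write_long(out, 0)  # ...terminated by a zero count
    out.write(sync)
    for start in range(0, len(records), block_size):
        chunk = records[start:start + block_size]
        body = io.BytesIO()
        for rec in chunk:
            encode_record(body, schema, rec)
        write_long(out, len(chunk))  # number of records in this block
        write_long(out, len(body.getvalue()))  # size of the serialized records
        out.write(body.getvalue())
        out.write(sync)
    return out.getvalue()


def read_container(data):
    buf = io.BytesIO(data)
    if buf.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while (count := read_long(buf)) != 0:
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(buf)
        for _ in range(count):
            key = read_bytes(buf).decode("utf-8")
            meta[key] = read_bytes(buf)
    if meta.get("avro.codec", b"null") != b"null":
        raise ValueError("this sketch only reads uncompressed (null codec) files")
    sync, schema, records = buf.read(16), json.loads(meta["avro.schema"]), []
    while buf.tell() < len(data):
        count = read_long(buf)
        read_long(buf)  # block byte size, not needed when decoding sequentially
        records.extend(decode_record(buf, schema) for _ in range(count))
        if buf.read(16) != sync:
            raise ValueError("sync marker mismatch")
    return schema, records


schema = {"type": "record", "name": "Customer",
          "fields": [{"name": "id", "type": "long"}, {"name": "name", "type": "string"}]}
rows = [{"id": 1, "name": "Ana"}, {"id": -2, "name": "Ben"}, {"id": 3, "name": "Chloé"}]
blob = write_container(schema, rows)
print(blob[:4])  # b'Obj\x01'
print(read_container(blob)[1] == rows)  # True
```

Because the field names appear only once, in the header schema, each encoded record is just its values in schema order, which is part of why Avro data is compact. Real files often set `avro.codec` to a compression codec such as `deflate`, in which case each block's payload is compressed; this sketch omits that step.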

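The ORC paragraph describes stripes that carry row data plus per-column statistics (count, sum, max, min). The Python sketch below models that idea conceptually and does not reproduce ORC's binary layout: rows are grouped into stripes, each stripe keeps every column as its own list, and each column gets count, min, max, and sum statistics. In the ORC specification, each stripe covers a group of rows and stores the columns of those rows separately; exactly where the statistics are recorded in a real file is defined by that specification and is not modeled here. The names `ColumnStats`, `Stripe`, `build_stripes`, `stripes_to_scan`, and the `rows_per_stripe` parameter are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    count: int = 0
    minimum: object = None
    maximum: object = None
    total: object = None  # sum, tracked only for numeric values

    def update(self, value):
        self.count += 1
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)
        if isinstance(value, (int, float)):
            self.total = value if self.total is None else self.total + value


@dataclass
class Stripe:
    row_count: int
    columns: dict  # column name -> values for this stripe's rows, stored column-wise
    stats: dict    # column name -> ColumnStats for this stripe


def build_stripes(rows, column_names, rows_per_stripe):
    """Group rows into stripes, pivot each stripe into columns, and gather stats."""
    stripes = []
    for start in range(0, len(rows), rows_per_stripe):
        chunk = rows[start:start + rows_per_stripe]
        columns = {name: [row[name] for row in chunk] for name in column_names}
        stats = {}
        for name, values in columns.items():
            stats[name] = ColumnStats()
            for v in values:
                stats[name].update(v)
        stripes.append(Stripe(len(chunk), columns, stats))
    return stripes


def stripes_to_scan(stripes, column, lower, upper):
    """Return indexes of stripes whose min/max range could hold values in [lower, upper]."""
    return [i for i, s in enumerate(stripes)
            if not (s.stats[column].maximum < lower or s.stats[column].minimum > upper)]


rows = [{"id": i, "amount": i * 10} for i in range(1, 7)]
stripes = build_stripes(rows, ["id", "amount"], rows_per_stripe=3)
print(stripes[0].columns["amount"])  # [10, 20, 30]
print(stripes[1].stats["amount"].total)  # 150
print(stripes_to_scan(stripes, "id", 5, 6))  # [1]
```

The `stripes_to_scan` helper shows one reason column statistics are useful: a reader filtering on a column can skip any stripe whose min/max range cannot contain a match, without decoding that stripe's data.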