Commit d9f6183

Clarify Avro file description in file-storage.md
1 parent 5553b0f commit d9f6183

1 file changed

Lines changed: 1 addition & 1 deletion

learn-pr/wwl-data-ai/explore-core-data-concepts/includes/3-file-storage.md

@@ -94,7 +94,7 @@ While human-readable formats for structured and semi-structured data can be usef
 
 Some common optimized file formats you might see include *Avro*, *ORC*, and *Parquet*:
 
-- *Avro* is a row-based format. It was created by Apache. Each record contains a header that describes the structure of the data in the record. This header is stored as JSON. The data is stored as binary information. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
+- *Avro* is a row-based format. It was created by Apache. Each file contains a header that describes the structure of the data in the file. This header is stored as JSON. The data is stored as binary information in one or more blocks of records. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a good format for compressing data and minimizing storage and network bandwidth requirements.
 
 - *ORC* (Optimized Row Columnar format) organizes data into columns rather than rows. It is an Apache project, originally developed as a Hadoop-native format for optimizing read and write operations in Apache Hive (Hive is a data warehouse system that supports fast data summarization and querying over large datasets). An ORC file contains *stripes* of data. Each stripe holds the data for a column or set of columns. A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on) for each column.
 
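The distinction this change draws (one JSON header per file, followed by binary blocks of records) can be illustrated with a toy sketch. This is not the real Avro container format, which adds magic bytes, variable-length integer encoding, and sync markers between blocks. The names `write_file`, `read_file`, `STRUCT_CODES`, and the `Reading` schema are hypothetical and exist only for this illustration; it uses just the Python standard library.

```python
import io
import json
import struct

# Hypothetical mapping from schema type names to fixed-width struct codes.
# Real Avro uses variable-length encodings; fixed widths keep the sketch short.
STRUCT_CODES = {"int": "i", "long": "q", "double": "d"}


def record_format(schema):
    """Build a little-endian struct format string from the schema's fields."""
    return "<" + "".join(STRUCT_CODES[f["type"]] for f in schema["fields"])


def write_file(stream, schema, records, block_size=2):
    """Write the JSON header once per file, then the records in binary blocks."""
    header = json.dumps(schema).encode("utf-8")
    stream.write(struct.pack("<I", len(header)))  # length prefix for the header
    stream.write(header)
    fmt = record_format(schema)
    names = [f["name"] for f in schema["fields"]]
    for start in range(0, len(records), block_size):
        block = records[start:start + block_size]
        stream.write(struct.pack("<I", len(block)))  # record count for this block
        for rec in block:
            stream.write(struct.pack(fmt, *(rec[n] for n in names)))


def read_file(stream):
    """Read the header, then use it to decode every block of records."""
    (header_len,) = struct.unpack("<I", stream.read(4))
    schema = json.loads(stream.read(header_len).decode("utf-8"))
    fmt = record_format(schema)
    names = [f["name"] for f in schema["fields"]]
    size = struct.calcsize(fmt)
    records = []
    while True:
        prefix = stream.read(4)
        if not prefix:  # end of file: no more blocks
            break
        (count,) = struct.unpack("<I", prefix)
        for _ in range(count):
            values = struct.unpack(fmt, stream.read(size))
            records.append(dict(zip(names, values)))
    return schema, records


# Illustrative schema and data (assumptions, not taken from the module).
schema = {
    "name": "Reading",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "temp", "type": "double"},
    ],
}
readings = [
    {"id": 1, "temp": 20.5},
    {"id": 2, "temp": 21.0},
    {"id": 3, "temp": 19.25},
]

buffer = io.BytesIO()
write_file(buffer, schema, readings)
buffer.seek(0)
decoded_schema, decoded = read_file(buffer)
```

Because the schema is written once for the whole file, the per-record cost is only the binary field values, which is the point the corrected wording makes about Avro's compactness.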
