Commit b07f4ed: Merge pull request #305387 from beharris/bh-mpc-edits
2 parents 8a437a8 + 754b9eb
2 files changed: 47 additions & 22 deletions

File: articles/planetary-computer/data-cube-overview.md (43 additions & 22 deletions)

# Data cubes in Microsoft Planetary Computer Pro

As mentioned in [Supported Data Types](./supported-data-types.md), Microsoft Planetary Computer Pro supports ingestion, cloud optimization, and visualization of data cube files in NetCDF, HDF5, Zarr, and GRIB2 formats. Though these formats are complex and historically cumbersome to work with on local storage, Planetary Computer Pro optimizes them for cloud environments, making them efficient tools for structuring and storing multidimensional data like satellite imagery and climate models.

## Ingestion of data cubes

Data cube files can be ingested into Planetary Computer Pro in the same way as other raster data types. As with other data formats, assets and their associated SpatioTemporal Asset Catalog (STAC) Items must first be stored in Azure Blob Storage. Unlike other two-dimensional raster assets, however, additional cloud optimization steps occur upon ingestion of certain data cube formats (NetCDF and HDF5).
> [!NOTE]
> GRIB2 data is ingested in the same way as other two-dimensional raster data (with no other cloud optimization steps), because GRIB2 files are essentially collections of 2D rasters with an associated index file that references the data efficiently in cloud environments. Similarly, Zarr is already a cloud-native format, so no optimization takes place upon ingestion.

## Cloud optimization of data cubes

When a STAC Item containing NetCDF or HDF5 assets is ingested, the assets are cloud optimized, not by transforming the data itself, but by generating reference files that enable more efficient data access.

### Cloud optimization via Kerchunk manifests

Unlike 2D raster data, which is transformed into Cloud Optimized GeoTIFFs (COGs) when ingested into Planetary Computer Pro, data cube assets are optimized by generating reference files, or Kerchunk manifests. [Kerchunk](https://fsspec.github.io/kerchunk/) is an open-source Python library that creates these chunk manifests: JSON files that describe the structure of the data cube and its chunks, using Zarr-style chunk keys that map to the byte ranges in the original file where those chunks reside. Once generated, the Kerchunk files are stored in blob storage alongside the assets, and the STAC items are enriched with references to these manifests, optimizing data access for cloud environments.
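To make the idea concrete, the following is a hypothetical sketch of what a Kerchunk-style reference manifest contains. The file names, URLs, offsets, and variable names are illustrative, not values Planetary Computer Pro actually emits:

```python
import json

# Hypothetical sketch of a Kerchunk-style reference manifest; the URL,
# variable name, and byte offsets are illustrative only.
manifest = {
    "version": 1,
    "refs": {
        # Zarr metadata keys are stored inline as JSON strings.
        ".zgroup": json.dumps({"zarr_format": 2}),
        # Chunk keys ("<variable>/<chunk indices>") map to [url, offset, length]:
        # the byte range in the original NetCDF/HDF5 file holding that chunk.
        "precip/0.0.0": ["https://example.blob.core.windows.net/data/pr.nc", 20480, 16384],
        "precip/0.0.1": ["https://example.blob.core.windows.net/data/pr.nc", 36864, 16384],
    },
}

# A reader resolves a chunk key to the exact byte range it needs,
# so only that range is fetched instead of the whole file.
url, offset, length = manifest["refs"]["precip/0.0.1"]
print(offset, length)  # 36864 16384
```

Because the manifest records where each chunk lives, a client can open the dataset through the manifest and issue range requests for only the chunks a query touches.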

### STAC item properties that trigger cloud optimization

Within the collection's STAC items, both of the following conditions must be true for a data cube asset to be cloud optimized:

* The asset format is one of the following types:
  - `application/netcdf`
  - `application/x-netcdf`
  - `application/x-hdf5`
* The asset has a `roles` field that includes either `data` or `visual`.

If these conditions are met, a Kerchunk manifest (`assetid-kerchunk.json`) is generated in blob storage alongside the asset.
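The two conditions can be expressed as a small predicate. This is a hypothetical helper written for illustration, not part of any Planetary Computer Pro SDK:

```python
# Hypothetical helper mirroring the two conditions above; not part of any SDK.
DATACUBE_MEDIA_TYPES = {
    "application/netcdf",
    "application/x-netcdf",
    "application/x-hdf5",
}

def triggers_cloud_optimization(asset: dict) -> bool:
    """Return True if a STAC asset meets both conditions for Kerchunk generation."""
    has_media_type = asset.get("type") in DATACUBE_MEDIA_TYPES
    has_role = bool({"data", "visual"} & set(asset.get("roles", [])))
    return has_media_type and has_role

print(triggers_cloud_optimization({"type": "application/x-netcdf", "roles": ["data"]}))    # True
print(triggers_cloud_optimization({"type": "application/x-hdf", "roles": ["data"]}))       # False: HDF4
print(triggers_cloud_optimization({"type": "application/netcdf", "roles": ["metadata"]}))  # False: no data/visual role
```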

> [!NOTE]
> The asset format type `application/x-hdf` often corresponds to HDF4 assets. GeoCatalog ingestion doesn't currently support creating virtual Kerchunk manifests for HDF4 due to its added complexity and multiple variants.

### STAC item enrichment

For each optimized asset within the STAC item, the following fields are added:

- `msft:datacube_converted: true` – Indicates that enrichment was applied.
- `cube:dimensions` – A dictionary listing dataset dimensions and their properties.
- `cube:variables` – A dictionary describing dataset variables and their properties.

Use these fields in your render configurations to ensure that the Explorer reads and renders your data cube assets efficiently.
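For example, an enriched NetCDF asset might look like the following. This is a hypothetical sketch: the `href`, dimension names, and values are illustrative, following the shape of the STAC datacube extension:

```json
"assets": {
  "pr": {
    "href": "https://example.blob.core.windows.net/collection-container/pr.nc",
    "type": "application/x-netcdf",
    "roles": ["data"],
    "msft:datacube_converted": true,
    "cube:dimensions": {
      "time": {
        "type": "temporal",
        "extent": ["2020-01-01T00:00:00Z", "2020-12-31T00:00:00Z"],
        "step": "P1D"
      },
      "lon": { "type": "spatial", "axis": "x" },
      "lat": { "type": "spatial", "axis": "y" }
    },
    "cube:variables": {
      "pr": { "dimensions": ["time", "lat", "lon"], "type": "data" }
    }
  }
}
```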

### Benefits of cloud optimized data cubes

Data cube cloud optimization improves data access performance, especially for visualization workflows. When a Kerchunk manifest is present, it allows faster access compared to loading the entire dataset file.

The Microsoft Planetary Computer Pro Explorer and tiling APIs preferentially use the Kerchunk manifest for data read operations if one exists in the same blob storage directory as the original asset.

Reading data using a chunked, reference-based approach is faster because it avoids reading the entire file into memory.
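The read path can be sketched in a few lines. All names here are hypothetical; a real reader issues HTTP range requests against blob storage rather than slicing a local buffer:

```python
# Hypothetical sketch: resolve a chunk key through a manifest and fetch
# only that chunk's bytes, rather than reading the whole file.
def read_chunk(fetch_range, refs, key):
    url, offset, length = refs[key]
    return fetch_range(url, offset, length)

# Stand-in for an HTTP range request (e.g. a GET with a Range header).
blob = bytes(range(256))
fetch = lambda url, offset, length: blob[offset:offset + length]

refs = {"pr/0.0.0": ["https://example.blob.core.windows.net/data/pr.nc", 16, 4]}
chunk = read_chunk(fetch, refs, "pr/0.0.0")
print(list(chunk))  # [16, 17, 18, 19]
```

The key point is that only `length` bytes travel over the network per chunk, regardless of how large the source NetCDF or HDF5 file is.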

### Disabling data cube cloud optimization

If you don't want to work with cloud optimized data cube assets, disable cloud optimization by removing `data` and `visual` from the asset's `roles` list in the STAC item JSON before ingestion.
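As a sketch, a pre-ingestion step that strips those roles could look like this (a hypothetical helper, not part of any SDK):

```python
def disable_datacube_optimization(asset: dict) -> dict:
    """Return a copy of a STAC asset with the optimization-triggering roles removed."""
    updated = dict(asset)
    updated["roles"] = [r for r in asset.get("roles", []) if r not in ("data", "visual")]
    return updated

asset = {"type": "application/x-netcdf", "roles": ["data", "metadata"]}
print(disable_datacube_optimization(asset)["roles"])  # ['metadata']
```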

## Zarr ingestion and data updates

As previously mentioned, Zarr is inherently a cloud-native format, so no extra optimization occurs upon ingestion and no modification of its STAC items is necessary. However, if you plan to dynamically update your Zarr assets and reingest STAC items to work with the latest version, you need to be aware of two update methods: **Append** and **Sync**.

### Append

If you add new data to a locally stored Zarr store and want to update the version stored in Planetary Computer Pro, reingest the STAC item. When the item is reingested, the default behavior is to review the assets for any new data and add it to the data stored in the cloud. No modification to the STAC item is necessary before reingestion.

### Sync

If you remove data from a locally stored Zarr store, reingesting the same STAC item won't make the cloud-based version match the version on your machine, because the **append** behavior looks for new data but doesn't account for removed data. That's where **sync** comes into play. By modifying the STAC item to indicate that you want to sync the existing data with the new, and then reingesting it, only the most up-to-date data from the Zarr store remains available in Planetary Computer Pro. The modification to the STAC item should appear as follows:

```json
{
  ...
  "assets": {
    "pr": {
      "href": "https://managedstorage.azure.com/collection-container/somestuff/pr.zarr",
      "msft:ingestion": {
        "directory": "sync"
      }
    }
  }
}
```

## Related content

File: articles/planetary-computer/data-cube-quickstart.md (4 additions & 0 deletions)

Once your data cube assets are ingested and configured, you can visualize them in the Planetary Computer Pro Explorer. A step-by-step guide for using the Explorer can be followed in [Quickstart: Use the Explorer in Microsoft Planetary Computer Pro](use-explorer.md).

#### Time slider for data cube visualization

If your data cube assets have a temporal component, you can use the time slider in the Explorer to visualize changes over time. The time slider appears automatically if your STAC Items contain assets with a `time` dimension that has an `extent` and `step` field.
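For example, a `time` dimension that activates the slider might be described like this in the asset's `cube:dimensions` (a hypothetical sketch; the dates and step are illustrative):

```json
"cube:dimensions": {
  "time": {
    "type": "temporal",
    "extent": ["2023-01-01T00:00:00Z", "2023-12-31T00:00:00Z"],
    "step": "P1D"
  }
}
```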

## Related content

- [Access STAC collection data cube assets with a collection-level SAS token](./get-collection-sas-token.md)
