`articles/planetary-computer/data-cube-overview.md`
# Data cubes in Microsoft Planetary Computer Pro
As mentioned in [Supported Data Types](./supported-data-types.md), Microsoft Planetary Computer Pro supports ingestion, cloud optimization, and visualization of data cube files in NetCDF, HDF5, Zarr, and GRIB2 formats. Though complex and historically cumbersome on local storage, these assets are optimized for cloud environments with Planetary Computer Pro, further empowering them as efficient tools to structure and store multidimensional data like satellite imagery and climate models.
## Ingestion of data cubes
Data cube files can be ingested into Planetary Computer Pro in the same way as other raster data types. As with other data formats, assets and associated SpatioTemporal Asset Catalog (STAC) Items must first be stored in Azure Blob Storage. Unlike other two-dimensional raster assets, however, more cloud optimization steps occur upon ingestion of certain data cube formats (NetCDF and HDF5).
> [!NOTE]
> GRIB2 data is ingested in the same way as other two-dimensional raster data (with no other cloud optimization steps), as these files are essentially collections of 2D rasters with an associated index file that references the data efficiently in cloud environments. Similarly, Zarr is already a cloud-native format, so no optimization takes place upon ingestion.
## Cloud optimization of data cubes
When a STAC Item containing NetCDF or HDF5 assets is ingested, the assets are cloud optimized, not by transforming the data itself, but by generating reference files that enable more efficient data access.
### Cloud optimization via Kerchunk manifests
Unlike 2D raster data, which is transformed into Cloud Optimized GeoTIFFs (COGs) when ingested into Planetary Computer Pro, data cube assets are optimized by generating reference files, or Kerchunk manifests. [Kerchunk](https://fsspec.github.io/kerchunk/) is an open-source Python library that creates these chunk manifests: JSON files that describe the structure of the data cube and its chunks using Zarr-style chunk keys that map to the byte ranges in the original file where those chunks reside. Once generated, the Kerchunk files are stored in blob storage alongside the assets, and the STAC items are enriched with references to these manifests, optimizing data access for cloud environments.
### STAC item properties that trigger cloud optimization
Within the collection's STAC items, the following conditions must be true for a data cube asset to be cloud optimized:
* The asset format is one of the following types:
  - `application/netcdf`
  - `application/x-netcdf`
  - `application/x-hdf5`
* The asset has a `roles` field that includes either `data` or `visual` within its list of roles.
If these conditions are met, a Kerchunk manifest (`assetid-kerchunk.json`) is generated in blob storage alongside the asset.
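For example, an asset entry that meets both conditions might look like the following (the asset name and storage URL are hypothetical):

```json
{
  "assets": {
    "surface-temp": {
      "href": "https://example.blob.core.windows.net/my-container/surface-temp.nc",
      "type": "application/x-netcdf",
      "roles": ["data"]
    }
  }
}
```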
> [!NOTE]
> The asset format type `application/x-hdf` often corresponds to HDF4 assets. GeoCatalog ingestion doesn't currently support creating virtual Kerchunk manifests for HDF4 due to its added complexity and multiple variants.
### STAC item enrichment
For each optimized asset within the STAC item, the following fields are added:
- `msft:datacube_converted: true` – Indicates that enrichment was applied.
- `cube:dimensions` – A dictionary listing dataset dimensions and their properties.
- `cube:variables` – A dictionary describing dataset variables and their properties.
These variables should be used for render configurations to ensure that your visualization of data cube assets in the Explorer is reading and rendering your data most efficiently.
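As an illustration, the enriched portion of a STAC item might look like the following. The dimension and variable names and values are hypothetical; the field structure follows the STAC Datacube extension.

```json
"msft:datacube_converted": true,
"cube:dimensions": {
  "time": {"type": "temporal", "extent": ["2023-01-01T00:00:00Z", "2023-12-31T00:00:00Z"], "step": "P1D"},
  "lat": {"type": "spatial", "axis": "y", "extent": [-90, 90]},
  "lon": {"type": "spatial", "axis": "x", "extent": [-180, 180]}
},
"cube:variables": {
  "t2m": {"dimensions": ["time", "lat", "lon"], "type": "data"}
}
```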
### Benefits of cloud optimized data cubes
Data cube cloud optimization improves data access performance, especially for visualization workflows. When a Kerchunk manifest is present, it allows faster access compared to loading the entire dataset file.
The Microsoft Planetary Computer Pro Explorer and tiling APIs preferentially use the Kerchunk manifest for data read operations if one exists in the same blob storage directory as the original asset.
Reading data using a chunked, reference-based approach is faster because it avoids reading the entire file into memory.
### Disabling data cube cloud optimization
If you decide you don't want to work with cloud optimized data cube assets, disable cloud optimization by removing `data` and `visual` from the asset’s `roles` list in the STAC item JSON before ingestion.
## Zarr ingestion and data updates
As previously mentioned, Zarr is inherently a cloud-native format, so no extra optimization occurs when ingested and no modification of its STAC items is necessary. However, if you plan to dynamically update your Zarr assets and reingest STAC items to work with the latest version, you need to be aware of two update methods: **Append** and **Sync**.
### Append
If you add new data to a locally stored Zarr store and want to update the version stored in Planetary Computer Pro, you need to reingest the STAC item. When that item is reingested, the default behavior is to review the assets for any new data and append it to the data stored in the cloud. No modification to the STAC item is necessary before reingestion.
### Sync
If you remove data from a locally stored Zarr store, reingesting the same STAC item won't bring the cloud-based version in line with the version on your machine, because the **append** behavior looks for new data but doesn't adjust for missing data. That's where **sync** comes into play: by modifying the STAC item to include a parameter indicating that you want to sync the existing data with the new, and then reingesting the modified item, only the most up-to-date data from the Zarr store remains available in Planetary Computer Pro. The sync behavior is enabled through this modification to the STAC item before reingestion.
---

`articles/planetary-computer/data-cube-quickstart.md`
Once your data cube assets are ingested and configured, you can visualize them in the Planetary Computer Pro Explorer. A step-by-step guide for using the Explorer can be followed in [Quickstart: Use the Explorer in Microsoft Planetary Computer Pro](use-explorer.md).
#### Time slider for data cube visualization
If your data cube assets have a temporal component, you can use the time slider in the Explorer to visualize changes over time. The time slider appears automatically if your STAC Items contain assets with a `time` dimension that has `extent` and `step` fields.
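For example (hypothetical dates), a `time` dimension that triggers the time slider could be described in the STAC item as:

```json
"cube:dimensions": {
  "time": {
    "type": "temporal",
    "extent": ["2023-01-01T00:00:00Z", "2023-01-31T00:00:00Z"],
    "step": "P1D"
  }
}
```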
## Related content
- [Access STAC collection data cube assets with a collection-level SAS token](./get-collection-sas-token.md)