Recommendations

Data Format

  • Data Format: At this time, COG + pgSTAC tiling performs better than tiling from Zarr stores or kerchunk references, at all zoom levels.
  • Kerchunk Reference Files: Tiling from a kerchunk reference can perform as well as, or better than, tiling from a Zarr store. Note that this holds when the NetCDF files’ chunks are the same as those of the Zarr store.

Zarr-specific Recommendations

  • Avoid chunking Zarr coordinates: Ensure coordinate data is not chunked. Chunked coordinates cause more files to be opened during xarray.open_dataset, which significantly degrades performance.
  • Smaller chunk sizes perform better: Chunk size significantly impacts performance. A specific recommendation depends on the performance requirements of the application.
  • Fewer spatial chunks perform better: A greater number of spatial chunks degrades performance, especially at low zoom levels, because more chunks must be loaded to cover a larger spatial area.
  • Pyramids improve performance for high resolution datasets: High resolution datasets will suffer from having large chunks, many chunks, or both. To provide a good experience, Zarr data can be aggregated into multiscale datasets, otherwise known as pyramids.
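To make the point about spatial chunk counts concrete, here is a minimal sketch that counts how many chunks of a global lat/lon grid a bounding box touches. It is an illustration only: the function name chunks_intersecting_bbox, the square-pixel assumption, and the example grid (0.25 degree resolution, 720 x 1440 pixels) are hypothetical and not taken from the benchmarks.

```python
import math


def chunks_intersecting_bbox(bbox, chunk_shape, resolution_deg):
    """Count the chunks of a global lat/lon grid that a bounding box touches.

    bbox: (west, south, east, north) in degrees.
    chunk_shape: (rows, cols) of one spatial chunk, in pixels.
    resolution_deg: pixel size in degrees (square pixels are assumed).
    """
    west, south, east, north = bbox
    chunk_h_deg = chunk_shape[0] * resolution_deg
    chunk_w_deg = chunk_shape[1] * resolution_deg
    # Chunk indices are measured from the top-left corner (-180, 90).
    col_start = math.floor((west + 180) / chunk_w_deg)
    col_end = math.ceil((east + 180) / chunk_w_deg)
    row_start = math.floor((90 - north) / chunk_h_deg)
    row_end = math.ceil((90 - south) / chunk_h_deg)
    return (col_end - col_start) * (row_end - row_start)


# Hypothetical 0.25 degree global grid (720 x 1440 pixels) in 180 x 360 pixel chunks.
global_view = (-180, -90, 180, 90)   # roughly what a zoomed-out tile covers
local_view = (0, 0, 10, 10)          # roughly what a zoomed-in tile covers

print(chunks_intersecting_bbox(global_view, (180, 360), 0.25))   # 16
print(chunks_intersecting_bbox(local_view, (180, 360), 0.25))    # 1
print(chunks_intersecting_bbox(global_view, (720, 1440), 0.25))  # 1
```

With many spatial chunks, the zoomed-out view has to read all 16 of them, while the zoomed-in view reads only one. Storing the full extent in a single chunk reduces the zoomed-out read to one chunk, at the cost of a larger chunk.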

What is high resolution?

Given the current performance of titiler-xarray in tile-server-e2e-benchmarks.ipynb, and assuming you are targeting a response time of 300ms or less, a chunk size of 8MB or smaller is suggested.

To give a sense of what this means in terms of spatial resolution, and assuming a global dataset where the full spatial extent is stored in a single chunk, your dataset would have the following dimensions:

import numpy as np

datatypes = ["float16", "float32", "float64"]
total_global_chunk_size_mb = 8
# Approximate kilometers per degree at the equator.
# Source for lat/lon degrees conversion to meters: https://www.sco.wisc.edu/2022/01/21/how-big-is-a-degree/
# Some sources calculate that a degree of longitude at the equator is 111,319.5 meters, but this is just a ballpark figure for the spatial resolution.
deg_to_km = 111000 / 1000

for data_type in datatypes:
    # Determine the size in bytes of each data value
    dtype = np.dtype(data_type)
    # Calculate the itemsize in megabytes
    itemsize_mb = dtype.itemsize / 1024 / 1024
    # A global lat/lon grid is twice as wide as it is tall (x_dim = 2 * y_dim),
    # and y_dim * x_dim * itemsize_mb must equal the target chunk size
    y_dim = np.sqrt(total_global_chunk_size_mb / 2 / itemsize_mb)
    x_dim = y_dim * 2
    # Degrees per pixel: 180 degrees of latitude over y_dim rows,
    # 360 degrees of longitude over x_dim columns
    y_deg = np.round(180 / y_dim, 3)
    x_deg = np.round(360 / x_dim, 3)

    print(f"For data type {dtype}, an {total_global_chunk_size_mb}MB spatial dataset would have:")
    print(f"* Dimensions: {int(np.round(y_dim))} x {int(np.round(x_dim))}")
    print(f"* Degrees for global data: {y_deg} x {x_deg}")
    print(f"* Approximate kilometers(km) at the equator: {int(np.round(deg_to_km * y_deg))} x {int(np.round(deg_to_km * x_deg))}\n")
For data type float16, an 8MB spatial dataset would have:
* Dimensions: 1448 x 2896
* Degrees for global data: 0.124 x 0.124
* Approximate kilometers(km) at the equator: 14 x 14

For data type float32, an 8MB spatial dataset would have:
* Dimensions: 1024 x 2048
* Degrees for global data: 0.176 x 0.176
* Approximate kilometers(km) at the equator: 20 x 20

For data type float64, an 8MB spatial dataset would have:
* Dimensions: 724 x 1448
* Degrees for global data: 0.249 x 0.249
* Approximate kilometers(km) at the equator: 28 x 28

If your dataset has a higher resolution than what is listed above, you will want to chunk your data spatially, create a pyramid, or both. Spatially chunked data can also degrade performance at low zoom levels, so you should try chunks significantly smaller than 8MB, say 4MB. If the full spatial extent of your data is larger than 16MB, you will probably want to create a pyramid.
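As a rough way to size a pyramid, the sketch below halves each spatial dimension until one full-extent 2D slice fits within a target size. The function name pyramid_levels_needed, the factor of 2 per level, and the example shape (a hypothetical 0.025 degree global grid) are assumptions made for illustration, not part of the benchmark code.

```python
import math

import numpy as np


def pyramid_levels_needed(shape, dtype, target_mb=8):
    """Return (levels, coarsest_shape) for a 2D slice of the given shape.

    Each level halves both dimensions (an assumed downsampling factor of 2),
    stopping once the full-extent slice is no larger than target_mb.
    """
    itemsize = np.dtype(dtype).itemsize
    rows, cols = shape
    target_bytes = target_mb * 1024 * 1024
    levels = 0
    while rows * cols * itemsize > target_bytes:
        rows = math.ceil(rows / 2)
        cols = math.ceil(cols / 2)
        levels += 1
    return levels, (rows, cols)


# Hypothetical 0.025 degree global float32 grid: 7200 x 14400 pixels (about 395MB per slice).
print(pyramid_levels_needed((7200, 14400), "float32"))  # (3, (900, 1800))
# A 1024 x 2048 float32 slice is exactly 8MB, so no extra levels are needed.
print(pyramid_levels_needed((1024, 2048), "float32"))   # (0, (1024, 2048))
```

The number of levels is only a starting point. The right target size depends on the performance requirements of the application.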

It’s common to prefer larger chunk sizes for analysis workflows. These situations may motivate creating pyramids with small chunks for visualization purposes.
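The idea behind a pyramid can be sketched with plain NumPy by averaging non-overlapping 2x2 blocks to produce each coarser level. This is a simplified illustration under stated assumptions: the function names coarsen_by_two and build_pyramid are hypothetical, averaging is only one possible resampling method, and a real workflow would also write each level to a Zarr store with small chunks.

```python
import numpy as np


def coarsen_by_two(data):
    """Average non-overlapping 2x2 blocks of a 2D array.

    If a dimension has odd length, its trailing row or column is dropped.
    """
    rows, cols = data.shape
    rows -= rows % 2
    cols -= cols % 2
    trimmed = data[:rows, :cols]
    # Split each axis into (blocks, 2) and average over the two block axes.
    return trimmed.reshape(rows // 2, 2, cols // 2, 2).mean(axis=(1, 3))


def build_pyramid(data, n_levels):
    """Return [full resolution, level 1, ..., level n_levels]."""
    levels = [data]
    for _ in range(n_levels):
        levels.append(coarsen_by_two(levels[-1]))
    return levels


data = np.arange(16, dtype="float32").reshape(4, 4)
pyramid = build_pyramid(data, n_levels=2)
print([level.shape for level in pyramid])  # [(4, 4), (2, 2), (1, 1)]
```

The full-resolution level can keep the larger chunks preferred for analysis, while the coarser levels serve zoomed-out tiles from far less data.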

The Zarr V3 sharding extension may help in the future with the trade-off between chunk size and number of chunks at different zoom levels. Sharding stores multiple chunks together in one object. Range requests for small chunks will still work at high (zoomed in) zoom levels, while the potential to concatenate adjacent ranges into a single request means multiple chunks or an entire shard could be read in one request at low (zoomed out) zoom levels.
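The range concatenation described above can be sketched as follows. This is not the Zarr API: the function name coalesce_ranges, the (offset, length) tuples, and the example byte offsets are hypothetical, chosen only to show how adjacent chunk ranges inside one shard could collapse into fewer requests.

```python
def coalesce_ranges(ranges, max_gap=0):
    """Merge (offset, length) byte ranges that touch or lie within max_gap bytes.

    Returns a sorted list of merged (offset, length) ranges, each of which
    could be fetched with a single range request.
    """
    merged = []
    for offset, length in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
            prev_offset, prev_length = merged[-1]
            end = max(prev_offset + prev_length, offset + length)
            merged[-1] = (prev_offset, end - prev_offset)
        else:
            merged.append((offset, length))
    return merged


# Zoomed out: four chunks are needed, three of which sit side by side in the shard.
print(coalesce_ranges([(0, 100), (100, 100), (200, 100), (500, 100)]))
# [(0, 300), (500, 100)]

# Zoomed in: a single small chunk is still one small request.
print(coalesce_ranges([(300, 100)]))
# [(300, 100)]
```

A non-zero max_gap trades a few wasted bytes for fewer requests, which may be worthwhile when request latency dominates transfer time.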