Recommendations
Here we provide recommendations for producing pyramids for performant Zarr visualization on the web. These recommendations are based on the end-to-end benchmarking results for the dynamic client approach. These benchmarks consider the use case of rendering data on a map, but we also discuss how time series visualization could factor into these decisions. We eventually aim to make pyramid generation optional for the dynamic client approach, such that the raw data could be rendered on a web map with limited zooming and panning functionality. Many of these recommendations should still apply when rendering raw data on a web map. In that case, however, it would be much more important to also consider the performance implications for scientific computational workflows.
Zarr Version
The end-to-end benchmarking results showed that V2 and V3 data are comparable in performance. Therefore, we recommend adopting the Zarr V3 specification if your preferred Zarr implementation includes the approved version of the Zarr V3 spec. At the time of writing, the Zarrita Python library has implemented the approved Zarr V3 spec but is not recommended for production use. The Zarr Python library is undergoing a refactor to bring the library up-to-date with the V3 spec.
Number of pixels per tile (spatial chunking)
The number of pixels per tile (i.e., spatial chunking) must be a multiple of 16 and is generally 128, 256, or 512. The end-to-end benchmarks tested 128 and 256 pixels per tile and showed that the number of pixels per tile does not impact rendering performance at a given chunk size. However, including more pixels per tile reduces the extent of other dimensions (e.g., time) that can be included in a chunk of a given size. Therefore, it is worth considering fewer pixels per tile (e.g., 128) if visualizing time series is an important use case. By contrast, if only spatial rendering is important and more detail at coarser zoom levels is desired, you may consider increasing the number of pixels per tile to 256. If matching “de facto” standards is particularly important, note that 256 is commonly used as the tile width. In the current implementation of the dynamic client approach, increasing the number of pixels per tile does not increase total storage costs, as the number of zoom levels before reaching full resolution would be correspondingly smaller. However, larger chunk sizes could reduce storage costs once the requirement to align chunks with conventional zoom level boundaries is relaxed in future releases of the maps library.
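The storage argument can be checked with a little arithmetic. The sketch below assumes a quadtree-style pyramid in which zoom level z spans the full extent with 2**z by 2**z tiles, and it uses a hypothetical full-resolution width of 4096 pixels. `pyramid_summary` is an illustrative helper, not part of any library.

```python
def pyramid_summary(full_res_width: int, pixels_per_tile: int) -> tuple[int, int]:
    """Return (number of zoom levels, total pixels stored across all levels).

    Assumption: zoom level z is a square grid of 2**z by 2**z tiles, so its
    width in pixels is pixels_per_tile * 2**z.
    """
    # Find the first zoom level whose width reaches the full resolution.
    max_zoom = 0
    while pixels_per_tile * 2**max_zoom < full_res_width:
        max_zoom += 1
    # Sum the pixel count of every level from 0 up to max_zoom.
    total_pixels = sum((pixels_per_tile * 2**z) ** 2 for z in range(max_zoom + 1))
    return max_zoom + 1, total_pixels


# Hypothetical dataset that is 4096 pixels wide at full resolution.
for pixels_per_tile in (128, 256):
    levels, total = pyramid_summary(4096, pixels_per_tile)
    print(pixels_per_tile, levels, total)
```

Under these assumptions, 128-pixel tiles need six zoom levels and 256-pixel tiles need five, while the total number of stored pixels differs by well under 1%.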
Chunk size (non-spatial chunking)
The chunk size was the strongest driver of the total time required to render datasets. For optimal rendering performance, we recommend targeting chunk sizes under 1 MB for the uncompressed data.
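To see how this target interacts with the number of pixels per tile, the following sketch computes how many time steps fit in one chunk. It assumes chunks laid out as (time, y, x), 4-byte values such as float32, and 1 MB taken as 1,000,000 bytes. `max_time_steps` is a hypothetical helper written for illustration.

```python
def max_time_steps(pixels_per_tile: int, itemsize: int = 4,
                   limit_bytes: int = 1_000_000) -> int:
    """Largest number of time steps whose uncompressed size stays under limit_bytes.

    Assumptions: chunk shape is (time, pixels_per_tile, pixels_per_tile) and
    each value occupies `itemsize` bytes (4 for float32).
    """
    bytes_per_step = pixels_per_tile * pixels_per_tile * itemsize
    # Strictly under the limit, so subtract one byte before dividing.
    return (limit_bytes - 1) // bytes_per_step


for pixels_per_tile in (128, 256, 512):
    print(pixels_per_tile, max_time_steps(pixels_per_tile))
```

With these assumptions, a 128-pixel tile leaves room for 15 time steps per chunk and a 256-pixel tile for 3, while a single 512-pixel float32 slice already exceeds the target.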
Zarr V3 sharding extension
The end-to-end benchmarks showed that the time to render was slower for sharded V3 datasets relative to V2 datasets for zoom levels greater than 0. However, a primary benefit of the sharding extension is that it allows the same dataset to be accessed via large shards for analysis and smaller chunks for visualization. Further, a single file can store many chunks, which benefits applications that rely on file-based operations. Given these benefits, we recommend leveraging the sharding specification once it has been reviewed and accepted as a Zarr Enhancement Proposal (ZEP) and your preferred Zarr implementation includes the approved sharding extension. The voting process for this ZEP is expected to end on October 31, 2023. We found that the shard size does not impact the time to render and therefore recommend a follow-up study on optimal shard structures for computational workflows.
We expect that the performance difference between sharded and non-sharded datasets could be minimized by future optimizations in the loading library, such as through the concatenation of range requests for adjacent chunks at higher zoom levels.
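As a rough illustration of what concatenating range requests could look like, the sketch below merges byte ranges that are adjacent within a shard so that several chunks can be fetched in one request. It shows the general technique only and is not the loading library's implementation. `coalesce_ranges` and the `max_gap` parameter are hypothetical names.

```python
def coalesce_ranges(ranges: list[tuple[int, int]],
                    max_gap: int = 0) -> list[tuple[int, int]]:
    """Merge (offset, length) byte ranges that touch or lie within max_gap bytes.

    Each merged range could then be fetched with a single range request.
    """
    merged: list[tuple[int, int]] = []
    for offset, length in sorted(ranges):
        if merged and offset <= merged[-1][0] + merged[-1][1] + max_gap:
            # Extend the previous range to cover this one.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            merged.append((offset, length))
    return merged


# Three chunks in one shard: the first two are contiguous, the third is not.
print(coalesce_ranges([(0, 100), (100, 50), (400, 10)]))
```

Allowing a small `max_gap` trades a few unused bytes for fewer requests, which may be worthwhile when request latency dominates transfer time.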