Benchmarks 5: Zarr Pyramids

Explanation

In the previous notebooks, we saw how chunk size and the number of chunks can impact performance. Pyramids, or multiscale datasets, aggregate data at several levels to create reduced-resolution datasets that perform better at low zoom levels. These aggregated datasets do not represent the "raw data"; they are assumed to be intended for visual representation of the data, not numerical analysis.
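
To make the idea of aggregation levels concrete, here is a minimal sketch of building a pyramid by repeatedly averaging 2 x 2 blocks of a grid. It is an illustration only and not the method used in the tile-benchmarking repo: the helper names `coarsen_mean` and `build_pyramid`, the factor of 2, and the synthetic input grid are all assumptions.

```python
import numpy as np


def coarsen_mean(grid: np.ndarray, factor: int = 2) -> np.ndarray:
    """Downsample a 2D grid by averaging non-overlapping factor x factor blocks."""
    rows, cols = grid.shape
    # Trim trailing rows/columns so both dimensions divide evenly by the factor.
    rows -= rows % factor
    cols -= cols % factor
    trimmed = grid[:rows, :cols]
    # Split each axis into (block index, position within block), then average
    # over the two "position within block" axes.
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


def build_pyramid(grid: np.ndarray, levels: int) -> list[np.ndarray]:
    """Return `levels` arrays: full resolution first, each next one half the size."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(coarsen_mean(pyramid[-1]))
    return pyramid


# Synthetic 8 x 16 grid (hypothetical values, not real benchmark data).
full_res = np.arange(8 * 16, dtype=float).reshape(8, 16)
pyramid = build_pyramid(full_res, levels=3)
shapes = [level.shape for level in pyramid]
print(shapes)  # [(8, 16), (4, 8), (2, 4)]
```

Each level holds a quarter as many values as the one before it, which is why a renderer reading the coarsest level at a low zoom has far less data to fetch and resample.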

Dataset Generation

The code to produce the Zarr pyramids is in the tile-benchmarking repo: 02-run-tests/05-cmip6-pyramid.ipynb.

Tests

Tests were run via the tile-benchmarking/02-run-tests/04-number-of-spatial-chunks.ipynb notebook.

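import warnings

import holoviews as hv
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)
import pandas as pd

pd.options.plotting.backend = "holoviews"

warnings.filterwarnings("ignore")

# Base URL for the benchmark result CSVs (no trailing slash; it is added below).
git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/main/02-run-tests/results-csvs"
df = pd.read_csv(f"{git_url_path}/05-cmip6-pyramid-results.csv")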

zooms = range(4)

plt_opts = {"width": 400, "height": 300}

plts = []

for zoom_level in zooms:
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["data_format"],
            c="data_format",
            cmap="Plasma_r",
            ylabel="Time to render (ms)",
            xlabel="Data Format",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)

Interpretation of the Results

At low zoom levels, pyramids improved performance. Note that this dataset is not very high resolution; higher-resolution datasets may see a more significant improvement from pyramids. For the CMIP6 dataset, a pyramid only makes sense for zoom levels 0 and 1.
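
One way to check this reading numerically, rather than by eye from the box plots, is to tabulate the median time per zoom level and data format. The sketch below assumes only the `zoom`, `data_format`, and `time` columns already used in the plotting code; the function name `median_time_by_zoom` is hypothetical.

```python
import pandas as pd


def median_time_by_zoom(results: pd.DataFrame) -> pd.DataFrame:
    """Return median `time` with one row per zoom level and one column per data format."""
    return (
        results.groupby(["zoom", "data_format"])["time"]
        .median()
        .unstack("data_format")
    )
```

Calling `median_time_by_zoom(df)` on the results loaded above would give a small table in which the zoom levels where the pyramid column is lower than the non-pyramid column are the ones where a pyramid pays off.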