import pandas as pd
import hvplot.pandas
import holoviews as hv
pd.options.plotting.backend = 'holoviews'
import warnings
warnings.filterwarnings('ignore')
Benchmarks 5: Zarr Pyramids
Explanation
In the previous notebooks, we saw how chunk size and the number of chunks can affect performance. Pyramids, or multiscale datasets, aggregate data at several levels to create reduced-resolution datasets that perform better at low zoom levels. These aggregated datasets do not represent the “raw data”; they are assumed to be intended for visualization rather than numerical analysis.
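To make the idea of aggregation levels concrete, here is a minimal sketch of building a pyramid by repeatedly averaging 2x2 blocks of a grid. It is not the code used to generate the benchmark datasets (that lives in the notebook referenced under Dataset Generation). The mean aggregation, the factor of 2 per level, the toy 4x4 grid, and the function names `coarsen_2x` and `build_pyramid` are all assumptions made for illustration.

```python
def coarsen_2x(grid):
    """Average non-overlapping 2x2 blocks of a 2D grid.

    Assumes the grid has an even number of rows and columns.
    """
    rows, cols = len(grid), len(grid[0])
    return [
        [
            (grid[r][c] + grid[r][c + 1] + grid[r + 1][c] + grid[r + 1][c + 1]) / 4
            for c in range(0, cols, 2)
        ]
        for r in range(0, rows, 2)
    ]


def build_pyramid(grid, levels):
    """Return a list of grids: full resolution first, then each coarser level."""
    pyramid = [grid]
    for _ in range(levels - 1):
        pyramid.append(coarsen_2x(pyramid[-1]))
    return pyramid


# Toy 4x4 grid standing in for one time step of a gridded variable.
base = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pyramid = build_pyramid(base, levels=3)
shapes = [(len(level), len(level[0])) for level in pyramid]
print(shapes)
```

Each level holds a quarter of the values of the level before it, which is why a tile server reading a coarse level for a low zoom request has less data to fetch and resample.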
Dataset Generation
The code to produce the zarr pyramids is in the tile-benchmarking repo: 02-run-tests/05-cmip6-pyramid.ipynb.
Tests
Tests were run via the tile-benchmarking/02-run-tests/04-number-of-spatial-chunks.ipynb notebook.
git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/main/02-run-tests/results-csvs/"
df = pd.read_csv(f"{git_url_path}/05-cmip6-pyramid-results.csv")
zooms = range(4)
plt_opts = {"width": 400, "height": 300}

plts = []

for zoom_level in zooms:
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["data_format"],
            c="data_format",
            cmap='Plasma_r',
            ylabel="Time to render (ms)",
            xlabel="Data Format",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )

hv.Layout(plts).cols(2)
Interpretation of the Results
At low zoom levels, pyramids improved performance. Note that this dataset is not very high resolution; higher-resolution datasets may see a more significant improvement from pyramids. For the CMIP6 dataset, a pyramid only makes sense for zoom levels 0 and 1.
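To read the comparison as numbers rather than from the box plots, one option is to tabulate the median render time per zoom level and data format. The sketch below assumes only the `zoom`, `data_format`, and `time` columns used in the plotting code above; the helper name `median_time_by_zoom` is hypothetical, and the median is just one reasonable summary statistic.

```python
import pandas as pd


def median_time_by_zoom(results: pd.DataFrame) -> pd.DataFrame:
    """Median render time, one row per zoom level and one column per data format.

    Expects the columns "zoom", "data_format", and "time".
    """
    return (
        results.groupby(["zoom", "data_format"])["time"]
        .median()
        .unstack("data_format")
    )
```

Calling `median_time_by_zoom(df)` on the results loaded earlier gives a small table in which the zoom levels where the pyramid is faster can be compared row by row.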