import pandas as pd
import hvplot.pandas
import holoviews as hv
# Use HoloViews as the pandas plotting backend and silence warnings in the rendered notebook
pd.options.plotting.backend = 'holoviews'
import warnings
warnings.filterwarnings('ignore')
Benchmarks 4: Spatial Chunk Variation
Explanation
In the previous notebook, we saw how chunk size impacts performance. However, a smaller chunk size means more chunks are needed to cover the same data. In this notebook, we explore how the number of spatial chunks impacts performance, especially at low zoom levels.
Dataset Generation
We compared the performance of tiling artificially generated Zarr data in which the chunk size is held constant while the spatial resolution increases, so that a varying number of chunks is required for full spatial coverage.
The code to produce the Zarr stores is in the tile-benchmarking repo: 01-generate-datasets/generate-fake-data-with-chunks.ipynb.
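Why does a higher resolution with a constant chunk size mean more chunks? Each spatial dimension needs `ceil(grid_size / chunk_size)` chunks, and the total is the product over both dimensions. The minimal sketch below illustrates this; the grid sizes and the 512 x 512 chunk shape are hypothetical and are not the values used to generate the benchmark stores.

```python
import math


def number_of_spatial_chunks(ny: int, nx: int, chunk_y: int, chunk_x: int) -> int:
    """Count the chunks needed to cover an ny x nx grid with chunk_y x chunk_x chunks.

    Partial chunks at the grid edges still count as whole chunks, hence the ceil.
    """
    return math.ceil(ny / chunk_y) * math.ceil(nx / chunk_x)


# Hypothetical example: keep the chunk shape fixed and increase the resolution
# of a global 0.25-degree grid (720 x 1440 cells) by factors of 2 and 4.
chunk_shape = (512, 512)  # hypothetical chunk shape, not the benchmark's
for factor in (1, 2, 4):
    ny, nx = 720 * factor, 1440 * factor
    print(factor, number_of_spatial_chunks(ny, nx, *chunk_shape))
```

Doubling the resolution along both axes roughly quadruples the number of chunks, because the chunk count grows with the area of the grid.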
Tests
Tests were run via the tile-benchmarking/02-run-tests/04-number-of-spatial-chunks.ipynb notebook.
# Load the benchmark results CSV from the tile-benchmarking repo
git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/main/02-run-tests/results-csvs/"
df = pd.read_csv(f"{git_url_path}04-number-of-spatial-chunks-results.csv")
zooms = range(6)
plt_opts = {"width": 400, "height": 300}
plts = []
# One box plot of render time per zoom level, grouped by number of spatial chunks
for zoom_level in zooms:
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["number_of_spatial_chunks"],
            c="number_of_spatial_chunks",
            cmap='Plasma_r',
            ylabel="Time to render (ms)",
            xlabel="Number of spatial chunks",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )

# Arrange the per-zoom plots in a two-column grid
hv.Layout(plts).cols(2)
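To read the same comparison as numbers rather than box plots, the results can be pivoted into a table of median render time per zoom level and chunk count. This is a minimal sketch that assumes only the columns already used by the plotting code (`zoom`, `number_of_spatial_chunks`, `time`); the function name `median_render_time` is hypothetical.

```python
import pandas as pd


def median_render_time(results: pd.DataFrame) -> pd.DataFrame:
    """Pivot benchmark results into median render time per chunk count and zoom.

    Assumes the columns used by the plots: "zoom", "number_of_spatial_chunks"
    and "time" (render time in ms). Rows are chunk counts, columns are zoom levels.
    """
    return results.pivot_table(
        index="number_of_spatial_chunks",
        columns="zoom",
        values="time",
        aggfunc="median",
    )


# Usage with the results loaded earlier in this notebook:
# median_render_time(df)
```

The median is used here because it is less sensitive than the mean to occasional slow outliers, which the box plots also show; swap in `"mean"` if that is the statistic of interest.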
Interpretation of the Results
As the plots above show, a greater number of spatial chunks degrades performance at low zoom levels, most notably at zoom levels 0 and 1. At high zoom levels, fewer chunks need to be loaded for each tile, so there is no difference in performance.
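A rough geometric sketch of why this happens: a zoom 0 tile covers the whole Web Mercator extent and so touches nearly every chunk, while a zoom 5 tile covers a small area and touches only one or two. The sketch assumes a global, equal-angle lon/lat grid whose chunks split longitude [-180, 180] and latitude [90, -90] into equal columns and rows; it ignores reprojection and any padding a tiling library may add, and the helper names and chunk grids are hypothetical.

```python
import math


def chunk_span(start: float, stop: float, n_chunks: int) -> int:
    """Count equal-width chunks overlapped by [start, stop) on a 0-1 axis."""
    first = math.floor(start * n_chunks)
    last = math.ceil(stop * n_chunks) - 1
    return last - first + 1


def tile_top_lat(y: int, zoom: int) -> float:
    """Latitude (degrees) of the northern edge of Web Mercator tile row y."""
    return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / 2**zoom))))


def chunks_for_tile(zoom: int, x: int, y: int, chunks_x: int, chunks_y: int) -> int:
    """Approximate number of chunks a single Web Mercator tile (zoom, x, y) touches.

    Assumes chunks split longitude [-180, 180] into chunks_x equal columns and
    latitude [90, -90] into chunks_y equal rows (equal-angle global grid).
    """
    n = 2**zoom
    # Longitude: tile x spans the fraction [x/n, (x+1)/n) of the full width
    cols = chunk_span(x / n, (x + 1) / n, chunks_x)
    # Latitude: convert the tile's edges to fractions measured from the north pole
    north = (90 - tile_top_lat(y, zoom)) / 180
    south = (90 - tile_top_lat(y + 1, zoom)) / 180
    rows = chunk_span(north, south, chunks_y)
    return cols * rows


# Compare a coarser and a finer hypothetical chunk grid at zoom 0 and at zoom 5.
for grid in [(16, 8), (32, 16)]:
    print(grid, chunks_for_tile(0, 0, 0, *grid), chunks_for_tile(5, 16, 15, *grid))
```

With these assumptions, quadrupling the number of chunks quadruples the chunks read for the zoom 0 tile, while the zoom 5 tile still reads a single chunk from either grid, which is consistent with the pattern in the plots.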
Slow performance at low zoom levels can be addressed with pyramid (multiscale) datasets, as demonstrated in Benchmarks: Pyramids.
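The intuition behind pyramids can be sketched by counting chunks per level. This minimal sketch assumes each pyramid level halves the resolution along both axes while keeping the same chunk shape; real pyramids may use other coarsening factors, and how a tiling library picks a level for a given zoom is not modeled here. The function name and grid sizes are hypothetical.

```python
import math


def chunks_per_level(ny: int, nx: int, chunk_y: int, chunk_x: int, levels: int) -> list[int]:
    """Chunk count at each pyramid level, assuming each level halves the
    resolution along both axes while the chunk shape stays the same."""
    counts = []
    for level in range(levels):
        # Grid size at this level after repeated halving
        level_ny = math.ceil(ny / 2**level)
        level_nx = math.ceil(nx / 2**level)
        counts.append(math.ceil(level_ny / chunk_y) * math.ceil(level_nx / chunk_x))
    return counts


# Hypothetical 2880 x 5760 grid with 512 x 512 chunks, four levels
print(chunks_per_level(2880, 5760, 512, 512, 4))
```

A low zoom tile served from a coarse level only has to read the few chunks of that level instead of every chunk of the full-resolution data.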