Benchmarks 3: Chunk Size Variation

Explanation

We compared the performance of tiling artificially generated Zarr data at different chunk sizes. The CMIP6 data provides an excellent real-world dataset, but it is relatively low resolution. To study the impact of higher-resolution data, we generated artificial Zarr stores and used them to explore the relationship between tile generation time and chunk size.

Dataset Generation

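We generated multiple data stores of increasingly fine spatial resolution. None of the stores is chunked spatially, so each chunk spans the full spatial extent and grows with the resolution. The resolution of each successive store was chosen so that the total size of each chunk is multiplied by a factor of 2.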

The code to produce the Zarr stores is in the tile-benchmarking repo: 01-generate-datasets/generate-fake-data-with-chunks.ipynb.
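As a rough illustration of the doubling described above (not the notebook's actual code), the sketch below computes the uncompressed size of a chunk that spans the whole spatial grid for one time step. The starting grid of 512 x 1024 cells, the 8-byte data type, and the choice to double one axis at a time are all assumptions made for this example. The helper names are hypothetical.

```python
def chunk_size_mib(n_lat, n_lon, itemsize=8):
    """Uncompressed size in MiB of one chunk covering the full n_lat x n_lon grid.

    itemsize=8 assumes a float64 variable; this is an assumption for the example.
    """
    return n_lat * n_lon * itemsize / 2**20


def doubling_grids(n_lat, n_lon, steps):
    """Return `steps` grid shapes whose cell count doubles from one to the next.

    Axes are doubled alternately (longitude first) so that shapes stay integral.
    This is an illustrative scheme, not necessarily the one used for the benchmark.
    """
    grids = []
    for step in range(steps):
        grids.append((n_lat, n_lon))
        if step % 2 == 0:
            n_lon *= 2
        else:
            n_lat *= 2
    return grids


grids = doubling_grids(512, 1024, 4)
sizes = [chunk_size_mib(n_lat, n_lon) for n_lat, n_lon in grids]
for (n_lat, n_lon), size in zip(grids, sizes):
    print(f"{n_lat} x {n_lon} cells -> {size:.0f} MiB per chunk")
```

Because there is no spatial chunking, doubling the number of grid cells doubles the chunk size directly. Compression would shrink the stored chunks, so on-disk sizes may differ from these figures.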

Tests

Tests were run via the tile-benchmarking/02-run-tests/03-chunk-size.ipynb notebook.

import warnings

import holoviews as hv
import hvplot.pandas  # noqa: F401 (registers the .hvplot accessor on DataFrames)
import pandas as pd

pd.options.plotting.backend = 'holoviews'
warnings.filterwarnings('ignore')

# Load the benchmark results from the tile-benchmarking repo.
# No trailing slash here, so the joined URL has a single separator.
git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/main/02-run-tests/results-csvs"
df = pd.read_csv(f"{git_url_path}/03-chunk-size-results.csv")
zooms = range(6)
plt_opts = {"width": 400, "height": 300}

plts = []

# One box plot per zoom level, showing the distribution of render times per chunk size.
for zoom_level in zooms:
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["chunk_size_mb"],
            c="chunk_size_mb",
            cmap='Plasma_r',
            ylabel="Time to render (ms)",
            xlabel="Chunk size (MB)",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )
# Arrange the per-zoom plots in a two-column grid.
hv.Layout(plts).cols(2)

Interpretation of the Results

Smaller chunk sizes clearly support faster tile rendering, regardless of zoom level. The tradeoff is in the number of chunks: for a dataset of a given resolution, smaller chunks mean that more spatial chunks are needed to render the same area, which can hurt performance at low zoom levels, where a single tile covers a large part of the dataset. We take a look at this in the next notebook, Benchmarks: Exploring Spatial Chunk Variation Impacts on Tile Generation.
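To make the chunk-count tradeoff concrete, the following sketch counts how many chunks overlap a rectangular pixel window for a given chunk shape. It is a simplification: it works in the dataset's own pixel grid and ignores the reprojection to map tiles, and the function name, grid shape, and chunk shapes are hypothetical values chosen for illustration.

```python
def chunks_touched(window, chunk_shape):
    """Count the chunks that overlap a pixel window.

    window: ((row_start, row_stop), (col_start, col_stop)), half-open pixel ranges.
    chunk_shape: (rows_per_chunk, cols_per_chunk).
    """
    count = 1
    for (start, stop), chunk in zip(window, chunk_shape):
        first = start // chunk        # index of the first chunk the window enters
        last = (stop - 1) // chunk    # index of the last chunk the window enters
        count *= last - first + 1
    return count


# A hypothetical 1024 x 2048 grid read in full, as a low-zoom tile would require.
full_extent = ((0, 1024), (0, 2048))
one_big_chunk = chunks_touched(full_extent, (1024, 2048))
many_small_chunks = chunks_touched(full_extent, (256, 256))

# A small window, as a high-zoom tile would require, with the same small chunks.
small_window = ((100, 300), (100, 300))
small_window_chunks = chunks_touched(small_window, (256, 256))

print(one_big_chunk, many_small_chunks, small_window_chunks)
```

Under these assumptions, reading the full extent touches a single chunk when the data is not chunked spatially, but many chunks when they are small, while a small window touches only a handful either way. Each touched chunk typically means a separate read, which is why small chunks can cost more at low zoom levels.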