Benchmarking: Zarr Version

import holoviews as hv
import hvplot
import hvplot.pandas  # noqa
import pandas as pd
import statsmodels.formula.api as smf

pd.options.plotting.backend = "holoviews"

Read the summary of all benchmarking results.

summary = pd.read_parquet("s3://carbonplan-benchmarks/benchmark-data/v0.2/summary.parq")

Subset the data to isolate the impact of Zarr version and chunk size.

df = summary[
    (summary["projection"] == 4326)
    & (summary["pixels_per_tile"] == 128)
    & (summary["shard_size"] == 0)
    & (summary["region"] == "us-west-2")
]

Set plot options.

cmap = ["#E1BE6A", "#40B0A6"]
plt_opts = {"width": 600, "height": 400}

Create a box plot showing how the rendering time depends on Zarr version and chunk size.

df.hvplot.box(
    y="duration",
    by=["actual_chunk_size", "zarr_version"],
    c="zarr_version",
    cmap=cmap,
    ylabel="Time to render (ms)",
    xlabel="Chunk size (MB); Zarr version",
    legend=False,
).opts(**plt_opts)

Fit a multiple linear regression of rendering time on chunk size and Zarr version. Chunk size strongly affects the time to render: each additional MB of chunk size adds roughly 85 ms (p < 0.001), so datasets with larger chunks take longer to render. The Zarr version has no statistically significant effect on rendering time (coefficient of about -18 ms, p = 0.848).

model = smf.ols("duration ~ actual_chunk_size + C(zarr_version)", data=df).fit()
model.summary()
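                         OLS Regression Results
=========================================================================
Dep. Variable:              duration   R-squared:                   0.511
Model:                           OLS   Adj. R-squared:              0.507
Method:                Least Squares   F-statistic:                 132.4
Date:               Sat, 02 Sep 2023   Prob (F-statistic):       4.58e-40
Time:                       19:45:37   Log-Likelihood:            -2050.1
No. Observations:                256   AIC:                         4106.
Df Residuals:                    253   BIC:                         4117.
Df Model:                          2
Covariance Type:           nonrobust
=========================================================================
                          coef std err       t  P>|t|    [0.025    0.975]
-------------------------------------------------------------------------
Intercept            2066.9020  82.158  25.158  0.000  1905.102  2228.702
C(zarr_version)[T.3]  -17.5790  91.439  -0.192  0.848  -197.657   162.499
actual_chunk_size      84.7372   5.208  16.269  0.000    74.480    94.995
=========================================================================
Omnibus:                      37.985   Durbin-Watson:               1.955
Prob(Omnibus):                 0.000   Jarque-Bera (JB):           52.057
Skew:                         -0.953   Prob(JB):                 4.96e-12
Kurtosis:                      4.116   Cond. No.                     31.2
=========================================================================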


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
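
To make the fitted coefficients concrete, the sketch below plugs them into the regression equation by hand. The helper name `predict_duration` and the example chunk sizes are hypothetical. The coefficients are copied from the summary above. The units (ms for duration, MB for chunk size) are assumed from the plot axis labels, and the reference level for `C(zarr_version)[T.3]` is assumed to be Zarr V2.

```python
# Coefficients copied from the OLS summary for
# "duration ~ actual_chunk_size + C(zarr_version)".
INTERCEPT = 2066.9020
CHUNK_SIZE_COEF = 84.7372  # assumed units: ms per MB of chunk size
ZARR_V3_COEF = -17.5790    # offset for Zarr V3 relative to the reference level


def predict_duration(chunk_size, zarr_version):
    """Return the model's predicted rendering time (hypothetical helper)."""
    is_v3 = 1 if zarr_version == 3 else 0
    return INTERCEPT + CHUNK_SIZE_COEF * chunk_size + ZARR_V3_COEF * is_v3


# Illustrative chunk sizes only; they are not taken from the benchmark data.
for chunk_size in (1, 10, 25):
    v2 = predict_duration(chunk_size, 2)
    v3 = predict_duration(chunk_size, 3)
    print(f"{chunk_size:>2} MB: V2 {v2:.0f} ms, V3 {v3:.0f} ms, difference {v3 - v2:.1f} ms")
```

Because this model has no interaction term, the V3 offset is the same at every chunk size, and it is small next to the chunk-size effect.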

Show the rendering time at different zoom levels.

plt_opts = {"width": 400, "height": 300}

plts = []

for zoom_level in range(4):
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="duration",
            by=["actual_chunk_size", "zarr_version"],
            c="zarr_version",
            cmap=cmap,
            ylabel="Time to render (ms)",
            xlabel="Chunk size (MB); Zarr version",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)

Add a multiplicative interaction term between chunk size and zoom level to the regression. The results show that the effect of chunk size on rendering time depends on zoom level: the chunk-size slope is not significantly different from zero at zoom level 0 (p = 0.160) and increases at each higher zoom level, with the most pronounced effect at zoom level 3.

model = smf.ols("duration ~ actual_chunk_size * C(zoom)", data=df).fit()
model.summary()
                         OLS Regression Results
=========================================================================
Dep. Variable:              duration   R-squared:                   0.919
Model:                           OLS   Adj. R-squared:              0.917
Method:                Least Squares   F-statistic:                 401.4
Date:               Sat, 02 Sep 2023   Prob (F-statistic):      2.29e-131
Time:                       19:45:37   Log-Likelihood:            -1820.2
No. Observations:                256   AIC:                         3656.
Df Residuals:                    248   BIC:                         3685.
Df Model:                          7
Covariance Type:           nonrobust
======================================================================================
                                       coef std err       t  P>|t|    [0.025    0.975]
--------------------------------------------------------------------------------------
Intercept                         2274.2785  56.178  40.484  0.000  2163.633  2384.925
C(zoom)[T.1.0]                     171.4040  79.447   2.157  0.032    14.927   327.881
C(zoom)[T.2.0]                    -595.6177  79.447  -7.497  0.000  -752.095  -439.141
C(zoom)[T.3.0]                    -440.4506  79.447  -5.544  0.000  -596.928  -283.974
actual_chunk_size                   -6.0398   4.286  -1.409  0.160   -14.482     2.403
actual_chunk_size:C(zoom)[T.1.0]    71.2072   6.062  11.747  0.000    59.268    83.147
actual_chunk_size:C(zoom)[T.2.0]   140.8571   6.062  23.236  0.000   128.918   152.796
actual_chunk_size:C(zoom)[T.3.0]   151.0435   6.062  24.917  0.000   139.104   162.983
======================================================================================
Omnibus:                      23.536   Durbin-Watson:               1.445
Prob(Omnibus):                 0.000   Jarque-Bera (JB):           39.029
Skew:                          0.545   Prob(JB):                 3.35e-09
Kurtosis:                      4.572   Cond. No.                     94.1
=========================================================================


Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
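
In a model with an interaction term, the chunk-size slope at a given zoom level is the base `actual_chunk_size` coefficient plus that level's interaction coefficient. The sketch below does this arithmetic with the values copied from the summary above. The helper name `chunk_size_slope` is hypothetical, and zoom level 0 is assumed to be the reference level because it has no coefficient of its own in the table.

```python
# Coefficients copied from the OLS summary for
# "duration ~ actual_chunk_size * C(zoom)".
BASE_SLOPE = -6.0398  # chunk-size slope at the reference zoom level (assumed 0)
INTERACTIONS = {1: 71.2072, 2: 140.8571, 3: 151.0435}


def chunk_size_slope(zoom):
    """Return the fitted chunk-size slope at a zoom level (hypothetical helper)."""
    # The reference level has no interaction term, so it adds nothing.
    return BASE_SLOPE + INTERACTIONS.get(zoom, 0.0)


slopes = {zoom: round(chunk_size_slope(zoom), 4) for zoom in range(4)}
for zoom, slope in slopes.items():
    print(f"zoom {zoom}: {slope:+.4f} per unit of chunk size")
```

The slopes work out to about -6.0, 65.2, 134.8, and 145.0 for zoom levels 0 through 3, which is why the chunk-size effect is described as growing with zoom level.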