Benchmarking: Zarr Version

import holoviews as hv
import hvplot
import hvplot.pandas  # noqa
import pandas as pd
import statsmodels.formula.api as smf

pd.options.plotting.backend = "holoviews"

Read the summary of all benchmarking results.

summary = pd.read_parquet("s3://carbonplan-benchmarks/benchmark-data/v0.2/summary.parq")

Subset the data to isolate the impact of Zarr version and chunk size.

df = summary[
    (summary["projection"] == 4326)
    & (summary["pixels_per_tile"] == 128)
    & (summary["shard_size"] == 0)
    & (summary["region"] == "us-west-2")
]

Set plot options.

cmap = ["#E1BE6A", "#40B0A6"]
plt_opts = {"width": 600, "height": 400}

Create a box plot showing how the rendering time depends on Zarr version and chunk size.

df.hvplot.box(
    y="duration",
    by=["actual_chunk_size", "zarr_version"],
    c="zarr_version",
    cmap=cmap,
    ylabel="Time to render (ms)",
    xlabel="Chunk size (MB); Zarr version",
    legend=False,
).opts(**plt_opts)

Fit a multiple linear regression to the results. Chunk size strongly affects rendering time: each additional MB of chunk size adds about 85 ms (p < 0.001), so datasets with larger chunks take longer to render. The Zarr version has no detectable effect: the estimated difference is about -18 ms, and its 95% confidence interval spans zero (p = 0.848).

model = smf.ols("duration ~ actual_chunk_size + C(zarr_version)", data=df).fit()
model.summary()

OLS Regression Results
Dep. Variable:	duration	R-squared:	0.511
Model:	OLS	Adj. R-squared:	0.507
Method:	Least Squares	F-statistic:	132.4
Date:	Sat, 02 Sep 2023	Prob (F-statistic):	4.58e-40
Time:	19:45:37	Log-Likelihood:	-2050.1
No. Observations:	256	AIC:	4106.
Df Residuals:	253	BIC:	4117.
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	2066.9020	82.158	25.158	0.000	1905.102	2228.702
C(zarr_version)[T.3]	-17.5790	91.439	-0.192	0.848	-197.657	162.499
actual_chunk_size	84.7372	5.208	16.269	0.000	74.480	94.995

Omnibus:	37.985	Durbin-Watson:	1.955
Prob(Omnibus):	0.000	Jarque-Bera (JB):	52.057
Skew:	-0.953	Prob(JB):	4.96e-12
Kurtosis:	4.116	Cond. No.	31.2

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
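
To make the coefficients concrete, the fitted model can be evaluated by hand. The sketch below copies the rounded coefficients from the summary table above. `predict_duration` is a hypothetical helper, and the 10 MB chunk size is an arbitrary illustrative input rather than one of the benchmark configurations.

```python
# Coefficients copied (rounded) from the OLS summary above.
INTERCEPT = 2066.9020             # ms
CHUNK_SLOPE = 84.7372             # ms per MB of chunk size
ZARR_V3_OFFSET = -17.5790         # ms, Zarr v3 relative to the baseline version
ZARR_V3_CI = (-197.657, 162.499)  # 95% confidence interval for the offset


def predict_duration(chunk_size_mb: float, zarr_v3: bool) -> float:
    """Evaluate duration ~ actual_chunk_size + C(zarr_version) by hand."""
    offset = ZARR_V3_OFFSET if zarr_v3 else 0.0
    return INTERCEPT + CHUNK_SLOPE * chunk_size_mb + offset


# Hypothetical 10 MB chunk, chosen only for illustration.
baseline = predict_duration(10.0, zarr_v3=False)
v3 = predict_duration(10.0, zarr_v3=True)
print(f"baseline: {baseline:.1f} ms, Zarr v3: {v3:.1f} ms")

# The interval for the version offset straddles zero, so the data do not
# distinguish the two versions at the 5% level.
ci_contains_zero = ZARR_V3_CI[0] < 0.0 < ZARR_V3_CI[1]
print("version effect indistinguishable from zero:", ci_contains_zero)
```

The predicted gap between the two versions is the same at every chunk size because this model has no interaction term; it is small relative to the standard error of about 91 ms.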

Show the rendering time at different zoom levels.

plt_opts = {"width": 400, "height": 300}

plts = []

for zoom_level in range(4):
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="duration",
            by=["actual_chunk_size", "zarr_version"],
            c="zarr_version",
            cmap=cmap,
            ylabel="Time to render (ms)",
            xlabel="Chunk size (MB); Zarr version",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )
hv.Layout(plts).cols(2)

Replace the Zarr version term with a multiplicative interaction between chunk size and zoom level. The results show that the effect of chunk size on rendering time grows with zoom level: it is not significant at zoom level 0 (p = 0.160) and is most pronounced at zoom level 3.

model = smf.ols("duration ~ actual_chunk_size * C(zoom)", data=df).fit()
model.summary()

OLS Regression Results
Dep. Variable:	duration	R-squared:	0.919
Model:	OLS	Adj. R-squared:	0.917
Method:	Least Squares	F-statistic:	401.4
Date:	Sat, 02 Sep 2023	Prob (F-statistic):	2.29e-131
Time:	19:45:37	Log-Likelihood:	-1820.2
No. Observations:	256	AIC:	3656.
Df Residuals:	248	BIC:	3685.
Df Model:	7
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	2274.2785	56.178	40.484	0.000	2163.633	2384.925
C(zoom)[T.1.0]	171.4040	79.447	2.157	0.032	14.927	327.881
C(zoom)[T.2.0]	-595.6177	79.447	-7.497	0.000	-752.095	-439.141
C(zoom)[T.3.0]	-440.4506	79.447	-5.544	0.000	-596.928	-283.974
actual_chunk_size	-6.0398	4.286	-1.409	0.160	-14.482	2.403
actual_chunk_size:C(zoom)[T.1.0]	71.2072	6.062	11.747	0.000	59.268	83.147
actual_chunk_size:C(zoom)[T.2.0]	140.8571	6.062	23.236	0.000	128.918	152.796
actual_chunk_size:C(zoom)[T.3.0]	151.0435	6.062	24.917	0.000	139.104	162.983

Omnibus:	23.536	Durbin-Watson:	1.445
Prob(Omnibus):	0.000	Jarque-Bera (JB):	39.029
Skew:	0.545	Prob(JB):	3.35e-09
Kurtosis:	4.572	Cond. No.	94.1

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
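
In an interaction model, the `actual_chunk_size` coefficient is the slope at the reference zoom level only, and each `actual_chunk_size:C(zoom)[T.k]` term is an adjustment to that slope. The sketch below adds them up using the rounded coefficients from the table above. The variable names are illustrative and not part of the original analysis.

```python
# Coefficients copied (rounded) from the interaction-model summary above.
BASE_SLOPE = -6.0398  # ms per MB at the reference zoom level (0)
SLOPE_ADJUSTMENT = {  # actual_chunk_size:C(zoom)[T.k] terms
    1: 71.2072,
    2: 140.8571,
    3: 151.0435,
}

# Effective slope at each zoom level = base slope + that level's interaction term.
slopes = {0: BASE_SLOPE}
for zoom, adjustment in SLOPE_ADJUSTMENT.items():
    slopes[zoom] = BASE_SLOPE + adjustment

for zoom, slope in sorted(slopes.items()):
    print(f"zoom {zoom}: {slope:+.1f} ms per MB of chunk size")
```

Read this way, the cost of each additional MB of chunk size rises from roughly zero at zoom level 0 to about 145 ms at zoom level 3.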