Benchmarking: Pixels per Tile

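import hvplot
# Importing hvplot.pandas registers the .hvplot accessor on pandas objects
import hvplot.pandas  # noqa
import pandas as pd
import statsmodels.formula.api as smf

# Use HoloViews (via hvplot) as the backend for pandas' .plot API
pd.options.plotting.backend = "holoviews"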

Read the summary of all benchmarking results.

# Reading an s3:// path requires the s3fs package (pandas uses fsspec for remote URLs)
summary = pd.read_parquet("s3://carbonplan-benchmarks/benchmark-data/v0.2/summary.parq")

Subset the data to Web Mercator (EPSG:3857) runs in the us-west-2 region. Holding projection and region fixed isolates the impact of the number of pixels per tile and the chunk size.

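# Keep only EPSG:3857 datasets benchmarked in us-west-2,
# sorted by target chunk size, then by pixels per tile
df = summary[
    (summary["projection"] == 3857) & (summary["region"] == "us-west-2")
].sort_values(by=["target_chunk_size", "pixels_per_tile"])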
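The boolean-mask filter can be checked on a small synthetic frame. Python's `&` binds more tightly than `==`, so each comparison needs its own parentheses. This is a sketch with made-up values: the second region (`us-east-1`), the pixels-per-tile value of 128, and all durations are hypothetical, chosen only to show which rows the mask keeps and how the sort orders them.

```python
import pandas as pd

# Hypothetical stand-in for the benchmark summary; real values may differ
toy = pd.DataFrame(
    {
        "projection": [3857, 4326, 3857, 3857],
        "region": ["us-west-2", "us-west-2", "us-east-1", "us-west-2"],
        "target_chunk_size": [25, 5, 5, 5],
        "pixels_per_tile": [128, 256, 128, 256],
        "duration": [2900.0, 2100.0, 2300.0, 2250.0],
    }
)

# Both conditions must hold; the parentheses are required because & binds before ==
mask = (toy["projection"] == 3857) & (toy["region"] == "us-west-2")
subset = toy[mask].sort_values(by=["target_chunk_size", "pixels_per_tile"])
print(subset)
```

Only rows 0 and 3 pass both conditions, and sorting by `target_chunk_size` puts row 3 (chunk size 5) ahead of row 0 (chunk size 25).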

Set plot options.

# One color for each pixels-per-tile value
cmap = ["#994F00", "#006CD1"]
plt_opts = {"width": 600, "height": 400}  # figure size in pixels

Create a box plot showing how the rendering time depends on the number of pixels per tile and chunk size.

df.hvplot.box(
    y="duration",  # rendering time in ms
    by=["actual_chunk_size", "pixels_per_tile"],  # one box per combination
    c="pixels_per_tile",  # color boxes by pixels per tile
    cmap=cmap,
    ylabel="Time to render (ms)",
    xlabel="Chunk size (MB); Pixels per tile",
    legend=False,
).opts(**plt_opts)
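Each box summarizes the distribution of `duration` within one (chunk size, pixels per tile) group. A numeric counterpart can be sketched with `groupby` and quantiles, tabulating the quartiles the boxes depict. The helper name `summarize_durations` is hypothetical, and the demo data are synthetic rather than benchmark results; the plotting library may interpolate quantiles slightly differently.

```python
import pandas as pd


def summarize_durations(frame: pd.DataFrame) -> pd.DataFrame:
    """Return the 25th, 50th, and 75th percentiles of duration per group.

    Rows are (actual_chunk_size, pixels_per_tile) pairs; columns are quantiles.
    """
    return (
        frame.groupby(["actual_chunk_size", "pixels_per_tile"])["duration"]
        .quantile([0.25, 0.5, 0.75])
        .unstack()  # move the quantile level into columns
    )


# Synthetic data for illustration only (not benchmark results)
toy = pd.DataFrame(
    {
        "actual_chunk_size": [5.0] * 4 + [25.0] * 4,
        "pixels_per_tile": [128, 128, 256, 256] * 2,
        "duration": [2000, 2200, 2100, 2300, 3000, 3200, 3100, 3300],
    }
)
stats = summarize_durations(toy)
print(stats)
```

Applied to the real subset, `summarize_durations(df)` would give one row per box in the plot.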

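Fit a multiple linear regression of rendering time on chunk size and pixels per tile, treating pixels per tile as a categorical variable. The results show that, after controlling for chunk size, the number of pixels per tile does not significantly affect rendering time (coefficient -3.4 ms, p = 0.929). Datasets with larger chunks take longer to render: each additional MB of chunk size adds about 43 ms (p < 0.001). The model explains only part of the variation in rendering time (R-squared = 0.275).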
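In equation form, the formula string passed to `smf.ols` corresponds to the model below. `C(pixels_per_tile)` applies treatment (dummy) coding, so the `[T.256]` row in the summary is the shift for 256 pixels per tile relative to the other (reference) pixels-per-tile value, which is absorbed into the intercept. Here $\beta_0$, $\beta_1$, and $\beta_2$ correspond to the `Intercept`, `actual_chunk_size`, and `C(pixels_per_tile)[T.256]` rows, and OLS chooses them to minimize $\sum_i \varepsilon_i^2$.

```latex
\mathrm{duration}_i = \beta_0 + \beta_1 \, \mathrm{actual\_chunk\_size}_i + \beta_2 \, \mathbb{1}\left[\mathrm{pixels\_per\_tile}_i = 256\right] + \varepsilon_i
```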

# OLS with chunk size as a continuous predictor and pixels per tile as categorical
model = smf.ols("duration ~ actual_chunk_size + C(pixels_per_tile)", data=df).fit()
model.summary()

OLS Regression Results
Dep. Variable:	duration	R-squared:	0.275
Model:	OLS	Adj. R-squared:	0.273
Method:	Least Squares	F-statistic:	193.4
Date:	Tue, 29 Aug 2023	Prob (F-statistic):	6.08e-72
Time:	20:29:05	Log-Likelihood:	-7981.0
No. Observations:	1024	AIC:	1.597e+04
Df Residuals:	1021	BIC:	1.598e+04
Df Model:	2
Covariance Type:	nonrobust

	coef	std err	t	P>|t|	[0.025	0.975]
Intercept	2031.8289	33.974	59.805	0.000	1965.162	2098.496
C(pixels_per_tile)[T.256]	-3.3655	37.576	-0.090	0.929	-77.101	70.370
actual_chunk_size	43.2144	2.250	19.209	0.000	38.800	47.629

Omnibus:	35.870	Durbin-Watson:	2.052
Prob(Omnibus):	0.000	Jarque-Bera (JB):	39.124
Skew:	0.477	Prob(JB):	3.19e-09
Kurtosis:	2.907	Cond. No.	29.2

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
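As a worked check of this interpretation, the fitted coefficients can be plugged back into the linear model to compare predicted rendering times. This sketch hard-codes the point estimates and the 95% interval from the summary table; the helper `predicted_duration` and the 10 MB chunk size are hypothetical choices for illustration, and predictions outside the range of chunk sizes actually benchmarked would be extrapolation.

```python
# Point estimates copied from the OLS summary (duration in ms, chunk size in MB)
INTERCEPT = 2031.8289
CHUNK_SLOPE = 43.2144  # ms added per MB of chunk size
PPT_256_SHIFT = -3.3655  # ms, 256 pixels per tile vs. the reference level
PPT_256_CI = (-77.101, 70.370)  # 95% confidence interval for PPT_256_SHIFT


def predicted_duration(chunk_size_mb: float, is_256_pixels: bool) -> float:
    """Fitted mean rendering time (ms) from the linear model (hypothetical helper)."""
    shift = PPT_256_SHIFT if is_256_pixels else 0.0
    return INTERCEPT + CHUNK_SLOPE * chunk_size_mb + shift


# Compare both pixels-per-tile settings at a hypothetical 10 MB chunk size
for is_256 in (False, True):
    print(f"256 pixels per tile: {is_256}, predicted {predicted_duration(10.0, is_256):.1f} ms")

# The interval spans zero, so the data are consistent with no pixels-per-tile effect
print("CI contains zero:", PPT_256_CI[0] < 0 < PPT_256_CI[1])
```

The two predictions differ by only about 3 ms, far smaller than the coefficient's standard error of 37.6 ms, while each extra MB of chunk size shifts the prediction by about 43 ms.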