import hvplot
import hvplot.pandas # noqa
import pandas as pd
import statsmodels.formula.api as smf
pd.options.plotting.backend = "holoviews"
Benchmarking: AWS Region
Read summary of all benchmarking results.
summary = pd.read_parquet("s3://carbonplan-benchmarks/benchmark-data/v0.2/summary.parq")
Subset the data to isolate the impact of location and chunk size.
df = summary[
    (summary["projection"] == 3857)
    & (summary["pixels_per_tile"] == 128)
    & (summary["shard_size"] == 0)
]
Set plot options.
cmap = ["#FFC20A", "#0C7BDC"]
plt_opts = {"width": 600, "height": 400}
Create a box plot showing how the rendering time depends on the AWS region and chunk size.
df.hvplot.box(
    y="duration",
    by=["actual_chunk_size", "region"],
    c="region",
    cmap=cmap,
    ylabel="Time to render (ms)",
    xlabel="Chunk size (MB); AWS region",
    legend=False,
).opts(**plt_opts)
Fit a multiple linear regression to the results. Chunk size strongly affects rendering time: each additional MB of chunk size adds roughly 52 ms (p < 0.001), so datasets with larger chunks take longer to render. The AWS region has no statistically significant effect on rendering time (p = 0.234).
model = smf.ols("duration ~ actual_chunk_size + C(region)", data=df).fit()
model.summary()
| Dep. Variable: | duration | R-squared: | 0.446 |
| Model: | OLS | Adj. R-squared: | 0.444 |
| Method: | Least Squares | F-statistic: | 205.1 |
| Date: | Tue, 29 Aug 2023 | Prob (F-statistic): | 4.58e-66 |
| Time: | 20:28:30 | Log-Likelihood: | -3916.0 |
| No. Observations: | 512 | AIC: | 7838. |
| Df Residuals: | 509 | BIC: | 7851. |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
| Intercept | 1859.2163 | 40.422 | 45.995 | 0.000 | 1779.801 | 1938.631 |
| C(region)[T.us-west-2] | -53.6344 | 44.989 | -1.192 | 0.234 | -142.021 | 34.752 |
| actual_chunk_size | 51.8170 | 2.563 | 20.221 | 0.000 | 46.782 | 56.852 |

| Omnibus: | 22.416 | Durbin-Watson: | 1.979 |
| Prob(Omnibus): | 0.000 | Jarque-Bera (JB): | 12.956 |
| Skew: | 0.227 | Prob(JB): | 0.00154 |
| Kurtosis: | 2.367 | Cond. No. | 31.2 |
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
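To make the coefficients above concrete, the following is a minimal sketch that evaluates the fitted model by hand. The coefficient values and the confidence interval are copied from the summary table. The helper names `predict_duration` and `interval_contains_zero` and the example chunk sizes are hypothetical and only for illustration. The baseline region is whichever region the model treats as the reference category, which the table does not name.

```python
# Coefficients copied from the OLS summary above
# (duration in ms, chunk size in MB, per the plot axis labels).
INTERCEPT = 1859.2163
COEF_CHUNK_SIZE = 51.8170
COEF_US_WEST_2 = -53.6344

# 95% confidence interval for the us-west-2 coefficient, also from the table.
CI_US_WEST_2 = (-142.021, 34.752)


def predict_duration(chunk_size_mb: float, region_is_us_west_2: bool) -> float:
    """Hypothetical helper: evaluate the fitted linear model by hand."""
    return (
        INTERCEPT
        + COEF_CHUNK_SIZE * chunk_size_mb
        + COEF_US_WEST_2 * int(region_is_us_west_2)
    )


def interval_contains_zero(interval: tuple[float, float]) -> bool:
    """Return True when a confidence interval spans zero."""
    low, high = interval
    return low <= 0.0 <= high


# Illustrative chunk sizes (assumed, not taken from the benchmark data).
for size_mb in (1, 10):
    baseline = predict_duration(size_mb, region_is_us_west_2=False)
    us_west_2 = predict_duration(size_mb, region_is_us_west_2=True)
    print(f"{size_mb:>2} MB: baseline region {baseline:.0f} ms, us-west-2 {us_west_2:.0f} ms")

print("Region interval spans zero:", interval_contains_zero(CI_US_WEST_2))
```

Because the interval for the region coefficient spans zero, the predicted gap of about 54 ms between regions is not distinguishable from no difference at the 5% level, whereas the chunk size interval (46.782 to 56.852) lies entirely above zero.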