Benchmarking Methodology

End-to-End Benchmarks

The end-to-end benchmarks capture the user experience for various interactions. The suite of benchmarks included in this cookbook is designed to show how the choice of Zarr version and chunking scheme influences the user experience.

End-to-End Benchmarks: Datasets

We used the publicly available NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6) dataset, hosted on AWS, for this project. For this demonstration, we used two years of the daily maximum near-surface air temperature (tasmax) variable from the ACCESS-CM2 climate model.

We first transformed the NetCDF files hosted on S3 to Zarr stores, with the full notebook available in the benchmark-maps repository.
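
As a rough illustration of that conversion, the sketch below opens the NetCDF files with xarray and writes a Zarr store; the file pattern, time range, and chunking shown here are placeholders rather than the exact values used in the notebook.

import xarray as xr

# Placeholder file pattern; the notebook reads the NEX-GDDP-CMIP6 NetCDF files from S3
ds = xr.open_mfdataset(
    "tasmax_day_ACCESS-CM2_*.nc",
    combine="by_coords",
)

# Subset to two years of daily data (placeholder years) and write a Zarr (V2) store
ds = ds.sel(time=slice("2015", "2016"))
ds.chunk({"time": 365}).to_zarr("tasmax-ACCESS-CM2.zarr", mode="w")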

Next, we used ndpyramid to generate pyramids for the Zarr store. The notebook for generating pyramids is available in the benchmark-maps repository. We created pyramids containing four zoom levels using the pyramid_reproject function in ndpyramid. We generated pyramids using multiple data configurations, including 128 and 256 pixels per tile in the spatial dimensions and 1MB, 5MB, 10MB, and 25MB target chunk sizes. The chunk size along the time dimension was the largest number of time slices that both evenly divided the time dimension and produced an uncompressed chunk no larger than the target size. For most use cases, the time dimension would not need to be evenly divisible by the chunk size; this was only necessary for the comparison with V3 sharded datasets. The data were encoded as float32, while the time coordinate was encoded as int32 with level 1 gzip compression.
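
The gist of the pyramid generation and the time-chunk calculation is sketched below. The time_chunk_size helper is our own illustration of the rule described above, and the pyramid_reproject keyword arguments should be checked against the installed version of ndpyramid.

from ndpyramid import pyramid_reproject

def time_chunk_size(n_time, pixels_per_tile, target_mb, itemsize=4):
    # Largest number of time slices that evenly divides the time dimension and
    # keeps an uncompressed float32 chunk (time x tile x tile) under the target size
    max_slices = (target_mb * 1024 ** 2) // (pixels_per_tile ** 2 * itemsize)
    return max(d for d in range(1, n_time + 1) if n_time % d == 0 and d <= max_slices)

# e.g. two years of daily data (730 steps), 128-pixel tiles, 5MB target chunks
print(time_chunk_size(730, 128, 5))  # -> 73

# ds: the two-year tasmax dataset opened in the earlier sketch
# Build a 4-level Web Mercator pyramid and write it as a Zarr V2 store
pyramid = pyramid_reproject(ds, levels=4, pixels_per_tile=128)
pyramid.to_zarr("pyramid-v2.zarr", mode="w")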

We used the experimental zarrita library to convert the pyramids to Zarr V3 data for performance testing. The data leveraged the same encoding as the Zarr V2 pyramids, with the addition of a sharding codec. The target shard sizes were 25MB, 50MB, and 100MB and the chunk size within each shard was equivalent to the V2 chunking scheme.
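
The shard shape follows from the V2 chunk shape: each shard bundles as many chunks along the time dimension as fit under the target shard size. A small illustration of that arithmetic (not the zarrita API itself) is shown below.

def chunks_per_shard(time_chunk, pixels_per_tile, target_shard_mb, itemsize=4):
    # Number of V2-sized chunks stacked along time so that the uncompressed
    # shard stays under the target shard size
    chunk_bytes = time_chunk * pixels_per_tile ** 2 * itemsize
    return max(1, (target_shard_mb * 1024 ** 2) // chunk_bytes)

# e.g. 73-step, 128-pixel chunks (~4.6MB uncompressed) and a 25MB shard target
n = chunks_per_shard(73, 128, 25)    # -> 5 chunks per shard
shard_shape = (73 * n, 128, 128)     # shard spans 365 time steps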

End-to-End Benchmarks: Approach

CarbonPlan’s benchmark-maps repository leverages Playwright for the end-to-end performance benchmarks. By default, the benchmarks are run against https://prototype-maps.vercel.app/, although the URL is configurable. This domain hosts the dynamic client prototype, which renders a map after a dataset and Zarr version are selected.

The benchmarking script takes the following steps (a minimal Playwright sketch follows the list):

  1. Launch chromium browser
  2. Create a new page
  3. Start chromium tracing
  4. Navigate to web mapping application
  5. Select Dataset in the dropdown
  6. Wait 5 seconds for the page to render
  7. Zoom in a defined number of times, waiting 5 seconds after each action
  8. Write out metadata about each run and the trace record
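
A rough sketch of these steps using Playwright's Python sync API is shown below; the dataset selector, zoom interaction, and trace path are placeholders rather than the exact implementation in benchmark-maps.

from playwright.sync_api import sync_playwright

URL = "https://prototype-maps.vercel.app/"  # configurable in the real script

with sync_playwright() as p:
    browser = p.chromium.launch()                          # 1. launch chromium browser
    page = browser.new_page()                              # 2. create a new page
    browser.start_tracing(page=page, path="trace.json")    # 3. start chromium tracing
    page.goto(URL)                                         # 4. navigate to the web mapping application
    page.select_option(                                    # 5. select dataset (placeholder selector)
        "select#dataset",
        "pyramids-v2-3857-True-128-1-0-0-f4-0-0-0-gzipL1-100",
    )
    page.wait_for_timeout(5000)                            # 6. wait 5 seconds for the page to render
    for _ in range(4):                                     # 7. zoom in, pausing after each action
        page.keyboard.press("=")                           #    placeholder zoom interaction
        page.wait_for_timeout(5000)
    browser.stop_tracing()                                 # 8. write the trace record
    browser.close()                                        #    (the real script also records run metadata)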

End-to-End Benchmarks: Instructions

The benchmark-maps repository can be used to run the benchmarking suite. The first step is to clone the repository:

git clone https://github.com/carbonplan/benchmark-maps.git

The next step is to create an environment for running the benchmarks. We recommend mamba for managing the environment. You will also need to install the browsers required by Playwright:

mamba env create --file binder/environment.yml
mamba activate benchmark-maps
playwright install

Once the environment is set up, you can run the benchmarks by running the following command:

carbonplan_benchmarks --dataset pyramids-v2-3857-True-128-1-0-0-f4-0-0-0-gzipL1-100 --action zoom_in --zoom-level 4

In addition, the benchmark-maps repository includes main.sh, a script for running multiple iterations of the benchmarks across multiple datasets and Zarr versions.

End-to-End Benchmarks: Processing

Each benchmark yields a metadata file and a trace record. The carbonplan_benchmarks Python package provides utilities for analyzing and visualizing these outputs. For each interaction (e.g., loading the page, zooming in), we extracted information about the requests (e.g., duration, URL, encoded data length) and frames (e.g., duration, status) and calculated the amount of time before rendering was complete. Note that these metrics capture the time to render the entire page; the time to render the first portion of the data, which strongly influences the perceived user experience, would be much shorter and is not measured separately.
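
As a rough illustration of this processing (the actual utilities in carbonplan_benchmarks are more complete), per-request durations can be pulled from a trace record roughly as follows, assuming the Chromium JSON trace format with a traceEvents list and devtools.timeline event names such as ResourceSendRequest and ResourceFinish.

import json

import pandas as pd

with open("trace.json") as f:
    events = json.load(f)["traceEvents"]

starts, rows = {}, []
for e in events:
    data = e.get("args", {}).get("data", {})
    if e.get("name") == "ResourceSendRequest":          # request issued
        starts[data.get("requestId")] = (e["ts"], data.get("url"))
    elif e.get("name") == "ResourceFinish":             # response fully received
        if data.get("requestId") in starts:
            ts0, url = starts[data["requestId"]]
            rows.append({
                "url": url,
                "duration_ms": (e["ts"] - ts0) / 1000,  # trace timestamps are in microseconds
                "encoded_data_length": data.get("encodedDataLength"),
            })

requests = pd.DataFrame(rows)
print(requests.sort_values("duration_ms", ascending=False).head())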