Benchmarking Methodology

Benchmarks are provided for a deployed API (end-to-end tests, described below) as well as for the application code.

Scripts for generating the data and running the benchmarks can be found at https://github.com/developmentseed/tile-benchmarking.

Code Benchmarks

While the end-to-end benchmarks described below establish tiling performance as experienced by an end user, introducing the network adds variables which are difficult to control, in particular the many possible varieties and locations of clients and servers.

Application code benchmarks remove those variables and provide a clear understanding of how performance for the code alone differs across datasets.

Code Benchmarks: Datasets

Reproducibility is important to the integrity of this project and its results. We used a publicly available dataset and attempted to make the steps as fully documented and reproducible as possible. For code profiling, we focused on the NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP-CMIP6) AWS public dataset.

There are 2 datasets listed on AWS from this project. The first is an archive of NetCDF files from about 35 different climate models, each supplying historical and projected values for up to 9 environmental variables, daily, from 1950 to 2100. To minimize preprocessing, test datasets were generated from the first 2 years of historical data, for 1 model and 1 variable. The variable or dataset could easily be modified or swapped, but we expect the relative performance across different datasets to be the same.
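As an illustration, a test subset along these lines could be produced with xarray. The bucket layout, variant identifier, variable name (`tas`), and output location below are assumptions for the sketch, not the exact paths used by the benchmarks.

```python
# Minimal sketch: build a small test dataset from the NEX-GDDP-CMIP6 NetCDF
# archive -- 1 model, 1 variable, first 2 years of historical data.
# Requires xarray, dask, s3fs, and an engine (e.g. h5netcdf) that can read
# file-like objects. Key names and the output store are illustrative only.
import xarray as xr
import s3fs

fs = s3fs.S3FileSystem(anon=True)  # the NEX-GDDP-CMIP6 bucket is public
keys = [
    "nex-gddp-cmip6/NEX-GDDP-CMIP6/GISS-E2-1-G/historical/r1i1p1f2/tas/"
    f"tas_day_GISS-E2-1-G_historical_r1i1p1f2_gn_{year}.nc"
    for year in (1950, 1951)
]

# Open the yearly files lazily and concatenate them along time.
datasets = [xr.open_dataset(fs.open(key), chunks={}) for key in keys]
ds = xr.concat(datasets, dim="time")

# Write a consolidated Zarr store to use as a benchmarking input
# (writing to this bucket requires credentials).
ds.to_zarr(
    "s3://nasa-eodc-data-store/test-data/tas_GISS-E2-1-G_1950-1951.zarr",
    mode="w",
    consolidated=True,
)
```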

In addition to the NetCDF data, there is an archive of COGs generated from the corresponding NetCDFs to support visualization via dynamic tiling. COGs are only available for 2 models, so for the intercomparison of the tiling approaches between COGs and Zarr, one of those models (GISS-E2-1-G) is used to generate Zarr stores.

Benchmarks were generated for multiple copies of the CMIP6 daily data, along with some synthetic data, to understand performance under different data pre-processing options. These copies were differentiated by the following (a rechunking sketch follows the list):

  • data file format (COG, NetCDF and Zarr)
  • different chunking configurations
  • pyramids
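A chunking variant might be produced roughly as follows. The store names, dimension names, and chunk shapes here are illustrative only, not the configurations used in the benchmarks.

```python
# Sketch: rewrite the same source Zarr store with different chunk shapes
# before benchmarking. Names and chunk sizes are examples.
import xarray as xr

source = "s3://nasa-eodc-data-store/test-data/tas_GISS-E2-1-G_1950-1951.zarr"
chunk_variants = {
    "time-chunked": {"time": 365, "lat": 150, "lon": 150},
    "space-chunked": {"time": 1, "lat": 600, "lon": 1440},
}

for name, chunks in chunk_variants.items():
    ds = xr.open_zarr(source, consolidated=True).chunk(chunks)
    # Drop the chunk encoding inherited from the source store so that
    # to_zarr writes the new chunk shapes rather than the old ones.
    for var in ds.variables:
        ds[var].encoding.pop("chunks", None)
    ds.to_zarr(
        f"s3://nasa-eodc-data-store/test-data/{name}.zarr",
        mode="w",
        consolidated=True,
    )
```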

Note: At this time, a different model is used for the dynamic client benchmarks (ACCESS-CM2), but we believe the test results and corresponding recommendations would not change for different CMIP6 models.

Benchmarks were run on the VEDA JupyterHub. The VEDA documentation details how to request access: https://nasa-impact.github.io/veda-docs/services/jupyterhub.html. We chose not to make the database or the S3 bucket nasa-eodc-data-store fully public, so you must be logged into the VEDA JupyterHub to run those benchmarks or reproduce the test datasets.

Code Benchmarks: Approach

We include performance results of code for tiling both COGs and Zarr to help data providers decide which format better suits their overall needs. To make these results comparable, we assume the following process for creating image tiles:

  1. Start with a known collection and query parameters
  2. Read metadata
    1. For COGs, the query is registered with the pgSTAC database for the collection id and query parameters, such as variable and datetime.
    2. For Zarr, the metadata is “lazily loaded” for the variable and temporal extent from the known collection store.
  3. Generate tiles
    1. For COGs, the mosaic ID returned from the registered query is used to read chunks from COGs on S3.
    2. For Zarr, the metadata is used by xarray to read chunks from NetCDF files or Zarr stores on S3, and rio_tiler's XarrayReader is used to generate tiles (see the sketch after this list).
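The Zarr path (steps 2b and 3b) can be sketched roughly as below, assuming a consolidated Zarr store with a `tas` variable on a lat/lon grid; the store URL and variable name are illustrative, not the exact benchmark inputs.

```python
# Rough sketch of the Zarr tiling path with xarray + rio_tiler.
import xarray as xr
import rioxarray  # noqa: F401 -- registers the .rio accessor
from rio_tiler.io import XarrayReader

# Step 2b: lazily open the store; only metadata is read at this point.
ds = xr.open_zarr(
    "s3://nasa-eodc-data-store/test-data/tas_GISS-E2-1-G_1950-1951.zarr",
    consolidated=True,
)
da = (
    ds["tas"]
    .isel(time=0)
    .rio.set_spatial_dims(x_dim="lon", y_dim="lat")
    .rio.write_crs("EPSG:4326")  # XarrayReader needs CRS information
)

# Step 3b: reading the chunks that intersect the requested web-mercator tile
# and resampling them to a 256x256 tile happens inside reader.tile().
with XarrayReader(da) as reader:
    img = reader.tile(0, 0, 1)   # tile x, y, zoom -- this is the step we time
    tile_array = img.data        # numpy array of shape (bands, 256, 256)
```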

To make as close to an “apples to apples” comparison as possible, we stored COG metadata using pgSTAC in AWS Relational Database Service (RDS) for the data in the nex-gddp-cmip6-cog bucket, and stored the Zarr metadata and data files in S3.

To profile the code for rendering tiles with either rio_tiler.XarrayReader or titiler-pgstac.PGSTACBackend, code was copied from those projects as needed to inject timers.
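An injected timer might follow a simple pattern like the one below; the label names and the shared `timings` dictionary are illustrative, not the exact instrumentation used.

```python
# Simple pattern for timing a labelled section of copied tile-generation code.
import time
from contextlib import contextmanager

timings = {}  # label -> list of elapsed times in seconds

@contextmanager
def timer(label):
    """Record elapsed wall-clock time for a labelled code section."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings.setdefault(label, []).append(time.perf_counter() - start)

# Example usage inside a copied function:
# with timer("read_chunks"):
#     img = reader.tile(x, y, z)
```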

The time to generate a tile at various zoom levels was measured multiple times for each data store, and box plots are used to report the results for consistency with the dynamic client reporting.
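Timings collected this way can be summarized with a standard matplotlib box plot, as in the sketch below; the numbers shown are placeholders, not measured results.

```python
# Sketch: summarize repeated tile timings per zoom level as box plots.
import matplotlib.pyplot as plt

# Placeholder values only -- in practice these come from the injected timers.
timings_by_zoom = {4: [0.21, 0.24, 0.19], 6: [0.35, 0.31, 0.40]}

fig, ax = plt.subplots()
ax.boxplot(
    list(timings_by_zoom.values()),
    labels=[f"z{z}" for z in timings_by_zoom],
)
ax.set_xlabel("Zoom level")
ax.set_ylabel("Tile generation time (s)")
plt.show()
```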

End-to-End Benchmarks

End-to-end tests benchmark response times from titiler-xarray for various tiles and datasets.

Details and code to generate the benchmarks and store the results on S3 are documented in tile-server-e2e-benchmarks.ipynb.

End-to-End Benchmarks: Datasets

A variety of datasets were selected for end-to-end testing, and the framework should make it easy to modify the tests as new datasets and use cases come up. See tile-server-e2e-benchmarks.ipynb for the specific datasets.

End-to-End Benchmarks: Approach

Code from https://github.com/bdon/TileSiege was used to generate a set of tile URLs for the selected test datasets. The load-testing tool https://locust.io/ was used to run the tests. Results are uploaded as CSV files to S3, then read and plotted in tile-server-e2e-benchmarks.ipynb.
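A minimal Locust user for replaying the generated tile URLs might look like the sketch below; the URL file name and host are assumptions, not the actual test configuration.

```python
# Sketch of a Locust user that replays pre-generated tile requests
# against a deployed titiler-xarray API. Run with: locust -f this_file.py
import random

from locust import HttpUser, task, between

# Hypothetical file of tile request paths produced by TileSiege.
with open("tile_urls.txt") as f:
    TILE_PATHS = [line.strip() for line in f if line.strip()]

class TileUser(HttpUser):
    host = "https://titiler-xarray.example.com"  # placeholder host
    wait_time = between(0.1, 0.5)

    @task
    def fetch_tile(self):
        # Pick one of the pre-generated tile requests and record it under a
        # single stats name so results aggregate across tiles.
        path = random.choice(TILE_PATHS)
        self.client.get(path, name="tile")
```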