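```python
import warnings

import pandas as pd
import hvplot.pandas  # registers the .hvplot accessor on pandas objects
import holoviews as hv

# Use HoloViews (via hvPlot) as the pandas plotting backend
pd.options.plotting.backend = 'holoviews'
warnings.filterwarnings('ignore')
```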
Benchmarks 1: Tiling COGs with and without GDAL Environment Variables
Explanation
This notebook does not report any results for tiling Zarr datasets. Instead, it shows how strongly the environment and configuration of low-level libraries affect the performance of one of the frameworks we compare against for tiling imagery.
titiler-pgstac creates image tiles using rio-tiler, which uses rasterio, which in turn uses GDAL "under the hood". Certain GDAL environment variables affect tiling performance when rasterio reads data from Cloud-Optimized GeoTIFFs (COGs).
As noted in Benchmarking Methodology, the time to tile includes the time to query a pgSTAC database and then use the returned query ID to read COGs on S3 and create image tiles from them. The libraries used were pgSTAC for reading STAC metadata and rasterio (via rio_tiler) for reading COGs on S3.
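The two stages that make up the measured time can be illustrated with a small timing wrapper. This is a minimal sketch, not the benchmark's actual harness: timed_ms and time_to_tile are hypothetical helpers, and query_fn and read_fn are hypothetical stand-ins for the pgSTAC search and the rio_tiler COG read.

```python
import time
from typing import Any, Callable, Dict, Tuple


def timed_ms(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn and return (result, elapsed wall-clock time in milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0


def time_to_tile(query_fn: Callable[[], str], read_fn: Callable[[str], Any]) -> Dict[str, Any]:
    """Time both stages of a (hypothetical) tile request.

    query_fn stands in for the pgSTAC query that returns a query ID;
    read_fn stands in for reading the COGs for that ID and rendering a tile.
    """
    query_id, query_ms = timed_ms(query_fn)
    tile, read_ms = timed_ms(read_fn, query_id)
    return {
        "tile": tile,
        "query_ms": query_ms,
        "read_ms": read_ms,
        "total_ms": query_ms + read_ms,  # the "time to tile" is the sum of both stages
    }
```

Keeping the stages separate makes it possible to see whether changes such as the GDAL settings affect the COG read rather than the database query.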
Dataset Generation
All dataset generation code is in the tile-benchmarking repo's cmip6-pgstac directory. The STAC collection is defined in CMIP6_daily_GISS-E2-1-G_tas_collection.json. The STAC item records for the CMIP6 COGs are generated in the 01-generate-datasets/cmip6-pgstac/generate_cmip6_items.ipynb notebook and are seeded into the pgSTAC database via seed-db.sh.
Tests
Tests were run via the tile-benchmarking/02-run-tests/01-cog-gdal-tests.ipynb notebook.
```python
# Benchmark results are published as CSVs in the tile-benchmarking repository
git_url_path = "https://raw.githubusercontent.com/developmentseed/tile-benchmarking/main/02-run-tests/results-csvs/"
df = pd.read_csv(f"{git_url_path}01-cog-gdal-results.csv")
# Cast the set/unset flag to strings so it is treated as a categorical label
df['set_gdal_vars'] = df['set_gdal_vars'].astype(str)

zooms = range(6)
cmap = ["#E1BE6A", "#40B0A6"]
plt_opts = {"width": 300, "height": 250}

# One box plot per zoom level, comparing render times with and without the GDAL variables
plts = []
for zoom_level in zooms:
    df_level = df[df["zoom"] == zoom_level]
    plts.append(
        df_level.hvplot.box(
            y="time",
            by=["set_gdal_vars"],
            c="set_gdal_vars",
            cmap=cmap,
            ylabel="Time to render (ms)",
            xlabel="GDAL Environment Variables Set/Unset",
            legend=False,
            title=f"Zoom level {zoom_level}",
        ).opts(**plt_opts)
    )

# Arrange the per-zoom panels in a two-column grid
hv.Layout(plts).cols(2)
```
Interpretation of the Results
- Setting these GDAL environment variables has a large effect on performance, giving a 100x speed-up.
- Although not shown above, variation across tiles is not significant.
- Variation across zoom levels is not significant.
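The speed-up can be summarized per zoom level from the same results table. A minimal sketch, assuming the columns used in the plotting code above ("zoom", "time" in ms, and "set_gdal_vars") and assuming that set_gdal_vars holds the strings "True" and "False" after the astype(str) cast; median_speedup_by_zoom is a hypothetical helper.

```python
import pandas as pd


def median_speedup_by_zoom(df: pd.DataFrame) -> pd.Series:
    """Median render time without the GDAL variables divided by the median with them, per zoom.

    Assumes set_gdal_vars contains the strings "True" and "False".
    """
    medians = (
        df.groupby(["zoom", "set_gdal_vars"])["time"]
        .median()
        .unstack("set_gdal_vars")  # one column per flag value
    )
    return medians["False"] / medians["True"]
```

Calling median_speedup_by_zoom(df) on the loaded results gives one ratio per zoom level; medians are used so that a few slow outlier requests do not dominate the comparison.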
These GDAL variables are documented here: https://developmentseed.org/titiler/advanced/performance_tuning/.
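The linked guide lists the recommended settings. A minimal sketch of exporting them as process environment variables follows; the names and values below are assumed, based on TiTiler's performance-tuning recommendations, since this notebook does not list the exact configuration used in the benchmark. GDAL_TILING_ENV and apply_gdal_env are hypothetical names.

```python
import os
from typing import Dict, Mapping

# Assumed settings (based on TiTiler's performance-tuning guide); the exact
# configuration used in this benchmark is not listed in this notebook.
GDAL_TILING_ENV: Dict[str, str] = {
    # Do not LIST the "directory" looking for sidecar files when opening a COG.
    "GDAL_DISABLE_READDIR_ON_OPEN": "EMPTY_DIR",
    # Read more bytes on open so the COG header is more likely to arrive in one request.
    "GDAL_INGESTED_BYTES_AT_OPEN": "32768",
    # Merge consecutive byte ranges into a single HTTP range request.
    "GDAL_HTTP_MERGE_CONSECUTIVE_RANGES": "YES",
    # Multiplex range requests over one connection (requires HTTP/2).
    "GDAL_HTTP_MULTIPLEX": "YES",
    "GDAL_HTTP_VERSION": "2",
}


def apply_gdal_env(env: Mapping[str, str] = GDAL_TILING_ENV, overwrite: bool = False) -> Dict[str, str]:
    """Export GDAL settings as environment variables and return those actually set.

    GDAL falls back to environment variables for config options, so this should
    run before any files are opened. Existing values are kept unless overwrite=True.
    """
    applied: Dict[str, str] = {}
    for key, value in env.items():
        if overwrite or key not in os.environ:
            os.environ[key] = value
            applied[key] = value
    return applied
```

Alternatively, the same mapping can be passed to rasterio.Env(**GDAL_TILING_ENV) to scope the options to a block of reads instead of the whole process.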
Setting these GDAL environment variables reduces the total number of requests to S3. Specifically, with these variables set:
- All of the metadata is likely to be read in a single request. This is not guaranteed, but it becomes more likely because we increase the number of bytes GDAL ingests when it first opens a file.
- GDAL makes no extra LIST requests to discover sidecar files, which COGs don't have.
- Consecutive range requests are merged into a single request.
- Multiple range requests share the same TCP connection.
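The effect of merging consecutive range requests can be shown with a toy model. This sketch is an illustration of the idea only, not GDAL's implementation; merge_consecutive_ranges is a hypothetical helper, and byte ranges are modeled as inclusive (start, end) pairs as in an HTTP Range header.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # inclusive (start, end) byte offsets


def merge_consecutive_ranges(ranges: List[Range]) -> List[Range]:
    """Coalesce byte ranges that touch end-to-start into single ranges.

    Ranges are sorted first; a range starting exactly one byte after the
    previous one ends is folded into it. Non-adjacent ranges stay separate.
    """
    merged: List[Range] = []
    for start, end in sorted(ranges):
        if merged and start == merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], end)  # extend the previous request
        else:
            merged.append((start, end))  # start a new request
    return merged
```

For example, merge_consecutive_ranges([(0, 999), (1000, 1999), (5000, 5999)]) returns [(0, 1999), (5000, 5999)]: three tile-block reads become two HTTP requests, which is where the savings come from when a tile spans adjacent blocks in a COG.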