This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.
See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.
Outside the Hub
The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.
You should then run:
%run -i 'cognito_login.py'
Approach
Motivation
The VEDA backend (via TiTiler) provides an API endpoint for computing zonal statistics (average, standard deviation, and other metrics over a geographic subset) across gridded (raster) data such as satellite imagery or climate datasets.
Some statistics, such as average, median, standard deviation, or percentiles are sensitive to differences in grid cell / pixel sizes: when some grid cells are (in metric units) have a larger area than others, the values in these cells will be under-represented. Grid cell sizes depends on the grid / projection of the data.
Varying grid cell sizes is common for climate datasets that are stored on a grid in geographic coordinates (lat/lon), for example a 0.1 degree by 0.1 degree global grid. Here, grid cell size will decrease from low to high latitudes. Computing averages over large spans of latitude will result in statistics where values closer to the poles are strongly over-represented.
To avoid this inaccuracy in zonal statistics computed with our API, we introduced a method to reproject the source data to an equal-area projection as an intermediate step in calculating statistics.
Note: this reprojection is not needed for example for accurate zonal statistics on a Sentinel-2 scene, using the Military Grid Reference System (MGRS) and a Mercator (UTM) projection. Here, pixel areas are the same across the scene in the native projection.
In this notebook
This notebook presents a validation of VEDA’s API for zonal statistics against a known way to compute area-weighted averages for gridded datasets on a regular lat/lon grid.
For illustration, we choose a real dataset from the VEDA data catalog and a subsetting area that spans a large latitude range.
The figures below show the average calculated over that area of interest with the different methods, for comparison.
%pip install pystac_client
Requirement already satisfied: pystac_client in /srv/conda/envs/notebook/lib/python3.10/site-packages (0.7.2)
Requirement already satisfied: requests>=2.28.2 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from pystac_client) (2.31.0)
Requirement already satisfied: pystac[validation]>=1.7.2 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from pystac_client) (1.8.1)
Requirement already satisfied: python-dateutil>=2.8.2 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from pystac_client) (2.8.2)
Requirement already satisfied: jsonschema>=4.0.1 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from pystac[validation]>=1.7.2->pystac_client) (4.17.3)
Requirement already satisfied: six>=1.5 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from python-dateutil>=2.8.2->pystac_client) (1.16.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from requests>=2.28.2->pystac_client) (3.1.0)
Requirement already satisfied: idna<4,>=2.5 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from requests>=2.28.2->pystac_client) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from requests>=2.28.2->pystac_client) (1.26.15)
Requirement already satisfied: certifi>=2017.4.17 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from requests>=2.28.2->pystac_client) (2023.5.7)
Requirement already satisfied: attrs>=17.4.0 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from jsonschema>=4.0.1->pystac[validation]>=1.7.2->pystac_client) (23.1.0)
Requirement already satisfied: pyrsistent!=0.17.0,!=0.17.1,!=0.17.2,>=0.14.0 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from jsonschema>=4.0.1->pystac[validation]>=1.7.2->pystac_client) (0.19.3)
Note: you may need to restart the kernel to use updated packages.
import tqdmimport requestsimport rasterioimport rasterio.crsimport xarray as xrimport rioxarrayimport pandas as pdimport numpy as npimport matplotlib.pyplot as pltfrom pystac_client import Client
Define formula for grid cell area for geographic coordinates
def _get_unique_diff(arr):assert np.ndim(arr) ==1 deltas = np.diff(arr)ifnot np.allclose(deltas, deltas[0]): raiseValueError(f"The spacing in the array is not uniform: {list(np.unique(deltas))}")return deltas[0]def grid_cell_area(lat):""" https://www.mathworks.com/matlabcentral/answers/447847-how-to-calculate-the-area-of-each-grid-cell https://gis.stackexchange.com/a/28156 """# get lat spacing asserting it is uniform dlat = _get_unique_diff(lat)# calculate cell edge lat lat_edge = lat - dlat /2.# radius of Earth in meters R_e =6371e3# calculate cell area as a function of latitudereturn R_e **2* (np.sin(np.radians(lat_edge + dlat)) - np.sin(np.radians(lat_edge))) * np.radians(dlat)
df.plot(ylabel=f"average {ASSET_NAME}", title=f"{ASSET_NAME} averaged over {AOI_NAME}") ;
Plot grid cell area as a function of latitude
plt.plot(lat_vals, area_lat_2d[:, 0])plt.ylabel("Grid cell area (m²)")plt.xlabel("Latitude") ;
Compute zonal averages using TiTiler API
We make use of the option on TiTiler to reproject the data subset to an equal-area projection (Equal Area Cylindrical) before computing the statistics.