Open and visualize COGs

Demonstrates how to visualize COGs using pystac_client, rioxarray, stackstac, and hvplot
Author

Julia Signell

Published

February 2, 2023

Run this notebook

You can launch this notebook in VEDA JupyterHub by clicking the link below.

Launch in VEDA JupyterHub (requires access)

Learn more

Inside the Hub

This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.

See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.

Outside the Hub

The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.

You should then run:

%run -i 'cognito_login.py'

Approach

  1. Use pystac_client to open the STAC catalog and retrieve the items in the collection
  2. Use stackstac to create an xarray dataset containing all the items
  3. Use rioxarray to crop data to AOI
  4. Use hvplot to render the COG at every timestep
import requests
from pystac_client import Client
import pandas as pd
import stackstac

import rioxarray  # noqa
import hvplot.xarray  # noqa

Declare your collection of interest

You can discover available collections the following ways:

  • Programmatically: see example in the list-collections.ipynb notebook
  • JSON API: https://staging-stac.delta-backend.com/collections
  • STAC Browser: http://veda-staging-stac-browser.s3-website-us-west-2.amazonaws.com
STAC_API_URL = "https://staging-stac.delta-backend.com/"
collection_id = "no2-monthly"

Discover items in collection for region and time of interest

Use pystac_client to search the STAC collection for a particular area of interest within specified datetime bounds.

catalog = Client.open(STAC_API_URL)
search = catalog.search(collections=[collection_id], sortby="start_datetime")

item_collection = search.item_collection()
print(f"Found {len(item_collection)} items")
Found 93 items

Define an AOI

We can fetch GeoJSON for metropolitan France and Corsica (excluding overseas territories) from an authoritative online source (https://gadm.org/download_country.html).

response = requests.get(
    "https://geodata.ucdavis.edu/gadm/gadm4.1/json/gadm41_FRA_0.json"
)

# If anything goes wrong with this request output error contents
assert response.ok, response.text

result = response.json()
print(f"There are {len(result['features'])} features in this collection")
There are 1 features in this collection

That is the geojson for a feature collection, but since there is only one feature in it we can grab just that.

france_aoi = result["features"][0]

Read data

Create an xarray.DataArray using stackstac

da = stackstac.stack(item_collection, epsg=4326)
da = da.assign_coords({"time": pd.to_datetime(da.start_datetime)}).squeeze()
da
/srv/conda/envs/notebook/lib/python3.11/site-packages/stackstac/prepare.py:408: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(
<xarray.DataArray 'stackstac-a148846a6e7930da708c1b52af0a7d72' (time: 93,
                                                                y: 1800, x: 3600)>
dask.array<getitem, shape=(93, 1800, 3600), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/15)
    id              (time) <U37 'OMI_trno2_0.10x0.10_201601_Col3_V4.nc' ... '...
    band            <U11 'cog_default'
  * x               (x) float64 -180.0 -179.9 -179.8 ... 179.7 179.8 179.9
  * y               (y) float64 90.0 89.9 89.8 89.7 ... -89.6 -89.7 -89.8 -89.9
    proj:bbox       object {90.0, 180.0, -90.0, -180.0}
    start_datetime  (time) <U19 '2016-01-01T00:00:00' ... '2023-09-01T00:00:00'
    ...              ...
    proj:transform  object {0.1, 0.0, 1.0, -0.1, -180.0, 90.0}
    proj:shape      object {1800.0, 3600.0}
    title           <U17 'Default COG Layer'
    description     <U47 'Cloud optimized default layer to display on map'
    epsg            int64 4326
  * time            (time) datetime64[ns] 2016-01-01 2016-02-01 ... 2023-09-01
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    crs:         epsg:4326
    transform:   | 0.10, 0.00,-180.00|\n| 0.00,-0.10, 90.00|\n| 0.00, 0.00, 1...
    resolution:  0.1

Clip the data to AOI

subset = da.rio.clip([france_aoi["geometry"]])
subset
<xarray.DataArray 'stackstac-a148846a6e7930da708c1b52af0a7d72' (time: 93,
                                                                y: 97, x: 143)>
dask.array<getitem, shape=(93, 97, 143), dtype=float64, chunksize=(1, 97, 143), chunktype=numpy.ndarray>
Coordinates: (12/16)
    id              (time) <U37 'OMI_trno2_0.10x0.10_201601_Col3_V4.nc' ... '...
    band            <U11 'cog_default'
  * x               (x) float64 -4.7 -4.6 -4.5 -4.4 -4.3 ... 9.1 9.2 9.3 9.4 9.5
  * y               (y) float64 51.0 50.9 50.8 50.7 50.6 ... 41.7 41.6 41.5 41.4
    proj:bbox       object {90.0, 180.0, -90.0, -180.0}
    start_datetime  (time) <U19 '2016-01-01T00:00:00' ... '2023-09-01T00:00:00'
    ...              ...
    proj:shape      object {1800.0, 3600.0}
    title           <U17 'Default COG Layer'
    description     <U47 'Cloud optimized default layer to display on map'
    epsg            int64 4326
  * time            (time) datetime64[ns] 2016-01-01 2016-02-01 ... 2023-09-01
    spatial_ref     int64 0
Attributes:
    spec:        RasterSpec(epsg=4326, bounds=(-180.0, -90.0, 180.0, 90.0), r...
    resolution:  0.1

Compute and plot

So far we have just been setting up a calculation lazily in Dask. Now we can trigger computation using .compute().

%%time

image_stack = subset.compute()
CPU times: user 3.21 s, sys: 581 ms, total: 3.79 s
Wall time: 7.03 s
# get the 2% and 98% percentiles for min and max bounds of color
vmin, vmax = image_stack.quantile(0.02).item(), image_stack.quantile(0.98).item()

image_stack.hvplot(
    groupby="time",
    tiles=True,
    colorbar=False,
    clim=(vmin, vmax),
    cmap="viridis",
    alpha=0.8,
    frame_height=512,
    widget_location="bottom",
)