Run this notebook
You can launch this notebook in VEDA JupyterHub by clicking the link below.
Launch in VEDA JupyterHub (requires access)
Learn more
Inside the Hub
This notebook was written on the VEDA JupyterHub and as such is designed to be run on a jupyterhub which is associated with an AWS IAM role which has been granted permissions to the VEDA data store via its bucket policy. The instance used provided 16GB of RAM.
See (VEDA Analytics JupyterHub Access)[https://nasa-impact.github.io/veda-docs/veda-jh-access.html] for information about how to gain access.
Outside the Hub
The data is in a protected bucket. Please request access by emailng aimee@developmentseed.org or alexandra@developmentseed.org and providing your affiliation, interest in or expected use of the dataset and an AWS IAM role or user Amazon Resource Name (ARN). The team will help you configure the cognito client.
You should then run:
%run -i 'cognito_login.py'
Approach
Query STAC API and explore item contents for a given collection
Read and access the data
Visualize the collection with hvplot
Run zonal statistics on collection using rasterstats
Visualize resultant zonal statistics on a choropleth map
About the Data
The NCEO Aboveground Woody Biomass 2017 dataset is a map for Africa at 100 m spatial resolution which was developed using a combination of LiDAR, Synthetic Aperture Radar (SAR) and optical based data. Aboveground woody biomass (AGB) plays an key role in the study of the Earth’s carbon cycle and response to climate change. Estimation based on Earth Observation measurements is an effective method for regional scale studies and the results are expressed as dry matter in Mg ha-1.
Important Note : Users of this dataset should keep in mind that the map is a continental-scale dataset, generated using a combination of different remote sensing data types, with a single method for the whole study area produced in 2017. Users, therefore, should understand that accuracy may vary for different regions and vegetation types.
The Case Study - Guinea
Mapping and understanding the spatial distribution of AGB is key to understanding carbon dioxide emissions from tropical deforestation through the loss of woody carbon stocks. The resulting carbon fluxes from these land-use changes and vegetation degradation can have negative impacts on the global carbon cycle. Change analysis between AGB maps overtime can display losses in high biomass forests, due to suspected deforestation and forest degredation.
The forests of southern Guinea are reported to have some of the highest density AGB of any forest in the world and are one of the most threatened ecoregions in Africa . Importantly, this area was also the epicenter of the 2014 Ebola outbreak, which had a lasting impact on the region. There is more and more evidence that human deforestation activities in this area may have accelerated the spread of the deadly virus as a result of increasing human-bat interactions in the region.
In this example we explore the NCEO AGB dataset for 2017, running zonal statistics at the district (administrative 2) level to understand those areas in Guinea that need greatest prioritization for protection and conservation.
Setting up the Environment
To run zonal statistics
we’ll need to import a python package called rasterstats
into our environment. You can uncomment the following line for installation. This cell needs only needs to be run once.
Requirement already satisfied: rasterstats in /srv/conda/envs/notebook/lib/python3.10/site-packages (0.18.0)
Requirement already satisfied: fiona<1.9 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (1.8.22)
Requirement already satisfied: shapely in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (1.8.5.post1)
Requirement already satisfied: cligj>=0.4 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (0.7.2)
Requirement already satisfied: simplejson in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (3.19.1)
Requirement already satisfied: affine<3.0 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (2.3.1)
Requirement already satisfied: click>7.1 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (8.1.3)
Requirement already satisfied: numpy>=1.9 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (1.23.5)
Requirement already satisfied: rasterio>=1.0 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterstats) (1.3.4)
Requirement already satisfied: attrs>=17 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (22.2.0)
Requirement already satisfied: six>=1.7 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (1.16.0)
Requirement already satisfied: setuptools in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (65.6.3)
Requirement already satisfied: certifi in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (2022.12.7)
Requirement already satisfied: click-plugins>=1.0 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (1.1.1)
Requirement already satisfied: munch in /srv/conda/envs/notebook/lib/python3.10/site-packages (from fiona<1.9->rasterstats) (2.5.0)
Requirement already satisfied: snuggs>=1.4.1 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from rasterio>=1.0->rasterstats) (1.4.7)
Requirement already satisfied: pyparsing>=2.1.6 in /srv/conda/envs/notebook/lib/python3.10/site-packages (from snuggs>=1.4.1->rasterio>=1.0->rasterstats) (3.0.9)
Querying the STAC API
from pystac_client import Client
# Provide STAC API endpoint
STAC_API_URL = "https://staging-stac.delta-backend.com/"
# Declare collection of interest - NCEO Biomass
collection = "nceo_africa_2017"
Now let’s check how many total items are available.
search = Client.open (STAC_API_URL).search(collections= [collection])
items = list (search.items())
print (f"Found { len (items)} items" )
This makes sense as there is only one item available: a map for 2017.
# Explore the "cog_default" asset of one item to see what it contains
items[0 ].assets["cog_default" ].to_dict()
{'href': 's3://nasa-maap-data-store/file-staging/nasa-map/nceo-africa-2017/AGB_map_2017v0m_COG.tif',
'type': 'image/tiff; application=geotiff; profile=cloud-optimized',
'title': 'Default COG Layer',
'description': 'Cloud optimized default layer to display on map',
'raster:bands': [{'scale': 1.0,
'nodata': 'inf',
'offset': 0.0,
'sampling': 'area',
'data_type': 'uint16',
'histogram': {'max': 429.0,
'min': 0.0,
'count': 11.0,
'buckets': [405348.0,
44948.0,
18365.0,
6377.0,
3675.0,
3388.0,
3785.0,
9453.0,
13108.0,
1186.0]},
'statistics': {'mean': 37.58407913145342,
'stddev': 81.36678677343947,
'maximum': 429.0,
'minimum': 0.0,
'valid_percent': 50.42436439336373}}],
'roles': ['data', 'layer']}
Explore through the item’s assets. We can see from the data’s statistics values that the min
and max
values for the observed values range from 0
to 429
Mg ha-1.
Exploring the properties ["proj:epsg"]
we also notice something strange, as the value is float
and should be integer
. We’ll revise this using the code below, but it will be revised upstream in the future.
items[0 ].properties["proj:epsg" ] = int (items[0 ].properties["proj:epsg" ])
Reading and accessing the data
Now that we’ve explored the dataset through the STAC API, let’s read and access the dataset itself.
# This is a workaround that is planning to move up into stackstac itself
import boto3
import stackstac
import rasterio as rio
import rioxarray
gdal_env = stackstac.DEFAULT_GDAL_ENV.updated(
always= dict (
AWS_NO_SIGN_REQUEST= True , session= rio.session.AWSSession(boto3.Session())
)
)
da = stackstac.stack(items[0 ], gdal_env= gdal_env)
da
<xarray.DataArray 'stackstac-486a109067eae0fbd2de23580e0f93f3' (time: 1,
band: 1,
y: 81025,
x: 78078)>
dask.array<fetch_raster_window, shape=(1, 1, 81025, 78078), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/16)
* time (time) datetime64[ns] NaT
id (time) <U19 'AGB_map_2017v0m_COG'
* band (band) <U11 'cog_default'
* x (x) float64 -18.27 -18.27 -18.27 ... 51.86 51.86 51.86
* y (y) float64 37.73 37.73 37.73 37.73 ... -35.05 -35.05 -35.05
proj:transform object {0.0, 1.0, 37.73103856358817, 0.000898315284119521...
... ...
proj:geometry object {'type': 'Polygon', 'coordinates': [[[-18.27352950...
proj:shape object {81024.0, 78077.0}
title <U17 'Default COG Layer'
description <U47 'Cloud optimized default layer to display on map'
raster:bands object {'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sa...
epsg int64 4326
Attributes:
spec: RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
crs: epsg:4326
transform: | 0.00, 0.00,-18.27|\n| 0.00,-0.00, 37.73|\n| 0.00, 0.00, 1.00|
resolution: 0.0008983152841195214 dask.array<chunksize=(1, 1, 1024, 1024), meta=np.ndarray>
Bytes
47.13 GiB
8.00 MiB
Shape
(1, 1, 81025, 78078)
(1, 1, 1024, 1024)
Dask graph
6160 chunks in 3 graph layers
Data type
float64 numpy.ndarray
1 1 78078 81025 1
Coordinates: (16)
time
(time)
datetime64[ns]
NaT
array(['NaT'], dtype='datetime64[ns]') id
(time)
<U19
'AGB_map_2017v0m_COG'
array(['AGB_map_2017v0m_COG'], dtype='<U19') band
(band)
<U11
'cog_default'
array(['cog_default'], dtype='<U11') x
(x)
float64
-18.27 -18.27 ... 51.86 51.86
array([-18.274428, -18.27353 , -18.272631, ..., 51.861538, 51.862436,
51.863335]) y
(y)
float64
37.73 37.73 37.73 ... -35.05 -35.05
array([ 37.731937, 37.731039, 37.73014 , ..., -35.051364, -35.052262,
-35.053161]) proj:transform
()
object
{0.0, 1.0, 37.73103856358817, 0....
array({0.0, 1.0, 37.73103856358817, 0.0008983152841195214, -18.273529509559307, -0.0008983152841195214},
dtype=object) proj:epsg
()
int64
4326
proj:bbox
()
object
{37.73103856358817, 51.864232928...
array({37.73103856358817, 51.86423292864056, -35.054059016911935, -18.273529509559307},
dtype=object) start_datetime
()
<U25
'2017-01-01T00:00:00+00:00'
array('2017-01-01T00:00:00+00:00', dtype='<U25') end_datetime
()
<U25
'2017-12-31T23:59:59+00:00'
array('2017-12-31T23:59:59+00:00', dtype='<U25') proj:geometry
()
object
{'type': 'Polygon', 'coordinates...
array({'type': 'Polygon', 'coordinates': [[[-18.273529509559307, -35.054059016911935], [51.86423292864056, -35.054059016911935], [51.86423292864056, 37.73103856358817], [-18.273529509559307, 37.73103856358817], [-18.273529509559307, -35.054059016911935]]]},
dtype=object) proj:shape
()
object
{81024.0, 78077.0}
array({81024.0, 78077.0}, dtype=object) title
()
<U17
'Default COG Layer'
array('Default COG Layer', dtype='<U17') description
()
<U47
'Cloud optimized default layer t...
array('Cloud optimized default layer to display on map', dtype='<U47') raster:bands
()
object
{'scale': 1.0, 'nodata': 'inf', ...
array({'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sampling': 'area', 'data_type': 'uint16', 'histogram': {'max': 429.0, 'min': 0.0, 'count': 11.0, 'buckets': [405348.0, 44948.0, 18365.0, 6377.0, 3675.0, 3388.0, 3785.0, 9453.0, 13108.0, 1186.0]}, 'statistics': {'mean': 37.58407913145342, 'stddev': 81.36678677343947, 'maximum': 429.0, 'minimum': 0.0, 'valid_percent': 50.42436439336373}},
dtype=object) epsg
()
int64
4326
Indexes: (4)
PandasIndex
PandasIndex(DatetimeIndex(['NaT'], dtype='datetime64[ns]', name='time', freq=None)) PandasIndex
PandasIndex(Index(['cog_default'], dtype='object', name='band')) PandasIndex
PandasIndex(Float64Index([-18.274427824843425, -18.273529509559307, -18.272631194275185,
-18.271732878991067, -18.27083456370695, -18.269936248422827,
-18.26903793313871, -18.268139617854587, -18.26724130257047,
-18.26634298728635,
...
51.855249775799365, 51.85614809108348, 51.8570464063676,
51.85794472165172, 51.85884303693584, 51.859741352219956,
51.860639667504074, 51.86153798278819, 51.862436298072325,
51.86333461335644],
dtype='float64', name='x', length=78078)) PandasIndex
PandasIndex(Float64Index([ 37.73193687887226, 37.73103856358814, 37.730140248304025,
37.7292419330199, 37.72834361773578, 37.72744530245166,
37.726546987167545, 37.72564867188343, 37.72475035659931,
37.72385204131518,
...
-35.04507586407077, -35.045974179354886, -35.046872494639004,
-35.04777080992312, -35.04866912520724, -35.04956744049136,
-35.05046575577549, -35.05136407105961, -35.05226238634373,
-35.053160701627846],
dtype='float64', name='y', length=81025)) Attributes: (4)
spec : RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.054059016911964, 51.86423292864057, 37.73193687887226), resolutions_xy=(0.0008983152841195214, 0.0008983152841195214)) crs : epsg:4326 transform : | 0.00, 0.00,-18.27|
| 0.00,-0.00, 37.73|
| 0.00, 0.00, 1.00| resolution : 0.0008983152841195214
# Create an AOI for our study area
# Guinea
guinea_aoi = {
"type" : "Feature" ,
"properties" : {},
"geometry" : {
"coordinates" : [
[
[- 15.519958756713947 , 12.732440363049193 ],
[- 15.519958756713947 , 6.771426493209475 ],
[- 7.078554695621165 , 6.771426493209475 ],
[- 7.078554695621165 , 12.732440363049193 ],
[- 15.519958756713947 , 12.732440363049193 ],
]
],
"type" : "Polygon" ,
},
}
# Subset to bounding box of Guinea
subset = da.rio.clip([guinea_aoi["geometry" ]])
subset
<xarray.DataArray 'stackstac-486a109067eae0fbd2de23580e0f93f3' (time: 1,
band: 1,
y: 6636, x: 9397)>
dask.array<getitem, shape=(1, 1, 6636, 9397), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
* time (time) datetime64[ns] NaT
id (time) <U19 'AGB_map_2017v0m_COG'
* band (band) <U11 'cog_default'
* x (x) float64 -15.52 -15.52 -15.52 ... -7.081 -7.08 -7.079
* y (y) float64 12.73 12.73 12.73 12.73 ... 6.773 6.772 6.772
proj:transform object {0.0, 1.0, 37.73103856358817, 0.000898315284119521...
... ...
proj:shape object {81024.0, 78077.0}
title <U17 'Default COG Layer'
description <U47 'Cloud optimized default layer to display on map'
raster:bands object {'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sa...
epsg int64 4326
spatial_ref int64 0
Attributes:
spec: RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
resolution: 0.0008983152841195214 dask.array<chunksize=(1, 1, 842, 5), meta=np.ndarray>
Bytes
475.76 MiB
8.00 MiB
Shape
(1, 1, 6636, 9397)
(1, 1, 1024, 1024)
Dask graph
77 chunks in 7 graph layers
Data type
float64 numpy.ndarray
1 1 9397 6636 1
Coordinates: (17)
time
(time)
datetime64[ns]
NaT
array(['NaT'], dtype='datetime64[ns]') id
(time)
<U19
'AGB_map_2017v0m_COG'
array(['AGB_map_2017v0m_COG'], dtype='<U19') band
(band)
<U11
'cog_default'
array(['cog_default'], dtype='<U11') x
(x)
float64
-15.52 -15.52 ... -7.08 -7.079
axis : X long_name : longitude standard_name : longitude units : degrees_east array([-15.519295, -15.518397, -15.517498, ..., -7.080521, -7.079623,
-7.078724]) y
(y)
float64
12.73 12.73 12.73 ... 6.772 6.772
axis : Y long_name : latitude standard_name : latitude units : degrees_north array([12.731823, 12.730924, 12.730026, ..., 6.773297, 6.772399, 6.771501]) proj:transform
()
object
{0.0, 1.0, 37.73103856358817, 0....
array({0.0, 1.0, 37.73103856358817, 0.0008983152841195214, -18.273529509559307, -0.0008983152841195214},
dtype=object) proj:epsg
()
int64
4326
proj:bbox
()
object
{-35.054059016911935, 51.8642329...
array({-35.054059016911935, 51.86423292864056, 37.73103856358817, -18.273529509559307},
dtype=object) start_datetime
()
<U25
'2017-01-01T00:00:00+00:00'
array('2017-01-01T00:00:00+00:00', dtype='<U25') end_datetime
()
<U25
'2017-12-31T23:59:59+00:00'
array('2017-12-31T23:59:59+00:00', dtype='<U25') proj:geometry
()
object
{'type': 'Polygon', 'coordinates...
array({'type': 'Polygon', 'coordinates': [[[-18.273529509559307, -35.054059016911935], [51.86423292864056, -35.054059016911935], [51.86423292864056, 37.73103856358817], [-18.273529509559307, 37.73103856358817], [-18.273529509559307, -35.054059016911935]]]},
dtype=object) proj:shape
()
object
{81024.0, 78077.0}
array({81024.0, 78077.0}, dtype=object) title
()
<U17
'Default COG Layer'
array('Default COG Layer', dtype='<U17') description
()
<U47
'Cloud optimized default layer t...
array('Cloud optimized default layer to display on map', dtype='<U47') raster:bands
()
object
{'scale': 1.0, 'nodata': 'inf', ...
array({'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sampling': 'area', 'data_type': 'uint16', 'histogram': {'max': 429.0, 'min': 0.0, 'count': 11.0, 'buckets': [405348.0, 44948.0, 18365.0, 6377.0, 3675.0, 3388.0, 3785.0, 9453.0, 13108.0, 1186.0]}, 'statistics': {'mean': 37.58407913145342, 'stddev': 81.36678677343947, 'maximum': 429.0, 'minimum': 0.0, 'valid_percent': 50.42436439336373}},
dtype=object) epsg
()
int64
4326
spatial_ref
()
int64
0
crs_wkt : GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] semi_major_axis : 6378137.0 semi_minor_axis : 6356752.314245179 inverse_flattening : 298.257223563 reference_ellipsoid_name : WGS 84 longitude_of_prime_meridian : 0.0 prime_meridian_name : Greenwich geographic_crs_name : WGS 84 grid_mapping_name : latitude_longitude spatial_ref : GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] GeoTransform : -15.519744006090912 0.0008983152841195214 0.0 12.732271679468038 0.0 -0.0008983152841195215 Indexes: (4)
PandasIndex
PandasIndex(DatetimeIndex(['NaT'], dtype='datetime64[ns]', name='time', freq=None)) PandasIndex
PandasIndex(Index(['cog_default'], dtype='object', name='band')) PandasIndex
PandasIndex(Float64Index([-15.519294848448853, -15.518396533164733, -15.517498217880615,
-15.516599902596495, -15.515701587312375, -15.514803272028256,
-15.513904956744136, -15.513006641460017, -15.512108326175897,
-15.511210010891777,
...
-7.086809276418906, -7.085910961134786, -7.085012645850668,
-7.084114330566548, -7.083216015282428, -7.082317699998308,
-7.08141938471419, -7.08052106943007, -7.07962275414595,
-7.07872443886183],
dtype='float64', name='x', length=9397)) PandasIndex
PandasIndex(Float64Index([12.731822521825979, 12.73092420654186, 12.730025891257743,
12.72912757597362, 12.728229260689503, 12.727330945405381,
12.726432630121263, 12.725534314837144, 12.724635999553023,
12.723737684268905,
...
6.779585449250032, 6.77868713396591, 6.777788818681792,
6.776890503397674, 6.775992188113552, 6.775093872829434,
6.774195557545315, 6.773297242261194, 6.7723989269770755,
6.771500611692954],
dtype='float64', name='y', length=6636)) Attributes: (2)
spec : RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.054059016911964, 51.86423292864057, 37.73193687887226), resolutions_xy=(0.0008983152841195214, 0.0008983152841195214)) resolution : 0.0008983152841195214
# select the band of interest, as there is only one in this dataset we'll select the default
data_band = subset.sel(band= "cog_default" )
data_band
<xarray.DataArray 'stackstac-486a109067eae0fbd2de23580e0f93f3' (time: 1,
y: 6636, x: 9397)>
dask.array<getitem, shape=(1, 6636, 9397), dtype=float64, chunksize=(1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/17)
* time (time) datetime64[ns] NaT
id (time) <U19 'AGB_map_2017v0m_COG'
band <U11 'cog_default'
* x (x) float64 -15.52 -15.52 -15.52 ... -7.081 -7.08 -7.079
* y (y) float64 12.73 12.73 12.73 12.73 ... 6.773 6.772 6.772
proj:transform object {0.0, 1.0, 37.73103856358817, 0.000898315284119521...
... ...
proj:shape object {81024.0, 78077.0}
title <U17 'Default COG Layer'
description <U47 'Cloud optimized default layer to display on map'
raster:bands object {'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sa...
epsg int64 4326
spatial_ref int64 0
Attributes:
spec: RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.05405...
resolution: 0.0008983152841195214 Coordinates: (17)
time
(time)
datetime64[ns]
NaT
array(['NaT'], dtype='datetime64[ns]') id
(time)
<U19
'AGB_map_2017v0m_COG'
array(['AGB_map_2017v0m_COG'], dtype='<U19') band
()
<U11
'cog_default'
array('cog_default', dtype='<U11') x
(x)
float64
-15.52 -15.52 ... -7.08 -7.079
axis : X long_name : longitude standard_name : longitude units : degrees_east array([-15.519295, -15.518397, -15.517498, ..., -7.080521, -7.079623,
-7.078724]) y
(y)
float64
12.73 12.73 12.73 ... 6.772 6.772
axis : Y long_name : latitude standard_name : latitude units : degrees_north array([12.731823, 12.730924, 12.730026, ..., 6.773297, 6.772399, 6.771501]) proj:transform
()
object
{0.0, 1.0, 37.73103856358817, 0....
array({0.0, 1.0, 37.73103856358817, 0.0008983152841195214, -18.273529509559307, -0.0008983152841195214},
dtype=object) proj:epsg
()
int64
4326
proj:bbox
()
object
{-35.054059016911935, 51.8642329...
array({-35.054059016911935, 51.86423292864056, 37.73103856358817, -18.273529509559307},
dtype=object) start_datetime
()
<U25
'2017-01-01T00:00:00+00:00'
array('2017-01-01T00:00:00+00:00', dtype='<U25') end_datetime
()
<U25
'2017-12-31T23:59:59+00:00'
array('2017-12-31T23:59:59+00:00', dtype='<U25') proj:geometry
()
object
{'type': 'Polygon', 'coordinates...
array({'type': 'Polygon', 'coordinates': [[[-18.273529509559307, -35.054059016911935], [51.86423292864056, -35.054059016911935], [51.86423292864056, 37.73103856358817], [-18.273529509559307, 37.73103856358817], [-18.273529509559307, -35.054059016911935]]]},
dtype=object) proj:shape
()
object
{81024.0, 78077.0}
array({81024.0, 78077.0}, dtype=object) title
()
<U17
'Default COG Layer'
array('Default COG Layer', dtype='<U17') description
()
<U47
'Cloud optimized default layer t...
array('Cloud optimized default layer to display on map', dtype='<U47') raster:bands
()
object
{'scale': 1.0, 'nodata': 'inf', ...
array({'scale': 1.0, 'nodata': 'inf', 'offset': 0.0, 'sampling': 'area', 'data_type': 'uint16', 'histogram': {'max': 429.0, 'min': 0.0, 'count': 11.0, 'buckets': [405348.0, 44948.0, 18365.0, 6377.0, 3675.0, 3388.0, 3785.0, 9453.0, 13108.0, 1186.0]}, 'statistics': {'mean': 37.58407913145342, 'stddev': 81.36678677343947, 'maximum': 429.0, 'minimum': 0.0, 'valid_percent': 50.42436439336373}},
dtype=object) epsg
()
int64
4326
spatial_ref
()
int64
0
crs_wkt : GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] semi_major_axis : 6378137.0 semi_minor_axis : 6356752.314245179 inverse_flattening : 298.257223563 reference_ellipsoid_name : WGS 84 longitude_of_prime_meridian : 0.0 prime_meridian_name : Greenwich geographic_crs_name : WGS 84 grid_mapping_name : latitude_longitude spatial_ref : GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AXIS["Latitude",NORTH],AXIS["Longitude",EAST],AUTHORITY["EPSG","4326"]] GeoTransform : -15.519744006090912 0.0008983152841195214 0.0 12.732271679468038 0.0 -0.0008983152841195215 Indexes: (3)
PandasIndex
PandasIndex(DatetimeIndex(['NaT'], dtype='datetime64[ns]', name='time', freq=None)) PandasIndex
PandasIndex(Float64Index([-15.519294848448853, -15.518396533164733, -15.517498217880615,
-15.516599902596495, -15.515701587312375, -15.514803272028256,
-15.513904956744136, -15.513006641460017, -15.512108326175897,
-15.511210010891777,
...
-7.086809276418906, -7.085910961134786, -7.085012645850668,
-7.084114330566548, -7.083216015282428, -7.082317699998308,
-7.08141938471419, -7.08052106943007, -7.07962275414595,
-7.07872443886183],
dtype='float64', name='x', length=9397)) PandasIndex
PandasIndex(Float64Index([12.731822521825979, 12.73092420654186, 12.730025891257743,
12.72912757597362, 12.728229260689503, 12.727330945405381,
12.726432630121263, 12.725534314837144, 12.724635999553023,
12.723737684268905,
...
6.779585449250032, 6.77868713396591, 6.777788818681792,
6.776890503397674, 6.775992188113552, 6.775093872829434,
6.774195557545315, 6.773297242261194, 6.7723989269770755,
6.771500611692954],
dtype='float64', name='y', length=6636)) Attributes: (2)
spec : RasterSpec(epsg=4326, bounds=(-18.274427824843425, -35.054059016911964, 51.86423292864057, 37.73193687887226), resolutions_xy=(0.0008983152841195214, 0.0008983152841195214)) resolution : 0.0008983152841195214
Visualizing the NCEO Biomass 2017 layer for our study area in Guinea
Now that we’ve got the NCEO data layer subsetted for Guinea, let’s visualize it using hvplot
.
import hvplot.xarray
biomass = data_band.squeeze()
biomass
biomass.hvplot(
x= "x" ,
y= "y" ,
coastline= True ,
rasterize= True ,
cmap= "viridis" ,
widget_location= "bottom" ,
frame_width= 600 ,
)
Zonal Statistics
This map we created above is great, but let’s focus on which districts (administrative level 2 boundaries) should be prioritized for forest conservation.
Zonal statistics is an operation that calculates statistics on the cell values of a raster layer (e.g., the NCEO AGB dataset) within the zones (i.e., polygons) of another dataset. It is an analytical tool that can calculate the mean, median, sum, minimum, maximum, or range in each zone. The zonal extent, often polygons, can be in the form of objects like administrative boundaries, water catchment areas, or field boundaries.
In this example, we’ll explore the data contained in the NCEO AGB collection and analyze it for each of the districts in Guinea. To do this we will need to import district (administrative level 2) boundary layers from below. We will use the Humanitarian Data Exchange (HDX) site to retrieve subnational administrative boundaries for Guinea . Specifically, we will use the geoBoundaries-GIN-ADM2_simplified.geojson
which can be accessed here and read them in directly using geopandas
.
import geopandas as gpd
admin2_gdf = gpd.read_file(
"https://raw.githubusercontent.com/wmgeolab/geoBoundaries/0f0b6f5fb638e7faf115f876da4e77d8f7fa319f/releaseData/gbOpen/GIN/ADM2/geoBoundaries-GIN-ADM2_simplified.geojson"
)
# check the CRS
print (admin2_gdf.crs)
import pandas as pd
from rasterstats import zonal_stats
admin2_biomass = pd.DataFrame(
zonal_stats(
admin2_gdf,
biomass.values,
affine= biomass.rio.transform(),
nodata= biomass.rio.nodata,
band= 1
# geojson_out=True
),
index= admin2_gdf.index,
)
admin2_biomass
/srv/conda/envs/notebook/lib/python3.10/site-packages/rasterstats/io.py:335: NodataWarning: Setting nodata to -999; specify nodata explicitly
warnings.warn(
0
0.0
565.0
51.445738
1276378
1
0.0
546.0
46.846892
527269
2
0.0
513.0
48.862863
1107831
3
0.0
345.0
31.315987
41660
4
0.0
494.0
47.809879
116515
5
0.0
422.0
57.583787
530195
6
0.0
548.0
48.555337
317191
7
0.0
438.0
43.717062
1176897
8
0.0
448.0
44.744092
403399
9
0.0
533.0
78.724252
1315404
10
0.0
422.0
46.973538
426420
11
0.0
452.0
50.016374
161593
12
0.0
483.0
52.088640
1145951
13
0.0
563.0
89.595375
429232
14
0.0
503.0
51.326848
1769560
15
0.0
558.0
65.559085
955743
16
0.0
486.0
48.230132
897380
17
0.0
590.0
90.110284
630818
18
0.0
413.0
47.668705
364092
19
0.0
339.0
37.304653
544541
20
0.0
501.0
57.187525
1614332
21
0.0
470.0
46.776737
216368
22
0.0
469.0
57.435206
278955
23
0.0
592.0
71.918267
461576
24
0.0
623.0
123.937151
814877
25
0.0
451.0
45.058698
859223
26
0.0
564.0
74.196642
1042860
27
0.0
406.0
33.353046
1191380
28
0.0
592.0
86.547049
413435
29
0.0
560.0
57.591189
460766
30
0.0
392.0
29.201285
1813376
31
0.0
554.0
57.991285
771431
32
0.0
389.0
49.663329
615108
33
0.0
588.0
131.323274
326021
Now we’ll join the administrative level 2 boundaries to the zonal statistics results, so that we can map the districts on a choropleth map.
concat_df = admin2_gdf.join(admin2_biomass)
concat_df
0
1
GN-BE
Beyla
ADM2
GIN-ADM2-49546643B63767081
GIN
ADM2
POLYGON ((-8.24559 8.44255, -8.24158 8.45044, ...
0.0
565.0
51.445738
1276378
1
2
GN-BF
Boffa
ADM2
GIN-ADM2-49546643B69790359
GIN
ADM2
MULTIPOLYGON (((-13.77147 9.84445, -13.76994 9...
0.0
546.0
46.846892
527269
2
3
GN-BK
Boke
ADM2
GIN-ADM2-49546643B67680147
GIN
ADM2
MULTIPOLYGON (((-14.57512 10.76872, -14.57633 ...
0.0
513.0
48.862863
1107831
3
4
GN-C
Conakry
ADM2
GIN-ADM2-49546643B26553537
GIN
ADM2
MULTIPOLYGON (((-13.78686 9.46592, -13.79013 9...
0.0
345.0
31.315987
41660
4
5
GN-CO
Coyah
ADM2
GIN-ADM2-49546643B29309121
GIN
ADM2
POLYGON ((-13.49399 9.53945, -13.48050 9.55304...
0.0
494.0
47.809879
116515
5
6
GN-DB
Dabola
ADM2
GIN-ADM2-49546643B70320134
GIN
ADM2
POLYGON ((-10.46739 10.53598, -10.46752 10.545...
0.0
422.0
57.583787
530195
6
7
GN-DL
Dalaba
ADM2
GIN-ADM2-49546643B47404564
GIN
ADM2
POLYGON ((-12.01167 11.29091, -12.03171 11.288...
0.0
548.0
48.555337
317191
7
8
GN-DI
Dinguiraye
ADM2
GIN-ADM2-49546643B47728803
GIN
ADM2
POLYGON ((-10.72063 11.13326, -10.72092 11.144...
0.0
438.0
43.717062
1176897
8
9
GN-DU
Dubreka
ADM2
GIN-ADM2-49546643B78750611
GIN
ADM2
MULTIPOLYGON (((-13.76504 9.82404, -13.75194 9...
0.0
448.0
44.744092
403399
9
10
GN-FA
Faranah
ADM2
GIN-ADM2-49546643B99428691
GIN
ADM2
POLYGON ((-11.38731 10.39356, -11.38273 10.350...
0.0
533.0
78.724252
1315404
10
11
GN-FO
Forecariah
ADM2
GIN-ADM2-49546643B32851960
GIN
ADM2
MULTIPOLYGON (((-13.32015 9.14776, -13.32062 9...
0.0
422.0
46.973538
426420
11
12
GN-FR
Fria
ADM2
GIN-ADM2-49546643B75641357
GIN
ADM2
POLYGON ((-13.76799 10.27884, -13.73119 10.276...
0.0
452.0
50.016374
161593
12
13
GN-GA
Gaoual
ADM2
GIN-ADM2-49546643B44796554
GIN
ADM2
POLYGON ((-13.84293 11.29667, -13.83242 11.291...
0.0
483.0
52.088640
1145951
13
14
GN-GU
Gueckedou
ADM2
GIN-ADM2-49546643B59147082
GIN
ADM2
POLYGON ((-10.59971 9.05848, -10.59402 9.05494...
0.0
563.0
89.595375
429232
14
15
GN-KA
Kankan
ADM2
GIN-ADM2-49546643B19447005
GIN
ADM2
POLYGON ((-8.14727 9.58395, -8.15293 9.58911, ...
0.0
503.0
51.326848
1769560
15
16
GN-KE
Kerouane
ADM2
GIN-ADM2-49546643B28981869
GIN
ADM2
POLYGON ((-8.61661 9.50260, -8.60868 9.51354, ...
0.0
558.0
65.559085
955743
16
17
GN-KD
Kindia
ADM2
GIN-ADM2-49546643B38105311
GIN
ADM2
POLYGON ((-13.11475 9.58669, -13.10890 9.58190...
0.0
486.0
48.230132
897380
17
18
GN-KS
Kissidougou
ADM2
GIN-ADM2-49546643B39508892
GIN
ADM2
POLYGON ((-10.45426 9.10945, -10.45334 9.08925...
0.0
590.0
90.110284
630818
18
19
GN-KB
Koubia
ADM2
GIN-ADM2-49546643B329053
GIN
ADM2
POLYGON ((-11.30453 12.01713, -11.31240 12.021...
0.0
413.0
47.668705
364092
19
20
GN-KN
Koundara
ADM2
GIN-ADM2-49546643B74925550
GIN
ADM2
POLYGON ((-12.82676 12.14425, -12.76880 12.221...
0.0
339.0
37.304653
544541
20
21
GN-KO
Kouroussa
ADM2
GIN-ADM2-49546643B81289084
GIN
ADM2
POLYGON ((-10.46739 10.53598, -10.46733 10.531...
0.0
501.0
57.187525
1614332
21
22
GN-LA
Labe
ADM2
GIN-ADM2-49546643B47788034
GIN
ADM2
POLYGON ((-12.01167 11.29091, -11.98685 11.320...
0.0
470.0
46.776737
216368
22
23
GN-LE
Lelouma
ADM2
GIN-ADM2-49546643B80531036
GIN
ADM2
POLYGON ((-12.99636 11.18952, -12.98648 11.187...
0.0
469.0
57.435206
278955
23
24
GN-LO
Lola
ADM2
GIN-ADM2-49546643B51651521
GIN
ADM2
POLYGON ((-8.46455 8.27185, -8.44429 8.25379, ...
0.0
592.0
71.918267
461576
24
25
GN-MC
Macenta
ADM2
GIN-ADM2-49546643B91718973
GIN
ADM2
POLYGON ((-8.95774 8.77472, -9.01024 8.79308, ...
0.0
623.0
123.937151
814877
25
26
GN-ML
Mali
ADM2
GIN-ADM2-49546643B68291102
GIN
ADM2
POLYGON ((-12.76304 11.85482, -12.74823 11.857...
0.0
451.0
45.058698
859223
26
27
GN-MM
Mamou
ADM2
GIN-ADM2-49546643B49157402
GIN
ADM2
POLYGON ((-11.15547 11.05524, -11.13717 11.074...
0.0
564.0
74.196642
1042860
27
28
GN-MD
Mandiana
ADM2
GIN-ADM2-49546643B49348937
GIN
ADM2
POLYGON ((-8.13614 10.00000, -8.13498 10.00774...
0.0
406.0
33.353046
1191380
28
29
GN-NZ
Nzerekore
ADM2
GIN-ADM2-49546643B97455025
GIN
ADM2
POLYGON ((-8.93454 8.25441, -8.93687 8.25503, ...
0.0
592.0
86.547049
413435
29
30
GN-PI
Pita
ADM2
GIN-ADM2-49546643B22597757
GIN
ADM2
POLYGON ((-12.20899 11.16225, -12.21822 11.152...
0.0
560.0
57.591189
460766
30
31
GN-SI
Siguiri
ADM2
GIN-ADM2-49546643B98837050
GIN
ADM2
POLYGON ((-10.00475 11.40696, -10.00285 11.401...
0.0
392.0
29.201285
1813376
31
32
GN-TE
Telimele
ADM2
GIN-ADM2-49546643B10795278
GIN
ADM2
POLYGON ((-13.65247 10.66825, -13.59967 10.711...
0.0
554.0
57.991285
771431
32
33
GN-TO
Tougue
ADM2
GIN-ADM2-49546643B67909893
GIN
ADM2
POLYGON ((-11.74293 10.98745, -11.70851 11.019...
0.0
389.0
49.663329
615108
33
34
GN-YO
Yomou
ADM2
GIN-ADM2-49546643B32761429
GIN
ADM2
POLYGON ((-9.34981 7.75681, -9.34896 7.75350, ...
0.0
588.0
131.323274
326021
By sorting the results, we can identify those top districts with the highest mean AGB.
concat_df_sorted = concat_df.sort_values(by= "mean" , ascending= False )
concat_df_sorted.head()
33
34
GN-YO
Yomou
ADM2
GIN-ADM2-49546643B32761429
GIN
ADM2
POLYGON ((-9.34981 7.75681, -9.34896 7.75350, ...
0.0
588.0
131.323274
326021
24
25
GN-MC
Macenta
ADM2
GIN-ADM2-49546643B91718973
GIN
ADM2
POLYGON ((-8.95774 8.77472, -9.01024 8.79308, ...
0.0
623.0
123.937151
814877
17
18
GN-KS
Kissidougou
ADM2
GIN-ADM2-49546643B39508892
GIN
ADM2
POLYGON ((-10.45426 9.10945, -10.45334 9.08925...
0.0
590.0
90.110284
630818
13
14
GN-GU
Gueckedou
ADM2
GIN-ADM2-49546643B59147082
GIN
ADM2
POLYGON ((-10.59971 9.05848, -10.59402 9.05494...
0.0
563.0
89.595375
429232
28
29
GN-NZ
Nzerekore
ADM2
GIN-ADM2-49546643B97455025
GIN
ADM2
POLYGON ((-8.93454 8.25441, -8.93687 8.25503, ...
0.0
592.0
86.547049
413435
Visualizing the results with a choropleth map
Now, let’s visualize the results!
import hvplot.pandas
# renaming the shapeName to District for improved legend
concat_df.rename(columns= {"shapeName" : "District" }, inplace= True )
agb = concat_df.hvplot(
c= "mean" ,
width= 900 ,
height= 500 ,
geo= True ,
hover_cols= ["mean" , "District" ],
cmap= "viridis" ,
hover_fill_color= "white" ,
line_width= 1 ,
title= "Mean Aboveground Woody Biomass per Guinean District (Mg ha-1)" ,
tiles= "CartoLight" ,
)
agb
/srv/conda/envs/notebook/lib/python3.10/site-packages/geoviews/operation/projection.py:79: ShapelyDeprecationWarning: Iteration over multi-part geometries is deprecated and will be removed in Shapely 2.0. Use the `geoms` property to access the constituent parts of a multi-part geometry.
polys = [g for g in geom if g.area > 1e-15]
By hovering over the map, we can identify the names and mean AGB per district.
Summary
In this case study we have successfully performed zonal statistics on the NCEO AGB dataset in Guinea and displayed the results on a choropleth map. The results of this analysis can dispaly those districts which contain the greatest average amount of AGB and should be prioritized for forest protection efforts.