Dataset Ingestion

Guide to ingesting and publishing data to the VEDA data store & STAC API

VEDA uses a centralized SpatioTemporal Asset Catalog (STAC) for data dissemination and prefers to host datasets in cloud object storage (AWS S3 in the us-west-2 region) in cloud-optimized file formats: Cloud-Optimized GeoTIFF (COG) and Zarr. These formats enable viewing and efficient access in the cloud directly from the original data files, without copies or multiple versions.

Steps for ingesting a dataset

Dataset ingestion generally requires four steps. Depending on the capacity of the dataset provider, some of the steps can be completed by the VEDA team on request.

The data ingestion process requires Cognito credentials (username and password). To obtain these credentials, contact a member of the VEDA Data Services Team, who can set up an account and credentials for you. The first time you log in using the Cognito Client, you will be prompted to set a new password.

Complete as many steps of the process as you have capacity or authorization to. Please follow the steps and guides outlined below:

  1. Open a dedicated pull request in the veda-data repository. Please read through these docs fully first, as they will help you supply the information required to complete the PR. Use this “new dataset” template to open a new issue and get started.
  2. Transform datasets to conform with cloud-optimized file formats - see file preparation
  3. Upload files to storage (may be skipped if the data is already cloud-optimized and stored in us-west-2)
  4. Load those records into the VEDA STAC - see catalog ingestion
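
The rule for when steps can be skipped can be sketched in a few lines of Python. This is only an illustration of the logic described in the list above, not part of any VEDA tooling: the `DatasetFiles` class and the `remaining_steps` function are hypothetical names, and the caller is assumed to already know whether the files are cloud-optimized and which S3 region (if any) they live in.

```python
from dataclasses import dataclass
from typing import List, Optional

VEDA_REGION = "us-west-2"  # region named in this guide


@dataclass
class DatasetFiles:
    """Hypothetical description of where a provider's files currently stand."""

    cloud_optimized: bool            # True if the files are already COG or Zarr
    s3_region: Optional[str] = None  # None if the files are not on S3 yet


def remaining_steps(files: DatasetFiles) -> List[str]:
    """Return the ingestion steps that still apply to these files."""
    steps = ["open pull request in veda-data"]
    if not files.cloud_optimized:
        steps.append("transform to COG or Zarr")
    # Upload may be skipped only when the data is cloud-optimized
    # AND already stored in us-west-2.
    if not (files.cloud_optimized and files.s3_region == VEDA_REGION):
        steps.append("upload to storage")
    steps.append("load records into VEDA STAC")
    return steps


if __name__ == "__main__":
    # COGs already in us-west-2: no transformation or upload needed.
    print(remaining_steps(DatasetFiles(cloud_optimized=True, s3_region="us-west-2")))
    # Files not yet cloud-optimized and not on S3: all four steps apply.
    print(remaining_steps(DatasetFiles(cloud_optimized=False)))
```

Note that cloud-optimized files hosted in a different region still need the upload step under this reading of the list.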

For a walkthrough of the full process outlined above, please refer to this example notebook. The notebook uses the GEOGLAM June 2023 dataset as an example; use it as a guide for the ingestion process (and the required dataset definitions), replacing the GEOGLAM dataset with your own.
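
For orientation, the sketch below builds a minimal STAC Collection record using only the fields the STAC 1.0.0 specification requires for a Collection. It is not the VEDA dataset definition format, which the example notebook describes and which may require additional fields; the function name and every value passed to it (identifier, description, license, extent) are hypothetical placeholders.

```python
import json


def minimal_stac_collection(collection_id, description, license_id, bbox, start, end):
    """Build a dict with the fields STAC 1.0.0 requires for a Collection."""
    return {
        "type": "Collection",
        "stac_version": "1.0.0",
        "id": collection_id,
        "description": description,
        "license": license_id,
        "extent": {
            # One overall bounding box: [west, south, east, north]
            "spatial": {"bbox": [list(bbox)]},
            # One overall time interval as RFC 3339 datetime strings
            "temporal": {"interval": [[start, end]]},
        },
        "links": [],
    }


# All values below are placeholders, not a real VEDA dataset.
collection = minimal_stac_collection(
    collection_id="example-dataset",
    description="Placeholder description of an example dataset.",
    license_id="CC0-1.0",
    bbox=(-180.0, -90.0, 180.0, 90.0),
    start="2023-06-01T00:00:00Z",
    end="2023-06-30T23:59:59Z",
)

if __name__ == "__main__":
    print(json.dumps(collection, indent=2))
```

Working out these values for your own dataset (identifier, description, license, spatial and temporal extent) before opening the pull request makes the later catalog ingestion step easier.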

Stuck on how to develop compliant metadata records for your dataset?

Check out the following notebooks and resources, which can help you produce the STAC metadata required to create the dataset definitions needed for catalog ingestion.