Earth Science Data Search · CMR Agent
Helps users discover and understand NASA Earthdata datasets relevant to Earth science questions.
Designers: Nidhi Jha, Emily Foshee, Sid Chaudhary, Madison Wallner
Developers: Nishan Pantha, Muthukumaran Ramasubramanian, Simran KC, Sajil Awale, Pushwitha Krishnappa
The Earth Science Data Search Agent is an AKD-designed assistant integrated into NASA’s Common Metadata Repository (CMR) that helps users discover and understand NASA Earthdata datasets relevant to Earth science questions. It is a non-decision-making, human-in-the-loop agent: it does not recommend or endorse datasets, and instead supports transparent discovery of candidate collections in CMR.
The agent helps users clarify variables, spatial and temporal bounds, and whether indirect discovery is allowed. It then maps the confirmed scope to GCMD keywords and CMR API parameters, searches CMR, ranks collections by metadata relevance only, and explains why each dataset appears. The researcher makes the final decision on dataset suitability.
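The step from a confirmed scope to CMR API parameters could look roughly like the sketch below. It assumes the public CMR collection search endpoint and its `keyword`, `bounding_box`, `temporal`, and `page_size` parameters. The function name `build_cmr_query` and the example scope are hypothetical; the agent's actual mapping logic is not described here.

```python
from urllib.parse import urlencode

# Public CMR collection search endpoint (JSON response format).
CMR_COLLECTIONS_URL = "https://cmr.earthdata.nasa.gov/search/collections.json"


def build_cmr_query(keyword, bbox, start, end, page_size=20):
    """Return a CMR collection search URL for a confirmed scope.

    bbox is (west, south, east, north) in decimal degrees;
    start and end are ISO 8601 UTC timestamps.
    Hypothetical helper, not the agent's real implementation.
    """
    west, south, east, north = bbox
    if not (-180 <= west <= 180 and -180 <= east <= 180):
        raise ValueError("longitude out of range")
    if not (-90 <= south <= north <= 90):
        raise ValueError("latitude out of range or south > north")
    params = {
        "keyword": keyword,
        "bounding_box": f"{west},{south},{east},{north}",
        "temporal": f"{start},{end}",
        "page_size": page_size,
    }
    return f"{CMR_COLLECTIONS_URL}?{urlencode(params)}"


# Example scope (illustrative values only).
url = build_cmr_query(
    "soil moisture",
    bbox=(-125.0, 32.0, -114.0, 42.0),
    start="2015-01-01T00:00:00Z",
    end="2020-12-31T23:59:59Z",
)
print(url)
```

Building the URL as a separate step, before any request is sent, keeps the exact parameters available for the user to review and for the reproducibility log.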
Inputs
- Free-text Earth science questions
- User-selected expertise level, such as Intermediate or Advanced
- Clarified science variables, spatial bounds, and temporal bounds
- User confirmation on whether indirect inference or multi-hop discovery is permitted
- NASA Earthdata / CMR metadata
- GCMD keyword mappings
- Optional user-approved literature-based discovery context
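One way to picture these inputs is as a single record that stays incomplete until the user confirms each part. The sketch below is an assumption about how that could be modeled; `SearchScope` and `pending_clarifications` are hypothetical names, not part of the agent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SearchScope:
    """Inputs gathered from the user before any CMR search runs."""
    question: str
    expertise_level: str = "Intermediate"
    variables: Tuple[str, ...] = ()
    bbox: Optional[Tuple[float, float, float, float]] = None  # W, S, E, N
    time_range: Optional[Tuple[str, str]] = None  # ISO 8601 start, end
    allow_indirect: Optional[bool] = None  # None means not yet asked


def pending_clarifications(scope):
    """List the inputs the user still has to confirm."""
    pending = []
    if not scope.variables:
        pending.append("science variables")
    if scope.bbox is None:
        pending.append("spatial bounds")
    if scope.time_range is None:
        pending.append("temporal bounds")
    if scope.allow_indirect is None:
        pending.append("indirect discovery permission")
    return pending


scope = SearchScope(
    question="How has soil moisture changed in California?",
    variables=("soil moisture",),
)
print(pending_clarifications(scope))
# ['spatial bounds', 'temporal bounds', 'indirect discovery permission']
```

Using `None` for the indirect-discovery flag, rather than defaulting it to `False`, distinguishes "the user declined" from "the user has not been asked yet".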
Collaborative Design
The agent provides structured search support while users retain control over the scientific framing, assumptions, and final dataset selection. The agent is explicitly non-prescriptive: it does not endorse, select, or judge datasets for suitability. Instead, it surfaces candidate collections, explains the metadata basis for their inclusion, and identifies what users should verify manually.
Users define or refine the discovery task through guided framing fields such as Science Domain, Search Scope, and User Expertise Level.
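To illustrate what "explaining the metadata basis for inclusion" might mean in practice, the sketch below scores collections purely by which query terms appear in which metadata fields and keeps the matches as human-readable reasons. The scoring rule, the field names, and the `EXAMPLE_A` / `EXAMPLE_B` records are all assumptions made for illustration; the agent's real ranking method is not specified here.

```python
def rank_by_metadata(collections, query_terms):
    """Score each collection by how many query terms its metadata mentions.

    Returns (score, short_name, reasons) tuples, best match first. Only
    metadata text is used; nothing here judges scientific suitability.
    """
    ranked = []
    for coll in collections:
        reasons = []
        for field in ("title", "summary", "science_keywords"):
            text = str(coll.get(field, "")).lower()
            for term in query_terms:
                if term.lower() in text:
                    reasons.append(f"'{term}' found in {field}")
        ranked.append((len(reasons), coll["short_name"], reasons))
    # Sort by score descending, then by name for a repeatable order.
    ranked.sort(key=lambda item: (-item[0], item[1]))
    return ranked


# Made-up records standing in for CMR collection metadata.
collections = [
    {"short_name": "EXAMPLE_B",
     "title": "Land surface temperature grid",
     "summary": "Daily temperature."},
    {"short_name": "EXAMPLE_A",
     "title": "Soil moisture daily grid",
     "summary": "Surface soil moisture estimates."},
]

for score, name, reasons in rank_by_metadata(collections, ["soil moisture", "daily"]):
    print(score, name, reasons)
```

Because every point of the score corresponds to one listed reason, the user can see exactly why a collection appears and can disregard matches that are not meaningful for their question.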
Tools and data sources
- NASA CMR Search API (collection discovery only)
- GCMD Keyword Management System (vocabulary mapping only)
- Semantic Scholar API (optional; user-approved indirect discovery only)
- Google Scholar (last resort for literature lookup)
- Earthdata Search Web App (link handoff only)
Outputs
- Clarifying questions for variables, spatial bounds, temporal bounds, and inference permissions
- Interpreted search scope based on the user’s confirmed question
- Ranked list of candidate CMR collections based on metadata relevance only
- Explanation of why each dataset appears in the results
- Notes on missing, incomplete, or ambiguous metadata
- Reproducibility log with CMR endpoints, parameters, GCMD mappings, and timestamps
- Fact-check and verification list for user review
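A reproducibility log entry with the fields listed above could be as simple as the record sketched below. The structure and the `log_entry` helper are assumptions, and the GCMD keyword path is shown only as an illustrative mapping that should be checked against the GCMD Keyword Management System.

```python
import json
from datetime import datetime, timezone


def log_entry(endpoint, params, gcmd_mappings):
    """Build one reproducibility record for a single CMR request."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "endpoint": endpoint,
        "params": dict(params),
        "gcmd_mappings": dict(gcmd_mappings),
    }


entry = log_entry(
    "https://cmr.earthdata.nasa.gov/search/collections.json",
    {"keyword": "soil moisture", "page_size": 20},
    # Illustrative mapping from the user's phrase to a GCMD keyword path.
    {"soil moisture":
     "EARTH SCIENCE > LAND SURFACE > SOILS > SOIL MOISTURE/WATER CONTENT"},
)
print(json.dumps(entry, indent=2))
```

Recording the endpoint and parameters verbatim lets a user rerun the same search outside the agent and compare results.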