AI Agents for Science

Why Standard AI Chatbots Struggle with Specialized Science Tasks

Standard AI chatbots often lack access to the specialized tools, databases, terminology, and ontologies needed for advanced scientific work. As a result, they have a higher risk of hallucinating when they operate in narrow technical domains.

The value a chatbot provides usually depends on the user’s domain expertise. A subject matter expert can ask precise questions, recognize weak reasoning, and quickly judge whether an answer is correct. A novice, however, may not be sure what to ask, which assumptions to challenge, or how to verify a response.

This uneven performance is often described as the “jagged edge of AI”: AI performs well on some tasks but fails unexpectedly on others, even when the tasks appear similar.

What Is an AI Agent?

An AI agent can be viewed as an independent digital worker. You give it a task, and it determines how to accomplish that task.

Unlike a standard chatbot, which mainly responds to questions, an agent can take action. It can analyze a problem, break it into smaller steps, use the right tools, review intermediate results, and correct mistakes as it works toward completion.
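The plan, act, review, and correct cycle described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not AKD's implementation: run_agent, Step, and the toy tools are hypothetical names, the plan is supplied by hand, and a simple retry stands in for the correction step that a real agent would delegate to an LLM.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Step:
    """One unit of work: which tool to call and with what input (hypothetical type)."""
    tool: str
    arg: str


def run_agent(
    steps: List[Step],
    tools: Dict[str, Callable[[str], str]],
    is_acceptable: Callable[[str], bool],
    max_retries: int = 2,
) -> List[str]:
    """Execute each planned step, review the result, and retry on failure."""
    results: List[str] = []
    for step in steps:
        for _attempt in range(max_retries + 1):
            output = tools[step.tool](step.arg)
            if is_acceptable(output):  # review the intermediate result
                results.append(output)
                break
        else:
            # No attempt passed review: record the failure instead of guessing.
            results.append(f"FAILED: {step.tool}({step.arg!r})")
    return results


# Toy tools standing in for real search and analysis tools (assumptions).
tools = {
    "search": lambda q: f"3 datasets found for '{q}'",
    "summarize": lambda text: "",  # always returns an empty, unacceptable result
}
plan = [Step("search", "sea surface temperature"), Step("summarize", "dataset list")]
results = run_agent(plan, tools, is_acceptable=lambda out: bool(out.strip()))
```

In this toy run the search step passes review on the first attempt, while the summarize step exhausts its retries and is reported as failed rather than silently dropped.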

How an Agent Is Designed

A well-designed agent typically includes three core elements:

Reasoning logic, often implemented through prompts or instructions.
Contextual information, which grounds the agent’s reasoning in the relevant domain.
Tool access, along with clear guidance on when to use each tool and how to use it correctly.
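As a rough sketch, the three elements can be captured in one small data structure. AgentSpec, ToolSpec, and the example values below are hypothetical and are not part of the AKD SDK; the point is only that reasoning logic, context, and tool guidance are separate, inspectable parts of a design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ToolSpec:
    """A tool plus the guidance that tells the agent when to use it."""
    name: str
    when_to_use: str
    call: Callable[[str], str]


@dataclass
class AgentSpec:
    """Hypothetical container for the three core elements of an agent."""
    reasoning_prompt: str                                      # reasoning logic
    context: List[str] = field(default_factory=list)           # domain grounding
    tools: Dict[str, ToolSpec] = field(default_factory=dict)   # tool access

    def system_prompt(self) -> str:
        """Assemble one prompt string from the three elements."""
        tool_lines = [f"- {t.name}: {t.when_to_use}" for t in self.tools.values()]
        return "\n".join(
            [self.reasoning_prompt, "Context:", *self.context, "Tools:", *tool_lines]
        )


spec = AgentSpec(
    reasoning_prompt="Identify candidate datasets and state caveats.",
    context=["Sea surface temperature is abbreviated SST."],
    tools={
        "dataset_search": ToolSpec(
            name="dataset_search",
            when_to_use="Use when the user asks which datasets exist.",
            call=lambda query: f"results for {query}",
        )
    },
)
prompt = spec.system_prompt()
```

Keeping the tool guidance next to the tool itself makes it easier for a subject matter expert to review when each tool should be used.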

Through the design of these elements, agents can help smooth the jagged edge of AI. When a subject matter expert helps build the agent, the design can capture some of that expert’s implicit knowledge: how to reason through the task, what context matters, which tools to use, and when to use them.

As a result, a novice using the agent may achieve productivity gains closer to those of an expert because the agent embeds expert judgment into the workflow.

However, designing an agent for complex scientific tasks is difficult, and it should not be treated as a purely technical endeavor. Building a capable agent requires close collaboration among developers, scientists, domain experts, and end users to ensure the agent reasons correctly, uses tools appropriately, and produces reliable outputs.

01 · Vision

Accelerated Knowledge Discovery: How we build AI Agents for Science

We want to transform AI into a full cognitive research partner. The Accelerated Knowledge Discovery (AKD) project drives this transformation through five strategic pillars.

Pillar 1
Co-design from the start

Using the CARE methodology, we seek to foster a triadic collaboration among developers, scientists, and AI agents so that every agent is scientifically grounded, technically sound, context-optimized, and efficient.
Pillar 2
Demystify the AI Agent

We want to enable everyone to design, build, and share agents by providing a common process, a Software Development Kit (SDK), and accessible platforms, lowering the barrier to entry for the entire community.
Pillar 3
Provide Science Guardrails

To ensure safety and scientific integrity, we want to integrate specialized Risk Agents which provide science-specific guardrails: factuality reasoning over claims and citations, the NASA Science Risk taxonomy, and domain-aware compliance checks.
Pillar 4
Standardize the process

We seek to implement community-accepted design and development processes across the science community, ensuring that agent behavior is consistent, reproducible, and scalable.
Pillar 5
Build a collaborative community

By creating an open ecosystem, we want to enable it to evolve and improve through feedback and shared contributions, ultimately accelerating the pace of science for everyone.

Single agents

Complex (end-to-end) research agents

Open Project

Explore the AI Agents for Science repository, contribute agents, review documentation, and help build open, trustworthy AI infrastructure for science.

02 · Strategic Objectives

Four anchors guide everything we build.

01 / Time

Reclaim Scientists' Time

Currently, scientists spend 70–80% of their time on manual, repetitive tasks such as curating datasets and cross-referencing literature. By reducing the time spent on such tasks, AKD enables researchers to instead focus on high-level hypothesis generation and data analysis.

02 / Orchestration

Orchestrate Cross-Disciplinary Research

AKD connects disparate data from siloed science repositories and publications, acting as an intelligent orchestrator by selectively merging individual information sources into a unified knowledge stream.

03 / Infrastructure

Integrate Uniform Infrastructure

AKD implements community-accepted design processes to harmonize the use of AI science agents across NASA's data repositories. With a common AI Agent Software Development Kit (SDK), the framework ensures that agents operate within established scientific guardrails.

04 / Trust

Scientific Integrity

Built on foundations of transparency, reproducibility, and open science, AKD strives for the utmost scientific integrity in design and practice.

03 · Methodology

Design with CARE.

Collaborative Agent Reasoning Engineering (CARE) is a disciplined, stage-gated methodology designed to systematically engineer AI agents for scientific and technical workflows.

Subject Matter Experts

Provide domain authority: surfacing nuanced constraints and validating scientific "truth".
Developers

Act as the implementation authority, ensuring tool realism and feasibility.
Helper LLM Agents

Serve as facilitation infrastructure: asking phase-aligned questions, drafting specifications in Markdown, and proposing revisions for human approval.

Five phases, each gated by joint SME / developer / agent approval

Phase 1 Scope & Decompose
Defines the target workflow, users, and constraints.
Phase 2 Key Information Elicitation
Captures details on tools, domain context, and required output formats.
Phase 3 Reasoning Policy & Guardrails
Codifies "expert-like" thinking logic and safety boundaries for uncertainty or tool errors.
Phase 4 Implementation
Translates approved artifacts into an engineered agent prompt using established design patterns.
Phase 5 Benchmarking & Verification
Establishes realistic query sets and scoring rubrics to detect regressions over time.
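The stage-gating rule can be illustrated with a short sketch: a phase counts as complete only when all three parties have signed off, and work proceeds to the first phase that lacks joint approval. The phase names come from the list above; next_open_phase, the role keys, and the approvals structure are assumptions made for illustration.

```python
from typing import Dict, List, Optional

PHASES: List[str] = [
    "Scope & Decompose",
    "Key Information Elicitation",
    "Reasoning Policy & Guardrails",
    "Implementation",
    "Benchmarking & Verification",
]
# Hypothetical role keys for the SME, the developer, and the helper LLM agent.
APPROVERS = ("sme", "developer", "agent")


def next_open_phase(approvals: Dict[str, Dict[str, bool]]) -> Optional[str]:
    """Return the first phase lacking joint approval, or None if all passed."""
    for phase in PHASES:
        signed = approvals.get(phase, {})
        if not all(signed.get(role, False) for role in APPROVERS):
            return phase
    return None


approvals = {
    "Scope & Decompose": {"sme": True, "developer": True, "agent": True},
    "Key Information Elicitation": {"sme": True, "developer": False, "agent": True},
}
current = next_open_phase(approvals)
```

Here the developer has not yet approved Phase 2, so the process stays at Key Information Elicitation even though the other two parties have signed off.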

04 · Agent Development Pathway

From idea to published agent: pick your path.

AKD is a composable framework for AI agents in NASA science. Each agent is designed with CARE, assembled from shared building blocks (prompts, tools, context, and pluggable guardrail and reasoning services), reviewed by a domain expert, and then run on whichever surface scientists already work in: AKD Flow, integrated chatbots in Teams or Slack, community MCP servers, or hosted chatbots in ChatGPT or ChatGSFC.

The same blocks compose every agent; the same agent reaches scientists wherever they work.

Design the agent

Step 1 CARE Design Process

A 5-phase stage-gated process where SMEs, developers, and helper LLM agents specify what the agent should do.

How will you run CARE?

Path A CARE Studio in AKD Labs UI

Path B CARE phase prompts (ChatGPT / ChatGSFC)

What CARE produces

Step 2 CARE Output

Prompt
Tools
Context
Guardrails

How will the agent run?

Step 3 · Pick a runtime: Choose Operationalization Path

1 Chat assistant
2 Custom agent
3 Agent workflow

Branch 1 · Chat assistant → ChatGPT / ChatGSFC

Branch 2 · Custom agent → Build in Agent Toolkit

Iterate AKD Labs feedback loop

Branch 3 · Agent workflow → Compose in Flow

Pass SME review

B2 + B3 · SME Readiness Review (gate)

Runs on AKD Flow

Step 5 · Runtime: AKD Flow UI

The canonical production environment for every approved agent.

Where scientists meet it

Surface AKD Flow Workflows

Surface Hosted Chatbots (ChatGPT · ChatGSFC)

Surface Community MCP Servers

Surface Agent Toolkit (akd-ext)

Surface Integrated Chatbots

Teams · Slack · NASA collab tools

AKD services plug into the path above

Governance and safety services come built in, available across every stage of the pathway.

Reusable Scientific Tools: MCP-wrapped search, retrieval, and analysis tools that agents compose instead of re-implementing.
FactReasoner: Targeted factuality reasoning over claims, attribution, and supporting evidence to mitigate hallucinations.
Risk Agent: An LLM judge that evaluates outputs against the IBM Risk Atlas and a NASA Science Literature Risk taxonomy, along with context-specific risks for each agent.
Compliance Checking: Automated guardrail layer enforcing science-specific constraints on inputs and outputs.
Science Guardrails: A reusable input/output safety net in which each guardrail acts as a checkpoint between an AI agent and the outside world.
Granite Guardian: A fast, content-focused moderation LLM based on the IBM Granite family that assesses jailbreaks, harm, and bias, and runs RAG-specific groundedness checks.

05 · Standalone Agents

Specialized AKD agents for scientific domains.

Specialized AKD agents for scientific domains can be used as standalone agents or in multi-agent workflows. Each agent is designed via the CARE process, ships with a documented reasoning strategy and a curated tool set, and is non-prescriptive by design. The specialized agents identify candidates, evidence, and caveats, while human researchers retain authority over output selection and interpretation.

Earth Science Data Search · CMR Agent

Helps users discover and understand NASA Earthdata datasets relevant to Earth science questions.

Planetary Data Search · PDS Agent

An AKD agent for discovering datasets and products across NASA's Planetary Data System.

Code Search Agent

An AKD agent for discovering publicly available scientific code repositories that plausibly align with a user's technical or scientific task.

Astro Data Search Agent

An AKD agent for discovering astronomical datasets across NASA's astrophysics archives.

Gap Analysis · Gap Search Agent

Identifies defensible research gaps, contradictions, and candidate research questions from a user-curated collection of academic papers.

Scientific Illustrator Agent

Transforms scientific text into structured, publication-grade visual prompts, diagrams, and figure concepts.

Scientific Paper Writing Assistant

A collaborative scientific writing agent that converts research materials into publication-ready manuscript sections.

NASA Data Governance Agent

Supports practical, traceable, and role-adaptive data governance guidance across all NASA Science Mission Directorate divisions.

iESO Agent for NASA Worldview

AKD-designed, CARE-based chatbot agent for interpreting Earth observation imagery and datasets in NASA Worldview.

NASA PSI Agent

Scientific knowledge and data assistant for NASA Physical Science Informatics repositories.

All agents operate as decision-support systems with explicit human-in-the-loop oversight.

06 · Multi-Agent Workflows

End-to-end research pipelines.

Multi-agent workflows connect published AKD agents into end-to-end research pipelines. Workflows are composed in AKD Flow as graphs of nodes (agents) and edges (execution order) with explicit human approval gates between stages so researchers can review, refine, or pause the pipeline at any point.
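A workflow of this shape can be sketched with Python's standard graphlib module (Python 3.9+), which orders nodes so that each agent runs after its upstream dependencies. This is an illustration only, not AKD Flow's API: run_workflow, the approve callback, the stub agents, and the edge order between them are all assumptions.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_workflow(
    edges: Dict[str, Set[str]],
    agents: Dict[str, Callable[[List[str]], str]],
    approve: Callable[[str, str], bool],
) -> Dict[str, str]:
    """Run agents in dependency order, pausing when a human rejects a stage."""
    outputs: Dict[str, str] = {}
    # edges maps each node to the set of nodes that must run before it.
    for node in TopologicalSorter(edges).static_order():
        upstream = [outputs[p] for p in sorted(edges.get(node, set()))]
        result = agents[node](upstream)
        if not approve(node, result):  # human approval gate between stages
            break                      # pause the pipeline; keep earlier outputs
        outputs[node] = result
    return outputs


# Hypothetical three-stage chain; the order of these edges is an assumption.
edges = {"gap": set(), "cmr": {"gap"}, "code-search": {"cmr"}}
agents = {
    "gap": lambda ups: "gap: 2 candidate questions",
    "cmr": lambda ups: f"cmr: datasets for [{ups[0]}]",
    "code-search": lambda ups: f"code: repos for [{ups[0]}]",
}
# Stand-in for a researcher who approves every stage except the last one.
outputs = run_workflow(edges, agents, approve=lambda node, result: node != "code-search")
```

Because the reviewer rejects the final stage, the run stops there and returns only the approved outputs of the first two stages.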

Climate Modeling Workflow (Closed-Loop CM1)

A five-stage pipeline for atmospheric simulation research using the CM1 model, from initial question through gap analysis, capability and feasibility mapping, and executable specification design.

gap
cmr
code-search

Source on akd-suite ↗

FM Tuning Workflow

A multi-agent workflow targeting structured foundation model tuning for science applications. (In development)

Community Workflows

Community-contributed pipelines composed from published AKD agents.

07 · Trust & Safety

A reusable input/output safety net.

Scientific guardrails establish a reusable input/output safety net. In AKD, Granite Guardian and Risk Agent serve as critical guardrails. Each guardrail acts as a checkpoint between an AI agent and the outside world. The guardrails inspect both input and output and issue a standardized "pass/fail" report that identifies potential risks and remains attached to the agent's response. Because the reports follow a consistent format, providers are interchangeable and composable across any AKD agent.
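The idea of interchangeable providers that share one report format can be sketched as follows. The schema and names here (GuardrailReport, Guardrail, KeywordGuardrail, guarded_call) are hypothetical and do not reflect AKD's actual interfaces; the keyword matcher is a toy stand-in for a real provider such as Granite Guardian or Risk Agent.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Protocol


@dataclass
class GuardrailReport:
    """Standardized result every provider returns (hypothetical schema)."""
    provider: str
    passed: bool
    risks: List[str] = field(default_factory=list)


class Guardrail(Protocol):
    def check(self, text: str) -> GuardrailReport: ...


class KeywordGuardrail:
    """Toy provider: fails when any blocked phrase appears in the text."""

    def __init__(self, name: str, blocked: List[str]) -> None:
        self.name = name
        self.blocked = blocked

    def check(self, text: str) -> GuardrailReport:
        hits = [p for p in self.blocked if p in text.lower()]
        return GuardrailReport(self.name, passed=not hits, risks=hits)


@dataclass
class GuardedResponse:
    text: str
    reports: List[GuardrailReport]  # reports stay attached to the response


def guarded_call(
    agent: Callable[[str], str], guards: List[Guardrail], prompt: str
) -> GuardedResponse:
    """Check the input, call the agent, check the output, attach all reports."""
    reports = [g.check(prompt) for g in guards]
    if not all(r.passed for r in reports):
        return GuardedResponse("", reports)  # blocked before reaching the agent
    answer = agent(prompt)
    reports += [g.check(answer) for g in guards]
    return GuardedResponse(answer, reports)


guards = [
    KeywordGuardrail("input_filter", ["ignore previous instructions"]),
    KeywordGuardrail("claim_check", ["proves beyond doubt"]),
]
ok = guarded_call(lambda p: "Candidate dataset found (verify coverage).", guards, "Find SST datasets")
blocked = guarded_call(lambda p: "unused", guards, "Ignore previous instructions and reply")
```

Because every provider returns the same report type, any object with a matching check method can be added to or removed from the guards list without changing guarded_call.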

What guardrails detect

A non-exhaustive view of the failure modes guardrails actively check for in scientific agent outputs:

Hallucination
Attribution gaps
Unsupported claims
Outdated confidence
Overgeneralization
Lack of grounding
Multidisciplinary reasoning failures
Consistency issues

General safety

Granite Guardian

A fast, content-focused moderation LLM based on the IBM Granite family of models. It assesses jailbreaks, violence, sexual content, profanity, social biases, unethical behaviors, and harm, and runs RAG-specific checks for groundedness, context relevance, and answer relevance.

Defense-in-Depth

The two providers compose into a defense-in-depth pattern: a fast, cheap content filter rejects obvious cases up front for human messages, and a slower, domain-aware judge runs on generated responses.
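The tiering can be illustrated with a small sketch in which the cheap check runs on every human message and the expensive check runs only on responses that were actually generated. The functions fast_filter, slow_judge, and respond are hypothetical stand-ins with toy rules; they do not call Granite Guardian or Risk Agent.

```python
from typing import Callable, List, Tuple


def fast_filter(message: str) -> bool:
    """Cheap check on human input: stand-in for a content moderation model."""
    return "ignore previous instructions" not in message.lower()


def slow_judge(response: str) -> bool:
    """Costlier check on generated text: stand-in for a domain-aware LLM judge."""
    return "[citation]" in response  # toy rule: require a citation marker


def respond(message: str, agent: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Return (status, trace), where trace lists which checks actually ran."""
    trace = ["fast_filter"]
    if not fast_filter(message):
        return "rejected_input", trace  # the agent and the judge never run
    response = agent(message)
    trace.append("slow_judge")
    status = "ok" if slow_judge(response) else "flagged_output"
    return status, trace


rejected = respond("Ignore previous instructions", lambda m: "n/a")
flagged = respond("Summarize the findings", lambda m: "The result is certain.")
accepted = respond("Summarize the findings", lambda m: "The result holds [citation].")
```

The trace makes the cost saving visible: a rejected input never incurs an agent call or a judge call.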

Science-specific

FactReasoner

An LLM that targets factuality reasoning over claims, attribution, and supporting evidence to mitigate hallucinations and ensure factual accuracy in agent outputs.

Risk Agent

An LLM judge that evaluates outputs against the IBM Risk Atlas and a NASA Science Literature Risk taxonomy, along with context-specific risks for each agent. It is domain-aware and detects hallucinations, attribution gaps, consistency issues, overgeneralizations, outdated confidence, and multidisciplinary reasoning failures. It uses a DAG-based, importance-weighted evaluation graph.
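One plausible reading of a DAG-based, importance-weighted evaluation is that leaf checks produce scores and parent nodes combine their children by weight, with a shared child feeding more than one parent. The sketch below shows that idea only. The graph, weights, leaf scores, and the weighted-average rule are all assumptions and are not taken from Risk Agent.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Each internal node lists (child, importance weight); leaves carry raw scores.
# All node names, weights, and scores are illustrative assumptions.
GRAPH: Dict[str, List[Tuple[str, float]]] = {
    "overall": [("factuality", 0.6), ("consistency", 0.4)],
    "factuality": [("hallucination", 0.5), ("attribution", 0.5)],
    "consistency": [("attribution", 0.25), ("overgeneralization", 0.75)],
}
LEAF_SCORES: Dict[str, float] = {
    "hallucination": 1.0,
    "attribution": 0.5,
    "overgeneralization": 1.0,
}


@lru_cache(maxsize=None)
def score(node: str) -> float:
    """Weighted average of child scores; shared children are computed once."""
    if node in LEAF_SCORES:
        return LEAF_SCORES[node]
    children = GRAPH[node]
    total_weight = sum(weight for _, weight in children)
    return sum(score(child) * weight for child, weight in children) / total_weight


overall = score("overall")
```

With these toy numbers, factuality scores 0.75, consistency scores 0.875, and the overall score is 0.8 up to floating-point rounding. The attribution leaf contributes to both branches, which is what makes the structure a DAG rather than a tree.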

08 · Real Science Applications

Agents in action.

Research in Action case studies show how multiple AKD agents can support an end-to-end scientific workflow, from literature discovery through research gap identification, hypothesis development, data search, code reuse, and analysis support to scientific output.

Agricultural Land Abandonment in Conflict-Affected Myanmar

This Research in Action case study connects literature ingestion, gap detection, and paper drafting into a single end-to-end multi-agent workflow, with human approval between stages.

Designers: Nidhi Jha, Siddharth Chaudhary

09 · Partners & Collaborations

Built across NASA and partner organizations.

AKD is a collaborative effort spanning NASA programs, research labs, and engineering partners connected through shared infrastructure and tooling.

NASA-IMPACT AI team: Program Lead
IBM Research: Guardrails · AI Risk
Development Seed: Engineering
NASA Worldview: Earth Visualization
Physical Science Informatics (PSI)
iESO / MIO: iESO/MIO collaboration with NASA Worldview
NASA Science Discovery Engine (SDE)
Open Community: Contributors

10 · AKD Multi-Agent Framework

AKD Enterprise Stack.

The AKD SDK can be used to build a multi-agent framework, a workflow-oriented web product, and an experimental lab inside a single shared ecosystem of Python libraries, AKD agents, and science-specific guardrails. Seven layers run from infrastructure up to user-facing applications, with a governance lane in parallel. Example governance gates apply at each layer instead of waiting for final output evaluation.

L1 Application Components AKD SDK

Agent Test & Validation Environment
Agentic Apps (e.g., AKD UI)
AI Conversational Search · RAG Chatbots
Traditional Search Interface

Example gate: Security & Public Use Policy Approval

L2 Auth & Authorization

Authentication / SSO
Quota Management
Access Control (Authorization)

L3 Search & Discovery

Agent Registry
Tools Registry
Notebook Registry
Documents · Code Registry

Example gate: Registry Approval Processes

L4 Agentic Layer

Agentic (LLM + Tools) Engine
Agent Composition Engine
Conversational RAG Engine

Example gate: SME Approval

L5 Services APIs

JupyterHub
LLM Inference (Ollama · Bedrock · vLLM)
AI Sandbox (e.g., E2B.dev)
Scientific Guardrails for AI (Factuality · Compliance)
Reusable Scientific Tools
SDK APIs

Example gate: Compliance & Cost Checks

L6 Science Artifacts & KB

Notebooks
Documents
Metadata
Code Repositories
CARE-Designed Agents & Tool Prompts

Example gate: Access Control List

L7 Infrastructure

NASA Science Cloud
On-Prem (FM Models & Inferencing)
Personal Cloud

11 · Governance

Governance built in, ready for teams.

Governance should not be an isolated process at the endpoint of agent design. In AKD, governance is integrated throughout, supporting teams in their initial design decisions. Checkpoints run in parallel with the development pathway, and the same governance applies regardless of how an agent is distributed: AKD Flow workflows, Custom GPTs, community MCP servers, or the akd-ext open-source repository.

Stage 1 · During Design: Technical Best Practices

AKD's technical best practices are a codified set of community-accepted design and implementation practices and architectural requirements for agent development. They include mandatory protocols for science guardrails, modularity, and error handling so that all agents are built to a uniform standard.

Stage 2 · During Build: Benchmark & Guardrail Compliance

Each agent must pass a defined benchmark suite and integrate the guardrail layer (Granite Guardian, Risk Agent, FactReasoner, NASA Risk Taxonomy) that gates outputs. Teams must meet benchmark thresholds in AKD Labs and integrate guardrails before the SME review.
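A benchmark gate of this kind can be sketched as a function that compares current scores against minimum thresholds and against a previous baseline to catch regressions. The function name, metric names, threshold values, and tolerance below are illustrative assumptions, not AKD Labs settings.

```python
from typing import Dict, List


def benchmark_gate(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    baseline: Dict[str, float],
    tolerance: float = 0.02,
) -> List[str]:
    """Return a list of problems; an empty list means the gate passes."""
    problems: List[str] = []
    for metric, minimum in thresholds.items():
        value = scores.get(metric)
        if value is None:
            problems.append(f"{metric}: missing score")
        elif value < minimum:
            problems.append(f"{metric}: {value:.2f} below threshold {minimum:.2f}")
        elif metric in baseline and value < baseline[metric] - tolerance:
            problems.append(
                f"{metric}: regressed from {baseline[metric]:.2f} to {value:.2f}"
            )
    return problems


problems = benchmark_gate(
    scores={"relevance": 0.90, "groundedness": 0.70},
    thresholds={"relevance": 0.80, "groundedness": 0.75, "citation_accuracy": 0.85},
    baseline={"relevance": 0.95},
)
```

In this example the gate reports three problems: relevance clears its threshold but has regressed from the baseline, groundedness falls below its threshold, and citation_accuracy has no score at all.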

Stage 3 · At Review Gate: SME Readiness Review

When the agent meets its design requirements, an SME lead conducts a readiness review. The agent must demonstrate scientific integrity (non-prescriptive behavior, traceable reasoning), guardrail compliance, and passing benchmark results before promotion is approved.

Stage 4 · In Production: Operational Blueprint & Community Infrastructure

A community-developed manual defines the roles, responsibilities, and decision-making workflows for AI agent deployment, establishing essential Human-in-the-Loop (HITL) checkpoints and ethical compliance pathways for AI-generated scientific outputs. The Agent Hub provides peer review, pipelines for community contributions, and a standardized documentation portal.

12 · Team