AI Agents for Science

01

Why Standard AI Chatbots Struggle with Specialized Science Tasks

Standard AI chatbots often lack access to the specialized tools, databases, terminology, and ontologies needed for advanced scientific work. As a result, they are more prone to hallucination when operating in narrow technical domains.

The value a chatbot provides usually depends on the user’s domain expertise. A subject matter expert can ask precise questions, recognize weak reasoning, and quickly judge whether an answer is correct. A novice, however, may not be sure what to ask, which assumptions to challenge, or how to verify a response.

This uneven performance is often described as the “jagged edge of AI”: AI performs well on some tasks but fails unexpectedly on others, even when the tasks appear similar.

02

What Is an AI Agent?

An AI agent can be viewed as an independent digital worker. You give it a task, and it determines how to accomplish that task.

Unlike a standard chatbot, which mainly responds to questions, an agent can take action. It can analyze a problem, break it into smaller steps, use the right tools, review intermediate results, and correct mistakes as it works toward completion.
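
The act-review-correct cycle described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the AKD SDK: the task is assumed to be already decomposed into a plan, and `Step`, `run_agent`, the `tools` dictionary, and the `check` function are hypothetical stand-ins for what would normally be LLM calls and real scientific tools.

```python
from dataclasses import dataclass, field
from typing import Callable

Tool = Callable[[str], str]  # hypothetical tool signature: text in, text out


@dataclass
class Step:
    tool: str      # which tool the plan assigns to this step
    argument: str  # input passed to that tool


@dataclass
class AgentRun:
    results: list[str] = field(default_factory=list)
    retries: int = 0


def run_agent(
    plan: list[Step],
    tools: dict[str, Tool],
    check: Callable[[str], bool],
    max_retries: int = 2,
) -> AgentRun:
    """Execute a pre-decomposed plan, reviewing each intermediate result and retrying failures."""
    run = AgentRun()
    for step in plan:
        for _ in range(max_retries + 1):
            result = tools[step.tool](step.argument)  # act: call the selected tool
            if check(result):                         # review the intermediate result
                run.results.append(result)
                break
            run.retries += 1                          # correct: retry (a real agent might revise the step)
        else:
            raise RuntimeError(f"step {step.tool!r} failed after {max_retries + 1} attempts")
    return run
```

Keeping the review step (`check`) separate from the tools is the main design choice in this sketch: it is the natural place to plug in domain-specific validation or a guardrail.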

03

How an Agent Is Designed

A well-designed agent typically includes three core elements:

  • Reasoning logic, often implemented through prompts or instructions.
  • Contextual information, which grounds the agent’s reasoning in the relevant domain.
  • Tool access, along with clear guidance on when to use each tool and how to use it correctly.
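
One way to keep these three elements explicit is to hold them in a single specification object and render it into the agent's instructions. The sketch below assumes a plain-text system prompt; the `AgentSpec` and `ToolSpec` classes and the `to_prompt` method are hypothetical names for illustration, not part of any AKD API.

```python
from dataclasses import dataclass, field


@dataclass
class ToolSpec:
    name: str
    when_to_use: str  # guidance on when the tool applies
    how_to_use: str   # guidance on correct usage (inputs, pitfalls)


@dataclass
class AgentSpec:
    reasoning: str                                       # reasoning logic / instructions
    context: list[str] = field(default_factory=list)     # domain facts that ground the reasoning
    tools: list[ToolSpec] = field(default_factory=list)  # tools plus usage guidance

    def to_prompt(self) -> str:
        """Render the three core elements into one system prompt."""
        lines = ["## Reasoning", self.reasoning, "", "## Context"]
        lines += [f"- {item}" for item in self.context]
        lines += ["", "## Tools"]
        for tool in self.tools:
            lines.append(f"- {tool.name}: use when {tool.when_to_use}; {tool.how_to_use}")
        return "\n".join(lines)
```

Because the tool guidance lives next to each tool definition, an expert reviewing the spec can check "when to use it" and "how to use it" in one place.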

Through the design of these elements, agents can help smooth the jagged edge of AI. When a subject matter expert helps build the agent, the design can capture some of that expert’s implicit knowledge: how to reason through the task, what context matters, which tools to use, and when to use them.

As a result, a novice using the agent may achieve productivity gains closer to those of an expert because the agent embeds expert judgment into the workflow.

However, designing an agent for complex scientific tasks is difficult, and it should not be treated as a purely technical endeavor. Building a capable agent requires close collaboration among developers, scientists, domain experts, and end users to ensure the agent reasons correctly, uses tools appropriately, and produces reliable outputs.

01 · Vision

Accelerated Knowledge Discovery: How we build AI Agents for Science

We want to transform AI into a full cognitive research partner. The Accelerated Knowledge Discovery (AKD) project drives this transformation through five strategic pillars.

  • Pillar 1

    Co-design from the start

    Using the CARE methodology, we seek to foster a triadic collaboration among developers, scientists, and AI agents to ensure that every agent is scientifically grounded, technically sound, context-optimized, and efficient.

  • Pillar 2

    Demystify the AI Agent

    We want to enable everyone to design, build, and share agents by providing a common process, a Software Development Kit (SDK), and accessible platforms, lowering the barrier to entry for the entire community.

  • Pillar 3

    Provide Science Guardrails

    To ensure safety and scientific integrity, we want to integrate specialized Risk Agents that provide science-specific guardrails: factuality reasoning over claims and citations, evaluation against the NASA Science Risk taxonomy, and domain-aware compliance checks.

  • Pillar 4

    Standardize the process

    We seek to implement community-accepted design and development processes across the science community, ensuring that agent behavior is consistent, reproducible, and scalable.

  • Pillar 5

    Build a collaborative community

    By creating an open ecosystem, we want the ecosystem itself to evolve and improve through community feedback and shared contributions, ultimately accelerating the pace of science for everyone.

10 single agents
3 complex (end-to-end) research agents

Open Project

Explore the AI Agents for Science repository, contribute agents, review documentation, and help build open, trustworthy AI infrastructure for science.

02 · Strategic Objectives

Four anchors guide everything we build.

01 / Time

Reclaim Scientists' Time

Currently, scientists spend 70–80% of their time on manual, repetitive tasks such as curating datasets and cross-referencing literature. By reducing the time spent on such tasks, AKD enables researchers to instead focus on high-level hypothesis generation and data analysis.

02 / Orchestration

Orchestrate Cross-Disciplinary Research

AKD connects disparate data from siloed science repositories and publications, acting as an intelligent orchestrator by selectively merging individual information sources into a unified knowledge stream.

03 / Infrastructure

Integrate Uniform Infrastructure

AKD implements community-accepted design processes to harmonize the use of AI science agents across NASA's data repositories. With a common AI Agent Software Development Kit (SDK), the framework ensures that agents operate within established scientific guardrails.

04 / Trust

Scientific Integrity

Built on foundations of transparency, reproducibility, and open science, AKD strives for the utmost scientific integrity in design and practice.

03 · Methodology

Design with CARE.

Collaborative Agent Reasoning Engineering (CARE) is a disciplined, stage-gated methodology designed to systematically engineer AI agents for scientific and technical workflows.

  • Subject Matter Experts

    Provide domain authority: surfacing nuanced constraints and validating scientific "truth".

  • Developers

    Act as the implementation authority, ensuring tool realism and feasibility.

  • Helper LLM Agents

    Serve as facilitation infrastructure: asking phase-aligned questions, drafting specifications in Markdown, and proposing revisions for human approval.

Five phases: each gated by joint SME / developer / agent approval

  1. Scope & Decompose

    Defines the target workflow, users, and constraints.

  2. Key Information Elicitation

    Captures details on tools, domain context, and required output formats.

  3. Reasoning Policy & Guardrails

    Codifies "expert-like" reasoning logic and the safety boundaries for handling uncertainty and tool errors.

  4. Implementation

    Translates approved artifacts into an engineered agent prompt using established design patterns.

  5. Benchmarking & Verification

    Establishes realistic query sets and scoring rubrics to detect regressions over time.
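
The gating rule (a phase advances only after joint SME, developer, and agent approval) can be expressed as a small state machine. This is a hypothetical sketch: the phase names follow the five CARE phases, but the `CareProject` class, the role strings, and the rule that approvals reset at each gate are illustrative assumptions, not a published CARE tool.

```python
from enum import IntEnum


class Phase(IntEnum):
    SCOPE_AND_DECOMPOSE = 1
    KEY_INFORMATION_ELICITATION = 2
    REASONING_POLICY_AND_GUARDRAILS = 3
    IMPLEMENTATION = 4
    BENCHMARKING_AND_VERIFICATION = 5


# Assumed approver roles for the joint gate.
REQUIRED_APPROVERS = frozenset({"sme", "developer", "agent"})


class CareProject:
    """Tracks the current CARE phase and blocks advancement until every approver signs off."""

    def __init__(self) -> None:
        self.phase = Phase.SCOPE_AND_DECOMPOSE
        self.approvals: set[str] = set()
        self.completed = False

    def approve(self, role: str) -> None:
        if role not in REQUIRED_APPROVERS:
            raise ValueError(f"unknown approver role: {role}")
        self.approvals.add(role)

    def advance(self) -> None:
        missing = REQUIRED_APPROVERS - self.approvals
        if missing:
            raise PermissionError(f"gate closed; missing approval from {sorted(missing)}")
        if self.phase == Phase.BENCHMARKING_AND_VERIFICATION:
            self.completed = True
        else:
            self.phase = Phase(self.phase + 1)
        self.approvals.clear()  # each gate requires a fresh joint approval
```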

04 · Agent Development Pathway

From idea to published agent: pick your path.

AKD is a composable framework for AI agents in NASA science. Each agent is designed with CARE, assembled from shared building blocks (prompts, tools, context, and pluggable guardrail and reasoning services), reviewed by a domain expert, and then run on whichever surface scientists already work in: AKD Flow, integrated chatbots in Teams or Slack, Community MCP servers, or hosted chatbots in ChatGPT or ChatGSFC.

The same blocks compose every agent; the same agent reaches scientists wherever they work.

Step 1 · Design the agent: how will you run CARE?
  • Path A: CARE Studio in the AKD Labs UI
  • Path B: CARE phase prompts (ChatGPT / ChatGSFC)

Step 2 · CARE output: what CARE produces
  • Prompt
  • Tools
  • Context
  • Guardrails

Step 3 · Pick a runtime: how will the agent run? (choose an operationalization path)
  • Branch 1 · Chat assistant → ChatGPT / ChatGSFC
  • Branch 2 · Custom agent → Build in Agent Toolkit
  • Branch 3 · Agent workflow → Compose in Flow
  • Iterate: AKD Labs feedback loop

Gate (Branches 2 + 3) · SME Readiness Review; runs on AKD Flow

Where scientists meet it
  • AKD Flow Workflows
  • Hosted Chatbots (ChatGPT · ChatGSFC)
  • Community MCP Servers
  • Agent Toolkit (akd-ext)
  • Integrated Chatbots (Teams · Slack · NASA collab tools)

05 · Standalone Agents

Specialized AKD agents for scientific domains.

Specialized AKD agents for scientific domains can be used as standalone agents or in multi-agent workflows. Each agent is designed via the CARE process, ships with a documented reasoning strategy and a curated tool set, and is non-prescriptive by design. The specialized agents identify candidates, evidence, and caveats, while human researchers retain authority over output selection and interpretation.

Code Search Agent

An AKD agent for discovering publicly available scientific code repositories that plausibly align with a user's technical or scientific task.

  • search
  • code
  • discovery

NASA Data Governance Agent

Supports practical, traceable, and role-adaptive data governance guidance across all NASA Science Mission Directorate divisions.

  • governance
  • policy
  • compliance

NASA PSI Agent

Scientific knowledge and data assistant for NASA Physical Science Informatics repositories.

  • search
  • physical-science
  • psi
  • knowledge

All agents operate as decision-support systems with explicit human-in-the-loop oversight.

06 · Multi-Agent Workflows

End-to-end research pipelines.

Multi-agent workflows connect published AKD agents into end-to-end research pipelines. Workflows are composed in AKD Flow as graphs of nodes (agents) and edges (execution order) with explicit human approval gates between stages so researchers can review, refine, or pause the pipeline at any point.
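
A pipeline of this shape can be sketched with Python's standard `graphlib` module: nodes are agents, edges fix the execution order, and an approval callback sits between stages. This is not the AKD Flow API; the `run_workflow` function, the agent callables, and the `approve` callback are hypothetical placeholders, and "pausing" is simplified here to stopping and returning the outputs approved so far.

```python
from graphlib import TopologicalSorter
from typing import Callable

Agent = Callable[[dict[str, str]], str]  # receives upstream outputs, returns its own output


def run_workflow(
    agents: dict[str, Agent],
    depends_on: dict[str, set[str]],
    approve: Callable[[str, str], bool],
) -> dict[str, str]:
    """Run agents in dependency order with a human approval gate after each stage."""
    outputs: dict[str, str] = {}
    for node in TopologicalSorter(depends_on).static_order():
        upstream = {dep: outputs[dep] for dep in depends_on.get(node, set())}
        result = agents[node](upstream)
        if not approve(node, result):  # human gate: the reviewer can halt the pipeline here
            break
        outputs[node] = result
    return outputs
```

For example, a literature agent feeding a gap-analysis agent feeding a hypothesis agent would be declared as `{"gaps": {"literature"}, "hypothesis": {"gaps"}}`, and a reviewer rejecting the gap analysis stops the run before any hypothesis is generated.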

FM Tuning Workflow

A multi-agent workflow targeting structured foundation model tuning for science applications. (In development)

07 · Trust & Safety

A reusable input/output safety net.

Scientific guardrails establish a reusable input/output safety net. In AKD, Granite Guardian and Risk Agent serve as critical guardrails. Each guardrail acts as a checkpoint between an AI agent and the outside world. The guardrails inspect both input and output and issue a standardized "pass/fail" report that identifies potential risks and remains attached to the agent's response. Because the reports follow a consistent format, providers are interchangeable and composable across any AKD agent.
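
A shared report format is what lets providers be swapped or stacked. The sketch below expresses that contract with `typing.Protocol`; the `GuardrailReport`, `Guardrail`, and `GuardedResponse` types, the `check` method, and the keyword-based toy provider are illustrative assumptions, not the Granite Guardian or Risk Agent interfaces.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass(frozen=True)
class GuardrailReport:
    provider: str
    passed: bool
    risks: tuple[str, ...] = ()  # identified risk labels; empty when the check passes


class Guardrail(Protocol):
    name: str

    def check(self, text: str) -> GuardrailReport: ...


@dataclass
class GuardedResponse:
    text: str
    reports: list[GuardrailReport] = field(default_factory=list)  # stays attached to the response

    @property
    def passed(self) -> bool:
        return all(report.passed for report in self.reports)


class KeywordGuardrail:
    """Toy provider for illustration: flags text containing any listed phrase."""

    def __init__(self, name: str, flagged: dict[str, str]) -> None:
        self.name = name
        self.flagged = flagged  # phrase -> risk label

    def check(self, text: str) -> GuardrailReport:
        lowered = text.lower()
        risks = tuple(label for phrase, label in self.flagged.items() if phrase in lowered)
        return GuardrailReport(self.name, passed=not risks, risks=risks)


def apply_guardrails(text: str, guardrails: list[Guardrail]) -> GuardedResponse:
    """Run every provider and attach all reports to the response."""
    return GuardedResponse(text, [g.check(text) for g in guardrails])
```

Any class with a `name` and a `check` method returning a `GuardrailReport` fits the same slot, which is the interchangeability the paragraph above describes.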

What guardrails detect

A non-exhaustive view of the failure modes guardrails actively check for in scientific agent outputs:

  • Hallucination
  • Attribution gaps
  • Unsupported claims
  • Outdated confidence
  • Overgeneralization
  • Lack of grounding
  • Multidisciplinary reasoning failures
  • Consistency issues

General safety

Granite Guardian

A fast, content-focused moderation LLM based on the IBM Granite family of models. It screens for jailbreaks, violence, sexual content, profanity, social bias, unethical behavior, and general harm, and it also runs RAG-specific checks for groundedness, context relevance, and answer relevance.

Defense-in-Depth

The two providers compose into a defense-in-depth pattern: a fast, inexpensive content filter (Granite Guardian) screens human messages up front and rejects obvious cases, and a slower, domain-aware judge (Risk Agent) evaluates the generated responses.
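
The ordering can be sketched as a wrapper around any agent call: the cheap filter screens the user message first, and the expensive judge runs only on responses that are actually generated. The `guarded_call` function and the `fast_filter`, `slow_judge`, and `agent` callables are hypothetical placeholders, not the Granite Guardian or Risk Agent APIs.

```python
from typing import Callable


def guarded_call(
    message: str,
    agent: Callable[[str], str],
    fast_filter: Callable[[str], bool],  # cheap check on the human message; True means safe
    slow_judge: Callable[[str], bool],   # domain-aware check on the response; True means acceptable
) -> str:
    """Defense-in-depth: reject obvious cases early, then judge the generated output."""
    if not fast_filter(message):
        return "Request rejected by input filter."  # neither the agent nor the judge is invoked
    response = agent(message)
    if not slow_judge(response):
        return "Response withheld: failed domain-aware review."
    return response
```

Rejected inputs never reach the agent or the judge, so the expensive evaluation is spent only on outputs that could reach a user.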

Science-specific

FactReasoner

An LLM that targets factuality reasoning over claims, attribution, and supporting evidence to mitigate hallucinations and ensure factual accuracy in agent outputs.

Risk Agent

An LLM judge that evaluates outputs against the IBM AI Risk Atlas and a NASA Science Literature Risk taxonomy, along with context-specific risks for each agent. It is domain-aware and detects hallucinations, attribution gaps, consistency issues, overgeneralizations, outdated confidence, and multidisciplinary reasoning failures, scoring outputs with a DAG-based, importance-weighted evaluation graph.
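
The page does not specify how the Risk Agent's evaluation graph is scored. As one plausible reading, the sketch below treats each node as a check with an importance weight, evaluates nodes in dependency order, and counts a check as failed when one of its prerequisites failed; the score is the weighted share of passed checks. The `CheckNode` class, the skip rule, and any node names or weights are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from graphlib import TopologicalSorter
from typing import Callable


@dataclass
class CheckNode:
    weight: float                                    # importance weight of this check
    check: Callable[[str], bool]                     # True when the output passes
    requires: set[str] = field(default_factory=set)  # prerequisite checks (DAG edges)


def weighted_dag_score(output: str, nodes: dict[str, CheckNode]) -> tuple[float, dict[str, bool]]:
    """Evaluate checks in dependency order; a check whose prerequisite failed counts as failed."""
    graph = {name: node.requires for name, node in nodes.items()}
    results: dict[str, bool] = {}
    for name in TopologicalSorter(graph).static_order():
        node = nodes[name]
        prerequisites_ok = all(results[dep] for dep in node.requires)
        results[name] = prerequisites_ok and node.check(output)  # skip the check if a prerequisite failed
    total = sum(node.weight for node in nodes.values())
    passed = sum(nodes[name].weight for name, ok in results.items() if ok)
    return passed / total, results
```

Under this reading, a heavily weighted check such as "the citation supports the claim" only earns credit when the lighter prerequisite ("a citation exists") has passed.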

08 · Real Science Applications

Agents in action.

Research in Action using AKD agents demonstrates how multiple AI agents can support an end-to-end scientific workflow, from literature discovery to research gap identification, hypothesis development, data search, code reuse, analysis support, and scientific output.

09 · Partners & Collaborations

Built across NASA and partner organizations.

AKD is a collaborative effort spanning NASA programs, research labs, and engineering partners connected through shared infrastructure and tooling.

  • NASA-IMPACT AI team: Program Lead
  • IBM Research: Guardrails · AI Risk
  • Development Seed: Engineering
  • NASA Worldview: Earth Visualization
  • Physical Science Informatics: PSI
  • iESO / MIO: iESO/MIO collaboration with NASA Worldview
  • NASA Science Discovery Engine: SDE
  • Open Community: Contributors

10 · AKD Multi-Agent Framework

AKD Enterprise Stack.

The AKD SDK can be used to build a multi-agent framework, a workflow-oriented web product, and an experimental lab inside a single shared ecosystem of Python libraries, AKD agents, and science-specific guardrails. Seven layers run from infrastructure up to user-facing applications, with a governance lane in parallel. Example governance gates apply at each layer instead of waiting for final output evaluation.

L1 · Application Components · AKD SDK
  • Agent Test & Validation Environment
  • Agentic Apps (e.g., AKD UI)
  • AI Conversational Search · RAG Chatbots
  • Traditional Search Interface
Example gate: Security & Public Use Policy Approval

L2 · Auth & Authorization
  • Authentication / SSO
  • Quota Management
  • Access Control (Authorization)

L3 · Search & Discovery
  • Agent Registry
  • Tools Registry
  • Notebook Registry
  • Documents · Code Registry
Example gate: Registry Approval Processes

L4 · Agentic Layer
  • Agentic (LLM + Tools) Engine
  • Agent Composition Engine
  • Conversational RAG Engine
Example gate: SME Approval

L5 · Services APIs
  • JupyterHub
  • LLM Inference (Ollama · Bedrock · vLLM)
  • AI Sandbox (e.g., E2B.dev)
  • Scientific Guardrails for AI (Factuality · Compliance)
  • Reusable Scientific Tools
  • SDK APIs
Example gate: Compliance & Cost Checks

L6 · Science Artifacts & KB
  • Notebooks
  • Documents
  • Metadata
  • Code Repositories
  • CARE-Designed Agents & Tool Prompts
Example gate: Access Control List

L7 · Infrastructure
  • NASA Science Cloud
  • On-Prem (FM Models & Inferencing)
  • Personal Cloud

11 · Governance

Governance built in, ready for teams.

Governance should not be an isolated process at the endpoint of agent design. In AKD, governance is integrated throughout, supporting teams in their initial design decisions. Checkpoints run in parallel with the development pathway, and the same governance applies regardless of how an agent is distributed: AKD Flow workflows, Custom GPTs, community MCP servers, or the akd-ext open-source repository.

Stage 1 · During Design Technical Best Practices

AKD's technical best practices are a codified set of community-accepted design and implementation practices and architectural requirements for agent development. They include mandatory protocols for establishing science guardrails, modularity, and error handling, so that every agent is built to a uniform standard.

Stage 2 · During Build Benchmark & Guardrail Compliance

Each agent must pass a defined benchmark suite and integrate the guardrail layer (Granite Guardian, Risk Agent, FactReasoner, NASA Risk Taxonomy) that gates its outputs. Teams must meet the benchmark thresholds in AKD Labs and integrate the guardrails before the SME review.
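
A threshold gate of this kind reduces to comparing benchmark scores against minimums and against the previous run, which also mirrors the regression detection in the final CARE phase. In this hypothetical sketch, the `benchmark_gate` function, the metric names, the threshold values, and the `tolerance` parameter are placeholders, not AKD Labs settings.

```python
from typing import Optional


def benchmark_gate(
    scores: dict[str, float],
    thresholds: dict[str, float],
    previous: Optional[dict[str, float]] = None,
    tolerance: float = 0.02,
) -> list[str]:
    """Return the reasons an agent fails the gate; an empty list means it passes."""
    failures = []
    for metric, minimum in thresholds.items():
        score = scores.get(metric)
        if score is None:
            failures.append(f"{metric}: not measured")
        elif score < minimum:
            failures.append(f"{metric}: {score:.2f} below threshold {minimum:.2f}")
        elif previous and metric in previous and score < previous[metric] - tolerance:
            # above the floor, but noticeably worse than the last accepted run
            failures.append(f"{metric}: regressed from {previous[metric]:.2f} to {score:.2f}")
    return failures
```

Returning reasons rather than a bare boolean gives the SME reviewer something concrete to inspect at the readiness review.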

Stage 3 · At Review Gate SME Readiness Review

When the agent meets its design requirements, an SME lead conducts a readiness review. Before promotion is approved, the agent must demonstrate scientific integrity (non-prescriptive behavior, traceable reasoning), guardrail compliance, and passing benchmark results.

Stage 4 · In Production Operational Blueprint & Community Infrastructure

A community-developed manual defines the roles, responsibilities, and decision-making workflows for AI agent deployment, establishing essential Human-in-the-Loop (HITL) checkpoints and ethical compliance pathways for AI-generated scientific outputs. The Agent Hub provides peer review, pipelines for community contributions, and a standardized documentation portal.

12 · Team

The people behind AKD.

AKD is built by a multi-organization team of scientists, engineers, and researchers.

NASA-IMPACT AI team

  • Rahul Ramachandran (NASA)
  • Muthukumaran Ramasubramanian (UAH)
  • Nidhi Jha (UAH)
  • Ajinkya Kulkarni (UAH)
  • Nishan Pantha (UAH)
  • Paridhi Parijuli (UAH)
  • Sanjog Thapa (UAH)
  • Rohit Sahoo (UAH)
  • Luke Payne (SESDA)
  • Pushwitha Krishnappa (UAH)
  • Rachel Slank (USRA)
  • Ash Danehkar (USRA)
  • Sajil Awale (UAH)
  • Simran KC (UAH)
  • Ray French
  • McKenzie Hicks (Barrios)

IBM Research

  • Juan Bernabe-Moreno
  • Geeth De Mel
  • Alessandra Pascale
  • Bishwaranjan Bhattacharjee
  • James Barry
  • Vishnudev Kuruvanthodi
  • Movina Moses
  • Tigran Tchrakian
  • Javier Carnerero Cano
  • Mohab Elkaref

Development Seed

  • Leo Thomas
  • Gjore Milevski
  • Lane Goodman