Guardrails

General safety + science-specific guardrails

Composable components teams declare at design time, run during build, and keep active at runtime. Generic LLM safety on one side, science-specific reasoning on the other. Both are separate components in the AKD stack.

What guardrails detect

A non-exhaustive view of failure modes guardrails actively check for in scientific agent outputs:

Hallucination
Attribution gaps
Unsupported claims
Outdated confidence
Overgeneralization
Lack of grounding
Multidisciplinary reasoning failures
Consistency issues

General safety

Off-the-shelf LLM safety components reused across every agent in the catalog.

Granite Guardian

Foundation-model content moderation: jailbreak, harm, bias, plus RAG groundedness, relevance, and answer-relevance checks. Runs via Ollama. Off-the-shelf safety component reused across agents.

Granite Guardian (IBM, 2024) ↗

Source ↗

Defense-in-Depth

Layered review pattern: input filters, intermediate-step checks, and output verification. A composition pattern, not a single component; AKD applies it across agent boundaries rather than gating only at one point.

Source ↗

Science-specific guardrails

Components built specifically for scientific agent outputs: claim-level factuality, NASA risk taxonomies, and domain-aware compliance reasoning.

FactReasoner

Claim-level factuality reasoning. Decomposes agent outputs into claims, scores attribution and supporting evidence, and surfaces unsupported assertions before they reach the user.

FactReasoner: source & docs ↗

Source ↗

Risk Agent

LLM-judge that classifies and explains risks in agent outputs against two taxonomies: the IBM Risk Atlas and the NASA Science Literature Risk taxonomy. Domain-aware, importance-weighted, DAG-based evaluation graph.

IBM AI Risk Atlas ↗

Source ↗

What guardrails detect

General safety

Granite Guardian

Defense-in-Depth

Science-specific guardrails

FactReasoner

Risk Agent

Reference papers