AI Safety

AE

x-zheng16/Awesome-Embodied-AI-Safety

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 400+ Papers | Perception, Cognition, Planning, Interaction, Agentic System

Trend 4

🔥 Heating Up +13.5%

adversarial-attacks ai-safety autonomous-driving backdoor-attacks embodied-agents embodied-ai jailbreak large-language-models multimodal robotics survey

59 0 +4/wk

GitHub

AG

microsoft/agent-governance-toolkit

AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.

Trend 4

🔥 Heating Up +11.1%

agent-framework ai-agents ai-safety compliance governance microsoft owasp policy-engine python security trust zero-trust

843 152 +32/wk

GitHub

OR

OrlojHQ/orloj

An orchestration runtime for multi-agent AI systems. Declare agents, tools, and policies as YAML; Orloj schedules, executes, routes, and governs them for production-grade operation.

Trend 3

agent-framework agentic-ai agentic-orchestration ai ai-agents ai-governance ai-safety declarative guardrails infrastructure-as-code llm llmops mlops multi-agent open-source orchestration platform-engineering workflow-orchestration yaml

70 7 +0/wk

GitHub

OP

romgX/openrelay

几百个免费 AI 模型配额，一键接入本地项目。| Hundreds of free AI model quotas, one-click access to local projects.

Trend 3

ai ai-proxy aider cerebras claude claude-code copilot cursor developer-tools free-ai free-api groq kiro llm-proxy model-router openai openclaw proxy windsurf

263 28 +5/wk

GitHub

NP

node9-ai/node9-proxy

The Execution Security Layer for the Agentic Era. Providing deterministic "Sudo" governance and audit logs for autonomous AI agents.

Trend 3

ai-safety ai-security claude-code gemini gemini-cli llm llm-agent mcp-server

110 10 +0/wk

GitHub

IN

JKHeadley/instar

Persistent Claude Code agents with scheduling, sessions, memory, and Telegram.

Trend 3

agent-framework agent-identity agent-infrastructure agent-memory agent-skills ai-agents ai-safety autonomous-agents claude-code cli cron job-scheduler llm mcp npm-package open-source persistency telegram-bot typescript whatsapp

52 15 +0/wk

GitHub

QV

QWED-AI/qwed-verification

AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution — creating an auditable trust boundary for agentic AI systems. Not generation. Verification.

Trend 3

ai-accuracy ai-safety ai-security aisecops code-security deterministic-ai enterprise-ai formal-verification generative-ai hallucination hallucination-detection llm-safety llm-verification machine-learning neurosymbolic-ai nlp python smt-solver sympy z3-prover

51 7 +0/wk

GitHub

IB

wuyoscar/ISC-Bench

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

Trend 3

adversarial-attacks agent-safety ai-safety benchmark frontier-models jailbreak large-language-models llm-safety red-teaming safety-evaluation

844 130 +2/wk

GitHub

OR

schmitech/orbit

One API for 20+ LLM providers, your databases, and your files — self-hosted, open-source AI gateway with RAG, voice, and guardrails.

Trend 3

ai-assistant ai-gateway ai-safety anthropic chatbot developer-tools elasticsearch llm mongodb natural-language-to-sql ollama-client openai python rag retrieval-augmented-generation self-hosted speech-to-text text-to-speech vector-database

248 41 +0/wk

GitHub

TO

ryoungj/ToolEmu

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

Trend 3

agent ai-safety language-agent language-model large-language-models prompt-engineering

197 21 +1/wk

GitHub

AC

agentcontrol/agent-control

Centralized agent control plane for governing runtime agent behavior at scale. Configurable, extensible, and production-ready.

Trend 3

agentic-workflow ai-safety guardrails llm runtime-guardrails

194 21 +1/wk

GitHub

NE

rhino-acoustic/NeuronFS

mkdir beats vector DB. B-tree NeuronFS: 0-byte folders govern AI — ₩0 infrastructure, ~200x token efficiency. OS-native constraint engine for LLM agents.

Trend 3

ai-agent ai-safety cursor-rules file-system guardrails llm model-agnostic multi-agent neuronfs prompt-engineering structure-is-context zero-cost

129 22 +1/wk

GitHub

AM

jphall663/awesome-machine-learning-interpretability

A curated list of awesome responsible machine learning resources.

Trend 3

ai-safety awesome awesome-list data-science explainable-ml fairness interpretability interpretable-ai interpretable-machine-learning interpretable-ml machine-learning machine-learning-interpretability privacy-enhancing-technologies privacy-preserving-machine-learning python r reliable-ai secure-ml transparency xai

4.0k 625 +1/wk

GitHub

SR

PKU-Alignment/safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Trend 3

ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna

1.6k 132 +0/wk

GitHub

UQ

cvs-health/uqlm

UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection

Trend 3

ai-evaluation ai-safety confidence-estimation confidence-score hallucination hallucination-detection hallucination-evaluation hallucination-mitigation llm llm-evaluation llm-hallucination llm-safety uncertainty-estimation uncertainty-quantification

1.1k 119 +0/wk

GitHub

CY

ttguy0707/CyberClaw

Trend 2

✦ New Signal

agent agent-framework ai-agent ai-safety claude-code cross-platform enterprise-ai langchain langgraph llm openclaw python transparent-ai two-phase-invocation

39 6 +8/wk

GitHub

MR

OpenLMLab/MOSS-RLHF

Secrets of RLHF in Large Language Models Part I: PPO

Trend 0

ai-safety alignment rlhf

1.4k 105 +0/wk

GitHub

LA

PacificAI/langtest

Deliver safe & effective language models

Trend 0

ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai

556 49 +0/wk

GitHub

CO

cordum-io/cordum

The open agent control plane. Govern autonomous AI agents with pre-execution policy enforcement, approval gates, and audit trails. Works with LangChain, CrewAI, MCP, and any framework.

Trend 0

agent-framework agentic-ai ai-agent ai-governance ai-orchestration ai-safety audit-trail autonomous-agents control-plane devops governance human-in-the-loop llm llm-agents mcp model-context-protocol nats policy-engine safety-kernel workflow-engine

459 21 -1/wk

GitHub

TI

tigerlab-ai/tiger

Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)

Trend 0

ai-safety aisafety classification data-augmentation fine-tuning large-language-models llm llm-training rag

401 27 +0/wk

GitHub

RD

Govcraft/rust-docs-mcp-server

🦀 Prevents outdated Rust code suggestions from AI assistants. This MCP server fetches current crate docs, uses embeddings/LLMs, and provides accurate context via a tool call.

Trend 0

ai ai-safety caching cargo coding-assistant context-aware crates-io developer-tools embeddings information-retrieval llm mcp mcp-server openai rag rust rust-library rustdoc rustlang semantic-search

268 33 +0/wk

GitHub

DI

WindVChen/DiffAttack

An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.

Trend 0

adverarial-attacks ai-safety diffusion-adversarial-attack diffusion-models imperceptible-attacks transferable-attacks unrestricted-attacks

261 19 +0/wk

GitHub

AG

Tooooa/AgentMark

【ACL 2026 Main】AgentMark: Utility-Preserving Behavioral Watermarking for Agents

Trend 0

✦ New Signal

acl acl2026 agent ai-safety behavioral-watermarking large-language-models llm watermark

75 3 +0/wk

GitHub

SH

jnMetaCode/shellward

AI Agent Security Middleware — 8-layer defense, DLP data flow, prompt injection detection, zero dependencies. SDK + OpenClaw plugin.

Trend 0

agent-security ai-agent ai-firewall ai-safety ai-security claude-code cursor data-exfiltration dlp guardrails langchain llm-security mcp mcp-security openclaw pii-detection prompt-injection runtime-security security shellward

53 8 +0/wk

GitHub

Top Projects (24)

x-zheng16/Awesome-Embodied-AI-Safety

microsoft/agent-governance-toolkit

OrlojHQ/orloj

romgX/openrelay

node9-ai/node9-proxy

JKHeadley/instar

QWED-AI/qwed-verification

wuyoscar/ISC-Bench

schmitech/orbit

ryoungj/ToolEmu

agentcontrol/agent-control

rhino-acoustic/NeuronFS

jphall663/awesome-machine-learning-interpretability

PKU-Alignment/safe-rlhf

cvs-health/uqlm

ttguy0707/CyberClaw

OpenLMLab/MOSS-RLHF

PacificAI/langtest

cordum-io/cordum

tigerlab-ai/tiger

Govcraft/rust-docs-mcp-server

WindVChen/DiffAttack

Tooooa/AgentMark

jnMetaCode/shellward

Related Topics