AI Safety

RISING

Projects and tools focused on ensuring the safe deployment and use of AI.

Active projects 24
New this week +24
Total star growth +54
13.3k
Total Stars
1.6k
Total Forks
0
Multi-Source Repos
+54
Stars This Period

Top Projects (24)

AE

x-zheng16/Awesome-Embodied-AI-Safety

Safety in Embodied AI: A Survey of Risks, Attacks, and Defenses | 400+ Papers | Perception, Cognition, Planning, Interaction, Agentic System

Trend 4
🔥 Heating Up +13.5%
adversarial-attacks ai-safety autonomous-driving backdoor-attacks embodied-agents embodied-ai jailbreak large-language-models multimodal robotics survey
59 0 +4/wk
GitHub
AG

microsoft/agent-governance-toolkit

AI Agent Governance Toolkit — Policy enforcement, zero-trust identity, execution sandboxing, and reliability engineering for autonomous AI agents. Covers 10/10 OWASP Agentic Top 10.

Trend 4
🔥 Heating Up +11.1%
agent-framework ai-agents ai-safety compliance governance microsoft owasp policy-engine python security trust zero-trust
843 152 +32/wk
GitHub
OR

OrlojHQ/orloj

An orchestration runtime for multi-agent AI systems. Declare agents, tools, and policies as YAML; Orloj schedules, executes, routes, and governs them for production-grade operation.

Trend 3
agent-framework agentic-ai agentic-orchestration ai ai-agents ai-governance ai-safety declarative guardrails infrastructure-as-code llm llmops mlops multi-agent open-source orchestration platform-engineering workflow-orchestration yaml
70 7 +0/wk
GitHub
OP

romgX/openrelay

几百个免费 AI 模型配额,一键接入本地项目。| Hundreds of free AI model quotas, one-click access to local projects.

Trend 3
ai ai-proxy aider cerebras claude claude-code copilot cursor developer-tools free-ai free-api groq kiro llm-proxy model-router openai openclaw proxy windsurf
263 28 +5/wk
GitHub
NP

node9-ai/node9-proxy

The Execution Security Layer for the Agentic Era. Providing deterministic "Sudo" governance and audit logs for autonomous AI agents.

Trend 3
ai-safety ai-security claude-code gemini gemini-cli llm llm-agent mcp-server
110 10 +0/wk
GitHub
IN

JKHeadley/instar

Persistent Claude Code agents with scheduling, sessions, memory, and Telegram.

Trend 3
agent-framework agent-identity agent-infrastructure agent-memory agent-skills ai-agents ai-safety autonomous-agents claude-code cli cron job-scheduler llm mcp npm-package open-source persistency telegram-bot typescript whatsapp
52 15 +0/wk
GitHub
QV

QWED-AI/qwed-verification

AISecOps (AI Security Operations) framework for deterministic verification of AI systems. QWED verifies LLM outputs using math, logic, and symbolic execution — creating an auditable trust boundary for agentic AI systems. Not generation. Verification.

Trend 3
ai-accuracy ai-safety ai-security aisecops code-security deterministic-ai enterprise-ai formal-verification generative-ai hallucination hallucination-detection llm-safety llm-verification machine-learning neurosymbolic-ai nlp python smt-solver sympy z3-prover
51 7 +0/wk
GitHub
IB

wuyoscar/ISC-Bench

Internal Safety Collapse: Turning the LLM or an AI Agent into a sensitive data generator.

Trend 3
adversarial-attacks agent-safety ai-safety benchmark frontier-models jailbreak large-language-models llm-safety red-teaming safety-evaluation
844 130 +2/wk
GitHub
OR

schmitech/orbit

One API for 20+ LLM providers, your databases, and your files — self-hosted, open-source AI gateway with RAG, voice, and guardrails.

Trend 3
ai-assistant ai-gateway ai-safety anthropic chatbot developer-tools elasticsearch llm mongodb natural-language-to-sql ollama-client openai python rag retrieval-augmented-generation self-hosted speech-to-text text-to-speech vector-database
248 41 +0/wk
GitHub
TO

ryoungj/ToolEmu

[ICLR'24 Spotlight] A language model (LM)-based emulation framework for identifying the risks of LM agents with tool use

Trend 3
agent ai-safety language-agent language-model large-language-models prompt-engineering
197 21 +1/wk
GitHub
AC

agentcontrol/agent-control

Centralized agent control plane for governing runtime agent behavior at scale. Configurable, extensible, and production-ready.

Trend 3
agentic-workflow ai-safety guardrails llm runtime-guardrails
194 21 +1/wk
GitHub
NE

rhino-acoustic/NeuronFS

mkdir beats vector DB. B-tree NeuronFS: 0-byte folders govern AI — ₩0 infrastructure, ~200x token efficiency. OS-native constraint engine for LLM agents.

Trend 3
ai-agent ai-safety cursor-rules file-system guardrails llm model-agnostic multi-agent neuronfs prompt-engineering structure-is-context zero-cost
129 22 +1/wk
GitHub
AM

jphall663/awesome-machine-learning-interpretability

A curated list of awesome responsible machine learning resources.

Trend 3
ai-safety awesome awesome-list data-science explainable-ml fairness interpretability interpretable-ai interpretable-machine-learning interpretable-ml machine-learning machine-learning-interpretability privacy-enhancing-technologies privacy-preserving-machine-learning python r reliable-ai secure-ml transparency xai
4.0k 625 +1/wk
GitHub
SR

PKU-Alignment/safe-rlhf

Safe RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback

Trend 3
ai-safety alpaca beaver datasets deepspeed gpt large-language-models llama llm llms reinforcement-learning reinforcement-learning-from-human-feedback rlhf safe-reinforcement-learning safe-reinforcement-learning-from-human-feedback safe-rlhf safety transformer transformers vicuna
1.6k 132 +0/wk
GitHub
UQ

cvs-health/uqlm

UQLM: Uncertainty Quantification for Language Models, is a Python package for UQ-based LLM hallucination detection

Trend 3
ai-evaluation ai-safety confidence-estimation confidence-score hallucination hallucination-detection hallucination-evaluation hallucination-mitigation llm llm-evaluation llm-hallucination llm-safety uncertainty-estimation uncertainty-quantification
1.1k 119 +0/wk
GitHub
CY

ttguy0707/CyberClaw

👾 下一代透明智能体架构 | Next-Gen Transparent Agent Architecture 🔍 全行为审计 | 🛡️ 两段式安全调用 | 🧠 双水位记忆 | ⏰ 心跳任务 📊 P0 级事故率降低 80% | 兼容 OpenClaw + Claude Code 技能生态

Trend 2
New Signal
agent agent-framework ai-agent ai-safety claude-code cross-platform enterprise-ai langchain langgraph llm openclaw python transparent-ai two-phase-invocation
39 6 +8/wk
GitHub
MR

OpenLMLab/MOSS-RLHF

Secrets of RLHF in Large Language Models Part I: PPO

Trend 0
ai-safety alignment rlhf
1.4k 105 +0/wk
GitHub
LA

PacificAI/langtest

Deliver safe & effective language models

Trend 0
ai-safety ai-testing artificial-intelligence benchmark-framework benchmarks ethics-in-ai large-language-models llm llm-as-evaluator llm-evaluation-toolkit llm-test llm-testing ml-safety ml-testing mlops model-assessment nlp responsible-ai trustworthy-ai
556 49 +0/wk
GitHub
CO

cordum-io/cordum

The open agent control plane. Govern autonomous AI agents with pre-execution policy enforcement, approval gates, and audit trails. Works with LangChain, CrewAI, MCP, and any framework.

Trend 0
agent-framework agentic-ai ai-agent ai-governance ai-orchestration ai-safety audit-trail autonomous-agents control-plane devops governance human-in-the-loop llm llm-agents mcp model-context-protocol nats policy-engine safety-kernel workflow-engine
459 21 -1/wk
GitHub
TI

tigerlab-ai/tiger

Open Source LLM toolkit to build trustworthy LLM applications. TigerArmor (AI safety), TigerRAG (embedding, RAG), TigerTune (fine-tuning)

Trend 0
ai-safety aisafety classification data-augmentation fine-tuning large-language-models llm llm-training rag
401 27 +0/wk
GitHub
RD

Govcraft/rust-docs-mcp-server

🦀 Prevents outdated Rust code suggestions from AI assistants. This MCP server fetches current crate docs, uses embeddings/LLMs, and provides accurate context via a tool call.

Trend 0
ai ai-safety caching cargo coding-assistant context-aware crates-io developer-tools embeddings information-retrieval llm mcp mcp-server openai rag rust rust-library rustdoc rustlang semantic-search
268 33 +0/wk
GitHub
DI

WindVChen/DiffAttack

An unrestricted attack based on diffusion models that can achieve both good transferability and imperceptibility.

Trend 0
adverarial-attacks ai-safety diffusion-adversarial-attack diffusion-models imperceptible-attacks transferable-attacks unrestricted-attacks
261 19 +0/wk
GitHub
AG

Tooooa/AgentMark

【ACL 2026 Main】AgentMark: Utility-Preserving Behavioral Watermarking for Agents

Trend 0
New Signal
acl acl2026 agent ai-safety behavioral-watermarking large-language-models llm watermark
75 3 +0/wk
GitHub
SH

jnMetaCode/shellward

AI Agent Security Middleware — 8-layer defense, DLP data flow, prompt injection detection, zero dependencies. SDK + OpenClaw plugin.

Trend 0
agent-security ai-agent ai-firewall ai-safety ai-security claude-code cursor data-exfiltration dlp guardrails langchain llm-security mcp mcp-security openclaw pii-detection prompt-injection runtime-security security shellward
53 8 +0/wk
GitHub

Related Topics