
thu-ml/SageAttention

[ICLR 2025, ICML 2025, NeurIPS 2025 Spotlight] Quantized attention achieving a 2-5x speedup over FlashAttention without losing end-to-end metrics across language, image, and video models.
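
For context on what the repository provides: its README (at the time of writing) documents `sageattn` as a drop-in replacement for PyTorch scaled dot-product attention. A minimal usage sketch, assuming a CUDA device and the API published in the README:

```python
import torch
from sageattention import sageattn  # installed per the repo's README

# Toy tensors in the (batch, heads, seq_len, head_dim), i.e. "HND", layout
q = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
k = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")
v = torch.randn(1, 32, 1024, 128, dtype=torch.float16, device="cuda")

# Quantized attention kernel, intended as a drop-in for
# torch.nn.functional.scaled_dot_product_attention
out = sageattn(q, k, v, tensor_layout="HND", is_causal=False)
```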

Stars: 3.3k · Forks: 389 · Weekly growth: +0
Topics: attention, cuda, efficient-attention, inference-acceleration, llm, llm-infra, mlsys, quantization, triton, video-generate, video-generation, vit

[Chart: Star & Fork Trend (19 data points); series: Stars, Forks]

Multi-Source Signals

Growth Velocity

thu-ml/SageAttention gained +0 stars this period. 7-day velocity: 0.1%.
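
The page does not show how velocity is derived; one plausible reading treats it as the 7-day star gain relative to the current total. A minimal sketch under that assumption (the function name and figures below are hypothetical):

```python
def weekly_velocity(stars_now: int, stars_week_ago: int) -> float:
    """Hypothetical 7-day velocity: star gain as a percentage of the current total."""
    if stars_now == 0:
        return 0.0
    return 100.0 * (stars_now - stars_week_ago) / stars_now

# A gain of ~3 stars on a ~3,270-star repository rounds to the quoted 0.1%,
# while still displaying as +0 in the coarser weekly-growth counter.
print(f"{weekly_velocity(3270, 3267):.1f}%")  # 0.1%
```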


Metric         SageAttention  Torch-Pruning  Awesome-Code-LLM  Acontext
Stars          3.3k           3.3k           3.3k              3.3k
Forks          389            377            225               309
Weekly Growth  +0             +1             +0                +1
Language       Cuda           Python         N/A               TypeScript
Sources        1              1              1                 1
License        Apache-2.0     MIT            N/A               Apache-2.0

Capability Radar vs Torch-Pruning

[Radar chart; series: SageAttention, Torch-Pruning]
Maintenance Activity: 57. Last code push was 81 days ago.

Community Engagement: 59. Fork-to-star ratio of 11.9% indicates an active community that forks and contributes.

Issue Burden: 70. Issue data is not yet available.

Growth Momentum: 30. No measurable growth in the current period (a first-day cold start is expected).

License Clarity: 95. Licensed under Apache-2.0, a permissive license that is safe for commercial use.

Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.
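
The scoring formulas themselves are not disclosed. As an illustration only, the two raw signals quoted above (fork-to-star ratio and push recency) can be reproduced from the GitHub REST API's repository fields `forks_count`, `stargazers_count`, and `pushed_at`; the helper names here are hypothetical:

```python
from datetime import datetime, timezone

def fork_to_star_ratio(forks: int, stars: int) -> float:
    """Community-engagement signal: forks as a percentage of stars."""
    return 100.0 * forks / stars if stars else 0.0

def days_since_push(pushed_at_iso: str) -> int:
    """Maintenance signal: whole days since the last code push."""
    pushed = datetime.fromisoformat(pushed_at_iso.replace("Z", "+00:00"))
    return (datetime.now(timezone.utc) - pushed).days

# 389 forks on roughly 3,270 stars reproduces the 11.9% quoted above.
print(f"{fork_to_star_ratio(389, 3270):.1f}%")  # 11.9%
```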