NVIDIA/aicr
Tooling for optimized, validated, and reproducible GPU-accelerated AI runtime in Kubernetes
TensorRT, NeMo, Megatron-LM, RAPIDS, cuDF
OpenShell is the safe, private runtime for autonomous AI agents.
Open-source deep-learning framework for exploring, building and deploying AI weather/climate workflows.
LLM KV cache compression made easy
BioNeMo Framework: For building and adapting AI models in drug discovery at scale
A unified library of state-of-the-art (SOTA) model optimization techniques, including quantization, pruning, distillation, and speculative decoding. It compresses deep learning models for downstream deployment frameworks such as TensorRT-LLM, TensorRT, and vLLM to optimize inference speed.
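To illustrate the quantization technique named above, here is a minimal sketch of symmetric per-tensor int8 post-training quantization in plain Python. This is a conceptual illustration only, not the TensorRT Model Optimizer API; every name in it is hypothetical.

```python
# Minimal sketch of symmetric uniform int8 quantization, one of the
# optimization techniques listed above. Illustration only: NOT the
# Model Optimizer API; all names here are hypothetical.

def quantize_int8(weights):
    """Map float weights to int8 values with a single per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero case
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.03, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# Per-weight reconstruction error is bounded by scale / 2.
```

Real toolkits add per-channel scales, calibration over activation statistics, and quantization-aware fine-tuning on top of this basic scale-and-round step.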
The NVIDIA NeMo Agent toolkit is an open-source library for efficiently connecting and optimizing teams of AI agents.
Open-source deep-learning framework for building, training, and fine-tuning deep learning models using state-of-the-art Physics-ML methods
NVIDIA DLSS is a new and improved deep learning neural network that boosts frame rates and generates beautiful, sharp images for your games
NeMo Retriever Library is a scalable, performance-oriented document content and metadata extraction microservice. NeMo Retriever extraction uses specialized NVIDIA NIM microservices to find, contextualize, and extract text, tables, charts and images that you can use in downstream generative applications.
A framework providing Pythonic APIs, algorithms, and utilities to be used with PhysicsNeMo core to physics-inform model training, as well as higher-level abstractions for domain experts.
PyTorch implementation of Audio Flamingo: Series of Advanced Audio Understanding Language Models
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT LLM also contains components to create Python and C++ runtimes that orchestrate the inference execution in a performant way.
This repo contains the source code for RULER: What’s the Real Context Size of Your Long-Context Language Models?
A developer reference project for creating Retrieval Augmented Generation (RAG) chatbots on Windows using TensorRT-LLM
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
A Python framework for GPU-accelerated simulation, robotics, and machine learning.
A GPU-accelerated library containing highly optimized building blocks and an execution engine for data processing to accelerate deep learning training and inference applications.
Generative AI reference workflows optimized for accelerated infrastructure and microservice architecture.
Optimized primitives for collective multi-GPU communication
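To show what a collective primitive like all-reduce does, here is a single-process Python simulation of the ring algorithm commonly used for such collectives. This is a hypothetical sketch of the communication pattern, not the NCCL API (NCCL is a C/CUDA library and exposes none of these names).

```python
# Pure-Python simulation of ring all-reduce, the communication pattern
# behind collective multi-GPU libraries. Illustration only; all names
# are hypothetical and everything runs in one process.

def ring_allreduce(buffers):
    """Sum-reduce equal-length buffers across simulated ranks so that
    every rank ends with the full elementwise sum.

    Classic two-phase ring algorithm: a reduce-scatter of n-1 steps,
    then an all-gather of n-1 steps, moving one chunk per rank per step."""
    n = len(buffers)                       # number of simulated ranks
    chunks = [list(b) for b in buffers]    # working copy per rank
    size = len(chunks[0])
    assert size % n == 0, "buffer length must divide evenly into n chunks"
    c = size // n                          # elements per chunk

    # Phase 1: reduce-scatter. At step s, rank r sends its chunk
    # (r - s) mod n to rank (r + 1) mod n, which accumulates it.
    for s in range(n - 1):
        sends = []                         # collect all sends first to
        for r in range(n):                 # simulate a simultaneous exchange
            start = ((r - s) % n) * c
            sends.append(((r + 1) % n, start, chunks[r][start:start + c]))
        for dst, start, data in sends:
            for i, v in enumerate(data):
                chunks[dst][start + i] += v
    # Now rank r holds the fully reduced chunk (r + 1) mod n.

    # Phase 2: all-gather. At step s, rank r forwards its completed
    # chunk (r + 1 - s) mod n to rank (r + 1) mod n, which overwrites.
    for s in range(n - 1):
        sends = []
        for r in range(n):
            start = ((r + 1 - s) % n) * c
            sends.append(((r + 1) % n, start, chunks[r][start:start + c]))
        for dst, start, data in sends:
            chunks[dst][start:start + c] = data
    return chunks
```

With four simulated ranks each holding four elements, all four ranks finish with the same elementwise sum. The ring pattern keeps per-rank bandwidth constant as the rank count grows, which is why it is a common choice for multi-GPU collectives.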
A Flow-based Generative Network for Speech Synthesis
Ongoing research training transformer models at scale
State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit and 4-bit floating point (FP8 and FP4) precision on Hopper, Ada and Blackwell GPUs, to provide better performance with lower memory utilization in both training and inference.
Transformer-related optimizations, including BERT and GPT
NVIDIA Merlin is an open source library providing end-to-end GPU-accelerated recommender systems, from feature engineering and preprocessing to training deep learning models and running inference in production.
Style transfer, deep learning, feature transform
The LLM vulnerability scanner
Minkowski Engine is an auto-diff neural network library for high-dimensional sparse tensors
CUDA Templates and Python DSLs for High-Performance Linear Algebra
Deep Learning GPU Training System
A library that uses hardware acceleration to load sequences of video frames to facilitate machine learning training
Deep Learning Experiment Management
NVIDIA Deep Learning Dataset Synthesizer (NDDS)
Flowtron is an auto-regressive flow-based generative network for text to speech synthesis with control over speech variation and style transfer
Fast and accurate object detection with end-to-end GPU optimization
Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
Deep learning for recommender systems
Synthesizing and manipulating 2048x1024 images with conditional GANs
A suite of image and video neural tokenizers
Context-Aware RAG library for Knowledge Graph ingestion and retrieval functions.
Unsupervised Language Modeling at scale for robust sentiment classification