huggingface/datasets
🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools
Projects and tools related to computer vision algorithms and applications.
Ultralytics YOLO 🚀
Cross-platform, customizable ML solutions for live and streaming media.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Label Studio is a multi-type data labeling and annotation tool with standardized output format
World's first open-source, agentic video production system. 11 pipelines, 49 tools, 400+ agent skills. Turn your AI coding assistant into a full video production studio.
Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.
Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research
NVR with realtime local object detection for IP cameras
LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.
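Dive into Deep Learning (《动手学深度学习》): written for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.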
WebAI2API: a Camoufox-based tool that exposes web AI as an OpenAI-compatible API. Supports LMArena/Gemini and more, with multi-window concurrency and account isolation.
Free, open-source alternative to Weavy AI, Krea Nodes, Freepik Spaces & FloraFauna AI — node-based AI workflow builder for generative image & video pipelines
Learn it. Build it. Ship it for others.
Design a custom AI inference chip. That is the goal.
Stirling-PDF but for images. 30+ tools and local AI in a single Docker container - resize, compress, remove backgrounds, upscale, OCR, and more. No cloud, no telemetry. Your images never leave your machine.
Local Lens is a privacy-first, AI-powered photo organizer for your PC. Sort and group photos by faces, dates, and locations—all locally, with no cloud upload. Enjoy a modern, intuitive UI and keep your memories organized and secure on your own device.
Always up-to-date, most comprehensive HAR resource — continuously scanned and auto-updated from Papers with Code. 53 datasets integrated across all modalities.
Free self-driving car stack - fully open-source ADAS and autonomous driving system
行影集: your private AI-powered smart photo album
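Computer science graduation projects, machine learning graduation projects, deep learning graduation projects, and original AI projects [source code + thesis]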
ML-powered manga translator, written in Rust.
A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, and Enthusiasts.
PyTorch implementation of "EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation"
A framework for efficient model inference with omni-modality models
✨ Reverse-engineered Python API for Google Gemini web app
1,300+ curated trending AI image prompts from X/Twitter, ranked by engagement. Works with NanoBanana Pro, GPT Image, Midjourney
A unified framework for easy reinforcement learning in Flow-Matching models
SRA's seminar on Introduction to Computer Vision Fundamentals
A Simple, Lightweight, and Extensible Serving Framework for X-AnyLabeling
Reliable, minimal and scalable library for pretraining foundation and world models
An efficient and research-friendly Gaussian Splatting framework described in the CVPR'26 paper "Faster-GS: Analyzing and Improving Gaussian Splatting Optimization"
A high-performance Unreal Engine 5 (UE5) plugin developed by MaLanShan Audio & Video Laboratory, designed for real-time visualization, management, and scalable rendering of 3D Gaussian Splatting (3DGS) and dynamic Volumetric Video (4DGS).
Daily notes on AI papers
The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.
We have summarised all 3D anomaly detection methods and datasets (still updating). A survey repository for multimodal, point-cloud, and pose-agnostic anomaly detection.
NetworkOptix open-source components used to build Powered-by-Nx products including Desktop Client for Network Optix Video Management Platform.
✔ (Completed) Extremely comprehensive deep learning notes [土堆: PyTorch] [Mu Li: Dive into Deep Learning] [Andrew Ng: Deep Learning] [大飞: LLM Agents]
Effortless data labeling with AI support from Segment Anything and other awesome models.
[ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning.
SwarmUI (formerly StableSwarmUI), a modular Stable Diffusion web user interface, with an emphasis on easily accessible power tools, high performance, and extensibility.
An automated black-box testing framework based on image recognition
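[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AIGC algorithm engineers. Covers practical interview and written-test experience and core knowledge from across the AI industry, including AIGC, LLMs, AI agents, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, big data mining, embodied AI, the metaverse, and AGI.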
Become a cracked AI/ML Research Engineer
Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.
SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark
Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent — watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC.
Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.
Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.
Declarative way to run AI models in React Native on device, powered by ExecuTorch.
AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.
Automatically find issues in image datasets and practice data-centric computer vision.
[ECCV'24 & ICLR'25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians
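🚀 Reverse-engineered API for 即梦 3.0 [specialty: top-tier image generation]. Zero-configuration deployment and multi-token support. For testing only; for commercial use, please go to the official open platform.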
Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)
Image Background Removal Toolkit - Open Source and API Models
🔥🔥🔥 A paper list of some recent Computer Vision (CV) works
Processed / Cleaned Data for Paper Copilot
CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold
PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows
⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for healthcare professionals, researchers and patients.
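🔥🔥🔥 Free, offline AI algorithm toolbox for Java. Supports face recognition, liveness detection, facial expression recognition, object detection, instance segmentation, pedestrian detection, OCR text recognition, license plate recognition, table recognition, ASR+TTS, machine translation, and more; add it as a Maven dependency to use it. Supports PyTorch and Tensorflow, and integrates mainstream models such as Mtcnn, InsightFace, SeetaFace6, YOLOv8~v12, PaddleOCR (PPOCRv5), and Whisper.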
A minimalist SOTA LaTeX OCR model with only 20M parameters, running in browser. Full training pipeline available for self-reproduction.
OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
Vision-and-Language Navigation in Continuous Environments using Habitat
DeepInverse: a PyTorch library for solving imaging inverse problems using deep learning
The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment
A new algorithm for retrieving topological skeleton as a set of polylines from binary images
This repository provides a comprehensive list of radar and optical satellite datasets curated for ship detection, classification, semantic segmentation, and instance segmentation tasks. These datasets are ideal for applications in computer vision, machine learning, remote sensing, and maritime analysis.
Books / PDFs / EPUBs for different fields of programming. Read, grow, and enjoy 😊😊😊😊
This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"
The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.
Local monitoring + AI vision: LAN-based, smartphone-powered AI monitoring framework with structured event output for data acquisition and analysis.
This repository contains Python code for, or links to the original websites of, traditional change detection methods such as SFA and MAD, and deep learning-based change detection methods such as SiamCRNN, DSFA, and some FCN-based methods.
Node Creative Coding / 3D / Image Processing tool inspired by Quartz Composer
AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.
Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.
PhotonVision is the free, fast, and easy-to-use computer vision solution for the FIRST Robotics Competition.
A visual-based graph node editor for training computer vision models.
Next-generation Albumentations: dual-licensed for open-source and commercial use
A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI framework, it provides an efficient, scalable, and user-friendly solution for running MLX-based vision and language models locally with an OpenAI-compatible interface.
Powerful & Easy-to-Use Video Face Swapping and Editing Software
MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.
Securade.ai HUB - A generative AI based edge platform for computer vision that connects to existing CCTV cameras and makes them smart.
Deploy Computer Vision solutions with a few lines of code.
Create and control 3D shapes using hand gestures in real-time. Built with mediapipe computer vision and threejs
Create a digital-twin style traffic visualization using only mp4 CCTV footage and its Google Maps location.
A Python library for efficient image generation using CSS Flexbox
Machine vision toolbox for Python
GammaCV is a WebGL accelerated Computer Vision library for browser
All Computer Vision Projects - Beginner to Advanced
FFHQR -- the first large-scale retouching dataset for computer vision research.
c4 GenAI Suite
100+ AI, machine learning, deep learning, computer vision, and NLP projects with code
Hand-controlled arpeggiator, drum machine, and audio reactive visualizer. Built with mediapipe computer vision, threejs, tonejs
A structured learning reference for computer vision: from image fundamentals to research frontiers
An app for collecting raw RGB-D scans on iOS devices.