Computer Vision | AISignal

DA

huggingface/datasets

🤗 The largest hub of ready-to-use datasets for AI models with fast, easy-to-use and efficient data manipulation tools

Trend 22

ai artificial-intelligence computer-vision dataset-hub datasets deep-learning huggingface llm machine-learning natural-language-processing nlp numpy pandas pytorch speech tensorflow

21.4k 3.2k +1/wk

GitHub HuggingFace PyPI 3-source

UL

ultralytics/ultralytics

Ultralytics YOLO 🚀

Trend 19

cli computer-vision deep-learning hub image-classification instance-segmentation machine-learning object-detection pose-estimation python pytorch rotated-object-detection segment-anything tracking ultralytics yolo yolo-world yolo11 yolo26 yolov8

55.6k 10.7k +46/wk

GitHub PyPI arxiv 3-source

ME

google-ai-edge/mediapipe

Cross-platform, customizable ML solutions for live and streaming media.

Trend 18

android audio-processing c-plus-plus calculator computer-vision deep-learning framework graph-based graph-framework inference machine-learning mediapipe mobile-development perception pipeline-framework stream-processing video-processing

34.6k 5.9k +24/wk

GitHub PyPI 2-source

EA

JaidedAI/EasyOCR

Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

Trend 18

cnn crnn data-mining deep-learning easyocr image-processing information-retrieval lstm machine-learning ocr optical-character-recognition python pytorch scene-text scene-text-recognition

29.3k 3.6k +7/wk

GitHub PyPI 2-source

LS

HumanSignal/label-studio

Label Studio is a multi-type data labeling and annotation tool with standardized output format

Trend 14

annotation annotation-tool annotations boundingbox computer-vision data-labeling dataset datasets deep-learning image-annotation image-classification image-labeling image-labelling-tool label-studio labeling labeling-tool mlops semantic-segmentation text-annotation yolo

27.0k 3.5k +12/wk

GitHub PyPI 2-source

OP

calesthio/OpenMontage

World's first open-source, agentic video production system. 11 pipelines, 49 tools, 400+ agent skills. Turn your AI coding assistant into a full video production studio.

Trend 14

⚡ Breakout +112.1%

agent agentic-ai ai claude copilot cursor elevenlabs ffmpeg flux image-generation open-source openai python remotion stable-diffusion text-to-speech text-to-video video-generation video-production

844 149 +91/wk

GitHub

IN

invoke-ai/InvokeAI

Invoke is a leading creative engine for Stable Diffusion models, empowering professionals, artists, and enthusiasts to generate and create visual media using the latest AI-driven technologies. The solution offers an industry leading WebUI, and serves as the foundation for multiple commercial products.

Trend 13

ai-art artificial-intelligence generative-art image-generation img2img inpainting latent-diffusion linux macos outpainting stable-diffusion txt2img windows

27.0k 2.8k +15/wk

GitHub PyPI 2-source

VP

lucidrains/vit-pytorch

Implementation of Vision Transformer, a simple way to achieve SOTA in vision classification with only a single transformer encoder, in PyTorch

Trend 13

artificial-intelligence attention-mechanism computer-vision image-classification transformers

25.0k 3.5k +1/wk

GitHub PyPI 2-source

KH

khoj-ai/khoj

Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.

Trend 13

agent ai assistant chat chatgpt emacs image-generation llama3 llamacpp llm obsidian obsidian-md offline-llm productivity rag research self-hosted semantic-search stt whatsapp-ai

34.0k 2.1k +25/wk

GitHub HuggingFace PyPI 3-source

AI

microsoft/AirSim

Open source simulator for autonomous vehicles built on Unreal Engine / Unity, from Microsoft AI & Research

Trend 11

ai airsim artificial-intelligence autonomous-quadcoptor autonomous-vehicles computer-vision control-systems cross-platform deep-reinforcement-learning deeplearning drones pixhawk platform-independent research self-driving-car simulator unreal-engine

18.1k 4.9k +4/wk

GitHub HuggingFace PyPI 3-source

FR

blakeblackshear/frigate

NVR with realtime local object detection for IP cameras

Trend 10

ai camera google-coral home-assistant home-automation homeautomation mqtt nvr object-detection realtime rtsp tensorflow

31.3k 3.0k +16/wk

GitHub PyPI 2-source

LO

mudler/LocalAI

LocalAI is the open-source AI engine. Run any model - LLMs, vision, voice, image, video - on any hardware. No GPU required.

Trend 7

agents ai api audio-generation decentralized distributed image-generation libp2p llama llm mamba mcp musicgen object-detection rerank stable-diffusion text-generation tts

45.1k 3.9k +56/wk

GitHub HuggingFace PyPI 3-source

DZ

d2l-ai/d2l-zh

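Dive into Deep Learning: for Chinese readers, runnable, and open to discussion. The Chinese and English editions are used for teaching at more than 500 universities in over 70 countries.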

Trend 5

book chinese computer-vision deep-learning machine-learning natural-language-processing notebook python

76.9k 12.2k +25/wk

GitHub PyPI 2-source

WE

foxhui/WebAI2API

WebAI2API: Web AI to OpenAI API via Camoufox. Supports LMArena/Gemini and more, with multi-window concurrency and account isolation.

Trend 5

🔥 Heating Up +23.9%

ai-tools browser-automation generative-ai image-generation openai-api text-generation text-to-image web-scraping

368 114 +26/wk

GitHub

VW

SamurAIGPT/Vibe-Workflow

Free, open-source alternative to Weavy AI, Krea Nodes, Freepik Spaces & FloraFauna AI — node-based AI workflow builder for generative image & video pipelines

Trend 4

ai ai-workflow-builder artistic-intelligence comfyui creative-tools fastapi florafauna-ai-alternative freepik-spaces-alternative generative-ai image-generation krea-nodes-alternative nextjs node-editor open-source self-hosted-ai video-generation weavy weavy-ai-alternative weavyai workflow-automation

112 26 +3/wk

GitHub

AE

rohitg00/ai-engineering-from-scratch

Learn it. Build it. Ship it for others.

Trend 4

agents ai ai-agents ai-engineering computer-vision course deep-learning from-scratch generative-ai llm machine-learning mcp nlp python reinforcement-learning rust swarm-intelligence transformers tutorial typescript

2.0k 381 +31/wk

GitHub

AH

ai-hpc/ai-hardware-engineer-roadmap

Design a custom AI inference chip. That is the goal.

Trend 4

autonomous-driving computer-vision cuda deep-learning digital-design edge-ai embedded-linux embedded-system gpu gpu-programming hpc mlir nvidia nvidia-jeston opencl rtos sensor-fusion tinygrad verilog xilinx

54 16 +0/wk

GitHub

SI

stirling-image/stirling-image

Stirling-PDF but for images. 30+ tools and local AI in a single Docker container - resize, compress, remove backgrounds, upscale, OCR, and more. No cloud, no telemetry. Your images never leave your machine.

Trend 4

ai docker homelab image-editor image-processing open-source self-hosted

577 15 +12/wk

GitHub

LO

ashesbloom/LocalLens

Local Lens is a privacy-first, AI-powered photo organizer for your PC. Sort and group photos by faces, dates, and locations—all locally, with no cloud upload. Enjoy a modern, intuitive UI and keep your memories organized and secure on your own device.

Trend 4

ai-tools automated-categorization computer-vision cross-platform desktop-app face-recognition facial-recognition fastapi gui-application machine-learning offline-processing photo-management photo-organization photography privacy-first productivity python react tauri windows-installer

109 6 +2/wk

GitHub

AH

Leooo-Huang/awesome-human-activity-recognition

Always up-to-date, most comprehensive HAR resource — continuously scanned and auto-updated from Papers with Code. 53 datasets integrated across all modalities.

Trend 3

action-recognition awesome awesome-list benchmark computer-vision datasets deep-learning human-activity-recognition machine-learning motion-detection pose-estimation

93 1 +2/wk

GitHub

AV

autowarefoundation/autoware_vision_pilot

Free self-driving car stack - fully open-source ADAS and autonomous driving system

Trend 3

adas advanced-driver-assistance-systems artificial-intelligence autonomous-driving autopilot autoware computer-vision deep-learning deep-neural-networks end-to-end-machine-learning foundation-models open-source robotics self-driving-car

461 99 +7/wk

GitHub

TR

LC044/TrailSnap

TrailSnap (行影集): your private AI-powered smart photo album

Trend 3

ai album photo

299 39 +6/wk

GitHub

TE

TensorCEO/TensorCEO

Computer science, machine learning, and deep learning graduation projects; original AI projects [source code + thesis]

Trend 3

ai-projects artificial-intelligence computer-science-project computer-vision deep-learning dl-projects final-year-project flask graduation-design graduation-thesis machine-learning ml-projects python thesis-project

108 13 +2/wk

GitHub

KO

mayocream/koharu

ML-powered manga translator, written in Rust.

Trend 3

computer-vision deep-learning gpu japanese manga rust tauri

2.0k 101 +10/wk

GitHub

AV

gracezhao1997/Awesome-Video-World-Models-with-AR-Diffusion

A Curated List of Awesome Video World Models with AR Diffusion: Covering Algorithms, Applications, and Infrastructure, Aimed at Serving as a Comprehensive Resource for Researchers, Practitioners, and Enthusiasts.

Trend 3

ar-diffusion autoregressive awesome-list computer-vision diffusion-models generative-ai video-generation world-models

364 13 +8/wk

GitHub

ED

Intellindust-AI-Lab/EdgeCrafter

PyTorch implementation of "EdgeCrafter: Compact ViTs for Edge Dense Prediction via Task-Specialized Distillation"

Trend 3

computer-vision dinov3 distillation instance-segmentation lightweight object-detection pose-estimation real-time

113 11 +0/wk

GitHub

VO

vllm-project/vllm-omni

A framework for efficient model inference with omni-modality models

Trend 3

audio-generation diffusion image-generation inference model-serving multimodal pytorch transformer video-generation

4.2k 718 +26/wk

GitHub

GA

HanaokaYuzu/Gemini-API

✨ Reverse-engineered Python API for Google Gemini web app

Trend 3

ai api async bard chatbot gemini generative-ai google google-gemini image-generation imagefx llm nano-banana python reverse-engineering

2.6k 379 +20/wk

GitHub

NT

jau123/nanobanana-trending-prompts

1,300+ curated trending AI image prompts from X/Twitter, ranked by engagement. Works with NanoBanana Pro, GPT Image, Midjourney

Trend 3

awesome-list gemini3proimage gpt-image image-generation midjourney nanobanana nanobananapro prompt-engineering prompts

366 41 +2/wk

GitHub

FF

X-GenGroup/Flow-Factory

A unified framework for easy reinforcement learning in Flow-Matching models

Trend 3

diffusion flow-matching image-generation reinforcement-learning video-generation

318 22 +2/wk

GitHub

PI

SRA-VJTI/Pixels

SRA's seminar on Introduction to Computer Vision Fundamentals

Trend 3

build-system computer-vision cpp git github image-processing makefile numpy opencv python

189 146 +0/wk

GitHub

XA

CVHub520/X-AnyLabeling-Server

A Simple, Lightweight, and Extensible Serving Framework for X-AnyLabeling

Trend 3

annotation-tool clip computer-vision deep-learning grounding-dino image-classification image-labeling-tool instance-segmentation labeling-tool machine-learning object-detection pose-estimation pytorch rotated-object-detection segment-anything transformers vision-language-model x-anylabeling yolo

188 27 +0/wk

GitHub

SP

galilai-group/stable-pretraining

Reliable, minimal and scalable library for pretraining foundation and world models

Trend 3

computer-vision computer-vision-algorithms contrastive-learning deep-learning distributed foundation-models joint-embedding joint-embedding-predictive-architecture large-language-model multimodal-learning pytorch self-supervised-learning stable-pretraining transformers

181 34 +1/wk

GitHub

FG

nerficg-project/faster-gaussian-splatting

An efficient and research-friendly Gaussian Splatting framework described in the CVPR'26 paper "Faster-GS: Analyzing and Improving Gaussian Splatting Optimization"

Trend 3

computer-graphics computer-vision gaussian-splatting novel-view-synthesis

139 17 +2/wk

GitHub

MU

mlslabs/MLSLabsGaussianSplattingRenderer-UE

A high-performance Unreal Engine 5 (UE5) plugin developed by MaLanShan Audio & Video Laboratory, designed for real-time visualization, management, and scalable rendering of 3D Gaussian Splatting (3DGS) and dynamic Volumetric Video (4DGS).

Trend 3

3dgs 4dgs computer-graphics computer-vision gaussian-splatting radiance-field ue5 unreal-engine volumetric-video

137 12 +0/wk

GitHub

PN

AkihikoWatanabe/paper_notes

Daily notes on AI papers

Trend 3

adaptive-learning blog computer-vision educational-data-mining language-model learning-analytics machine-learning nlp notes paper recommender-systems technology

107 2 +1/wk

GitHub

CE

cerul-ai/cerul

The video search layer for AI agents. Search video by meaning — across speech, visuals, and on-screen text.

Trend 3

ai-agent ai-agents api computer-vision multimodal neon-postgres open-source pgvectorscale rag semantic-search skills understanding video-search video-search-engine

96 4 +0/wk

GitHub

A3

M-3LAB/awesome-3d-anomaly-detection

We have summarised all 3D anomaly detection methods and datasets (still updating). A survey repository for multimodal, point-cloud, and pose-agnostic anomaly detection (continuously updated).

Trend 3

3d anomaly-detection anomaly-segmentation awesome-lists computer-vision datasets graphics llms point-cloud reviews three-dimensional

93 0 +0/wk

GitHub

NO

networkoptix/nx_open

NetworkOptix open-source components used to build Powered-by-Nx products including Desktop Client for Network Optix Video Management Platform.

Trend 3

ai camera desktop-client meta networkoptix nx nx-meta object-detection onvif video-processing vms webrtc

73 31 +1/wk

GitHub

CV

AccumulateMore/CV

✔ (Completed) Extremely comprehensive deep learning notes [Tudui PyTorch] [Mu Li's Dive into Deep Learning] [Andrew Ng's Deep Learning] [Dafei's LLM Agents]

Trend 3

agent agents book chinese computer-vision cv deep-learning jupyter-notebook llm llms machine-learning natural-language-processing nlp notebook python rag

19.5k 2.2k +76/wk

GitHub PyPI 2-source

XA

CVHub520/X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Trend 3

artificial-intelligence clip computer-vision deep-learning groundingdino image-annotation-tool image-classification image-labeling-tool image-matting instance-segmentation machine-learning object-detection ocr onnxruntime paddlepaddle pose-estimation rotated-object-detection sam vision-language-model yolo

8.7k 932 +10/wk

GitHub

RD

roboflow/rf-detr

[ICLR 2026] RF-DETR is a real-time object detection and segmentation model architecture developed by Roboflow, SOTA on COCO, designed for fine-tuning.

Trend 3

computer-vision detr instance-segmentation machine-learning object-detection rf-detr sota

6.3k 758 +13/wk

GitHub

SW

mcmonkeyprojects/SwarmUI

SwarmUI (formerly StableSwarmUI), a modular Stable Diffusion web user interface, with an emphasis on making power tools easily accessible, high performance, and extensibility.

Trend 3

ai comfyui csharp image-generation javascript machine-learning ml python stable-diffusion

3.9k 389 +6/wk

GitHub

MA

MaaXYZ/MaaFramework

An automation black-box testing framework based on image recognition

Trend 3

black-box-testing computer-vision

3.7k 400 +12/wk

GitHub

AI

WeThinkIn/AIGC-Interview-Book

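[Three Years of Interviews, Five Years of Mock Exams] An interview handbook for AIGC algorithm engineers. Covers practical interview and written-test experience and core knowledge across the AI industry, including AIGC, LLMs, AI agents, traditional deep learning, autonomous driving, machine learning, computer vision, natural language processing, reinforcement learning, big data mining, embodied intelligence, the metaverse, and AGI.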

Trend 3

ai-agent aigc computer-vision deep-learning interview interview-preparation interview-questions interviews-solutions large-language-models machine-learning natural-language-processing openclaw stable-diffusion transformer

3.4k 379 +12/wk

GitHub

MC

HenryNdubuaku/maths-cs-ai-compendium

Become a cracked AI/ML Research Engineer

Trend 3

ai-textbook algorithms artificial-intelligence computer-science computer-vision deep-learning jax linear-algebra machine-learning machine-learning-algorithms math mathematics multimodal-learning nlp probability python reinforcement-learning speech-processing statistics

3.0k 428 +14/wk

GitHub

LS

MrNeRF/LichtFeld-Studio

Train, inspect, edit, automate, and export 3D Gaussian Splatting scenes from a single native application.

Trend 3

computer-graphics computer-vision cuda gaussian-splatting optimization

2.8k 287 +8/wk

GitHub

MA

haosulab/ManiSkill

SAPIEN Manipulation Skill Framework, an open source GPU parallelized robotics simulator and benchmark

Trend 3

3d-computer-vision computer-vision embodied-ai reinforcement-learning robot-learning robot-manipulation robotics robotics-simulation simulation-environment

2.8k 460 +5/wk

GitHub

DE

SharpAI/DeepCamera

Open-Source AI Camera Skills Platform, AI NVR & CCTV Surveillance. Local VLM video analysis with Qwen, DeepSeek, SmolVLM, LLaVA, YOLO26. LLM-powered agentic security camera agent — watches, understands, remembers & guards your home via Telegram, Discord or Slack. Pluggable AI skills. OpenAI, Google, Anthropic or local AI. Runs on Mac Mini & AI PC.

Trend 3

ai ai-camera ai-nvr camera cctv computer-vision deep-learning face-recognition home-assistant home-security llama-cpp llm local-ai machine-learning object-detection python raspberry-pi security-camera video-surveillance vlm

2.7k 425 +1/wk

GitHub

PI

pixeltable/pixeltable

Data Infrastructure providing a declarative, incremental approach for multimodal AI workloads.

Trend 3

ai artificial-intelligence chatbot computer-vision data-science database feature-engineering feature-store genai llm machine-learning ml mlops multimodal vector-database

1.6k 207 +3/wk

GitHub

MM

MiniMax-AI/MiniMax-MCP

Official MiniMax Model Context Protocol (MCP) server that enables interaction with powerful Text to Speech, image generation and video generation APIs.

Trend 3

image-generation image-to-video mcp mcp-server mcp-tools text-to-image text-to-speech text-to-video video-generation voice-cloning

1.4k 246 +3/wk

GitHub

RN

software-mansion/react-native-executorch

Declarative way to run AI models in React Native on device, powered by ExecuTorch.

Trend 3

computer-vision executorch image-embeddings llm-inference object-detection ocr on-device-ai react-native-ai segmentation speech-to-text text-embeddings text-to-speech vlm

1.4k 68 +1/wk

GitHub

NB

YouMind-OpenLab/nano-banana-pro-prompts-recommend-skill

AI skill for OpenClaw & Claude Code — recommend from 10000+ Nano Banana Pro (Gemini) image prompts. Smart search by use case, content remix, sample images.

Trend 3

ai-agent ai-image claude-code-skill clawhub content-creation gemini image-generation nano-banana openclaw openclaw-skill prompt-engineering prompt-library

1.4k 139 +5/wk

GitHub

CL

cleanlab/cleanvision

Automatically find issues in image datasets and practice data-centric computer vision.

Trend 3

computer-vision data-centric-ai data-exploration data-profiling data-quality data-science data-validation deep-learning exploratory-data-analysis image-analysis image-classification image-generation image-quality image-segmentation

1.2k 77 +0/wk

GitHub

CI

Linketic/CityGaussian

[ECCV'24 & ICLR'25] CityGaussian Series for High-quality Large-Scale Scene Reconstruction with Gaussians

Trend 3

3d computer-vision eccv2024 gaussian-splatting graphics iclr2025 large-scale level-of-details neural-network neural-rendering novel-view-synthesis radiance-field surface-reconstruction

1.1k 96 +0/wk

GitHub

JF

LLM-Red-Team/jimeng-free-api

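🚀 Reverse-engineered API for Jimeng 3.0 [strength: top-tier image generation]. Zero-configuration deployment and multi-token support. For testing only; for commercial use, please go to the official open platform.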

Trend 3

bytedance chatbot chatgpt-api image-generation image-generation-ai jimeng llm

1.1k 282 +1/wk

GitHub

SI

simpler-env/SimplerEnv

Evaluating and reproducing real-world robot manipulation policies (e.g., RT-1, RT-1-X, Octo) in simulation under common setups (e.g., Google Robot, WidowX+Bridge) (CoRL 2024)

Trend 3

computer-vision embodied-ai real2sim reinforcement-learning robot-learning robot-manipulation robotics robotics-benchmark robotics-simulation

1.0k 186 +3/wk

GitHub

WI

withoutbg/withoutbg

Image Background Removal Toolkit - Open Source and API Models

Trend 3

ai-background-removal background-removal background-removal-open-source background-removal-toolkit background-remover background-remover-onnx-model computer-vision computer-vision-ai docker-background-removal-open-source image-background-removal image-matting image-processing open-source-background-removal open-source-background-remover python-background-removal

977 47 +1/wk

GitHub

AC

cuixing158/Awesome-CV-MasterHub

🔥 🔥 🔥 A paper list of some recent Computer Vision (CV) works

Trend 3

awesome image-captioning image-classification image-dehazing image-denoising image-enhancement image-fusion image-generation image-segmentation keypoint-detection low-level-vision object-detection panoptic-segmentation paper-code paper-list papers-with-code pose-estimation video-generation video-understanding vision-transformer

909 54 +2/wk

GitHub

PA

papercopilot/paperlists

Processed / Cleaned Data for Paper Copilot

Trend 3

artificial-intelligence computational-linguistics computer-graphics computer-vision databases dataminning natural-language-processing robotics

901 43 +2/wk

GitHub

CA

mees/calvin

CALVIN - A benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks

Trend 3

computer-vision deep-learning grounding manipulation natural-language-processing pytorch robotics vision vision-and-language vision-language

877 115 +3/wk

GitHub

VS

MIT-SPARK/VGGT-SLAM

VGGT-SLAM: Dense RGB SLAM Optimized on the SL(4) Manifold

Trend 3

computer-vision slam vggt vggt-slam

866 97 +0/wk

GitHub

PO

stevenygd/PointFlow

PointFlow: 3D Point Cloud Generation with Continuous Normalizing Flows

Trend 3

3d-point-clouds computer-vision continuous-normalizing-flows machine-learning pytorch shapes

861 109 +1/wk

GitHub

MA

souvikmajumder26/Multi-Agent-Medical-Assistant

⚕️GenAI powered multi-agentic medical diagnostics and healthcare research assistance chatbot. 🏥 Designed for healthcare professionals, researchers and patients.

Trend 3

agent agentic-ai agents chatbot computer-vision disease-detection genai genai-chatbot generative-ai guardrails langchain langgraph large-language-models llm medical-image-processing medical-imaging python rag retrieval-augmented-generation vector-database

854 189 +4/wk

GitHub

SM

geekwenjie/SmartJavaAI

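🔥🔥🔥 A free, offline AI algorithm toolbox for Java, supporting face recognition, liveness detection, expression recognition, object detection, instance segmentation, pedestrian detection, OCR text recognition, license plate recognition, table recognition, ASR+TTS, machine translation, and more. Usable by adding a Maven dependency. Supports PyTorch and TensorFlow, with mainstream models already integrated, including Mtcnn, InsightFace, SeetaFace6, YOLOv8~v12, PaddleOCR (PPOCRv5), and Whisper.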

Trend 3

android asr clip deep-learning djl face-attribute face-comparison face-detection face-quality face-recognition landmark object-detection ocr-recognition pose-estimation silent-face-anti-spoofing table-structure-recognition translation tts yolov12 yolov8

810 140 +4/wk

GitHub

TE

alephpi/Texo

A minimalist SOTA LaTeX OCR model with only 20M parameters, running in the browser. Full training pipeline available for self-reproduction.

Trend 3

computer-vision deep-learning distillation-model formula formulanet hydra latex latex-ocr machine-learning math math-formula-recognition ocr ocr-recognition python pytorch pytorch-lightning transformers unimernet vision-encoder-decoder

788 45 +1/wk

GitHub

VM

waybarrios/vllm-mlx

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.

Trend 3

anthropic apple-silicon audio-processing claude-code computer-vision image-understanding inference llm machine-learning macos mllm mlx multimodal-ai speech-to-text stt text-to-speech tts video-understanding vision-language-model vllm

774 176 +0/wk

GitHub

VC

jacobkrantz/VLN-CE

Vision-and-Language Navigation in Continuous Environments using Habitat

Trend 3

ai computer-vision deep-learning python research robotics

766 82 +1/wk

GitHub

DE

deepinv/deepinv

DeepInverse: a PyTorch library for solving imaging inverse problems using deep learning

Trend 3

computational-imaging computed-tomography deblurring deep-equilibrium-models deep-learning diffusion-models image-processing image-reconstruction imaging inverse-problems mri plug-and-play pytorch super-resolution unfolded

698 158 +1/wk

GitHub

HM

hailo-ai/hailo_model_zoo

The Hailo Model Zoo includes pre-trained models and a full building and evaluation environment

Trend 3

ai-accelerators computer-vision deep-learning edge-ai hailo hailo8 quantization quantized-neural-networks

630 83 +1/wk

GitHub

ST

LingDong-/skeleton-tracing

A new algorithm for retrieving topological skeleton as a set of polylines from binary images

Trend 3

algorithm computational-geometry computer-vision polylines skeletonization

585 64 +0/wk

GitHub

SI

jasonmanesis/Satellite-Imagery-Datasets-Containing-Ships

This repository provides a comprehensive list of radar and optical satellite datasets curated for ship detection, classification, semantic segmentation, and instance segmentation tasks. These datasets are ideal for applications in computer vision, machine learning, remote sensing, and maritime analysis.

Trend 3

classification computer-vision dataset datasets deep-learning detection hrsid instance-segmentation list maritime-analysis optical remote-sensing sar satellite-imagery semantic-segmentation ship-detection ships ssdd synthetic-aperture-radar xview

574 74 +1/wk

GitHub

BO

Rishabh-creator601/Books

Books / PDFs / EPUBs for different fields of programming. READ, GROW AND ENJOY 😊😊😊😊

Trend 3

computer-vision cpp deep-learning dvc-pipeline hacking hypothesis-testing javascript machine-learning maths mlflow mongodb-database natural-language-processing pdfs python reinforcement-learning sqlite3 statistics stats yolo

573 99 +1/wk

GitHub

MM

MMMU-Benchmark/MMMU

This repo contains evaluation code for the paper "MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI"

Trend 3

computer-vision deep-learning deep-neural-networks evaluation foundation-models large-language-models large-multimodal-models llm llms machine-learning multimodal multimodal-deep-learning multimodal-learning multimodality natural-language-processing question-answering stem visual-question-answering

553 50 +1/wk

GitHub

RP

roboflow/roboflow-python

The official Roboflow Python package. Manage your datasets, models, and deployments. Roboflow has everything you need to build a computer vision application.

Trend 3

computer-vision deep-learning machine-learning python

552 119 +0/wk

GitHub

CS

suzuran0y/CCTV-Smartphone-AI-Monitoring

Local monitoring + AI vision: a LAN-based, smartphone-powered AI monitoring framework with structured event output for data acquisition and analysis.

Trend 3

ai-monitoring computer-vision device-repurposing event-driven image-recognition-tool ip-camera ml-ops monitoring-system multimodal structured-output video-streaming

547 38 +1/wk

GitHub

CH

ChenHongruixuan/ChangeDetectionRepository

This repository contains Python code for, or links to the original websites of, traditional change detection methods such as SFA and MAD, as well as deep learning-based change detection methods such as SiamCRNN, DSFA, and some FCN-based methods.

Trend 3

change-detection deep-learning image-processing multi-temporal python remote-sensing

528 108 +1/wk

GitHub

FA

Fabric-Project/Fabric

Node Creative Coding / 3D / Image Processing tool inspired by Quartz Composer

Trend 3

3d computer-vision creative-coding graphics llm metal mlx multimedia node-based post-processing realtime shaders swift swiftui video vlm

496 23 +1/wk

GitHub

PA

ashbuilds/payload-ai

AI Plugin is a powerful extension for the Payload CMS, integrating advanced AI capabilities to enhance content creation and management.

Trend 3

ai ai-translate ai-writing ai-writing-tool content-generation gpt-image-1 image-generation payload-plugin payloadcms plugin smart-generation text-generation text-to-image text-to-speech voice-generation

471 58 +2/wk

GitHub

DR

TheDesignFounder/DreamLayer

Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.

Trend 3

benchmarking diffusion-models evaluation-metrics generative-ai image-generation stable-diffusion

408 209 +1/wk

GitHub

PH

PhotonVision/photonvision

PhotonVision is the free, fast, and easy-to-use computer vision solution for the FIRST Robotics Competition.

Trend 3

computer-vision frc java opencv vision vision-processing wpilib

407 294 +0/wk

GitHub

MF

zaina-ml/ml_forge

A visual-based graph node editor for training computer vision models.

Trend 3

artificial-intelligence beginner-friendly computer-vision data-science deep-learning desktop-app drag-and-drop gui image-classification machine-learning neural-network no-code node-editor open-source pipeline python pytorch training visual-programming

402 49 +1/wk

GitHub

AL

albumentations-team/AlbumentationsX

Next-generation Albumentations: dual-licensed for open-source and commercial use

Trend 3

3d augmentation bounding-box computer-vision data-augmentation deep-learning deeplearning image-augmentation image-classification image-processing image-segmentations instance-segmentation keypoint-detection machine-learning medical-imaging object-detection python pytorch segmentation tensorflow

300 27 +0/wk

GitHub

MO

cubist38/mlx-openai-server

A high-performance API server that provides OpenAI-compatible endpoints for MLX models. Developed using Python and powered by the FastAPI framework, it provides an efficient, scalable, and user-friendly solution for running MLX-based vision and language models locally with an OpenAI-compatible interface.

Trend 3

apple-silicon fastapi flux image-generation mlx mlx-lm mlx-vlm multi-models openai-compatible queue speech-recognition structured-outputs tool-calling vision-api whisper

290 53 +1/wk

GitHub

VF

VisoMasterFusion/VisoMaster-Fusion

Powerful & Easy-to-Use Video Face Swapping and Editing Software

Trend 3

ai computer-vision face-editor faceswap live-portrait video-editor vr

263 65 +0/wk

GitHub

ME

agentmorris/MegaDetector

MegaDetector is an AI model that helps conservation folks spend less time doing boring things with camera trap images.

Trend 3

camera-traps cameratrap cameratraps computer-vision conservation ecology machine-learning megadetector wildlife

262 45 +0/wk

GitHub

HU

securade/hub

Securade.ai HUB - A generative AI based edge platform for computer vision that connects to existing CCTV cameras and makes them smart.

Trend 3

artificial-intelligence computer-vision edge-deployment face-detection fire-detection generative-ai grounding-dino industrial-safety jetson machine-learning model-zoo nvidia-gpu object-detection ppe-detection proximity-dete smoke-detection video-analytics worker-safety yolo7 zone-mana

260 26 +2/wk

GitHub

IK

Ikomia-dev/IkomiaApi

Deploy Computer Vision solutions with a few lines of code.

Trend 3

computer-vision computer-vision-ai computer-vision-algorithms computer-vision-opencv computer-vision-tools computervision deep-learning detectron2 human-pose-estimation image-processing machine-learning object-detection opencv openmmlab pose-estimation python pytorch tensorflow yolo

243 13 +1/wk

GitHub

SC

collidingScopes/shape-creator-tutorial

Create and control 3D shapes using hand gestures in real-time. Built with mediapipe computer vision and threejs

Trend 3

3d 3d-shapes augmented-reality browser computer-vision free fun-with-computer-vision hand-gesture mediapipe open-source real-time shape-creator spatial-computing threejs tutorial

232 51 +1/wk

GitHub

T3

duy-phamduc68/TrafficLab-3D

Create a digital-twin style traffic visualization using only mp4 CCTV footage and its Google Maps location.

Trend 3

3d-bbox autonomous-driving camera-calibration cctv-analysis computer-vision digital-twin geospatial-mapping homography intellegent-transportation object-detection object-tracking projection-mapping pyqt5-gui satellite-imagery smart-city traffic-analysis traffic-monitoring urban-analytics yolo

212 21 +0/wk

GitHub

PI

francozanardi/pictex

A Python library for efficient image generation using CSS Flexbox

Trend 3

flexbox flexbox-css gradient graphics image-generation python shadow skia taffy text-rendering text-to-image typography

199 5 +1/wk

GitHub

MT

petercorke/machinevision-toolbox-python

Machine vision toolbox for Python

Trend 3

blob-features bundle-adjustment camera-calibration computer-vision image-search image-segmentation machine-vision mathematical-morphology opencv python stereo-vision

193 29 +0/wk

GitHub

GA

PeculiarVentures/GammaCV

GammaCV is a WebGL-accelerated computer vision library for the browser

Trend 3

computer-vision feature-extraction gpu gpu-acceleration image-analysis image-processing machine-learning machine-vision object-detection opencv webgl

192 23 +0/wk

GitHub

CV

avs-abhishek123/Computer-Vision-Projects

All Computer Vision Projects - Beginner to Advanced

Trend 3

artificial-intelligence computer-vision deep-learning object-detection object-recognition object-tracking opencv pillow python3

180 37 +0/wk

GitHub

FD

skylab-tech/ffhqr-dataset

FFHQR: the first large-scale retouching dataset for computer vision research.

Trend 3

computer-vision dataset deep-learning high-resolution large-scale retouching

175 12 +0/wk

GitHub

CG

codecentric/c4-genai-suite

c4 GenAI Suite

Trend 3

ai ai-agents artificial-intelligence assistant chatbot chatgpt claude dall-e gemini genai image-generation langchain llm llm-ui mcp multimodal ollama openai rag self-hosted

167 18 +0/wk

GitHub

1A

darkdevil3610/100-AI-Machine-learning-Deep-learning-Computer-vision-NLP

100+ AI, machine learning, deep learning, computer vision, and NLP projects with code

Trend 3

artificial-intelligence artificial-intelligence-projects awesome collageproject compuer-vision-project computer-vision data-science deep-learning deep-learning-papers deep-learning-projects final-year-project final-year-projects fyp machine-learning machine-learning-projects nlp nlp-projects python

152 14 +1/wk

GitHub

AR

collidingScopes/arpeggiator

Hand-controlled arpeggiator, drum machine, and audio reactive visualizer. Built with mediapipe computer vision, threejs, tonejs

Trend 3

arpeggio audio-reactive augmented-reality computer-vision drum-machine fun-with-computer-vision hand-gesture-recognition hand-tracking mediapipe music spatial-computing synthesizer threejs tonejs visualizer

150 43 +1/wk

GitHub

AC

mawady/awesome-computer-vision-resources

A structured learning reference for computer vision: from image fundamentals to research frontiers

Trend 3

awesome awesome-list computer-science computer-vision data-science deep-learning education image-processing machine-learning

145 26 +1/wk

GitHub

SC

strayrobots/scanner

An app for collecting raw RGB-D scans on iOS devices.

Trend 3

3d computer-vision ios rgb-d

121 23 +0/wk

GitHub

Top Projects (100)

huggingface/datasets

ultralytics/ultralytics

google-ai-edge/mediapipe

JaidedAI/EasyOCR

HumanSignal/label-studio

calesthio/OpenMontage

invoke-ai/InvokeAI

lucidrains/vit-pytorch

khoj-ai/khoj

microsoft/AirSim

blakeblackshear/frigate

mudler/LocalAI

d2l-ai/d2l-zh

foxhui/WebAI2API

SamurAIGPT/Vibe-Workflow

rohitg00/ai-engineering-from-scratch

ai-hpc/ai-hardware-engineer-roadmap

stirling-image/stirling-image

ashesbloom/LocalLens

Leooo-Huang/awesome-human-activity-recognition

autowarefoundation/autoware_vision_pilot

LC044/TrailSnap

TensorCEO/TensorCEO

mayocream/koharu

gracezhao1997/Awesome-Video-World-Models-with-AR-Diffusion

Intellindust-AI-Lab/EdgeCrafter

vllm-project/vllm-omni

HanaokaYuzu/Gemini-API

jau123/nanobanana-trending-prompts

X-GenGroup/Flow-Factory

SRA-VJTI/Pixels

CVHub520/X-AnyLabeling-Server

galilai-group/stable-pretraining

nerficg-project/faster-gaussian-splatting

mlslabs/MLSLabsGaussianSplattingRenderer-UE

AkihikoWatanabe/paper_notes

cerul-ai/cerul

M-3LAB/awesome-3d-anomaly-detection

networkoptix/nx_open

AccumulateMore/CV

CVHub520/X-AnyLabeling

roboflow/rf-detr

mcmonkeyprojects/SwarmUI

MaaXYZ/MaaFramework

WeThinkIn/AIGC-Interview-Book

HenryNdubuaku/maths-cs-ai-compendium

MrNeRF/LichtFeld-Studio

haosulab/ManiSkill

SharpAI/DeepCamera

pixeltable/pixeltable

MiniMax-AI/MiniMax-MCP

software-mansion/react-native-executorch

YouMind-OpenLab/nano-banana-pro-prompts-recommend-skill

cleanlab/cleanvision

Linketic/CityGaussian

LLM-Red-Team/jimeng-free-api

simpler-env/SimplerEnv

withoutbg/withoutbg

cuixing158/Awesome-CV-MasterHub

papercopilot/paperlists

mees/calvin

MIT-SPARK/VGGT-SLAM

stevenygd/PointFlow

souvikmajumder26/Multi-Agent-Medical-Assistant

geekwenjie/SmartJavaAI

alephpi/Texo

waybarrios/vllm-mlx

jacobkrantz/VLN-CE

deepinv/deepinv

hailo-ai/hailo_model_zoo

LingDong-/skeleton-tracing

jasonmanesis/Satellite-Imagery-Datasets-Containing-Ships

Rishabh-creator601/Books

MMMU-Benchmark/MMMU

roboflow/roboflow-python

suzuran0y/CCTV-Smartphone-AI-Monitoring

ChenHongruixuan/ChangeDetectionRepository

Fabric-Project/Fabric

ashbuilds/payload-ai