supabase/supabase
The Postgres development platform. Supabase gives you a dedicated Postgres database to build your web, mobile, and AI applications.
Projects and tools related to embedding techniques in AI and ML.
High performance embedded vector database
On-device context engine and memory for AI agents such as Claude Code, Hermes, and OpenClaw. Hooks + MCP server + hybrid RAG search.
Memory library for building stateful agents
Local code search combining BM25, vector similarity, and cross-encoder reranking. Parses 60+ languages with tree-sitter, runs entirely offline, and returns structured results with file paths, line ranges, and symbol metadata. Built in Rust.
A full-stack algorithms tutorial from NLP to LLMs. Read online at: https://datawhalechina.github.io/base-llm/
Semantic code searcher and codebase utility
An Obsidian plugin for interacting with your privacy-focused AI assistant, making your second brain even smarter!
On-device AI for iOS & Android
Local AI-powered document search and editing with first-in-class hybrid retrieval, LLM answers, WebUI, REST API and MCP support for AI clients.
A Claude Code plugin that automatically captures everything Claude does during your coding sessions, compresses it with AI (using Claude's agent-sdk), and injects relevant context back into future sessions.
Give agents everything they need to ship fullstack apps. The backend built for agentic development.
A Markdown-first memory system, a standalone library for any AI agent. Inspired by OpenClaw.
Enterprise-grade (40M+ lines) codebase intelligence in a zero-setup, private and local Claude Plugin or MCP: managed indexing, hybrid semantic search, polyglot code dependency graphs, and DB/API/infra knowledge. Benchmark: 61% fewer tokens, 84% fewer calls, 37x faster than standard AI grep.
A modern desktop application for exploring, managing, and analyzing vector databases
LangChain4j is an open-source Java library that simplifies the integration of LLMs into Java applications through a unified API, providing access to popular LLMs and vector databases. It makes implementing RAG, tool calling (including support for MCP), and agents easy. LangChain4j integrates seamlessly with various enterprise Java frameworks.
Local persistent memory store for LLM applications including Claude Desktop, GitHub Copilot, Codex, Antigravity, etc.
Semantic Search & Call Graphs for AI Agents (100% Local)
Open Source Semantic Search for your AI Agent
Rust library for vector embeddings and reranking.
MLX-Embeddings is the best package for running Vision and Language Embedding models locally on your Mac using MLX.
EntityDB is an in-browser vector database wrapping IndexedDB and Transformers.js over WebAssembly
Running local Large Language Models (LLMs) to perform Retrieval-Augmented Generation (RAG)
🦉⚡️Serverless, distributed vector database as an API
Easy and lightning fast training of 🤗 Transformers on Habana Gaudi processor (HPU)
BERT Chinese text classification model implemented in PyTorch
The Go client for Chroma vector database
Ollama SDK for .NET
Official Rust Implementation of Model2Vec
BERT-CCPoem is a BERT-based pre-trained model built specifically for Chinese classical poetry
Run embedding models locally in Swift using MLTensor.
Android version of the Bert-VITS2 TTS system, powered by the alibaba-MNN engine.
State-of-the-art CLIP/SigLIP embedding models finetuned for the fashion domain. +57% increase in evaluation metrics vs FashionCLIP 2.0.
*Hung-yi Lee's Deep Learning Tutorial* (recommended by Prof. Hung-yi Lee 👍, a.k.a. the "Apple Book" 🍎). PDF download: https://github.com/datawhalechina/leedl-tutorial/releases
Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichments, chunking and embedding.
LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.
Easy-to-use and powerful LLM and SLM library with an awesome model zoo.
🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP
💡 All-in-one AI framework for semantic search, LLM orchestration and language model workflows
This repository contains demos I made with the Transformers library by HuggingFace.
Retrieval and Retrieval-augmented LLMs
💥 Fast State-of-the-Art Tokenizers optimized for Research and Production
Pre-Training with Whole Word Masking for Chinese BERT (Chinese BERT-wwm model series)
Large-Scale Chinese Corpus for NLP
SeaTunnel is a multimodal, high-performance, distributed, massive data integration tool.
BertViz: Visualize Attention in Transformer Models
Leveraging BERT and c-TF-IDF to create easily interpretable topics.
Postgres with GPUs for ML/AI apps.
Google AI 2018 BERT PyTorch implementation
Transformer related optimization, including BERT, GPT
Open Lakehouse Format for Multimodal AI. Convert from Parquet in 2 lines of code for 100x faster random access, vector index, and data versioning. Compatible with Pandas, DuckDB, Polars, Pyarrow, and PyTorch, with more integrations coming.
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pre-trained models, large models, multimodal models, and large language models
High-performance data engine for AI and multimodal workloads. Process images, audio, video, and structured data at any scale
MineContext is your proactive, context-aware AI partner (Context Engineering + ChatGPT Pulse)
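text2vec, text to vector. A text vector representation tool that converts text into vector matrices. It implements text representation and text similarity models such as Word2Vec, RankBM25, Sentence-BERT, and CoSENT, ready to use out of the box.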
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
A blazing fast inference solution for text embeddings models
Chinese Language Understanding Evaluation Benchmark: datasets, baselines, pre-trained models, corpus and leaderboard
Minimal keyword extraction with BERT
State of the Art Natural Language Processing
One beautiful Ruby API for OpenAI, Anthropic, Gemini, Bedrock, Azure, OpenRouter, DeepSeek, Ollama, VertexAI, Perplexity, Mistral, xAI, GPUStack & OpenAI compatible APIs. Agents, Chat, Vision, Audio, PDF, Images, Embeddings, Tools, Streaming & Rails integration.
A Python library for self-supervised learning on images.
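Get up to speed quickly with AI theory and hands-on practice: fundamentals, Transformers, NLP, ML, DL, and competitions. Includes extensive comments and datasets, with the goal that everyone can understand and reproduce the work.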
We want to create a repo to illustrate the usage of Transformers in Chinese.
Open Source Pre-training Model Framework in PyTorch & Pre-trained Model Zoo
Fast, accurate, lightweight Python library for generating state-of-the-art embeddings
A Unified Library for Parameter-Efficient and Modular Transfer Learning
Analytics, Versioning and ETL for multimodal data: video, audio, PDFs, images
Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc.
The universal tool suite for vector database management. Manage Pinecone, Chroma, Qdrant, Weaviate and more vector databases with ease.
[CVPR 2021] Official PyTorch implementation for Transformer Interpretability Beyond Attention Visualization, a novel method to visualize classifications by Transformer based networks.
The open-source RAG platform: built-in citations, deep research, 22+ file formats, partitions, MCP server, and more.
A BERT model for scientific text.
jiant is an NLP toolkit
Bringing BERT into modernity via both architecture changes and scaling
Data augmentation for NLP, presented at EMNLP 2019
LLPhant - A comprehensive PHP Generative AI Framework using OpenAI GPT-4. Inspired by LangChain
This repository contains examples for customers to get started using the Amazon Bedrock Service. This contains examples for all available foundational models
All-in-one training for vision models (YOLO, ViTs, RT-DETR, DINOv3): pretraining, fine-tuning, distillation.
Solves basic Russian NLP tasks, API for lower level Natasha projects
Transformers4Rec is a flexible and efficient library for sequential and session-based recommendation and works with PyTorch.
Trained models & code to predict toxic comments on all 3 Jigsaw Toxic Comment Challenges. Built using ⚡ Pytorch Lightning and 🤗 Transformers. For access to our API, please contact the maintainers by email.
Transformer models from BERT to GPT-4, environments from Hugging Face to OpenAI. Fine-tuning, training, and prompt engineering examples. A bonus section covers ChatGPT, GPT-3.5-turbo, GPT-4, and DALL-E, including jump-starting GPT-4, speech-to-text, text-to-speech, text-to-image generation with DALL-E, Google Cloud AI, HuggingGPT, and more
Embeddable vector database for Go with Chroma-like interface and zero third-party dependencies. In-memory with optional persistence.
GitHub repo with tutorials to fine-tune Transformers for different NLP tasks
A code repository indexing tool to supercharge your LLM experience.
🪿 LinGoose is a Go framework for building awesome AI/LLM applications.
ID-based RAG FastAPI: Integration with Langchain and PostgreSQL/pgvector
Samples showing how to build Java applications powered by Generative AI and LLMs using Spring AI and Spring Boot.
LLM-PowerHouse: Unleash LLMs' potential through curated tutorials, best practices, and ready-to-use code for custom training and inferencing.
Pre-trained Transformers for Arabic Language Understanding and Generation (Arabic BERT, Arabic GPT2, Arabic ELECTRA)
Awesome-LLM-Eval: a curated list of tools, datasets/benchmarks, demos, leaderboards, papers, docs and models, mainly for evaluating foundation LLMs, with the aim of exploring the technical boundaries of generative AI.
This repository provides programs to build Retrieval Augmented Generation (RAG) code for Generative AI with LlamaIndex, Deep Lake, and Pinecone leveraging the power of OpenAI and Hugging Face models for generation and evaluation.
Chinese named entity recognition with bert_bilstm_crf, implemented in PyTorch
Turkish BERT/DistilBERT, ELECTRA, ConvBERT and T5 models
A flexible, adaptive classification system for dynamic text classification
pytextclassifier is a toolkit for text classification. It implements classification models such as LR, Xgboost, TextCNN, FastText, TextRNN, and BERT, ready to use out of the box.
Cognitive memory for AI agents — FSRS-6 spaced repetition, 29 brain modules, 3D dashboard, single 22MB Rust binary. MCP server for Claude, Cursor, VS Code, Xcode, JetBrains.
Chinese text classification implemented in PyTorch (TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, Transformer, Bert, ERNIE), ready to use out of the box!
A curated list of retrieval-augmented generation (RAG) in large language models