
opendatalab/MinerU-HTML

MinerU-HTML: An SLM-powered HTML main content extractor that outputs clean HTML bodies. Perfect for Deep Research Agents, RAG applications, and training data generation.

Stars: 229 | Forks: 24 | Weekly growth: +0
Topics: article-extractor, corpus-tools, nlp, rag, scraping, text-extraction, trafilatura, web-scraping, webagent
Trend 0

[Chart: Star & Fork Trend (18 data points); series: Stars, Forks]


Growth Velocity

opendatalab/MinerU-HTML gained no stars (+0) this period. Velocity data will be available after more historical data is collected.


| Metric | MinerU-HTML | NSP-BERT | bert-vocab-builder | rag-using-langchain-amazon-bedrock-and-opensearch |
|---|---|---|---|---|
| Stars | 229 | 230 | 230 | 228 |
| Forks | 24 | 38 | 48 | 45 |
| Weekly Growth | +0 | +0 | +0 | +0 |
| Language | Python | Python | Python | Python |
| Sources | 1 | 1 | 1 | 1 |
| License | Apache-2.0 | Apache-2.0 | N/A | MIT-0 |

[Chart: Capability Radar, MinerU-HTML vs NSP-BERT]

Maintenance Activity 97

Last code push 12 days ago.

Community Engagement 52

Fork-to-star ratio: 10.5%. Active community forking and contributing.
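
The 10.5% figure matches the fork and star counts listed above (24 forks, 229 stars). A minimal sketch of that arithmetic follows, assuming the ratio is simply forks divided by stars, expressed as a percentage and rounded to one decimal place. The function name `fork_to_star_ratio` is hypothetical, and the dashboard's actual formula is not documented here.

```python
def fork_to_star_ratio(forks: int, stars: int) -> float:
    """Return forks as a percentage of stars, rounded to one decimal place.

    Assumption: this mirrors how the dashboard derives its figure;
    the real computation is not given in the source.
    """
    if stars <= 0:
        raise ValueError("stars must be positive")
    return round(100 * forks / stars, 1)


# Counts taken from the metrics listed for MinerU-HTML.
ratio = fork_to_star_ratio(forks=24, stars=229)
print(f"Fork-to-star ratio: {ratio}%")
```

How this ratio maps to the Community Engagement score of 52 is not stated, so the sketch stops at the ratio itself.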

Issue Burden 70

Issue data not yet available.

Growth Momentum 30

No measurable growth in the current period (first-day cold start expected).

License Clarity 95

Licensed under Apache-2.0, a permissive license that is safe for commercial use.

Risk scores are computed from real-time repository data. Higher scores indicate healthier metrics.