Smart Knowledge Extraction CLI
Transform documents into structured knowledge with one command.
"Stop reading. Start understanding."
"Say goodbye to document anxiety and see information at a glance."
- 🔌 MCP Server: Query your knowledge abstracts from Claude Desktop and IDE agents with he-mcp. (PR #40)
- 🧠 Anthropic Claude Support: Use claude-opus-4-8, claude-sonnet-4-6, and claude-haiku-4-5 directly as your LLM provider. (PR #38)
- 📝 Obsidian Export: Turn any graph into an Obsidian vault with Markdown notes linked by [[wikilinks]]. (PR #37)
- 🧹 he clean: Remove a KA's index or the whole knowledge abstract in one command. (PR #39)
- 🔧 Reliability Fixes: True mean for multi-chunk embeddings, capped OpenAI-compatible batch sizes, and resolved multi-word llm_* merge strategies. (PRs #35, #36, #41)
See the full changelog in the GitHub releases.
Hyper-Extract is an LLM-powered framework for knowledge extraction and evolution. It turns unstructured text into persistent, predictable, strongly typed Knowledge Abstracts, in formats ranging from simple Collections (Lists/Sets) and Pydantic Models to Knowledge Graphs, Hypergraphs, and Spatio-Temporal Graphs.
| Feature | Description |
|---|---|
| 🔷 8 Knowledge Structures | From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs |
| 🧠 10+ Extraction Engines | GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more — ready to use |
| 📝 80+ YAML Templates | Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains |
| 🔄 Incremental Evolution | Feed new documents anytime to expand and refine your knowledge base |
| 📤 Obsidian Export | Turn any extracted graph into an Obsidian vault — Markdown notes linked by [[wikilinks]] |
📄 Researcher: Turn papers into knowledge graphs
Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.
he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/
🏦 Financial Analyst: Extract entities from earnings reports
Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.
he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"
🔒 Local Deployment: Keep data on-premise with vLLM
Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.
from hyperextract import create_client
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)
Hyper-Extract relies on the LLM's structured output capability (json_schema or Function Calling).
| Platform | Verified Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-5 |
| Anthropic | claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 |
| Alibaba Cloud Bailian | qwen-plus, qwen-turbo, deepseek-r1 |
| Local vLLM | Qwen3.5-9B (GPTQ-Marlin) |
Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM).
Anthropic note: Claude is used for the LLM (set ANTHROPIC_API_KEY). Anthropic has no embeddings API, so pair it with an OpenAI-compatible embedder:
from hyperextract import create_client
llm, emb = create_client(llm="anthropic", embedder="openai:text-embedding-3-small")
Requires the extra: pip install 'hyperextract[anthropic]'.
📖 Full guide: Provider System & Local Model Support
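The provider strings in the examples above (such as vllm:Qwen3.5-9B@http://localhost:8000/v1, openai:text-embedding-3-small, and plain anthropic) appear to follow a provider[:model][@base_url] pattern. The sketch below shows how such a string could be split into its parts. The grammar is inferred from those three examples only, and ProviderSpec and parse_spec are hypothetical names for illustration, not part of the hyperextract API.

```python
from typing import NamedTuple, Optional


class ProviderSpec(NamedTuple):
    """Hypothetical container for this sketch; not a hyperextract type."""
    provider: str
    model: Optional[str]
    base_url: Optional[str]


def parse_spec(spec: str) -> ProviderSpec:
    """Split 'provider[:model][@base_url]' into its parts (assumed grammar)."""
    # Split off the endpoint first, because the URL itself contains ':'.
    head, _, base_url = spec.partition("@")
    provider, _, model = head.partition(":")
    if not provider:
        raise ValueError(f"missing provider in spec: {spec!r}")
    # Empty strings mean the optional part was not given.
    return ProviderSpec(provider, model or None, base_url or None)


print(parse_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1"))
print(parse_spec("openai:text-embedding-3-small"))
print(parse_spec("anthropic"))
```

Splitting on "@" before ":" matters here: doing it the other way round would cut the URL at its scheme separator.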
# Install
uv tool install hyperextract
# Configure API key
he config init -k YOUR_OPENAI_API_KEY
# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en
# Query it
he search ./output/ "What are Tesla's major achievements?"
# Visualize
he show ./output/
# Export to an Obsidian vault (Markdown notes + [[wikilinks]])
he export obsidian ./output/ -o ./vault/
🐍 Python API
uv pip install hyperextract
from hyperextract import Template
ka = Template.create("general/biography_graph")
with open("examples/en/tesla.md") as f:
    result = ka.parse(f.read())
result.show()
🔗 More examples: examples/en
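The he export obsidian command in the quick start produces Markdown notes linked by [[wikilinks]]. To make that output format concrete, here is a minimal sketch of the idea using only the standard library: one note per entity, with each outgoing relation rendered as a wikilink. This is not the library's exporter; export_notes, the note layout, and the sample records are all assumptions made for illustration.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def export_notes(entities, relations, vault_dir):
    """Write one Markdown note per entity; outgoing relations become [[wikilinks]]."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)

    # Group relations by their source entity so each note lists its own links.
    outgoing = defaultdict(list)
    for rel in relations:
        outgoing[rel["source"]].append(rel)

    written = []
    for ent in entities:
        lines = [f"# {ent['name']}", "", ent.get("description", ""), ""]
        for rel in outgoing[ent["name"]]:
            lines.append(f"- {rel['type']}: [[{rel['target']}]]")
        path = vault / f"{ent['name']}.md"
        path.write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(path)
    return written


# Illustrative records only, shaped like the entity/relation fields of a graph template.
entities = [
    {"name": "Alice", "type": "person", "description": "An engineer."},
    {"name": "Acme", "type": "company", "description": "A manufacturer."},
]
relations = [{"source": "Alice", "target": "Acme", "type": "works_at"}]

with tempfile.TemporaryDirectory() as tmp:
    paths = export_notes(entities, relations, tmp)
    alice_note = paths[0].read_text(encoding="utf-8")
    print(alice_note)
```

Because Obsidian resolves [[Acme]] by note name, naming each file after its entity is enough for the links to connect.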
| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | Hyper-Extract |
|---|---|---|---|---|---|
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| Interactive CLI | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |
From simple to complex — pick the right structure for your data:
Example — AutoGraph visualization:
📋 What's under the hood? (Architecture & Templates)
Hyper-Extract follows a three-layer architecture:
- Auto-Types — 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph)
- Methods — Extraction algorithms: KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more
- Templates — 80+ presets across 6 domains. Zero-code setup.
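To illustrate how the graph-shaped Auto-Types differ, the sketch below models three kinds of relation with plain dataclasses: a binary edge, a hyperedge that joins any number of entities, and an edge annotated with time and place. The class and field names are hypothetical and chosen for this example; the library's own types may be organized differently.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Edge:
    """Graph relation: exactly one source and one target."""
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class HyperEdge:
    """Hypergraph relation: links any number of entities at once."""
    members: FrozenSet[str]
    type: str


@dataclass(frozen=True)
class SpatioTemporalEdge(Edge):
    """Graph relation annotated with when and where it holds."""
    time: Optional[str] = None
    place: Optional[str] = None


# Illustrative data only.
pair = Edge("Alice", "Acme", "works_at")
deal = HyperEdge(frozenset({"Alice", "Bob", "Acme"}), "co_founded")
dated = SpatioTemporalEdge("Alice", "Acme", "works_at", time="2020", place="Berlin")

print(pair)
print(sorted(deal.members))
print(dated.time, dated.place)
```

A three-party fact such as the co_founded example would need several binary edges in a plain graph, which is the case a hyperedge is meant to capture in one record.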
Template example (Graph type):
language: en
name: Knowledge Graph
type: graph
tags: [general]
description: 'Extract entities and their relationships.'
output:
  entities:
    fields:
      - name: name
        type: str
      - name: type
        type: str
      - name: description
        type: str
  relations:
    fields:
      - name: source
        type: str
      - name: target
        type: str
      - name: type
        type: str
identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
| Resource | Link |
|---|---|
| Full Documentation | yifanfeng97.github.io/Hyper-Extract |
| CLI Guide | Command-line interface |
| Provider System | Model compatibility & local deployment |
| Template Gallery | 80+ presets |
| Examples | Working code |
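The identifiers block in the template example above names the entity key (name) and a relation key pattern ('{source}|{type}|{target}'). One way to read this is that records sharing a key are treated as the same item when new documents are fed in. The sketch below shows that reading with a simple "later fields win" merge. The merge rule and the helper names are assumptions for illustration; the library's actual merge strategies may behave differently.

```python
ENTITY_ID = "name"                         # mirrors identifiers.entity_id
RELATION_ID = "{source}|{type}|{target}"   # mirrors identifiers.relation_id


def entity_key(entity):
    return entity[ENTITY_ID]


def relation_key(relation):
    # str.format fills the placeholders from the relation's own fields.
    return RELATION_ID.format(**relation)


def merge(existing, incoming, key):
    """Merge two record lists; records sharing a key collapse into one (later fields win)."""
    merged = {key(r): dict(r) for r in existing}
    for record in incoming:
        merged.setdefault(key(record), {}).update(record)
    return list(merged.values())


# Illustrative data only: the second batch repeats one relation and adds another.
first = [{"source": "Alice", "target": "Acme", "type": "works_at"}]
second = [
    {"source": "Alice", "target": "Acme", "type": "works_at"},
    {"source": "Alice", "target": "Bob", "type": "mentors"},
]

merged = merge(first, second, relation_key)
print([relation_key(r) for r in merged])
```

The same merge function works for entities by passing entity_key instead of relation_key.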
Expose your knowledge abstracts to MCP-capable assistants (Claude Desktop, IDE agents) via the Model Context Protocol — read + export only.
pip install 'hyperextract[mcp]'
he-mcp  # stdio MCP server
Tools: list_templates, info, search, ask (RAG), export_obsidian. Full guide: MCP Server docs.
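MCP clients such as Claude Desktop usually register stdio servers in a JSON config file under an mcpServers key. The sketch below prints an entry of that shape for he-mcp. It assumes he-mcp is on your PATH and needs no arguments; the "hyper-extract" label is an arbitrary name chosen here, so check the MCP Server docs for the recommended setup.

```python
import json

# "hyper-extract" is an arbitrary label for this sketch; "he-mcp" is the
# stdio server command installed by the hyperextract[mcp] extra.
config = {
    "mcpServers": {
        "hyper-extract": {
            "command": "he-mcp",
            "args": [],
        }
    }
}

# Paste the printed entry into your MCP client's configuration file.
print(json.dumps(config, indent=2))
```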
Contributions are welcome! Please submit Issues and PRs.
Licensed under Apache-2.0.
This project has been security assessed by MseeP.ai.

