GitHub - yifanfeng97/Hyper-Extract: Hypergraph is more powerful. Transform unstructured text into structured knowledge with LLMs. Graphs, hypergraphs, and spatio-temporal extractions — with one command.

Smart Knowledge Extraction CLI

Transform documents into structured knowledge with one command.

📖 English Version · Chinese Version

"Stop reading. Start understanding."
"Say goodbye to document anxiety. See the information at a glance."

📰 What's New

🔌 MCP Server — Query your knowledge abstracts from Claude Desktop and IDE agents with he-mcp. (PR #40)
🧠 Anthropic Claude Support — Use claude-opus-4-8, claude-sonnet-4-6, and claude-haiku-4-5 directly as your LLM provider. (PR #38)
📝 Obsidian Export — Turn any graph into an Obsidian vault with Markdown notes linked by [[wikilinks]]. (PR #37)
🧹 he clean — Remove a KA's index or the whole knowledge abstract in one command. (PR #39)
🔧 Reliability Fixes — True mean for multi-chunk embeddings, capped OpenAI-compatible batch sizes, and resolved multi-word llm_* merge strategies. (PRs #35, #36, #41)

See the full changelog in the GitHub releases.

Hyper-Extract is an LLM-powered framework for knowledge extraction and evolution. It turns unstructured text into persistent, predictable, strongly typed Knowledge Abstracts. Output formats range from simple Collections (Lists/Sets) and Pydantic Models to Knowledge Graphs, Hypergraphs, and Spatio-Temporal Graphs.

✨ Core Features


🔷 8 Knowledge Structures	From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs
🧠 10+ Extraction Engines	GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more — ready to use
📝 80+ YAML Templates	Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains
🔄 Incremental Evolution	Feed new documents anytime to expand and refine your knowledge base
📤 Obsidian Export	Turn any extracted graph into an Obsidian vault — Markdown notes linked by `[[wikilinks]]`

🎯 What Can You Do With It?

📄 Researcher — Turn papers into knowledge graphs

Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.

he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/

🏦 Financial Analyst — Extract entities from earnings reports

Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.

he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"

🔒 Local Deployment — Keep data on-premise with vLLM

Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.

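from hyperextract import create_client

# Each spec names a provider, a model, and the local endpoint that serves it
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",    # chat model
    embedder="vllm:bge-m3@http://localhost:8001/v1",   # embedding model
    api_key="dummy",  # placeholder value for the local servers
)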
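The model strings above appear to follow a `provider:model@base_url` shape, with the model and endpoint optional (the README also uses `openai:text-embedding-3-small` and plain `anthropic`). The sketch below only illustrates that shape. `ModelSpec` and `parse_spec` are hypothetical names for this example and are not part of the hyperextract API.

```python
from typing import NamedTuple, Optional


class ModelSpec(NamedTuple):
    provider: str
    model: Optional[str]
    base_url: Optional[str]


def parse_spec(spec: str) -> ModelSpec:
    """Split 'provider:model@base_url' into its parts (hypothetical helper)."""
    # Split off the endpoint first, because the URL itself contains ':'.
    head, _, base_url = spec.partition("@")
    provider, _, model = head.partition(":")
    return ModelSpec(provider, model or None, base_url or None)


if __name__ == "__main__":
    print(parse_spec("vllm:Qwen3.5-9B@http://localhost:8000/v1"))
    print(parse_spec("openai:text-embedding-3-small"))
    print(parse_spec("anthropic"))
```

Splitting on `@` before `:` keeps the port and scheme separators inside the URL from being mistaken for the provider separator.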

🚀 Supported Platforms & Models

Hyper-Extract relies on the LLM's structured output capability (json_schema or Function Calling).
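To make the structured-output requirement concrete, here is a minimal sketch of what a JSON schema for a graph extraction might look like, plus a check that a model reply contains the required fields. The schema, the `missing_fields` helper, and the sample reply are all assumptions made for illustration. The checker covers only a small subset of JSON Schema and is not how Hyper-Extract validates output.

```python
import json

# Hypothetical schema: a reply must carry entities and relations.
GRAPH_SCHEMA = {
    "type": "object",
    "required": ["entities", "relations"],
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "type"],
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string"},
                },
            },
        },
        "relations": {"type": "array", "items": {"type": "object"}},
    },
}


def missing_fields(value, schema, path="$"):
    """List required fields that are absent (objects and arrays only)."""
    problems = []
    if schema.get("type") == "object":
        if not isinstance(value, dict):
            return [f"{path}: expected object"]
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}: missing")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems += missing_fields(value[key], sub, f"{path}.{key}")
    elif schema.get("type") == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        for i, item in enumerate(value):
            problems += missing_fields(item, schema.get("items", {}), f"{path}[{i}]")
    return problems


if __name__ == "__main__":
    # Made-up reply in which one entity lacks its "type" field.
    reply = '{"entities": [{"name": "Tesla"}], "relations": []}'
    print(missing_fields(json.loads(reply), GRAPH_SCHEMA))
```

A model with native json_schema or function-calling support is constrained to produce replies that pass this kind of check, which is why the platforms below are listed as verified.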

| Platform | Verified Models |
| --- | --- |
| OpenAI | gpt-4o, gpt-4o-mini, gpt-5 |
| Anthropic | claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5 |
| Alibaba Cloud Bailian (阿里云百炼) | qwen-plus, qwen-turbo, deepseek-r1 |
| Local vLLM | Qwen3.5-9B (GPTQ-Marlin) |

Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM).

Anthropic note: Claude is used for the LLM (set ANTHROPIC_API_KEY). Anthropic has no embeddings API, so pair it with an OpenAI-compatible embedder:

from hyperextract import create_client
llm, emb = create_client(llm="anthropic", embedder="openai:text-embedding-3-small")

Requires the extra: pip install 'hyperextract[anthropic]'.

📖 Full guide: Provider System & Local Model Support

⚡ 30-Second Quick Start

# Install
uv tool install hyperextract

# Configure API key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query it
he search ./output/ "What are Tesla's major achievements?"

# Visualize
he show ./output/

# Export to an Obsidian vault (Markdown notes + [[wikilinks]])
he export obsidian ./output/ -o ./vault/
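For a sense of what "Markdown notes + [[wikilinks]]" means in practice, the sketch below renders one note per entity and turns each outgoing relation into a wikilink. It is a simplified illustration and not the code behind `he export obsidian`. The function names, the note layout, and the sample data are assumptions, and the record fields follow the Graph template shown later in this README.

```python
from collections import defaultdict
from pathlib import Path


def render_notes(entities, relations):
    """Build {filename: markdown}: one note per entity, relations as wikilinks."""
    outgoing = defaultdict(list)
    for rel in relations:
        outgoing[rel["source"]].append(rel)

    notes = {}
    for ent in entities:
        lines = [f"# {ent['name']}", "", ent.get("description", "")]
        links = [f"- {r['type']}: [[{r['target']}]]" for r in outgoing[ent["name"]]]
        if links:
            lines += ["", "## Relations", *links]
        notes[f"{ent['name']}.md"] = "\n".join(lines) + "\n"
    return notes


def write_vault(notes, vault_dir):
    """Write the rendered notes into a folder that Obsidian can open as a vault."""
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    for filename, body in notes.items():
        (vault / filename).write_text(body, encoding="utf-8")


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    entities = [
        {"name": "Alice", "description": "Engineer."},
        {"name": "Acme", "description": "Company."},
    ]
    relations = [{"source": "Alice", "type": "works_at", "target": "Acme"}]
    print(render_notes(entities, relations)["Alice.md"])
```

Because a wikilink resolves by note name, naming each file after its entity is enough for Obsidian to connect the notes. A real exporter would also need to sanitize names that are not valid filenames.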

🐍 Python API

uv pip install hyperextract

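from hyperextract import Template

# Load a preset template by its "domain/name" identifier
ka = Template.create("general/biography_graph")

# Read the source document and extract knowledge from its text
with open("examples/en/tesla.md", encoding="utf-8") as f:
    result = ka.parse(f.read())

# Display the extracted result
result.show()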

🔗 More examples: examples/en

📈 Why Hyper-Extract?

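| Feature | GraphRAG | LightRAG | KG-Gen | ATOM | Hyper-Extract |
| --- | --- | --- | --- | --- | --- |
| Knowledge Graph | ✅ | ✅ | ✅ | ✅ | ✅ |
| Temporal Graph | ✅ | ❌ | ❌ | ✅ | ✅ |
| Spatial Graph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Hypergraph | ❌ | ❌ | ❌ | ❌ | ✅ |
| Domain Templates | ❌ | ❌ | ❌ | ❌ | ✅ |
| Interactive CLI | ✅ | ❌ | ❌ | ❌ | ✅ |
| Multi-language | ✅ | ❌ | ❌ | ❌ | ✅ |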

🧩 Supported Knowledge Structures

From simple to complex — pick the right structure for your data:

Example — AutoGraph visualization:

📋 What's under the hood? (Architecture & Templates)

Hyper-Extract follows a three-layer architecture:

Auto-Types — 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph)
Methods — Extraction algorithms: KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more
Templates — 80+ presets across 6 domains. Zero-code setup.
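To illustrate why a Hypergraph is listed separately from a Graph among the Auto-Types, the sketch below contrasts a binary edge with a hyperedge that groups any number of participants into one fact. The class and field names are hypothetical and do not reflect Hyper-Extract's actual types. The optional `time` and `location` fields only hint at how temporal and spatial variants might annotate a relation.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """Binary relation: exactly two endpoints."""
    source: str
    target: str
    type: str


@dataclass(frozen=True)
class Hyperedge:
    """N-ary relation: any number of participants in a single fact."""
    members: FrozenSet[str]
    type: str
    time: Optional[str] = None       # would be set in a temporal variant
    location: Optional[str] = None   # would be set in a spatial variant


def to_pairwise(h: Hyperedge) -> List[Tuple[str, str]]:
    """Flatten a hyperedge into member pairs; the grouping itself is lost."""
    return list(combinations(sorted(h.members), 2))


if __name__ == "__main__":
    # Made-up example: three people sign one contract together.
    signing = Hyperedge(frozenset({"A", "B", "C"}), "co_signed", time="2024")
    print(to_pairwise(signing))
```

Flattening the three-member hyperedge yields three separate pairs, and nothing in those pairs records that they came from the same event. A hypergraph keeps that information.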

Template example (Graph type):

language: en
name: Knowledge Graph
type: graph
tags: [general]
description: 'Extract entities and their relationships.'
output:
  entities:
    fields:
    - name: name
      type: str
    - name: type
      type: str
    - name: description
      type: str
  relations:
    fields:
    - name: source
      type: str
    - name: target
      type: str
    - name: type
      type: str
identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
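One plausible reading of the `identifiers` section is that it defines the key used to recognize the same entity or relation when it is extracted more than once. The sketch below derives those keys from the template values and merges two batches by key. How Hyper-Extract actually applies `identifiers` is not shown in this README, so the helper names and the keep-first merge rule are assumptions.

```python
ENTITY_ID = "name"                         # from identifiers.entity_id
RELATION_ID = "{source}|{type}|{target}"   # from identifiers.relation_id


def entity_key(entity: dict) -> str:
    """Use the configured field as the entity's identity."""
    return str(entity[ENTITY_ID])


def relation_key(relation: dict) -> str:
    """Fill the pattern's placeholders from the relation's fields."""
    return RELATION_ID.format(**relation)


def merge_by_key(existing: list, incoming: list, key) -> list:
    """Combine two batches, keeping the first record seen for each key."""
    merged = {}
    for record in [*existing, *incoming]:
        merged.setdefault(key(record), record)
    return list(merged.values())


if __name__ == "__main__":
    # Made-up relations: the second batch repeats one and adds one.
    old = [{"source": "A", "type": "works_at", "target": "B"}]
    new = [
        {"source": "A", "type": "works_at", "target": "B"},
        {"source": "A", "type": "founded", "target": "C"},
    ]
    for rel in merge_by_key(old, new, relation_key):
        print(relation_key(rel))
```

A composite key such as `{source}|{type}|{target}` lets two relations between the same pair of entities coexist as long as their types differ.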

📚 Documentation & Resources

| Resource | Link |
| --- | --- |
| Full Documentation | yifanfeng97.github.io/Hyper-Extract |
| CLI Guide | Command-line interface |
| Provider System | Model compatibility & local deployment |
| Template Gallery | 80+ presets |
| Examples | Working code |

🔌 MCP Server

Expose your knowledge abstracts to MCP-capable assistants (Claude Desktop, IDE agents) via the Model Context Protocol — read + export only.

pip install 'hyperextract[mcp]'
he-mcp        # stdio MCP server

Tools: list_templates, info, search, ask (RAG), export_obsidian. Full guide: MCP Server docs.
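As a rough sketch of how a client could be pointed at the stdio server, the snippet below builds a configuration entry in the `mcpServers` layout that Claude Desktop's JSON configuration file uses. The server name `hyper-extract` is an arbitrary label chosen for this example, and the snippet assumes `he-mcp` is on the PATH and needs no arguments. Check the MCP Server docs for the recommended setup.

```python
import json


def mcp_config(server_name: str = "hyper-extract", command: str = "he-mcp") -> dict:
    """Build a client config entry that launches the server over stdio."""
    return {"mcpServers": {server_name: {"command": command, "args": []}}}


if __name__ == "__main__":
    # Print the entry so it can be merged into an existing client config by hand.
    print(json.dumps(mcp_config(), indent=2))
```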

🤝 Contributing & License

Contributions are welcome! Please submit Issues and PRs.
Licensed under Apache-2.0.

🔒 Security

This project has been security assessed by MseeP.ai.
