yifanfeng97/Hyper-Extract

Smart Knowledge Extraction CLI

Transform documents into structured knowledge with one command.

"Stop reading. Start understanding."
"Say goodbye to document anxiety; see the information at a glance."


📰 What's New

  • 🔌 MCP Server — Query your knowledge abstracts from Claude Desktop and IDE agents with he-mcp. (PR #40)
  • 🧠 Anthropic Claude Support — Use claude-opus-4-8, claude-sonnet-4-6, and claude-haiku-4-5 directly as your LLM provider. (PR #38)
  • 📝 Obsidian Export — Turn any graph into an Obsidian vault with Markdown notes linked by [[wikilinks]]. (PR #37)
  • 🧹 he clean — Remove a KA's index or the whole knowledge abstract in one command. (PR #39)
  • 🔧 Reliability Fixes — True mean for multi-chunk embeddings, capped OpenAI-compatible batch sizes, and resolved multi-word llm_* merge strategies. (PRs #35, #36, #41)

See the full changelog in the GitHub releases.

Hyper-Extract is an LLM-powered framework for knowledge extraction and evolution. It transforms unstructured text into persistent, predictable, strongly typed Knowledge Abstracts. Output formats range from simple Collections (Lists/Sets) and Pydantic Models to Knowledge Graphs, Hypergraphs, and Spatio-Temporal Graphs.

✨ Core Features

🔷 8 Knowledge Structures: From simple Lists to advanced Graphs, Hypergraphs, and Spatio-Temporal Graphs
🧠 10+ Extraction Engines: GraphRAG, LightRAG, Hyper-RAG, KG-Gen, and more, ready to use
📝 80+ YAML Templates: Zero-code extraction across Finance, Legal, Medical, TCM, Industry, and General domains
🔄 Incremental Evolution: Feed new documents anytime to expand and refine your knowledge base
📤 Obsidian Export: Turn any extracted graph into an Obsidian vault of Markdown notes linked by [[wikilinks]]

🎯 What Can You Do With It?

📄 Researcher — Turn papers into knowledge graphs

Feed a 20-page academic paper, get an interactive graph of key concepts, authors, and citations.

he parse paper.pdf -t general/academic_graph -o ./paper_kb/
he show ./paper_kb/

🏦 Financial Analyst — Extract entities from earnings reports

Automatically identify companies, executives, financial metrics, and their relationships from unstructured reports.

he parse earnings.md -t finance/earnings_graph -o ./finance_kb/
he search ./finance_kb/ "What are the key risk factors?"

🔒 Local Deployment — Keep data on-premise with vLLM

Run Qwen3.5-9B + bge-m3 locally via vLLM. No data leaves your machine.

from hyperextract import create_client
llm, emb = create_client(
    llm="vllm:Qwen3.5-9B@http://localhost:8000/v1",
    embedder="vllm:bge-m3@http://localhost:8001/v1",
    api_key="dummy",
)

🚀 Supported Platforms & Models

Hyper-Extract relies on the LLM's structured output capability (json_schema or Function Calling).
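To make "structured output" concrete, the sketch below shows the general idea: a JSON Schema describes the expected shape, and the model's reply is checked against it before use. It uses only the Python standard library. The schema, the `validate_reply` helper, and the sample reply are hypothetical illustrations, not Hyper-Extract's internals or any provider's API.

```python
import json

# Hypothetical schema for a small entity list (not taken from Hyper-Extract).
ENTITY_SCHEMA = {
    "type": "object",
    "required": ["entities"],
    "properties": {
        "entities": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["name", "type"],
                "properties": {
                    "name": {"type": "string"},
                    "type": {"type": "string"},
                },
            },
        }
    },
}


def validate_reply(raw: str) -> list:
    """Parse a model reply and check the fields the schema marks as required.

    This is a hand-rolled check for illustration, not a full JSON Schema validator.
    """
    data = json.loads(raw)
    if not isinstance(data, dict) or not isinstance(data.get("entities"), list):
        raise ValueError("reply must be an object with an 'entities' array")
    required = ENTITY_SCHEMA["properties"]["entities"]["items"]["required"]
    for item in data["entities"]:
        if not isinstance(item, dict):
            raise ValueError("each entity must be an object")
        for field in required:
            if not isinstance(item.get(field), str):
                raise ValueError(f"entity is missing string field {field!r}")
    return data["entities"]


# A made-up reply standing in for what a schema-constrained model might return.
reply = '{"entities": [{"name": "Alice", "type": "person"}]}'
entities = validate_reply(reply)
print(entities[0]["name"])
```

A model without json_schema or function-calling support tends to return free text, which is where a check like this would fail.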

Platform | Verified Models
OpenAI | gpt-4o, gpt-4o-mini, gpt-5
Anthropic | claude-opus-4-8, claude-sonnet-4-6, claude-haiku-4-5
Alibaba Cloud Bailian | qwen-plus, qwen-turbo, deepseek-r1
Local vLLM | Qwen3.5-9B (GPTQ-Marlin)

Embedding models (semantic search) work with any OpenAI-compatible endpoint: text-embedding-3-small, text-embedding-v4 (Bailian), bge-m3 (local vLLM).

Anthropic note: Claude is used for the LLM (set ANTHROPIC_API_KEY). Anthropic has no embeddings API, so pair it with an OpenAI-compatible embedder:

from hyperextract import create_client
llm, emb = create_client(llm="anthropic", embedder="openai:text-embedding-3-small")

Requires the extra: pip install 'hyperextract[anthropic]'.

📖 Full guide: Provider System & Local Model Support

⚡ 30-Second Quick Start

# Install
uv tool install hyperextract

# Configure API key
he config init -k YOUR_OPENAI_API_KEY

# Extract knowledge from a document
he parse examples/en/tesla.md -t general/biography_graph -o ./output/ -l en

# Query it
he search ./output/ "What are Tesla's major achievements?"

# Visualize
he show ./output/

# Export to an Obsidian vault (Markdown notes + [[wikilinks]])
he export obsidian ./output/ -o ./vault/
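To give a feel for what "Markdown notes linked by [[wikilinks]]" means, here is a minimal sketch that writes one note per entity and links each note to its relation targets. It is not the implementation behind `he export obsidian`; the `write_vault` function, the note layout, and the sample graph are all hypothetical.

```python
import tempfile
from pathlib import Path


def write_vault(entities, relations, vault_dir):
    """Write one Markdown note per entity, with outgoing relations as wikilinks.

    Assumes entity names are safe to use as file names (an illustration-only shortcut).
    """
    vault = Path(vault_dir)
    vault.mkdir(parents=True, exist_ok=True)
    for entity in entities:
        lines = [f"# {entity['name']}", "", entity.get("description", ""), "", "## Links"]
        for rel in relations:
            if rel["source"] == entity["name"]:
                # Obsidian resolves [[Target]] to the note named Target.md
                lines.append(f"- {rel['type']} [[{rel['target']}]]")
        note_path = vault / f"{entity['name']}.md"
        note_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return sorted(p.name for p in vault.glob("*.md"))


# Made-up sample graph
entities = [
    {"name": "Alice", "description": "Engineer."},
    {"name": "Acme", "description": "Company."},
]
relations = [{"source": "Alice", "type": "works_at", "target": "Acme"}]

with tempfile.TemporaryDirectory() as tmp:
    notes = write_vault(entities, relations, tmp)
    alice_note = (Path(tmp) / "Alice.md").read_text(encoding="utf-8")

print(notes)
print(alice_note)
```

Opening such a folder in Obsidian would show the relation as a clickable link from one note to the other.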

🐍 Python API

uv pip install hyperextract

from hyperextract import Template

ka = Template.create("general/biography_graph")

with open("examples/en/tesla.md") as f:
    result = ka.parse(f.read())

result.show()

🔗 More examples: examples/en

📈 Why Hyper-Extract?

[Table: feature comparison of GraphRAG, LightRAG, KG-Gen, ATOM, and Hyper-Extract across Knowledge Graph, Temporal Graph, Spatial Graph, Hypergraph, Domain Templates, Interactive CLI, and Multi-language support]

🧩 Supported Knowledge Structures

From simple to complex — pick the right structure for your data:

[Figure: Knowledge Structures Matrix]

Example — AutoGraph visualization:

[Figure: AutoGraph Visualization]

📋 What's under the hood? (Architecture & Templates)

Hyper-Extract follows a three-layer architecture:

  • Auto-Types — 8 strongly-typed data structures (Model, List, Set, Graph, Hypergraph, Temporal Graph, Spatial Graph, Spatio-Temporal Graph)
  • Methods — Extraction algorithms: KG-Gen, GraphRAG, LightRAG, Hyper-RAG, Cog-RAG, and more
  • Templates — 80+ presets across 6 domains. Zero-code setup.
[Figure: Architecture]

Template example (Graph type):

language: en
name: Knowledge Graph
type: graph
tags: [general]
description: 'Extract entities and their relationships.'
output:
  entities:
    fields:
    - name: name
      type: str
    - name: type
      type: str
    - name: description
      type: str
  relations:
    fields:
    - name: source
      type: str
    - name: target
      type: str
    - name: type
      type: str
identifiers:
  entity_id: name
  relation_id: '{source}|{type}|{target}'
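One plausible reading of the `identifiers` block is that it defines deduplication keys: entities are keyed by their `name` field, and relations by the `{source}|{type}|{target}` pattern. The sketch below illustrates that reading with plain `str.format`. The helper names, the "later record updates earlier one" policy, and the sample data are assumptions; the library's actual merge logic may differ.

```python
ENTITY_ID = "name"                        # field named by entity_id in the template
RELATION_ID = "{source}|{type}|{target}"  # pattern copied from the template


def entity_key(entity: dict) -> str:
    """Key an entity by the field the template names as its identifier."""
    return entity[ENTITY_ID]


def relation_key(relation: dict) -> str:
    """Fill the relation_id pattern from the relation's own fields."""
    return RELATION_ID.format(**relation)


def merge(existing: dict, items: list, key_fn) -> dict:
    """Keep one record per key; later records update earlier ones (assumed policy)."""
    for item in items:
        key = key_fn(item)
        existing[key] = {**existing.get(key, {}), **item}
    return existing


# Made-up relations, including one exact duplicate
relations = [
    {"source": "Alice", "type": "works_at", "target": "Acme"},
    {"source": "Alice", "type": "works_at", "target": "Acme"},
    {"source": "Alice", "type": "founded", "target": "Acme"},
]
merged = merge({}, relations, relation_key)
print(sorted(merged))
# → ['Alice|founded|Acme', 'Alice|works_at|Acme']
```

Under this reading, feeding a second document that mentions the same relation would update the existing record rather than add a new one.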

📚 Documentation & Resources

Resource | Link
Full Documentation | yifanfeng97.github.io/Hyper-Extract
CLI Guide | Command-line interface
Provider System | Model compatibility & local deployment
Template Gallery | 80+ presets
Examples | Working code

🔌 MCP Server

Expose your knowledge abstracts to MCP-capable assistants (Claude Desktop, IDE agents) via the Model Context Protocol — read + export only.

pip install 'hyperextract[mcp]'
he-mcp        # stdio MCP server

Tools: list_templates, info, search, ask (RAG), export_obsidian. Full guide: MCP Server docs.
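MCP clients such as Claude Desktop typically register stdio servers through an `mcpServers` entry in their JSON configuration. The sketch below builds such an entry for the `he-mcp` command. The server name `hyper-extract` is an arbitrary label chosen here, and whether `he-mcp` needs extra arguments or environment variables is not stated in this README, so treat this as a starting point and check the MCP Server docs.

```python
import json


def mcp_server_entry(name, command, args=None):
    """Build an mcpServers config entry for a stdio MCP server."""
    return {"mcpServers": {name: {"command": command, "args": list(args or [])}}}


# "hyper-extract" is a hypothetical label; "he-mcp" is the command shown above.
config = mcp_server_entry("hyper-extract", "he-mcp")
print(json.dumps(config, indent=2))
```

The printed JSON can be merged into the client's configuration file, for example `claude_desktop_config.json` for Claude Desktop.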

🤝 Contributing & License

Contributions are welcome! Please submit Issues and PRs.
Licensed under Apache-2.0.

🔒 Security

This project has been security assessed by MseeP.ai.
