This repository contains a GraphRAG (Graph-enhanced Retrieval-Augmented Generation) implementation for a healthcare company's product catalog using the official Microsoft GraphRAG package. The implementation automatically extracts entities and relationships from unstructured text documents to build a knowledge graph, which is then used to enhance retrieval and answer questions.
GraphRAG is an approach that combines the strengths of knowledge graphs with retrieval-augmented generation. It addresses limitations of traditional RAG systems by:
- Automatically extracting entities and relationships from documents
- Building a knowledge graph to represent structured information
- Using the graph structure to enhance retrieval beyond simple vector similarity
- Integrating graph-based and vector-based retrieval for more comprehensive answers
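The idea in the list above can be illustrated with a toy sketch. This is not the GraphRAG package's algorithm; it only shows how a vector match can be enriched with graph neighbors. The product names, the 3-dimensional "embeddings", and the relationships are all made up for illustration.

```python
import math

# Hypothetical products with tiny hand-written "embeddings" (illustration only).
EMBEDDINGS = {
    "GlucoTrack Monitor": [0.9, 0.1, 0.0],
    "InsuPen Injector": [0.7, 0.3, 0.1],
    "CardioBand": [0.1, 0.9, 0.2],
}

# Hypothetical extracted relationships: (source, relation, target).
EDGES = [
    ("GlucoTrack Monitor", "PAIRS_WITH", "InsuPen Injector"),
    ("GlucoTrack Monitor", "TREATS", "Diabetes"),
    ("CardioBand", "MONITORS", "Heart Rate"),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Return the k entity names whose embeddings are closest to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vec, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:k]


def graph_neighbors(entity):
    """Return (relation, other_entity) pairs for every edge touching the entity."""
    neighbors = []
    for source, relation, target in EDGES:
        if source == entity:
            neighbors.append((relation, target))
        elif target == entity:
            neighbors.append((relation, source))
    return neighbors


def hybrid_retrieve(query_vec, k=1):
    """Vector search first, then attach graph context to each hit."""
    return [
        {"entity": name, "related": graph_neighbors(name)}
        for name in vector_search(query_vec, k)
    ]


result = hybrid_retrieve([1.0, 0.0, 0.0])
print(result)
```

Plain vector search would return only the closest product; the graph step adds the related condition and companion product to the context passed to the language model.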
This implementation uses the official Microsoft GraphRAG CLI to:
- Index documents: The system processes text files in the `input` directory, extracting entities and relationships to build a knowledge graph.
- Query the graph: The system supports both global and local search methods to answer questions about the healthcare products.
- Create statistics: The system generates statistics about the knowledge graph, such as the number of entities and relationships.
- Visualize the knowledge graph: The system generates a visual representation of the entities and relationships in the knowledge graph.
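As a rough sketch of how the indexing and querying steps above could be driven from Python, the helpers below build GraphRAG CLI command lines. The function names are hypothetical and are not taken from `pipeline.py`. The flags (`--root`, `--method`, `--query`) follow the GraphRAG getting-started documentation as commonly shown and may differ between releases, so check `graphrag --help` for your installed version.

```python
import shlex
import subprocess


def index_command(root="./"):
    """Build the command that indexes the documents under the project root."""
    return ["graphrag", "index", "--root", root]


def query_command(question, method="global", root="./"):
    """Build a query command; only the two methods named in this README are allowed."""
    if method not in ("global", "local"):
        raise ValueError(f"unsupported search method: {method!r}")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", question]


def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    completed = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return completed.stdout


# Show the commands without executing them (indexing calls the OpenAI API).
print(shlex.join(index_command()))
print(shlex.join(query_command("Which products help manage diabetes?", method="local")))
```

Passing the command as a list rather than a shell string avoids quoting problems when the question contains spaces or quotes.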
- `pipeline.py`: The main implementation file that contains the `GraphRAGPipeline` class for indexing and querying
- `analyzer.py`: Contains the `GraphRAGAnalyzer` class for analyzing and visualizing the knowledge graph
- `main.py`: Example script demonstrating the GraphRAG functionality
- `input/`: Directory containing the input text files
- `output/`: Directory where GraphRAG stores its output files (entities, relationships, etc.; created by GraphRAG)
- `logs/`: Directory for log files (created by GraphRAG)
- `cache/`: Directory for cached data (created by GraphRAG)
- Python 3.12 (managed by uv)
- An OpenAI API key
- Install uv if you don't have it (see the uv install guide).
- Install dependencies. uv reads `pyproject.toml` and `uv.lock` and provisions Python 3.12 if needed: `uv sync`
- Scaffold the graphrag config (creates `settings.yaml` and `prompts/`). Accept the defaults; the 3.x templates use `gpt-4.1` and `text-embedding-3-large`: `uv run graphrag init --root ./`

  Warning: `graphrag init` will overwrite an existing `.env` in the project root with a placeholder. If you already have one, back it up first (`cp .env .env.bak`).
- Create `.env` with your OpenAI key: `echo 'GRAPHRAG_API_KEY=<API_KEY>' > .env`
- Approve environment settings: `direnv allow`

Run the example script: `uv run python main.py`

The script will:
- Run the indexing process
- Execute example search queries
- Generate statistics and visualization of the knowledge graph
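The statistics step could look something like the sketch below. It assumes the entity and relationship tables have already been loaded into lists of dictionaries, and that they expose `title`, `type`, `source`, and `target` fields; those column names are an assumption about GraphRAG's output and may vary by version. The function name and sample records are hypothetical and do not come from `analyzer.py`.

```python
from collections import Counter


def graph_statistics(entities, relationships):
    """Summarize a knowledge graph given entity and relationship records."""
    # Count how many relationships touch each entity (undirected degree).
    degree = Counter()
    for rel in relationships:
        degree[rel["source"]] += 1
        degree[rel["target"]] += 1
    return {
        "entity_count": len(entities),
        "relationship_count": len(relationships),
        "entities_by_type": dict(Counter(e["type"] for e in entities)),
        "most_connected": degree.most_common(3),
    }


# Hypothetical sample records (illustration only).
entities = [
    {"title": "GlucoTrack Monitor", "type": "PRODUCT"},
    {"title": "InsuPen Injector", "type": "PRODUCT"},
    {"title": "Diabetes", "type": "CONDITION"},
]
relationships = [
    {"source": "GlucoTrack Monitor", "target": "Diabetes"},
    {"source": "InsuPen Injector", "target": "Diabetes"},
]

stats = graph_statistics(entities, relationships)
print(stats)
```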
- CLI-based interaction: Uses the GraphRAG CLI for indexing and querying
- Graph analysis: Provides statistics and insights about the knowledge graph
- Knowledge graph visualization: Creates visual representations of entities and relationships