AutoG is a novel framework that addresses the critical challenge of automatically constructing high-quality graphs from tabular data for graph machine learning (GML) applications. While GML has seen tremendous growth, the crucial step of converting tabular data into meaningful graphs remains largely manual and unstandardized. AutoG leverages Large Language Models (LLMs) to automate this process, producing graphs that rival those created by human experts.
- Automatic graph schema generation without human intervention
- LLM-based solution for high-quality graph construction
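To make "graph schema" concrete, here is a minimal sketch of how tables can be read as a graph once a schema is known: rows become typed nodes and foreign-key columns become typed edges. The toy tables, the `schema` dictionary layout, and the `build_graph` helper are all hypothetical and are not part of the AutoG codebase. The schema here is written by hand, which is the step AutoG aims to automate.

```python
from collections import defaultdict

# Two toy tables, each a list of rows (hypothetical data for illustration).
users = [
    {"user_id": 1, "country": "US"},
    {"user_id": 2, "country": "DE"},
]
purchases = [
    {"purchase_id": 10, "user_id": 1, "amount": 9.5},
    {"purchase_id": 11, "user_id": 1, "amount": 3.0},
    {"purchase_id": 12, "user_id": 2, "amount": 7.25},
]

# Hand-written schema: primary key per table plus foreign-key links.
# This layout is an assumption made for the example, not AutoG's format.
schema = {
    "users": {"rows": users, "pk": "user_id", "fks": {}},
    "purchases": {
        "rows": purchases,
        "pk": "purchase_id",
        "fks": {"user_id": "users"},
    },
}


def build_graph(schema):
    """Turn rows into typed nodes and foreign keys into typed edges."""
    nodes = defaultdict(list)
    edges = defaultdict(list)
    for table, spec in schema.items():
        for row in spec["rows"]:
            node_id = row[spec["pk"]]
            nodes[table].append(node_id)
            for fk_col, target in spec["fks"].items():
                # Edge type is (source table, foreign-key column, target table).
                edges[(table, fk_col, target)].append((node_id, row[fk_col]))
    return dict(nodes), dict(edges)


nodes, edges = build_graph(schema)
```

A different schema over the same tables (for example, treating `country` as its own node type) would give a different graph, which is why the choice of schema matters for downstream GML tasks.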
git clone https://github.com/amazon-science/Automatic-Table-to-Graph-Generation
cd Automatic-Table-to-Graph-Generation/
# Install 4dbinfer-related libraries
bash multi-table-benchmark/conda/create_conda_env.sh
# Clone DeepJoin to download the language model
git clone https://github.com/mutong184/deepjoin
The following packages are required for development but are not needed if you use cached LLM outputs:
pip install llama-index-llms-bedrock
pip install llama-index
pip install valentine
Generate the preprocessed datasets:
bash scripts/download.sh
This creates two dataset versions:
- Old Version: A baseline preprocessed version using basic heuristics
- Expert Version: Human expert-generated version with optimized column naming
Note: AutoG uses the 'old' version as input while ignoring the schema information.
To run the AutoG pipeline:
bash scripts/autog.sh
For detailed configuration options, see scripts/autog.sh.
Execute GML tasks on the constructed graphs:
bash scripts/run.sh
Follow these steps to apply AutoG to your own data:
- Generate metadata information:
  from models.llm.gconstruct import analyze_dataframes
  metadata = analyze_dataframes(your_dataframe)
- Generate initial type predictions:
  from prompts import identify
  types = identify(metadata)
- Create a DBBRDBDataset wrapper for your data.
- Generate first-round prompts using AutoG.
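As a rough illustration of what the metadata and type-prediction steps involve, the sketch below computes per-column statistics and guesses a role for each column using only the standard library. It is not the repository's `analyze_dataframes` or `identify`. The function names `summarize_columns` and `guess_type`, the `{column: [values]}` table layout, and the role labels are assumptions made for this example.

```python
def summarize_columns(table):
    """Collect simple per-column statistics from a table given as {column: [values]}."""
    summary = {}
    for name, values in table.items():
        non_null = [v for v in values if v is not None]
        summary[name] = {
            "n_rows": len(values),
            "n_unique": len(set(non_null)),
            "n_missing": len(values) - len(non_null),
            "python_types": sorted({type(v).__name__ for v in non_null}),
            "samples": non_null[:3],
        }
    return summary


def guess_type(stats, max_categories=10):
    """Rough heuristic guess of a column's role (hypothetical labels)."""
    if stats["python_types"] == ["float"]:
        return "numeric"
    if stats["n_unique"] == stats["n_rows"] and stats["n_missing"] == 0:
        return "candidate_key"
    if stats["n_unique"] <= max_categories:
        return "category"
    return "text"


# Toy table standing in for your own data.
orders = {
    "order_id": [1, 2, 3, 4],
    "customer": ["a", "b", "a", "c"],
    "price": [9.5, 3.0, 9.5, None],
}

metadata = summarize_columns(orders)
types = {name: guess_type(stats) for name, stats in metadata.items()}
```

Fixed rules like these are brittle. For instance, a small lookup table makes every column look like a key, which is one reason to pass the statistics to an LLM instead.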
If you use AutoG in your research, please cite:
@inproceedings{
chen2025autog,
title={AutoG: Towards automatic graph construction from tabular data},
author={Zhikai Chen and Han Xie and Jian Zhang and Xiang Song and Jiliang Tang and Huzefa Rangwala and George Karypis},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=hovDbX4Gh6}
}