pyg-team · puririshi98 · Jan 28, 2025 · Jan 28, 2025 · Jan 28, 2025 · Jan 29, 2025
@@ -15,3 +15,15 @@
 /torch_geometric/sampler/ @rusty1s @mananshah99 @akihironitta
 
 /docs/ @rusty1s @akihironitta
+
+/torch_geometric/loader/rag_loader.py @puririshi98
+
+/torch_geometric/data/large_graph_indexer.py @puririshi98
+
+/torch_geometric/utils/rag @puririshi98
+
+/torch_geometric/nn/nlp @puririshi98
+
+/torch_geometric/nn/models/g_retriever.py @puririshi98
+
+/examples/llm @puririshi98
@@ -14,6 +14,7 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).
 
 ### Added
 
+- Adds TXT2KG class with examples ([#9992](https://github.com/pyg-team/pytorch_geometric/pull/9992))
 - Added support for negative weights in `sparse_cross_entropy` ([#10432](https://github.com/pyg-team/pytorch_geometric/pull/10432))
 - Added `connected_components()` method to `Data` and `HeterData` ([#10388](https://github.com/pyg-team/pytorch_geometric/pull/10388))
 - Added LPFormer Graph Transformer for Link Prediction ([#9956](https://github.com/pyg-team/pytorch_geometric/pull/9956))

@@ -1,12 +1,11 @@
 # Examples for Co-training LLMs and GNNs
 
-| Example                                      | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |
-| -------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| [`g_retriever.py`](./g_retriever.py)         | Example for Retrieval-Augmented Generation (RAG) w/ GNN+LLM by co-training `LLAMA3` with `GAT` for answering questions based on knowledge graph information from the toy WebQSP dataset. We also have an [example repo](https://github.com/neo4j-product-examples/neo4j-gnn-llm-example) for integration with [Neo4j Graph DBs](neo4j.com) along with an associated [blog](https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/) showing 2x accuracy gains over LLMs on real medical data. |
-| [`g_retriever_utils/`](./g_retriever_utils/) | Contains multiple scripts for benchmarking GRetriever's architecture and evaluating different retrieval methods.                                                                                                                                                                                                                                                                                                                                                                                                                          |
-| [`multihop_rag/`](./multihop_rag/)           | Contains starter code and an example run for building a Multi-hop dataset using WikiHop5M and 2WikiMultiHopQA                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| [`nvtx_examples/`](./nvtx_examples/)         | Contains examples of how to wrap functions using the NVTX profiler for CUDA runtime analysis.                                                                                                                                                                                                                                                                                                                                                                                                                                             |
-| [`molecule_gpt.py`](./molecule_gpt.py)       | Example for [MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction](https://ai4d3.github.io/2023/papers/34.pdf). Supports MoleculeGPT and InstructMol dataset                                                                                                                                                                                                                                                                                                                                        |
-| [`glem.py`](./glem.py)                       | Example for [GLEM](https://arxiv.org/abs/2210.14709), a GNN+LLM co-training model via variational Expectation-Maximization (EM) framework on node classification tasks to achieve SOTA results                                                                                                                                                                                                                                                                                                                                            |
-| [`git_mol.py`](./git_mol.py)                 | Example for [GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text](https://arxiv.org/abs/2308.06911)                                                                                                                                                                                                                                                                                                                                                                                             |
-| [`protein_mpnn.py`](./protein_mpnn.py)       | Example for [Robust deep learning--based protein sequence design using ProteinMPNN](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1)                                                                                                                                                                                                                                                                                                                                                                                          |
+| Example                                | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+| -------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| [`g_retriever.py`](./g_retriever.py)   | Example for Retrieval-Augmented Generation (RAG) w/ GNN+LLM by co-training `LLAMA3` with `GAT` for answering questions based on knowledge graph information from the toy WebQSP dataset. We also have an [example repo](https://github.com/neo4j-product-examples/neo4j-gnn-llm-example) for integration with [Neo4j Graph DBs][neo4j.com] along with an associated [blog](https://developer.nvidia.com/blog/boosting-qa-accuracy-with-graphrag-using-pyg-and-graph-databases/) showing 2x accuracy gains over LLMs on real medical data.                                                                                                                                                                                                                                        |
+| [`nvtx_examples/`](./nvtx_examples/)   | Contains examples of how to wrap functions using the NVTX profiler for CUDA runtime analysis.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
+| [`molecule_gpt.py`](./molecule_gpt.py) | Example for MoleculeGPT: Instruction Following Large Language Models for Molecular Property Prediction. Supports MoleculeGPT and InstructMol dataset                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             |
+| [`glem.py`](./glem.py)                 | Example for [GLEM](https://arxiv.org/abs/2210.14709), a GNN+LLM co-training model via variational Expectation-Maximization (EM) framework on node classification tasks to achieve SOTA results                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
+| [`git_mol.py`](./git_mol.py)           | Example for GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
+| [`protein_mpnn.py`](./protein_mpnn.py) | Example for [Robust deep learning--based protein sequence design using ProteinMPNN](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |
+| [`txt2kg_rag.py`](./txt2kg_rag.py)     | Full end 2 end RAG pipeline using TXT2KG and Vector and Graph RAG with a GNN to achieve state of the art results. Uses the [techQA dataset](https://paperswithcode.com/dataset/techqa) but can be extended to handle any RAG dataset with a corpus of documents and an associated set of Q+A pairs to be split for train/eval/test. See [Kumo.ai x NVIDIA GNN+LLM Webinar](https://www.youtube.com/watch?v=uRIA8e7Y_vs) for more details. Note that the TechQA data requires only a single document to answer each question so it can be viewed as a toy example. To see significant accuracy boosts from GNN+LLM TXT2KG based RAG, use data that requires multiple text chunks to answer a question. In cases where single document can answer, basic RAG should be sufficient. |