Rematch-RARE

This repository contains the source code, data, and documentation for the research paper:

@inproceedings{kachwala-etal-2024-rematch,
    title = "{REMATCH}: Robust and Efficient Matching of Local Knowledge Graphs to Improve Structural and Semantic Similarity",
    author = "Kachwala, Zoher  and
      An, Jisun  and
      Kwak, Haewoon  and
      Menczer, Filippo",
    editor = "Duh, Kevin  and
      Gomez, Helena  and
      Bethard, Steven",
    booktitle = "Findings of the Association for Computational Linguistics: NAACL 2024",
    month = jun,
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.findings-naacl.64",
    doi = "10.18653/v1/2024.findings-naacl.64",
    pages = "1018--1028",
    abstract = "Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. To overcome these limitations, we introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE. Among state-of-the-art metrics, rematch ranks second in structural similarity; and first in semantic similarity by 1{--}5 percentage points on the STS-B and SICK-R benchmarks. Rematch is also five times faster than the next most efficient metric.",
}

[Figure: rematch flow]

An example of rematch similarity calculation for a pair of AMRs. After AMRs are parsed from sentences, rematch has a two-step process to calculate similarity. First, sets of motifs are generated. Second, the two sets are used to calculate the Jaccard similarity (intersecting motifs shown in color).
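The second step described above, Jaccard similarity over two motif sets, can be sketched as follows. This is a minimal illustration and not the repository's implementation: the motif strings are hypothetical placeholders, motif generation itself is not shown, and treating two empty sets as identical is an assumption made here.

```python
def jaccard_similarity(motifs_a: set, motifs_b: set) -> float:
    """Return |A ∩ B| / |A ∪ B| for two sets of motifs."""
    union = motifs_a | motifs_b
    if not union:
        # Assumption: two empty motif sets count as identical.
        return 1.0
    return len(motifs_a & motifs_b) / len(union)


# Hypothetical motif sets for a pair of AMRs (illustrative strings only;
# the actual motif representation used by rematch may differ).
motifs_1 = {"want-01", "go-02", "boy", "want-01 ARG0 boy"}
motifs_2 = {"want-01", "go-02", "girl", "want-01 ARG0 girl"}

score = jaccard_similarity(motifs_1, motifs_2)
print(f"similarity = {score:.3f}")
```

Here two of the six distinct motifs are shared, so the score is 2/6.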

Abstract

Knowledge graphs play a pivotal role in various applications, such as question-answering and fact-checking. Abstract Meaning Representation (AMR) represents text as knowledge graphs. Evaluating the quality of these graphs involves matching them structurally to each other and semantically to the source text. Existing AMR metrics are inefficient and struggle to capture semantic similarity. We also lack a systematic evaluation benchmark for assessing structural similarity between AMR graphs. To overcome these limitations, we introduce a novel AMR similarity metric, rematch, alongside a new evaluation for structural similarity called RARE. Among state-of-the-art metrics, rematch ranks second in structural similarity; and first in semantic similarity by 1--5 percentage points on the STS-B and SICK-R benchmarks. Rematch is also five times faster than the next most efficient metric.

Keywords

Knowledge Graphs, Graph Matching, Abstract Meaning Representation (AMR), Semantic Graphs, Graph Isomorphism, Semantic Similarity, Structural Similarity.

Installation and Usage

  1. Clone the repository:

    git clone https://github.com/Zoher15/Rematch-RARE.git
  2. Create and activate the conda environment:

    conda env create -f rematch_rare.yml
    conda activate rematch_rare

Data Preprocessing

  1. License and download AMR Annotation 3.0
  2. Preprocess the data:
    bash methods/preprocess_data/preprocess_amr3.sh <dir>
    <dir> is the directory that contains your amr_annotation_3.0_LDC2020T02.tgz file.

Results

Structural Consistency (RARE)

[Figure: structural consistency (RARE) results]

Steps to reproduce these results:

  1. Generate Randomized AMRs with Rewired Edges (RARE):
    python experiments/structural_consistency/randomize_amr_rewire.py
  2. Evaluate any metric on RARE test:
    bash experiments/structural_consistency/structural_consistency.sh <metric>
    <metric> should be one of rematch, smatch, s2match, sembleu, wlk or wwlk. Depending on the metric, this could take a while to run.

Semantic Consistency

[Figure: semantic consistency results]

Steps to reproduce these results:

  1. Parse AMRs from STS-B and SICK-R:

    a. Follow the instructions to install the transition_amr_parser. We highly recommend creating a separate conda environment called transition_amr_parser. Then parse with AMR3-structbart-L-smpl and AMR3-joint-ontowiki-seed42 by activating the environment and running the script (requires CUDA):

    conda env create -f transition_amr_parser.yml
    conda activate transition_amr_parser
    bash experiments/semantic_consistency/parse_amrs.sh

    b. (optional) Parse with Spring by cloning the repo and following its installation instructions. We highly recommend creating a separate conda environment called spring. Also download and unzip the AMR3 pretrained checkpoint, and ensure that the resulting unzipped file (AMR3.parsing.pt) is in the cloned repo directory spring/. Then run the following, where <spring_dir> is the location of your Spring repo (requires CUDA):

    conda env create -f spring.yml
    conda activate spring
    bash experiments/semantic_consistency/parse_spring.sh <spring_dir>

    c. (optional) Parse with Amrbart by cloning the repo and following its installation instructions. We highly recommend creating a separate conda environment called amrbart. Then run the following, where <amrbart_dir> is the location of your Amrbart repo (requires CUDA):

    conda env create -f amrbart.yml
    conda activate amrbart
    bash experiments/semantic_consistency/parse_amrbart.sh <amrbart_dir>
  2. Evaluate a metric on the test set:

    conda activate rematch_rare
    bash experiments/semantic_consistency/semantic_consistency.sh <metric> <parser>

    <metric> should be one of rematch, smatch, s2match, sembleu, wlk or wwlk.

    <parser> should be one of AMR3-structbart-L-smpl, AMR3-joint-ontowiki-seed42, spring_unwiki or amrbart_unwiki. Ensure the chosen <parser> has been executed in the previous step.
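To evaluate every metric against every parser, the commands for step 2 can be generated rather than typed by hand. The sketch below only builds and prints the command lines from the values listed above; the helper name build_commands is hypothetical and not part of the repository.

```python
from itertools import product

# Values taken from the lists of valid <metric> and <parser> arguments above.
METRICS = ["rematch", "smatch", "s2match", "sembleu", "wlk", "wwlk"]
PARSERS = [
    "AMR3-structbart-L-smpl",
    "AMR3-joint-ontowiki-seed42",
    "spring_unwiki",
    "amrbart_unwiki",
]
SCRIPT = "experiments/semantic_consistency/semantic_consistency.sh"


def build_commands(metrics=METRICS, parsers=PARSERS):
    """Hypothetical helper: one argv list per (metric, parser) pair."""
    return [["bash", SCRIPT, metric, parser]
            for metric, parser in product(metrics, parsers)]


commands = build_commands()
for cmd in commands:
    print(" ".join(cmd))
```

To actually run a command, pass it to subprocess.run(cmd, check=True) from the repository root with the rematch_rare environment active, and restrict the parser list to those you parsed in step 1.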

Hybrid Consistency (Bamboo Benchmark)

[Figure: hybrid consistency (Bamboo benchmark) results]

Please follow the instructions in the Bamboo repo. Note that Bamboo uses Pearson correlation (pearsonr) by default, whereas our analysis uses Spearman correlation (spearmanr). To switch, use find and replace in the evaluation script to replace every occurrence of pearsonr with spearmanr.
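The choice of correlation matters because Pearson measures linear agreement while Spearman measures agreement of rankings. The dependency-free sketch below illustrates the difference on made-up scores; it is not Bamboo's evaluation code, and the data and function names are illustrative assumptions.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up scores: monotonically related, but not linearly.
metric_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
human_scores = [1, 4, 9, 16, 25]

p = pearson(metric_scores, human_scores)
s = spearman(metric_scores, human_scores)
print(f"pearson = {p:.3f}, spearman = {s:.3f}")
```

Because the two score lists rank the items identically, the Spearman value is 1 (up to floating-point error) while the Pearson value is lower.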

Efficiency

[Figure: efficiency results]

AMR Metric    Time (s)    RAM (GB)
smatch        927         0.2
s2match       7718        2
sembleu       275         0.2
WLK           315         30
rematch       51          0.2
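The "five times faster" claim in the abstract can be checked directly against the timings in the table. The short calculation below uses only the numbers reported above.

```python
# Running times in seconds, copied from the efficiency table.
timings = {
    "smatch": 927,
    "s2match": 7718,
    "sembleu": 275,
    "WLK": 315,
    "rematch": 51,
}

baseline = timings["rematch"]
# How many times slower each other metric is compared with rematch.
slowdowns = {name: seconds / baseline
             for name, seconds in timings.items() if name != "rematch"}
next_fastest = min(slowdowns, key=slowdowns.get)

for name, factor in sorted(slowdowns.items(), key=lambda item: item[1]):
    print(f"{name}: {factor:.1f}x the running time of rematch")
```

The next most efficient metric is sembleu, at roughly 5.4 times the running time of rematch.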

Steps to reproduce this experiment:

  1. Generate the timing testbed:
    conda activate rematch_rare
    python experiments/efficiency/generate_matchups.py
  2. Evaluate a specific <metric>, one of rematch, smatch, s2match, sembleu or wlk:
    bash experiments/efficiency/efficiency.sh <metric>
  3. Once all metrics have been run, reproduce the plots from the paper (saved in data/processed/AMR3.0):
    python experiments/efficiency/plot_complexity.py