pyvers is a Python package for data processing and training of claim verification models. This package was developed as part of an ML engineering capstone project.
Claim verification is a task in natural language processing (NLP) with applications ranging from fact-checking to verifying the accuracy of scientific citations. The models used in this package are based on the transformer deep-learning architecture.
- Data Modules
  - Support for local files and HuggingFace datasets.
  - Consistent label encoding for different natural language inference (NLI) datasets (see below).
  - Supports shuffling training data from multiple datasets for improved model generalization.
- Trainer
  - Training and data modules implemented with PyTorch Lightning.
  - Use any pretrained sequence classification model from HuggingFace.
  - Logger is configured to plot training and validation loss on the same graph in TensorBoard.
Run these commands in the root directory of the repository:

```sh
pip install -r requirements.txt
pip install -e .
```

- The first command installs the requirements.
- The second command installs the pyvers package in development mode. Remove the `-e` for a standard installation.
- The `FileDataModule` class loads data from local data files in JSON Lines (jsonl) format.
- Supported datasets include SciFact and Citation-Integrity.
- The schema for the data files is described here; an illustrative record is sketched below.
- Get data files for SciFact and Citation-Integrity with the labels used in pyvers here.
- The data module can be used to shuffle training data from both datasets.
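Each line of a jsonl file holds one claim-evidence record. For illustration, a record might look like this (the field names here are hypothetical, not the package's actual schema; see the linked description for the real format):

```python
import json

# Hypothetical claim-evidence record (field names are illustrative;
# the actual schema is described in the link above)
record = {
    "claim": "Mitochondria are the powerhouse of the cell.",
    "evidence": "Mitochondria generate most of the cell's supply of ATP.",
    "label": "SUPPORT",  # one of SUPPORT, NEI, REFUTE (see the label table below)
}
print(json.dumps(record))
```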
```python
from pyvers.data import FileDataModule

# Set the model used for the tokenizer
model_name = "bert-base-uncased"

# Load data from one dataset
dm = FileDataModule("data/scifact", model_name)

# Shuffle training data from two datasets
dm = FileDataModule(["data/scifact", "data/citint"], model_name)

# Get some tokenized data
dm.setup("fit")
next(iter(dm.train_dataloader()))
```
- The `NLIDataModule` class loads data from selected HuggingFace datasets.
- Supported datasets are copenlu/fever_gold_evidence, facebook/anli, and nyu-mll/multi_nli.
```python
from pyvers.data import NLIDataModule

model_name = "bert-base-uncased"

# Load data from HuggingFace datasets
dm = NLIDataModule("facebook/anli", model_name)

# Get some tokenized data
dm.prepare_data()
dm.setup("fit")
next(iter(dm.train_dataloader()))
```
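The other supported datasets load the same way; for example, `NLIDataModule("nyu-mll/multi_nli", model_name)` uses MultiNLI instead of ANLI.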
- The `ToyDataModule` class provides a small handmade toy dataset.
- There are no data files; the dataset is hard-coded in the class definition.

The following example trains a classifier on the toy dataset. This takes about a minute on a CPU.
```python
# Import required modules
import pytorch_lightning as pl
from pyvers.data import ToyDataModule
from pyvers.model import PyversClassifier

# Initialize data and model
dm = ToyDataModule("bert-base-uncased")
model = PyversClassifier(dm.model_name)

# Train model
trainer = pl.Trainer(enable_checkpointing=False, max_epochs=20)
trainer.fit(model, datamodule=dm)

# Test model
trainer.test(model, datamodule=dm)

# Show predictions
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
```
This is what we get (results vary between runs):
```
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│        AUROC Macro        │           0.963           │
│      AUROC Weighted       │           0.963           │
│         Accuracy          │           88.9            │
│         F1 Macro          │           88.6            │
│         F1 Micro          │           88.9            │
│          F1_NEI           │           100.0           │
│         F1_REFUTE         │           80.0            │
│        F1_SUPPORT         │           85.7            │
└───────────────────────────┴───────────────────────────┘
```

```
[['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'SUPPORT']]
# Ground-truth labels are:
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'NEI', 'NEI', 'NEI', 'REFUTE', 'REFUTE', 'REFUTE']]
```
The following example uses a DeBERTa model trained on MultiNLI, Fever-NLI, and Adversarial-NLI (ANLI) for zero-shot classification of claim-evidence pairs.
```python
import pytorch_lightning as pl
from pyvers.model import PyversClassifier
from pyvers.data import ToyDataModule

dm = ToyDataModule("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
model = PyversClassifier(dm.model_name)
trainer = pl.Trainer()

dm.setup(stage="test")
predictions = trainer.predict(model, datamodule=dm)
print(predictions)
# [['SUPPORT', 'SUPPORT', 'SUPPORT', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE', 'REFUTE']]
```
The pretrained model successfully distinguishes between SUPPORT and REFUTE on the toy dataset but misclassifies NEI as REFUTE. This can be improved with fine-tuning.
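A minimal sketch of that fine-tuning step, reusing the training loop from the toy example above (the epoch count here is an arbitrary choice, not a tuned setting):

```python
import pytorch_lightning as pl
from pyvers.data import ToyDataModule
from pyvers.model import PyversClassifier

# Fine-tune the pretrained NLI checkpoint on the toy dataset so it
# learns the NEI class (max_epochs is an arbitrary choice)
dm = ToyDataModule("MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli")
model = PyversClassifier(dm.model_name)
trainer = pl.Trainer(enable_checkpointing=False, max_epochs=5)
trainer.fit(model, datamodule=dm)

# Check predictions again after fine-tuning
print(trainer.predict(model, datamodule=dm))
```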
When using a pretrained model for zero-shot classification, check the mapping between labels and IDs:
```python
from transformers import AutoConfig

model_name = "answerdotai/ModernBERT-base"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'LABEL_0', 1: 'LABEL_1', 2: 'LABEL_2'}

model_name = "MoritzLaurer/DeBERTa-v3-base-mnli-fever-anli"
config = AutoConfig.from_pretrained(model_name, num_labels=3)
print(config.to_dict()["id2label"])
# {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
```
Because its labels are consistent with the NLI categories listed below, we would choose the pretrained DeBERTa model rather than ModernBERT for zero-shot classification. However, fine-tuning either model for text classification should work (see this page for information on fine-tuning ModernBERT).
| ID | pyvers | Fever* | MultiNLI, ANLI |
|----|--------|--------|----------------|
| 0 | SUPPORT | SUPPORTS | entailment |
| 1 | NEI | NOT ENOUGH INFO | neutral |
| 2 | REFUTE | REFUTES | contradiction |

\* Text labels only
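In code, the pyvers column of this table corresponds to a mapping like the following (an illustrative sketch; the variable names are not necessarily those used in the package):

```python
# Label-ID mapping used by pyvers, per the table above
# (variable names here are illustrative, not the package's own)
id2label = {0: "SUPPORT", 1: "NEI", 2: "REFUTE"}
label2id = {v: k for k, v in id2label.items()}
```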