This library supports the goals, and uses the terminology, introduced in the paper Increasing Trust in Language Models through the Reuse of Verified Circuits. Please read the paper. In brief:
- Given an existing transformer model with low loss, this library helps a researcher analyze and understand the algorithm the model implements.
- The "useful" token positions, attention heads and MLP neurons that are used in predictions are identified.
- Various tools and techniques evaluate aspects of the model's "behavior" (e.g. attention patterns).
- The researcher can extend the tools with model-specific searches and tests, searching for hypothesised model components that perform model-specific algorithm "sub-tasks" (e.g. Base Add in the Addition model).
- Useful facts found in this way are stored as JSON (refer Useful_Tags for details) and can be visualized (refer Assets for samples).
- A researcher can describe an algorithm hypothesis as a series of claims, and evaluate those claims against the facts found. The resulting insights can be used to refine and/or extend both the algorithm sub-task tests and the algorithm hypothesis description, leading to a full description of the model's algorithm.
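As a purely hypothetical illustration (the data shapes and names below are assumptions, not the library's actual API), a hypothesis expressed as claims checked against gathered facts might look like:

```python
# Hypothetical sketch only: "facts" maps a node name (position/layer/head)
# to the sub-task tags found for it; each "claim" is a predicate over facts.
facts = {
    "P18.L0.H1": ["Base Add"],
    "P19.L0.H2": ["Make Carry"],
}

claims = {
    "some attention head performs Base Add":
        lambda f: any("Base Add" in tags for tags in f.values()),
    "some attention head performs Make Carry":
        lambda f: any("Make Carry" in tags for tags in f.values()),
    "some attention head performs Borrow One":
        lambda f: any("Borrow One" in tags for tags in f.values()),
}

# Evaluate every claim against the facts and report which are supported.
results = {desc: check(facts) for desc, check in claims.items()}
for desc, supported in results.items():
    print(f"{desc}: {'supported' if supported else 'unsupported'}")
```

Unsupported claims point at either a missing sub-task test or a flaw in the hypothesis, which is the refinement loop described above.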
Install from source:

```shell
git clone https://github.com/PhilipQuirke/quanta_maths.git
cd quanta_maths
pip install .
```
Much of this library is generic and can be applied to any transformer model. As a "real-world" testbed to help refine the library, we use models trained to perform integer addition and subtraction (e.g. 133357+182243=+0315600 and 123450-345670=-0222220). Arithmetic-specific algorithm sub-task searches are defined (e.g. Base Add, Use Sum 9, Make Carry, Base Subtract, Borrow One). Addition and subtraction hypotheses are described and evaluated in the Colab notebook QuantaMathsAnalyse.ipynb. Arithmetic-specific Python code is in files like maths_config.py.
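For concreteness, the question/answer format in these examples can be sketched in plain Python (the helper name and the padding convention are assumptions inferred from the examples above, not library code):

```python
# Assumed format: n-digit operands, answer written sign-first and
# zero-padded to one more digit than the operands.
def format_answer(a: int, b: int, op: str, n_digits: int = 6) -> str:
    result = a + b if op == "+" else a - b
    sign = "+" if result >= 0 else "-"
    return f"{a:0{n_digits}d}{op}{b:0{n_digits}d}={sign}{abs(result):0{n_digits + 1}d}"

print(format_answer(133357, 182243, "+"))  # 133357+182243=+0315600
print(format_answer(123450, 345670, "-"))  # 123450-345670=-0222220
```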
This library contains:

Notebooks: Jupyter notebooks which are run in Google Colab or Jupyter:
- Train: Colab QuantaMathsTrain.ipynb is used to train transformer arithmetic models.
  - Outputs pth and json files that are (manually) stored on HuggingFace.
- Analysis: Colab QuantaMathsAnalyse.ipynb is used to analyze the behavior and algorithm sub-tasks of transformer arithmetic models.
  - Inputs pth files (generated above) from HuggingFace.
  - Outputs *_behavior and *_algorithm json files that are (manually) stored on HuggingFace.
- Algorithm: Colab QuantaMathsAlgorithm.ipynb describes and tests an overall algorithm for a model, based on the behavior and algorithm sub-task data.
  - Inputs *_behavior and *_algorithm json files (generated above) from HuggingFace.
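The flow of artifacts between these notebooks can be sketched as plain data (the notebook and file names are taken from the descriptions above; the wildcard names are kept as-is):

```python
# Data-flow sketch of the three-notebook pipeline; artifacts move between
# stages via HuggingFace.
pipeline = [
    # (notebook, inputs, outputs)
    ("QuantaMathsTrain.ipynb",     [],            ["model.pth", "training.json"]),
    ("QuantaMathsAnalyse.ipynb",   ["model.pth"], ["*_behavior.json", "*_algorithm.json"]),
    ("QuantaMathsAlgorithm.ipynb", ["*_behavior.json", "*_algorithm.json"], []),
]

# Sanity check: every input must be produced by an earlier stage.
produced = set()
for notebook, inputs, outputs in pipeline:
    missing = [f for f in inputs if f not in produced]
    assert not missing, f"{notebook} is missing {missing}"
    produced.update(outputs)
print("pipeline is consistent")
```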
QuantaMechInterp: Python library code imported into the notebooks:
- model_*.py: Contains the configuration of the transformer model being trained/analysed. Includes class ModelConfig.
- useful_*.py: Contains data on the useful token positions and useful nodes (attention heads and MLP neurons) that the model uses in predictions. Includes class UsefulConfig, derived from ModelConfig. Refer Useful_Tags for more detail.
- algo_*.py: Contains tools to support declaring and validating a model algorithm. Includes class AlgoConfig, derived from UsefulConfig.
- quanta_*.py: Contains categorisations of model behavior (aka quanta), with ways to detect, filter and graph them. Refer Filter for more detail.
- ablate_*.py: Contains ways to "intervention ablate" the model and detect the impact of the ablation.
- maths_*.py: Contains specializations of the above specific to arithmetic (addition and subtraction) transformer models. Includes class MathsConfig, derived from AlgoConfig.
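The derivation chain among these configuration classes can be sketched as follows (the class names come from the text above; the constructor parameters and attributes are illustrative assumptions, not the library's real fields):

```python
# Sketch of the configuration class hierarchy named above.
class ModelConfig:                 # model_*.py: shape of the model under study
    def __init__(self, n_layers: int = 1, n_heads: int = 3):
        self.n_layers = n_layers
        self.n_heads = n_heads

class UsefulConfig(ModelConfig):   # useful_*.py: adds useful positions/nodes
    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.useful_positions = []  # token positions used in predictions
        self.useful_nodes = []      # attention heads and MLP neurons used

class AlgoConfig(UsefulConfig):    # algo_*.py: adds algorithm claims/validation
    pass

class MathsConfig(AlgoConfig):     # maths_*.py: arithmetic specialization
    pass

cfg = MathsConfig(n_layers=2)
```

Each layer of the hierarchy adds one concern, so generic tooling can accept a ModelConfig while arithmetic-specific searches require a MathsConfig.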
Tests: Unit tests.
The HuggingFace website holds the output files generated by the Colab notebooks for ~45 models. For each model, the available output files are:
- the model's weights (model.pth),
- the model's training details (training.json),
- generic analysis facts (behavior.json), and
- maths-specific results from searching for hypothesised algorithm features (features.json).
Refer Hugging_Models for more detail.
The papers associated with this content are:
- Understanding Addition in Transformers: https://arxiv.org/abs/2310.13121 (aka Paper 1). Model add_d5_l1_h3_t30K is very similar to the one in this paper.
- Increasing Trust in Language Models through the Reuse of Verified Circuits: https://arxiv.org/abs/2402.02619
Most exploratory work is done in Google Colab in the 'train' and 'analyse' notebooks. After new code is successfully developed and tested in a notebook, it is migrated into the QuantaMechInterp code folder.