Quanta Mech Interp

Introduction

This reuseable library, helps a researcher investigate a model's algorithm by:

searching a model to discover interesting (generic or topic-specific) model facts
storing facts learnt about a model using any Mech Interp (MI) technique in a common json format
declaring a hypothesised model feature as a set of facts (criteria) that must be satisfied
searching a model's attention head(s) or an MLP layer that satisfy a model feature's criteria
storing the features found in the same json format
declaring a hypothesised model algorithm as a set of model features that must exist
evaluating a hypothesised algorithm to see if all the required features exist using the model's stored facts

The library includes:

A technique Useful Nodes to reduce the model "search space" to just nodes important to the model's prediction accuracy.
A technique Filtering to more efficiently search for model facts, features and the overall algorithm.
Multiple visualizations of model useful nodes, facts and features e.g.

This library contains files:

QuantaMechInterp: Python library code imported into the notebooks:
- model_*.py: Contains the configuration of the transformer model being trained/analysed. Includes class ModelConfig
- useful_*.py: Contains data on the useful token positions and useful nodes (attention heads and MLP neurons) that the model uses in predictions. Includes class UsefulConfig derived from ModelConfig. Refer Useful_Tags for more detail.
- algo_*.py: Contains tools to support declaring and validating a model algorithm. Includes class AlgoConfig derived from UsefulConfig.
- quanta_*.py: Contains categorisations of model behavior (aka quanta), with ways to detect, filter and graph them. Refer Filter for more detail.
- ablate_*.py: Contains ways to "intervention ablate" the model and detect the impact of the ablation

From source

git clone https://github.com/PhilipQuirke/quanta_mech_interp.git
cd QuantaMechInterp
pip install .

This library was created as part of the Understanding Addition in Transformers and Increasing Trust in Language Models through the Reuse of Verified Circuits1 papers. The library is also used by the in-flight "TinySQL" project funded by WithMartian

Name		Name	Last commit message	Last commit date
Latest commit History 54 Commits
QuantaMechInterp		QuantaMechInterp
Tests		Tests
assets		assets
.gitignore		.gitignore
LICENSE		LICENSE
QuantaMechInterp.pyproj		QuantaMechInterp.pyproj
README.md		README.md
feature.md		feature.md
filter.md		filter.md
pyproject.toml		pyproject.toml
useful_tags.md		useful_tags.md