
Syntax-based Contextual Visualizations for SAE & LLM Interpretability

Project Overview

This project aims to improve interpretability tooling by developing a new visualization method for the activation contexts of Sparse Autoencoder (SAE) features. Specifically, we propose using syntactic dependencies to reveal similarities between contexts.

Project Features

We developed three novel views for activation contexts: two use syntactic dependency structures and one uses branching trees. The backend uses the spaCy dependency parser and sentence tagger. These views are meant to supplement activation context lists, such as those developed by Anthropic:

Figure: Anthropic text contexts
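The dependency-based views need, for each activating token, the part of the parse that links it to the rest of the sentence. A minimal sketch, assuming each context has already been parsed into tokens with a head index and a dependency label (spaCy exposes these as `token.head.i` and `token.dep_`, and its root token is its own head), might collect the activating token's head chain and its direct dependents. The `Token` class and both functions are hypothetical illustrations, not this project's code, and the parse below is written by hand so the sketch runs without a parser.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    """Hypothetical minimal token record mirroring fields a dependency parser provides."""
    text: str
    head: int          # index of the syntactic head; the root points to itself (as in spaCy)
    dep: str           # dependency label, e.g. "nsubj"
    activation: float  # SAE feature activation at this token


def max_activating_index(tokens: List[Token]) -> int:
    """Index of the token with the highest feature activation."""
    return max(range(len(tokens)), key=lambda i: tokens[i].activation)


def syntactic_context(tokens: List[Token], idx: int) -> List[int]:
    """Indices of the token at idx, its head chain up to the root, and its direct dependents."""
    context = {idx}
    # Walk up the head chain until reaching the root (a token that is its own head).
    cur = idx
    while tokens[cur].head != cur:
        cur = tokens[cur].head
        context.add(cur)
    # Add the direct dependents (children) of the activating token.
    for i, tok in enumerate(tokens):
        if i != idx and tok.head == idx:
            context.add(i)
    return sorted(context)


# Hand-written parse of "The cat sat on the mat".
sentence = [
    Token("The", 1, "det", 0.0),
    Token("cat", 2, "nsubj", 3.1),
    Token("sat", 2, "ROOT", 0.4),
    Token("on", 2, "prep", 0.0),
    Token("the", 5, "det", 0.0),
    Token("mat", 3, "pobj", 0.2),
]

idx = max_activating_index(sentence)
print([sentence[i].text for i in syntactic_context(sentence, idx)])  # ['The', 'cat', 'sat']
```

Keeping only the head chain and immediate children is one simple choice; a real view might also include siblings or deeper subtrees.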

The joint view shows individual contexts side by side. Part-of-speech tagging and the display of inactive tokens can be toggled from the top panel:

Figure: Joint view with syntactic contexts
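The two toggles can be sketched as a text-only stand-in for the joint view, assuming each context arrives as parallel lists of tokens, activations, and part-of-speech tags. Tokens whose activation exceeds a threshold are bracketed with their activation; the others are either shown plainly or hidden. The function names and the `threshold` parameter are hypothetical, and the real view renders contexts graphically rather than as stacked text rows.

```python
from typing import List, Optional, Sequence, Tuple


def render_context(tokens: List[str], activations: List[float],
                   pos_tags: Optional[List[str]] = None,
                   show_inactive: bool = True, threshold: float = 0.0) -> str:
    """Render one context; active tokens are bracketed with their activation."""
    parts = []
    for i, (tok, act) in enumerate(zip(tokens, activations)):
        active = act > threshold
        if not active and not show_inactive:
            continue  # hide inactive tokens when that toggle is off
        label = tok if pos_tags is None else f"{tok}/{pos_tags[i]}"
        parts.append(f"[{label}:{act:.1f}]" if active else label)
    return " ".join(parts)


def render_joint_view(contexts: Sequence[Tuple[List[str], List[float], List[str]]],
                      show_pos: bool = False, show_inactive: bool = True,
                      threshold: float = 0.0) -> str:
    """One row per context, stacked so contexts can be compared directly."""
    rows = []
    for tokens, activations, pos_tags in contexts:
        rows.append(render_context(tokens, activations,
                                   pos_tags if show_pos else None,
                                   show_inactive, threshold))
    return "\n".join(rows)


contexts = [
    (["The", "cat", "sat"], [0.0, 2.5, 0.3], ["DET", "NOUN", "VERB"]),
    (["A", "dog", "ran"], [0.0, 1.2, 0.0], ["DET", "NOUN", "VERB"]),
]
print(render_joint_view(contexts, show_pos=True, threshold=0.5))
```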

The merged view aggregates commonly occurring contexts and displays them in a branching format. These trees are instantiated as list structures, and subtree matches are located where possible.

Figure: Merged view with linear trees
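One way to build such a tree from list structures is sketched below, assuming each context is a list of tokens read outward from the activating token. Contexts are merged by shared prefixes (a trie), which is a simpler stand-in for the project's subtree matching, since the README does not describe that matching in detail. Each node is a plain list `[token, count, children]`, and all names are hypothetical.

```python
from typing import List


def insert_context(tree: list, tokens: List[str]) -> None:
    """Insert one context into the tree in place.

    Each node is a list [token, count, children], and children is itself a list
    of nodes, so the whole tree is built from plain nested lists.
    """
    level = tree
    for tok in tokens:
        for node in level:
            if node[0] == tok:  # shared prefix: reuse the existing branch
                node[1] += 1
                break
        else:                   # no match at this level: start a new branch
            node = [tok, 1, []]
            level.append(node)
        level = node[2]


def merge_contexts(contexts: List[List[str]]) -> list:
    tree: list = []
    for ctx in contexts:
        insert_context(tree, ctx)
    return tree


def render_tree(tree: list, depth: int = 0) -> List[str]:
    """Indented rendering; the number in parentheses counts contexts sharing that branch."""
    lines = []
    for tok, count, children in tree:
        lines.append("  " * depth + f"{tok} ({count})")
        lines.extend(render_tree(children, depth + 1))
    return lines


contexts = [["cat", "sat", "down"], ["cat", "sat", "up"], ["cat", "ran"]]
print("\n".join(render_tree(merge_contexts(contexts))))
```

Prefix merging only combines contexts that agree from the first token onward; matching subtrees that start deeper in a context would need a more general search.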

The updated merged view simplifies the presentation to focus primarily on co-occurrence information, giving an overall picture of the relevant contexts for a feature. It also fixes the overlap issues encountered in the earlier merged view:

Figure: Updated merged view with even spacing
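The README does not specify how co-occurrence is measured or how the even spacing is computed. As one plausible sketch, assuming contexts are token lists and the activating token is known, the code below counts each other token at most once per context that contains the target, then gives every token its own evenly sized vertical slot so labels cannot share a position. The function names and the `height` parameter are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def cooccurrence_counts(contexts: List[List[str]], target: str) -> Counter:
    """For contexts containing `target`, count each other token at most once per context."""
    counts: Counter = Counter()
    for ctx in contexts:
        if target in ctx:
            counts.update(set(ctx) - {target})
    return counts


def even_layout(counts: Counter, height: float = 1.0) -> List[Tuple[str, int, float]]:
    """Order tokens by count (ties alphabetical) and assign evenly spaced y positions."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    n = len(ranked)
    # Each token gets a slot of height / n and sits at the slot's center.
    return [(tok, c, height * (i + 0.5) / n) for i, (tok, c) in enumerate(ranked)]


contexts = [["the", "cat", "sat"], ["a", "cat", "sat"], ["the", "dog", "ran"]]
for tok, count, y in even_layout(cooccurrence_counts(contexts, "cat")):
    print(f"{y:.2f} {tok} ({count})")
```

Fixed slot spacing trades compactness for legibility: positions no longer depend on branch sizes, so labels cannot overlap as long as each slot is taller than a label.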

Acknowledgements

This work was done for David Laidlaw's CSCI2370: Interdisciplinary Scientific Visualization class.