- `./counting`
  - `train.py`: trains the character counting model and also generates its data
  - `intervention.py`: activation patching experiments
  - `visualize.ipynb`: results visualization code
- `./ioi`
  - `make_decoder_train_data.py`: generates the decoder training data
  - `DLA.py`: implements the DLA (direct logit attribution) experiments
- `./addition`
  - `train.py`: trains the 3-digit addition model and also generates its data
  - `intervention.py`: main activation patching experiments
  - `interventionPlus.py`: activation patching experiments for the "+" sign
  - `visualize.ipynb`: results visualization code
- `./factual`
  - `find_heads_attribution.py`: finds the 25 most important heads in the upper layers
  - `make_data_part1.py`: selects text from COUNTERFACT and BEAR that "activates" each head at the END position (i.e., the head does not attend too heavily to BOS)
  - `make_data_part2.py`: selects text from miniPile that "activates" each head
  - `cal_freq.py`: calculates token frequency over miniPile
- `./decoder`
  - `model.py`: defines the model architecture
  - `train.py`: trains the decoder
  - `cache_generation.py`: generates samples using the decoder (not in a visualized form; the output is meant to be loaded into the Streamlit app)
  - `run.sh`: commands to train the decoder and generate samples
  - `utils.py`, `generate.py`: helper functions used by the other files
  - `cache_attention.py`: saves attention patterns
  - `scatter_completeness.py`, `scatter_completeness_plot.py`: draw scatter plots to verify completeness
- `./training_outputs`: contains the model checkpoints of the probed models for the counting and addition tasks, so the results are reproducible
- `./LLM`: contains prompts and code used to automatically generate interpretations with LLMs
- `./webAPP`: contains the source code for our web application
- Go to `./ioi` and run `make_decoder_train_data.py` to generate data for the IOI task. You don't need to do this for the counting and addition tasks. To run the factual recall experiment, first download the COUNTERFACT and BEAR data (the links are in `./factual/make_data_part1.py`), then go to `./factual` and run `make_data_part1.py` and `make_data_part2.py` sequentially (the full command sequence is collected in the data-preparation sketch after this list).
- Go to `./decoder` and check `run.sh`; pick a task you are interested in and train the decoder. For example: `python train.py --probed_task counting --rebalance 6.0 --save_dir $dir_name --batch_size 256 --num_epoch 100 --data_per_epoch 1000000 --num_test_rollout 200 > ./data_and_model/counting.txt` (an annotated version of this command appears in the training sketch after this list).
- `run.sh` also contains commands for generating preimage samples with the decoder. For example, run `python cache_generation.py --probed_task counting`; the generated samples will appear in `./training_outputs`. The best way to inspect them is to go into the `./webAPP` folder and run `streamlit run InversionView.py` (see the generation sketch after this list).
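Data-preparation sketch. This collects the commands from the first step into one shell sequence; it assumes you start from the repository root, and the manual COUNTERFACT/BEAR download is not shown (the links are in `./factual/make_data_part1.py`):

```bash
# Data for the IOI task (not needed for counting/addition):
cd ./ioi
python make_decoder_train_data.py

# Data for the factual recall task; download COUNTERFACT and BEAR first
# (links are in ./factual/make_data_part1.py), then:
cd ../factual
python make_data_part1.py
python make_data_part2.py
```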
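Training sketch. This is the example training command from the second step, reformatted for readability. The comments on `--rebalance`, `--data_per_epoch`, and `--num_test_rollout` are inferred from the flag names, not documented here, so treat them as assumptions; `$dir_name` is a placeholder you must set (check `run.sh` for the actual values used).

```bash
cd ./decoder
# Placeholder: choose your own checkpoint directory.
dir_name=./data_and_model/counting_run

# --rebalance: (assumed) data-rebalancing factor for the training set
# --data_per_epoch: (assumed) number of samples drawn per epoch
# --num_test_rollout: (assumed) number of rollouts used for evaluation
python train.py --probed_task counting --rebalance 6.0 \
    --save_dir "$dir_name" --batch_size 256 --num_epoch 100 \
    --data_per_epoch 1000000 --num_test_rollout 200 \
    > ./data_and_model/counting.txt   # training log captured to file
```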
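Generation sketch. The final step, again assuming you start from the repository root:

```bash
# Generate preimage samples for the counting task;
# the output is written under ./training_outputs
cd ./decoder
python cache_generation.py --probed_task counting

# Inspect the generated samples interactively in the web app
cd ../webAPP
streamlit run InversionView.py
```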