This repository contains the code and data used to generate a premise selection dataset, which is described in Magnushammer: A Transformer-based Approach to Premise Selection. The premise selection model trained on this dataset achieved a state-of-the-art 71% proof rate on the PISA benchmark and 37.3% on miniF2F.
You can download the dataset used in the paper from our Huggingface datasets page. The code in this repository and the data stored here can be used to generate new datasets for premise selection, proof generation, and text retrieval tasks.
The raw data used to generate the datasets is a collection of proof trajectories - sequences that contain information about the commands called to Isabelle (the proof assistant) and the corresponding proof states.
Each proof trajectory corresponds to a unique proof of a problem, written by a human expert or generated automatically with Sledgehammer. The proofs generated by Sledgehammer are alternatives to the original human proofs, so some trajectories in the raw data come from the same original problems.
The raw data is structured as JSON files, and each file corresponds to a particular proof generation mode. Each JSON file contains trajectories, where a given trajectory represents the steps taken in a single proof. The example given below clarifies how a proof is transformed into a trajectory.
For the following proof:
theorem identity1: fixes f :: "nat \<Rightarrow> nat"
assumes fff: "\<And>n. f(f(n)) < f(Suc(n))"
shows "f(n) = n"
proof -
{ fix m n have key: "n \<le> m \<Longrightarrow> n \<le> f(m)"
proof(induct n arbitrary: m)
case 0 show ?case by simp
[...]
qed }
hence "\<And>n. n \<le> f(n)" by simp
hence "\<And>n. f(n) < f(Suc n)" by(metis fff order_le_less_trans)
hence "f(n) < n+1" by (metis fff lift_Suc_mono_less_iff[of f] Suc_eq_plus1)
with \<open>n \<le> f(n)\<close> show "f n = n" by arith
qed
We get the following trajectory
{
'statement': 'theorem identity1: fixes f :: "nat \\<Rightarrow> nat"
assumes fff: "\\<And>n. f(f(n)) < f(Suc(n))"
shows "f(n) = n"'
'transitions': <list_of_transitions>
}
where each transition in <list_of_transitions>
corresponds to a single proof step (we do not include the full list for brevity).
A transition that corresponds to the step by (metis fff lift_Suc_mono_less_iff[of f] Suc_eq_plus1)
called above is
'state': 'proof (prove)\n using this:\n f ?n < f (Suc ?n) \n goal (1 subgoal):\n 1. f n < n + 1'
'step': 'by (metis fff lift_Suc_mono_less_iff[of f] Suc_eq_plus1)'
'premises': {'fff': ['local.fff', ' fff: fixes n :: "nat" shows "f (f n) < f (Suc n)"'],
'lift_Suc_mono_less_iff': ['Nat.order.lift_Suc_mono_less_iff', ' lift_Suc_mono_less_iff: fixes less_eq :: "\'a \\<Rightarrow> \'a \\<Rightarrow> bool" and less :: "\'a \\<Rightarrow> \'a \\<Rightarrow> bool" and f :: "nat \\<Rightarrow> \'a" and n :: "nat" and m :: "nat" assumes "class.order less_eq less" and "\\<And>n. less (f n) (f (Suc n))" shows "less (f n) (f m) = (n < m)"'],
'Suc_eq_plus1': ['Nat.Suc_eq_plus1', ' Suc_eq_plus1: fixes n :: "nat" shows "Suc n = n + 1"']}
Hence the structure of a trajectory is:
Trajectory:
{
'statement': str,
'transitions': List[Transition]
}
where
Transition:
{
'state': str,
'step': str,
'premises': Dict[str,Tuple[str,str]]
}
In 'premises', the keys are names of the premises referenced in the proof step, whereas the values are lists, where the first element is the name of the premise in the lemma library, and the second element is the statement of the premise.
In order to generate your own dataset from the raw data, download the selected raw files included here. We include a discussion of the source and extraction methods for all the raw data files below.
Next, run setup.sh
to set up a virtual environment and install dependencies.
chmod +x setup.sh
./setup.sh
Finally, run the following script to generate the data in a JSON
format compliant with Huggingface's Datasets:
python generation_script.py --file-paths <paths> --out-path <out_path> --forbidden-statements-path PISA_test_theorems.json
All raw data files for this project are available for download here.
The data is categorized into two types: human data
and machine_generated_data
.
Human data
consists of proofs that were created directly by humans. On the other hand, machine-generated data
is composed of alternative proofs generated using Sledgehammer.
The machine-generated data is further divided into two subcategories, gen1
and gen2
, which differ in the Sledgehammer configurations used for mining. Gen1
proofs use a less diverse set of tactics, but sometimes have a higher chance of producing a successful proof.
The two modes of proof generation used for machine-generated data are referred to as mode_1
and mode_2
. Proofs in mode_1
were generated by replacing the last step of a successful human proof with a Sledgehammer step, while mode_2
attempted to replace the entire proof with a single Sledgehammer step. Note that since human proofs are usually nested, both approaches considered all sub-proofs separately as well.
The last distinction, minimal
and non-minimal
, refers again to Sledgehammer configuration. Minimal
proofs were found by first finding a non-minimal
version, then trying to remove unnecessary premises. Thus, non-minimal
proofs contain more premises, but some of them might be redundant. For batch-contrastive learning, there is a case to be made for both approaches.
All data included in this dataset is written in English and uses the Isabelle syntax, which represents mathematical expressions using syntax similar to LaTeX.
The dataset was created using the proofs included in the Archive of Formal Proofs and the Standard library included in the Isabelle 2021-1 distribution.
The data included in this dataset is mostly untyped, meaning that there is little information about the objects referenced in the statement or premise statements. Adding type information would be a valuable contribution.
If you use this dataset in your research, please cite the associated arXiv paper: Magnushammer: A Transformer-based Approach to Premise Selection
We would like to express our gratitude to the following individuals and organizations for their contributions to this project:
-
We would like to acknowledge @jinpz for their contributions to the data mining aspect of this project. Their expertise and hard work greatly assisted us in achieving our project goals.
-
PISA API: We also want to thank the developers of the PISA API for creating a powerful tool that allowed us to interact with Isabelle through Python.
-
Google TRC Compute: Finally, we want to acknowledge Google's TPU Reasearch Cloud for providing compute necessary to develop the code infrastructure needed for the mining procedure.
We are grateful for the support and contributions of each of these individuals and organizations, and we would not have been able to accomplish this project without them.