This repo pretrains a transformer PFN model, modeled on TabPFN, to learn temporal coherence and to perform clinical diagnosis, in connection with the medical research done in DAFTED.
First, download the project's code:
# clone project
git clone https://github.com/Jeremstym/pretrainingTSPFN.git
Next, install the project and its dependencies. The project's dependency management and packaging are handled
by poetry, so the recommended way to install the project is in a virtual environment
(managed by your favorite tool, e.g. conda, virtualenv, poetry, etc.) where
poetry is installed. That way, you can simply run the command:
poetry install
Note: When a poetry.lock file is available in the repository, poetry install will automatically use it to determine the versions of the packages to install, instead of resolving anew the dependencies in pyproject.toml. When no poetry.lock file is available, the dependencies are resolved from those listed in pyproject.toml, and a poetry.lock is generated automatically as a result.
Warning: Out of the box, poetry offers flexibility on how to install projects. Packages are natively pip-installable, just as with a traditional setup.py, by simply running pip install <package>. However, we recommend using poetry because of an issue with pip-installing projects with relative path dependencies (the vital submodule is specified using a relative path). When the linked issue gets fixed, the setup instructions will be updated to mention the possibility of using pip install . if one wishes to avoid using poetry entirely.
Use data in .csv format, where the last column of each row is the label.
Put all datasets in the same folder, and point to that directory with a path variable in your .env file.
Then list your .csv files in the data config file pretraining-csv.yaml.
Warning: Ensure that no dataset contains more than 10 distinct labels. Because we follow the TabPFN (v1) architecture and download its weights, classification with more than 10 classes is not supported for now.
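The 10-class limit can be checked before launching a pretraining run. The following is a minimal sketch, not part of the project: the function names and the has_header option are hypothetical, and it assumes plain comma-separated files whose last column holds the label, as described above. Labels are compared as strings, so 1 and 1.0 would count as two classes.

```python
import csv
from pathlib import Path

MAX_CLASSES = 10  # limit stated in the warning above (TabPFN v1 weights)


def distinct_labels(csv_path, has_header=False):
    """Return the set of distinct values found in the last column of a CSV file."""
    labels = set()
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        if has_header:
            next(reader, None)  # skip the header row if the file has one (assumption)
        for row in reader:
            if row:  # ignore blank lines
                labels.add(row[-1].strip())
    return labels


def find_oversized_datasets(data_dir, has_header=False):
    """Map each CSV file name in data_dir to its class count when it exceeds MAX_CLASSES."""
    oversized = {}
    for csv_path in sorted(Path(data_dir).glob("*.csv")):
        n_classes = len(distinct_labels(csv_path, has_header))
        if n_classes > MAX_CLASSES:
            oversized[csv_path.name] = n_classes
    return oversized
```

Calling find_oversized_datasets on the dataset folder returns an empty dictionary when every file respects the limit, and otherwise lists the offending files with their class counts.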
This project uses Hydra to handle the configuration of the
tspfn runner script. To understand how to use Hydra's CLI, refer to its
documentation. For this particular project, preset configurations for various parts of
the tspfn runner pipeline are available in the config package. These files are meant to be
composed together by Hydra to produce a complete configuration for a run.
Below we provide examples of how to run some basic commands using the Hydra CLI:
# Manually set hydra.run.dir, the directory where the experiment runs and where the output files are written
tspfn-pretrain 'hydra.run.dir=/data/stympopper/TSPFN_BIGpretraining_v3' +experiment=pretrainingTSPFN/tspfn-pretraining seed=42
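To repeat the command above for several seeds, the argument list can be assembled programmatically. This is only a sketch: the helper names, the /path/to/output placeholder, and the seed_<n> subdirectory layout are assumptions made for illustration, while the three overrides are the ones shown in the command above.

```python
import shlex


def build_pretrain_command(run_dir, seed, experiment="pretrainingTSPFN/tspfn-pretraining"):
    """Build the argument list for one tspfn-pretrain run."""
    return [
        "tspfn-pretrain",
        f"hydra.run.dir={run_dir}",
        f"+experiment={experiment}",
        f"seed={seed}",
    ]


def commands_for_seeds(base_dir, seeds):
    """One command per seed, each with its own run directory (hypothetical layout)."""
    return [build_pretrain_command(f"{base_dir}/seed_{seed}", seed) for seed in seeds]


if __name__ == "__main__":
    # Print the commands rather than running them, so they can be inspected first.
    for argv in commands_for_seeds("/path/to/output", [42, 43]):
        print(shlex.join(argv))
```

Each argument list can be passed to subprocess.run(argv, check=True) to launch the run from the environment where the project is installed.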