This repository contains the code for training ModernBERT for trajectory classification. The code is based on the Hugging Face ModernBERT implementation.
The datasets are higher-order mobility data, that is, trajectories expressed as sequences of hexagon/tessellation ids.
Install dependencies using the following command:
conda env create -f environment.yml
Activate the environment:
conda activate modern
We put all the hexagon/tessellation ids in a single file and then use the transformers library to train a tokenizer on that corpus. The trained tokenizer is then loaded as a pre-trained tokenizer when training the model.
python preprocess.py
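As a rough illustration of the corpus-building step described above, here is a minimal sketch that is not the actual contents of preprocess.py. It assumes each dataset row stores one trajectory as space-separated cell ids in a column called `higher_order_trajectory`; that column name, the helper `build_corpus`, and the sample ids are hypothetical.

```python
import csv
import io


def build_corpus(rows, column="higher_order_trajectory"):
    """Return one line per trajectory, with cell ids separated by single spaces.

    `column` is a hypothetical column name; adjust it to match your dataset.
    """
    lines = []
    for row in rows:
        cell_ids = row[column].split()
        if cell_ids:  # skip rows with an empty trajectory
            lines.append(" ".join(cell_ids))
    return lines


# Tiny in-memory CSV standing in for a real dataset file (made-up values).
sample_csv = io.StringIO(
    "user_id,higher_order_trajectory\n"
    "u1,8a2a1072b59ffff 8a2a1072b597fff\n"
    "u2,8a2a1072b597fff   8a2a10725a6ffff\n"
    "u3,\n"
)

corpus_lines = build_corpus(csv.DictReader(sample_csv))
corpus_text = "\n".join(corpus_lines) + "\n"
print(corpus_text)
```

Writing `corpus_text` to a single text file gives a corpus with one trajectory per line, which is a convenient input shape for tokenizer training.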
We train a tokenizer on the hexagon/tessellation ids using the transformers library.
python tokenizer_trainer.py
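To make the idea concrete, the following library-free sketch shows what a word-level tokenizer over cell ids amounts to: each id becomes one token mapped to an integer. This is an assumption for illustration only. The repository trains its tokenizer with the transformers library, and tokenizer_trainer.py may use a different tokenization scheme; the names `build_vocab`, `encode`, and `SPECIAL_TOKENS` are hypothetical.

```python
from collections import Counter

# Hypothetical special tokens; the real tokenizer may define others.
SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]


def build_vocab(corpus_lines):
    """Map every whitespace-separated cell id to an integer id."""
    counts = Counter(tok for line in corpus_lines for tok in line.split())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Most frequent ids first; ties broken alphabetically for determinism.
    for tok, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        vocab[tok] = len(vocab)
    return vocab


def encode(line, vocab):
    """Encode one trajectory line, wrapping it in [CLS] ... [SEP]."""
    unk = vocab["[UNK]"]
    ids = [vocab.get(tok, unk) for tok in line.split()]
    return [vocab["[CLS]"]] + ids + [vocab["[SEP]"]]


corpus = ["a1 b2 b2", "b2 c3"]  # placeholder cell ids
vocab = build_vocab(corpus)
encoded = encode("a1 b2 zz", vocab)  # "zz" is unseen and maps to [UNK]
print(encoded)
```

Treating each cell id as an atomic token keeps the sequence length equal to the trajectory length, at the cost of a vocabulary that grows with the number of distinct cells.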
Adjust the hyperparameters in train.py, then run the following command to train the model:
python train.py
To evaluate the model, run the following command:
python evaluate_script.py
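The metrics reported by evaluate_script.py are not listed here. As a hedged illustration, accuracy and macro-averaged F1 are two common choices for multi-class classification, and they can be computed from true and predicted labels as follows. The function names and the sample labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Placeholder labels for four trajectories.
y_true = ["class_a", "class_a", "class_b", "class_c"]
y_pred = ["class_a", "class_b", "class_b", "class_c"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f} macro_f1={f1:.4f}")
```

Macro averaging weights every class equally, which is useful when some classes have far fewer trajectories than others.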
To use your own dataset, first map it to a hexagon/tessellation format. You can convert raw trajectory data to this format using the following repository: Point2Hex.
Then adjust the dataset path and column names in each file and run the commands above.
To cite this repo:
@misc{Faraji2025ModenBERT,
author = {Faraji, Ali},
title = {Training ModernBERT for Trajectory Classification},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/alifa98/ModernBERT-Trajectory-Classification}},
}