Training ModernBERT for Trajectory Classification

This repository contains the code for training ModernBERT for trajectory classification. The code is based on the Hugging Face ModernBERT implementation.

Datasets

Datasets are higher-order mobility data.

Dependencies

Install dependencies using the following command:

conda env create -f environment.yml

Activate the environment:

conda activate modern

Preprocessing

We put all the hexagon/tessellation ids in a single file and then use the transformers library to train a tokenizer on the corpus. The tokenizer is then used to in training the model as a pre-trained tokenizer.

python preprocess.py

Training a Tokenizer

We train a tokenizer on the hexagon/tessellation ids using the transformers library.

python tokenizer_trainer.py

Training

Adjust the hyperparameters in the train.py file and then run the following command to train the model:

python train.py

Evaluation

For evaluation of the model, run the following command:

python evaluate_script.py

Running for your own data

Adjust the path of dataset and column names in each file then run the following commands.

How can I train on my own trajectory data which is not in the hexagon/tessellation format?

For this you should map it to a hexagon/tessellation format. You can convert to raw trajectory data to hexagon/tessellation format using the following repository: Point2Hex.

Citation

To cite this repo:

@misc{Faraji2025ModenBERT,
  author = {Faraji, Ali},
  title = {Training ModernBERT for Trajectory Classification},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/alifa98/ModernBERT-Trajectory-Classification}},
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Training ModernBERT for Trajectory Classification

Datasets

Dependencies

Preprocessing

Training a Tokenizer

Training

Evaluation

Running for your own data

How can I train on my own trajectory data which is not in the hexagon/tessellation format?

Citation

Files

README.md

Latest commit

History

README.md

File metadata and controls

Training ModernBERT for Trajectory Classification

Datasets

Dependencies

Preprocessing

Training a Tokenizer

Training

Evaluation

Running for your own data

How can I train on my own trajectory data which is not in the hexagon/tessellation format?

Citation