Training ModernBERT for Trajectory Classification

This repository contains the code for training ModernBERT for trajectory classification. The code is based on the Hugging Face ModernBERT implementation.

Datasets

The datasets are higher-order mobility data, i.e. trajectories expressed as sequences of hexagon/tessellation ids.

Dependencies

Install dependencies using the following command:

conda env create -f environment.yml

Activate the environment:

conda activate modern

Preprocessing

We collect all the hexagon/tessellation ids into a single file and then use the transformers library to train a tokenizer on that corpus. The trained tokenizer is then used as a pre-trained tokenizer when training the model.

python preprocess.py
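As a rough illustration of this step, the sketch below reads a CSV of trajectories and writes one space-separated sequence of cell ids per line to a corpus file. The column name `higher_order_trajectory`, the file names, and the space-separated layout are assumptions made for the example; the actual `preprocess.py` may differ.

```python
import csv
import os
import tempfile


def build_corpus(csv_path, corpus_path, column="higher_order_trajectory"):
    """Write one trajectory (space-separated cell ids) per line; return the count."""
    written = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(corpus_path, "w", encoding="utf-8") as dst:
        for row in csv.DictReader(src):
            # Normalise whitespace so ids are separated by exactly one space.
            cell_ids = (row.get(column) or "").split()
            if not cell_ids:
                continue  # skip rows with an empty trajectory
            dst.write(" ".join(cell_ids) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # Tiny made-up sample so the sketch runs end to end.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "sample.csv")
        with open(sample, "w", encoding="utf-8") as f:
            f.write("user_id,higher_order_trajectory\n")
            f.write("u1,8a2a1072b59ffff 8a2a1072b5b7fff\n")
            f.write("u2,\n")
        print(build_corpus(sample, os.path.join(tmp, "corpus.txt")))
```

Rows with an empty trajectory are skipped so the corpus never contains blank lines.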

Training a Tokenizer

We train a tokenizer on the hexagon/tessellation ids using the transformers library.

python tokenizer_trainer.py
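Conceptually, a word-level tokenizer over cell ids gives each distinct id its own integer, alongside a few reserved special tokens. The pure-Python sketch below illustrates only that idea; it does not use the transformers API and is not the code in `tokenizer_trainer.py`. The special-token names and the helper names `build_vocab` and `encode` are assumptions.

```python
from collections import Counter

SPECIAL_TOKENS = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"]  # assumed names


def build_vocab(lines, min_frequency=1):
    """Map each cell id to an integer, most frequent ids first."""
    counts = Counter(token for line in lines for token in line.split())
    vocab = {tok: i for i, tok in enumerate(SPECIAL_TOKENS)}
    # Sort by descending count, then alphabetically, for a deterministic vocab.
    for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_frequency:
            vocab[token] = len(vocab)
    return vocab


def encode(trajectory, vocab):
    """Turn a space-separated trajectory into ids wrapped in [CLS] ... [SEP]."""
    unk = vocab["[UNK]"]
    body = [vocab.get(token, unk) for token in trajectory.split()]
    return [vocab["[CLS]"], *body, vocab["[SEP]"]]


if __name__ == "__main__":
    corpus = ["a b b c", "b c d"]  # stand-ins for lines of cell ids
    vocab = build_vocab(corpus)
    print(encode("b c z", vocab))  # "z" is unseen, so it maps to [UNK]
```

Because every cell id is a single token, vocabulary size grows with the number of distinct cells in the corpus, and ids never seen during tokenizer training fall back to the unknown token.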

Training

Adjust the hyperparameters in the train.py file and then run the following command to train the model:

python train.py

Evaluation

For evaluation of the model, run the following command:

python evaluate_script.py

Running for your own data

Adjust the dataset path and column names in each script, then run the commands above in order: preprocessing, tokenizer training, training, and evaluation.

How can I train on my own trajectory data which is not in the hexagon/tessellation format?

To do this, first map your data to the hexagon/tessellation format. You can convert raw trajectory data to this format using the following repository: Point2Hex.
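To make the mapping idea concrete, the sketch below snaps raw (latitude, longitude) points to the cells of a plain square grid and collapses consecutive repeats. It is a simplified stand-in for illustration only: it is not Point2Hex and not a hexagonal index, and the cell size, id format, and function names are hypothetical.

```python
import math


def point_to_cell(lat, lon, cell_size_deg=0.01):
    """Return a string id for the square grid cell containing (lat, lon)."""
    row = math.floor(lat / cell_size_deg)
    col = math.floor(lon / cell_size_deg)
    return f"r{row}_c{col}"


def trajectory_to_cells(points, cell_size_deg=0.01):
    """Convert (lat, lon) points to cell ids, dropping consecutive repeats."""
    cells = []
    for lat, lon in points:
        cell = point_to_cell(lat, lon, cell_size_deg)
        if not cells or cells[-1] != cell:
            cells.append(cell)
    return cells


if __name__ == "__main__":
    raw = [(40.7128, -74.0060), (40.7129, -74.0061), (40.7301, -74.0002)]
    # Joined with spaces, the result has the same shape as a trajectory of cell ids.
    print(" ".join(trajectory_to_cells(raw)))
```

Whatever tessellation is used, the output should be a sequence of discrete cell ids per trajectory, since that is what the tokenizer and model consume.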

Citation

To cite this repo:

@misc{Faraji2025ModenBERT,
  author = {Faraji, Ali},
  title = {Training ModernBERT for Trajectory Classification},
  year = {2025},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/alifa98/ModernBERT-Trajectory-Classification}},
}
