
# LLaVA-NDiNO

🤗📚 Datasets | 🤗💻 Models

Repository for the paper "LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language"

## Introduction

LLaVA-NDiNO is a family of models trained for strong performance in the Italian language. Specifically, the models have been trained using three different approaches, applied either individually or in sequence:

- **Language Adaptation**: pre-training the model on a rich collection of image-text data
- **Instruction-Tuning**: fine-tuning the model on instruction-following image-text data where the expected answers are short
- **Long Instruction-Tuning**: fine-tuning the model on instruction-following image-text data where the expected answers are long

In this repository we provide everything we used for training and evaluation. Please note that this work relies on the LLaVA-NeXT codebase for the training procedure; we modified a single script, which we provide in this repository.

## Repository Structure

- 📁 `lmms-eval-tasks`: contains the task implementations to be added to the lmms-eval library to reproduce the evaluation results on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
- 📁 `requirements`: contains the Singularity definition file used to build the Singularity container for the training step
- 📄 `convert_llava_weights.py`: script to convert a LLaVA-NeXT checkpoint produced by the original codebase into the HuggingFace format
- 📄 `evaluate.sh`: template script to evaluate the models on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
- 📄 `evaluate_ppl.py`: script to evaluate the models with the perplexity metric
- 📄 `llava_train_modified.py`: modified training script from the original LLaVA-NeXT repository that applies the LLaMA 3 chat template without a system prompt
- 📄 `train_from_llm.sh`: template script to train a LLaVA-NeXT model starting from a pre-trained LLM
- 📄 `train_from_lmm.sh`: template script to train a LLaVA-NeXT model starting from a pre-trained LLaVA-NeXT model

## Usage

To train a model, you should:

1. Build the Singularity container from the definition file in `requirements`
2. Swap the modified training script `llava_train_modified.py` into the LLaVA-NeXT codebase
3. Fill in and run `train_from_llm.sh` (to start from a pre-trained LLM) or `train_from_lmm.sh` (to start from a pre-trained LLaVA-NeXT model)
4. Convert the resulting checkpoint into the HuggingFace format with `convert_llava_weights.py`

A minimal sketch of this flow is shown below.
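The following is a hedged sketch of the training flow, not a verbatim recipe: the definition filename `llava_ndino.def` and the converter flags are placeholders, so check `requirements/` and the scripts themselves for the actual names and arguments.

```bash
# Build the training container from the Singularity definition file.
# NOTE: "llava_ndino.def" is a hypothetical filename; use the one in requirements/.
singularity build llava_ndino.sif requirements/llava_ndino.def

# Train starting from a pre-trained LLM (edit the template's paths first).
singularity exec --nv llava_ndino.sif bash train_from_llm.sh

# Convert the resulting LLaVA-NeXT checkpoint into the HuggingFace format.
# NOTE: the flags below are hypothetical; check the script for its actual arguments.
singularity exec llava_ndino.sif python convert_llava_weights.py \
    --checkpoint_dir ./checkpoints/llava-ndino \
    --output_dir ./llava-ndino-hf
```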

To evaluate a model, you should:

1. Copy the task implementations from `lmms-eval-tasks` into your lmms-eval installation
2. Fill in and run `evaluate.sh` to score the model on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
3. Optionally, run `evaluate_ppl.py` to compute perplexity

A sketch of the kind of command `evaluate.sh` templates is shown below.
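For reference, this is a sketch of a standard lmms-eval invocation like the one `evaluate.sh` wraps; the task name `gqa_it` and the model path are hypothetical placeholders, and the actual task names are defined in `lmms-eval-tasks`.

```bash
# Run one of the Italian benchmarks through lmms-eval.
# NOTE: "gqa_it" and the checkpoint path are hypothetical placeholders;
# use the task names from lmms-eval-tasks and your own converted model.
accelerate launch -m lmms_eval \
    --model llava_hf \
    --model_args pretrained=./llava-ndino-hf \
    --tasks gqa_it \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```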

## Citation

```bibtex
@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}
```