
# LLaVA-NDiNO

🤗📚 Datasets | 🤗💻 Models

Repository for the paper "LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language"

## Introduction

LLaVA-NDiNO is a family of models trained for strong performance in the Italian language. Specifically, the models have been trained using three different approaches, applied either individually or in sequence:

- **Language Adaptation**: pre-training the model on a rich collection of image-text data
- **Instruction-Tuning**: fine-tuning the model on instruction-following image-text data where the expected answers are short
- **Long Instruction-Tuning**: fine-tuning the model on instruction-following image-text data where the expected answers are long

In this repository we provide everything we used for training and evaluation. Please note that this work relies on the LLaVA-NeXT codebase for the training procedure; we modified a single script, which we provide in this repository.

## Repository Structure

- 📁 `lmms-eval-tasks`: contains the task implementations to be added to the lmms-eval library to reproduce the evaluation results on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
- 📁 `requirements`: contains the Singularity definition file used to build the Singularity container for the training step
- 📄 `convert_llava_weights.py`: script to convert a LLaVA-NeXT checkpoint produced by the original codebase into the HuggingFace format
- 📄 `evaluate.sh`: template script to evaluate the models on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
- 📄 `evaluate_ppl.py`: script to evaluate the models with the perplexity metric
- 📄 `llava_train_modified.py`: modified training script from the original LLaVA-NeXT repository that applies the LLaMA 3 chat template without a system prompt
- 📄 `train_from_llm.sh`: template script to train a LLaVA-NeXT model starting from a pre-trained LLM
- 📄 `train_from_lmm.sh`: template script to train a LLaVA-NeXT model starting from a pre-trained LLaVA-NeXT model

## Usage

To train a model, you should:

1. Build the Singularity container from the definition file in `requirements`
2. Swap the modified training script `llava_train_modified.py` into the LLaVA-NeXT codebase
3. Fill in and run `train_from_llm.sh` (to start from a pre-trained LLM) or `train_from_lmm.sh` (to start from a pre-trained LLaVA-NeXT model)
4. Convert the resulting checkpoint into the HuggingFace format with `convert_llava_weights.py`

A minimal sketch of this flow is shown below.
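The following is a hedged sketch of the training flow, not a verbatim recipe: the definition filename `llava_ndino.def` and the converter flags are placeholders, so check `requirements/` and the scripts themselves for the actual names and arguments.

```bash
# Build the training container from the Singularity definition file.
# NOTE: "llava_ndino.def" is a hypothetical filename; use the one in requirements/.
singularity build llava_ndino.sif requirements/llava_ndino.def

# Train starting from a pre-trained LLM (edit the template's paths first).
singularity exec --nv llava_ndino.sif bash train_from_llm.sh

# Convert the resulting LLaVA-NeXT checkpoint into the HuggingFace format.
# NOTE: the flags below are hypothetical; check the script for its actual arguments.
singularity exec llava_ndino.sif python convert_llava_weights.py \
    --checkpoint_dir ./checkpoints/llava-ndino \
    --output_dir ./llava-ndino-hf
```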

To evaluate a model, you should:

1. Copy the task implementations from `lmms-eval-tasks` into your lmms-eval installation
2. Fill in and run `evaluate.sh` to score the model on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
3. Optionally, run `evaluate_ppl.py` to compute perplexity

A sketch of the kind of command `evaluate.sh` templates is shown below.
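For reference, this is a sketch of a standard lmms-eval invocation like the one `evaluate.sh` wraps; the task name `gqa_it` and the model path are hypothetical placeholders, and the actual task names are defined in `lmms-eval-tasks`.

```bash
# Run one of the Italian benchmarks through lmms-eval.
# NOTE: "gqa_it" and the checkpoint path are hypothetical placeholders;
# use the task names from lmms-eval-tasks and your own converted model.
accelerate launch -m lmms_eval \
    --model llava_hf \
    --model_args pretrained=./llava-ndino-hf \
    --tasks gqa_it \
    --batch_size 1 \
    --log_samples \
    --output_path ./logs/
```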

## Citation

```bibtex
@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}
```