Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection
This repo can be used to reproduce the results of the paper "Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection".
Data for all tasks needs to be stored in the data/ directory of the main repository. Datasets generated for all tasks are present here:
The fine-tuning code for Llama-2 is in the train_llama2 directory, along with the code that generates data for the different tasks using Llama-2.
IndicTrans2 can be installed by following this:
Clone this repository into the home directory and create a separate conda environment for translating with IndicTrans2.
Download the en-indic model from and store the weights in the IndicTrans2/translations/en-indic-preprint folder.
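Before running anything, it can help to confirm that the directories named above are in place. The following is a minimal sketch, not part of the repo: the helper name missing_paths is hypothetical, and it assumes the script is run from the repository root with IndicTrans2 cloned in the home directory. Adjust both roots if your layout differs.

```python
from pathlib import Path

# Directories named in this README. Where each one is rooted is an assumption.
REPO_DIRS = ["data", "train_llama2"]
INDICTRANS2_WEIGHTS = Path("IndicTrans2") / "translations" / "en-indic-preprint"


def missing_paths(repo_root, indictrans2_parent):
    """Return the expected directories that do not exist yet (hypothetical helper)."""
    repo_root = Path(repo_root)
    expected = [repo_root / name for name in REPO_DIRS]
    expected.append(Path(indictrans2_parent) / INDICTRANS2_WEIGHTS)
    return [path for path in expected if not path.is_dir()]


if __name__ == "__main__":
    # Assumption: current directory is the repo root, IndicTrans2 sits in the home directory.
    for path in missing_paths(Path.cwd(), Path.home()):
        print(f"missing: {path}")
```

The script prints one line per missing directory and nothing when the layout is complete. It only checks that the folders exist, not that the datasets or model weights inside them are valid.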
For any queries regarding the code base, reach out to: [email protected]
If you use this repo, please cite the paper.