Multilingual Overhaul #833

Open · wants to merge 3 commits into main

Conversation

@hynky1999 (Collaborator) commented Jun 25, 2025

First of all, sorry that this PR is quite large; work on it started about half a year ago and was then put on hold.

What does this PR do?

  1. Adds new math tasks to the multilingual tasks; the targets here are only languages that fine-tasks considers:

  • cmath_tasks
  • mathlogicqa_rus_tasks
  • mgsm_tasks
  • afri_mgsm_tasks
  • armath_tasks
  • msvamp_tasks
  • cmm_math_mc_tasks
  • math23k_tasks
  • tal_scq5k_tasks
  • mathqa_tr_tasks
  • mwp_tr_tasks
  • mera_arithmetic_tasks
  • qazuntv2_tasks
  • hawp_tasks
  2. Large template changes:
    a) Adds a cot parameter to Formulation. This changes Answer: to Answer-Step-By-Step:, which is a cheap way to enforce CoT. This is again templated (as is the original Answer); a minimal sketch of the idea follows below.
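To illustrate, here is a minimal, self-contained sketch of how a cot flag on a formulation could switch the templated answer keyword. The names (Formulation, cot, the literal dictionaries) are hypothetical stand-ins for illustration, not lighteval's actual API:

```python
from dataclasses import dataclass

# Hypothetical per-language literals; in practice these would come from
# the crowdsourced translation literals.
ANSWER_WORDS = {"en": "Answer", "fr": "Réponse"}
COT_ANSWER_WORDS = {"en": "Answer-Step-By-Step", "fr": "Réponse étape par étape"}

@dataclass
class Formulation:
    """Sketch: a formulation whose answer prefix can enforce CoT."""
    language: str = "en"
    cot: bool = False

    def answer_keyword(self) -> str:
        # With cot=True, the answer prefix itself nudges the model to
        # produce a reasoning chain before the final answer.
        words = COT_ANSWER_WORDS if self.cot else ANSWER_WORDS
        return f"{words[self.language]}:"

print(Formulation(cot=False).answer_keyword())  # Answer:
print(Formulation(cot=True).answer_keyword())   # Answer-Step-By-Step:
```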

b) Adds auto-instructions for the cot formulation.
As written in the comment, here is the philosophy behind which version of a task one should use:

# Philosophy of formulations:
# 1. For early-stage pretrained models, we recommend using the CF formulation with few-shots (task_cf_native); this gives a reasonable signal even at this stage.
# 2. For later stages, we recommend using the MCF formulation with few-shots (task_mcf_native), as models at this point should be able to handle the MCF formulation.
# 3. For post-trained models, we recommend using the MCF formulation without few-shots but with cot (task_mcf_cot_native); this best matches their real usage, and they should be capable of following the expected format.

# Similarly, for generative tasks we recommend the non-cot variants for all pre-trained models, and the cot variants for post-trained models.
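As a rough illustration of that decision tree, a helper like the following could pick the recommended variant per training stage (the suffixes mirror the task names above; the function itself is hypothetical):

```python
def recommended_task_variant(task: str, stage: str) -> str:
    """Sketch: map a model's training stage to the suggested task variant.

    stage: "early-pretrain", "late-pretrain", or "post-train".
    """
    suffixes = {
        "early-pretrain": "_cf_native",    # CF formulation + few-shots
        "late-pretrain": "_mcf_native",    # MCF formulation + few-shots
        "post-train": "_mcf_cot_native",   # MCF + CoT, no few-shots
    }
    return task + suffixes[stage]

assert recommended_task_variant("mgsm", "post-train") == "mgsm_mcf_cot_native"
```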

Therefore I added a simple system which uses a templated instruction when we are in MCF mode with cot. The issue here is that we will need to crowdsource these instructions again, but I think it should be possible. Not all tasks are the same, so I had to create a few different instructions saying what to output (a letter vs. the letter of a continuation vs. just the answer to the question) and in what format to output it (\boxed for math vs. … for everything else; I tested various separators, but … works by far the best). This also means I added a simple extractor for the expected answer format.
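For the math format, a minimal brace-aware \boxed{...} extractor could look like the sketch below. This is a standalone illustration of the idea, not the extractor actually added in this PR:

```python
def extract_boxed(text: str) -> str | None:
    """Return the contents of the last \\boxed{...} in `text`, or None.

    Walks the braces manually so nested groups such as
    \\boxed{\\frac{1}{2}} are handled correctly.
    """
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    out = []
    while i < len(text) and depth:
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                break
        out.append(ch)
        i += 1
    return "".join(out) if depth == 0 else None

assert extract_boxed(r"Thus the answer is \boxed{\frac{1}{2}}.") == r"\frac{1}{2}"
```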

The goal here is to use a templated instruction so that we don't have to do this (https://github.com/huggingface/lighteval/pull/832/files) for every single task in the multilingual evals (which would be impossible).
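The templated-instruction idea could be sketched as a single parametric string filled in per task family, instead of a hand-written instruction per task. The template text and placeholder names here are illustrative, not the actual literals to be crowdsourced:

```python
# One template per language, with slots for what to output and in what format.
INSTRUCTION_TEMPLATE = {
    "en": "Answer the following question. Output {what}, formatted as {how}.",
}

WHAT = {
    "mcq": "the letter of the correct option",
    "continuation": "the letter of the correct continuation",
    "generative": "only the final answer",
}
HOW = {
    "math": r"\boxed{...}",
    "default": "plain text",
}

def build_instruction(lang: str, task_kind: str, fmt: str) -> str:
    # One crowdsourced template per language covers every task family.
    return INSTRUCTION_TEMPLATE[lang].format(what=WHAT[task_kind], how=HOW[fmt])

print(build_instruction("en", "mcq", "math"))
```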

Some more nits:

  • Continuation tasks now separate the options with an Options keyword. This is because I found that some continuation models would not understand that these are possible continuations, treating them instead as a random enumeration (see the sketch after this list). The downside is that we again need to crowdsource one more keyword (Options). I am quite hesitant about this because the upsides are not that great in the end, and if we do it, it will require folks to contribute the literals to make these tasks runnable again (hellaswag / copa).
  • Small overhaul of the translation metrics (so that they can work with the … and thinking).
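To make the continuation change concrete, a hellaswag-style prompt might now render roughly like this (a sketch; the Options literal would be crowdsourced per language, and the rendering function is hypothetical):

```python
def render_continuation_prompt(context: str, continuations: list[str]) -> str:
    # Prefix the candidate continuations with an explicit "Options" keyword
    # so base models don't read them as a random enumeration.
    lines = [context, "Options:"]
    lines += [f" {chr(ord('A') + i)}. {c}" for i, c in enumerate(continuations)]
    return "\n".join(lines)

print(render_continuation_prompt(
    "A man is holding a saw and a plank of wood. He",
    ["cuts the plank in half.", "eats the saw."],
))
```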
