Multilingual Overhaul #833

hynky1999 · 2025-06-25T21:08:09Z

First of all, sorry that this PR is quite large; work on it started about half a year ago and was then put on hold.

What does this PR do?

Adds new math tasks to the multilingual tasks; the targets here are only the languages that fine-tasks considers:

*cmath_tasks,  
*mathlogicqa_rus_tasks,  
*mgsm_tasks,  
*afri_mgsm_tasks,  
*armath_tasks,  
*msvamp_tasks,  
*cmm_math_mc_tasks,  
*math23k_tasks,  
*tal_scq5k_tasks,  
*mathqa_tr_tasks,  
*mwp_tr_tasks,  
*mera_arithmetic_tasks,  
*qazuntv2_tasks,  
*hawp_tasks,

Large template changes:
a) Adding a cot parameter to Formulation; this changes the Answer: to Answer-Step-By-Step:, which is a cheap way to enforce CoT. This is again templated (as is the original Answer).

b) Adding auto instructions for the cot formulation.
As written in the comment, here is the philosophy behind which version of a task one should use:

# Philosophy of formulations:
# 1. For early-stage pretrained models, we recommend using the CF formulation with few-shots (task_cf_native); this gives a reasonable signal even at this stage
# 2. For later stages, we recommend using the MCF formulation with few-shots (task_mcf_native), as models at this point should be able to handle the MCF formulation
# 3. For post-trained models, we recommend using the MCF formulation without few-shots and with cot (task_mcf_cot_native); this best matches their real usage, and they should be capable of
# following the expected format

# Similarly, for generative tasks we recommend using non-cot variants for all pre-trained models, and cot variants for post-trained models
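
To make the recommendation above concrete, here is a minimal sketch of how one might pick a variant by training stage. The `Stage` enum and `recommended_variant` helper are hypothetical and not part of this PR; only the `_cf_native`, `_mcf_native`, and `_mcf_cot_native` suffixes come from the comment above.

```python
from enum import Enum
from typing import Tuple


class Stage(Enum):
    """Hypothetical training stages, named here for illustration only."""
    EARLY_PRETRAIN = "early_pretrain"
    LATE_PRETRAIN = "late_pretrain"
    POST_TRAINED = "post_trained"


def recommended_variant(task: str, stage: Stage) -> Tuple[str, bool]:
    """Return (task variant name, whether to use few-shots) for a stage."""
    if stage is Stage.EARLY_PRETRAIN:
        # CF with few-shots: gives signal even for weak models.
        return f"{task}_cf_native", True
    if stage is Stage.LATE_PRETRAIN:
        # MCF with few-shots: the model should handle lettered options by now.
        return f"{task}_mcf_native", True
    # Post-trained: MCF with cot and no few-shots, closest to real usage.
    return f"{task}_mcf_cot_native", False


if __name__ == "__main__":
    for stage in Stage:
        print(stage.value, recommended_variant("task", stage))
```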

Therefore I added a simple system which uses a templated instruction if we are in MCF mode with cot. The issue here is that we will need to crowdsource these instructions again, but I think it should be possible. Not all tasks are the same, so I had to create a few different instructions to tell the model what to output (letter vs letter of continuation vs just the answer to the question) and in what format to output it (\boxed for math vs for everything else; I tested various separators but works by far the best). This also means I added a simple extractor for the

The goal here is to use a templated instruction, so that we don't have to do this (https://github.com/huggingface/lighteval/pull/832/files) for every single task for multilingual evals (impossible).
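
As a rough illustration of the templated-instruction idea and the \boxed extraction described above, here is a minimal sketch. The template wording, the `INSTRUCTION_TEMPLATES` dict, and both function names are assumptions made for this example, not the literals or helpers added in this PR; the non-math answer format is left as a caller-supplied argument.

```python
from typing import Optional

# Hypothetical English templates, one per "what to output" kind.
# In practice these would be crowdsourced per language.
INSTRUCTION_TEMPLATES = {
    "letter": "Think step by step, then output the letter of the correct option as {fmt}.",
    "continuation_letter": "Think step by step, then output the letter of the correct continuation as {fmt}.",
    "answer": "Think step by step, then output the final answer as {fmt}.",
}


def build_instruction(kind: str, answer_format: str) -> str:
    """Fill the template for the given output kind with the answer format."""
    return INSTRUCTION_TEMPLATES[kind].format(fmt=answer_format)


def extract_boxed(text: str) -> Optional[str]:
    """Return the content of the last \\boxed{...} in text, or None.

    Braces are matched by depth so nested LaTeX such as \\frac{1}{2} survives.
    """
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces, nothing to extract


if __name__ == "__main__":
    print(build_instruction("letter", "\\boxed{X}"))
    print(extract_boxed("2 * 21 = 42, so the answer is \\boxed{42}"))
```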

Some more nits:

Continuation tasks now separate the options with an Options keyword; this is because I found that some continuation models will not understand that these are possible continuations, and will instead treat them as a random enumeration. The downside is that we again need to crowdsource one more keyword (Options). I am quite hesitant about this, because the upsides are not that great in the end, and if we do this it will require folks to contribute the literals to make these tasks runnable again (hellaswag / copa).
Smol overhaul of translation metrics (so that they can work with the and thinking
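
For the Options keyword nit above, here is a minimal sketch of what such a prompt could look like, together with the cot answer-keyword switch from a). The `build_prompt` function and the exact layout are hypothetical; in the PR both keywords come from per-language translation literals rather than hardcoded English strings.

```python
from string import ascii_uppercase
from typing import List


def build_prompt(
    context: str,
    options: List[str],
    options_word: str = "Options",  # would be a translated literal
    cot: bool = False,
) -> str:
    """Build a prompt with an explicit Options header.

    With cot=True the answer keyword becomes Answer-Step-By-Step.
    """
    answer_word = "Answer-Step-By-Step" if cot else "Answer"
    lines = [context, f"{options_word}:"]
    lines += [f" {letter}. {opt}" for letter, opt in zip(ascii_uppercase, options)]
    lines.append(f"{answer_word}:")
    return "\n".join(lines)


if __name__ == "__main__":
    print(
        build_prompt(
            "The man broke his toe because",
            ["he got a hole in his sock.", "he dropped a hammer on his foot."],
            cot=True,
        )
    )
```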

hynky1999 added 3 commits May 23, 2025 02:28

add new translation literals

25449ac

updated templates

cb6eb13

update tasks, fix tests for templates, add math_qa template, fix tran…

f17858d

…lsation metrics
