Multilingual Overhaul #833
Open
First of all, sorry that this PR is quite large. Work on it started about half a year ago and was then put on hold.
What does this PR do?
a) Adds a `cot` parameter to Formulation. This changes the `Answer:` prefix to `Answer-Step-By-Step:`, which is a cheap way to enforce CoT. This is again templated (as is the original Answer).
b) Adds auto instructions for the CoT formulation.
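To illustrate point a), here is a minimal sketch of how a `cot` flag could switch the templated answer prefix. The `ANSWER_WORDS` table and the `answer_prefix` function are hypothetical names made up for this example and are not the code in this PR; only the two English prefixes come from the description above.

```python
# Hypothetical per-language table of answer prefixes (illustration only).
# Only the English strings are taken from the PR description.
ANSWER_WORDS = {
    "en": {"answer": "Answer", "answer_cot": "Answer-Step-By-Step"},
}


def answer_prefix(cot: bool, language: str = "en") -> str:
    """Return the templated prefix that precedes the model's answer."""
    words = ANSWER_WORDS[language]
    # With cot enabled, the prefix itself nudges the model to reason first.
    key = "answer_cot" if cot else "answer"
    return f"{words[key]}:"


print(answer_prefix(cot=False))
print(answer_prefix(cot=True))
```

Because the prefix is looked up per language, supporting another language in this sketch would only mean adding one more entry to the table.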
As written in the comment, here is the philosophy behind which version of a task one should use:
Therefore I added a simple system that uses a templated instruction when we are in MCF mode with CoT. The issue here is that we will need to crowdsource these instructions again, but I think it should be possible. Not all tasks are the same, so I had to create a few different instructions specifying what to output (letter vs letter of continuation vs just the answer to the question) and in what format to output it (\boxed for math vs for everything else, I tested various separators but works by far the best). This also means I added a simple extractor for the
The goal here is to use a templated instruction, so that we don't have to do this (https://github.com/huggingface/lighteval/pull/832/files) for every single task for multilingual evals, which would be impossible.
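As a rough sketch of the templated-instruction idea, the snippet below picks an instruction by what the model should output and, for math, asks for a `\boxed{}` answer that a small extractor then reads back. All names (`INSTRUCTIONS`, `build_instruction`, `extract_boxed`) and the instruction wording are assumptions for illustration, not the implementation in this PR. The format used for non-math tasks is not named in this description, so the sketch leaves it out.

```python
# Hypothetical English instruction templates, keyed by the expected output.
INSTRUCTIONS = {
    "letter": "Output the letter of the correct choice.",
    "continuation_letter": "Output the letter of the correct continuation.",
    "answer": "Output the answer to the question.",
}


def build_instruction(answer_kind: str, is_math: bool) -> str:
    """Combine the 'what to output' and 'which format' parts of an instruction."""
    base = INSTRUCTIONS[answer_kind]
    if is_math:
        # Assumed wording; only the use of \boxed for math comes from the PR.
        return base + " Put the final answer in \\boxed{}."
    return base


def extract_boxed(text: str):
    """Return the contents of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking nested braces such as \frac{1}{2}.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never closed


print(build_instruction("answer", is_math=True))
print(extract_boxed("Step 1 ... so the result is \\boxed{\\frac{1}{2}}."))
```

Taking the last `\boxed{}` rather than the first is a design choice in this sketch: with step-by-step output, intermediate results may be boxed before the final one.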
Some more nits: