Use a Python script to perform LLM model evaluation.
- Introduction from Papers with Code: Paper-with-code
- Introduction: Medium Article
- Hugging Face dataset: Huggingface Dataset
- Step 1: Download the model from Hugging Face. The following commands are an example for the Mistral-7B-v0.1 model:

```shell
git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1
```
- Step 2: Arrange the dataset from the tmmluplus data folder into the data_arrange folder.
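The arrangement step can be sketched as a small script. The layout assumed below (per-subject CSV files somewhere under the source folder, copied flat into data_arrange) is an assumption, since the repository does not document the exact structure:

```python
import shutil
from pathlib import Path

def arrange_dataset(src_dir: str, dst_dir: str) -> int:
    """Copy every CSV file found under src_dir into dst_dir (flat layout).

    NOTE: the flat destination layout is an assumption; adjust it if
    evaluation_hf_testing.py expects per-subject subfolders instead.
    Returns the number of files copied.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = 0
    for csv_file in src.rglob("*.csv"):
        shutil.copy2(csv_file, dst / csv_file.name)
        copied += 1
    return copied
```

For example, `arrange_dataset("./tmmluplus/data", "./llm_evaluation_tmmluplus/data_arrange")` would mirror the folder names used in the commands below.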
- Step 3: Run the following command to generate predictions:

```shell
python3 evaluation_hf_testing.py \
    --model ./models/llama2-7b-hf \
    --data_dir ./llm_evaluation_tmmluplus/data_arrange/ \
    --save_dir ./llm_evaluation_tmmluplus/results/
```
- Step 4: Run the evaluation script to produce the output JSON file:
```shell
!python /content/llm_model_evaluation/catogories_result_eval.py \
    --catogory "mmlu" \
    --model ./models/llama2-7b-hf \
    --save_dir "./results/results_llama2-7b-hf"
```
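The prediction script itself is not shown here, but an MMLU-style harness typically renders each question as a lettered multiple-choice prompt and picks the option whose answer token the model scores highest. A minimal sketch of those two pieces (the function names and the score source are assumptions, not the repository's actual API):

```python
def format_mmlu_prompt(question: str, choices: list[str]) -> str:
    """Render one multiple-choice question in the standard MMLU layout:
    question text, lettered options, then an 'Answer:' cue for the model."""
    letters = "ABCD"
    lines = [question.strip()]
    for letter, choice in zip(letters, choices):
        lines.append(f"{letter}. {choice}")
    lines.append("Answer:")
    return "\n".join(lines)

def pick_answer(option_scores: dict[str, float]) -> str:
    """Choose the option letter with the highest score, e.g. the model's
    next-token log-probability for each of 'A'/'B'/'C'/'D'."""
    return max(option_scores, key=option_scores.get)
```

Scoring the four single-letter continuations (rather than free-form generation) keeps the evaluation deterministic and cheap: one forward pass per question.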
- mmlu dataset:
  - Google Colab - mmlu
  - Google Colab - mmlu in the phi-2 model (the Colab free tier can run this example)
- tmmluplus dataset:

Evaluation results:

- mmlu dataset:
| Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s) |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | 0.6254 | 0.5252 | 0.5637 | 0.7358 | 0.7036 | 15624.0 |
- tmmluplus dataset:
| Model | Weighted Accuracy | STEM | Humanities | Social Sciences | Other | Inference Time (s) |
|---|---|---|---|---|---|---|
| Mistral-7B-v0.1 | - | - | - | - | - | - |
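The Weighted Accuracy column can be reproduced from per-subject results. A minimal sketch, assuming each subject reports a (correct, total) pair; the helper names are illustrative, not the repository's actual functions:

```python
def weighted_accuracy(per_subject: dict[str, tuple[int, int]]) -> float:
    """Overall accuracy weighted by subject size: total correct answers
    divided by total questions, rather than the mean of per-subject scores."""
    correct = sum(c for c, _ in per_subject.values())
    total = sum(t for _, t in per_subject.values())
    return correct / total if total else 0.0

def category_accuracy(per_subject: dict[str, tuple[int, int]],
                      category_subjects: list[str]) -> float:
    """Accuracy for one category (e.g. STEM), restricted to its subjects."""
    subset = {s: per_subject[s] for s in category_subjects if s in per_subject}
    return weighted_accuracy(subset)
```

Weighting by question count explains why the overall figure can sit between the STEM and social-science columns: larger subjects pull the average toward their own scores.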