Weird CUDA OutOfMemoryError

Hi

I need some help.

I am trying to evaluate my pre-trained model on the Thai fine tasks. Here is my command:

```
export CUDA_VISIBLE_DEVICES="0,1"
echo "Running lighteval for model: meta-llama/Llama-3.2-3B"
lighteval accelerate \
"pretrained=meta-llama/Llama-3.2-3B,dtype=bfloat16,model_parallel=True" \
"examples/tasks/fine_tasks/mcf/th.txt" \
--custom-tasks "src/lighteval/tasks/multilingual/tasks.py" \
--dataset-loading-processes 8 \
--cache-dir "./le_cache" \
--no-use-chat-template \
--override-batch-size 4
```

When I ran this command, I got the following error:
```
OutOfMemoryError: CUDA out of memory. Tried to allocate 6.02 GiB. GPU 0 has a total capacity of 39.59 GiB of which 5.52 GiB is free. 
Process 35878 has 674.00 MiB memory in use. Process 32242 has 33.41 GiB memory in use. Of the allocated memory 28.39 GiB is allocated by 
PyTorch, and 3.70 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting 
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  
(https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
```
This seems strange to me, since my batch size is very small. Here are the specs of my GPU machine:

<img width="1373" alt="Image" src="https://github.com/user-attachments/assets/ca151aa8-8d68-478a-ad69-bafc0d326c0e" />
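To put the numbers from the error message above side by side, here is a small arithmetic sketch. The figures are copied from the message; the variable names are mine, and I am only assuming (not certain) that process 32242 is the lighteval run, because its usage is close to what PyTorch reports as allocated plus reserved.

```python
# Figures copied from the OutOfMemoryError message (GiB unless noted).
MIB_PER_GIB = 1024

requested = 6.02                      # "Tried to allocate 6.02 GiB"
total_capacity = 39.59                # GPU 0 total capacity
free = 5.52                           # reported as free on GPU 0
proc_35878 = 674.00 / MIB_PER_GIB     # reported in MiB, converted to GiB
proc_32242 = 33.41                    # assumed to be the lighteval process
allocated_by_pytorch = 28.39
reserved_unallocated = 3.70

# Free memory implied by the per-process usage; should match the reported value.
free_from_processes = total_capacity - (proc_35878 + proc_32242)

# Memory held by PyTorch's caching allocator (allocated + reserved but unused).
held_by_pytorch = allocated_by_pytorch + reserved_unallocated

# How much the failed request exceeds the memory reported as free.
shortfall = requested - free

print(f"free implied by process usage: {free_from_processes:.2f} GiB")
print(f"held by PyTorch allocator:     {held_by_pytorch:.2f} GiB")
print(f"request exceeds free memory by {shortfall:.2f} GiB")
```

So GPU 0 was already almost full when the 6.02 GiB request came in, and the request was for a single large block rather than many small ones.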

My only guess is that it is related to having 7 datasets to evaluate, but I really have no idea. Here is the task file:
```
# mcf.th.txt
# General Knowledge (GK)
lighteval|meta_mmlu_tha_mcf|5|1
lighteval|m3exams_tha_mcf|5|1

# Reading Comprehension (RC)
lighteval|belebele_tha_Thai_mcf|5|1
lighteval|thaiqa_tha|5|1
lighteval|xquad_tha|5|1

# Natural Language Understanding (NLU)
lighteval|community_hellaswag_tha_mcf|5|1
lighteval|xnli2.0_tha_mcf|5|1
```
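One thing I have not verified: the batch size counts sequences, not tokens, so long 5-shot prompts could still produce a large output tensor. The sketch below estimates the size of a full logits tensor for a batch. It assumes the Llama 3 vocabulary size of 128,256, the sequence lengths are hypothetical, and `logits_gib` is a helper I made up for illustration. Whether lighteval actually keeps the full logits in memory, or in which dtype, is also an assumption on my part.

```python
def logits_gib(batch_size: int, seq_len: int, vocab_size: int, bytes_per_value: int) -> float:
    """Size in GiB of a dense [batch_size, seq_len, vocab_size] logits tensor."""
    return batch_size * seq_len * vocab_size * bytes_per_value / 2**30


VOCAB_SIZE = 128_256   # assumed vocabulary size of meta-llama/Llama-3.2-3B
BATCH_SIZE = 4         # from --override-batch-size 4

# Hypothetical prompt lengths in tokens; real 5-shot Thai prompts may differ.
for seq_len in (1024, 2048, 4096, 8192):
    bf16 = logits_gib(BATCH_SIZE, seq_len, VOCAB_SIZE, 2)   # 2 bytes per bfloat16
    fp32 = logits_gib(BATCH_SIZE, seq_len, VOCAB_SIZE, 4)   # 4 bytes per float32
    print(f"seq_len={seq_len}: bf16 {bf16:.2f} GiB, fp32 {fp32:.2f} GiB")
```

If the prompts reach a few thousand tokens each, a single allocation of several GiB would not be surprising even with a batch size of 4.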

Any ideas?



I am using lighteval 0.6.0.dev0 and torch 2.2.2+cu121. I cloned this repo and ran `pip install -e .[dev]`.
