
ImportError when importing datasets.load_dataset #6995


Closed
Leo-Lsc opened this issue Jun 24, 2024 · 9 comments

@Leo-Lsc

Leo-Lsc commented Jun 24, 2024

Describe the bug

I encountered an ImportError while trying to import load_dataset from the datasets module in Hugging Face. The error message indicates a problem with importing 'CommitInfo' from 'huggingface_hub'.

Steps to reproduce the bug

  1. pip install git+https://github.com/huggingface/datasets
  2. from datasets import load_dataset

Expected behavior

ImportError Traceback (most recent call last)
Cell In[7], line 1
----> 1 from datasets import load_dataset
3 train_set = load_dataset("mispeech/speechocean762", split="train")
4 test_set = load_dataset("mispeech/speechocean762", split="test")

File d:\Anaconda3\envs\CS224S\Lib\site-packages\datasets\__init__.py:17
1 # Copyright 2020 The HuggingFace Datasets Authors and the TensorFlow Datasets Authors.
2 #
3 # Licensed under the Apache License, Version 2.0 (the "License");
(...)
12 # See the License for the specific language governing permissions and
13 # limitations under the License.
15 __version__ = "2.20.1.dev0"
---> 17 from .arrow_dataset import Dataset
18 from .arrow_reader import ReadInstruction
19 from .builder import ArrowBasedBuilder, BeamBasedBuilder, BuilderConfig, DatasetBuilder, GeneratorBasedBuilder

File d:\Anaconda3\envs\CS224S\Lib\site-packages\datasets\arrow_dataset.py:63
61 import pyarrow.compute as pc
62 from fsspec.core import url_to_fs
---> 63 from huggingface_hub import (
64 CommitInfo,
65 CommitOperationAdd,
...
70 )
71 from huggingface_hub.hf_api import RepoFile
72 from multiprocess import Pool

ImportError: cannot import name 'CommitInfo' from 'huggingface_hub' (d:\Anaconda3\envs\CS224S\Lib\site-packages\huggingface_hub\__init__.py)

Environment info

Leo@DESKTOP-9NHUAMI MSYS /d/Anaconda3/envs/CS224S/Lib/site-packages/huggingface_hub
$ datasets-cli env
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\Anaconda3\envs\CS224S\Scripts\datasets-cli.exe\__main__.py", line 4, in <module>
  File "D:\Anaconda3\envs\CS224S\Lib\site-packages\datasets\__init__.py", line 17, in <module>
    from .arrow_dataset import Dataset
  File "D:\Anaconda3\envs\CS224S\Lib\site-packages\datasets\arrow_dataset.py", line 63, in <module>
    from huggingface_hub import (
ImportError: cannot import name 'CommitInfo' from 'huggingface_hub' (D:\Anaconda3\envs\CS224S\Lib\site-packages\huggingface_hub\__init__.py)
(CS224S)

@albertvillanova albertvillanova self-assigned this Jun 25, 2024
@albertvillanova
Member

What version of huggingface-hub do you have installed? You can check with:

import huggingface_hub
print(huggingface_hub.__version__)

It seems you have a very old version of huggingface-hub, in which CommitInfo was not yet implemented. You need to update it:

pip install -U huggingface-hub

Note that CommitInfo was implemented in huggingface-hub 0.10.0 and datasets requires "huggingface-hub>=0.21.2"
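
As a rough way to compare an installed version against the minimum quoted above, the sketch below parses only the leading numeric parts of each version string. The helper names `version_tuple` and `meets_minimum` are hypothetical, and the comparison ignores pre-release semantics; `packaging.version.Version` is the more robust choice when that package is available.

```python
import re


def version_tuple(version):
    """Turn '0.23.4' or '2.20.1.dev0' into a tuple of its leading numeric parts."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum (numeric parts only)."""
    return version_tuple(installed) >= version_tuple(minimum)


# 0.21.2 is the minimum quoted in this thread; 0.23.4 is the version reported later on.
print(meets_minimum("0.23.4", "0.21.2"))
print(meets_minimum("0.9.1", "0.10.0"))
```

Comparing tuples of integers avoids the string-comparison trap where "0.9.1" would sort after "0.10.0".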

@Leo-Lsc
Author

Leo-Lsc commented Jun 25, 2024

The version of my huggingface-hub is 0.23.4.

@albertvillanova
Member

albertvillanova commented Jun 25, 2024

The error message says there is no CommitInfo in your installed huggingface-hub library:

ImportError: cannot import name 'CommitInfo' from 'huggingface_hub' (D:\Anaconda3\envs\CS224S\Lib\site-packages\huggingface_hub\__init__.py)

And CommitInfo has been implemented since version 0.10.0.
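
When the reported version looks recent but the import still fails, it can help to check which file is actually being imported and what error the attribute lookup raises. The following is a minimal sketch using only the standard library; the function name `describe_import` and the report keys are hypothetical, and it is a diagnostic aid rather than a fix.

```python
import importlib
import importlib.metadata as md


def describe_import(dist_name, module_name, attr):
    """Report the installed metadata version, the imported file, and whether attr resolves."""
    report = {"metadata_version": None, "module_file": None, "has_attr": False, "error": None}
    try:
        report["metadata_version"] = md.version(dist_name)
    except md.PackageNotFoundError:
        pass  # no installed distribution metadata under that name
    try:
        module = importlib.import_module(module_name)
        report["module_file"] = getattr(module, "__file__", None)
        getattr(module, attr)  # may raise if the attribute is missing or fails to load
        report["has_attr"] = True
    except Exception as exc:
        # Keep the original exception type and message for inspection.
        report["error"] = f"{type(exc).__name__}: {exc}"
    return report


# Example call for the case discussed in this thread:
# print(describe_import("huggingface-hub", "huggingface_hub", "CommitInfo"))
```

If `module_file` points somewhere other than the active environment's site-packages, or `error` names a different exception than expected, that narrows down where to look.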

@awsankur

I am getting the exact same issue when I import datasets. The version of my huggingface-hub is also 0.23.4. I don't see a solution in the comments. Why is this issue closed?

@albertvillanova
Member

I closed the issue because the problem is not related to the datasets library.

The problem is with your local Python environment: it seems to be corrupted. You could try removing it and recreating it from scratch.
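
One form a broken environment can take is the same package being registered more than once with different versions. This is an assumption about what "corrupted" might mean here, not something confirmed in the thread. A minimal sketch that scans the installed distribution metadata for such duplicates (the function name `find_duplicate_installs` is hypothetical):

```python
import importlib.metadata as md
from collections import defaultdict


def find_duplicate_installs():
    """Return {normalized name: [versions]} for packages registered with more than one version."""
    seen = defaultdict(set)
    for dist in md.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.version
        except Exception:
            continue  # skip entries whose metadata cannot be read
        if name:
            seen[name.lower().replace("_", "-")].add(version)
    return {name: sorted(versions) for name, versions in seen.items() if len(versions) > 1}


print(find_duplicate_installs())
```

An empty dictionary means no duplicates were found by this check; it does not rule out other kinds of environment damage.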

@awsankur

I have recreated my conda environment but still run into the same issue. Here is my environment:

conda create --name esm python=3.10
conda activate esm
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
pip3 install -r requirements.txt

Requirements.txt

accelerate
datasets==2.20.0
pyfastx
transformers
boto3
huggingface_hub==0.23.4

And then I get:

>>> import datasets
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/fsx/ubuntu/miniconda3/envs/esm2/lib/python3.10/site-packages/datasets/__init__.py", line 17, in <module>
    from .arrow_dataset import Dataset
  File "/fsx/ubuntu/miniconda3/envs/esm2/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 63, in <module>
    from huggingface_hub import (
ImportError: cannot import name 'CommitInfo' from 'huggingface_hub' (/fsx/ubuntu/miniconda3/envs/esm2/lib/python3.10/site-packages/huggingface_hub/__init__.py)
>>>
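
To confirm that the interpreter raising the error really has the versions pinned in the requirements file above, the pins can be compared against the installed metadata. This is a sketch under simple assumptions: only `name==version` lines are handled, and the helper names `parse_pins` and `pin_mismatches` are hypothetical.

```python
import importlib.metadata as md


def parse_pins(requirements_text):
    """Extract {name: version} from 'name==version' lines; unpinned lines are ignored."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def pin_mismatches(pins):
    """Return {name: (wanted, installed)} for pins that differ from what is installed."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = md.version(name)
        except md.PackageNotFoundError:
            installed = None  # not installed in this interpreter's environment
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


requirements = """accelerate
datasets==2.20.0
pyfastx
transformers
boto3
huggingface_hub==0.23.4
"""
print(pin_mismatches(parse_pins(requirements)))
```

Running this with the same interpreter that raises the ImportError matters, since a different environment may report different versions.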

@albertvillanova
Member

You can check:

>>> import huggingface_hub
>>> print(huggingface_hub.__version__)

@awsankur

This is what I see:

>>> import huggingface_hub
>>> print(huggingface_hub.__version__)
0.23.4

@awsankur

Installing chardet makes it work, for some reason.
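
The thread does not explain why installing chardet helps. To see whether a module such as chardet is discoverable in the current environment before and after installing it, a small check with `importlib.util.find_spec` can be used. The helper name `missing_modules` is hypothetical, and the sketch only handles top-level module names.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# chardet is the package mentioned above; extend the list as needed.
print(missing_modules(["chardet"]))
```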
