Unable to download OakInk2 dataset #4

Open
dqj5182 opened this issue Oct 13, 2024 · 4 comments

@dqj5182

dqj5182 commented Oct 13, 2024

I tried to download the dataset from Hugging Face with the following script:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="kelvin34501/OakInk-v2", repo_type="dataset")

However, I constantly get this error.
  File "<stdin>", line 2, in <module>
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 308, in snapshot_download
    thread_map(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 446, in result
    return self.__get_result()
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 283, in _inner_hf_hub_download
    return hf_hub_download(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1457, in hf_hub_download
    http_get(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 552, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 13168824320 but has size 2971966929 ((…)dab0aa577c392a5__2023-04-13-20-53-02.tar).
We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

FYI, I have tried every combination of the two arguments recommended by the error message, and none of them worked:
force_download=True, resume_download=True
force_download=True, resume_download=False
force_download=False, resume_download=True
force_download=False, resume_download=False
So I assume the problem stems from either the download itself or Hugging Face.

May I ask for help with this problem?

@kelvin34501
Collaborator

kelvin34501 commented Oct 16, 2024

Does the download error only happen on the specific tar
scene_01__O001++seq__97fc3dab0aa577c392a5__2023-04-13-20-53-02.tar?

@dqj5182
Author

dqj5182 commented Oct 16, 2024

@kelvin34501 Thank you for your response.

It is difficult to tell, because the download script processes all the files as a whole.
But I think downloading that file definitely runs into an error.
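
One way to narrow this down would be to fetch the files one at a time instead of through a single snapshot_download call, so that every failing tar is reported by name. The following is only a minimal sketch: download_each and report_failed_files are hypothetical helper names (not part of huggingface_hub), and the calls to list_repo_files and hf_hub_download assume a reasonably recent huggingface_hub release and network access.

```python
from typing import Callable, Dict, Iterable


def download_each(filenames: Iterable[str], fetch: Callable[[str], object]) -> Dict[str, str]:
    """Call fetch(name) for every file and collect the failures.

    Returns a mapping of filename -> error message for the files that failed.
    """
    failures: Dict[str, str] = {}
    for name in filenames:
        try:
            fetch(name)
        except Exception as exc:  # keep going so every bad file gets reported
            failures[name] = f"{type(exc).__name__}: {exc}"
    return failures


def report_failed_files(repo_id: str = "kelvin34501/OakInk-v2") -> Dict[str, str]:
    """Hypothetical driver: download each file of the dataset repo separately."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download, list_repo_files

    names = list_repo_files(repo_id, repo_type="dataset")
    failed = download_each(
        names,
        lambda name: hf_hub_download(repo_id=repo_id, filename=name, repo_type="dataset"),
    )
    for name, err in failed.items():
        print(name, "->", err)
    return failed
```

Calling report_failed_files() would download the files sequentially, which is slower than the threaded snapshot download but makes it clear whether the error is tied to specific tars.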

@dqj5182
Author

dqj5182 commented Oct 17, 2024

@kelvin34501 I manually downloaded scene_01__O001++seq__97fc3dab0aa577c392a5__2023-04-13-20-53-02.tar from the Hugging Face web page, added it to my storage, and reran the script:
from huggingface_hub import snapshot_download
snapshot_download(repo_id="kelvin34501/OakInk-v2", repo_type="dataset")

But the error persists for a different file.

(…)e09a5a39a86248e__2023-04-17-14-49-58.tar: 32% | 315M/991M [03:20<07:12, 1.57MB/s]
(…)cfe40029aca15ea__2023-04-22-09-43-29.tar: 100% | 1.53G/1.53G [14:24<00:00, 1.77MB/s]
(…)e8d115be91c9473__2023-04-17-14-49-08.tar: 100% | 1.01G/1.01G [15:20<00:00, 1.10MB/s]
(…)09a2be744b05fc3__2023-04-17-15-08-33.tar: 100% | 1.96G/1.96G [18:46<00:00, 1.74MB/s]
(…)47c6f3886b08da2__2023-04-17-14-51-43.tar: 100% | 933M/933M [11:45<00:00, 1.32MB/s]
(…)0dc7af63ce7ae6e__2023-04-17-14-59-25.tar: 100% | 1.79G/1.79G [11:09<00:00, 2.68MB/s]
(…)82529e4350bba74__2023-04-22-16-10-37.tar: 100% | 9.81G/9.81G [1:34:15<00:00, 1.74MB/s]
(…)cc8304e637ce76c__2023-04-17-15-42-47.tar: 100% | 1.52G/1.52G [13:51<00:00, 1.83MB/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 308, in snapshot_download
    thread_map(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 94, in thread_map
    return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/contrib/concurrent.py", line 76, in _executor_map
    return list(tqdm_class(ex.map(fn, *iterables, **map_args), **kwargs))
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/tqdm/std.py", line 1195, in __iter__
    for obj in iterable:
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 609, in result_iterator
    yield fs.pop().result()
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/danieljung0121/anaconda3/lib/python3.9/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/_snapshot_download.py", line 283, in _inner_hf_hub_download
    return hf_hub_download(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/utils/_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 1457, in hf_hub_download
    http_get(
  File "/home/danieljung0121/anaconda3/lib/python3.9/site-packages/huggingface_hub/file_download.py", line 552, in http_get
    raise EnvironmentError(
OSError: Consistency check failed: file should be of size 6680616960 but has size 4801893217 ((…)ef0a80eed1a965b__2023-04-22-16-02-02.tar).
We are sorry for the inconvenience. Please retry download and pass force_download=True, resume_download=False as argument.
If the issue persists, please let us know by opening an issue on https://github.com/huggingface/huggingface_hub.

@kelvin34501
Collaborator

I have a guess (which might not be the case): the HuggingFace download tool uses a cache system where files are by default first downloaded to .cache/huggingface/hub and a symlink is then generated at the target location.
If there is insufficient space in the HOME directory, there might be issues with the downloaded file size not matching.

I use the following script to validate the size of the tar, and it seems ok:

import requests
from huggingface_hub import get_hf_file_metadata, hf_hub_url

# Build the download URL for the tar that failed the consistency check.
url = hf_hub_url(
    repo_id="kelvin34501/OakInk-v2",
    repo_type="dataset",
    filename="data/scene_01__O001++seq__97fc3dab0aa577c392a5__2023-04-13-20-53-02.tar",
)

# Size recorded in the Hub metadata.
metadata = get_hf_file_metadata(url)
print(metadata.size)

# Size reported by the server that actually hosts the file.
response = requests.head(metadata.location)
print(response.headers["Content-Length"])
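
The same size comparison could also be run against the files already on disk. The sketch below is an illustration under assumptions, not a tested tool: find_size_mismatches and expected_sizes_from_hub are hypothetical helper names, and the second one assumes that HfApi().dataset_info(..., files_metadata=True) returns siblings carrying rfilename and size, as in recent huggingface_hub releases.

```python
from pathlib import Path
from typing import Dict, Tuple


def find_size_mismatches(local_dir: str, expected_sizes: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {relative_path: (expected, actual)} for files whose size differs.

    A file that is missing locally is reported with an actual size of -1.
    """
    root = Path(local_dir)
    mismatches: Dict[str, Tuple[int, int]] = {}
    for rel_path, expected in expected_sizes.items():
        path = root / rel_path
        # stat() follows symlinks, so cache snapshots that link to blobs work too.
        actual = path.stat().st_size if path.is_file() else -1
        if actual != expected:
            mismatches[rel_path] = (expected, actual)
    return mismatches


def expected_sizes_from_hub(repo_id: str) -> Dict[str, int]:
    """Hypothetical helper: read per-file sizes from the Hub metadata."""
    # Imported lazily so the comparison above works without huggingface_hub installed.
    from huggingface_hub import HfApi

    info = HfApi().dataset_info(repo_id, files_metadata=True)
    return {s.rfilename: s.size for s in info.siblings if s.size is not None}
```

For example, find_size_mismatches(snapshot_dir, expected_sizes_from_hub("kelvin34501/OakInk-v2")) would list every truncated or missing tar, where snapshot_dir is whatever directory the download was written to.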

I am currently trying to reproduce the download issue and it will take some time.
Perhaps you could check if you have the space issue mentioned above?
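
A quick way to check could look like the sketch below. It is a minimal illustration with assumptions: hf_cache_dir and has_room are hypothetical helper names, and the cache location logic assumes the HF_HUB_CACHE and HF_HOME environment variables with the default of ~/.cache/huggingface/hub.

```python
import os
import shutil
from pathlib import Path


def hf_cache_dir() -> Path:
    """Best-effort guess of the Hugging Face hub cache directory (assumed layout)."""
    if os.environ.get("HF_HUB_CACHE"):
        return Path(os.environ["HF_HUB_CACHE"]).expanduser()
    hf_home = os.environ.get("HF_HOME", "~/.cache/huggingface")
    return (Path(hf_home) / "hub").expanduser()


def has_room(path, required_bytes: int) -> bool:
    """Return True if the filesystem holding `path` has at least required_bytes free."""
    p = Path(path).expanduser()
    # The directory may not exist yet; walk up to the nearest existing parent.
    while not p.exists():
        p = p.parent
    return shutil.disk_usage(p).free >= required_bytes
```

For instance, has_room(hf_cache_dir(), 13168824320) tests whether the tar from the first error message would fit on its own. If the home partition is too small, snapshot_download accepts a cache_dir argument that can point to a larger disk.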
