Skip to content

LSUN download Data #2748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

LSUN download Data #2748

wants to merge 1 commit into from

Conversation

AniTho
Copy link

@AniTho AniTho commented Oct 3, 2020

Added the function to download LSUN data from the site, extract it and then delete zip file

@pmeier
Copy link
Collaborator

pmeier commented Oct 3, 2020

Closes #2746. @AniTho I'll review in 1-2 days.

Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the PR @AniTho! I have a few comments how to make it more robust. Furthermore, the linter is failing with

./torchvision/datasets/lsun.py:97:1: W293 blank line contains whitespace

Could you fix that?

@@ -82,6 +82,7 @@ def __init__(
# for each class, create an LSUNClassDataset
self.dbs = []
for c in self.classes:
self._download(c)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The download should be optional. Other datasets have a download=True flag in the constructor that indicates whether the download should happen or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I will make the changes.
Thanks for your support and guidance.

Comment on lines +102 to +106
if not(os.path.isfile(os.path.join(self.root, file_name))):
download_url(url, self.root, file_name)
print("Extracting File")
extract_archive(file_path, self.root, True)
print("Done!!!")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can make this simpler by using

def download_and_extract_archive(
url: str,
download_root: str,
extract_root: Optional[str] = None,
filename: Optional[str] = None,
md5: Optional[str] = None,
remove_finished: bool = False,
) -> None:

@@ -93,6 +94,16 @@ def __init__(
self.indices.append(count)

self.length = count

def _download(self, c):
url = 'http://dl.yf.io/lsun/scenes/{}'.format(c + '_lmdb.zip')
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before the download you could check if the lmdb file is present and if its MD5 checks out with

def check_integrity(fpath: str, md5: Optional[str] = None) -> bool:
if not os.path.isfile(fpath):
return False
if md5 is None:
return True
return check_md5(fpath, md5)

If that is the case you don't need to download anything.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pmeier Do I have to find md5 hash, if yes how? otherwise I can just directly check if the file exists or not

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I have to find md5 hash [...] otherwise I can just directly check if the file exists or not

Although not terribly common, download errors happen. Without a checksum, you have no idea if the file maybe is corrupt. Thus it we prefer to have a MD5 checksum for most files. In your case that would mean for the .zip and .lmdb files of every class.

if yes how?

You can download and extract all archives with the download logic you already have. Afterwards you can use a variety of tools. Depending on your setup you might have md5sum already installed. If you don't want additional software, you can use torchvision.datasets.utils.calculate_md5

def calculate_md5(fpath: str, chunk_size: int = 1024 * 1024) -> str:

@codecov
Copy link

codecov bot commented Oct 6, 2020

Codecov Report

Merging #2748 into master will increase coverage by 0.07%.
The diff coverage is 93.44%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #2748      +/-   ##
==========================================
+ Coverage   73.05%   73.12%   +0.07%     
==========================================
  Files          96       96              
  Lines        8298     8317      +19     
  Branches     1291     1293       +2     
==========================================
+ Hits         6062     6082      +20     
  Misses       1838     1838              
+ Partials      398      397       -1     
Impacted Files Coverage Δ
torchvision/transforms/functional.py 80.36% <33.33%> (-1.77%) ⬇️
torchvision/io/image.py 86.00% <80.00%> (+6.00%) ⬆️
torchvision/transforms/functional_tensor.py 72.54% <97.67%> (+2.46%) ⬆️
torchvision/io/__init__.py 100.00% <100.00%> (ø)
torchvision/transforms/functional_pil.py 66.19% <100.00%> (+0.97%) ⬆️
torchvision/transforms/transforms.py 80.92% <100.00%> (+0.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update a9a8220...c0188c8. Read the comment docs.

@pmeier
Copy link
Collaborator

pmeier commented Apr 7, 2022

Closing this since it is stale and the prototype version of LSUN in #5390 includes downloads.

@pmeier pmeier closed this Apr 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants