LSUN download Data #2748

AniTho · 2020-10-03T11:03:46Z

Added a function that downloads the LSUN data from the site, extracts it, and then deletes the zip file.

pmeier · 2020-10-03T12:00:50Z

Closes #2746. @AniTho I'll review in 1-2 days.

pmeier

Thanks a lot for the PR @AniTho! I have a few comments on how to make it more robust. Furthermore, the linter is failing with

./torchvision/datasets/lsun.py:97:1: W293 blank line contains whitespace

Could you fix that?
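W293 is reported for lines that contain only spaces or tabs. A minimal sketch of a scripted fix, assuming you would rather not edit line 97 by hand; `strip_blank_line_whitespace` is a hypothetical helper and not part of torchvision:

```python
def strip_blank_line_whitespace(source: str) -> str:
    """Empty out lines that contain only whitespace (flake8 W293)."""
    cleaned = []
    for line in source.splitlines(keepends=True):
        if line.strip() == "":
            # Keep the original line ending, drop the stray indentation.
            ending = line[len(line.rstrip("\r\n")):]
            cleaned.append(ending)
        else:
            cleaned.append(line)
    return "".join(cleaned)
```

Running the file contents through this helper and writing the result back should leave all code lines untouched.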

pmeier · 2020-10-04T17:55:05Z

torchvision/datasets/lsun.py

@@ -82,6 +82,7 @@ def __init__(
        # for each class, create an LSUNClassDataset
        self.dbs = []
        for c in self.classes:
+            self._download(c)


The download should be optional. Other datasets have a download=True flag in the constructor that indicates whether the download should happen or not.

AniTho · 2020-10-04

Okay, I will make the changes. Thanks for your support and guidance.
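The optional download requested above could look roughly like this. This is a sketch only: `LSUNSketch` is a hypothetical stand-in for the dataset class, `_download` is a placeholder, and the default of `False` follows what other torchvision datasets such as MNIST do.

```python
class LSUNSketch:
    """Hypothetical stand-in for the dataset class; only the flag handling is shown."""

    def __init__(self, root: str, classes, download: bool = False):
        self.root = root
        self.classes = list(classes)
        self.downloaded = []  # recorded only so the sketch can be inspected
        for c in self.classes:
            if download:
                self._download(c)
            # ... the real constructor would create an LSUNClassDataset for c here

    def _download(self, c: str) -> None:
        # Placeholder: the real method would fetch and extract the archive.
        self.downloaded.append(c)
```

With this shape, constructing the dataset without `download=True` never touches the network, which keeps existing user code unchanged.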

pmeier · 2020-10-04T17:57:12Z

torchvision/datasets/lsun.py

+        if not(os.path.isfile(os.path.join(self.root, file_name))):
+            download_url(url, self.root, file_name)
+        print("Extracting File")
+        extract_archive(file_path, self.root, True)
+        print("Done!!!")


You can make this simpler by using

vision/torchvision/datasets/utils.py

Lines 242 to 249 in 588f7ae

def download_and_extract_archive(
    url: str,
    download_root: str,
    extract_root: Optional[str] = None,
    filename: Optional[str] = None,
    md5: Optional[str] = None,
    remove_finished: bool = False,
) -> None:
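A sketch of how `_download` might shrink when built on that helper. The URL pattern is the one used in this PR; `archive_name`, `archive_url`, and `download_class` are hypothetical names, and `md5=None` is only a placeholder until real checksums are known.

```python
BASE_URL = "http://dl.yf.io/lsun/scenes"


def archive_name(c: str) -> str:
    """File name of the zipped LMDB for one class, e.g. 'bedroom_train_lmdb.zip'."""
    return "{}_lmdb.zip".format(c)


def archive_url(c: str) -> str:
    return "{}/{}".format(BASE_URL, archive_name(c))


def download_class(root: str, c: str, md5=None) -> None:
    # Imported lazily so the helpers above work without torchvision installed.
    from torchvision.datasets.utils import download_and_extract_archive

    download_and_extract_archive(
        archive_url(c),
        download_root=root,
        filename=archive_name(c),
        md5=md5,
        remove_finished=True,  # delete the zip once it has been extracted
    )
```

This replaces the separate `download_url`, `extract_archive`, and print calls with a single call.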

pmeier · 2020-10-04T17:59:51Z

torchvision/datasets/lsun.py

@@ -93,6 +94,16 @@ def __init__(
            self.indices.append(count)

        self.length = count
+
+    def _download(self, c):
+        url = 'http://dl.yf.io/lsun/scenes/{}'.format(c + '_lmdb.zip')


Before the download you could check if the lmdb file is present and if its MD5 checks out with

vision/torchvision/datasets/utils.py

Lines 38 to 43 in 588f7ae

def check_integrity(fpath: str, md5: Optional[str] = None) -> bool:
    if not os.path.isfile(fpath):
        return False
    if md5 is None:
        return True
    return check_md5(fpath, md5)

If that is the case you don't need to download anything.
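The skip-if-intact idea can be sketched without torchvision as follows. `is_intact` mirrors the logic of `check_integrity` quoted above using only the standard library; `ensure_file` and the `download` callback are hypothetical names introduced for illustration.

```python
import hashlib
import os
from typing import Callable, Optional


def md5_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so large archives are never read into memory at once."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: str, md5: Optional[str] = None) -> bool:
    """Same decision order as check_integrity: missing -> False, no md5 -> True."""
    if not os.path.isfile(path):
        return False
    if md5 is None:
        return True
    return md5_of(path) == md5


def ensure_file(path: str, md5: Optional[str], download: Callable[[], None]) -> bool:
    """Call download() only when the file is missing or corrupt.

    Returns True if a download was triggered.
    """
    if is_intact(path, md5):
        return False
    download()
    return True
```

In the PR, the callback would be the existing download-and-extract step for one class.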

AniTho · 2020-10-06

@pmeier Do I have to find the MD5 hash? If yes, how? Otherwise I can just check directly whether the file exists.

pmeier · 2020-10-06

> Do I have to find md5 hash [...] otherwise I can just directly check if the file exists or not

Although not terribly common, download errors happen. Without a checksum, you have no way of knowing whether the file is corrupt. Thus, we prefer to have an MD5 checksum for most files. In your case that would mean one for the .zip and .lmdb files of every class.

> if yes how?

You can download and extract all archives with the download logic you already have. Afterwards you can use a variety of tools to compute the checksums. Depending on your setup, you might already have md5sum installed. If you don't want additional software, you can use torchvision.datasets.utils.calculate_md5:

vision/torchvision/datasets/utils.py

Line 26 in de90862

def calculate_md5(fpath: str, chunk_size: int = 1024 * 1024) -> str:
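One way to collect the checksums once the archives are on disk is sketched below. `md5_hex` is a standard-library stand-in with the same signature as `calculate_md5`, so the example runs without torchvision; `checksum_table` and the file names in the usage note are hypothetical.

```python
import hashlib
import os
from typing import Dict, Iterable


def md5_hex(fpath: str, chunk_size: int = 1024 * 1024) -> str:
    """Stand-in for torchvision.datasets.utils.calculate_md5 (chunked MD5)."""
    md5 = hashlib.md5()
    with open(fpath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            md5.update(chunk)
    return md5.hexdigest()


def checksum_table(root: str, names: Iterable[str]) -> Dict[str, str]:
    """Map each file name under root to its MD5 hex digest."""
    return {name: md5_hex(os.path.join(root, name)) for name in names}
```

Calling `checksum_table(root, ["bedroom_train_lmdb.zip", ...])` and pasting the resulting dictionary into the dataset module would give the download code something to verify against.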

codecov · 2020-10-06T17:24:53Z

Codecov Report

Merging #2748 into master will increase coverage by 0.07%.
The diff coverage is 93.44%.

@@            Coverage Diff             @@
##           master    #2748      +/-   ##
==========================================
+ Coverage   73.05%   73.12%   +0.07%     
==========================================
  Files          96       96              
  Lines        8298     8317      +19     
  Branches     1291     1293       +2     
==========================================
+ Hits         6062     6082      +20     
  Misses       1838     1838              
+ Partials      398      397       -1

Impacted Files	Coverage Δ
torchvision/transforms/functional.py	`80.36% <33.33%> (-1.77%)`	⬇️
torchvision/io/image.py	`86.00% <80.00%> (+6.00%)`	⬆️
torchvision/transforms/functional_tensor.py	`72.54% <97.67%> (+2.46%)`	⬆️
torchvision/io/__init__.py	`100.00% <100.00%> (ø)`
torchvision/transforms/functional_pil.py	`66.19% <100.00%> (+0.97%)`	⬆️
torchvision/transforms/transforms.py	`80.92% <100.00%> (+0.03%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update a9a8220...c0188c8.
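The headline percentages in the report can be reproduced from the Hits and Lines rows. The sketch below assumes the percentage is truncated to two decimals rather than rounded; that is inferred from the numbers here (6082 / 8317 is about 73.127%), not taken from Codecov's documentation.

```python
def coverage_percent(hits: int, lines: int) -> str:
    """Hits / lines as a percentage, truncated (not rounded) to two decimals."""
    basis_points = hits * 10000 // lines  # integer math avoids float rounding
    return "{}.{:02d}".format(basis_points // 100, basis_points % 100)


master = coverage_percent(6062, 8298)
pr = coverage_percent(6082, 8317)
print(master, pr)  # 73.05 73.12
```

The rows are also internally consistent: hits, misses, and partials sum to the line count on both sides (6062 + 1838 + 398 = 8298 and 6082 + 1838 + 397 = 8317).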

pmeier · 2022-04-07T08:57:21Z

Closing this since it is stale and the prototype version of LSUN in #5390 includes downloads.

LSUN download Data

c0188c8

pmeier requested changes Oct 4, 2020

pmeier closed this Apr 7, 2022

AniTho commented Oct 3, 2020

pmeier commented Oct 3, 2020

pmeier left a comment

pmeier Oct 4, 2020

AniTho Oct 4, 2020

pmeier Oct 4, 2020

pmeier Oct 4, 2020

AniTho Oct 6, 2020

pmeier Oct 6, 2020

codecov bot commented Oct 6, 2020 (edited)

pmeier commented Apr 7, 2022
