Closed
torchvision/datasets/lsun.py: 13 changes (12 additions, 1 deletion)
@@ -7,7 +7,7 @@
from collections.abc import Iterable
import pickle
from typing import Any, Callable, cast, List, Optional, Tuple, Union
from .utils import verify_str_arg, iterable_to_str
from .utils import verify_str_arg, iterable_to_str, download_url, extract_archive


class LSUNClass(VisionDataset):
@@ -82,6 +82,7 @@ def __init__(
        # for each class, create an LSUNClassDataset
        self.dbs = []
        for c in self.classes:
            self._download(c)
Collaborator

The download should be optional. Other datasets accept a download flag in the constructor (e.g. download=True) that controls whether the download happens.
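A minimal sketch of that pattern, assuming a hypothetical stand-in class (LSUNSketch) rather than the real torchvision LSUN: the download only runs when download=True, and the constructor fails early with a clear message if the data is still missing. The _check_exists and _download helpers are illustrative placeholders; the placeholder download only creates the directory the check expects.

```python
import os
from typing import List


class LSUNSketch:
    """Hypothetical stand-in for LSUN that only demonstrates an optional download flag."""

    def __init__(self, root: str, classes: List[str], download: bool = False) -> None:
        self.root = root
        self.classes = classes

        # Only touch the network when the user explicitly asks for it.
        if download:
            for c in self.classes:
                if not self._check_exists(c):
                    self._download(c)

        # Fail early with an actionable message instead of a cryptic lmdb error later.
        missing = [c for c in self.classes if not self._check_exists(c)]
        if missing:
            raise RuntimeError(
                "Dataset not found for {}. You can use download=True to download it.".format(missing)
            )

    def _check_exists(self, c: str) -> bool:
        # Assumption: each class is extracted to <root>/<class>_lmdb, as in the diff.
        return os.path.isdir(os.path.join(self.root, c + "_lmdb"))

    def _download(self, c: str) -> None:
        # Placeholder (hypothetical): a real implementation would fetch and extract
        # <class>_lmdb.zip; here we only create the directory the check expects.
        os.makedirs(os.path.join(self.root, c + "_lmdb"), exist_ok=True)
```

With download=False (the default), nothing is fetched, so constructing the dataset on an empty root raises instead of silently starting a multi-gigabyte download.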

Author

Okay, I will make the changes.
Thanks for your support and guidance.

            self.dbs.append(LSUNClass(
                root=root + '/' + c + '_lmdb',
                transform=transform))
@@ -93,6 +94,16 @@ def __init__(
            self.indices.append(count)

        self.length = count

    def _download(self, c):
        url = 'http://dl.yf.io/lsun/scenes/{}'.format(c + '_lmdb.zip')
Collaborator

Before downloading, you could check whether the lmdb file is present and whether its MD5 checksum matches, using

def check_integrity(fpath: str, md5: Optional[str] = None) -> bool:
    if not os.path.isfile(fpath):
        return False
    if md5 is None:
        return True
    return check_md5(fpath, md5)

If that is the case, you don't need to download anything.
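A sketch of that skip-if-valid logic, assuming stdlib-only copies of check_md5 and check_integrity (the real helpers live in torchvision.datasets.utils; these re-implementations are for illustration only) and a hypothetical ensure_file helper that receives the download step as a callable:

```python
import hashlib
import os
from typing import Callable, Optional


def check_md5(fpath: str, md5: str, chunk_size: int = 1024 * 1024) -> bool:
    # Hash in chunks so multi-GB archives never have to fit in memory.
    h = hashlib.md5()
    with open(fpath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest() == md5


def check_integrity(fpath: str, md5: Optional[str] = None) -> bool:
    if not os.path.isfile(fpath):
        return False
    if md5 is None:
        return True  # existence is all we can check without a checksum
    return check_md5(fpath, md5)


def ensure_file(fpath: str, md5: Optional[str], download: Callable[[], None]) -> bool:
    """Download only if the file is missing or corrupt. Returns True if it downloaded."""
    if check_integrity(fpath, md5):
        return False  # present and valid: nothing to do
    download()
    # Verify again so a broken download is reported instead of used.
    if not check_integrity(fpath, md5):
        raise RuntimeError("File not found or corrupted: " + fpath)
    return True
```

Note that with md5=None a truncated file still passes, which is the reason for wanting a checksum per file.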

Author

@pmeier Do I have to find the MD5 hash? If yes, how? Otherwise I can just check directly whether the file exists.

Collaborator

Do I have to find md5 hash [...] otherwise I can just directly check if the file exists or not

Although not terribly common, download errors do happen. Without a checksum, you have no way of knowing whether the file is corrupt. That is why we prefer to have an MD5 checksum for most files. In your case, that means checksums for the .zip and .lmdb files of every class.

if yes how?

You can download and extract all archives with the download logic you already have. Afterwards, you can compute the checksums with a variety of tools. Depending on your setup, you might already have md5sum installed. If you don't want additional software, you can use torchvision.datasets.utils.calculate_md5:

def calculate_md5(fpath: str, chunk_size: int = 1024 * 1024) -> str:
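For example, after downloading and extracting everything with the existing logic, a short stdlib-only script along these lines could produce one checksum per file. It hashes in 1 MiB chunks, mirroring the default chunk_size in the signature above; the md5_of and compute_checksums names and the walk over the whole root directory are assumptions for illustration, not torchvision code. Running md5sum on the same files should report the same hex digests.

```python
import hashlib
import os
from typing import Dict


def md5_of(fpath: str, chunk_size: int = 1024 * 1024) -> str:
    # Read in chunks: LSUN archives are far too large to load at once.
    h = hashlib.md5()
    with open(fpath, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def compute_checksums(root: str) -> Dict[str, str]:
    """Map each file under root (as a path relative to root) to its MD5 hex digest."""
    sums = {}
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            sums[os.path.relpath(full, root)] = md5_of(full)
    return sums
```

Calling compute_checksums on the download root would cover both the .zip archives (if they were kept) and the files inside each extracted lmdb directory.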

        file_name = c + '_lmdb.zip'
        file_path = os.path.join(self.root, file_name)
        if not(os.path.isfile(os.path.join(self.root, file_name))):
            download_url(url, self.root, file_name)
            print("Extracting File")
            extract_archive(file_path, self.root, True)
            print("Done!!!")
Comment on lines +102 to +106
Collaborator

You can make this simpler by using

def download_and_extract_archive(
url: str,
download_root: str,
extract_root: Optional[str] = None,
filename: Optional[str] = None,
md5: Optional[str] = None,
remove_finished: bool = False,
) -> None:


    def _verify_classes(self, classes: Union[str, List[str]]) -> List[str]:
        categories = ['bedroom', 'bridge', 'church_outdoor', 'classroom',