Description
After following the tutorial for creating a custom dataset (https://www.tensorflow.org/datasets/add_dataset) I got an error when trying to actually use the dataset. The error disappeared for no clear reason after I built the dataset again. Defining and building the dataset worked fine, but some points in the documentation are unclear and I think a few clarifications would help.
Dataset
- custom segmentation dataset
- consists of two folders, "images" and "annotations", and two text files containing the image lists for the splits (see the layout sketch below)
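For reference, this is roughly the layout I assume inside the archive (the names are taken from my dataset definition below):

```
dataset_seg.zip
├── images/          # JPG inputs, square shaped, zero-padded
├── annotations/     # PNG masks, 0=background, 1=foreground
├── train.txt        # one image name per line, without file suffix
└── val.txt
```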
Problems I've encountered
- I was only able to come up with a working dataset definition after looking at similar existing definitions, e.g. https://github.com/tensorflow/datasets/blob/master/tensorflow_datasets/image_classification/oxford_iiit_pet.py. How to put the pieces together was not clear to me from the documentation alone (but maybe I'm just an idiot).
- This led to me using the deprecated SplitGenerator, which caused errors when building. It was easy to convert to the new style of split declaration, though (see the first sketch after this list). A hint in the docs would be nice.
- The part about "manual_dir" needs clarification. It says to use `dl_manager.manual_dir`, but not what to do with the archive there (extract it!) or how to use the result. A hint towards the TFDS CLI argument `--manual_dir` would also be great (see the second sketch after this list).
- Error when trying to use the dataset with `tfds.load`. Initially I got the error `google.protobuf.json_format.ParseError: Message type "tensorflow_datasets.DatasetInfo" has no field named "moduleName"`. It disappeared after building the dataset again, however.
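For anyone else hitting the SplitGenerator point: below is a minimal sketch of the two declaration styles, as far as I understand them. This is not a complete builder (`_info` and `_generate_examples` are omitted, and the kwargs are made up):

```python
import tensorflow_datasets as tfds

class OldStyle(tfds.core.GeneratorBasedBuilder):
  # Deprecated: return a list of SplitGenerator objects; building
  # with this fails on recent TFDS versions.
  def _split_generators(self, dl_manager):
    return [
        tfds.core.SplitGenerator(
            name=tfds.Split.TRAIN,
            gen_kwargs={"images_list_file": "train.txt"},
        ),
    ]

class NewStyle(tfds.core.GeneratorBasedBuilder):
  # Current style: return a dict mapping split names directly to
  # _generate_examples(...) calls.
  def _split_generators(self, dl_manager):
    return {
        "train": self._generate_examples(images_list_file="train.txt"),
    }
```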
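And this is the manual_dir handling that ended up working for me, together with the build command (the archive name matches my dataset; the manual_dir path is just an example):

```python
import os

def _split_generators(self, dl_manager):
  # dl_manager.manual_dir points at the directory passed via the
  # --manual_dir flag; the archive there is NOT extracted for you.
  archive = dl_manager.manual_dir / "dataset_seg.zip"
  data_extracted = dl_manager.extract(archive)  # extract explicitly
  return {
      "train": self._generate_examples(
          os.path.join(data_extracted, "train.txt")),
  }

# Then build from the dataset directory:
#   tfds build --manual_dir=/path/to/folder/containing/dataset_seg.zip
```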
Setup
- Windows 10
- Anaconda 3 environment
- Python 3.6
- tensorflow-datasets 4.1.0
- tfds-nightly 4.1.0.dev202012260107
Dataset definition
```python
"""dataset_seg dataset."""

import os

import tensorflow_datasets as tfds
from tensorflow.io.gfile import GFile

_DESCRIPTION = """
Dataset for image segmentation of microscope images
images: JPG, square shaped, zero-padded
annotations: PNG, color-indexed masks, 0=background, 1=foreground
"""


class DatasetSeg(tfds.core.GeneratorBasedBuilder):
  """DatasetBuilder for dataset_seg dataset."""

  MANUAL_DOWNLOAD_INSTRUCTIONS = """
  Register into https://example.org/login to get the data. Place the
  file in the manual_dir/.
  """

  VERSION = tfds.core.Version('1.0.0')
  RELEASE_NOTES = {
      '1.0.0': 'Initial release.',
  }

  def _info(self) -> tfds.core.DatasetInfo:
    """Returns the dataset metadata."""
    return tfds.core.DatasetInfo(
        builder=self,
        description=_DESCRIPTION,
        features=tfds.features.FeaturesDict({
            # These are the features of your dataset like images, labels ...
            'image': tfds.features.Image(),
            'label': tfds.features.ClassLabel(names=['foreground']),
            'file_name': tfds.features.Text(),
            'segmentation_mask': tfds.features.Image(shape=(None, None, 1)),
        }),
        # (input, target) pair used when as_supervised=True
        # in builder.as_dataset.
        supervised_keys=('image', 'label'),  # Set to None to disable
        homepage='https://dataset-homepage/',
    )

  def _split_generators(self, dl_manager: tfds.download.DownloadManager):
    """Returns SplitGenerators."""
    # The archive lives in manual_dir (placed there by hand) and has
    # to be extracted explicitly.
    data = dl_manager.manual_dir / "dataset_seg.zip"
    data_extracted = dl_manager.extract(data)
    images_path_dir = os.path.join(data_extracted, "images")
    annotations_path_dir = os.path.join(data_extracted, "annotations")
    # New-style split declaration: split name -> example generator.
    return {
        "train": self._generate_examples(
            images_path_dir, annotations_path_dir,
            os.path.join(data_extracted, "train.txt")),
        "val": self._generate_examples(
            images_path_dir, annotations_path_dir,
            os.path.join(data_extracted, "val.txt")),
    }

  def _generate_examples(self, images_dir_path, annotations_path_dir,
                         images_list_file):
    """Yields (key, example) pairs, one per line of the split file."""
    with GFile(images_list_file, "r") as images_list:
      for image_name_no_suffix in images_list:
        mask_name = image_name_no_suffix.strip() + ".png"
        image_name = image_name_no_suffix.strip() + ".jpg"
        record = {
            "image": os.path.join(images_dir_path, image_name),
            "label": 0,  # as of now there is only one label
            "file_name": image_name,
            "segmentation_mask": os.path.join(annotations_path_dir,
                                              mask_name),
        }
        yield image_name, record
```
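For completeness, loading then looks roughly like this (a sketch; it assumes the module defining `DatasetSeg` has been imported so the builder is registered with TFDS, and the module name is made up):

```python
import tensorflow_datasets as tfds
import dataset_seg  # hypothetical module name; importing it registers the builder

ds = tfds.load("dataset_seg", split="train")
for example in ds.take(1):
  print(example["file_name"].numpy(),
        example["image"].shape,
        example["segmentation_mask"].shape)
```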