
Commit 6ef9bb7

Update gitbook documentation
1 parent ad9d98e commit 6ef9bb7


3 files changed, +54 -49 lines changed


docs/gitbook/how-tos/avalanchedataset/README.md

+6 -4
@@ -4,12 +4,14 @@ description: Dealing with AvalancheDatasets
 
 # AvalancheDataset
 
-The `AvalancheDataset` is an implementation of the PyTorch `Dataset` class that comes with many useful out-of-the-box functionalities. For most users, the _AvalancheDataset_ can be used as a plain PyTorch Dataset. For classification problems, `AvalancheDataset` returns `x, y, t` elements (input, target, task label). However, the `AvalancheDataset` can easily be extended to fit custom needs.
+The `AvalancheDataset` is an implementation of the PyTorch `Dataset` class that comes with many useful out-of-the-box functionalities. For most users, the *AvalancheDataset* can be used as a plain PyTorch Dataset. For classification problems, `AvalancheDataset` returns `x, y, t` elements (input, target, task label). However, the `AvalancheDataset` can easily be extended to fit custom needs.
 
-**A series of **_**Mini How-Tos**_ will guide you through the functionalities of the _AvalancheDataset_ and its subclasses:
+**A series of _Mini How-Tos_** will guide you through the functionalities of the *AvalancheDataset* and its subclasses:
+
+- [AvalancheDatasets basics](https://avalanche.continualai.org/how-tos/avalanchedataset/avalanche-datasets)
+- [Advanced Transformations](https://avalanche.continualai.org/how-tos/avalanchedataset/advanced-transformations)
 
-* [AvalancheDatasets basics](https://avalanche.continualai.org/how-tos/avalanchedataset/avalanche-datasets)
-* [Advanced Transformations](https://avalanche.continualai.org/how-tos/avalanchedataset/advanced-transformations)
 
 ```python
+
 ```

docs/gitbook/how-tos/avalanchedataset/avalanche-datasets.md

+24 -26
@@ -2,47 +2,45 @@
 description: Converting PyTorch Datasets to Avalanche Dataset
 ---
 
-# avalanche-datasets
-
+# Avalanche Datasets
 Datasets are a fundamental data structure for continual learning. Unlike offline training, in continual learning we often need to manipulate datasets to create streams, benchmarks, or to manage replay buffers. High-level utilities and predefined benchmarks already take care of the details for you, but you can easily manipulate the data yourself if you need to. These how-tos explain:
 
 1. PyTorch datasets and data loading
 2. How to instantiate Avalanche Datasets
 3. AvalancheDataset features
 
 In Avalanche, the `AvalancheDataset` is everywhere:
+- The dataset carried by the `experience.dataset` field is always an *AvalancheDataset*.
+- Many benchmark creation functions accept *AvalancheDataset*s to create benchmarks.
+- Avalanche benchmarks are created by manipulating *AvalancheDataset*s.
+- Replay buffers also use `AvalancheDataset` to easily concatenate data and handle transformations.
 
-* The dataset carried by the `experience.dataset` field is always an _AvalancheDataset_.
-* Many benchmark creation functions accept _AvalancheDataset_s to create benchmarks.
-* Avalanche benchmarks are created by manipulating _AvalancheDataset_s.
-* Replay buffers also use `AvalancheDataset` to easily concatenate data and handle transformations.
 
 ## 📚 PyTorch Dataset: general definition
 
 In PyTorch, **a `Dataset` is a class** exposing two methods:
-
-* `__len__()`, which returns the number of instances in the dataset (as an `int`).
-* `__getitem__(idx)`, which returns the data point at index `idx`.
+- `__len__()`, which returns the number of instances in the dataset (as an `int`).
+- `__getitem__(idx)`, which returns the data point at index `idx`.
 
 In other words, a Dataset instance is just an object for which, similarly to a list, one can simply:
-
-* Obtain its length using the Python `len(dataset)` function.
-* Obtain a single data point using the `x, y = dataset[idx]` syntax.
+- Obtain its length using the Python `len(dataset)` function.
+- Obtain a single data point using the `x, y = dataset[idx]` syntax.
 
 The content of the dataset can either be loaded in memory when the dataset is instantiated (as the torchvision MNIST dataset does) or, for big datasets like ImageNet, kept on disk, with the dataset storing the list of files in an internal field. In the latter case, data is loaded from storage on the fly when `__getitem__(idx)` is called. How this is managed is specific to each dataset implementation.
 
 ### Quick note on the IterableDataset class
-
 A variation of the standard `Dataset` exists in PyTorch: the [IterableDataset](https://pytorch.org/docs/stable/data.html#iterable-style-datasets). With an `IterableDataset`, data points can only be loaded sequentially (tape-like), so the `dataset[idx]` syntax and the `len(dataset)` function are not supported. **Avalanche does NOT support `IterableDataset`s.** In practice this is rarely a problem, since you are unlikely to encounter such datasets (at least in torchvision). If you need `IterableDataset` support, let us know and we will consider adding it.
 
-## How to Create an AvalancheDataset
 
+## How to Create an AvalancheDataset
 To create an `AvalancheDataset` from a PyTorch dataset, you only need to pass the original dataset to the constructor as follows:
 
+
 ```python
 !pip install avalanche-lib
 ```
 
+
 ```python
 import torch
 from torch.utils.data.dataset import TensorDataset
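
The map-style protocol described above requires nothing beyond `__len__` and `__getitem__`. The plain-Python sketch below does not need PyTorch. `SquaresDataset` and `LazyDataset` are hypothetical illustration classes; real code would usually subclass `torch.utils.data.Dataset`. The second class imitates the on-disk case. It keeps only a list of file names and "loads" an item only when that item is indexed.

```python
class SquaresDataset:
    """Minimal map-style dataset exposing __len__ and __getitem__.

    Plain-Python sketch; real code would typically subclass
    torch.utils.data.Dataset instead.
    """

    def __init__(self, n):
        # All data is built in memory at construction time (MNIST-style).
        self.xs = list(range(n))
        self.ys = [x * x for x in self.xs]

    def __len__(self):
        # Number of instances, as an int.
        return len(self.xs)

    def __getitem__(self, idx):
        # The data point at index idx, as an (x, y) pair.
        return self.xs[idx], self.ys[idx]


class LazyDataset:
    """Stores only a list of file names and loads each item on demand,
    mimicking large datasets (e.g. ImageNet) whose content stays on disk."""

    def __init__(self, paths, loader):
        self.paths = list(paths)  # the "internal field" listing the files
        self.loader = loader      # hypothetical loading function
        self.loads = 0            # counts on-the-fly loads, for illustration

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        self.loads += 1
        return self.loader(self.paths[idx])


ds = SquaresDataset(5)
print(len(ds))     # 5
x, y = ds[3]
print(x, y)        # 3 9

# A fake loader stands in for reading an image file from storage.
lazy = LazyDataset(["a.png", "b.png"], loader=lambda path: (path.upper(), 0))
print(lazy.loads)  # 0: nothing has been read yet
print(lazy[1])     # ('B.PNG', 0)
print(lazy.loads)  # 1
```

Both classes are just "list-like" objects, which is the whole point of the definition above. The lazy variant trades construction time for per-access loading cost.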
@@ -60,6 +58,7 @@ avl_data = AvalancheDataset(torch_data)
 
 The dataset is equivalent to the original one:
 
+
 ```python
 print(torch_data[0])
 print(avl_data[0])
@@ -70,13 +69,13 @@ print(avl_data[0])
 Most of the time, you can also use one of the utility functions in [benchmark utils](https://avalanche-api.continualai.org/en/latest/benchmarks.html#utils-data-loading-and-avalanchedataset), which also add attributes such as class and task labels to the dataset. For example, you can create a classification dataset using `make_classification_dataset`.
 
 A classification dataset:
-
-* returns triplets of the form \<x, y, t>, where t is the task label (which defaults to 0).
-* The wrapped dataset must contain a valid **targets** field.
+- returns triplets of the form <x, y, t>, where t is the task label (which defaults to 0).
+- The wrapped dataset must contain a valid **targets** field.
 
 Avalanche provides some utility functions to create supervised classification datasets such as:
+- `make_tensor_classification_dataset` for tensor datasets
+All of these will automatically create the `targets` and `targets_task_labels` attributes.
 
-* `make_tensor_classification_dataset` for tensor datasets all of these will automatically create the `targets` and `targets_task_labels` attributes.
 
 ```python
 from avalanche.benchmarks.utils import make_classification_dataset
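
The <x, y, t> contract can be made concrete with a hypothetical `ClassificationWrapper`. This is not Avalanche's implementation of `make_classification_dataset`. It only assumes the behavior stated above: the wrapped dataset must expose `targets`, and the task label defaults to 0. Accepting either a single int or a per-sample list for `task_labels` is a choice made for this sketch.

```python
class ClassificationWrapper:
    """Hypothetical wrapper turning an (x, y) dataset into (x, y, t) triplets."""

    def __init__(self, dataset, task_labels=None):
        # Require a valid `targets` field, one entry per sample.
        if not hasattr(dataset, "targets"):
            raise ValueError("wrapped dataset must expose a `targets` field")
        if len(dataset.targets) != len(dataset):
            raise ValueError("`targets` must have one entry per sample")
        self.dataset = dataset
        self.targets = list(dataset.targets)
        # Task labels default to 0 for every sample; an int applies to all.
        if task_labels is None:
            task_labels = [0] * len(dataset)
        elif isinstance(task_labels, int):
            task_labels = [task_labels] * len(dataset)
        self.targets_task_labels = list(task_labels)

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        x, y = self.dataset[idx]
        return x, y, self.targets_task_labels[idx]


class ToyDataset:
    """Tiny (x, y) dataset with a `targets` field, for the example."""

    def __init__(self):
        self.data = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
        self.targets = [1, 0, 1]

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], self.targets[idx]


default = ClassificationWrapper(ToyDataset())
print(default[0])  # ([0.1, 0.2], 1, 0): task label defaults to 0
multi = ClassificationWrapper(ToyDataset(), task_labels=[0, 1, 1])
print(multi[2])    # ([0.5, 0.6], 1, 1)
```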
@@ -90,9 +89,9 @@ sup_data = make_classification_dataset(torch_data, task_labels=tls)
 ```
 
 ## DataLoader
-
 Avalanche provides some [custom dataloaders](https://avalanche-api.continualai.org/en/latest/benchmarks.html#utils-data-loading-and-avalanchedataset) to sample in a task-balanced way or to balance the replay buffer and current data, but you can also use the standard PyTorch `DataLoader`.
 
+
 ```python
 from torch.utils.data.dataloader import DataLoader
 
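
Task-balanced sampling, mentioned above for Avalanche's custom dataloaders, can be illustrated with a small index generator. `task_balanced_batches` is a hypothetical helper, not an Avalanche API. It assumes per-sample task labels are available as a list, and it draws the same number of samples from every task in each minibatch.

```python
import random
from collections import defaultdict


def task_balanced_batches(task_labels, per_task, seed=0):
    """Yield lists of sample indices with `per_task` samples from each task.

    Hypothetical helper: each task's indices are shuffled once and consumed
    without replacement; iteration stops when the smallest task runs out.
    """
    rng = random.Random(seed)
    by_task = defaultdict(list)
    for idx, t in enumerate(task_labels):
        by_task[t].append(idx)
    for idxs in by_task.values():
        rng.shuffle(idxs)
    n_batches = min(len(v) for v in by_task.values()) // per_task
    for b in range(n_batches):
        batch = []
        for t in sorted(by_task):
            batch.extend(by_task[t][b * per_task:(b + 1) * per_task])
        yield batch


task_labels = [0] * 6 + [1] * 4  # task 1 is under-represented
for batch in task_balanced_batches(task_labels, per_task=2):
    print([task_labels[i] for i in batch])  # [0, 0, 1, 1] for each batch
```

Stopping at the smallest task keeps every batch balanced at the cost of leaving some samples of the larger task unused in that pass.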
@@ -105,9 +104,9 @@ for x_minibatch, y_minibatch in my_dataloader:
 ```
 
 ## Dataset Operations: Concatenation and SubSampling
-
 While PyTorch provides two different classes for concatenation and subsampling (`ConcatDataset` and `Subset`), Avalanche implements them as dataset methods. These operations return a new dataset, leaving the original one unchanged.
 
+
 ```python
 cat_data = avl_data.concat(avl_data)
 print(len(cat_data)) # 100 + 100 = 200
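
The toy `ListDataset` below (hypothetical, pure Python) illustrates the semantics described above. Its `concat` and `subset` build new datasets and never mutate the original. For simplicity the sketch copies items into each new dataset. That copying is an assumption of the example, not a statement about how Avalanche stores data.

```python
class ListDataset:
    """Toy dataset whose concat/subset return new objects (a sketch)."""

    def __init__(self, items):
        self._items = list(items)  # private copy; never mutated afterwards

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

    def concat(self, other):
        # New dataset with self's items followed by other's items.
        return ListDataset(self._items + list(other._items))

    def subset(self, indices):
        # New dataset containing only the selected indices, in the given order.
        return ListDataset(self._items[i] for i in indices)


data = ListDataset(range(100))
cat_data = data.concat(data)
print(len(cat_data))  # 200
sub_data = data.subset([0, 1, 2, 3])
print(len(sub_data))  # 4
print(len(data))      # 100, original data stays the same
```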
@@ -119,8 +118,9 @@ print(len(avl_data)) # 100, original data stays the same
 ```
 
 ## Dataset Attributes
+AvalancheDataset allows you to add attributes to datasets. Attributes are named arrays carrying per-sample information that is propagated by concatenation and subsampling operations.
+For example, classification datasets use this functionality to manage class and task labels.
 
-AvalancheDataset allows you to add attributes to datasets. Attributes are named arrays carrying per-sample information that is propagated by concatenation and subsampling operations. For example, classification datasets use this functionality to manage class and task labels.
 
 ```python
 tls = [0 for _ in range(100)] # one task label for each sample
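
One way to sketch attribute propagation is to store each named attribute as a list aligned with the samples and rebuild those lists inside `concat` and `subset`. `AttrDataset` is a hypothetical class, not Avalanche's `DataAttribute`. In this sketch, `concat` keeps only the attributes that both datasets define; that rule is a design choice of the example.

```python
class AttrDataset:
    """Named per-sample attributes that follow the data (hypothetical)."""

    def __init__(self, items, **attributes):
        self._items = list(items)
        self.attributes = {}
        for name, values in attributes.items():
            values = list(values)
            if len(values) != len(self._items):
                raise ValueError(f"attribute {name!r} needs one value per sample")
            self.attributes[name] = values

    def __len__(self):
        return len(self._items)

    def __getitem__(self, idx):
        return self._items[idx]

    def concat(self, other):
        # Only attributes present in both datasets can be propagated.
        shared = self.attributes.keys() & other.attributes.keys()
        return AttrDataset(
            self._items + other._items,
            **{n: self.attributes[n] + other.attributes[n] for n in shared},
        )

    def subset(self, indices):
        # Select the same positions from the items and from every attribute.
        indices = list(indices)
        return AttrDataset(
            [self._items[i] for i in indices],
            **{n: [v[i] for i in indices] for n, v in self.attributes.items()},
        )


a = AttrDataset(["x0", "x1", "x2"], targets=[5, 6, 7], targets_task_labels=[0, 0, 0])
b = AttrDataset(["x3"], targets=[8], targets_task_labels=[1])
merged = a.concat(b).subset([1, 3])
print([merged[i] for i in range(len(merged))])  # ['x1', 'x3']
print(merged.attributes["targets"])              # [6, 8]
print(merged.attributes["targets_task_labels"])  # [0, 1]
```

Because labels travel with the samples, a replay buffer built this way keeps each sample's class and task label.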
@@ -142,16 +142,14 @@ print(cat_data.targets_task_labels.name, len(cat_data.targets_task_labels._data)
 Thanks to `DataAttribute`s, you can freely operate on your data (e.g. to manage a replay buffer) without losing class or task labels. This makes it easy to manage multi-task datasets or to balance datasets by class.
 
 ## Transformations
-
-Most datasets from the _torchvision_ library (as well as datasets found "in the wild") allow a `transform` function to be passed to the dataset constructor. Support for transformations is not mandatory for a dataset, but it is quite common. The transformation processes the X value of a data point before it is returned, e.g. to normalize values or apply augmentations.
+Most datasets from the *torchvision* library (as well as datasets found "in the wild") allow a `transform` function to be passed to the dataset constructor. Support for transformations is not mandatory for a dataset, but it is quite common. The transformation processes the X value of a data point before it is returned, e.g. to normalize values or apply augmentations.
 
 `AvalancheDataset` implements a rich set of functionalities for managing transformations. You can learn more about them in the [Advanced Transformations How-To](https://avalanche.continualai.org/how-tos/avalanchedataset/advanced-transformations).
 
 ## Next steps
+With these notions in mind, you can start exploring the functionalities offered by AvalancheDatasets by going through the *Mini How-To*s.
 
-With these notions in mind, you can start exploring the functionalities offered by AvalancheDatasets by going through the _Mini How-To_s.
-
-Please refer to the [list of the _Mini How-To_s regarding AvalancheDatasets](https://avalanche.continualai.org/how-tos/avalanchedataset) for a complete list. It is recommended to start with the **"Creating AvalancheDatasets"** _Mini How-To_.
+Please refer to the [list of the *Mini How-To*s regarding AvalancheDatasets](https://avalanche.continualai.org/how-tos/avalanchedataset) for a complete list. It is recommended to start with the **"Creating AvalancheDatasets"** *Mini How-To*.
 
 ## 🤝 Run it on Google Colab
 
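
The transformation contract described above fits in a few lines: only X is processed, and only when an item is accessed. `TransformDataset` and `normalize` are hypothetical names used for illustration. Torchvision datasets expose the same idea through their `transform` argument.

```python
class TransformDataset:
    """Applies an optional transform to the X value only (a sketch)."""

    def __init__(self, xs, ys, transform=None):
        self.xs = list(xs)
        self.ys = list(ys)
        self.transform = transform

    def __len__(self):
        return len(self.xs)

    def __getitem__(self, idx):
        x, y = self.xs[idx], self.ys[idx]
        if self.transform is not None:
            x = self.transform(x)  # applied on access; stored data untouched
        return x, y


def normalize(pixels, mean=127.5, std=127.5):
    # Map raw 0..255 values to the range [-1, 1].
    return [(p - mean) / std for p in pixels]


raw = TransformDataset([[0, 255], [127.5, 255]], [3, 7], transform=normalize)
print(raw[0])     # ([-1.0, 1.0], 3)
print(raw[1])     # ([0.0, 1.0], 7)
print(raw.xs[0])  # [0, 255]: the stored value is unchanged
```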