Hi all, I wonder if anyone has given any thought to repeated augmentation and its effect on training. From a look at the code, if repeated augmentation is used (say with a factor of 3), the number of samples we train on per epoch is extended 3 times. In other words, each epoch becomes 3 times longer. As a result, for the same number of epochs we effectively train 3 times longer. Can anyone comment on whether the benefit of repeated augmentation comes from the technique itself, or simply from training n times longer?
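Here is a toy sketch of my reading (my own illustration, not the actual sampler in the repo):

# Toy sketch, not the repo's sampler: if every index were simply repeated
# aug_repeat times, one "epoch" would contain aug_repeat times as many samples.
dataset_len = 1000
aug_repeat = 3

naive_epoch = [i for i in range(dataset_len) for _ in range(aug_repeat)]
print(len(naive_epoch))  # 3000 -> 3x more iterations per epoch at a fixed batch size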
Hi @vbvg2008
The benefit of repeated augmentation comes from the technique itself; repeated augmentation does not extend the training epoch. RepeatAugSampler only selects 1/aug_repeat of the samples from the whole dataset and repeats each selected sample aug_repeat times, so that each GPU trains the model on the same samples. In this code, the number of selected samples for each GPU is len(dataset)/num_gpu, which means the number of samples per epoch does not change at all.
For better understanding, see the code below with comments.

class RepeatAugSampler(Sampler):
    def __init__(self, ...):
        ...
        selected_ratio = dist.get_world_size()  # selected_ratio = number of GPUs
        self.num_selected_samples = len(self.dataset) // selected_ratio  # num_selected_samples = len(dataset) / (number of GPUs), kept as an int so it can be used as a slice bound

    def __iter__(self):
        ...
        return iter(indices[:self.num_selected_samples])  # whatever indices is, it is sliced down to num_selected_samples entries
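As a quick numeric sketch of why the epoch length stays the same (made-up numbers, and a much simplified split without the real sampler's shuffling and padding), suppose there are 1,200 images, 3 GPUs and aug_repeat = 3:

# Illustration only: mirrors the idea of RepeatAugSampler, not its exact implementation.
dataset_len = 1200
num_gpus = 3
aug_repeat = 3

# Repeat every index aug_repeat times, then deal the repeated list out
# round-robin across GPUs, like indices[rank::world_size] in the sampler.
repeated = [i for i in range(dataset_len) for _ in range(aug_repeat)]
per_gpu = [repeated[rank::num_gpus] for rank in range(num_gpus)]

# The slice in __iter__ keeps only len(dataset) // num_gpus indices per GPU,
# so the per-GPU epoch length is the same as without repeated augmentation.
num_selected_samples = dataset_len // num_gpus   # 400, independent of aug_repeat
epoch_gpu0 = per_gpu[0][:num_selected_samples]
epoch_gpu1 = per_gpu[1][:num_selected_samples]

print(len(epoch_gpu0))            # 400 -> same epoch length per GPU as a plain distributed sampler
print(epoch_gpu0 == epoch_gpu1)   # True in this toy setting: both GPUs see the same 400 images

In this toy configuration (aug_repeat equal to the number of GPUs) every GPU ends up with exactly the same 400 images; in general the aug_repeat copies of an image are spread over different GPUs, but the per-GPU epoch length is still len(dataset)/num_gpu.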
I hope this answer can help you. Thank you.
hankyul