
workingloong edited this page Jul 29, 2020 · 15 revisions

# AllReduce Benchmark

## Minikube

- Batch size: 64
- Number of batches per task: 50
- Dataset: cifar10, image size (32, 32, 3)
- Worker resource: `cpu=0.3,memory=2048Mi,ephemeral-storage=1024Mi`

### Resnet50

Resnet50 is a computation-intensive model; it has 23,555,082 trainable parameters for cifar10.

| Workers | Computation : communication | Speed | Speedup ratio |
|---|---|---|---|
| 1 | 0% | 3.1 images/s | 1 |
| 2 | 10 : 1 | 5.65 images/s | 1.82 |
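As a consistency check, the speedup ratio column in these tables is simply each configuration's throughput divided by the single-worker throughput. A minimal sketch, using the numbers from the Resnet50 table above:

```python
def speedup(speed, baseline):
    """Speedup ratio relative to the single-worker baseline, rounded as in the tables."""
    return round(speed / baseline, 2)

# Resnet50 on Minikube: 1 worker -> 3.1 images/s, 2 workers -> 5.65 images/s
print(speedup(5.65, 3.1))  # -> 1.82
```

The same formula reproduces the speedup columns of the other tables on this page, e.g. 44.7 / 29 ≈ 1.54 for MobileNetV2 with 2 workers.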

### MobileNetV2

MobileNetV2 is a communication-intensive model; it has 2,236,682 trainable parameters.

| Workers | Computation : communication | Speed | Speedup ratio |
|---|---|---|---|
| 1 | - | 29 images/s | 1 |
| 2 | 10 : 3 | 44.7 images/s | 1.54 |
| 3 | 10 : 6 | 57.2 images/s | 1.97 |

## ASI

### CPU only

Worker resource: `cpu=4,memory=8192Mi,ephemeral-storage=1024Mi`

#### MobileNetV2

| Workers | Communication share | Speed | Speedup ratio |
|---|---|---|---|
| 1 | 0% | 353.6 images/s | 1 |
| 2 | 24% | 503 images/s | 1.42 |
| 4 | 44.7% | 680 images/s | 1.92 |
| 8 | 66.7% | 648 images/s | 1.83 |
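The scaling drop from 4 to 8 workers tracks the growing communication share: parallel efficiency (speedup divided by worker count) falls as allreduce takes a larger fraction of each step. A small check against the MobileNetV2 CPU numbers above:

```python
def efficiency(speedup, workers):
    """Parallel efficiency: measured speedup divided by worker count."""
    return round(speedup / workers, 2)

# MobileNetV2 on CPU: efficiency drops as the communication share grows
for workers, s in [(2, 1.42), (4, 1.92), (8, 1.83)]:
    print(workers, efficiency(s, workers))
# -> 2 0.71, 4 0.48, 8 0.23
```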

#### Resnet50

| Workers | Communication share | Speed | Speedup ratio |
|---|---|---|---|
| 1 | 0% | 26.7 images/s | 1 |
| 2 | 18% | 41 images/s | 1.57 |
| 4 | 25% | 68.4 images/s | 2.56 |
| 8 | 32% | 123 images/s | 4.61 |

### GPU

- Dataset: ImageNet, image size (256, 256, 3)
- Mini-batch size: 64
- One task per 16 mini-batches

#### MobileNetV2

1024 images per task (16 mini-batches × 64 images).

| Workers | Speed | Total task time | AllReduce time | tensor.numpy() time | apply_gradients time |
|---|---|---|---|---|---|
| 1 (local) | 169 images/s | 6.06s | - | - | 5.59s |
| 2 | 246 images/s | 8.34s | 7.25s | 5.79s | 0.6s |
| 4 | 401 images/s | 10.20s | 8.9s | 5.78s | 0.71s |
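The speed column is consistent with throughput derived from the task size: with N workers each finishing a 1024-image task in the listed total task time, aggregate throughput is N × 1024 / time. A minimal check against the MobileNetV2 rows above (the 4-worker row matches to within rounding):

```python
def throughput(workers, total_task_time, images_per_task=1024):
    """Aggregate images/s: each worker processes images_per_task images per task."""
    return round(workers * images_per_task / total_task_time)

print(throughput(1, 6.06))  # -> 169
print(throughput(2, 8.34))  # -> 246
```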

#### Resnet50

| Workers | Speed | Total task time | AllReduce time | tensor.numpy() time | apply_gradients time |
|---|---|---|---|---|---|
| 1 (local) | 168 images/s | 6.1s | - | - | 4.16s |
| 2 | 148 images/s | 13.76s | 10.36s | 5.04s | 1.35s |
| 4 | 228 images/s | 18s | 14.67s | 5.14s | 1.30s |

#### Compression model with Conv2DTranspose

| Workers | Speed | Total task time | AllReduce time | tensor.numpy() time | apply_gradients time |
|---|---|---|---|---|---|
| 1 (local) | 109 images/s | 9.36s | - | - | 8.95s |
| 2 | 176 images/s | 11.65s | 1.47s | 9.36s | 0.42s |
| 4 | 328 images/s | 12.47s | 2.44s | 9.32s | 0.37s |