The implementation for the MLSys 2023 paper: "Cuttlefish: Low-rank Model Training without All The Tuning"
Training low-rank neural network models has recently been shown to reduce the total number of trainable parameters while maintaining predictive accuracy, resulting in end-to-end speedups. The catch, however, is that several extra hyper-parameters must be finely tuned, such as the rank of the factorization at each layer. In this work, we overcome this issue and propose Cuttlefish, an automated low-rank training method that does not require tuning the factorization hyper-parameters. Cuttlefish leverages the observation that, after a few epochs of full-rank training, the stable rank (i.e., an approximation of the true rank) of each layer converges to a constant. Cuttlefish switches from full-rank to low-rank training once the stable ranks of all layers have converged, setting the dimension of each factorization to the respective stable rank. We show that this leads to 4.2
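For intuition, the stable rank the paper monitors has a one-line definition. Below is a minimal sketch (our illustration, assuming PyTorch >= 1.9; not the repo's exact implementation, which uses a scaled variant via `--rank-est-metric=scaled-stable-rank`):

```python
import torch

def stable_rank(weight: torch.Tensor) -> float:
    """Stable rank of a weight matrix: ||W||_F^2 / sigma_max(W)^2.
    4-D conv kernels are flattened to 2-D along the output dimension."""
    W = weight.flatten(start_dim=1)
    fro_sq = W.pow(2).sum()                  # squared Frobenius norm
    sigma_max = torch.linalg.svdvals(W)[0]   # largest singular value
    return (fro_sq / sigma_max.pow(2)).item()

# Illustration: a random Gaussian matrix has a fairly high stable rank,
# while a rank-1 matrix has stable rank exactly 1.
print(stable_rank(torch.randn(256, 256)))
print(stable_rank(torch.outer(torch.randn(256), torch.randn(256))))
```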
The code depends on:
- PyTorch 1.6.0 and CUDA 11.0 (configured with the Docker container nvcr.io/nvidia/pytorch:20.07-py3)
- PyTorch 1.11.0, CUDA 11.6, and Hugging Face Transformers 4.17.0.dev0 (configured with the Docker container nvcr.io/nvidia/pytorch:22.01-py3; used for the BERT fine-tuning results)
For convenience, we also provide a public Amazon EC2 AMI, ami-05c0b3732203032b3 (in the US West (Oregon) region), where the ImageNet dataset, Docker environments, etc. are ready to use.
On a machine with Docker installed, start the two containers:
sudo docker run -t -d --gpus all --shm-size 8G --name cuttlefish -v YOUR-LOCAL-PATH:/workspace nvcr.io/nvidia/pytorch:20.07-py3
sudo docker run -t -d --gpus all --shm-size 8G --name cuttlefish-bert -v YOUR-LOCAL-PATH:/workspace nvcr.io/nvidia/pytorch:22.01-py3
docker exec -it cuttlefish bash # enter the docker interactive env
git clone https://github.com/hwang595/Cuttlefish.git
cd Cuttlefish
bash install.sh
docker exec -it cuttlefish-bert bash # enter the docker interactive env
git clone https://github.com/hwang595/Cuttlefish.git
cd Cuttlefish/transformers
bash install.sh
Users who do not have access to our Amazon EC2 AMI can also download ImageNet (ILSVRC2012) directly from the ImageNet website (registration and login are required before downloading).
After downloading, one will get two tar files, ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar (yes, there is also a test set, but people tend not to use it). One can then use the extract_ILSVRC.sh script, adapted from the PyTorch examples, to extract the ImageNet dataset (please make sure to copy the extract_ILSVRC.sh script to your data directory first).
After running the script, the overall folder structure will look like:
/workspace/ILSVRC2012/train/
├── n01440764
│ ├── n01440764_10026.JPEG
│ ├── n01440764_10027.JPEG
│ ├── ......
├── ......
/workspace/ILSVRC2012/val/
├── n01440764
│ ├── ILSVRC2012_val_00000293.JPEG
│ ├── ILSVRC2012_val_00002138.JPEG
│ ├── ......
├── ......
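This is the standard torchvision `ImageFolder` layout (one sub-directory per class, images inside). As a quick sanity check (a sketch assuming torchvision is available in the container), you can verify the extraction with:

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms

# ImageFolder infers class labels from the sub-directory names (n01440764, ...).
train_set = datasets.ImageFolder(
    "/workspace/ILSVRC2012/train",
    transform=transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.ToTensor(),
    ]),
)
print(len(train_set), "training images,", len(train_set.classes), "classes")
# Expect 1,281,167 training images across 1,000 classes.
```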
To run Cuttlefish on the ResNet-18 + CIFAR-10 task, simply run the following (no script modifications needed):
docker exec -it cuttlefish bash
cd Cuttlefish/scripts # important: `run_main.sh` must be launched from this directory
bash run_main.sh
The code will run until the network is fully trained (for 300 epochs); at the end, you should see output like:
INFO:root:### Epoch: 298, Current effective lr: 0.008
INFO:root:Epoch: 298, lowrank training ...
INFO:root:Train Epoch: 298 [0/50000 (0%)] Loss: 0.000547
INFO:root:Train Epoch: 298 [40960/50000 (82%)] Loss: 0.001204
INFO:root:####### Comp Time Cost for Epoch: 298 is 7.750053878784178, os time: 8.45917797088623
INFO:root:
Epoch: 298, Test set: Average loss: 0.0011, Accuracy: 9461/10000 (94.61%)
INFO:root:### Epoch: 299, Current effective lr: 0.008
INFO:root:Epoch: 299, lowrank training ...
INFO:root:Train Epoch: 299 [0/50000 (0%)] Loss: 0.000213
INFO:root:Train Epoch: 299 [40960/50000 (82%)] Loss: 0.000243
INFO:root:####### Comp Time Cost for Epoch: 299 is 7.7537792053222665, os time: 8.437126636505127
INFO:root:
Epoch: 299, Test set: Average loss: 0.0011, Accuracy: 9466/10000 (94.66%)
INFO:root:Comp-Time: 2513.8327392120354
INFO:root:Best-Val-Acc: 94.66
The script `run_main.sh` supports Cuttlefish, vanilla full-rank training, and Pufferfish (MLSys 2021). Example scripts are provided below:
Cuttlefish; ResNet-18; CIFAR-10 (Frobenius Decay on; Extra BNs off)
#!/bin/bash
cd ..
SEED=0
TRIAL=0
EPOCHS=300
DATASET=cifar10
MODEL=resnet18
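# note (our reading): with --mode=lowrank, Cuttlefish picks the full-rank-to-
# low-rank switch epoch automatically from stable-rank convergence, so
# --fr-warmup-epoch is simply set past the final epoch rather than hand-tuned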
CUDA_VISIBLE_DEVICES=0 python main.py \
--arch=${MODEL} \
--mode=lowrank \
--rank-est-metric=scaled-stable-rank \
--dataset=${DATASET} \
--batch-size=1024 \
--epochs=${EPOCHS} \
--full-rank-warmup=True \
--fr-warmup-epoch=$((EPOCHS + 1)) \
--seed=${SEED} \
--lr=0.1 \
--frob-decay=True \
--extra-bns=False \
--resume=False \
--evaluate=False \
--scale-factor=8 \
--lr-warmup-epochs=5 \
--ckpt_path=./checkpoint/resnet18_best.pth \
--momentum=0.9
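Conceptually, once the stable ranks converge, each layer is replaced by two factors whose inner dimension is its estimated rank. A minimal sketch for a linear layer, via truncated SVD of the warmed-up weight (illustrative only; the repo additionally handles conv layers, biases, and optional extra BNs):

```python
import torch
import torch.nn as nn

def factorize_linear(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Split a trained nn.Linear into two low-rank factors via truncated SVD,
    so that second.weight @ first.weight approximates the original weight."""
    U, S, Vh = torch.linalg.svd(layer.weight.data, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = sqrt_s.unsqueeze(1) * Vh[:rank]     # (rank, in)
    second.weight.data = U[:, :rank] * sqrt_s.unsqueeze(0)  # (out, rank)
    if layer.bias is not None:
        second.bias.data = layer.bias.data.clone()
    return nn.Sequential(first, second)

layer = nn.Linear(512, 512)
low_rank = factorize_linear(layer, rank=64)  # e.g., rank from the stable-rank estimate
```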
Pufferfish; ResNet-18; CIFAR-10
#!/bin/bash
cd ..
SEED=0
TRIAL=0
EPOCHS=300
DATASET=cifar10
MODEL=resnet18
WARMUP_EPOCH=80
# notes: --rank-est-metric has no effect in Pufferfish mode; extra BNs are
# always enabled in Pufferfish, hence --extra-bns=True. (Inline comments after
# line continuations would break bash, so they live up here.)
CUDA_VISIBLE_DEVICES=0 python main.py \
--arch=${MODEL} \
--mode=pufferfish \
--rank-est-metric=scaled-stable-rank \
--dataset=${DATASET} \
--batch-size=1024 \
--epochs=${EPOCHS} \
--full-rank-warmup=True \
--fr-warmup-epoch=${WARMUP_EPOCH} \
--seed=${SEED} \
--lr=0.1 \
--frob-decay=False \
--extra-bns=True \
--resume=False \
--evaluate=False \
--scale-factor=8 \
--lr-warmup-epochs=5 \
--ckpt_path=./checkpoint/resnet18_best.pth \
--momentum=0.9
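The `--extra-bns` flag refers to Pufferfish's extra BatchNorm layers inserted between the two factors of a factorized convolution. A rough sketch of such a block (our reading; the repo's exact architecture may differ):

```python
import torch.nn as nn

def lowrank_conv_block(c_in: int, c_out: int, k: int, rank: int,
                       extra_bn: bool = True) -> nn.Sequential:
    """A factorized convolution: a rank-limited k x k conv followed by a
    1 x 1 conv, with an optional extra BatchNorm between the two factors."""
    layers = [nn.Conv2d(c_in, rank, kernel_size=k, padding=k // 2, bias=False)]
    if extra_bn:
        layers.append(nn.BatchNorm2d(rank))
    layers.append(nn.Conv2d(rank, c_out, kernel_size=1, bias=False))
    return nn.Sequential(*layers)

block = lowrank_conv_block(64, 128, k=3, rank=16)
```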
Vanilla Full-rank; ResNet-18; CIFAR-10
#!/bin/bash
cd ..
SEED=0
TRIAL=0
EPOCHS=300
DATASET=cifar10
MODEL=resnet18
WARMUP_EPOCH=301 # a warmup epoch past the total 300 epochs keeps training full-rank throughout
# notes: --rank-est-metric has no effect in Pufferfish mode; extra BNs are
# always enabled in Pufferfish, hence --extra-bns=True. Neither matters here
# since the model never switches to low-rank.
CUDA_VISIBLE_DEVICES=0 python main.py \
--arch=${MODEL} \
--mode=pufferfish \
--rank-est-metric=scaled-stable-rank \
--dataset=${DATASET} \
--batch-size=1024 \
--epochs=${EPOCHS} \
--full-rank-warmup=True \
--fr-warmup-epoch=${WARMUP_EPOCH} \
--seed=${SEED} \
--lr=0.1 \
--frob-decay=False \
--extra-bns=True \
--resume=False \
--evaluate=False \
--scale-factor=8 \
--lr-warmup-epochs=5 \
--ckpt_path=./checkpoint/resnet18_best.pth \
--momentum=0.9
To run the Cuttlefish DeiT experiments:
docker exec -it cuttlefish bash
cd Cuttlefish/cuttlefish_deit
bash run.sh
To run the Cuttlefish and Pufferfish ImageNet experiments:
docker exec -it cuttlefish bash
cd Cuttlefish/scripts
bash run_cuttlefish_imagenet.sh
# or
bash run_pufferfish_imagenet.sh
To run the BERT fine-tuning experiments (text classification):
docker exec -it cuttlefish-bert bash
cd Cuttlefish/transformers/examples/pytorch/text-classification
bash run.sh # for Cuttlefish experiments
bash run_vanilla.sh # for vanilla BERT fine-tuning
bash run_distill_bert.sh # for DistilBERT
bash run_tiny_bert.sh # for TinyBERT
Cuttlefish leverages tiny, lightweight benchmarking runs to guide its hyper-parameter selection. To run the benchmarking:
docker exec -it cuttlefish bash
cd Cuttlefish/scripts
bash run_cifar_block_benchmark.sh
where `--rank-ratio` can be set to, e.g., 2, 4, 8, or 16 (for rank ratios of 1/2, 1/4, 1/8, and 1/16). If `--rank-ratio` is set to 0.0, the full-rank network is used for benchmarking.
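For a sense of what the benchmark measures, the sketch below times forward+backward passes of a candidate block (our simplification, assuming a CUDA device; the repo's scripts handle full models and rank ratios end to end):

```python
import time
import torch

def time_fwd_bwd(module: torch.nn.Module, x: torch.Tensor,
                 warmup: int = 10, iters: int = 50) -> float:
    """Average seconds per forward+backward pass on the GPU."""
    module, x = module.cuda(), x.cuda()
    for _ in range(warmup):                 # warm up CUDA kernels
        module(x).sum().backward()
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(iters):
        module(x).sum().backward()
    torch.cuda.synchronize()
    return (time.time() - start) / iters

# e.g., compare a full-rank block against factorized variants at rank
# ratios 1/2, 1/4, 1/8, 1/16 and keep only those that are actually faster.
```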
We compared Cuttlefish against many popular baseline methods, and we provide the code we used to replicate their results. For instance, to run "SI&FD" (a low-rank training method with Spectral Initialization and Frobenius Decay), one can do:
docker exec -it cuttlefish bash
# for ResNet-18 on CIFAR-10
cd Cuttlefish/baselines/fnl_cuttlefish_baseline/pytorch_resnet_cifar10
bash run_resnet18.sh # with modifications on `--rank-scale`
# for VGG-19 on CIFAR-10
cd Cuttlefish/baselines/fnl_cuttlefish_baseline/EigenDamage-Pytorch
bash run_vgg19.sh # with modifications on `--target-ratio`
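For reference, SI&FD refers to the scheme of Khodak et al. (2021): spectral initialization (SI) starts the factors from the SVD of a standard full-rank initialization, and Frobenius decay (FD) applies weight decay to the product of the factors rather than to each factor separately. A minimal sketch of both ideas (our paraphrase, not the baseline repo's code):

```python
import torch
import torch.nn as nn

def spectral_init(out_f: int, in_f: int, rank: int):
    """SI: derive low-rank factors U, V from the SVD of a standard
    (here Kaiming-normal) full-rank initialization, so that U @ V
    inherits its top singular values."""
    W = torch.empty(out_f, in_f)
    nn.init.kaiming_normal_(W)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    sqrt_s = S[:rank].sqrt()
    return U[:, :rank] * sqrt_s, sqrt_s.unsqueeze(1) * Vh[:rank]

def frobenius_decay(U: torch.Tensor, V: torch.Tensor, coeff: float = 1e-4):
    """FD: penalize ||U @ V||_F^2 instead of ||U||_F^2 + ||V||_F^2."""
    return coeff * (U @ V).pow(2).sum()
```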
To run "XNOR-Net" one can do
docker exec -it cuttlefish bash
cd Cuttlefish/baselines/XNOR_CIFAR10
bash run_xnor_net.sh
To run GraSP one can do
docker exec -it cuttlefish bash
cd Cuttlefish/baselines/GraSP
# pre-prune resnet50 on ImageNet
bash prune_imagenet.sh
# finetuning pruned resnet50
bash finetune_imagenet.sh
We also leverage the excellent code bases open_lth and LC-model-compression for our baseline comparisons.
If you find the code/scripts here useful for your work, please cite Cuttlefish:
@inproceedings{wang2023cuttlefish,
title={Cuttlefish: Low-rank Model Training without All The Tuning},
author={Wang, Hongyi and Agarwal, Saurabh and U-chupala, Pongsakorn and Tanaka, Yoshiki and Xing, Eric and Papailiopoulos, Dimitris},
booktitle={MLSys},
year={2023}
}