Commit eaf293a

KhaleelKhan, anandtrex, and markschoene authored
Release candidate v0.2.0 (#10)
* Added some context for EvNN
* AdamW, Simpler Thresholds (#5)
* Switched to AdamW optimizer, simplified threshold parameterization, slight changes to the training of thresholds
* removed wandb from training script
* fixed inference script and updated README.md
* Improved setup and install (#8)
* improve setup and remove makefiles
* remove makefile

---------

authored-by: KhaleelKhan <[email protected]>

* bump up version, update readme
* include required files in distributed archive
* only require nvcc to compile cuda kernels
* cleaned LM code from pruning attempts
* update changelog and prepare merge

---------

Co-authored-by: Anand <[email protected]>
Co-authored-by: Mark Schoene <[email protected]>
Co-authored-by: KhaleelKhan <[email protected]>
1 parent 1c49161 commit eaf293a

17 files changed, +339 -506 lines changed

CHANGELOG.md (+9)

@@ -1,5 +1,14 @@
 # ChangeLog

+## 0.2.0-egru (2024-05-24)
+### Changed
+- Simplified install and removed makefile
+- CUDA compute capability is automatically detected
+- Update Readme with the setup instruction
+- Update Dockerfile
+- Cleaned LM pruning code
+
+
 ## 0.1.0-egru (2022-03-01)
 ### Changed
 - Project forked from original

build/MANIFEST.in → MANIFEST.in (-1)

@@ -1,4 +1,3 @@
-include Makefile
 include frameworks/pytorch/*.h
 include frameworks/pytorch/*.cc
 include lib/*.cc

Makefile (-66)

This file was deleted.

README.md (+15 -10)

@@ -30,36 +30,41 @@ Here's what you'll need to get started:
 - a [CUDA Compute Capability](https://developer.nvidia.com/cuda-gpus) 3.7+ GPU (required only if using GPU)
 - [CUDA Toolkit](https://developer.nvidia.com/cuda-toolkit) 11.0+ (required only if using GPU)
 - [PyTorch](https://pytorch.org) 1.3+ for PyTorch integration (GPU optional)
-- [BLAS](https://netlib.org/blas/) or any BLAS-like library for CPU computation.
-- [Eigen 3](http://eigen.tuxfamily.org/) to build the C++ examples (optional)
+- [OpenBLAS](https://www.openblas.net/) or any BLAS-like library for CPU computation.

 Once you have the prerequisites, you can install with pip or by building the source code.

-<!-- ### Using pip
+### Using pip
 ```
 pip install evnn_pytorch
-``` -->
+```

 ### Building from source
 > **Note**
 >
 > Currenty supported only on Linux, use Docker for building on Windows.

+Build and install it with `pip`:
 ```bash
-make evnn_pytorch # Build PyTorch API
+pip install .
 ```
+### Building in Docker

-If you built the PyTorch API, install it with `pip`:
+Build docker image:
 ```bash
-pip install evnn_pytorch-*.whl
+docker build -t evnn -f docker/Dockerfile .
 ```

-If the CUDA Toolkit that you're building against is not in `/usr/local/cuda`, you must specify the
-`$CUDA_HOME` environment variable before running make:
+Example usage:
 ```bash
-CUDA_HOME=/usr/local/cuda-10.2 make
+docker run --rm --gpus=all evnn python -m unittest discover -p "*_test.py" -s /evnn_src/validation -v
 ```

+> **Note**
+>
+> The build script tries to automatically detect GPU compute capability. In case the GPU is not available during compilation, for example when building with docker or when using compute cluster login nodes for compiling, Use enviroment variable `EVNN_CUDA_COMPUTE` to set the required compute capability.
+> Example: For CUDA Compute capability 8.0 use ```export EVNN_CUDA_COMPUTE=80```
+
 ## Performance

 Code for the experiments and benchmarks presented in the paper are published in ``benchmarks`` directory.
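
The `EVNN_CUDA_COMPUTE` note added to README.md implies a lookup order: an explicit environment override first, automatic detection second. The sketch below illustrates that order only; it is not the repository's build script. `resolve_compute_capability` and its `detect` hook are hypothetical names, and accepting a dotted value such as `8.0` is an assumption. The only real PyTorch calls used are `torch.cuda.is_available` and `torch.cuda.get_device_capability`.

```python
import os


def _detect_with_torch():
    """Return (major, minor) for the current GPU, or None if no GPU is visible."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


def resolve_compute_capability(env=None, detect=None):
    """Hypothetical helper: pick a compute capability string such as '80'.

    The environment variable wins; detection is only the fallback.
    """
    env = os.environ if env is None else env
    override = env.get("EVNN_CUDA_COMPUTE")
    if override:
        # Assumption: tolerate "8.0" as well as the documented "80" form.
        return override.strip().replace(".", "")

    detect = _detect_with_torch if detect is None else detect
    capability = detect()
    if capability is None:
        raise RuntimeError(
            "No GPU detected; set EVNN_CUDA_COMPUTE (for example 80) before building."
        )
    major, minor = capability
    return f"{major}{minor}"
```

On a login node or inside `docker build`, where no GPU is visible, only the override branch can succeed, which matches the situation the README note describes.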

benchmarks/lm/README.md (+17 -5)

@@ -4,18 +4,30 @@ To run the language modeling experiments, first download the data

 ./getdata <data_dir>

-Then run Penn Treebank experiments with EGRU (1350 units)
+We [provide checkpoints for EGRU](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X) with 3 layers of hidden size (1350, 1350, 750)

-python lm/train.py --data path_to_your_data --scratch ./log --dataset PTB --epochs 2500 --rnn_type egru --layers 3 --hidden_dim 1350 --batch_size=64 --bptt=68 --dropout_connect=0.6788113982442464 --dropout_emb=0.7069992062976298 --dropout_forward=0.2641540030663871 --dropout_words=0.05460274136214911 --emb_dim=788 --learning_rate=0.00044406742918918466 --pseudo_derivative_width=2.179414375864446 --thr_init_mean=-3.76855645544185 --weight_decay=9.005509348932795e-06 --seed 12008
+# Penn Treebank
+To train EGRU on Penn Treebank word-level language modeling, run

-or EGRU (2000 units)
+python benchmarks/lm/train.py --data=/path/to/data --scratch=/your/scratch/directory/Experiments --dataset=PTB --epochs=1000 --batch_size=64 --rnn_type=egru --layer=3 --bptt=70 --scheduler=cosine --weight_decay=0.10 --learning_rate=0.0012 --learning_rate_thresholds 0.0 --emb_dim=750 --dropout_emb=0.6 --dropout_words=0.1 --dropout_forward=0.25 --grad_clip=0.1 --thr_init_mean=0.01 --dropout_connect=0.7 --hidden_dim=1350 --pseudo_derivative_width=3.6 --scheduler_start=700 --seed=9612

-python lm/train.py --data path_to_your_data --scratch ./log --dataset PTB --epochs 2500 --rnn_type egru --layers 3 --hidden_dim 2000 --batch_size=128 --bptt=67 --dropout_connect=0.621405385527356 --dropout_emb=0.7651296208061924 --dropout_forward=0.24131807369801447 --dropout_words=0.14942681962154375 --emb_dim=786 --learning_rate=0.000494172266064804 --pseudo_derivative_width=2.35216907207571 --thr_init_mean=-3.4957794302256007 --weight_decay=6.6878095661652755e-06 --seed 52798
+For inference with the [provided checkpoint](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X), run
+
+python benchmarks/lm/infer.py --data /path/to/data --dataset PTB --datasplit test --batch_size 1 --directory /path/to/checkpoint
+
+# Wikitext-2
+To train EGRU on Wikitext-2, run
+
+python benchmarks/lm/train.py --data=/your/data/directory --scratch=/your/scratch/directory/Experiments --dataset=WT2 --epochs=800 --batch_size=128 --rnn_type=egru --layer=3 --bptt=70 --scheduler=cosine --weight_decay=0.12 --learning_rate=0.001 --learning_rate_thresholds 0.0 --emb_dim=750 --dropout_emb=0.7 --dropout_words=0.1 --dropout_forward=0.25 --grad_clip=0.1 --thr_init_mean=0.01 --dropout_connect=0.7 --hidden_dim=1350 --pseudo_derivative_width=3.6 --scheduler_start=400 --seed=913420
+
+For inference with the [provided checkpoint](https://cloudstore.zih.tu-dresden.de/index.php/s/NPQ9pLnpZnTsM5X), run
+
+python benchmarks/lm/infer.py --data /path/to/data --dataset WT2 --datasplit test --batch_size 1 --directory /path/to/checkpoint

 Various flags can be passed to change the defaults parameters.
 See "train.py" for a list of all available arguments.

-This code was tested with PyTorch >= 1.9.0
+This code was tested with PyTorch >= 1.9.0, CUDA 11.

 A large batch of code stems from Salesforce AWD-LSTM implementation:
 https://github.com/salesforce/awd-lstm-lm

benchmarks/lm/eval.py (+2 -3)

@@ -14,16 +14,15 @@
 # ==============================================================================

 import torch
-import lm.data as d
+import data as d


-def evaluate(model, eval_data, criterion, batch_size, bptt, ntokens, device, return_hidden=False):
+def evaluate(model, eval_data, criterion, batch_size, bptt, ntokens, device, hidden_dims, return_hidden=False):
     # turn on evaluation mode
     model.eval()

     # initialize evaluation metrics
     iter_range = range(0, eval_data.size(0) - 1, bptt)
-    hidden_dims = [rnn.hidden_size for rnn in model.rnns]

     total_loss = 0.
     mean_activities = torch.zeros(len(iter_range), dtype=torch.float16, device=device)

benchmarks/lm/infer.py (+11 -57)

@@ -23,9 +23,9 @@
 import torch
 import torch.nn

-import lm.data as d
-from lm.models import LanguageModel
-from lm.eval import evaluate
+import data as d
+from models import LanguageModel
+from eval import evaluate


 def get_args():
@@ -37,7 +37,6 @@ def get_args():
     argparser.add_argument('--batch_size', type=int, default=80)
     argparser.add_argument('--directory', type=str, required=False, help='model directory for checkpoints and config')
     argparser.add_argument('--hidden', action='store_true', help='returns the hidden states of the whole dataset to perform analysis')
-    argparser.add_argument('--prune', type=float, default=0.0)

     return argparser.parse_args()

@@ -85,14 +84,19 @@ def main(args):
         model = LanguageModel(**model_args).to(device)
     elif config['rnn_type'] == 'egru':
         model = LanguageModel(**model_args,
-                              dampening_factor=config['damp_factor'],
+                              dampening_factor=config['pseudo_derivative_width'],
                               pseudo_derivative_support=config['pseudo_derivative_width']).to(device)
     else:
         raise RuntimeError("Unknown RNN type: %s" % config['rnn_type'])

     best_model_path = os.path.join(args.directory, 'checkpoints', f"{config['rnn_type'].upper()}_best_model.cpt")
     model.load_state_dict(torch.load(best_model_path, map_location=device))

+    if model_args['rnn_type'] == 'egru':
+        hidden_dims = [rnn.hidden_size for rnn in model.rnns]
+    else:
+        hidden_dims = [rnn.module.hidden_size if args.dropout_connect > 0 else rnn.hidden_size for rnn in model.rnns]
+
     criterion = torch.nn.CrossEntropyLoss()

     if args.hidden:
@@ -104,6 +108,7 @@ def main(args):
                      bptt=config['bptt'],
                      ntokens=vocab_size,
                      device=device,
+                     hidden_dims=hidden_dims,
                      return_hidden=True)
         save_file = os.path.join(args.directory, f'hidden_states_{args.datasplit}.hdf')
         with h5py.File(save_file, 'w') as f:
@@ -121,6 +126,7 @@ def main(args):
                      bptt=config['bptt'],
                      ntokens=vocab_size,
                      device=device,
+                     hidden_dims=hidden_dims,
                      return_hidden=False)

     test_ppl = math.exp(test_loss)
@@ -131,58 +137,6 @@ def main(args):
     print(f'Layerwise activity {test_layerwise_activity_mean.tolist()} +- {test_layerwise_activity_std.tolist()}')
     print('=' * 89)

-    if args.prune > 0.0 and args.hidden:
-        print(f"Model Parameter Count: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
-        input_indices = torch.arange(model.rnns[0].input_size).to(device)
-        for i in range(model.nlayers):
-            if i < model.nlayers - 1:
-                # get event frequencies
-                hid_dim = all_hiddens[i].shape[2]
-                hid_cells = all_hiddens[i].reshape(-1, hid_dim)
-                seq_len = hid_cells.shape[0]
-                spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
-                print(
-                    f"Layer {i + 1}: "
-                    f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
-                    f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")
-
-                # compute remaining indicies from spike frequencies
-                topk = int(model.rnns[i].hidden_size * (1 - args.prune))
-                hidden_indices, _ = torch.sort(torch.argsort(spike_frequency, descending=True)[:topk], descending=False)
-                hidden_indices = hidden_indices.to(device)
-            else:
-                hidden_indices = torch.arange(model.rnns[i].hidden_size).to(device)
-            model.rnns[i].prune_units(input_indices, hidden_indices)
-            input_indices = hidden_indices
-
-        print(f"Model Parameter Count: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")
-
-        test_loss, test_activity, test_layerwise_activity_mean, test_layerwise_activity_std, centered_cell_states, all_hiddens = \
-            evaluate(model=model,
-                     eval_data=test_data,
-                     criterion=criterion,
-                     batch_size=args.batch_size,
-                     bptt=config['bptt'],
-                     ntokens=vocab_size,
-                     device=device,
-                     return_hidden=True)
-        for i in range(model.nlayers - 1):
-            # get event frequencies
-            hid_dim = all_hiddens[i].shape[2]
-            hid_cells = all_hiddens[i].reshape(-1, hid_dim)
-            seq_len = hid_cells.shape[0]
-            spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
-            print(
-                f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
-                f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")
-        test_ppl = math.exp(test_loss)
-        print('=' * 89)
-        print(f'| Inference | test loss {test_loss:5.2f} | '
-              f'test ppl {test_ppl:8.2f} | '
-              f'test mean activity {test_activity}')
-        print(f'Layerwise activity {test_layerwise_activity_mean.tolist()} +- {test_layerwise_activity_std.tolist()}')
-        print('=' * 89)
-

 if __name__ == "__main__":
     args = get_args()
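
With this change `evaluate` no longer reads `hidden_size` from `model.rnns`; the caller builds `hidden_dims` and passes it in. The commit branches on `rnn_type` and `args.dropout_connect` to decide whether a layer is wrapped. The sketch below shows a different way to collect the sizes, by probing each layer for a `.module` attribute. It is an illustration under assumptions, not the repository's code: `layer_hidden_dims` is a hypothetical helper, and the stand-in layer objects only mimic a wrapper that keeps the real RNN under `.module`.

```python
from types import SimpleNamespace


def layer_hidden_dims(rnns):
    """Hypothetical helper: collect hidden sizes from plain or wrapped layers.

    A wrapped layer is assumed to expose the underlying RNN as `.module`;
    a plain layer exposes `hidden_size` directly.
    """
    dims = []
    for rnn in rnns:
        inner = getattr(rnn, "module", rnn)  # unwrap if a wrapper is present
        dims.append(inner.hidden_size)
    return dims


# Stand-in layers: two plain, one wrapped (sizes taken from the README checkpoint).
layers = [
    SimpleNamespace(hidden_size=1350),
    SimpleNamespace(hidden_size=1350),
    SimpleNamespace(module=SimpleNamespace(hidden_size=750)),
]
hidden_dims = layer_hidden_dims(layers)
```

The resulting list is what the new `hidden_dims=` keyword of `evaluate` expects: one entry per recurrent layer.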

benchmarks/lm/models.py (+3 -55)

@@ -17,8 +17,8 @@
 import torch.nn as nn
 import torch.nn.functional as F
 import evnn_pytorch as evnn
-from lm.modules import VariationalDropout, WeightDrop
-from lm.embedding_dropout import embedded_dropout
+from modules import VariationalDropout, WeightDrop
+from embedding_dropout import embedded_dropout
 from typing import Union


@@ -64,9 +64,8 @@ def forward(self, x):
         bs, seq_len, ninp = x.shape
         if self.project:
             x = x.view(-1, ninp)
-            x = F.relu(self.projection(x))
+            x = self.projection(x)
             x = x.view(bs, seq_len, self.nemb)
-            x = self.variational_dropout(x, self.dropout)
         x = x.view(-1, self.nemb)
         x = self.decoder(x)
         return x
@@ -155,57 +154,6 @@ def __init__(self,

         self.backward_sparsity = torch.zeros(len(self.rnns))

-    def prune_embeddings(self, index):
-        device = next(self.parameters()).device
-        self.embeddings.weight = nn.Parameter(
-            self.embeddings.weight[:, index]).to(device)
-        self.emb_dim = self.embeddings.weight.shape[1]
-        self.decoder = Decoder(ninp=self.hidden_dim if self.projection else self.emb_dim, ntokens=self.vocab_size,
-                               project=self.projection, nemb=self.emb_dim,
-                               dropout=self.dropout_forward).to(device)
-        self.decoder.decoder.weight = self.embeddings.weight
-
-    def prune(self, fractions, hiddens, device):
-        # calculate new hidden dimensions
-        indicies = [torch.arange(self.rnns[0].input_size).to(device)]
-
-        for i in range(self.nlayers):
-            if isinstance(fractions, float):
-                frac = fractions
-            elif isinstance(fractions, tuple) or isinstance(fractions, list):
-                frac = fractions[i]
-            else:
-                raise NotImplementedError(
-                    f"data type {type(fractions)} not implemented. Use float, tuple or list")
-
-            # get event frequencies
-            hid_dim = hiddens[i].shape[2]
-            hid_cells = hiddens[i].reshape(-1, hid_dim)
-            seq_len = hid_cells.shape[0]
-            spike_frequency = torch.sum(hid_cells != 0, dim=0) / seq_len
-            print(
-                f"Layer {i + 1}: "
-                f"less than 1/100: {torch.sum(spike_frequency < 0.01)} / {spike_frequency.shape} "
-                f"// never: {torch.sum(hid_cells.sum(dim=0) == 0)} / {spike_frequency.shape}")
-
-            # compute remaining indicies from spike frequencies
-            topk = int(self.rnns[i].hidden_size * (1 - frac))
-            hidden_indices, _ = torch.sort(torch.argsort(
-                spike_frequency, descending=True)[:topk], descending=False)
-            hidden_indices = hidden_indices.to(device)
-            indicies.append(hidden_indices)
-
-        # input dimension equals embedding dimension for tied weights
-        indicies[0] = indicies[-1]
-
-        # prune weights
-        for i in range(self.nlayers):
-            self.rnns[i].prune_units(indicies[i], indicies[i+1])
-
-        self.prune_embeddings(indicies[-1])
-        print(
-            f"Final model hidden size: {[rnn.hidden_size for rnn in self.rnns]}")
-
     def init_embedding(self, initrange):
         nn.init.uniform_(self.embeddings.weight, -initrange, initrange)

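
For readers reviewing what was removed: the deleted `prune` code ranked hidden units by event frequency, meaning the fraction of time steps with a non-zero output, and kept the most active units in ascending index order. The following plain-Python sketch restates that selection rule without PyTorch. `select_active_units` is a hypothetical name, the nested-list input stands in for the flattened hidden-state tensor, and tie-breaking between equally active units may differ from `torch.argsort`.

```python
def select_active_units(hidden_states, prune_fraction):
    """Hypothetical helper mirroring the removed selection rule.

    hidden_states: list of time steps, each a list with one activation per unit.
    prune_fraction: share of units to drop, between 0.0 and 1.0.
    Returns (kept unit indices in ascending order, per-unit event frequency).
    """
    n_steps = len(hidden_states)
    n_units = len(hidden_states[0])

    # Event frequency: how often each unit emits a non-zero value.
    frequency = [
        sum(1 for step in hidden_states if step[unit] != 0) / n_steps
        for unit in range(n_units)
    ]

    # Keep the top-k most active units, then restore ascending index order.
    topk = int(n_units * (1 - prune_fraction))
    ranked = sorted(range(n_units), key=lambda unit: frequency[unit], reverse=True)
    return sorted(ranked[:topk]), frequency


# Three time steps, four units; unit 0 never fires.
states = [
    [0, 1, 2, 0],
    [0, 0, 3, 0],
    [0, 4, 5, 1],
]
kept, frequency = select_active_units(states, prune_fraction=0.5)
```

Here units 2 and 1 fire most often, so `kept` holds those two indices in ascending order.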
