|[bert-large-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_qa_squad11_amp/files)| Large model fine-tuned on SQuAD v1.1 |
|[bert-large-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_large_ft_sst2_amp)|Large model fine-tuned on GLUE SST-2 |
|[bert-large-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_pretraining_amp_lamb/files?version=20.03.0)| Large model pretrained checkpoint on generic corpora such as Wikipedia |
|[bert-base-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_qa_squad11_amp/files)| Base model fine-tuned on SQuAD v1.1 |
|[bert-base-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_ft_sst2_amp_128/files)| Base model fine-tuned on GLUE SST-2 |
|[bert-base-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_pretraining_amp_lamb/files)| Base model pretrained checkpoint on generic corpora such as Wikipedia |
|[bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files)| 4-layer distilled model fine-tuned on SQuAD v1.1 |
|[bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files)| 4-layer distilled model fine-tuned on GLUE SST-2 |
|[bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files)| 4-layer distilled model pretrained checkpoint on generic corpora such as Wikipedia |
|[bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_qa_squad11_amp/files)| 6-layer distilled model fine-tuned on SQuAD v1.1 |
|[bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_ft_sst2_amp/files)| 6-layer distilled model fine-tuned on GLUE SST-2 |
|[bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_pretraining_amp/files)| 6-layer distilled model pretrained checkpoint on generic corpora such as Wikipedia |
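
A checkpoint can be pulled with the NGC CLI, as in the minimal sketch below (the `:1` version tag is an assumption; check the model page for available versions):

```bash
# Minimal sketch: download the 4-layer SQuAD checkpoint with the NGC CLI.
# The ":1" version tag is an assumption; list versions on the model page first.
ngc registry model download-version "nvidia/dle/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp:1" \
    --dest ./checkpoints
```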
PyTorch/LanguageModeling/BERT/distillation/README.md:

To run end-to-end distillation: `bash run_e2e_distillation.sh`
`run_e2e_distillation.sh` contains 8 command lines to obtain fully distilled BERT models for SQuAD v1.1 and SST-2. The distilled BERT model has the config (N=4, D=312, Di=1200, H=12). To distill knowledge into models of different sizes, a new `BERT_4L_312D/config.json` can be created (see the sketch below) and passed as a starting point in `run_e2e_distillation.sh`.
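
A minimal sketch of creating such a config, assuming the standard BERT `config.json` schema; values other than N, D, Di, and H are illustrative BERT defaults:

```bash
# Minimal sketch: write a student config for N=4 layers, D=312 hidden,
# Di=1200 intermediate, H=12 attention heads. Non-architecture values
# (dropout, vocab size, etc.) are illustrative BERT defaults.
mkdir -p BERT_4L_312D
cat > BERT_4L_312D/config.json <<'EOF'
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 312,
  "initializer_range": 0.02,
  "intermediate_size": 1200,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
EOF
```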
`run_e2e_distillation.sh` contains the following:
- Phase 1 distillation: generic distillation on the Wikipedia dataset with a maximum sequence length of 128. `--input_dir` needs to be updated accordingly.
- Phase 2 distillation: generic distillation on the Wikipedia dataset with a maximum sequence length of 512. `--input_dir` needs to be updated accordingly.
*Task-specific distillation: SQuAD v1.1* (maximum sequence length 384)
Note: Task-specific distillation for SST-2 uses the output checkpoint of phase 1 distillation as its starting point, whereas task-specific distillation for SQuAD v1.1 starts from the output checkpoint of phase 2 distillation, as sketched below.
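
A minimal sketch of that hand-off, with hypothetical output directories (substitute the paths your runs actually produce):

```bash
# Hypothetical output locations from the two generic distillation phases.
PHASE1_CKPT=results/phase1_seq128   # seq-len 128 output: starting point for SST-2
PHASE2_CKPT=results/phase2_seq512   # seq-len 512 output: starting point for SQuAD v1.1
```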
One can download the general and task-specific distilled checkpoints from NGC:
|[bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files)| 4-layer distilled model fine-tuned on SQuAD v1.1 |
|[bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files)| 4-layer distilled model fine-tuned on GLUE SST-2 |
|[bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files)| 4-layer distilled model pretrained checkpoint on generic corpora such as Wikipedia |
|[bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_qa_squad11_amp/files)| 6-layer distilled model fine-tuned on SQuAD v1.1 |
|[bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_ft_sst2_amp/files)| 6-layer distilled model fine-tuned on GLUE SST-2 |
|[bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_pretraining_amp/files)| 6-layer distilled model pretrained checkpoint on generic corpora such as Wikipedia |
The following results were obtained on an NVIDIA DGX-1 with 32GB GPUs, using the pytorch:20.12-py3 NGC container.
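
A minimal sketch of launching that container (the mount point is an assumption):

```bash
# Launch the NGC PyTorch 20.12 container with all GPUs visible and the
# current checkout mounted at /workspace/bert (mount path is illustrative).
docker run --gpus all -it --rm \
  -v "$PWD":/workspace/bert \
  nvcr.io/nvidia/pytorch:20.12-py3
```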
*Accuracy achieved and end-to-end time to train on NVIDIA DGX-1 with 32GB:*
| Student | Task | SubTask | Time (hrs) | Total Time (hrs) | Accuracy | BERT Base Accuracy |
*[NVIDIA Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/), [Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/), or [Turing](https://www.nvidia.com/en-us/geforce/turing/) based GPU