Commit 1c7f613

sharathts authored and nv-kkudrynski committed
[BERT/PyT] Updated links to checkpoints and README
1 parent 690233c commit 1c7f613

42 files changed: +930 −1324 lines

PyTorch/LanguageModeling/BERT/.dockerignore

+1
@@ -27,3 +27,4 @@ results/
 dask-worker-space/
 __pycache__
 distillation/__pycache__
+runner_workspace

PyTorch/LanguageModeling/BERT/README.md

+8 −7

@@ -239,17 +239,18 @@ Find all trained and available checkpoints in the table below:
 | Model | Description |
 |------------------------|---------------------------------------------------------------------------|
 | [bert-large-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_qa_squad11_amp/files) | Large model fine-tuned on SQuAD v1.1 |
-| [bert-large-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_ft_sst2_amp) |Large model fine-tuned on GLUE SST-2 |
+| [bert-large-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_large_ft_sst2_amp) |Large model fine-tuned on GLUE SST-2 |
 | [bert-large-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_large_pretraining_amp_lamb/files?version=20.03.0) | Large model pretrained checkpoint on Generic corpora like Wikipedia|
 | [bert-base-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_qa_squad11_amp/files) | Base model fine-tuned on SQuAD v1.1 |
 | [bert-base-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_ft_sst2_amp_128/files) | Base model fine-tuned on GLUE SST-2 |
 | [bert-base-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_base_pretraining_amp_lamb/files) | Base model pretrained checkpoint on Generic corpora like Wikipedia. |
-| [bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files) | 4 layer distilled model fine-tuned on SQuAD v1.1 |
-| [bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files) | 4 layer distilled model fine-tuned on GLUE SST-2 |
-| [bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files) | 4 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
-| [bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_squad/files) | 6 layer distilled model fine-tuned on SQuAD v1.1 |
-| [bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_sst2/files) | 6 layer distilled model fine-tuned on GLUE SST-2 |
-| [bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_p2/files) | 6 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
+| [bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files) | 4 layer distilled model fine-tuned on SQuAD v1.1 |
+| [bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files) | 4 layer distilled model fine-tuned on GLUE SST-2 |
+| [bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files) | 4 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
+| [bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_qa_squad11_amp/files) | 6 layer distilled model fine-tuned on SQuAD v1.1 |
+| [bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_ft_sst2_amp/files) | 6 layer distilled model fine-tuned on GLUE SST-2 |
+| [bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_pretraining_amp/files) | 6 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
+
PyTorch/LanguageModeling/BERT/distillation/README.md

+13 −11

@@ -20,8 +20,8 @@ bash run_e2e_distillation.sh
 `run_e2e_distillation.sh` contains 8 command lines to obtain fully distilled BERT models for SQuADv1.1 and SST-2. The distilled BERT model has a config (N=4, D=312, Di=1200 , H=12). To distill knowledge into models of different sizes, a new `BERT_4L_312D/config.json` can be created and passed as a starting point in `run_e2e_distillation.sh`

 `run_e2e_distillation.sh` contains the following:
-- Generic distillation on Wikipedia and BooksCorpus dataset(BooksCorpus is optional) of maximum sequence length 128. `--input_dir` needs to be update respectively.
-- Generic distillation on Wikipedia and BooksCorpus dataset(BooksCorpus is optional) of maximum sequence length 512. `--input_dir` needs to be update respectively.
+- Phase1 distillation: generic distillation on the Wikipedia dataset with a maximum sequence length of 128. `--input_dir` needs to be updated accordingly.
+- Phase2 distillation: generic distillation on the Wikipedia dataset with a maximum sequence length of 512. `--input_dir` needs to be updated accordingly.

 *Task specific distillation: SQuAD v1.1* (maximum sequence length 384)
 - Data augmentation
@@ -35,25 +35,27 @@ bash run_e2e_distillation.sh

 ![BERT Distillation Flow](https://developer.nvidia.com/sites/default/files/akamai/joc_model.png)

-Note: Distillation for SST-2 uses as output of step 1. as starting point in 7, whereas SQuaD v1.1 uses output of step 2. as a starting point in 4.
+Note: Task specific distillation for SST-2 uses the output checkpoint of phase1 distillation as its starting point, whereas task specific distillation for SQuAD v1.1 uses the output checkpoint of phase2 distillation as its starting point.

 One can download different general and task-specific distilled checkpoints from NGC:
 | Model | Description |
 |------------------------|---------------------------------------------------------------------------|
-| [bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files) | 4 layer distilled model fine-tuned on SQuAD v1.1 |
-| [bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files) | 4 layer distilled model fine-tuned on GLUE SST-2 |
-| [bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files) | 4 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
-| [bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_squad/files) | 6 layer distilled model fine-tuned on SQuAD v1.1 |
-| [bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_sst2/files) | 6 layer distilled model fine-tuned on GLUE SST-2 |
-| [bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/models/bert_pyt_ckpt_distill_6l_768d_3072di_12h_p2/files) | 6 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
+| [bert-dist-4L-288D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_qa_squad11_amp/files) | 4 layer distilled model fine-tuned on SQuAD v1.1 |
+| [bert-dist-4L-288D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_ft_sst2_amp/files) | 4 layer distilled model fine-tuned on GLUE SST-2 |
+| [bert-dist-4L-288D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_4l_288d_pretraining_amp/files) | 4 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |
+| [bert-dist-6L-768D-uncased-qa](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_qa_squad11_amp/files) | 6 layer distilled model fine-tuned on SQuAD v1.1 |
+| [bert-dist-6L-768D-uncased-sst2](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_ft_sst2_amp/files) | 6 layer distilled model fine-tuned on GLUE SST-2 |
+| [bert-dist-6L-768D-uncased-pretrained](https://catalog.ngc.nvidia.com/orgs/nvidia/teams/dle/models/bert_pyt_ckpt_distilled_6l_768d_pretraining_amp/files) | 6 layer distilled model pretrained checkpoint on Generic corpora like Wikipedia. |


+The following results were obtained on NVIDIA DGX-1 with 32G GPUs using the pytorch:20.12-py3 NGC container.
+
 *Accuracy achieved and E2E time to train on NVIDIA DGX-1 With 32G:*

 | Student | Task | SubTask | Time(hrs) | Total Time (hrs)| Accuracy | BERT Base Accuracy |
 | --------------- |:----------------:| :---------------:| :--------: | :-------------: | :------: | ------------------: |
-| 4 Layers; H=288 | Distil Phase 1 | | 1.399 | | | |
-| 4 Layers; H=288 | Distil Phase 2 | | 0.649 | | | |
+| 4 Layers; H=288 | Distil Phase 1 | backbone loss | 1.399 | | | |
+| 4 Layers; H=288 | Distil Phase 2 | backbone loss | 0.649 | | | |
 | 4 Layers; H=288 | Distil SST-2 | backbone loss | 1.615 | | | |
 | 4 Layers; H=288 | Distil SST-2 | final layer loss | 0.469 | 3.483 | 90.82 | 91.51 |
 | 4 Layers; H=288 | Distil SQuADv1.1 | backbone loss | 3.471 | | | |
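The `(N=4, D=312, Di=1200, H=12)` student described in this README corresponds to a standard BERT `config.json`. A minimal sketch of what a custom `BERT_4L_312D/config.json` could look like; the four dimensions come from the README, while the remaining field names and values are assumed typical uncased-BERT defaults:

```bash
# Sketch: create a custom student config to pass to run_e2e_distillation.sh.
# num_hidden_layers=4 (N), hidden_size=312 (D), intermediate_size=1200 (Di),
# num_attention_heads=12 (H) come from the README; everything else is assumed.
mkdir -p BERT_4L_312D
cat > BERT_4L_312D/config.json <<'EOF'
{
  "attention_probs_dropout_prob": 0.1,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 312,
  "initializer_range": 0.02,
  "intermediate_size": 1200,
  "max_position_embeddings": 512,
  "num_attention_heads": 12,
  "num_hidden_layers": 4,
  "type_vocab_size": 2,
  "vocab_size": 30522
}
EOF
```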

PyTorch/LanguageModeling/BERT/triton/Dockerfile

+1 −12

@@ -12,8 +12,6 @@
 # limitations under the License.

 ARG FROM_IMAGE_NAME=nvcr.io/nvidia/pytorch:21.10-py3
-ARG TRITON_CLIENT_IMAGE_NAME=nvcr.io/nvidia/tritonserver:21.10-py3-sdk
-FROM ${TRITON_CLIENT_IMAGE_NAME} as triton-client
 FROM ${FROM_IMAGE_NAME}

 # Ensure apt-get won't prompt for selecting options
@@ -22,7 +20,7 @@ ENV DCGM_VERSION=2.0.13

 # Install perf_client required library
 RUN apt-get update && \
-    apt-get install -y libb64-dev libb64-0d curl pbzip2 pv bzip2 cabextract && \
+    apt-get install -y libb64-dev libb64-0d curl pbzip2 pv bzip2 cabextract wget libb64-dev libb64-0d && \
     curl -s -L -O https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
     dpkg -i datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
     rm datacenter-gpu-manager_${DCGM_VERSION}_amd64.deb && \
@@ -35,20 +33,11 @@ WORKDIR /workspace
 RUN git clone https://github.com/attardi/wikiextractor.git && cd wikiextractor && git checkout 6408a430fc504a38b04d37ce5e7fc740191dee16 && cd ..
 RUN git clone https://github.com/soskek/bookcorpus.git

-# Install Perf Client required library
-RUN apt-get update && apt-get install -y libb64-dev libb64-0d
-
-# Install Triton Client Python API and copy Perf Client
-COPY --from=triton-client /workspace/install/ /workspace/install/
-RUN find /workspace/install/python/ -iname triton*manylinux*.whl -exec pip install {}[all] \;
-
 # Setup environment variables to access Triton Client binaries and libs
 ENV PATH /workspace/install/bin:${PATH}
 ENV LD_LIBRARY_PATH /workspace/install/lib:${LD_LIBRARY_PATH}
 ENV PYTHONPATH /workspace/bert

-RUN apt-get install -y iputils-ping
-
 WORKDIR /workspace/bert
 ADD requirements.txt /workspace/bert/requirements.txt
 ADD triton/requirements.txt /workspace/bert/triton/requirements.txt
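A usage sketch for this Dockerfile: the image tag `bert-triton` is an arbitrary placeholder, and the build context is assumed to be the `PyTorch/LanguageModeling/BERT` directory, since the `ADD requirements.txt` and `ADD triton/requirements.txt` lines resolve relative to it.

```bash
# Sketch: build the Triton deployment image from the BERT directory.
# "bert-triton" is a placeholder tag; the build-context assumption follows
# the ADD requirements.txt / ADD triton/requirements.txt lines above.
cd PyTorch/LanguageModeling/BERT
docker build -f triton/Dockerfile -t bert-triton .
```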

PyTorch/LanguageModeling/BERT/triton/README.md

+3 −6

@@ -62,14 +62,14 @@ After deployment, the Triton inference server is used for evaluation of the conv
 and online (dynamic batching) scenarios.


-All steps are executed by the provided runner script. Refer to [Quick Start Guide](#quick-start-guide)
+All steps are executed by the provided runner script. Refer to [Quick Start Guide](#quick-start-guide)


 ## Setup
 Ensure you have the following components:
 * [NVIDIA Docker](https://github.com/NVIDIA/nvidia-docker)
-* [PyTorch NGC container 21.07](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
-* [Triton Inference Server NGC container 21.07](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)
+* [PyTorch NGC container 21.10](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)
+* [Triton Inference Server NGC container 21.10](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver)
 * [NVIDIA CUDA](https://docs.nvidia.com/cuda/archive//index.html)
 * [NVIDIA Ampere](https://www.nvidia.com/en-us/data-center/nvidia-ampere-gpu-architecture/), [Volta](https://www.nvidia.com/en-us/data-center/volta-gpu-architecture/) or [Turing](https://www.nvidia.com/en-us/geforce/turing/) based GPU

@@ -93,6 +93,3 @@ and [HPC](https://developer.nvidia.com/hpc-application-performance) benchmarks.
 ### Known issues

 - There are no known issues with this model.
-
-
-
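The two 21.10 containers listed in the updated Setup section can be pulled ahead of time; a minimal sketch (the PyTorch tag matches the `FROM_IMAGE_NAME` used in the Dockerfile above, and the Triton tag follows the same `21.10-py3` naming scheme):

```bash
# Sketch: pre-pull the NGC images the Triton README now expects.
docker pull nvcr.io/nvidia/pytorch:21.10-py3
docker pull nvcr.io/nvidia/tritonserver:21.10-py3
```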
