In this demo, we will use the Hugging Face transformers and datasets libraries with Amazon SageMaker to fine-tune a pre-trained transformer for binary text classification. Specifically, we will fine-tune the pre-trained DistilBERT model on the IMDB dataset.
We will then deploy the resulting model for inference on a SageMaker Serverless Endpoint.
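The end-to-end flow can be sketched with the SageMaker Python SDK's `HuggingFace` estimator and `ServerlessInferenceConfig`. This is a minimal sketch under stated assumptions, not the repository's actual code: the training script `train.py` in `./scripts`, its hyperparameter names, the instance type and the container versions are all hypothetical choices for illustration.

```python
# Minimal sketch: fine-tune on SageMaker, then deploy to a serverless endpoint.
# ASSUMPTIONS (not from the original): "train.py" in "./scripts", the
# hyperparameter names it reads, the instance type and DLC versions.

HYPERPARAMETERS = {
    "model_name": "distilbert-base-uncased",  # hypothetical script argument
    "epochs": 1,                              # hypothetical script argument
    "train_batch_size": 32,                   # hypothetical script argument
}


def train_and_deploy(role: str, train_s3_uri: str, test_s3_uri: str):
    """Launch a training job, then deploy the trained model serverlessly."""
    # Imported here so this module loads even without the SageMaker SDK.
    from sagemaker.huggingface import HuggingFace
    from sagemaker.serverless import ServerlessInferenceConfig

    estimator = HuggingFace(
        entry_point="train.py",          # hypothetical training script
        source_dir="./scripts",          # hypothetical directory
        instance_type="ml.p3.2xlarge",   # assumed GPU training instance
        instance_count=1,
        role=role,
        transformers_version="4.26",     # assumed DLC version combination
        pytorch_version="1.13",
        py_version="py39",
        hyperparameters=HYPERPARAMETERS,
    )
    # Each channel is exposed to the training container as an input directory.
    estimator.fit({"train": train_s3_uri, "test": test_s3_uri})

    # Serverless endpoints are sized by memory and concurrency, not instances.
    serverless_config = ServerlessInferenceConfig(
        memory_size_in_mb=4096,
        max_concurrency=10,
    )
    return estimator.deploy(serverless_inference_config=serverless_config)
```

The returned predictor can then be called with a payload such as `predictor.predict({"inputs": "A wonderful, moving film."})`, which is the request shape the Hugging Face inference containers accept for text classification.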
We'll use DistilBERT, a distilled version of BERT that is smaller, and therefore faster and cheaper for both training and inference. A pre-trained DistilBERT model is available in the Hugging Face transformers library.
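Loading the pre-trained model for binary classification might look like the sketch below, which attaches a fresh two-class head to the DistilBERT checkpoint. The checkpoint name `distilbert-base-uncased` and the label names are assumptions for illustration; calling the function downloads weights from the Hugging Face Hub.

```python
# Sketch: load DistilBERT with a two-class sequence classification head.
# ASSUMPTIONS: the checkpoint name and the human-readable label names.

CHECKPOINT = "distilbert-base-uncased"
# IMDB encodes negative reviews as 0 and positive reviews as 1.
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def load_model_and_tokenizer(checkpoint: str = CHECKPOINT):
    """Return (tokenizer, model); requires transformers and network access."""
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(ID2LABEL),  # binary classification head
        id2label=ID2LABEL,         # stored in the config, used at inference
        label2id=LABEL2ID,
    )
    return tokenizer, model
```

Storing `id2label` in the model config means a deployed endpoint returns readable labels rather than the generic `LABEL_0`/`LABEL_1`.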
IMDB is a dataset for binary sentiment classification that contains substantially more data than previous benchmark datasets. It provides 25,000 highly polar movie reviews for training and 25,000 for testing, plus additional unlabeled data. It is available as the IMDB dataset on Hugging Face.
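A sketch of loading and tokenizing the dataset with the datasets library follows. It assumes the dataset id `imdb` and a tokenizer such as the DistilBERT one; `tokenize_batch` and the `max_length` default are illustrative choices, not part of the original. The unlabeled reviews live in a separate `unsupervised` split, which this sketch ignores.

```python
# Sketch: load IMDB from the Hugging Face Hub and tokenize the labeled splits.
# ASSUMPTIONS: dataset id "imdb", max_length=512, padding to a fixed length.


def tokenize_batch(batch, tokenizer, max_length: int = 512):
    """Tokenize a batch of reviews; `batch["text"]` is a list of strings."""
    return tokenizer(
        batch["text"],
        truncation=True,        # DistilBERT accepts at most 512 tokens
        padding="max_length",
        max_length=max_length,
    )


def load_tokenized_imdb(tokenizer, max_length: int = 512):
    """Return tokenized (train, test) splits; requires network access."""
    from datasets import load_dataset

    imdb = load_dataset("imdb")  # splits: "train", "test", "unsupervised"
    train = imdb["train"].map(
        lambda b: tokenize_batch(b, tokenizer, max_length), batched=True
    )
    test = imdb["test"].map(
        lambda b: tokenize_batch(b, tokenizer, max_length), batched=True
    )
    return train, test
```

Padding every review to a fixed length keeps the sketch simple; dynamic padding per batch would usually train faster on shorter reviews.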
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.