Natural language processing (NLP) is one of the most important fields in artificial intelligence (AI). It has become crucial in the information age because most information exists as unstructured text. Since people communicate mostly through language, NLP technologies are applied almost everywhere: machine translation, web search, customer support, email, forums, advertising, and radiology reports, to name a few.
A number of core NLP tasks and machine learning models underlie these applications. Deep learning, a sub-field of machine learning, has recently driven a paradigm shift from traditional task-specific feature engineering to end-to-end systems and has achieved strong performance across many NLP tasks and downstream applications. Tech companies such as Google, Baidu, Alibaba, Apple, Amazon, Facebook, Tencent, and Microsoft are actively applying deep learning methods to improve their products; Google, for example, has replaced its traditional statistical machine translation and speech-recognition systems with systems based on deep learning.
Optional Textbooks
- Deep Learning by Goodfellow, Bengio, and Courville (free online)
- Machine Learning: A Probabilistic Perspective by Kevin Murphy (online)
- Natural Language Processing by Jacob Eisenstein (free online)
- Speech and Language Processing by Dan Jurafsky and James H. Martin (3rd ed. draft)
In this course, students will learn state-of-the-art deep learning methods for NLP. Through lectures and practical assignments, students will learn the tricks needed to make their models work on practical problems. They will learn to implement, and possibly invent, their own deep learning models using available deep learning libraries such as PyTorch.
Our Approach
- Thorough and Detailed: How to write, debug, and train deep neural models from scratch.
- State of the art: Most lecture materials are drawn from research published in the past 1-5 years.
- Practical: Focus on practical techniques for training models, including on GPUs.
- Fun: Cover exciting new advancements in NLP (e.g., the Transformer, ChatGPT).
Weekly Workload
- Lectures, tutorials, and/or practical problems implemented in PyTorch.
- There will be NO office hours.
- Class participation will account for 10% of the total assessment.
Assignments (individually graded)
- There will be two (2) assignments contributing to 2 * 25% = 50% of the total assessment.
- Students will be graded individually on the assignments. They may discuss the homework with each other, but they must submit individual write-ups and coding exercises.
Final Project (Group work but individually graded)
- There will be a final project contributing to the remaining 40% of the total coursework assessment.
- 3–5 people per group
- Presentation: 15%, report: 25%
- The project will be group work, but students will be graded individually. The final project presentation will assess each student's understanding of the project.
Prerequisites
- Proficiency in Python (using NumPy and PyTorch).
- Linear Algebra, basic Probability and Statistics
- Machine Learning basics
Instructor
Teaching Assistants
Nguyen Tran Cong Duy
- What is Natural Language Processing?
- Why is language understanding difficult?
- What is Deep Learning?
- Deep learning vs. other machine learning methods?
- Why deep learning for NLP?
- Applications of deep learning to NLP
- Knowing the target group (background, field of study, programming experience)
- Expectations from the course
- Programming in Python
- Jupyter Notebook and Google Colab
- Introduction to Python
- Deep Learning Frameworks
- Why PyTorch?
- Deep learning with PyTorch
- [Supplementary]
- Numerical programming with NumPy/SciPy - NumPy intro
- Numerical programming with PyTorch - PyTorch intro (a short sketch follows below)
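As a taste of the numerical programming topics above, here is a minimal sketch (not one of the official course notebooks) contrasting NumPy arrays with PyTorch tensors and showing automatic differentiation; the values are arbitrary toy data.

```python
import numpy as np
import torch

# NumPy: plain n-dimensional arrays, no gradient tracking.
x_np = np.array([[1.0, 2.0], [3.0, 4.0]])
print(x_np @ x_np.T)          # matrix product

# PyTorch: tensors with optional autograd.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]], requires_grad=True)
y = (x ** 2).sum()            # scalar function of x
y.backward()                  # populates x.grad with dy/dx = 2x
print(x.grad)

# Moving between the two libraries.
x_from_np = torch.from_numpy(x_np)   # shares memory with x_np
back_to_np = x_from_np.numpy()
```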
- What is Machine Learning?
- Supervised vs. unsupervised learning
- Linear Regression
- Logistic Regression
- Multi-class classification
- Parameter estimation (MLE & MAP)
- Gradient-based optimization & SGD
- Deep learning with PyTorch (see the sketch after this list)
- Linear Regression
- Logistic Regression
- [Supplementary]
- Numerical programming with PyTorch - PyTorch intro
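A minimal sketch of the kind of PyTorch logistic-regression exercise listed above, using synthetic data; the dataset and hyperparameters are illustrative, not the ones used in the course notebooks.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic binary classification data: 2 features, 200 examples.
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).float().unsqueeze(1)

model = nn.Linear(2, 1)                      # logits = Xw + b
criterion = nn.BCEWithLogitsLoss()           # sigmoid + binary cross-entropy
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(X), y)
    loss.backward()
    optimizer.step()

with torch.no_grad():
    preds = (torch.sigmoid(model(X)) > 0.5).float()
    print(f"final loss {loss.item():.3f}, accuracy {(preds == y).float().mean():.3f}")
```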
- From Logistic Regression to Feed-forward NN
- Activation functions
- SGD with Backpropagation
- Adaptive SGD (Adagrad, Adam, RMSProp)
- Regularization (Weight Decay, Dropout, Batch normalization, Gradient clipping)
- Deep learning with PyTorch (see the sketch after this list)
- Linear Regression
- Logistic Regression
- NumPy notebook, PyTorch notebook
- Backpropagation
- Dropout
- Batch normalization
- Initialization
- Gradient clipping
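To tie several of the ingredients above together (an activation function, dropout, an adaptive optimizer, and gradient clipping), here is a small sketch of one training step; the layer sizes, batch, and hyperparameters are placeholders, not values prescribed by the course.

```python
import torch
import torch.nn as nn

model = nn.Sequential(          # feed-forward network for 10-way classification
    nn.Linear(300, 128),
    nn.ReLU(),
    nn.Dropout(p=0.5),          # regularization: randomly zero activations during training
    nn.Linear(128, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # adaptive SGD variant
criterion = nn.CrossEntropyLoss()

# One illustrative training step on random data.
x = torch.randn(32, 300)
y = torch.randint(0, 10, (32,))

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping
optimizer.step()
print(loss.item())
```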
- Word meaning
- Denotational semantics
- Distributed representation of words
- Word2Vec models (Skip-gram, CBOW; see the sketch after the readings below)
- Negative sampling
- FastText
- Evaluating word vectors
- Intrinsic evaluation
- Extrinsic evaluation
- Cross-lingual word embeddings
- Word2Vec Tutorial - The Skip-Gram Model, blog
- Efficient Estimation of Word Representations in Vector Space - Original word2vec paper
- Distributed Representations of Words and Phrases and their Compositionality - negative sampling paper
- GloVe: Global Vectors for Word Representation
- FastText: Enriching Word Vectors with Subword Information
- Linguistic Regularities in Sparse and Explicit Word Representations.
- Neural Word Embeddings as Implicit Matrix Factorization.
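The readings above describe skip-gram with negative sampling in full; the following is only a compact sketch of the loss for a single (center, context) pair with a toy vocabulary and made-up hyperparameters, not the course's reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab_size, dim, num_negatives = 1000, 100, 5

in_embed = nn.Embedding(vocab_size, dim)    # center-word vectors
out_embed = nn.Embedding(vocab_size, dim)   # context-word vectors
optimizer = torch.optim.SGD(
    list(in_embed.parameters()) + list(out_embed.parameters()), lr=0.05
)

center = torch.tensor([3])                  # toy center word id
context = torch.tensor([17])                # observed context word id
negatives = torch.randint(0, vocab_size, (num_negatives,))  # sampled "noise" words

v_c = in_embed(center)                      # (1, dim)
u_o = out_embed(context)                    # (1, dim)
u_neg = out_embed(negatives)                # (num_negatives, dim)

pos_score = (v_c * u_o).sum()
neg_scores = u_neg @ v_c.squeeze(0)         # (num_negatives,)

# Negative-sampling objective: push up the true pair, push down the noise pairs.
loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_scores).sum())

optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```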
- Classification tasks in NLP
- Window-based Approach for language modeling
- Window-based Approach for NER, POS tagging, and Chunking
- Convolutional Neural Networks for NLP (see the sketch after the readings below)
- Max-margin Training
- Survey on Cross-lingual embedding methods
- Slides on Cross-lingual embedding
- Adversarial autoencoder for unsupervised word translation
- Evaluating Cross-Lingual Word Embeddings
- Linear Algebraic Structure of Word Senses, with Applications to Polysemy
- Improving Distributional Similarity with Lessons Learned from Word Embeddings
- Natural Language Processing (Almost) from Scratch
- Convolutional Neural Networks for Sentence Classification
- Fast and Accurate Entity Recognition with Iterated Dilated Convolutions
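A minimal sketch of a convolutional sentence classifier in the spirit of "Convolutional Neural Networks for Sentence Classification" above; the vocabulary size, embedding dimension, and filter sizes are placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TextCNN(nn.Module):
    """1D convolutions over word embeddings, max-pooled over time."""
    def __init__(self, vocab_size=5000, emb_dim=100, num_filters=50,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                  # (batch, seq_len)
        x = self.embed(token_ids).transpose(1, 2)  # (batch, emb_dim, seq_len)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # (batch, num_classes)

model = TextCNN()
logits = model(torch.randint(0, 5000, (8, 20)))    # batch of 8 toy "sentences"
print(logits.shape)                                # torch.Size([8, 2])
```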
- Language modeling with RNNs (see the sketch after the readings below)
- Backpropagation through time
- Text generation with RNN LM
- Sequence labeling with RNNs
- Sequence classification with RNNs
- Issues with Vanilla RNNs
- Gated Recurrent Units (GRUs) and LSTMs
- Bidirectional RNNs
- Multi-layer RNNs
- N-gram Language Models
- Karpathy’s nice blog on Recurrent Neural Networks
- Building an Efficient Neural Language Model
- On the difficulty of training recurrent neural networks
- Colah’s blog on LSTMs/GRUs
- Neural Architectures for Named Entity Recognition
- Fine-grained Opinion Mining with Recurrent Neural Networks and Word Embeddings
- Zero-Resource Cross-Lingual NER
- Adaptive Softmax Paper
- Adaptive Input representation paper
- KNN-LM paper
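A compact sketch of an RNN (LSTM) language model for next-token prediction; the vocabulary and dimensions are toy values, and real training would iterate over a corpus with truncated backpropagation through time.

```python
import torch
import torch.nn as nn

class RNNLM(nn.Module):
    """Embed tokens, run an LSTM, project hidden states to vocabulary logits."""
    def __init__(self, vocab_size=1000, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.proj = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):                  # (batch, seq_len)
        hidden, _ = self.lstm(self.embed(token_ids))
        return self.proj(hidden)                   # (batch, seq_len, vocab_size)

model = RNNLM()
criterion = nn.CrossEntropyLoss()

# Toy batch: predict each next token from its prefix.
tokens = torch.randint(0, 1000, (4, 21))
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)
loss = criterion(logits.reshape(-1, 1000), targets.reshape(-1))
loss.backward()                                    # gradients flow back through time
print(loss.item())
```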
Assignment 1 is out here. Deadline: 23 Jan 2024.
- Machine translation
- Early days (1950s)
- Statistical machine translation or SMT (1990-2010)
- Alignment in SMT
- Decoding in SMT
- Neural machine translation or NMT (2014 - )
- Encoder-decoder model for NMT
- Advantages and disadvantages of NMT
- Greedy vs. beam-search decoding (see the decoding sketch after the readings below)
- MT evaluation
- Statistical Machine Translation slides, CS224n 2015 (lectures 2/3/4)
- Sequence to Sequence Learning with Neural Networks (original seq2seq NMT paper)
- Statistical Machine Translation (book by Philipp Koehn)
- A Neural Conversational Model
- BLEU (original paper)
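Greedy decoding picks the single most probable token at each step, whereas beam search keeps the k best partial hypotheses. The following is a hedged sketch of greedy decoding against a generic decoder interface; `decoder_step`, `bos_id`, and `eos_id` are illustrative names standing in for whatever seq2seq model is used, not part of a specific course API.

```python
import torch

def greedy_decode(decoder_step, encoder_state, bos_id, eos_id, max_len=50):
    """Greedy decoding: at each step, take the argmax token and feed it back in.

    `decoder_step(prev_token, state)` is assumed to return (logits, new_state);
    beam search would instead keep the k highest-scoring partial sequences.
    """
    token = torch.tensor([bos_id])
    state = encoder_state
    output = []
    for _ in range(max_len):
        logits, state = decoder_step(token, state)
        token = logits.argmax(dim=-1)          # greedy choice, no look-ahead
        if token.item() == eos_id:
            break
        output.append(token.item())
    return output

# Toy usage with a fake decoder over a 10-word vocabulary.
def fake_decoder_step(prev_token, state):
    return torch.randn(1, 10), state           # random logits, unchanged state

print(greedy_decode(fake_decoder_step, encoder_state=None, bos_id=1, eos_id=2))
```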
- Information bottleneck issue with vanilla Seq2Seq
- Attention to the rescue
- Details of attention mechanism
- Sub-word models
- Byte-pair encoding (see the toy sketch after the readings below)
- Hybrid models
- Neural Machine Translation by Jointly Learning to Align and Translate (original seq2seq+attention paper)
- Effective Approaches to Attention-based Neural Machine Translation
- Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models
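Byte-pair encoding builds a sub-word vocabulary by repeatedly merging the most frequent adjacent symbol pair. A toy sketch of the merge-learning loop follows; the word frequencies and number of merges are made up, and end-of-word markers are omitted for brevity.

```python
from collections import Counter

def learn_bpe(words, num_merges=10):
    """Learn BPE merges from a word-frequency dictionary (characters as initial symbols)."""
    # Represent each word as a tuple of symbols, keeping its corpus frequency.
    vocab = Counter({tuple(word): freq for word, freq in words.items()})
    merges = []
    for _ in range(num_merges):
        # Count all adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)       # most frequent pair becomes one symbol
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            merged, i = [], 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges

print(learn_bpe({"low": 5, "lower": 2, "newest": 6, "widest": 3}, num_merges=5))
```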
- Seq2Seq Variants (Pointer nets, Pointer Generator Nets)
- Machine Translation
- Summarization
- Transformer architecture
- Self-attention (see the sketch after the readings below)
- Positional encoding
- Multi-head attention
- Get To The Point: Summarization with Pointer-Generator Networks
- Pointer Networks
- Stack-Pointer Networks for Dependency Parsing
- A Unified Linear-Time Framework for Sentence-Level Discourse Parsing
- Attention Is All You Need
- The Illustrated Transformer
- Resurrecting Submodularity in Neural Abstractive Summarization
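A minimal sketch of single-head scaled dot-product self-attention, the core operation of the Transformer in "Attention Is All You Need"; the dimensions are placeholders, and masking, multiple heads, and positional encodings are omitted.

```python
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a sequence x."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v           # (seq_len, d_k) each
    scores = q @ k.T / math.sqrt(k.size(-1))      # pairwise similarities, scaled
    weights = F.softmax(scores, dim=-1)           # attention distribution per position
    return weights @ v                            # weighted sum of value vectors

seq_len, d_model, d_k = 6, 16, 8
x = torch.randn(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_k) for _ in range(3))
print(self_attention(x, w_q, w_k, w_v).shape)     # torch.Size([6, 8])
```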
- Why semi-supervised?
- Semi-supervised learning dimensions
- Pre-training and fine-tuning methods (see the fine-tuning sketch after this list)
- CoVe
- TagLM
- ELMo
- GPT
- ULMFiT
- BERT
- BART
- Evaluation benchmarks
- GLUE
- SQuAD
- NER
- SuperGLUE
- XNLI
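A hedged sketch of the pre-train-then-fine-tune recipe using the Hugging Face transformers library (assumed to be installed and able to download pretrained weights); the model name, data, and hyperparameters are illustrative and not tied to the assignments.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Load a pretrained encoder and add a fresh classification head on top.
name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)

texts = ["a great movie", "a dull movie"]            # toy labeled examples
labels = torch.tensor([1, 0])
batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One fine-tuning step: all pretrained weights are updated on the downstream task.
model.train()
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
print(outputs.loss.item())
```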
Assignment 2 is out here. Deadline: 21 Feb 2024, 11:59 pm.
- Large Pretrained Language Models
- Examples of Large Pretrained Language Models
- Multilingual NLP
- Why do we need Multilingual NLP?
- Low-resource NLP
- Cross-lingual models
- Multilingual models