Deep Learning for Audio (DLA)

Lecture and seminar materials for each week are in the ./week* folders; see README.md for materials and instructions.
For technical issues, bugs in course materials, or contribution ideas, open an issue.
The current version of the course is being taught in autumn 2024 at the CS Faculty of HSE.

For previous years' versions, see the Past Versions section.

Syllabus

week01 Introduction to Course
- Lecture: Introduction to Course
- Seminar: Experiment tracking, Hydra, Git, VS Code
- Self-Study: Introduction to PyTorch
week02 Introduction to Digital Signal Processing
- Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC
- Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering
week03 Speech Recognition I
- Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models
- Seminar: Audio Augmentations, Beam Search
- Q&A Session: Homework discussion, R&D coding tips
week04 Speech Recognition II
- Lecture: LAS, RNN-T, Language models for RNN-T and LAS
- Seminar: Hybrid RNN-T and CTC model training and inference
week05 Guest Lecture. Speech Recognition III and Audio SSL
- Lecture: Self-Supervised Models for Audio, Audio LLMs
week06 Source Separation I
- Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN
- Seminar: Metrics
week07 Source Separation II
- Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+) models
- Seminar: WienerFilter, SincFilter and DEMUCS; streaming processing and performance metrics
week08 Audio-Visual Deep Learning
- Lecture: Audio-Visual Fusion, Source Separation, Speech Recognition, and Self-Supervised Models. Wav2Lip and SadTalker (talking face)
- Q&A: Project and Slurm discussion
- Extra Seminar: Create Your Own Intelligent Voice Assistant

Homeworks and Projects

HW_ASR Training a speech recognition model
Project_AVSS Training an audio-visual speech separation model

See our project template.

Resources

Lecture recordings on YouTube (in Russian)

Some of the weeks have English recordings. See the corresponding sub-directories.

Contributors & course staff

Course materials and teaching (in different years) were delivered by:

