forked from markovka17/dla

Deep learning for audio processing

triple-purity/dla

 
 

Deep Learning for Audio (DLA)

  • Lecture and seminar materials for each week are in the ./week* folders; see README.md for materials and instructions.
  • For technical issues, bugs in course materials, or contribution ideas, open an issue.
  • The current version of the course is being taught in autumn 2024 at the CS Faculty of HSE.

For versions from previous years, see the Past Versions section.

Syllabus

  • week01 Introduction to Course

    • Lecture: Introduction to Course
    • Seminar: Experiment tracking, Hydra, Git, VS Code
    • Self-Study: Introduction to PyTorch
  • week02 Introduction to Digital Signal Processing

    • Lecture: Signals, Fourier Transform, spectrograms, MelScale, MFCC
    • Seminar: DSP in practice, spectrogram creation, IRF, frequency filtering
  • week03 Speech Recognition I

    • Lecture: Metrics, Datasets, Connectionist Temporal Classification (CTC), Classic Models, Beam Search, Language models
    • Seminar: Audio Augmentations, Beam Search
    • Q&A Session: Homework discussion, R&D coding tips
  • week04 Speech Recognition II

    • Lecture: LAS, RNN-T, Language models for RNN-T and LAS
    • Seminar: Hybrid RNN-T and CTC model training and inference
  • week05 Guest Lecture. Speech Recognition III and Audio SSL

    • Lecture: Self-Supervised Models for Audio, Audio LLMs
  • week06 Source Separation I

    • Lecture: A review of general Source Separation and Denoising, Encoder-Decoder-Separator architectures, Demucs family, DCCRN, FullSubNet+, BandSplitRNN
    • Seminar: Metrics
  • week07 Source Separation II

    • Lecture: Speech separation, Blind and Target Separation, Recurrent (TasNet, DPRNN, VoiceFilter) and CNN (ConvTasNet, SpEx+) models
    • Seminar: WienerFilter, SincFilter and Demucs; streaming processing and performance metrics
  • week08 Audio-Visual Deep Learning

    • Lecture: Audio-Visual Fusion, Source Separation, Speech Recognition, and Self-Supervised Models. Wav2Lip and SadTalker (talking face)
    • Q&A: Project and Slurm discussion
    • Extra Seminar: Create Your Own Intelligent Voice Assistant

Homeworks and Projects

  • HW_ASR: Training a speech recognition model
  • Project_AVSS: Training an audio-visual speech separation model

See our project template.

Resources

Some of the weeks have English recordings. See the corresponding sub-directories.

Contributors & course staff

Course materials and teaching (in different years) were delivered by:

Past Versions
