This project implements a deep learning model that counts syllables in audio recordings of varying length. The model is a recurrent neural network with an attention mechanism: it takes spectrogram tensors as input and predicts the syllable count. It is built with PyTorch.
- ffmpeg (required)
- PyTorch
- PyAudio
- Matplotlib
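
To make the architecture described above more concrete, here is a minimal sketch of a recurrent network with attention pooling over spectrogram frames. It is not the project's actual code: the class name `SyllableCounter`, the bidirectional GRU, the layer sizes, the `(batch, time, n_mels)` input layout, and the single-value regression head are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence


class SyllableCounter(nn.Module):
    """Hypothetical sketch: GRU encoder + attention pooling + regression head."""

    def __init__(self, n_mels: int = 64, hidden: int = 128):
        super().__init__()
        # Assumed encoder: a bidirectional GRU over spectrogram frames.
        self.rnn = nn.GRU(n_mels, hidden, batch_first=True, bidirectional=True)
        # One attention score per frame.
        self.score = nn.Linear(2 * hidden, 1)
        # Assumed output: the syllable count as a single real number.
        self.head = nn.Linear(2 * hidden, 1)

    def forward(self, spec: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        # spec: (batch, time, n_mels), zero-padded along the time axis
        # lengths: (batch,) number of valid frames per recording
        max_len = spec.size(1)
        # Packing keeps the GRU from reading the padded frames.
        packed = pack_padded_sequence(
            spec, lengths.cpu(), batch_first=True, enforce_sorted=False
        )
        out, _ = self.rnn(packed)
        out, _ = pad_packed_sequence(out, batch_first=True, total_length=max_len)

        # Mask padded frames so they receive zero attention weight.
        scores = self.score(out).squeeze(-1)  # (batch, time)
        frame_idx = torch.arange(max_len, device=spec.device)
        valid = frame_idx[None, :] < lengths.to(spec.device)[:, None]
        scores = scores.masked_fill(~valid, float("-inf"))
        weights = torch.softmax(scores, dim=1)  # (batch, time)

        # Attention-weighted sum of frame features, then the count prediction.
        context = (weights.unsqueeze(-1) * out).sum(dim=1)  # (batch, 2 * hidden)
        return self.head(context).squeeze(-1)  # (batch,)


if __name__ == "__main__":
    model = SyllableCounter(n_mels=64, hidden=128)
    spec = torch.randn(3, 200, 64)           # three padded spectrograms
    lengths = torch.tensor([200, 150, 80])   # valid frames in each
    print(model(spec, lengths).shape)        # torch.Size([3])
```

Because the attention weights are normalized only over valid frames, recordings of different lengths can share a batch without the padding affecting the prediction.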