Conversation

xenova
Collaborator

@xenova xenova commented Apr 6, 2025

Copied from huggingface/transformers#36603:

When computing triangular mel filter matrices with triangularization in mel space, if num_frequency_bins is e.g. 257, meaning a real-valued FFT has been computed on 512 points (257 = n_fft // 2 + 1), then fft_bin_width should be (sampling rate / 2) / number_of_bins, with number_of_bins = num_frequency_bins - 1.
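To make the arithmetic concrete, here is a minimal sketch of the corrected width computation. The helper name `fft_bin_width` and the 16 kHz sampling rate are assumptions chosen for illustration; this is not the library's actual code.

```python
def fft_bin_width(sampling_rate: float, num_frequency_bins: int) -> float:
    """Width in Hz of one FFT bin, given the size of a one-sided spectrum.

    Hypothetical helper for illustration only.
    num_frequency_bins = n_fft // 2 + 1, so the bins span 0 Hz to Nyquist
    in (num_frequency_bins - 1) equal steps.
    """
    number_of_bins = num_frequency_bins - 1
    return (sampling_rate / 2) / number_of_bins


sampling_rate = 16000        # assumed value, not taken from the PR
num_frequency_bins = 257     # one-sided spectrum of a 512-point real FFT
n_fft = 2 * (num_frequency_bins - 1)

width = fft_bin_width(sampling_rate, num_frequency_bins)

# The result matches the usual definition sampling_rate / n_fft,
# and the last bin lands exactly on the Nyquist frequency.
print(width, sampling_rate / n_fft, (num_frequency_bins - 1) * width)
```

Dividing by num_frequency_bins instead of num_frequency_bins - 1 would give a slightly narrower bin, so the last bin would fall short of the Nyquist frequency.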

This bug was very likely introduced, and missed by the tests, because mel_filter_bank was called with num_frequency_bins = 256 (following the above example) when using triangularize_in_mel_space=True, and the result was then padded with 0s to get the expected shape (257). This is bad practice and misleading for the user, as it does not respect the method's API!

I also updated the expected outputs with the ones I got from running the torchaudio Kaldi implementation, confirming that our implementation was incorrect.

Thanks @eustlb for the original fix!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@xenova xenova merged commit 10c09fb into main Apr 9, 2025
4 checks passed
@xenova xenova deleted the fix-triangularise-mel-space branch April 9, 2025 19:10