VTLN across genders #6
Hi!
I am trying to extract MFCC+VTLN features for a dataset, but I couldn't quite follow the pipeline. When I run mfcc_vtln.py, I get a warp factor for each speaker, so these warps can be used to normalize the dataset. However, if the dataset has no speaker ids but does have female/male labels, can we do VTLN across genders? I mean, in this setup we would have only two speaker_ids, female and male. Do you think this will work?
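Something like the sketch below is what I have in mind: just reuse the gender label as the speaker id when preparing the speaker mapping, so the VTLN step estimates one warp per gender. The utterance ids, file names, and gender table are made up for illustration, and mfcc_vtln.py may read its speaker information differently.

```python
# Sketch only: treat the gender label as a pseudo speaker id so that VTLN
# estimates two warp factors, one for all female speech and one for all
# male speech. The utterance ids and gender table are hypothetical.

# utterance id -> gender label, e.g. taken from the dataset metadata
utt2gender = {
    "utt001": "female",
    "utt002": "female",
    "utt003": "male",
    "utt004": "male",
}

# Reuse the gender label as the speaker id.
utt2spk = dict(utt2gender)

# Write a Kaldi-style utt2spk file ("<utt-id> <spk-id>" per line), so any
# recipe that groups utterances by speaker now groups them by gender.
with open("utt2spk", "w") as f:
    for utt, spk in sorted(utt2spk.items()):
        f.write(f"{utt} {spk}\n")
```

If this works, a test utterance with a known gender could also reuse the warp estimated for that gender instead of falling back to 1.0.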
Comments

Hi!
I think computing one VTLN coefficient per audio sample may take too long (for example, with 100 h of training data). Another issue is inference time: if we only have one sample at test time, the warp factor will be 1.0.

I want to note my observations here. I separated the dataset into two speakers, as described above, and ran mfcc_vtln.py. When I limit the duration to 10 min and 20 min, the warp factors come out as 1.0 for each speaker. I am increasing the duration limit.