LSTM: data2pkl #8
Sorry for missing that piece of code. It was written a long time ago, so I can only refer to the current code to try to remember the details. This line should explain the format of the pkl files:
Basically, you should store the forward beamformer (BF) output spectrograms (and likewise the backward BF and reference spectrograms) as one large NumPy array of shape total_time_steps × NEFF, where NEFF is the number of effective FFT points (e.g. 256/2 + 1 = 129). By total_time_steps, I mean that all samples are concatenated along the time axis, so all the data is stored together rather than separately per sample. To convert the spectrograms into pkl files, just store their log magnitude and reshape it to that shape. Also, the pkl files for the forward BF, backward BF, and reference must be aligned: the same time index must correspond to the same sample's forward BF, backward BF, and reference frames at the same point in time.
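A minimal sketch of the layout described above, assuming NumPy and the standard `pickle` module. The FFT size (256), hop length (128), Hann window, the small `eps` inside the log, and the helper names `log_mag_spectrogram`, `build_aligned_arrays`, and `dump_pkl` are all hypothetical choices for illustration, not the project's actual code; match them to whatever STFT settings the beamformer code really uses.

```python
import pickle

import numpy as np

NFFT = 256            # assumed FFT size
HOP = 128             # assumed hop length (hypothetical)
NEFF = NFFT // 2 + 1  # effective FFT points: 129


def log_mag_spectrogram(signal, nfft=NFFT, hop=HOP, eps=1e-8):
    """Return a (num_frames, NEFF) log-magnitude spectrogram of a 1-D waveform."""
    if len(signal) < nfft:
        signal = np.pad(signal, (0, nfft - len(signal)))
    window = np.hanning(nfft)
    num_frames = 1 + (len(signal) - nfft) // hop
    frames = np.stack([signal[i * hop:i * hop + nfft] * window
                       for i in range(num_frames)])
    spec = np.fft.rfft(frames, n=nfft, axis=1)  # shape (num_frames, NEFF)
    return np.log(np.abs(spec) + eps)           # eps avoids log(0); an assumption


def build_aligned_arrays(samples):
    """samples: list of (forward_bf, backward_bf, reference) waveforms, one tuple per utterance.

    Returns three arrays of shape (total_time_steps, NEFF) whose rows line up:
    row t of each array comes from the same utterance and the same frame.
    """
    fwd, bwd, ref = [], [], []
    for f, b, r in samples:
        n = min(len(f), len(b), len(r))  # trim so all three yield the same frame count
        fwd.append(log_mag_spectrogram(f[:n]))
        bwd.append(log_mag_spectrogram(b[:n]))
        ref.append(log_mag_spectrogram(r[:n]))
    # Concatenate all utterances along the time axis instead of storing them separately.
    return np.concatenate(fwd), np.concatenate(bwd), np.concatenate(ref)


def dump_pkl(array, path):
    """Pickle one (total_time_steps, NEFF) array to `path`."""
    with open(path, "wb") as fh:
        pickle.dump(array.astype(np.float32), fh)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two fake 1-second utterances at 16 kHz, each with forward/backward/reference signals.
    samples = [tuple(rng.standard_normal(16000) for _ in range(3)) for _ in range(2)]
    fwd, bwd, ref = build_aligned_arrays(samples)
    print(fwd.shape, bwd.shape, ref.shape)
```

Because every utterance is trimmed to a common length before the STFT, the three arrays always have the same number of rows, which is what keeps the forward BF, backward BF, and reference pkl files index-aligned. Each array would then be written with `dump_pkl` to its own file (e.g. hypothetical names such as `forward_bf.pkl`).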
Your suggestion is really great, and I will try to implement it accordingly. I am truly grateful for your help.
I noticed that in your shared project, the male and female voices in the test audio start speaking at different times. Does this mean we have to do the same when we build the dataset? For example, the interfering speech appears first, the target speech starts after a delay of a few seconds, and the rest of the recording is a mixture of the interfering and target speech.
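The mixing scheme proposed in the question (interferer first, target after a delay) could be sketched as below. This only illustrates the asker's idea and is not confirmed as the way the project's test data was made; the function name `mix_with_delayed_target`, the 16 kHz default sample rate, and the `target_gain` parameter are hypothetical. It works on a single channel; for beamformer input one would presumably apply the same delay to every microphone channel of already spatialized sources.

```python
import numpy as np


def mix_with_delayed_target(interferer, target, delay_seconds,
                            sample_rate=16000, target_gain=1.0):
    """Mix two mono waveforms: interferer starts at t=0, target starts after `delay_seconds`.

    Returns (mixture, reference), where `reference` is the clean target placed at the
    same offset as in the mixture, so the two stay time-aligned.
    """
    delay = int(round(delay_seconds * sample_rate))
    length = max(len(interferer), delay + len(target))

    mixture = np.zeros(length)
    mixture[:len(interferer)] += interferer
    mixture[delay:delay + len(target)] += target_gain * target

    reference = np.zeros(length)
    reference[delay:delay + len(target)] = target_gain * target
    return mixture, reference


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    interferer = rng.standard_normal(3 * 16000)  # 3 s of fake interfering speech
    target = rng.standard_normal(2 * 16000)      # 2 s of fake target speech
    mix, ref = mix_with_delayed_target(interferer, target, delay_seconds=2.0)
    print(mix.shape, ref.shape)
```

Returning the delayed clean target alongside the mixture keeps the reference aligned sample-for-sample with the mixture, which matters if the reference spectrogram is later stored index-aligned with the beamformer outputs.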
In your code, the lstm folder does not define how to dump the data into the .pkl format, so I think this is something I need to do myself. How should I name my files when converting a speech spectrogram to the .pkl format?
Is the naming convention similar to the one used when converting data to binary files?