
LSTM: data2pkl #8

Open
xiaojian10 opened this issue Sep 11, 2019 · 3 comments

@xiaojian10

In your code, I found that the lstm folder does not define how to dump the data into the .pkl file format, so I think this is something I need to implement myself. How should I organize my spectrogram images when converting speech spectrograms to the .pkl format?
Is the naming convention for the images similar to the one used when converting the data to binary files?

@zhr1201
Owner

zhr1201 commented Sep 11, 2019

Sorry for the missing piece of code. It was written a long time ago, so I can only refer to the current code to try to remember the details.

So this line should explain the format of the pkl files:

        self.ref_data = np.reshape(
            self.ref_data,
            [self.batch_size, self.epoch_size, self.num_steps, self.NEFF])

Basically, you should store the forward beamformer output spectrograms (and likewise for the backward BF and the reference) as one big NumPy array of shape total_time_steps * NEFF (the number of effective FFT points, e.g. 256 / 2 + 1 = 129). By total_time_steps I mean all samples concatenated together, so all the data should be stored in one array rather than stored separately. As for how to convert the spectrograms into the pkl files, you just store their log magnitudes and reshape into that shape. One more thing: the pkl files for the forward BF, backward BF, and reference should be aligned, meaning the same time index should correspond to the same sample's forward BF, backward BF, and reference, sampled at the same time.
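To make that concrete, here is a minimal sketch of the conversion (the helper name, the epsilon, and the output file names are assumptions for illustration; only the log-magnitude, concatenate-along-time, (total_time_steps, NEFF) layout comes from the description above):

    import pickle

    import numpy as np

    NEFF = 129  # effective FFT points for a 256-point FFT: 256 / 2 + 1

    def spectrograms_to_pkl(spectrograms, out_path):
        # Log magnitude of each utterance's spectrogram; the small
        # constant avoids log(0) on silent frames.
        log_mags = [np.log(np.abs(s) + 1e-8) for s in spectrograms]
        # Concatenate every utterance along time into one big
        # (total_time_steps, NEFF) array, as described above.
        data = np.concatenate(log_mags, axis=0).astype(np.float32)
        assert data.shape[1] == NEFF
        with open(out_path, 'wb') as f:
            pickle.dump(data, f)

    # Build the three lists from the same utterances in the same order so
    # the resulting pkl files stay time-aligned:
    # spectrograms_to_pkl(fw_bf_specs, 'fw_bf.pkl')
    # spectrograms_to_pkl(bw_bf_specs, 'bw_bf.pkl')
    # spectrograms_to_pkl(ref_specs, 'ref.pkl')

The training code can then load each pkl and reshape it to [batch_size, epoch_size, num_steps, NEFF] as in the snippet above.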

@xiaojian10
Author

Your suggestion is really helpful; I will try to implement it as you describe. I am truly grateful for your help.

@xiaojian10
Author

I found that in your shared project, the onset times of the male and female voices in the test audio are different. Does this mean we have to do the same when making the dataset? For example, let the interfering speech start first; after a few seconds' delay, the target speech starts, and from then on the audio is a mixture of the interfering and target speech.
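If it helps, here is a minimal sketch of that kind of mixing, assuming both signals are mono NumPy arrays at the same sample rate (the sample rate, delay, and names below are my own assumptions, not from the repo):

    import numpy as np

    FS = 16000          # assumed sample rate (Hz)
    DELAY_SEC = 2.0     # target starts a few seconds after the interferer

    def mix_with_delay(interferer, target, fs=FS, delay_sec=DELAY_SEC):
        # The interferer plays alone at first; the target is added in
        # after the delay, so the tail is a mixture of both.
        delay = int(delay_sec * fs)
        length = max(len(interferer), delay + len(target))
        mixture = np.zeros(length, dtype=np.float32)
        mixture[:len(interferer)] += interferer
        mixture[delay:delay + len(target)] += target
        return mixture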
