How to use for denoised audio (which is the 'clean' signal?) #1

Open
youssefavx opened this issue Apr 3, 2022 · 3 comments

@youssefavx

Hey! Thank you so much for making this super helpful library.

In the README, the usage says:

from pysiib import SIIB
from scipy.io import wavfile

fs, x = wavfile.read("clean.wav")
fs, y = wavfile.read("distorted.wav")

# SIIB with MI function in C-implementation (this is used in [1],[2])
SIIB(x, y, fs)
# SIIB with MI function in python implementation
SIIB(x, y, fs, use_MI_Kraskov=False)
# SIIB^Gauss
SIIB(x, y, fs, gauss=True)

My questions are:

  1. Does order matter? Meaning, does it matter whether the clean and distorted signals are passed as SIIB(y, x, fs) as opposed to SIIB(x, y, fs)?
  2. If it does matter... if I'm trying to measure how much a denoising process has affected a signal, what is classified as clean and what is classified as distorted? Is 'clean' in this context the original noisy signal, and 'distorted' the new denoised signal?

Thanks again! No pressure to respond if you don't have time.

@youssefavx
Author

youssefavx commented Apr 4, 2022

Also, I've tested the algorithm on a bunch of different noisy files and I get numbers that vary wildly from 40 to 1000. There doesn't seem to be a kind of 'fixed' score (above 50, it's all unintelligible), it seems to be relative to the given file. Or am I doing something wrong?

I was planning on using this to fade in a noisy signal when the denoised version made intelligibility poorer rather than better...but since it varies so much it's hard to pick a threshold ("Over 700, I'll start fading in the signal") because it's different for every new noisy file (since the noise varies and the denoising success varies).
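
One way to avoid a fixed absolute threshold is to compare the two scores for the same file and work with the relative drop instead. The following is only a rough sketch of that idea, assuming comparable scores for the noisy and the denoised version of the same recording are already available. The names `fade_in_weight`, `crossfade`, and `full_fade_drop` are hypothetical and not part of pysiib.

```python
def fade_in_weight(score_denoised, score_noisy, full_fade_drop=0.5):
    """Return a weight in [0, 1] for the noisy signal.

    The weight depends on the relative drop of the denoised score with
    respect to the noisy score, so it does not depend on the absolute
    scale of the scores. full_fade_drop is an assumed tuning value: a
    relative drop of this size (or more) fades the noisy signal in fully.
    """
    if score_noisy <= 0:
        return 0.0
    relative_drop = (score_noisy - score_denoised) / score_noisy
    return min(1.0, max(0.0, relative_drop / full_fade_drop))


def crossfade(noisy, denoised, weight):
    """Mix two equal-length sample sequences; weight is the noisy share."""
    return [weight * n + (1.0 - weight) * d for n, d in zip(noisy, denoised)]


# Same relative drop (20%) on very different absolute scales.
w_large_scale = fade_in_weight(800.0, 1000.0)
w_small_scale = fade_in_weight(40.0, 50.0)
# Denoising improved the score, so the noisy signal is not faded in.
w_improved = fade_in_weight(1200.0, 1000.0)
mixed = crossfade([1.0, 1.0], [0.0, 0.0], 0.25)
```

With this kind of normalization, a file scoring around 1000 and a file scoring around 50 are treated the same way when the denoised score drops by the same fraction.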

@kamo-naoyuki
Owner

Does order matter? Meaning, does it matter whether the clean and distorted signal are like so: SIIB(y, x, fs) as opposed to SIIB(x, y, fs)?

SIIB(y, x, fs) is not identical to SIIB(x, y, fs), because SIIB is computed only on the VAD (voice activity detection) region of the clean reference:

pySIIB/pysiib.py

Lines 97 to 100 in 226c2f3

# VAD
vad_index_x = get_vad(x, window_length, window_shift, window, delta_dB)
x_hat = x_hat[:, vad_index_x]
y_hat = y_hat[:, vad_index_x]
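
To illustrate why the argument order changes the result, here is a minimal sketch with a toy energy-based VAD. It is not the library's `get_vad`; the function `toy_vad_frames`, the frame length, and the test signals are assumptions made up for illustration. The point is only that the selected frames depend on which signal is treated as the reference.

```python
import math


def toy_vad_frames(signal, frame_len=4, delta_db=40.0):
    """Return indices of frames whose energy is within delta_db of the loudest frame.

    This is a toy stand-in for a VAD, not pysiib's implementation.
    """
    n_frames = len(signal) // frame_len
    energies_db = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = sum(s * s for s in frame) + 1e-12  # small offset avoids log(0)
        energies_db.append(10.0 * math.log10(energy))
    loudest = max(energies_db)
    return [i for i, e in enumerate(energies_db) if e > loudest - delta_db]


# Reference: "speech" in frames 0 and 2, digital silence in frames 1 and 3.
clean = [0.5, -0.5, 0.5, -0.5, 0.0, 0.0, 0.0, 0.0,
         0.4, -0.4, 0.4, -0.4, 0.0, 0.0, 0.0, 0.0]
# Degraded: the same samples plus a constant low-level offset acting as noise.
noisy = [c + 0.05 for c in clean]

frames_from_clean = toy_vad_frames(clean)  # only the speech frames
frames_from_noisy = toy_vad_frames(noisy)  # every frame, including pauses
```

When the clean signal is the reference, only the speech frames are kept. When the noisy signal is used as the reference, the noise floor in the pauses is loud enough to pass the threshold, so different frames enter the computation.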

If it does matter... if I'm trying to measure how much a denoising process has affected a signal, what is classified as clean and what is classified as distorted? Is 'clean' in this context the original noisy signal, and 'distorted' the new denoised signal?

I'm not sure what you mean. The clean signal should be a noise-free signal, i.e. only speech.

Also, I've tested the algorithm on a bunch of different noisy files and I get numbers that vary wildly from 40 to 1000. There doesn't seem to be a kind of 'fixed' score (above 50, it's all unintelligible), it seems to be relative to the given file. Or am I doing something wrong?

Actually, I only ported this code from the original author's implementation for my own purposes. I recommend asking the original author in-depth questions like this.

@youssefavx
Author

youssefavx commented Apr 4, 2022

SIIB(y, x, fs) is not identical to SIIB(x, y, fs), because SIIB is computed only on the VAD (voice activity detection) region of the clean reference:

Ah! Thank you for answering.

I'm not sure what you mean. The clean signal should be a noise-free signal, i.e. only speech.

Ah okay, sorry. What I meant is, let's say that:

  1. I have a noisy signal (recorded on tape in an old format, so I don't have an original 'clean' noise-free signal of the voice.)
  2. I want to denoise this old noisy recording. I generate a denoised recording using an AI.
  3. Because I don't know if this AI will accurately denoise the signal without artifacts or affecting/distorting the voice in some regions, I want to measure if there was a loss in intelligibility on the denoised signal relative to the original (noisy) signal.

In this case, should the clean signal (the x argument) be the denoised signal, and not the original noisy recording?

Actually, I only ported this code from the original author's implementation for my own purposes. I recommend asking the original author in-depth questions like this.

Will do! Thank you so so much for your help!
