I'm not saying this tool is bad, but I would be really careful about using tools like this in an environment where audio quality really matters (YouTube videos, podcasts, etc.).
Noise reduction tools work by removing specific frequencies from the source, some of which overlap with your natural voice.
This is why you start to sound robotic and get weird cutouts if you try to use tools to remove too much noise or background sound. It's one of those things where, if you're not used to hearing your entire vocal range, you might not be aware of how much is getting cut out by tools that reduce noise.
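For anyone curious, the classic spectral-subtraction idea looks roughly like the sketch below. All the numbers are made up purely for illustration; real tools estimate the noise floor from quiet frames and work on overlapping FFT windows. The point is just that bins where your voice and the noise overlap lose voice energy too.

    /* Rough sketch of classic spectral subtraction on one frame's
     * magnitude spectrum.  Hypothetical values, for illustration only. */
    #include <stdio.h>

    #define BINS 8

    int main(void) {
        /* Magnitude spectrum of one frame of speech (made-up numbers). */
        float speech[BINS]      = {0.9f, 2.1f, 3.0f, 1.2f, 0.8f, 0.5f, 0.3f, 0.2f};
        /* Estimated noise floor per bin (e.g. a keyboard or a fan). */
        float noise_floor[BINS] = {0.2f, 0.4f, 1.5f, 1.0f, 0.7f, 0.1f, 0.1f, 0.1f};

        for (int i = 0; i < BINS; i++) {
            /* Subtract the noise estimate and clamp at zero.  Bins where
             * voice and noise overlap lose voice energy too -- that's the
             * "choppy / robotic" artifact. */
            float cleaned = speech[i] - noise_floor[i];
            if (cleaned < 0.0f) cleaned = 0.0f;
            printf("bin %d: %.2f -> %.2f\n", i, speech[i], cleaned);
        }
        return 0;
    }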
It's too bad they don't have a before / after with a few voice samples in the readme.
It definitely sounds better than I thought it would, and I've watched tons of this guy's videos in the past.
It really distorts his voice / range in some cases, such as when he taps his desk with that orange hammer. The difference there is night and day. It chops out his natural voice's range. It seems to degrade his voice more as the background noise gets more intense, such as the leaf blower (lol), but that's reasonable to expect. At the same time, though, even the mechanical keyboard has a very noticeable negative effect on his range.
It's one of those things where I wish so much that it worked perfectly, but I couldn't realistically think about using it for any recording work due to things like the above. There are just too many common noises (typing, etc.) that drastically distort your voice.
9:23 in that video is hilarious though. Have to love Jerry!
I wonder if it's the algorithm degrading his voice or if the input sound is already degraded. Is it possible a leaf blower or a hammer would cause enough "noise" that our ears couldn't hear his voice clearly either? Then when you subtract out the portion of the sound attributed to the leaf blower, you're hearing the parts of his voice that weren't being jumbled by the leaf blower?
Hard to say, because softer noises like typing still make his voice sound like it's cutting out unnaturally. It's like the frequencies are being subtracted out of his normal tone, but it's more subtle than the leaf blower, so you may not notice it without good headphones. It makes him sound very choppy and mechanical.
With the leaf blower I suspect that when it gets too close the microphone/ADC is saturating, which clips his voice. I wonder if it would've sounded better had he attempted to lower the gain on the microphone.
The difference here is that RNNoise doesn't just remove specific frequencies; it uses a neural network to decide what to remove, which results in much higher quality than what you were implying.
I have personally not noticed voice quality suffering too much, but you are of course right. And this is not what it was made for. My personal use case is mostly voip where RNNoise (imo) does an amazing job.
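For reference, processing a frame with RNNoise looks roughly like the sketch below. It's written against the library's published C interface as I understand it, not NoiseTorch's actual code (which is Go); older releases of rnnoise_create() take no model argument.

    /* Minimal sketch of denoising one frame with RNNoise's C API. */
    #include <string.h>
    #include "rnnoise.h"

    #define FRAME_SIZE 480  /* RNNoise works on 10 ms frames at 48 kHz */

    void denoise_frame(DenoiseState *st, float *frame /* FRAME_SIZE samples */) {
        float out[FRAME_SIZE];
        /* Runs the neural network on the frame; samples are expected to be
         * in 16-bit PCM range (roughly -32768..32767) stored as floats. */
        rnnoise_process_frame(st, out, frame);
        memcpy(frame, out, sizeof(out));
    }

    int main(void) {
        DenoiseState *st = rnnoise_create(NULL);  /* NULL = built-in model */
        float frame[FRAME_SIZE] = {0};            /* would come from the mic */
        denoise_frame(st, frame);
        rnnoise_destroy(st);
        return 0;
    }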
Looks excellent, and I'm keen to delve into the code a bit.
One quick question since you'll clearly know the codebase - do you think this could easily be adapted to create a "playback-side" noise filter?
Use-case rationale here is noisy and poor quality podcasts or "other people's" audio - it would be awesome to be able to configure your tool as the output for Chrome or Firefox or whatever program I'm listening to, then route the cleaned audio from your tool to the physical audio port.
Is that something which would be feasible to do here?
Agreed, but now this has piqued my interest in a good way.
Having two instances loaded might be a bit confusing as you say - I imagine it would need to be something like "NoiseTorch for Recording" and "NoiseTorch for Playback".
I'd need to go and play around with Pulse but I guess it would be possible to present 2 interfaces into Pulse with different names, then hope users can see the distinction when selecting a microphone versus the output device.
Would it be possible to upload a few before / after samples with varying degrees of background noise? Even if it's all the same person that would be a huge help to gauge the quality.
Just a suggestion: if you do it, please include realistic room noises in some of the samples.
I looked at the RNNoise examples and they were pretty bad. I mean, the speaker's audio quality got completely mangled, but the background noise was also comically high. It sounded like the person sat down in the middle of the street in NYC or was inside a busy train terminal.
Yes and no. NoiseTorch also has VAD (Voice Activity Detection). RNNoise also returns the probability of a sound sample being voice; I use that to clamp the microphone completely if it's below the configured probability.
This works really well for situations like Discord or Teamspeak where you're usually not constantly talking, but doing things that can still set off "normal" voice activation. RNNoise's model often knows it's not voice, but cannot denoise it completely.
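In code, the gate is roughly the shape sketched below. This is a sketch of the idea rather than NoiseTorch's actual Go source; the threshold stands in for whatever probability you configure.

    /* Sketch of the VAD gate: rnnoise_process_frame() returns the
     * probability that the frame contains voice, and the frame is muted
     * outright when that probability is below the configured threshold. */
    #include <string.h>
    #include "rnnoise.h"

    #define FRAME_SIZE 480

    void process_with_gate(DenoiseState *st, float *frame, float vad_threshold) {
        float out[FRAME_SIZE];
        float voice_prob = rnnoise_process_frame(st, out, frame);

        if (voice_prob < vad_threshold) {
            /* The model is fairly sure this isn't voice (keyboard, clicks, ...)
             * but couldn't denoise it completely -- clamp the mic to silence. */
            memset(out, 0, sizeof(out));
        }
        memcpy(frame, out, sizeof(out));
    }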
Yes, classic noise suppression sounds very poor very quickly. Noisy or poor audio is like a blurred photo or video: very hard to fix. Noisy or shaky video, on the other hand, is easily fixed (temporal de-noising on video in particular is akin to magic; it can extend the camera's performance by multiple stops with very little image-quality impact).
That's why these ML tools are potentially huge; good ol' noise suppression just isn't good.
How long until we get some kind of open AI project that takes in bad-quality incoming voice and outputs clear, noiseless human speech (in our own voice, or whoever's voice we want), so podcasters don't have to buy expensive microphones and try to soundproof their rooms anymore?
I know we're not there yet, but I feel like we're about to break "garbage in garbage out" with AI.
I'm just a video course creator / podcaster who has spent a decent amount of time researching audio; I'm not a deep-down audio engineer.
But based on the results I see with automated software tools that only try to reduce noise, I would say we're nowhere near there, and a really good solution would involve things that haven't been invented yet. I think we'll have manned trips to Mars well before we have a software solution that can emulate the sound of a moderately treated room with ~2 ms of latency or less.
With that said, I think we're there today if all you want to do is help reduce the noise of an air conditioner so you can chat with a friend on Hangouts, Discord or Zoom. This is a scenario where audio quality doesn't matter, but not hearing an A/C or lawn mower is worth having the person talking sound like a choppy robot. You probably won't even notice it too much with earbuds.