I've been playing around with single tone detection using the Minsky circle algorithm. Without roundoff, Minsky and Goertzel are equivalent, but Minsky tolerates roundoff better, especially at low frequencies. You can think of Goertzel as representing the phase or quadrature information as the difference s[n] - s[n-1], which obviously gets fewer bits allocated to it than the s[n] themselves, particularly near the peaks and at low frequencies.
Precisely, you can derive Minsky’s algorithm (modified for frequency detection by adding the input samples into the register you produce the output from) from applying Goertzel’s algorithm to the first backward differences of the input signal.
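For readers who haven't seen it, here's a minimal sketch of the Minsky circle recurrence adapted for tone detection along the lines described above. The exact placement of the input injection is my assumption, not necessarily the poster's code; the rotation step per sample is 2·sin(π·freq/fs), which gives exactly the probe angle per HAKMEM item 149.

```python
import math

def minsky_detect(samples, freq, fs):
    # Lossless Minsky "circle" rotation tuned to the probe frequency,
    # driven by the input: the state magnitude grows roughly linearly
    # while `freq` is present in the signal and stays small otherwise.
    eps = 2 * math.sin(math.pi * freq / fs)  # rotates 2*pi*freq/fs per sample
    x = y = 0.0
    for s in samples:
        x -= eps * y
        y += eps * x
        y += s  # add the sample into the register the output is read from
    return math.hypot(x, y)
```

Because the rotation is a pair of shears with unit determinant, the state never decays; detection is just comparing the final magnitude for matched versus unmatched probe frequencies.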
The problem I'm working on requires something that works well in the presence of chords, and even though I've made really good progress I'm still about 4% away from SOTA, but hundreds of times faster. Not quite ready to give up.
Anything linear should work well in the presence of chords, I would think? (Although maybe PLLs would work better, depending on what you're doing, despite being nonlinear.)
The HN discussion has some good information, but it also has some mathematical errors.
Presumably you don't mean the School of the Arts or Summits on the Air (though both involve detecting frequencies), so what is SOTA?
PLL is an interesting option, never thought of that, that is worth pursuing, thank you. And like the other comment already wrote, state of the art.
The project is automatic music transcription. The 'gold standard' I've found so far is the Magenta-based 'Onsets and Frames' (https://magenta.tensorflow.org/onsets-frames ), which scores pretty well on a large benchmark.
I'm still a ways away from matching their score but there is a real chance that it will happen. The TensorFlow-based software is super slow, many orders of magnitude slower than realtime, and I have a bunch of applications in mind where slower than realtime is not an option.
Here are the two side-by-side across the same test set (first mine):
// di 24 dec 2019 16:49:29 CET overall total 17482 correct 12604 fp(fantasized) 2723 fn(missed) 4878 f_measure 76.833 precision 82.234 recall 72.097 accuracy 62.381
vs (Magenta O&F):
// wo 25 dec 2019 20:09:51 CET overall total 17482 correct 13865 fp(fantasized) 2908 fn(missed) 3617 f_measure 80.952 precision 82.663 recall 79.31 accuracy 67.999
Clearly the O&F software has the edge in detection but it is close enough that I have some hope of beating that score, I'm already way ahead of what they list as the 'previous state of the art'.
Congratulations! So you're equal in precision but still suffer on recall? Anything in common among the notes your system misses? My friend Alastair Porter recently did his dissertation on tune identification, so he told me a bit about the transcription problem. It's hairy!
I think PLLs make sense when you have chirps, like vibrato or portamento. Not sure they'll help in other cases. For piano they can only hurt, I imagine.
Thank you. There ought to be something to show for working this hard between jobs since August... It's been a very long slog.
> So you're equal in precision but still suffer on recall?
That's about the gist of it. But I suspect that if I fix the recall I'll end up being down on precision again, that's roughly how this has been going so far. Two steps forward, one step back.
> Anything in common among the notes your system misses?
Restrikes are really hard, as are octaves. They are conceptually super simple but when you get all the combinations thrown at you it stops being simple in a hurry.
I'm reasonably optimistic about the restrikes, less so about the octaves, especially when the lower note is the louder one (power chords, for instance).
> My friend Alastair Porter recently did his dissertation on tune identification, so he told me a bit about the transcription problem. It's hairy!
No kidding. Tune identification is usually done on the binary encoding of the melody (up or equal = 1, down = 0) so I can see why transcription would come in handy there.
> I think PLLs make sense when you have chirps, like vibrato or portamento. Not sure they'll help in other cases. For piano they can only hurt, I imagine.
They might be useful in verification of notes already generated.
Well, it is not so much a problem as it is a solution; FFTs are a very important part of what I've built. But the low end is very hard to do precisely using FFTs (because the bin size is large compared to the difference between individual notes), so for that I use autocorrelation, and for the verification step I would like to use something other than FFTs to get a 'second opinion' on the notes detected.
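A minimal sketch of the autocorrelation approach for the low end. The lobe-skipping heuristic (start searching where the ACF first goes negative) is my own simplification, not the poster's implementation:

```python
import numpy as np

def acf_pitch(x, fs):
    """Pitch from the lag of the largest autocorrelation peak."""
    acf = np.correlate(x, x, mode='full')[len(x) - 1:]
    # Skip the zero-lag lobe: start where the ACF first goes negative.
    neg = np.where(acf < 0)[0]
    start = neg[0] if len(neg) else 1
    lag = start + int(np.argmax(acf[start:]))
    return fs / lag

# e.g. a low A (110 Hz), whose neighbors a short FFT would lump into one bin:
t = np.arange(2048) / 8000
estimate = acf_pitch(np.sin(2 * np.pi * 110 * t), 8000)
```

The lag resolution is one sample, which at the bottom of the piano is far finer than the spacing between adjacent notes, which is exactly where the FFT bin width hurts.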
We made the first guitar tuner on a mobile phone in 2006 using the algorithm. It ran like a champ on the Motorola Razr and other ‘dumb phones’ of that time. We charged $25 for it and people loved it as it performed way better than the $25 stand alone tuners. My testing showed it was equivalent to a ~$75 model at the time.
I had a background in signals and controls and was familiar with the DFT. Goertzel was a new variant to me but ran much better on the limited resources.
I suppose that made sense because of the very limited resources, but note that it's not ideal because it doesn't estimate pitch at all; it only tells you whether there's any content inside a frequency bin you provide. A guitar tuner is expected to have an arrow pointing up/down to tell you which way you should adjust, and this algorithm can't give you that. Getting the direction of that arrow accurately can be challenging because of harmonic content (which guitars have a lot of).
These days it would make more sense to use an algorithm that takes harmonics into account and gives you an estimated fundamental frequency, like Miller Puckette's pitch detection algorithm.
Yes, I remember the harmonics were an issue. We were excited when the iPhone came out and had the resources to tune all the strings at once.
I can’t remember what we did, but we did provide sharp / flat indication. It definitely worked just like a normal tuner you would find, letting you dial it in quite precisely. Maybe I will crack open the code and look at it...
This is true. Without reaching for state-of-the-art psychoacoustics research, though, even the second autocorrelation peak will take harmonics into account. And you can compute it without the STFT latency, which matters here.
It is possible to estimate where a signal is inside the frequency bin; that's how phase vocoders work.
Couple of ways to do it. If you don't want to deal with the latency required to take the block-to-block phase difference (which somehow ended up with the $50 nomenclature "phase vocoder"), you can just do a parabolic fit on the magnitudes of the central bin and its adjacent pair. 99% of the time, the latter is sufficient. With only three points the math is super trivial [1].
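The three-point parabolic fit is only a couple of lines. Here's a sketch; the test signal and parameters are illustrative, not from the footnote:

```python
import numpy as np

def parabolic_offset(mags, k):
    """Fractional-bin offset of a spectral peak at bin k, from a
    parabola through the three magnitudes around it."""
    a, b, c = mags[k - 1], mags[k], mags[k + 1]
    return 0.5 * (a - c) / (a - 2 * b + c)

# illustrative: a 440.6 Hz tone lands between bins of a 1024-point FFT
fs, n = 8000, 1024
x = np.sin(2 * np.pi * 440.6 * np.arange(n) / fs) * np.hanning(n)
mags = np.abs(np.fft.rfft(x))
k = int(np.argmax(mags))
freq = (k + parabolic_offset(mags, k)) * fs / n  # close to 440.6
```

With a Hann window the residual bias is a small fraction of a bin, which is plenty for a sharp/flat arrow.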
For sharp/flat indication, I'd be inclined to look at the adjacent-bin magnitudes. But it's been explained to me numerous times that tuning an instrument isn't just a matter of measuring its frequency, and while I can't argue with that, I don't think I'm ever going to truly grok it.
> But it's been explained to me numerous times that tuning an instrument isn't just a matter of measuring its frequency, and while I can't argue with that, I don't think I'm ever going to truly grok it.
It depends on the instrument. If the instrument has only a few single physical elements per tone generated and the instrument is frequently tuned, then you are correct (say, a guitar): there is nothing to be done beyond setting the center frequency correctly, depending on how you want to tune.
But tuning a piano or harpsichord is a different matter. First of all, there are many different ways of tuning; for instance, there is 'stretch' tuning, which deals with inharmonicity in the higher regions.
It gets more complex when you have multiple physical resonating elements (such as strings): depending on where you are on the piano keyboard you might have one, two or three strings per note, called a 'unison' (they sound together).
The strings in a single unison not only need to sound good right after tuning, they need to continue to sound good together afterwards and that means that all of them need to degrade gracefully over time as well and that in turn means that they need to lower in tension in a similar way. Good tuners are able to ensure that the string tension is distributed equally between the 'sounding' part of the string (the middle part) and the non-sounding parts (the ends). The amount of tension on the strings and the friction of the pin block and the pins (stick slip) can make this a lot harder than you might think.
Finally, just turning the pins until the frequency matches is going to result in a piano that de-tunes very quickly, the best approach is to go a bit too high and then to slowly lower the pitch while hitting the keys quite hard so that the tension of the string gets distributed evenly.
An added complexity is that the combined 18,000 kg of tension on the frame causes the tuning of one section of the strings to affect all the others. If a piano is very far from being properly tuned you may have to make multiple passes to get the job done.
Even a guitar string has inharmonicity, so the pitch you measure by measuring the lowest resonant frequency is somewhat too low: a sine wave at that frequency will sound lower than the string does. I thought that's what CamperBob2 meant. It's certainly what pierrec was talking about, and it's an advantage of ACF-based pitch estimation algorithms.
That's an interesting question; you'd have to do it either by fret placement or pitch bending with finger pressure. In the latter case it would only happen when you were playing chords with far-apart notes.
Maybe that does give you less latency (if it doesn't require narrower bins for equivalently good results), but it also requires you to compute three frequency bins, which requires 3× the computation with Goertzel (or Minsky), though still far less than a full autocorrelation function.
An amusing thought, which surely JOS has explored: if you use a Karplus–Strong delay line as your resonating element instead of a sinusoid, you can take the overtones into account for free. Maybe that reduces to the ACF, minus some kind of normalization. I'm not sure.
Bizarre that people keep mentioning their "guitar tuners" that use Goertzel. Are you looping over every single possible frequency? You should be using real fundamental frequency estimation/pitch detection algorithms ([1], [2]) instead.
So did you indeed loop over every possible frequency?
I know I attempted a Goertzel-based pitch detector once, where I looped over frequencies and picked the one with the largest magnitude. I picked 25Hz and 4200Hz as my limits (from a piano).
This is a neat trick if you need to detect only a few tones out of the possible range offered by an FFT; it essentially allows you to compute only a single bucket with minimal overhead. As the article states, as long as you run it fewer than log2(n) times, where 'n' is the size of the FFT that would otherwise be required, you come out ahead.
Unfortunately for my application it did not give an improvement but I thought it was interesting anyway, and on embedded platforms something like this may be your only option.
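For reference, the single-bin Goertzel power computation is tiny; this is a standard textbook formulation, not necessarily what either poster used:

```python
import math

def goertzel_power(samples, freq, fs):
    """Signal power near `freq` (Hz), computing just that one bin."""
    coeff = 2 * math.cos(2 * math.pi * freq / fs)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2
```

The classic embedded application is DTMF detection, where 205 samples at 8 kHz are enough to separate the eight dialing tones with a handful of these loops.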
Interesting project. What do you think of https://github.com/sevagh/transcribe ? It seems like in your project the word transcription isn't referring to sheet music transcription?
Well, you can go to sheet music if you want to but there are plenty of packages out there that will display midi files as sheet music so that would seem to be an unnecessary complication. The core problem is polyphonic music -> midi and that is hard enough!
I will take a look at your software, it definitely sounds interesting, the question at the end of https://github.com/sevagh/transcribe/blob/master/doc/snac.md can be answered in the affirmative, the Onsets and Frames package mentioned elsewhere in this thread does just that and works very well.
I used this algorithm to detect the signal sent by heart rate belt sensors a few years ago. Those belts send a quite weakly powered 5kHz "chirp" through a coil for each heartbeat. You're supposed to detect those chirps via another coil, magnetically coupled and tuned with an LC circuit to the 5kHz resonance frequency. Through a combination of a lot of analog amplification and this algorithm I was able to implement detection on an AVR microcontroller, which would never have been possible using FFT or similar techniques.
Whoa. The capability in these little chips blows my mind. Nice to be able to eliminate the LC passives with a bit of code. If the chirps are of adequate length could probably sleep or sample other inputs a good bit of the time too.
For many simple use cases, it is vastly easier to count zero crossings per unit time. One period of a sinusoid contains a single positive-to-negative and a single negative-to-positive transition. No DSP required, though you may first bandpass the audio if you've got a target frequency range. Assumptions: the fundamental is the strongest harmonic present, negligible DC offset.
Simple example: For a 1.0 second audio sample, if the sign changes from +/- or -/+ 880 times, the frequency is 440Hz.
Strong harmonics which overpower the fundamental will fool a zero-cross analysis, resulting in, say, a double or triple frequency output if the 2nd or 3rd harmonic is strongest. The strongest signal present will overwhelm the amplitude sign of the combined signal; so it still works even in the presence of some noise or other signal.
High-amplitude noise will also skew results, producing a higher frequency output. A low-pass or band-pass filter can work wonders.
Again, zero-cross analysis is suitable for simple use cases where you are directly sampling a signal.
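The counting itself is nearly a one-liner; a sketch under the stated assumptions (fundamental is the strongest component, negligible DC offset):

```python
def zero_cross_freq(samples, fs):
    """Frequency estimate from sign changes: a sinusoid crosses zero
    twice per period, so f ~ crossings * fs / (2 * len(samples))."""
    crossings = sum(1 for a, b in zip(samples, samples[1:])
                    if (a >= 0) != (b >= 0))
    return crossings * fs / (2 * len(samples))
```

The resolution improves with the length of the window, which is the usual latency/accuracy trade in disguise.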
I wonder if you could take just the sign changes of the input signal and use that in lieu of the zero crossings and get something useful out of it, for instance, a finger print for chords or harmonics.
DSP question: how do I detect signals, such as certain words, in a series? Do I convolve a reference signal? And how do I remove such a signal from the series automatically? Context is I want to remove eye blinks and other such artifacts from an EEG.
I don't know about EEG, but a standard way to perform this type of action is to correlate your signal with a known signal (signature of a blink in your case) and if the correlation result exceeds some arbitrary threshold you set you have a match.
That is getting quite close to the way my software does chord detection, one tone at a time, so yes, that works. What I do is subtract the match from the signal and then re-run the analyzer. This works well enough for signals that are orthogonal, less well for signals that have a lot in common (such as octaves or other close multiples of the original). Correlation is super expensive numerically if you don't know the input phase, by the way: you'd have to hunt for the best match. If you can take it out of the time domain then it gets a lot better.
A bit of research suggests the best way to do what I want is to convolve the signal I want to remove with the series, and subtract that signal at the peak of the convolution, if it passes the threshold.
I also found out that convolution can be computed much more quickly than N^2 (my original concern) by using the Discrete Fourier Transform. The DFT of the convolution of two series is the pointwise product of their DFTs. So taking the inverse DFT of the pointwise product gives me the convolution at a cost of N log N instead of N^2, if I use the Fast Fourier Transform.
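That identity is easy to check numerically. A sketch, with np.convolve standing in for the N^2 direct sum and a made-up "blink" signature (the EEG data and reference shape here are purely illustrative):

```python
import numpy as np

def fft_convolve(a, b):
    """Linear convolution via the DFT: zero-pad both series to the
    full output length, multiply spectra pointwise, transform back.
    Cost is O(N log N) instead of the O(N^2) direct sum."""
    n = len(a) + len(b) - 1
    return np.fft.irfft(np.fft.rfft(a, n) * np.fft.rfft(b, n), n)

blink = np.array([0.0, 1.0, 0.5])            # hypothetical reference signature
series = np.array([1.0, 2.0, 3.0, 2.0, 1.0])  # hypothetical recording
out = fft_convolve(series, blink)
```

Note the zero-padding to length len(a)+len(b)-1: without it the DFT product gives circular, not linear, convolution.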
That's not exactly the genesis of the field but it is an interesting story about genetic algorithms, and the bit that always got me was how there were parts in there used in ways that no human designer would ever use them, and yet they were critical to its functioning.
"Dr. Thompson peered inside his perfect offspring to gain insight into its methods, but what he found inside was baffling. The plucky chip was utilizing only thirty-seven of its one hundred logic gates, and most of them were arranged in a curious collection of feedback loops. Five individual logic cells were functionally disconnected from the rest— with no pathways that would allow them to influence the output— yet when the researcher disabled any one of them the chip lost its ability to discriminate the tones. Furthermore, the final program did not work reliably when it was loaded onto other FPGAs of the same type."
Still amazing to re-read, and taking this into account it will be a long long time before we can finally say that we have cracked the DNA code and all its gory little details.
I always wanted to know what you would get if you developed the same thing, but forced it to deal with different FPGAs during its evolution. Would it still come up with something surprisingly inhuman?
Anything you like. I mean, it precisely calculates the inner product of a constant-amplitude complex exponential with the signal, so the tradeoff is the same as with the FFT, except that you have the option of using a fractional number of cycles.
In the Dercuano note I mentioned in my comment above, I go into some detail about alternative ways to window.
No, not about the same, precisely the same, except for rounding error (which is worse for Goertzel). The numerical result you get is precisely the same. If bin 27 of your FFT over some segment of some signal is 4.557 + 6.288j, then Goertzel for that frequency and that segment will also give you 4.557 + 6.288j.
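That equality is straightforward to verify. This sketch uses one common complex Goertzel formulation (run the resonator over the block, then one final zero-input step recovers the complex bin); it is not the only way to write it:

```python
import numpy as np

def goertzel_dft_bin(x, k):
    """Goertzel recurrence returning complex DFT bin k: the same number
    np.fft.fft(x)[k] gives, up to roundoff."""
    n = len(x)
    w = 2 * np.pi * k / n
    coeff = 2 * np.cos(w)
    s1 = s2 = 0.0
    for xn in x:
        s1, s2 = xn + coeff * s1 - s2, s1
    s0 = coeff * s1 - s2           # one final step with zero input
    return s0 - np.exp(-1j * w) * s1

x = np.random.default_rng(1).standard_normal(64)
bin27 = goertzel_dft_bin(x, 27)    # matches np.fft.fft(x)[27]
```

The fractional-cycle option mentioned upthread falls out of the same code by letting k be non-integer, at which point the FFT equivalence no longer applies but the recurrence still works.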
There are of course FPGA implementations of Goertzel, but for almost any DSP application with 10 MHz or less of bandwidth, a CPU is plenty. No need to deal with Xilinx’s shitty proprietary software and debugging with sigrok for something you can do on a CPU.
In software there is nothing stopping you from changing your Goertzel coefficient, and thus the frequency detected, on the fly.
Unfortunately the electronics industry has adopted the propaganda term “intellectual property” to mean “designs”, so that's what the parent is referring to: decades worth of validated designs.
Not really. After all if you can do something in 'just software' your market size is suddenly a very large multiple of what it would be if you did an FPGA version, the marginal costs are much lower and the delivery can be done over the internet.
Precisely, you can derive Minsky’s algorithm (modified for frequency detection by adding the input samples into the register you produce the output from) from applying Goertzel’s algorithm to the first backward differences of the input signal.
The derivation is in Dercuano in notes/goertzel-minsky-prefix-sums.html. The latest, and probably the penultimate, version of Dercuano is at http://canonical.org/~kragen/dercuano-20191215.tar.gz.
There are some tricks in there for windowing the computation over different timescales, too.