That is getting quite close to the way my software does chord detection one tone at the time, so yes, that works. What I do is subtract the match from the signal and then re-run the analyzer. This works well enough for signals that are orthogonal, less well for signals that have a lot in common (such as octaves or other close multiples of the original). Correlation is super expensive numerically if you don't know the input phase by the way, you'd have to hunt for the best match, if you can take it out of the time domain then it will get a lot better.
A bit of research suggests to me the best way to do what I want is convolve the signal to remove with the series, and subtract the signal at the peak of the convolution, if it passes the threshold.
Also found out that convolution can be calculated much more quickly than N^2 (my original concern) by using the Discrete Fourier Transform. The DFT of convolved series is the same as pointwise multiplying the DFT of each series. So, taking the inverse DFT of the pointwise multiplied series' DFTs will give me the convolution with a cost of N log N instead of N^2, if I use the Fast Fourier Transform.