It's a hard problem. I'm prototyping this here [1]. Any user can tweak or vote on transcriptions, so my goal is to use the user annotations to help train models and make it better.
I was wondering if you could estimate what it would cost to have always-on recording of all these radio conversations, plus the cost of running the speech-to-text models and the cost of labeling the data.
I think having these rough estimates will make donations easier for people.
I've got a year+ of the Ohio MARCS-IP site in Hamilton County Ohio recorded. Let me know if you need some data -- I'd be more than happy to get you the dump.
(trunk-recorder + rdio scanner).
The UI is https://cvgscan.iwdo.xyz for the live stuff, but let me know if you're interested in the data -- my email is in my profile.
Great question! Unfortunately the long-term costs aren't clear yet. Right now I'm using Google Speech as a bootstrapping technique, but that's prohibitively expensive to run long term.
I think once my models are viable enough to do this at scale, the cost will basically be the cost of running a dedicated server per N streams -- say $100-300/mo, where N is roughly at least 100 concurrent streams per server, so on the order of $1-3/mo per stream. I'll know this better in "stage 2", where I'm attempting to scale this up. It's also a fairly distributable problem, so I can look into doing it folding@home style, or even have the stream's originator run transcription in some cases to keep costs down.
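For a rough sense of the gap between the two approaches, here's a back-of-envelope sketch. The $0.02/min hosted-API rate is an illustrative assumption (not a quoted price from any provider); the server figures are the rough ranges above:

```python
# Rough monthly cost comparison for one always-on audio stream.
# Assumptions (illustrative, not quoted prices): hosted speech API
# at ~$0.02/min; dedicated server at $100-300/mo handling ~100
# concurrent streams.

MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200 min for a 24/7 stream

def hosted_api_cost(rate_per_min: float) -> float:
    """Monthly cost of transcribing one always-on stream via a hosted API."""
    return rate_per_min * MINUTES_PER_MONTH

def self_hosted_cost(server_per_month: float, streams_per_server: int) -> float:
    """Amortized monthly cost per stream on a shared dedicated server."""
    return server_per_month / streams_per_server

print(f"hosted API:  ~${hosted_api_cost(0.02):,.0f}/stream/mo")
print(f"self-hosted: ${self_hosted_cost(100, 100):.2f}"
      f"-${self_hosted_cost(300, 100):.2f}/stream/mo")
```

Even at an optimistic per-minute API rate, an always-on stream runs into hundreds of dollars a month hosted, versus single digits amortized on your own hardware -- which is why the hosted API only makes sense for bootstrapping.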
[1] https://feeds.talonvoice.com
Repo is here if you need to report (or fix) bugs in the webapp: https://github.com/lunixbochs/feeds
If you want to help with development, reach out and I can onboard + give some test data.