It's a hard problem. I'm prototyping this here [1]. Any user can tweak or vote on transcriptions, so my goal is to use the user annotations to help train models and make it better.
I was wondering if you could estimate what it would cost to have always-on recording of all these radio conversations, plus the cost of running the speech-to-text models and the cost of labeling the data.
I think having these rough estimates will make donations easier for people.
I've got a year+ of the Ohio MARCS-IP site in Hamilton County Ohio recorded. Let me know if you need some data -- I'd be more than happy to get you the dump.
(trunk-recorder + rdio scanner).
The UI is https://cvgscan.iwdo.xyz for the live stuff, but let me know if you're interested in the data -- my email is in my profile.
Great question! Unfortunately the long-term costs aren't clear yet. Right now I'm using Google Speech as a bootstrapping technique, but that's prohibitively expensive to run long term.
I think once my models are viable enough to do this at scale, the cost will basically be the cost of running a dedicated server per N streams -- say $100-300/mo, where N is roughly at least 100 concurrent streams per server, so on the order of $1-3/mo per stream. I'll know this better in "stage 2", where I'm attempting to scale this up. It's also a fairly distributable problem, so I can look into doing it folding@home style, or even have the stream's originator run transcription in some cases to keep costs down.
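For a rough sense of the gap between the two approaches, here's a back-of-envelope sketch. The $0.02/min hosted-API rate is an illustrative assumption (not a quoted price from any provider); the server figures are the rough ranges above:

```python
# Rough monthly cost comparison for one always-on audio stream.
# Assumptions (illustrative, not quoted prices): hosted speech API
# at ~$0.02/min; dedicated server at $100-300/mo handling ~100
# concurrent streams.

MINUTES_PER_MONTH = 60 * 24 * 30  # 43,200 min for a 24/7 stream

def hosted_api_cost(rate_per_min: float) -> float:
    """Monthly cost of transcribing one always-on stream via a hosted API."""
    return rate_per_min * MINUTES_PER_MONTH

def self_hosted_cost(server_per_month: float, streams_per_server: int) -> float:
    """Amortized monthly cost per stream on a shared dedicated server."""
    return server_per_month / streams_per_server

print(f"hosted API:  ~${hosted_api_cost(0.02):,.0f}/stream/mo")
print(f"self-hosted: ${self_hosted_cost(100, 100):.2f}"
      f"-${self_hosted_cost(300, 100):.2f}/stream/mo")
```

Even at an optimistic per-minute API rate, an always-on stream runs into hundreds of dollars a month hosted, versus single digits amortized on your own hardware -- which is why the hosted API only makes sense for bootstrapping.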
[1] https://feeds.talonvoice.com
Repo is here if you need to report (or fix) bugs in the webapp: https://github.com/lunixbochs/feeds
If you want to help with development, reach out and I can onboard + give some test data.