Great question! Unfortunately the long term costs aren't clear yet, right now I'm using google speech as a bootstrapping technique, but that is prohibitively expensive to run long term.
I think once my models are viable enough to do this at scale, the cost will be basically the cost of running a dedicated server per N streams. So $100-300/mo per N streams? Where N could roughly be at least 100 concurrent streams per server. I will know this better in "stage 2" where I'm attempting to scale this up. It's also a fairly distributed problem so I can look into doing it folding@home style, or even have the stream's originator running transcription in some cases to keep costs down.
I think once my models are viable enough to do this at scale, the cost will be basically the cost of running a dedicated server per N streams. So $100-300/mo per N streams? Where N could roughly be at least 100 concurrent streams per server. I will know this better in "stage 2" where I'm attempting to scale this up. It's also a fairly distributed problem so I can look into doing it folding@home style, or even have the stream's originator running transcription in some cases to keep costs down.