
I just need their API to be faster. 15-30 seconds per request using 4o-mini isn't good enough for responsive applications.


You should try Azure: it offers dedicated capacity, which with OpenAI is typically a very expensive "call our sales team" feature.


The new Realtime WebSocket API appears to send back responses in under a second. It might be just what you want.


Yes, and you can use it in text-to-text mode if you want. A key benefit for turn-based usage (where the conversation runs back and forth between user and assistant) is that you only need to send the incremental new input message for each generation. This beats "prompt caching" on the chat completions API, which is basically a pricing optimization: here it's an actual technical advantage that uses less upstream bandwidth.
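A minimal sketch of that incremental pattern, assuming the event names documented for OpenAI's Realtime API (`session.update`, `conversation.item.create`, `response.create`); the WebSocket connection and send calls are omitted, so this only shows the payloads each turn would transmit:

```python
import json

def session_config_event() -> str:
    """Sent once after connecting: restrict the session to text output."""
    return json.dumps({
        "type": "session.update",
        "session": {"modalities": ["text"]},
    })

def user_turn_events(text: str) -> list[str]:
    """Events for one incremental user turn: append only the NEW
    message to the server-side conversation, then request a response.
    The earlier history is not re-sent."""
    return [
        json.dumps({
            "type": "conversation.item.create",
            "item": {
                "type": "message",
                "role": "user",
                "content": [{"type": "input_text", "text": text}],
            },
        }),
        json.dumps({"type": "response.create"}),
    ]

# Each turn uploads just the new message. With the chat completions API,
# by contrast, every request re-sends the whole prompt; prompt caching
# discounts those tokens but the bytes still travel upstream.
events = user_turn_events("What's the weather like today?")
```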


That is odd. Longest I’ve experienced in my use of it is a few seconds.


That doesn’t match my experience at all, and I use it a lot.



