Does it matter that each token picks up a few extra milliseconds of network latency if local inference isn't fast to begin with? I don't think it does.
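For a rough sense of scale, here's a back-of-envelope sketch (all the speeds and the per-token network figure are assumptions, not measurements): even if streaming adds a few milliseconds per token, a hosted model generating at ~80 tokens/s still comes out well ahead of a local model at ~15 tokens/s.

```python
# Back-of-envelope check of the latency point above.
# All numbers are illustrative assumptions, not benchmarks.

LOCAL_TOKENS_PER_SEC = 15    # assumed local speed on consumer hardware
HOSTED_TOKENS_PER_SEC = 80   # assumed hosted-model generation speed
NETWORK_OVERHEAD_MS = 5      # assumed extra network cost per streamed token

local_ms_per_token = 1000 / LOCAL_TOKENS_PER_SEC                          # ~67 ms
hosted_ms_per_token = 1000 / HOSTED_TOKENS_PER_SEC + NETWORK_OVERHEAD_MS  # ~17.5 ms

print(f"local:  {local_ms_per_token:.1f} ms/token")
print(f"hosted: {hosted_ms_per_token:.1f} ms/token (incl. network overhead)")
```

Under those assumptions the network overhead is noise next to the gap in raw generation speed.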
The privacy argument makes some sense, if there's no telemetry leaking data.