You are not wrong! An entire subclass of anomaly detection can basically be reduced to: forecast the next data point and then measure the forecast error when the data point arrives.
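A minimal sketch of that forecast-then-score idea, assuming a simple exponentially weighted one-step forecast. The function name, `alpha`, `threshold`, and `warmup` are illustrative choices, not anything from a particular library:

```python
import math

def forecast_error_anomalies(series, alpha=0.3, threshold=3.0, warmup=5):
    """Flag indices whose one-step forecast error is unusually large.

    Assumption: the "forecast" is just an exponentially weighted average of
    past points, and "unusually large" means more than `threshold` times the
    running RMS of past forecast errors.
    """
    forecast = series[0]   # forecast for the next point
    err_var = 0.0          # running mean of squared forecast errors
    flagged = []
    for i, x in enumerate(series[1:], start=1):
        err = x - forecast  # forecast error once the point arrives
        limit = threshold * math.sqrt(err_var)
        if i > warmup and err_var > 0 and abs(err) > limit:
            flagged.append(i)  # do not learn from the anomalous point
        else:
            err_var = (1 - alpha) * err_var + alpha * err * err
            forecast = forecast + alpha * err
    return flagged

readings = [10, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 25.0, 10.1, 9.9]
print(forecast_error_anomalies(readings))  # the spike at index 8 is flagged
```

Any forecaster could be swapped in for the exponential average; the scoring step stays the same.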
Well, it doesn’t really require a forecast: variance-based anomaly detection doesn’t assert what the next point will be, only that its change from the previous point stays within some band. Such models usually can’t be used to make a forecast beyond the bounds of that band.
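Roughly what a band on the change could look like, as a sketch only. It assumes the band is `k` times the standard deviation of the last `window` step sizes; the names and defaults are made up for illustration:

```python
from collections import deque
import statistics

def band_anomalies(series, window=5, k=3.0):
    """Flag indices where the change from the previous point leaves the band.

    Assumption: the band is k times the population standard deviation of the
    most recent `window` steps. No value is ever predicted for the next point.
    """
    recent_steps = deque(maxlen=window)
    flagged = []
    for i in range(1, len(series)):
        step = series[i] - series[i - 1]
        if len(recent_steps) == window:
            band = k * statistics.pstdev(recent_steps)
            if band > 0 and abs(step) > band:
                flagged.append(i)
                continue  # keep the outlier out of the band estimate
        recent_steps.append(step)
    return flagged

readings = [10, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 25.0, 10.1, 9.9]
print(band_anomalies(readings))  # [8, 9]
```

Because the band constrains the change rather than the level, both the jump up to the spike and the drop back down get flagged.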
If you need to detect anomalies as soon as they occur, that seems right. But if you can afford to detect them later, you can also combine back-casting with forecasting, as in Kalman smoothing.
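For the after-the-fact case, here is a hedged sketch using a one-dimensional local-level model: a forward Kalman filter pass followed by a backward Rauch-Tung-Striebel smoothing pass. The noise variances `q` and `r`, the prior variance `p0`, and the `k * sqrt(r)` cutoff are assumptions picked for illustration, not tuned values:

```python
import math

def kalman_smooth(series, q=0.01, r=1.0, p0=1.0):
    """Smoothed level estimates for a random-walk-plus-noise model.

    q: assumed process noise variance, r: assumed measurement noise variance.
    """
    n = len(series)
    x_pred, p_pred = [0.0] * n, [0.0] * n
    x_filt, p_filt = [0.0] * n, [0.0] * n
    x_prev, p_prev = series[0], p0
    # Forward pass: uses only past and present observations.
    for t, y in enumerate(series):
        x_pred[t] = x_prev
        p_pred[t] = p_prev + q if t > 0 else p_prev
        gain = p_pred[t] / (p_pred[t] + r)
        x_filt[t] = x_pred[t] + gain * (y - x_pred[t])
        p_filt[t] = (1 - gain) * p_pred[t]
        x_prev, p_prev = x_filt[t], p_filt[t]
    # Backward pass: folds later observations into each estimate.
    x_smooth = x_filt[:]
    for t in range(n - 2, -1, -1):
        c = p_filt[t] / p_pred[t + 1]
        x_smooth[t] = x_filt[t] + c * (x_smooth[t + 1] - x_pred[t + 1])
    return x_smooth

def smoothed_anomalies(series, k=3.0, q=0.01, r=1.0):
    """Flag points that sit far from the smoothed level."""
    smooth = kalman_smooth(series, q=q, r=r)
    limit = k * math.sqrt(r)
    return [t for t, (y, s) in enumerate(zip(series, smooth))
            if abs(y - s) > limit]

readings = [10, 10.5, 9.8, 10.2, 10.1, 9.9, 10.3, 10.0, 25.0, 10.1, 9.9]
print(smoothed_anomalies(readings))
```

The trade-off is latency: each point is judged against data on both sides of it, so nothing can be flagged until the later points have arrived.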
I'm not really smart in these areas, but it feels like forecasting and anomaly detection are pretty related. I could be wrong though.