
Let me see if I understood you correctly. Let Y be a random variable representing the stationary time series data, and let g(.) be the forecast function.

1) Log-transformation: log(Y)

2) Forecast results of transformed data: E[g(log(Y))]

3) Inversion of forecast results: exp(E[g(log(Y))])

4) However, what we really want is the expectation of the transformed-forecasted-inverted results, which is E[exp(g(log(Y)))].
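
To make the gap between (3) and (4) concrete, here is a quick numerical sketch. It is a toy setup of my own, not the article's: log(Y) is standard normal and g is taken to be the identity, so (4) reduces to E[Y].

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy stand-in: log(Y) ~ Normal(0, 1), and g is the identity,
    # so step (2) is E[log(Y)] and step (4) is E[Y].
    y = rng.lognormal(mean=0.0, sigma=1.0, size=1_000_000)
    log_y = np.log(y)                  # (1) log-transform

    fc_log = log_y.mean()              # (2) E[g(log(Y))]             ~ 0.00
    naive = np.exp(fc_log)             # (3) exp(E[g(log(Y))])        ~ 1.00
    wanted = y.mean()                  # (4) E[exp(g(log(Y)))] = E[Y] ~ 1.65

    print(naive, wanted, wanted / naive)   # ratio ~ exp(0.5): a multiplicative gap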

Jensen's inequality states that for a convex function φ,

φ(E[X]) <= E[φ(X)]

And equality is attained only if X is (almost surely) constant or φ is affine. Since neither is generally the case here, taking φ = exp and X = g(log(Y)) gives the strict inequality

exp(E[g(log(Y))]) < E[exp(g(log(Y)))]

An offset factor ε is needed to correct (3) to (4)

E[exp(g(log(Y)))] = exp(E[g(log(Y))]) * ε
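
As a concrete illustration (my own assumption, not something claimed above): if g(log(Y)) happens to be normally distributed with mean μ and variance σ², then E[exp(g(log(Y)))] = exp(μ + σ²/2) = exp(E[g(log(Y))]) * exp(σ²/2), so ε = exp(σ²/2) > 1, i.e. the familiar log-normal bias correction.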

Did I get that right? (Offset changed to multiplicative factor)



Yes, exactly. However, in this case the offset is multiplicative.


Understood. To add one more point: I'm noticing that the reason the above works the way it does is that most forecast algorithms output an expected value rather than a random variable, hence the results are E[g(log(Y))] instead of just g(log(Y)).

It strikes me that if you package the entire thing as a random variable:

Z = exp(G(log(Y)))

and use a different kind of forecast function G : Y -> Y' where Y, Y' ~ Normal, then we don't need the multiplicative factor -- which can be difficult to calculate for an arbitrary transformation. We can just take the expected value of Z, i.e. E[Z] = E[exp(G(log(Y)))]. This is not done in the article, but in theory it could be.
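
A rough sketch of how that could look (my own illustration, not the article's method; forecast_log_scale is a made-up stand-in that returns the mean and standard deviation of a Normal predictive distribution on the log scale): sample from the predictive distribution, push the samples back through exp, and average, so E[Z] absorbs the correction automatically.

    import numpy as np

    rng = np.random.default_rng(0)

    def forecast_log_scale(log_history):
        # Hypothetical distributional forecaster G: returns (mu, sigma) of a
        # Normal predictive distribution on the log scale. Stand-in only.
        return log_history.mean(), log_history.std(ddof=1)

    log_history = rng.normal(0.0, 1.0, size=10_000)     # pretend this is log(Y)
    mu, sigma = forecast_log_scale(log_history)

    # Z = exp(G(log(Y))): sample the predictive distribution, exponentiate,
    # and average -- no separate multiplicative factor needed.
    z_samples = np.exp(rng.normal(mu, sigma, size=100_000))
    print(z_samples.mean())    # ~ exp(mu + sigma**2 / 2), not exp(mu)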



