
It's censored data. Statisticians have been working with that for at least half a century. All they've done is reinvent the wheel, and I have no confidence that they've made something that's as good as what we already have.

Anybody who's interested in this should read up on the Kaplan-Meier estimator (https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator). If you want to see it used for a problem that's very similar to demand estimation, see Censored Exploration and the Dark Pool Problem (https://www.cis.upenn.edu/~mkearns/papers/darkpools-final.pd...).
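For flavor, here's a minimal Kaplan-Meier sketch in pure Python (hypothetical data and function name; for real work use an established library such as lifelines):

```python
def kaplan_meier(durations, observed):
    """Return the Kaplan-Meier survival curve as (time, S(t)) pairs.

    durations: time of the event or of censoring for each subject.
    observed:  True if the event was actually seen, False if censored.
    """
    n = len(durations)
    # Walk subjects in time order so the risk set shrinks correctly.
    order = sorted(range(n), key=lambda i: durations[i])
    survival, at_risk, curve = 1.0, n, []
    i = 0
    while i < n:
        t = durations[order[i]]
        deaths = censored = 0
        # Group all subjects tied at time t.
        while i < n and durations[order[i]] == t:
            if observed[order[i]]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Product-limit update: multiply by the conditional survival.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve
```

Censored subjects drop out of the risk set without dragging the curve down, which is exactly why the estimator handles censored data gracefully.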



There are elements of both censoring and causal inference here. For a given customer at checkout: if they buy, you know the demand was higher than what was displayed; if they don't, you know it was lower. But the exact demand is censored. The causal-inference aspect comes in because you can only display one set of options at checkout time, so demand in the counterfactual case (had you shown the customer a different set of display options) is completely unknowable.
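As a toy sketch of that censoring structure (names and framing hypothetical), each checkout observation only gives an interval bound on the customer's true demand, never the value itself:

```python
def demand_bounds(displayed, bought):
    """Interval implied for a customer's true demand by one checkout.

    displayed: the level/price shown to the customer.
    bought:    whether they completed the purchase.
    """
    if bought:
        # They accepted what was shown: demand is at least `displayed`.
        return (displayed, float("inf"))
    # They walked away: demand was below what was shown.
    return (0.0, displayed)
```

Every observation is an inequality, not a data point, which is what makes naive regression on observed sales misleading.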


Plenty of software has reinvented something that was done before. Even if the outcome hasn't improved significantly, how the outcome was achieved can still have plenty of value. Just as we've seen in software generally: repeatedly finding better, more efficient processes to achieve largely the same technical outcome.

Throwing the machine learning label into the mix always pushes the expected results to a higher standard.


In contrast, demand estimation is a statistical, or rather econometric, problem that targets exactly the areas ML has yet to explore: causal analysis, censoring of the data-generating process, and, related but distinct in the literature, identification and endogeneity.

The authors of this article do not show any ML; it's all mainline stats.

ML is used, even in these areas, for the things it does best, so it is a misnomer to separate the two nowadays.

But I disagree that one would expect ML to do better in this area.

Look at websites doing A/B testing, which is certainly not ML but experimental statistics.
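The workhorse there is something as plain as a two-proportion z-test. A stdlib-only sketch (hypothetical function name; assumes large enough samples for the normal approximation):

```python
import math

def ab_test_pvalue(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a difference in conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis of no difference.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability via the standard normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
```

No model fitting at all: just a sampling distribution under the null, which is the experimental-stats tradition the comment is pointing at.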


I should note that I meant people tend to expect more when they hear the words ML/AI, while that may not be the case: it could simply be a process optimization that makes the original output easier to achieve for people new to the area, or gives experts a shorter path.


This problem is more about causal inference than it is about censoring (and yes, causal inference is technically solving a particular subcase of censoring, but the problem setup is usually quite a bit different). They are trying to estimate the causal effect of presenting certain availability options to their customers.

Their particular causal inference approach appears to be simple outcome modeling, which is an OK approach (the main problem with it is that it doesn't provide good diagnostics for detecting violations of the strong ignorability assumptions necessary for the procedure to work). They could probably improve things a bit by using a doubly robust estimator.
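For concreteness, the standard doubly robust (AIPW) estimator of the average treatment effect looks like this; a pure-Python sketch where the outcome-model predictions and propensity scores are assumed to come from models fitted elsewhere:

```python
def aipw_ate(y, t, mu1, mu0, e):
    """Augmented IPW (doubly robust) estimate of the ATE.

    y:   observed outcomes
    t:   treatment indicators (0 or 1)
    mu1: outcome-model predictions under treatment
    mu0: outcome-model predictions under control
    e:   propensity scores P(T=1 | X)
    """
    total = 0.0
    for yi, ti, m1, m0, ei in zip(y, t, mu1, mu0, e):
        # Outcome-model term plus inverse-propensity correction;
        # consistent if either model (outcome or propensity) is right.
        total += (m1 - m0
                  + ti * (yi - m1) / ei
                  - (1 - ti) * (yi - m0) / (1 - ei))
    return total / len(y)
```

When the outcome model is exactly right, the correction terms vanish and you recover plain outcome modeling; when it's wrong but the propensity model is right, the weighting still rescues the estimate.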

Kaplan-Meier estimators are completely irrelevant. Those have to do with right censoring of survival distributions.




