It makes sense that low-information terms get down-weighted when searching without any context. If your index models the context around each term, though, you can get better results even from a low-information query.
I think...I'm kind of shooting from the hip here, but it feels related to context modeling in lossless compression schemes like CABAC and PPM.
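To make that a bit more concrete, here's a toy sketch of one way "modeling context around terms" could look at the index level: keep biword postings alongside single-term postings and compare the IDFs. The corpus, names, and the biword approach below are purely my own illustration, not how any particular search engine does it:

```python
from collections import defaultdict
import math

# Hypothetical toy corpus; a real index would build these postings
# from the full document collection.
docs = {
    1: "the who played a concert in the park",
    2: "who is the author of the book",
    3: "the park is open to the public",
}

term_df = defaultdict(set)    # term -> set of doc ids containing it
biword_df = defaultdict(set)  # (term, next term) -> set of doc ids

for doc_id, text in docs.items():
    tokens = text.split()
    for i, tok in enumerate(tokens):
        term_df[tok].add(doc_id)
        if i + 1 < len(tokens):
            biword_df[(tok, tokens[i + 1])].add(doc_id)

N = len(docs)

def idf(df):
    """Plain inverse document frequency, in bits."""
    return math.log2(N / df)

# Individually, "the" and "who" are near-useless query terms...
print(idf(len(term_df["the"])), idf(len(term_df["who"])))
# ...but the pair ("the", "who") occurs in only one document, so the
# context-aware posting carries real information, much like a PPM
# context model assigning a symbol a different probability per context.
print(idf(len(biword_df[("the", "who")])))
```

Same spirit as the compression analogy: a symbol (term) that is nearly free of information on its own can be highly informative once you condition on its neighbors.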
Could you overcome the stop-word problem with some sort of Bayesian phrase matching over learned hidden states?
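Something like an HMM is probably the simplest version of that idea: score the whole query phrase by marginalizing over hidden states with the forward algorithm. Everything below (the tiny vocabulary, the hand-picked probabilities) is made up for illustration; real parameters would have to be learned from the corpus, e.g. with Baum-Welch:

```python
import numpy as np

# Illustrative-only parameters for a 2-state HMM over a 4-word vocabulary.
vocab = {"the": 0, "who": 1, "played": 2, "park": 3}

start = np.array([0.6, 0.4])              # P(state_0)
trans = np.array([[0.7, 0.3],             # P(state_t | state_{t-1})
                  [0.4, 0.6]])
emit = np.array([[0.5, 0.1, 0.1, 0.3],    # P(word | state)
                 [0.1, 0.4, 0.4, 0.1]])

def phrase_log_prob(words):
    """log P(words) under the HMM, summing over all hidden state paths
    (the forward algorithm)."""
    obs = [vocab[w] for w in words]
    alpha = start * emit[:, obs[0]]          # forward variables at t = 0
    for o in obs[1:]:
        alpha = (alpha @ trans) * emit[:, o] # predict next state, then update on the word
    return np.log(alpha.sum())

# A stop-word-heavy phrase still gets a usable, context-sensitive score,
# because the score depends on the whole sequence rather than per-term rarity.
print(phrase_log_prob(["the", "who"]))
print(phrase_log_prob(["the", "park"]))
```

That's the "Bayesian" part as I mean it here: you never commit to one hidden-state path, you sum over all of them, so stop words contribute through the sequence structure instead of being thrown away.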