So basically, if you want to estimate three or more averages at once, there is a better way to go than treating each sample average on its own: you shrink them all toward a common value. Sort of like how, when computing the sample standard deviation, you divide by n-1 instead of n. Anyway, that's the James-Stein estimator, and the authors apply it to PCA dimensionality reduction: their technique de-biases and improves the leading eigenvector.
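To make the shrinkage idea concrete, here is a minimal sketch of the classic James-Stein estimator in its "shrink toward the grand mean" form with the positive-part correction. This is the textbook estimator, not the paper's eigenvector correction. The function name `james_stein` is made up for illustration, and it assumes every sample average has the same known noise variance `sigma2`, which you would normally have to estimate.

```python
import numpy as np

def james_stein(z, sigma2):
    """Shrink p sample averages toward their grand mean.

    Assumes (for illustration) that each z[i] is roughly Normal(theta[i], sigma2)
    with a common, known variance sigma2. This form needs at least 4 averages.
    """
    z = np.asarray(z, dtype=float)
    p = z.size
    if p < 4:
        raise ValueError("shrinking toward the grand mean needs at least 4 averages")
    grand_mean = z.mean()
    resid = z - grand_mean
    ss = np.sum(resid ** 2)
    if ss == 0.0:
        # All averages are identical, so there is nothing to shrink.
        return z.copy()
    # Positive-part shrinkage factor: clamp at zero so we never overshoot
    # past the grand mean.
    shrink = max(0.0, 1.0 - (p - 3) * sigma2 / ss)
    return grand_mean + shrink * resid

# Five noisy averages with unit noise variance.
print(james_stein([1.0, 2.0, 3.0, 4.0, 5.0], sigma2=1.0))
```

The more the averages agree with each other relative to the noise, the harder they get pulled together; if they are widely spread, the estimator leaves them almost untouched.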
Ctrl-f for "fig. 3" to see their results in action.
This looks like it might affect a wide range of fields. I appreciate the concrete examples (batting averages, finance / Markowitz) as well.
PCA is an iterative algo. Once you build the leading vector, you subtract its contribution from your data and then build the next.
Do you know by any chance why you can't use this new method recursively for building the whole SVD basis? (I haven't read the paper carefully yet. It's a bit at the limit of my understanding.)
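For reference, the "build the leading vector, subtract, repeat" view of PCA can be sketched with power iteration plus deflation. This is plain PCA, not the paper's corrected estimator, and it says nothing about whether their correction can be applied at each step. The names `leading_eigvec` and `pca_by_deflation` are hypothetical, and in practice you would just call an SVD routine.

```python
import numpy as np

def leading_eigvec(C, iters=500, seed=0):
    """Power iteration: leading eigenvector and eigenvalue of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(C.shape[0])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = C @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break  # C has no remaining variance in this direction
        v = w / norm
    return v, float(v @ C @ v)  # Rayleigh quotient gives the eigenvalue

def pca_by_deflation(X, k):
    """Top-k principal directions of X (rows are observations), one at a time."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / (Xc.shape[0] - 1)  # sample covariance
    comps, variances = [], []
    for _ in range(k):
        v, lam = leading_eigvec(C)
        comps.append(v)
        variances.append(lam)
        # Deflation: remove the part of the covariance explained by v.
        C = C - lam * np.outer(v, v)
    return np.array(comps), np.array(variances)

# Synthetic data with clearly separated variances along each axis.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 4)) * np.array([5.0, 3.0, 1.5, 0.5])
comps, variances = pca_by_deflation(X, k=2)
print(variances)
```

Each later vector is computed from a matrix that depends on the earlier ones, so any error in the leading eigenvector carries forward into the rest of the basis.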
So many things like this come out but I've rarely found a way to actually implement any of these things, so they don't become actual learnings or takeaways. Skimming over the paper it just doesn't seem applicable.
Perhaps someone has a list of interesting innovations from recent research we can actually apply at work with data, programming or such?
Very cool to see researchers moving the needle on something as fundamental as PCA.
A few caveats for the "impact" in finance: PCA is an oft-used tool during the initial research phase. However, once a model, whether on the alpha or risk side, reaches production, marginal improvements to the leading eigenvector will likely be a rounding error compared to other confounding issues. And that's assuming PCA survives past the basic first-pass approximation phase of model building at all.
There are a number of better and more analytically useful models which I would expect to be used instead.
I can already hear some people coming from less quantitative firms or groups arguing PCA is used extensively. However, those same firms and groups rarely adhere strictly to model outputs, layering on discretionary traders' views, which are often not quantified.
I was never on the sell-side responsible for derivatives pricing, but they would almost certainly never include PCA in anything they do, unless perhaps it was a client request?
This is interesting and important to know about but increasingly I feel like stochastic uncertainty isn't the main source of "random/error variance", it's more like real but uninteresting heterogeneity. I can see using this in practice but it's probably a minor part of the whole picture. I don't mean to sound dismissive; I think it's more that I wish there was more focus on other types of uncertainty as well.
https://www.pnas.org/doi/10.1073/pnas.2207046120