
It'd be something like conjugate gradient, not SGD, unless A itself is too big to fit in memory.


The method John D. Cook is referring to is something like conjugate gradient, not SGD?

I’m not too familiar with CG. What clues you in that he’s thinking more CG than SGD?


It is just "what you do". If it is a small problem, the default is a QR decomposition of A. If you are worried about speed, do a Cholesky decomposition of A'A. If the problem is big (usually because A is sparse), then you do conjugate gradient, because fill-in will bite you with a direct method. If it is really, really big (A can't fit in memory), then it isn't clear what the "thing to do" is. It is probably "sketching", but in ML/neural-network land everyone just does SGD, which you can think of as a Monte Carlo estimate of the gradient (which, for a linear least-squares problem, is A'(Ax - b)). Maybe sketching and SGD are equivalent (or one approximates the other). "What you do" is based on convergence and stability characteristics.
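A minimal sketch (not from the thread) of the four options above applied to the same least-squares problem min ||Ax - b||^2: QR on A, Cholesky on A'A, conjugate gradient on the normal equations, and plain row-by-row SGD. Problem size, step size, and epoch count are illustrative assumptions.

    # Assumed illustrative comparison of the solvers mentioned above.
    import numpy as np
    from scipy.sparse.linalg import cg

    rng = np.random.default_rng(0)
    m, n = 2000, 50
    A = rng.standard_normal((m, n))
    b = rng.standard_normal(m)

    # 1. QR decomposition of A (the small-problem default).
    Q, R = np.linalg.qr(A)
    x_qr = np.linalg.solve(R, Q.T @ b)

    # 2. Cholesky of A'A (cheaper, but squares the condition number).
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA)
    x_chol = np.linalg.solve(L.T, np.linalg.solve(L, Atb))

    # 3. Conjugate gradient on the normal equations A'A x = A'b.
    #    Only needs matrix-vector products, so a sparse A stays sparse.
    x_cg, info = cg(AtA, Atb)

    # 4. SGD: each step uses a single row of A, i.e. a Monte Carlo
    #    estimate of the full gradient A'(Ax - b).
    x_sgd = np.zeros(n)
    step = 1e-3                      # assumed step size
    for epoch in range(50):          # assumed epoch count
        for i in rng.permutation(m):
            g = (A[i] @ x_sgd - b[i]) * A[i]   # gradient from one row
            x_sgd -= step * g

    for name, x in [("qr", x_qr), ("cholesky", x_chol), ("cg", x_cg), ("sgd", x_sgd)]:
        print(name, np.linalg.norm(A @ x - b))

The direct methods and CG land on essentially the same residual; the SGD iterate only gets close, which is the trade you accept when A is too big to factor or even to hold in memory.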


Helpful. It makes sense that CG maintains sparsity. I didn’t realize it saw use in practice.




