You're going to want to optimize the hell out of it and ensure it's correct, then never revisit it.
Balderdash. Claptrap. Codswallop. Poppycock. What if you want to revisit it to make it run on parallel processors? Optimize it for a new CUDA architecture or caching scheme? A new instruction set that handles sparse matrices better? Code lives forever, at every level.
For this kind of linear algebra, you're going to have a different routine for a sparse matrix (dcoomm), a symmetric one (dsymm), a triangular one (dtrmm), a symmetric banded matrix-vector product (dsbmv), or whatever. Hopefully those are separate functions from your dense one (dgemm), which is what I'm talking about. You'll also have a different function for CUDA (which exists, in cuBLAS) than for multi-core (in ScaLAPACK? PLASMA?) than for single-CPU (in BLAS), which is what I'm talking about. General matrix multiply in this sense isn't "all the different matrix multiplies." It's a function with one very specific purpose, and yeah, you don't really need to revisit it every decade.
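For what it's worth, here's roughly what calling that dense routine looks like through the CBLAS interface; a minimal sketch assuming some installed BLAS such as OpenBLAS (compile with something like cc gemm.c -lopenblas):

    #include <stdio.h>
    #include <cblas.h>   /* C interface to whatever BLAS you link against */

    int main(void) {
        /* dgemm computes C = alpha*A*B + beta*C and nothing else.
           Here A is 2x3, B is 3x2, C is 2x2, all row-major. */
        double A[6] = { 1,  2,  3,
                        4,  5,  6 };
        double B[6] = { 7,  8,
                        9, 10,
                       11, 12 };
        double C[4] = { 0 };

        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    2, 2, 3,      /* M, N, K */
                    1.0, A, 3,    /* alpha, A, leading dimension of A */
                    B, 2,         /* B, leading dimension of B */
                    0.0, C, 2);   /* beta, C, leading dimension of C */

        printf("%g %g\n%g %g\n", C[0], C[1], C[2], C[3]);  /* 58 64 / 139 154 */
        return 0;
    }

The interface is the point: dsymm, dtrmm, dsbmv, and the cuBLAS version are each their own entry point with their own contract, and the dgemm contract itself hasn't needed to change. Only the implementation behind it gets re-tuned per platform.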