
They’re embeddings so they’re dense. There are few things easier than dense vector similarity.
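To make the "easy" part concrete, here is a minimal sketch of brute-force dense similarity search, assuming cosine similarity over small in-memory vectors. The helper names `cosine` and `top_k` and the toy vectors are hypothetical illustrations, not anything from the system under discussion.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, corpus, k=2):
    """Return indices of the k corpus vectors most similar to the query."""
    ranked = sorted(
        range(len(corpus)),
        key=lambda i: cosine(query, corpus[i]),
        reverse=True,
    )
    return ranked[:k]

# Toy data (assumed): three 3-dimensional "embeddings".
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
]
query = [1.0, 0.0, 0.0]
best = top_k(query, corpus, k=2)
```

In practice this would probably be a single matrix-vector product in a numerical library; the pure-Python loop just shows how little logic is involved.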


Embeddings for retrieval don't have to be dense. It is not unheard of to transform the raw embeddings to optimize them for retrieval, e.g., through binarization or hashing.
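As a rough sketch of what binarization can look like, assuming the simplest scheme (keep only the sign of each dimension and compare codes by Hamming distance): the function names and the toy vectors below are hypothetical, and real systems may use learned or more elaborate hashing.

```python
def binarize(vec):
    """Pack the sign bits of a dense vector into one int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two binary codes."""
    return bin(a ^ b).count("1")

# Toy data (assumed): three 4-dimensional embeddings.
docs = [
    [0.8, -0.1, 0.3, -0.7],
    [0.7, -0.2, 0.4, -0.6],
    [-0.5, 0.9, -0.3, 0.2],
]
codes = [binarize(d) for d in docs]
query_code = binarize([0.9, -0.3, 0.1, -0.5])

# Nearest document by Hamming distance; ties resolve to the lowest index.
nearest = min(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
```

The trade-off is precision for speed and memory: each dimension shrinks from a float to one bit, and the comparison becomes an XOR plus a bit count.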


I was more making a distinction between embeddings and bag-of-words representations, which are very, very sparse matrices. The embedding dimensionality will not be anywhere near as high, so this level of sparsity is a minor inconvenience.

Edit: also CPUs for this, yikes…



