
It's fairly easy to use and not very compute-intensive (it can run even on a small-ish CPU VM). The embeddings tend to perform well, and you avoid sending your data to a third party. There are also models on the Hugging Face Hub (HF Hub) fine-tuned for particular domains, which may give better embeddings for content in that domain.
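For anyone wondering what "using the embeddings" looks like in practice, here is a minimal sketch of the usual next step: rank documents by cosine similarity to a query. It is pure Python with toy 3-dimensional vectors so it runs without downloads. The assumption, labeled in the comments, is that real vectors would come from a SentenceTransformers model's `encode()` call; the model name mentioned there is only an example, and the helper names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: list[list[float]], k: int = 2) -> list[int]:
    """Return the indices of the k documents most similar to the query, best first."""
    scores = [(cosine_similarity(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scores.sort(reverse=True)  # highest similarity first
    return [i for _, i in scores[:k]]


# Toy vectors standing in for real embeddings (assumption: in practice these
# would come from something like SentenceTransformer("all-MiniLM-L6-v2").encode(texts);
# that model name is just an example).
docs = [
    [0.9, 0.1, 0.0],  # doc 0
    [0.0, 1.0, 0.0],  # doc 1
    [0.7, 0.7, 0.1],  # doc 2
]
query = [1.0, 0.2, 0.0]

print(top_k(query, docs, k=2))  # prints [0, 2]
```

At scale you would normally hand this step to a vector index rather than a Python loop, but the ranking logic is the same.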


Just to add to this: a great resource is the Massive Text Embedding Benchmark (MTEB) leaderboard, which you can use to find good models to evaluate. Many open models that you can use with SentenceTransformers outperform, e.g., OpenAI's text-embedding-ada-002, which is currently ranked #46 for retrieval.

https://huggingface.co/spaces/mteb/leaderboard
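Leaderboard rankings are on public benchmarks, so it can be worth checking a shortlist of candidates on your own queries before picking one. A minimal sketch, assuming you already have, for each candidate model, a ranked list of document IDs per query (from a similarity search over that model's embeddings) plus a small hand-labeled set of relevant documents. Recall@k is used here only because it is simple; it is not necessarily the metric MTEB reports. All IDs and the `recall_at_k` helper are hypothetical.

```python
def recall_at_k(
    ranked: dict[str, list[str]],
    relevant: dict[str, set[str]],
    k: int,
) -> float:
    """Average over queries of the fraction of relevant docs found in the top k."""
    total = 0.0
    for qid, rel in relevant.items():
        retrieved = set(ranked.get(qid, [])[:k])  # top-k results for this query
        total += len(retrieved & rel) / len(rel)
    return total / len(relevant)


# Hypothetical labels: which documents are relevant for each query.
relevant = {"q1": {"d1"}, "q2": {"d3", "d4"}}

# Hypothetical rankings produced by two candidate embedding models.
model_a = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d1", "d4"]}
model_b = {"q1": ["d2", "d1", "d5"], "q2": ["d5", "d6", "d3"]}

print(recall_at_k(model_a, relevant, k=2))  # prints 0.75
print(recall_at_k(model_b, relevant, k=2))  # prints 0.5
```

Even a few dozen labeled queries from your own domain can show whether a highly ranked model actually suits your content.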


I see, thanks for the clarifications.

I presume that if your customers are enterprise companies, you might opt for this library rather than sending their data to OpenAI etc.

And you can get more customisation/fine-tuning from this library too.
