Hacker News
eBay’s Fast Billion-Scale Vector Similarity Engine (ebayinc.com)
66 points by jabo on May 5, 2023 | 24 comments


Nice to see eBay invest further in its search side.

This paper is too dense for me to read through at this moment, but eBay must be a real disappointment for its shareholders:

Despite its head start as an online retail platform open to anyone, eBay stock has only doubled in 10 years, or in other words, added $12b in equity value (currently $24b). Imagine that: a successful tech company with a real product in a market that’s going stratospheric (online shopping) has lagged the S&P 500.

Meanwhile Amazon has octupled, which works out to adding $864b in market value.

In other words, Amazon has added 36 eBays’ worth of equity to itself in the last 10 years.

Ok, I haven’t looked at corporate debt to redo these numbers on an enterprise-value basis, but that could make the comparison even worse for eBay: it’s asset-light, with little physical infrastructure compared to Amazon.

eBay’s fees have gone through the roof, so I suspect its volume has been decreasing.


Ebay used to have an engineering limit on the number of items in their indexes, so they couldn’t grow and had to increase fees instead to keep the item count down. I don’t know if they still have that limit; I haven’t been paying attention. It meant they ended up with a focus on very high-margin crap. Amazon is now going the same way, because they can and it makes a lot of money.

Maybe with more listings eBay can lower their fees so that low-margin items are able to be sold. And maybe with some competition between the two there might be a reduction in fees and a return to quality. Sadly, I don’t think eBay is competing effectively: their website still sucks, and I can’t get anything resembling an invoice for taxes. Maybe it’s there, but I haven’t seen it; I don’t use eBay very often.


>Ebay used to have an engineering limit on the number of items in their indexes so they couldn’t grow and had to increase fees instead to keep the item count down

I find this incredibly hard to believe. The hard limit on the number of total listings was deemed such an intractable problem to solve that rather than focus all engineering efforts for a few months on raising it, they just threw up their hands, didn’t even attempt to solve it, and tried to increase revenue solely through higher listing fees?


I find it very credible. Early eBay had very high fees on photos. They also would purge old auctions within about 90 days; they still purge old auctions, which clearly would have some SEO value. The whole way eBay Motors was operated as essentially a separate site lends credibility as well.


That just means they were trying to cut costs (by reducing their storage footprint) and increase revenue (by charging a premium for photos) at the same time. I guess the brass at eBay deemed the commercial value of archiving old listings to be worth less than the cost of storing them. Remember, storage used to be very expensive, and no amount of engineering effort could decrease the cost of server racks and hard drives.

eBay Motors is separate because the process of buying a car is very different from buying knickknacks. The number of listings on eBay Motors is tiny compared to the main site, so it doesn’t even make sense that they’d split it off specifically to split up their listing database.

None of what you mention is evidence of a technical limitation.


And making a website that doesn’t suck shouldn’t be hard either… the eBay corporate culture is really bad; most blame the Meg Whitman era for that. Sometimes companies get so bad that even very simple things become insurmountable. Auction items have a few extra dimensions of complexity over normal search: the value changes quite dramatically over time, and mostly at the last minute, so it’s quite time-sensitive. I know people who work there; they’ve checked out and are just collecting a paycheck, and I don’t blame them.


Amazon’s growth has been one of the most astonishing things this decade. It feels like they broke all the rules and managed to absolutely dominate. I remember in the 2000s the dominant paradigm was to be a platform, since software companies “can’t do infra”. The idea was to be the middleman that Walmart, Target, etc. use, not to directly compete with them. Amazon did the opposite: competed hard and won.

Another anecdote: in a sea of tech companies that try to attract talent with perks, benefits, and even high salaries, Amazon had none. Amazon prides itself on making a table out of a door to save money (no joke). But arguably Amazon employees made bank because of its stock growth, definitely more compared to Google, Apple, and Meta, maybe similar to Microsoft. Finally, of all companies, Amazon is the only big tech I know that veered off its main product and still dominated. AWS came completely out of left field for Amazon; it makes me optimistic that Amazon might become a juggernaut in healthcare too (they definitely are trying).


One of the best things about eBay compared to other marketplaces like Amazon and AliExpress is that its search operators include not just AND (space) but also OR (comma-separated values inside parentheses) and NOT (minus). More importantly, eBay’s search engine respects the supported operators, denoting which items are search results and which are suggested/similar.

This cannot be said about Amazon or AliExpress, both of which intentionally (I guess) mix similar items in with the results that match the search query.

I hope eBay keeps those two things distinct.
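For illustration, the operator syntax described above (space for AND, a parenthesized comma-separated list for OR, a minus prefix for NOT) can be composed mechanically. `ebay_query` here is a hypothetical helper for building such a query string, not part of any eBay API:

```python
def ebay_query(all_of=(), any_of=(), none_of=()):
    """Compose an eBay-style search query string.

    Space-separated terms are ANDed, a parenthesized comma-separated
    list is ORed, and a minus prefix excludes a term.
    """
    parts = list(all_of)
    if any_of:
        parts.append("(" + ",".join(any_of) + ")")
    parts.extend("-" + term for term in none_of)
    return " ".join(parts)

print(ebay_query(["nintendo"], ["switch", "lite"], ["joycon"]))
# → nintendo (switch,lite) -joycon
```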


And you can set up email alerts for those complex search terms.

I made a tiny amount of money doing typo arbitrage on bundles labelled as “Nitendo” many years ago.

Sometimes when I wanted a really niche item, I’d wait for one of those alert emails that had more than usual results because I knew the bidding would be split.


I dislike eBay search. For example, all of a sudden one of my daily bookmarked searches went from the usual 10 results to THOUSANDS, none of them including my search term. This is the link: https://www.ebay.co.uk/sch/i.html?_nkw=Tasmota&LH_PrefLoc=1&...

Why are some sellers having all their listings (car parts) included in my search for "Tasmota"?


That’s interesting. I’ve seen it happen randomly, where the same search query would return unrelated results about 1 time out of 100.

In your example however, it seems to me that if you add quotes, the irrelevant results are gone: https://www.ebay.co.uk/sch/i.html?_nkw=%22Tasmota%22
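For reference, the quoted variant is just the phrase query percent-encoded. A quick sketch, using Python’s standard `urllib`, showing how the quotes end up as `%22` in the `_nkw` parameter:

```python
from urllib.parse import urlencode

# Quoting the term requests an exact-phrase match; urlencode turns
# the double quotes into %22, as seen in the URL above.
params = {"_nkw": '"Tasmota"', "LH_PrefLoc": "1"}
url = "https://www.ebay.co.uk/sch/i.html?" + urlencode(params)
print(url)
# → https://www.ebay.co.uk/sch/i.html?_nkw=%22Tasmota%22&LH_PrefLoc=1
```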


Good to see another engine leverage ScaNN outside of Google.

HNSW uses lots of RAM, and it’s interesting how all the major engines settled on that algorithm.

I’m interested in how they apply filtering, since with codebook-based similarity such as PQ and ScaNN it’s not trivial.

Maybe one day we’ll also see someone implement a production-ready Vamana engine too, which also does really well at billion scale.
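To make the codebook point concrete, here is a minimal sketch of product quantization with asymmetric distance computation. The codebooks are random instead of learned with k-means, so this is purely illustrative, not how ScaNN or eBay’s engine is implemented:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, k = 8, 4, 16        # vector dim, subspaces, centroids per subspace
sub = d // m

# Toy codebooks: k centroids per subspace (normally learned via k-means).
codebooks = rng.normal(size=(m, k, sub))

def pq_encode(x):
    # Assign each subvector to its nearest centroid; m small codes per vector.
    codes = []
    for j in range(m):
        subv = x[j * sub:(j + 1) * sub]
        codes.append(int(np.argmin(((codebooks[j] - subv) ** 2).sum(axis=1))))
    return codes

def pq_distance(query, codes):
    # Asymmetric distance: exact query subvectors vs quantized database vector.
    return sum(
        ((query[j * sub:(j + 1) * sub] - codebooks[j][codes[j]]) ** 2).sum()
        for j in range(m)
    )

x = rng.normal(size=d)
codes = pq_encode(x)
print(codes, pq_distance(x, codes))  # small integer codes, reconstruction error
```

The filtering difficulty mentioned above follows from this shape: once vectors are compressed to codes, attribute filters have to be applied either before scanning partitions or after decoding candidates.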


Agreed. There are a lot of great index types out there. HNSW is incredible, but algorithms such as ScaNN (and PQ) have their place in the ecosystem.

Tree-based vector indexes aren't bad either, especially if we can find a way to make the random projections more efficient.
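As a sketch of the tree-based idea: each node of a random-projection tree projects points onto a random direction and splits at the median, and those projections are exactly the cost being referred to. A toy example, not tied to any particular library:

```python
import numpy as np

rng = np.random.default_rng(2)

def rp_split(points, rng):
    # One node of a random-projection tree: project all points onto a
    # random direction and split at the median projection.
    direction = rng.normal(size=points.shape[1])
    proj = points @ direction
    threshold = np.median(proj)
    return proj <= threshold, threshold, direction

points = rng.normal(size=(100, 32))
left_mask, threshold, direction = rp_split(points, rng)
print(int(left_mask.sum()))  # roughly half the points go to the left child
```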


This can be achieved with open-source solutions like Qdrant: https://github.com/qdrant/qdrant


HNSW seems better in the criteria that matter.

I think similarity search is a commodity now; I would not invest in developing an in-house solution given the abundance of good commercial solutions.


That depends a bit on the scale and use case specifics. But commoditized billion-scale vector search is indeed a thing. We published this for Weaviate in December last year: https://weaviate.io/blog/sphere-dataset-in-weaviate


We've seen Milvus used in a variety of recommender systems running in production.


They’re embeddings, so they’re dense. There are few things easier than dense vector similarity.
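A minimal sketch of why dense similarity is easy: brute-force cosine similarity is just a matrix-vector product over normalized vectors (toy data, not eBay’s setup):

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(1000, 64))                  # 1000 dense embeddings
db /= np.linalg.norm(db, axis=1, keepdims=True)   # normalize once up front

def top_k(query, k=5):
    q = query / np.linalg.norm(query)
    scores = db @ q                               # cosine via dot product
    return np.argsort(-scores)[:k]

print(top_k(db[0]))  # db[0] is its own nearest neighbor, so index 0 comes first
```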


Embeddings for retrieval don't have to be. It is not unheard of to transform the raw embeddings to optimize them for retrieval; e.g., through binarization or hashing.
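For instance, the simplest such transform is sign binarization, after which retrieval uses Hamming distance on packed bit codes. A generic sketch, not a claim about any specific system:

```python
import numpy as np

def binarize(v):
    # Sign binarization: one bit per dimension, packed into bytes.
    return np.packbits(v > 0)

def hamming(a, b):
    # Hamming distance between two packed bit codes.
    return int(np.unpackbits(a ^ b).sum())

v = np.array([0.3, -1.2, 0.7, 0.1, -0.5, 2.0, -0.1, 0.9])
w = np.array([0.4, -1.0, 0.6, -0.2, -0.4, 1.5, -0.3, 1.1])
print(hamming(binarize(v), binarize(w)))  # bits differ only where signs differ
# → 1
```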


I was more making a distinction between embeddings and bag-of-words, which are very, very sparse matrices. The embedding dimensionality will not be anywhere near as high, so that level of sparsity is a minor inconvenience.

Edit: also CPUs for this, yikes…


Such as...?


Vespa.ai is pretty crazy too, though a bit unknown. We run a huge Vespa cluster serving 1k+ queries with <100ms latency...



Qdrant, Milvus, Weaviate



