Hacker News | senecaso's comments

Why don't sites just start publishing a dump of their site that crawlers could pull instead? I realize that won't work for dynamic content, but surely a lot of these "small" sites that are out there which are currently getting hammered, are not purely dynamic content?

Maybe we could just publish a dump, in a standard format (WARC?), at a well-known address, and have the crawlers check there? The content could be regularly updated, and use an ETag/etc so that crawlers know when it's been updated.

I suspect that even some dynamic sites could essentially snapshot themselves periodically, maybe once every few hours, and put it up for download to satiate these crawlers while keeping the bulk of the serving capacity for actual humans.
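A minimal sketch of the ETag check described above, in Python. Everything here is hypothetical: `serve_dump`, the `Response` type, and the hash-based ETag scheme are assumptions for illustration, not part of any existing convention for site dumps. The server side is simulated in memory so the crawler's conditional request can be shown without a network.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Response:
    status: int   # 200 = full dump sent, 304 = not modified
    etag: str
    body: bytes


def make_etag(dump: bytes) -> str:
    """Derive an ETag from the dump contents (an assumed scheme)."""
    return '"' + hashlib.sha256(dump).hexdigest()[:16] + '"'


def serve_dump(dump: bytes, if_none_match: Optional[str] = None) -> Response:
    """Answer a crawler's request for the dump, honouring If-None-Match."""
    etag = make_etag(dump)
    if if_none_match == etag:
        # Crawler already has this snapshot: send no body at all.
        return Response(304, etag, b"")
    return Response(200, etag, dump)


# Crawler side: remember the ETag from the last fetch and send it back.
first = serve_dump(b"snapshot-1")
again = serve_dump(b"snapshot-1", if_none_match=first.etag)
updated = serve_dump(b"snapshot-2", if_none_match=first.etag)
```

The point of the design is that an unchanged snapshot costs the site a header exchange rather than a full transfer, which only helps if crawlers actually send the header.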


Because crawlers aren't concerned about the bandwidth of the sites they crawl and will simply continue to take everything, everywhere, all the time regardless of what sites do.

Also it's unfair to expect every small site to put in the time and effort to, in essence, pay the Danegeld to AI companies just for the privilege of their continued existence. It shouldn't be the case that the web only exists to feed AI, or that everyone must design their sites around feeding AI.


I didn't know about the link to checkout. That's a slightly nicer user experience for sure. Still, it's confusing for users who want to do more shopping at the same time. I had users who clicked on a number of items, clicked "add to cart" in each one (all different shops), and then couldn't figure out how to check out on the main site afterwards! Obviously people were looking for a more complete one-stop-shopping experience than I was providing at the time.


I mean a single checkout from multiple shopify stores isn't really possible (at least by 3rd parties)

My hypothesis is that, if you could drive traffic to your site and offer a fast checkout experience, there's probably multiple ways to monetize that. Driving the traffic is the hard part.


Ya, curation is sadly required in the Shopify ecosystem. There are millions of shops, and there is a tonne of garbage. It's also difficult (but not impossible) to properly classify items so that you can better target results for a given query. One of the first problems that anyone attempting this will run into is the amount of mature content available on Shopify shops. Innocent queries turn up many NSFW images that may offend some users, so you have to be able to get on top of that one pretty quick.

I remember in one case, I found what appeared to be an escort service listing "models" on Shopify. It was super creepy. I needed to get in front of that one pretty quick as well, as it was turning up in results.


https://www.searchagora.com/about

Seems he is indexing nearly 650k shops.


Yes, this has been available for a few years now. Initially, they only indexed a very small number of shops, so it was less useful. Based on a few queries, it seems like they are still using some form of text-based search with rank boosting. Seems like they still aren't searching their entire base of shops, but they have increased the number of shops for sure, and they seem to be continuing to invest in the product, which is nice. It seems more useful now than it did the last time I checked!


I hope you have better luck than I did!

A few years ago, my partner and I built vendazzo.com (now defunct). It was an e-commerce search engine on products listed on Shopify shops (sound familiar? :)). At the time, we had > 100m products listed, and I don't remember how many shops we were indexing.. over 100k I think, but we had access to over a million. Overall, I think your approach is very similar to ours, but we managed to keep our costs lower. At the time, we were spending ~$550/mo, and our search times were under 300ms. We had established partnerships with a number of shops, and we had a few users, but not nearly enough. That's where the wheels came off. The site operated for over a year, but the monthly costs wore us down until we finally decided to pull the plug.

I still maintain that this is a good idea, and constantly have to fight off the urge to "try again", however, to do it properly, I think funding would be necessary, or finding some way to organically gain a lot of users.

Looking back, there are things I could have done to reduce my opex further, but in the end, it still wouldn't have mattered if I couldn't figure out how to acquire users.


>but in the end, it still wouldn't have mattered if I couldn't figure out how to acquire users

In the EU there are many price comparison engines with millions or billions of products. I don't know how popular they are. Some monetize through ads, some have partnerships with stores and you can buy directly from the search results.

I generally search first on the local Amazon equivalent; if I don't like what I see, I search on a smaller store. If I still can't find or dislike the products or prices, I search Google. If I am still not content with the results, I will go search on comparison engines.

And I also have a browser extension called Pricy that polls the comparison engines, so once I land on a product page I know which store has the better price and what the price history was over the last year.

Probably many people have similar patterns. I expect people in US to search Amazon first, if it's not a very niche product they are after.

I think you can have a better monetization proposal if, instead of just search, you build a sales platform, so people can directly buy after searching, without hopping to various websites.


Unfortunately many of these "comparison" websites have a business model built on affiliate fees.

It doesn't take much imagination to predict which products show up as "best" or "cheapest".

And the fairer ones have to keep playing cat and mouse with shops lowering pricing when they detect a scraper coming by. Or employ tricks to make their shipping seem free, lowering their overall price on the comparison platform.


> It doesn't take much imagination to predict which products show up as "best" or "cheapest".

Never seen a "best" outside of amazon, which does weird shit even without any affiliate fees. And "cheapest" is not really up to the site, unless they want to go under quite quickly.


Many if not all are like that. It's like everyone wants to take advantage of the lack of perfect information in the marketplace, as opposed to actually being helpful for consumers.


We were intentionally limiting the number of products and shops we were indexing due to opex. We needed to keep it low enough to provide ourselves with enough runway to keep things floating for longer.

PriceRunner is another site which operates in a similar space. We had plans to build out the price tracking and a number of other features, so that we would appeal more to users who had your use cases. Sadly, we weren't getting enough traction. We did have regular users from the EU, but we simply couldn't seem to get in front of enough eyeballs for it to matter. At least at first, I expect that a large amount of your traffic to a new site like this has to be driven by Google, and we failed on that front as well. I'm not an SEO expert, so there were likely many things we did wrong or didn't even do which led to this situation.

re: a sales platform, that's a pretty big challenge to take on, which would require massive investment up front. Not sure that's a viable route for most. We did have plans to address the "without hopping to various websites" problem, as we identified that as problematic for users very early on. The solution was relatively simple, but required more money to build out. We simply ran out of funds before we could get there.


<< We did have plans to address the "without hoping to various websites" problem, as we identified that as problematic for users very early on. The solution was relatively simple, but required more money to build out. We simply ran out of funds before we could get there. >>

What were your plans to solve this problem?


> In EU there are many price comparison engines with millions or billions of products. I don't know how popular they are.

Anecdotally, I guess, I'd say extremely popular. I never search for products anywhere else.


Yeah, here in Czechia I always look at https://www.heureka.cz/ first.


What do you consider the local Amazon variant? And which country?


Amazon has no direct presence in Switzerland, but you can order a fraction of its products from neighboring countries. Many products are not available, mainly because nobody wants to deal with customs once the product crosses the EU border.

Amazon itself never moved into Switzerland in the first place for many reasons (small market, unusual customs situation, relatively high salary for warehouse workers), and in the meantime the largest Swiss supermarket chain created an Amazon clone which became hugely popular pretty much immediately: Galaxus.ch


If you hadn't said that it's basically the Amazon of Switzerland, I'd have thought that this is some blogspam dropshipping site...


Amazon is a blogspam drop shipping site in Europe


There are alternatives throughout Europe. The Balkans have Emag, Benelux has bol.com. I think in both regions Amazon is less popular. I'm sure there are other examples.


The Netherlands has plenty of them. Tweakers.net is a price tracker for electronics and such (e.g. computer parts, phones, laptops, etc.) and usually it's easy to find a shop cheaper than Amazon. I have some go-to stores for my needs because their content is organised way better than Amazon. I also find some alternatives better than Amazon because they have free next-day shipping, something that's not free on Amazon.


Emag in Romania. I hate it, they bought most of the competition, they did a lot of anticompetitive things, but it's really easy to buy from them.


At some point, a couple of years ago when they introduced the marketplace, I actually thought they were aiming for an "exit" to Amazon. They really got the service part of e-commerce nailed down. Merchant quality is and always will be an issue, but it is the same as on Amazon.


hagglezon.com to compare Amazon variant prices


bol.com in the Netherlands


I'm curious why you consider lack of users to be the problem. I would have described it as lack of revenue.

What plans did you have for generating revenue from the site? (Serious question - given your low costs it would seem like a tiny amount of revenue would have been enough.)


Our business model revolved around referrals, so lack of users directly translated to lack of revenue. While it's true that even if we had millions of users but none of them were buying sponsored items we would have had a revenue problem, that wasn't the problem we were facing, as the few users we did have were in fact purchasing sponsored items.


Then the problem seems to be the lack of users.

Have you tried having a YouTube channel, TikTok, Facebook, Twitter, or a blog, and explaining daily how you built the website and how your platform is going to help users?


We did have channels on various sites, yes. However, it's difficult to maintain a steady stream of content there for people to consume. Not only that, but you have the same discoverability problems as you do for the main site. Also, a blog outlining how you built the site may be of limited value. At least my experience on that front was that it would generate short-lived bursts of traffic, but wouldn't generate returning users. So I think those articles were mostly appealing to technical users, and not necessarily users who were looking to do some shopping. Of course technical users do also shop, but after reading a technical article, they probably aren't looking to immediately shop, and without some other mechanism putting the site in front of them again when they needed to shop, we would miss the opportunity.


Thanks for sharing this! If you're up for it, I'd love to talk more about your experience, especially the technical tooling. Working as fast as I can to understand the right way to approach the tech, as there are tradeoffs with performance and price. I'm at support @ searchagora .com


What strategies did you consider or implement to attract more users, and what would you do differently now to ensure better user acquisition?


We had no capital, so advertising or solutions that basically involved "throwing money at the problem" were off the table for us.

We spent time posting in forums helping people find items they were looking for, and we had a few posts here on HN that generated short-lived, explosive traffic bursts. I remember those days we had posts get picked up on HN, it was always an exciting night!

We were looking at influencers and at getting our name out there by getting bloggers to talk about us, but, again, without capital, our options were very limited here. I'm sure someone with more of a marketing background would have found a bunch of ways we could have generated organic user growth, but neither I nor my business partner had that skill set.

If I were to do it again, I think I would try to get someone with a marketing background involved to help gain traction. Without that, even the best product in the world will die of starvation if no one finds it.


looks like symptoms of no market. maybe you were solving a problem already solved by amazon? most shops on shopify also use amazon


Many shops do double list, this is true. However, I don't think it's a solved problem. There are many people who do not want to shop on Amazon for their own reasons. There are also people who want to shop locally, and Amazon provides no mechanism to do so (that I'm aware of). There are also many smaller shops who simply cannot afford to list on Amazon, as there are considerable fees associated with running a successful business there. It was these smaller shops who we were initially building to serve, to provide a funnel for them.

Still, there were problems with our solution that if addressed may have provided a better market fit. If we had had more runway, we would have worked to address them, but that simply wasn't in the cards.


To me it seems like a small market. And worse, it's hard to conquer that small market since it's very fragmented. Even if you had money for advertisements, it still would have been hard.

On the plus side, though, if you had the skills to build that platform, you certainly have the skill to build a more profitable and easier to monetize platform.


>looks like simptoms of no market. maybe you were solving a problem already solved by amazon ? most shops on shopify also use amazon

FAANGS get around this by creating problems that they will offer to solve.


Not in all countries though. Amazon isn't present or popular, or as omnipresent in many countries.

That's an opportunity, I guess.


> We spent time posting in forums helping people find items they were looking for,

Did you run any analytics on how much overlap there was across Shopify sites on "similar items" (Alibaba resellers/dropshippers)?


We didn't, no, but we spent a lot of time sifting through our catalog, and there was a _tremendous_ amount of crap in there. We manually curated and purged shops that were obviously just dropshipping or looked like outright scams.


Can't you sample ten random products, then ask an LLM to rate the shop on a scale from drop-shipped to artisanal as a first approximation?


I doubt it would be that easy, but, ya, using some form of automation is necessary. We devised a few rudimentary ways to filter out the chaff, and they did quite well at removing the garbage. Still, some would slip through, so it required vigilance to remove them when you happened to see them.
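A rough sketch of the "sample a few products and rate the shop" idea from this exchange, in Python. This is not what vendazzo actually ran; the names `rate_shop` and `keyword_score` are hypothetical, and `keyword_score` is a toy stand-in for whatever does the real rating (an LLM call, a classifier, a hand-written rule).

```python
import random
from statistics import mean
from typing import Callable, Sequence


def rate_shop(
    product_titles: Sequence[str],
    score_product: Callable[[str], float],
    sample_size: int = 10,
    seed: int = 0,
) -> float:
    """Average the per-product scores of a random sample from one shop.

    score_product should return 0.0 for "looks drop-shipped" up to 1.0
    for "looks artisanal". How it decides is left open on purpose.
    """
    if not product_titles:
        raise ValueError("shop has no products to sample")
    rng = random.Random(seed)  # seeded so a rating is reproducible
    k = min(sample_size, len(product_titles))
    sample = rng.sample(list(product_titles), k)
    return mean(score_product(title) for title in sample)


def keyword_score(title: str) -> float:
    """Toy placeholder scorer, NOT a real model: flags a few spammy phrases."""
    spammy = ("free shipping", "hot sale", "2023 new")
    return 0.0 if any(s in title.lower() for s in spammy) else 1.0
```

Shops scoring below some threshold could then be queued for the manual review described above rather than purged automatically, since a sample of ten will misjudge some shops.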


Wow, it's cool to see this idea trending on HN! Full disclosure, I'm one of the co-founders at https://www.marmalade.co. Speaking from personal experience, it’s been a long road getting from the universe of all Shopify products to a curated inventory that’s easy for people to shop on. While ChatGPT isn't going to replace human curation anytime soon, the AI tailwind has made it much easier to build search and recommendation systems. On our end, we've definitely caught the semantic search bug. Watch out for it - you’ll wake up one day with a cross-modal hybrid search index on pinecone and any number of models on huggingface :). However, as you rightly point out, user growth is still the key. We're working toward launching a community aspect of the platform in the coming months as a solution.


Your site looks good, and your results are fantastic! Job well done. I did hit a server error though, so obviously still some issues to work out, but overall, really well done. Moving to semantic search was one of my top priorities before we went under, but I struggled to justify the costs of it as we were operating on a shoestring budget.

Best of luck to you and your team on user acquisition!


Ya, a way to submit feedback/comments might be a good idea. I just submitted an idea, and immediately was wondering how I would be able to engage with the people who voted (or didn't).


First, thank you for trying it out and for submitting an idea. That's super cool.

With regards to engaging with users:

In the current design, the first two levels of interest that a user may show on your idea do not include sharing any contact information. Namely upvoting an idea, and signing up for updates. A user has to support your idea on level 3 ("share contact info with the project owner") or level 4 ("share your use case") in order for you to actually get their contact info.

For now, if you go to your "ideas" section, you'll already see the count of each level of support your idea got (upvoted, subscribed to updates, shared contact info, filled out questionnaire).

The ability to actually send out an update to the users that requested it is still missing. Same goes for accessing the contact info of the users who shared it with you, and for accessing the info they shared with you when they filled out the questionnaire. I didn't know if this Show HN was going to go anywhere. But I'm going to add these features. These contacts are yours. And they have entrusted me with forwarding their request to you.


I was thinking more along the lines of a public dialog, where a user could post a comment like "This is a great idea! However, if you could make it do X and Y, it would be even better!"

Getting access to people's contact information would be useful at a later point in the process, but early on I think immediate feedback would be the most beneficial. Even something along the lines of a HN thread like this one would be enough engagement at first. In fact, perhaps that is even sufficient :)


Right. This should be possible without necessarily giving up one's privacy.

I wouldn't want a public dialogue. Because that would signal public interest in the idea to outsiders. This is something that I imagine people posting an idea want to avoid, because they don't want to attract competition if the idea turns out to be a good one.

But it would be easy to have a two-way conversation with the creator. So user john can post a message to idea owner thomas. Thomas can even reply, and they can write back and forth. But john will only ever see his own conversation with thomas, and never those that thomas might have with others, such as ben or lucy.

This is actually one of the major reasons I made the site, even though there's Hacker News. If my idea got traction on HN, it would attract more and more attention, and I'd have no way of pulling it from the site. Public interest in the idea would be documented for everyone to see, forever.
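The visibility rule described above (john sees only his own conversation with thomas, thomas sees all of them) can be sketched as a small data structure. This is an illustrative Python sketch only; `IdeaInbox` and its methods are made-up names, not the site's actual implementation.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (sender, text)


class IdeaInbox:
    """Private per-visitor threads between one idea owner and each visitor."""

    def __init__(self, owner: str) -> None:
        self.owner = owner
        # visitor name -> messages exchanged between that visitor and the owner
        self._threads: Dict[str, List[Message]] = defaultdict(list)

    def post(self, sender: str, text: str, to: str = "") -> None:
        """A visitor writes to the owner; the owner must name the visitor."""
        if sender == self.owner:
            if not to:
                raise ValueError("owner must say which visitor to reply to")
            visitor = to
        else:
            visitor = sender
        self._threads[visitor].append((sender, text))

    def visible_to(self, viewer: str) -> Dict[str, List[Message]]:
        """The owner sees every thread; a visitor sees only their own."""
        if viewer == self.owner:
            return {v: list(msgs) for v, msgs in self._threads.items()}
        if viewer in self._threads:
            return {viewer: list(self._threads[viewer])}
        return {}


inbox = IdeaInbox("thomas")
inbox.post("john", "Great idea, could it also do X?")
inbox.post("thomas", "Possibly, tell me more.", to="john")
inbox.post("ben", "I would pay for this.")
```

Keying threads by visitor means nothing a visitor can query ever reveals how many other people wrote in, which is the signal the idea owner wants to keep private.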


Oh, wow! Thanks for pointing that out. I'll take a look and try to get that cleaned up.

We don't have a lot of shops with electronics at the moment, so a query for "router" would be light anyway, but these particular results are not useful at all.


Got it. What particular business types do you feel your search currently provides good results for?


We have almost anything you could look for, but for some reason, we don't have a good selection of computer parts and accessories. It's like all of the electronics shops decided not to use Shopify or something :)

You can find a rough breakdown of our product counts by category in our "Why Shopify is the best (only?) alternative to Amazon" post: https://blog.vendazzo.com/why-shopify-is-the-best-alternativ...

Of the 140k shops we currently have, the vast majority are clothing (shoes, shirts, jewelry, etc), home & garden (furniture, kitchen, cups, glasses, etc), and vehicle parts. The other categories still have a significant number of products in them, but we're light on the computer accessories, and much of what you find will be either older generation or enterprise. It's possible that the 140k shops that we selected just happen to not be electronics/computer shops, but until we're able to index the remaining 1.2M+ shops (that we're aware of) we won't know for sure.

One other thing to point out is that the filters (after you search) can be very helpful. We're still not at a point where we can build out semantic search, so using the filters to select a category, colour, etc, can drastically improve the relevance of your results. Hope that helps!

BTW, I have made changes to remove the "Route Protection Package" from our index, and it should start disappearing over the next week or so. Thanks for pointing that out!


Yep, this is pretty much it. Amazon has the advantage that they are able to force their sellers to specify metadata about their products in very specific ways, which allows them to build on top of that consistency. With Shopify, as tomnipotent said, it's all over the place, and it only gets worse as you cross languages.


I've seen both Amazon itself and sellers use wildly inconsistent ways of listing products using the "multiple flavors of this item" functionality.

Probably hands-down the most irritating metadata issue with Amazon is when sellers don't include quantity information properly - or even include quantity in the title when selling just one size - which I'm sure they do on purpose to make price per unit shopping more difficult.

Sometimes the way the items are listed will be different between one product and another made by the same company, or between Amazon's listing and a third party's listing for the same general product...and they each have different types of the item available. Say, Amazon will sell Ken's Socks in purple and blue, all sizes in purple but only small in blue...while Blankenship E

Things seem to get especially messy when you have a matrix of two 'flavors' (such as a sub-model and color, or unit size), and not all combinations were manufactured or available.

Even the UI isn't consistent. Sometimes it's a drop-down list, sometimes it's the rectangles.

Oh, and often the model/part number isn't entered...


We were surprised by this as well, but since one didn't exist, we started to create one!

https://vendazzo.com/

We still have a long ways to go, but you can search for items and filter based on things like category and distance (if you are trying to shop locally). We're doing things on a very tight budget at the moment, so sometimes the site goes down, but we're working on it :)

If you have any feedback, we would love to hear it.
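For readers wondering what a distance filter like the one mentioned above might involve, here is a minimal Python sketch using the haversine great-circle formula. It is an assumption that the site worked anything like this; `haversine_km`, `shops_within`, and the `(name, latitude, longitude)` shop tuples are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt
from typing import Iterable, List, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

Shop = Tuple[str, float, float]  # (name, latitude, longitude) in degrees


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def shops_within(
    shops: Iterable[Shop], lat: float, lon: float, max_km: float
) -> List[str]:
    """Names of shops no farther than max_km from the shopper."""
    return [
        name
        for name, s_lat, s_lon in shops
        if haversine_km(lat, lon, s_lat, s_lon) <= max_km
    ]
```

A filter like this is only as good as the shop coordinates behind it, so any error in locating a shop carries straight through to the results.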


Is it broken? It just says "An error occurred" when I typed "Plushie" and then "Fruit"


Yep, it just went down. We've been using spot instances, and those are hard to come by around Black Friday. I'll look into it.


back up!


Interesting! Could you explain how you managed to find sites, and how you know where they are based?


Right now, we gather location data from the pages themselves. Some shops (not many) include an address. The process is very error-prone though, so our accuracy in this area is likely not where we want it to be. We are working on other ways to improve that.

