Searx – Privacy-respecting metasearch engine (sagrista.info)
299 points by cube00 on Nov 12, 2021 | 147 comments


Every time I stumble across a new search engine, I add it to my search engine comparison tool:

https://www.gnod.com/search/

Will add SearX now. It seems to provide reasonably good results.

Update: It's on (under 'More Engines').


There is also Marginalia https://search.marginalia.nu/ which I don't see on your list.


It's currently undergoing its monthly maintenance just FYI. It's up and technically working, but with a drastically reduced index size.


Tried Marginalia. So many plaintext http links which I avoid like the plague. That's my only gripe with it. Other than that, it's an awesome tool.


That's funny, I personally prefer HTTP for its simplicity, human-readability, accessibility, lack of centralized control, backwards compatibility and lack of forced upgrades or locking out old clients, etc., not to mention speed.

Of course, I'm fortunate enough to live in a place where MITM attacks are virtually non-existent, aside from WiFi portals and maybe ISP banners (which I've never experienced.)


> Of course, I'm fortunate enough to live in a place where MITM attacks are virtually non-existent, aside from WiFi portals and maybe ISP banners (which I've never experienced.)

I don’t know where you live, but i feel like this is more common and insidious than you think. For instance, in the UK Vodafone (or Three, I don’t remember exactly) would break 100s of our sites by injecting js and tracking pixels into the markup.

Now, with behavioural targeting slowly dying, ad tech businesses talking about fingerprinting as a valid alternative, and contextual targeting on the rise, I can guarantee you that the situation is going to get worse.


How insidious would you say that is compared to not being able to use an otherwise capable 7-year-old device to access most "secure" websites across the Web?


> So many plaintext http links which I avoid like the plague.

Why? What you described appears to be the safest place on the web.


I only browse HTTPS sites. I have the `HTTPS Everywhere` addon installed with `EASE` (Encrypt All Sites Eligible) turned on so I don't accidentally browse an unencrypted website. Something like 85-90% of the web is encrypted now, and there's no excuse to be using outdated plaintext HTTP anymore. It's a privacy and security risk. There are only a few instances where I've had to view an HTTP site (I'm a freelancer and a client's webpage was still unencrypted, so I had to see it; a rare exception to the rule).


The privacy and security risk comes in large part from the nature of code and actions performed on the site.

In reality, as far as privacy goes, matters are on average the opposite of your claim. Most sites that will put your privacy at risk today are using HTTPS; I am talking about the vast majority of the commercially operated web. I know my privacy is much better respected on a plain-text (no JavaScript) site using HTTP than on [insert a top 10k most popular site here] using HTTPS.

And for security, if I am not performing for example shopping or entering my billing details anywhere on the site, I do not see how a http site can compromise my security.

I actually prefer deploying http sites for simple test projects where speed is imperative because they are also faster - there is no SSL handshake needed to connect.


It's funny because I got like 70% HTTP in my index, so the whole "90% of the web is encrypted" claim seems to depend on which sample you are looking at. Google doesn't index HTTP at all, so that's not a good place to go looking for what's most popular. That's in fact half the reason why I built this search engine in the first place: they demand things of websites that some websites simply can't or won't comply with.

A lot of servers still use HTTP, for various reasons. There are also some clients that can't use HTTPS.


I think there are absolute numbers and then there are "the sites most people visit regularly" and those probably are 75% https. It's relative like most things.


Absolute numbers are pretty hard to define, as is the size of the Internet.

If the same server has two domains associated with it, does it count twice? Now consider a loadbalancer that points to virtual servers on the same machine. How about subdomains?


It may be a privacy risk, but it's certainly not a security risk with plain old blog and static sites that have completely open data available to anyone who wants to surf to their sites.


HTTPS is still a privacy risk because the hostname is sent in plaintext. Perhaps you get some "URL privacy", but you get no improvement in terms of "hostname privacy". HTTP only leaks the hostname once; HTTPS leaks it twice.

This can be prevented by (a) using TLS 1.3 to ensure the server certificate that is sent is encrypted and (b) taking steps to plug the SNI leak; popular browsers send SNI to every website, even when it is not required.
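To make the SNI point concrete, here's a minimal Python sketch (the hostname is just a placeholder): whatever you pass as server_hostname is written into the TLS ClientHello before any encryption is negotiated, so an on-path observer can read it even on an HTTPS connection, unless something like encrypted ClientHello ever becomes widespread.

    import socket
    import ssl

    hostname = "example.com"  # placeholder target, purely for illustration

    # The value passed as server_hostname ends up in the SNI extension of the
    # TLS ClientHello. The ClientHello is sent in plaintext, so this name is
    # visible to anyone on the path; only the request path, query and body
    # are protected once the handshake completes.
    ctx = ssl.create_default_context()
    with socket.create_connection((hostname, 443)) as sock:
        with ctx.wrap_socket(sock, server_hostname=hostname) as tls:
            print(tls.version())          # e.g. 'TLSv1.3'
            print(tls.server_hostname)    # the name that went out in the clear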


You should be getting fewer .txt-results in the new update, a part of the problem was that keyword extraction for plain text was kind of not working as intended, so they'd usually crop up as false positives toward the end of any search page. I'm hoping that will work better once the upgrade is finished.


I have this REALLY old text file of search engine URLs:

http://www.jaruzel.com/textfiles/Old%20Web%20Info/Internet%2...

Google basically killed almost all of them off :(

It would be great to see some proper competition in the search space, especially around specialist search engines.


I would love a search engine targeted towards developers. Searching for symbols seems to be a problem with Google, not to mention all of the utterly crappy results they serve up.


Symbolhound does that.


Also grep.app for searching into repos really fast.


also https://searchcode.com/ was posted on a HN thread about niche search engines a couple days ago


You should also add YaCy: https://yacy.net.


It doesn't seem to be web-based?

When I click on "Try out the YaCy Demo Peer", I get "502 Bad Gateway".


It's self-hosted and peer-to-peer. You could search for other public-facing instances, e.g., http://sokrates.homeunix.net:6060. Ideally, you could run your own instance to show the world how it works.
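If you want to run your own peer, the project also publishes a Docker image; a rough sketch, with the image name and default port quoted from memory, so double-check against the YaCy docs:

    # Start a local YaCy peer; the admin/search UI should come up on port 8090.
    # Image name and port are from memory -- verify before relying on this.
    docker run -d --name yacy -p 8090:8090 yacy/yacy_search_server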


You should add Fireball. It's excellent

https://fireball.de/


A yacy instance would be good too ;)

https://yacy.net/


Nice work. If you have a twitter handle you might request to get added to these lists; either way they might be useful for you: https://twitter.com/SearchEngineMap/lists


This is a nice compilation.

It would be very interesting if it examined and compared results.


Yes, looks good, although I thought it was going to be a federated search, i.e. you enter your search term and it performs that search on all the sites selected. The simpler way of implementing a federated search would be to show separate results boxes from each site, although that wouldn't scale well to a large number of sites, and it can get quite complicated to try to combine the results.


You should add the podcast search engine https://www.listennotes.com/


Too bad they force you to log in to view a result or do anything but search. They also share/sell your data with 3rd parties, including Google.


Thanks, added.

Holy Moly, are there really over 100 million podcast episodes out there?


Yes. some numbers: https://www.listennotes.com/podcast-stats/

Listen Notes was started in early 2017 as a side project, when there were ~23 million episodes.

I remembered seeing that the number of web pages indexed by Google in early 1998 was ~25 million, and I thought, "OK, 23 million episodes might justify the existence of a podcast search engine" :)


Is anything supposed to happen when I enter something and press Enter? Nothing happens for me, FF on Windows, uBlock.


You are supposed to type something into the search field then click the engine you want to use, it will pass on whatever you entered.

I agree that the creator could make that a little more clear somewhere on the page.


where's you.com lol


Thanks, added.


Ask.Moe

nona.de


I will add a disclaimer to this comment that it is tinfoil-hat territory and just speculation (bordering on conspiracy), but many of these "we are a privacy-first company" outfits might actually just be honeypots and fronts for 3-letter agencies.

The comment is not wholly conspiratorial, considering the CIA-owned Swiss crypto company Crypto AG [1].

It's within the realm of possibility that most of these privacy services could be owned by 3-letter agencies or small enough to be coerced into cooperation.

[1] https://www.scmp.com/news/world/europe/article/3050193/crypt...


Haven't you heard? The CIA has gone open source. They don't need to own a company anymore.

They can just download the Searx source code; modify it as they see fit, and make it available on a server someplace.

Can you prove that searx.be isn't run by a "3 letter agency"? Can you prove that the source code running at searx.be is the same as on Github?

The point being --- unless you have full access to the server, open source means nothing with regard to privacy and security of any service. It actually means less than nothing --- it means it is super easy to build into a honeypot.


Of course, there's no fool-proof solution to knowing what code is running on the server side, but https://searx.space at least shows if an instance has modified its client-side code, which you can see in the HTML column.

To keep server-side code from identifying you, you can consume an instance over Tor. Of course, you could try to do that with any other search engine, but most of the other search engines either block exit nodes or provide incomplete functionality if you disable JS.

It's not perfect, but it may be good enough depending on your threat model.


Note to the CIA --- don't modify the client side code when building your honeypots.

Personally, I just use a VPN with the "lite" version of DuckDuckGo --- no JS.

https://lite.duckduckgo.com/lite


SearX is a project which we respect and a positive contribution to improving search choice. Consideration of how it might be being used is wise.

It's also wise to do due diligence on any company/service where you are revealing sensitive personal information. Traffic coming from Google for sensitive medical search queries was a catalyst for us going public in 2006 on our strict no-tracking policy, and we have maintained that position.

We have yet to be contacted by authorities, but you'll have to trust us on that one for now. Since we don't log any personal or identifying data at all, we would have nothing to share [0]. You can read about our investors on our blog.

Building and maintaining a search engine with independent infrastructure is a huge challenge and has meant building proprietary IP over many years. Since we refuse to use growth-hacking techniques such as analytics from you know who, and all tools involving any tracking, marketing is a bigger challenge than it is for companies without strong principles. It has been a mammoth effort, mostly by our founder, whose story you can read here [1].

[0] https://www.mojeek.com/about/privacy/ [1] https://blog.mojeek.com/2021/03/to-track-or-not-to-track.htm...


I should have added that my comment applies as much to DDG as it does to cheap-VPN-provider-35 with a shell company in Belize.

The original comment was in reference to DDG proudly making claims of not getting requests from .gov and marketing themselves as a company that "cannot see what you search for".


I do think it's a bit of a red flag.

Sort of like how most anti-tracking browser extensions eventually turn out to actually be tracking extensions. Or like how used car dealers with a name like "Honest Bob's Cheap Luxury Cars" often turn out to be neither honest, cheap, nor luxurious.


Isn’t that confirmation bias? uMatrix and uBlock are reliable, the opposite being PrivacyBadger. The EFF has lost my trust before but I never assume maliciousness before incompetence. https://old.reddit.com/r/privacytoolsIO/comments/l2dges/why_...


The list of browser extensions that have in some form backpedaled from their central premise and main function is pretty long: Ghostery, Adblock, Adblock Plus, ...


I don't disagree it's a lot. NoScript was another example; uBlock and uMatrix, through no fault of their own, were also hijacked, being open source; Ghostery was sold; and Adblock Plus with acceptable ads wasn't as bad as they said. It was widely reported, and I continued installing ABP, since it was easy, it wasn't hard to turn off acceptable ads, and I think the direction they tried to move the industry in wasn't harmful. I might have moved back to Adblock or learned about hosts files, but if they had been successful we'd have less resource-hungry ads, a net benefit for everyone, especially when using public computers or helping someone with IT.

Ghostery was widely reported too, much like Audacity adding telemetry; everyone who cared knew long before to leave or uninstall it.

Hosts-file blocking is reliable and I've never had a single malicious entry in the wide assortment I've used. Pi-hole hasn't been hijacked either, and I think it's unreasonable to expect that no group can make mistakes, that faltering can't ever happen. I really don't think Adblock Plus was that bad.

If the market wasn’t saturated with methods to block, I would have stuck with them if they were remorseful.

-Sent from my not private Apple device I’ll still use since it’s got a huge userbase on messaging in the US


Same goes for VPN companies. I do feel bad for all the journalists and whistleblowers who will fall for these scams but as far as I'm concerned, as long as I can avoid for profit data collection companies like Google, it's good enough for me.

If I can't avoid my data being collected, I will still try my best to make it as worthless as possible just out of spite.


The thing is... let's say the CIA/NSA are tapping searx wholly, or just some instances. What exactly are the ramifications? I feel like they are going to be largely missing the target. A bunch of tech-savvy nerds trying different search engines aren't going to be terrorists.

And even if they are? As a Canadian, or someone who isn't in the USA, what exactly is the point? Wouldn't this effectively be the safest host? The CIA/NSA won't be selling your private info. They won't be sending me to a black site because I look at Python documentation and YouTube chill music.


> The CIA/NSA won't be selling your private info.

Why not? They used to sell cocaine after all and your info is probably rather less risky.


>Why not? They used to sell cocaine after all and your info is probably rather less risky.

It will reveal the operation, busting any potential for catching terrorists.


The purpose of government SIGINT (signals intelligence) is certainly not to catch terrorists/pedophiles/money-launderers. Those activities are generally tolerated/endorsed by intelligence agencies, as they are not heinous enough to garner their ire; the agencies even help them along whenever they coerce someone into committing a terrorist attack. The true purpose of all of that data is to create a metadata map and to assess who is up to what and who can do what, so that their nations' power over the world can be maintained as long as possible.


While I don't like the idea of a three letter agency honeypot, I'd be even more concerned with ad-tech and surveillance capitalism companies setting up honeypots.


The problem with a self-hosted search engine is that it makes you _very_ unique: you're the only client of the "backend" engine with that (static and non-NATed) IP. Furthermore, you're now one of the small group of people with "hosting" IPs. Using self-hosted SearX may make you easier to track, not harder.

Using SearX hosted by someone else is marginally better, but now you have to trust the owner of the server, which is probably not what you want from a privacy-centered search engine.


Could you clarify where the privacy concern is here? As I understand it, I'm sharing my IP with search engines anyways; the only difference with a self-hosted SearX instance is that I'm sharing my server's IP instead.

Is the concern that the latter's IP isn't behind a NAT, and therefore is more unique? If so, I think that's the least concerning of the identifying datapoints that a search engine has access to -- my browser metadata is far more identifying. With SearX, that information doesn't get forwarded (IIUC).


A server IP can easily be differentiated from a casual internet user's IP, and since you don't usually get a server IP rotated, you're pretty much stuck on a single IP and easily categorized as a unique user no matter which device or location you use it from.


If you don't want to expose your IP address, you can configure searx to proxy all the queries through Tor. This obviously makes the instance way slower and you'll have to disable some engines that block Tor exit nodes, so it's a trade-off.
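For reference, this is roughly what that looks like in the instance's settings.yml. The key names are from memory and vary a bit between searx versions, so treat it as a sketch and check the settings documentation for whatever version you run:

    # settings.yml (sketch): route outgoing engine requests through a local
    # Tor SOCKS proxy. socks5h makes DNS resolution go through Tor as well.
    outgoing:
      proxies:
        http: socks5h://127.0.0.1:9050
        https: socks5h://127.0.0.1:9050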


Tor exit nodes are public and given that SearX traffic is easily distinguishable from normal traffic, you're once again in a very unique group of people who use SearX + Tor setup.


Probably better to let your server go through some public VPN service.


What you can do is:

* Share your instance with some friends (though of course it shifts the trust one level down)

* Route outgoing requests through tor / some VPN. In a hosted environment that you configure once for "everywhere", it's more feasible to do fun things like "Google searches over OperaVPN, bing searches through Mullvad, everything else through Tor". You could even change proxies.

Especially with the latter you can kind of "eat the cake and have it too", with some added latency of course.

As a bonus, searx can also be configured to rewrite links (yt->piped, twitter->nitter, imgur->rimgu etc) and remove tracking query parameters.
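The link rewriting is configured in settings.yml via hostname replacement rules (a searxng feature, if I remember right): regexes on the result hostname mapped to the frontend you want instead. A sketch with placeholder instance names; the exact option names may differ in your version:

    # settings.yml (sketch): rewrite result hostnames to privacy frontends.
    # The instance hostnames below are placeholders.
    hostname_replace:
      '(.*\.)?youtube\.com$': 'piped.example.org'
      '(.*\.)?twitter\.com$': 'nitter.example.org'
      '(.*\.)?imgur\.com$': 'rimgu.example.org'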


Do you know of hosting companies that allow crawling and scanning? Most explicitly forbid such things, as it makes their IP addresses worth less.


To add on my other comment: Vultr and Linode are both providing guides on how to set up searx on their platforms, so there you have two that will not raise any issue.

https://www.vultr.com/docs/install-searx-with-nginx-on-ubunt...

https://www.linode.com/content/maximize-your-privacy-with-se...


Really? Past employers have done quite heavy automated web queries across several major providers, who never had an issue, even when we discussed our use cases with them. At one point we had ~1000 AWS EIPs in one account solely for increasing throughput (this was an effect of the legacy solution outliving its intended lifetime far longer than we wanted, if you scoff at how uneconomical that sounds and is; just saying, AWS supported us throughout).

Either way, what we're talking about here is more akin to an HTTP proxy than scanning/crawling - since every request is explicitly triggered by manual user action as opposed to automated, it shouldn't be any issue.

If you expose and advertise a public unauthenticated frontend and end up with 10ks of users maaaaybe some providers will start talking to you about it but otherwise I wouldn't have any concerns. And if you get to that scale, you may already want to look at proxying through providers like Luminati anyway.


When you click a link on your SearX instance and you don't send referrers, how can anybody track you? Nobody knows that you are coming from your "backend" engine.

You just reveal your search queries to the hosting provider if they maliciously intercept them.


Isn't hosting your own instance taking away every benefit of searx by revealing your IP? If you had a VPN you'd use it to mask your IP from search engines' tracking anyway, and if you used Tor for it, you'd probably move back soon since it'll be so much worse with latency, like how many people go back to Google because DDG sucks for results. I suggest just using public instances; you can find them at https://searx.space. Some are more reliable than others, but none have been trouble-free. There are a lot of these instances for privacy; Chrome has a privacy plugin with a white eye that uses nitter.net for Twitter, teddit for Reddit and other public instances. One Reddit frontend was even made completely in Rust. ;)

https://chrome.google.com/webstore/detail/privacy-redirect/p...

To reply to the person below me: you're always relying on trust in something unverified, and on untrustworthy filters like VPNs, anyway. You're either revealing your IP, using a wrapper that reveals it instantly, using a site that isn't a search engine and might be using your data, using a VPN whose trustworthiness rests on reputation and assumptions (usually based in another country you won't visit or know much about aside from random reviewers), or using Tor, losing latency and reasonable-speed image search while still possibly being compromised.


But if you’re using instances someone else is hosting, aren’t you hitting half the author’s objections?

- They may be hosted in the US

- They may be hosted on AWS

- You have no idea if the maintainer of the instance is tracking you


^ This right here. This article is pretty much hogwash IMO.

Points 1 and 3 aren't relevant if they aren't recording the data. Companies in other jurisdictions have no magic invulnerability against their data getting out (legally or illegally) if they're storing it.

Points 2 and 5 are equally true of any open source project unless you run it yourself from source. There are _plenty_ of examples of users getting phished by maliciously built/hosted open source tools

Point 4 is obviously not malicious tracking and a mistake any project could make

At the end of the day though, unless you're going to run everything yourself (which most people aren't) you have to pick who to trust -- some random person running a server somewhere, or a company with hundreds of employees recruited under the premise of working on a privacy-centric search engine who could all turn whistleblower


Are there any opensource real internet search engines worth looking at? I think we should be working on disrupting search as a whole instead of depending on the Googles, Baidus and Bings of the world.

I'm fully aware of the massive crawling and storage requirements, but opensource projects that can get search right can later 1) be hosted by the powerhouses of the cloud or non-profit parties, or 2) become a fully distributed hosting and crawling effort as in p2p and blockchain.


p2p: there's the Yacy effort. https://yacy.net/ I… couldn't find a portal to try it out (I did years ago, the results need… to be discussed. It's anyways easy to install and to choose what part of the web to crawl.)

> YaCy is free software for your own search engine.

maybe they rebranded and don't aspire to be a complete web search engine?


I always liked the term "search engine client" better (vs 'metasearch engine'). In essence it is a product that can connect to different search indexes.

An "email client" does exactly the same thing, connects to different email servers and we do not call it "metaemail".

edit: just realized that with the current hype around metaverse, 'metasearch' will probably be more appropriate for something searching the metaverse in the future.


I have been trialling Swisscows (https://swisscows.com/) and have found it quite useful. I have not deeply researched their privacy claims, but for now I am just trying to not use Google or mainstream alternatives.

Does anyone else have experience or comments on Swisscows' search engine? Seems like an interesting company all round.


Doesn't searx look for results at DuckDuckGo and Google for you anyway? What's the difference from using DDG directly?


There are other sources available. It is a metasearch engine, so it will always rely on other sources, but you can disable the DuckDuckGo and Google backends.


Searx can ping multiple search engines, including those not supported by DDG. For example, searx has a dedicated file search, which includes torrents.


Isn't DuckDuckGo just an alternative frontend for Bing with an integrated ad/tracking blocker? Or at least that's what they claim.


It is. They say they add some indexing of their own, but the results are all the same.


I don't think it uses DDG directly. But anyway, you can configure the sources for files, media, wiki, etc. Makes sense since the engine is open source, but then again it's not really a search engine itself but a metasearch one.


As someone who uses Safari with its built in list of search providers, I'm rather stuck with DDG for address bar searches, but boy it has really started to suck over the last year or two.


Yes, Safari is the worst offender when it comes to offering search choice on desktop.

On iOS, using a new app called Hyperweb, you can use the new Safari extensions to access and create a longer preferences list. https://hyperweb.app/

We really shouldn't have to choose this or that, but should be able to easily use multiple choices in search. You can do that today as explained here, but you'll need to switch browsers. https://blog.mojeek.com/2021/09/multiple-choice-in-search.ht...


So don't use Safari then? Seems like a pretty simple solution.


Chrome destroys the battery on my laptop and is basically spyware these days, and the Chrome-alikes are all dreadful - Vivaldi has the jankiest UI ever, Brave is unbearably sluggish, and Edge runs processes I didn't ask for, like Microsoft Updater, that bug me constantly and spam the new tab screen with all sorts of low-rent junk that I can't remove.

Firefox is my developing browser and I do really like it, but Safari is my actual browsing browser because it's by far the best browsing browser on Macs.


If you leave macOS you can get much better frontends for WebKit, from the simple, rather Safari-like GNOME Web (AKA Epiphany) to the powerful Pentadactyl-like luakit.


I've landed on Librewolf for personal, ungoogled chromium for work. It's great so far, been on this setup for a few months.


Edge is available for Mac.


> with its built in list of search providers

TIL. That's just... terrible UX.


You could always do !searx


True! I'm experimenting with Ecosia right now, but also recently got on the beta for the kagi.com search engine, which so far has proven to be vastly superior to any other that I've tried.


Nothing in their FAQ answers where they are located and they dance around the question of what info they track/keep by saying that they keep the minimum. That is a non-answer.

I like the idea in theory but in practice I have no idea who I'm dealing with. They could be far more open about their processes. I like the idea of a paid browser though.


I recently made the switch from DDG to Searx simply because right-clicking on a search result to copy the URL resulted in a referrer link being copied rather than the link of the result destination.


I use https://addons.mozilla.org/en-US/firefox/addon/clearurls/ to automatically convert referrals to actual links on web pages.


I've been using ClearURLs for a while so I never even realized that DDG used referrer links. Have they always done this?


I think enabling the DoNotTrack header or disabling JavaScript prevents this behavior: I cannot reproduce it on Tor Browser. But you are correct, this is a worrying development.


Isn't the referral link only on the ad results, which is clearly marked "AD"? That's what my quick test shows.


Now I am not sure what exactly the issue is, but this seems to apply to all results, at least on my particular setup.

On the LibreWolf browser,

On mouse hover, the correct result URLs show, but when I right-click one, it shows something like:

https://duckduckgo.com/l/?uddg=<destination_url_here>&notrut...

But this same behaviour cannot be observed with Google/Searx in the same browser. It also isn't observable with DDG on a FF nightly build on the same system or on a FF stable build on a separate one.

Edit: url formatting

EDIT2: this behaviour seems to be limited to my particular setup and even a clean LibreWolf profile seems not to suffer from this issue. I apologise for the misunderstanding.
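In case anyone else does run into those redirect links: the real destination is just URL-encoded in the uddg parameter, so it can be recovered without hitting duckduckgo.com. A small sketch, assuming the URL shape quoted above:

    from urllib.parse import urlparse, parse_qs

    def unwrap_ddg_redirect(link: str) -> str:
        """Return the destination from a duckduckgo.com/l/?uddg=... link."""
        query = parse_qs(urlparse(link).query)  # parse_qs percent-decodes values
        # 'uddg' carries the encoded destination; fall back to the input link
        # if this isn't actually a DDG redirect.
        return query["uddg"][0] if "uddg" in query else link

    print(unwrap_ddg_redirect(
        "https://duckduckgo.com/l/?uddg=https%3A%2F%2Fexample.com%2Fpage"))
    # -> https://example.com/page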


Bizarre. I would never guess that was a local problem, even with my post I assumed you accidentally copied the ad then quickly made a decision.


'Uses Amazon Web Services (AWS) as a cloud provider and Cloudflare CDN.'

IIRC DDG uses Microsoft servers now exclusively. Makes sense given the volume of queries they're handling and all dependent on Bing API.


Please keep the title the same as the actual post: "Searx - moving away from DuckDuckGo"

It gives more context to the topic, as in it's not just a link to the search engine itself.


For what it's worth, it's used by the de-Googled phone /e/ Foundation at https://spot.ecloud.global and seems to work tolerably well.


Slightly off topic, but does anyone have a good solution for removing content-farm search results in this or any engine? For example, some of the worst offenders: WikiHow, Forbes, Business Insider.


uBlacklist fully supports blocking results from Google, Bing and DDG, and partially from Startpage and Ecosia [1]. It's been available for Firefox and Chromium derivatives for a while, and there seems to be a third-party Safari port as well [2][3].

There's a bunch of language-specific blocklists on GitHub focused on StackOverflow/GitHub mirrors and wonky machine translations, but I don't know of any mature curation effort yet.
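For the content farms mentioned above, the blocklist is just one rule per line in uBlacklist's options, using browser-style match patterns; for example (domains taken from the earlier comment):

    *://*.wikihow.com/*
    *://*.forbes.com/*
    *://*.businessinsider.com/*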

[1] https://github.com/iorate/uBlacklist#supported-search-engine...

[2] https://github.com/HoneyLuka/uBlacklist/tree/safari-port/saf...

[3] https://apps.apple.com/us/app/ublacklist-for-safari/id154791...


Privacy is important, but I also care about the terrible quality of search results I get from nearly all the major providers these days. Couldn't an aggregator like SearX host a machine learning layer that learns what results are more likely to be valuable to me, and ranks them higher in the results? Keeping the customization layer on my own server and improving search results would seem to be a big advantage both privacy and performance wise.


From the link:

The CEO sold his previous company's data before founding DDG. His previous company (Names DB) was a surveillance capitalist service designed to coerce naive users to submit sensitive information about their friends.

Is that a fair statement? Can someone provide more context?


It was a failed social network to help you reconnect with old friends. It tried to get you to recruit your friends immediately after registering and had a typical social network license. I'd say that description is intentionally describing it in the worst possible light, but not wholly inaccurate.


If you're trying to be comprehensive, a few other suggestions in rough order of their usefulness:

Gigablast.com - Has been improved recently. private.sh is supposed to be a private proxy for Gigablast, but it has been broken recently

Exalead.com - run by a French defense contractor for some reason

filepursuit.com - search for files only. Need to play around with it more.

PeteyVid.com - multi-platform video search

Wiby.me - focus on "classic" style web sites


Great! I wish there was a possibility to blocklist certain domains (who wants to see Quora in their results...). This should be easily implementable on Searx's side. Another feature I often wish for is searching in a specific time period. It's so annoying, for example on Youtube, when I remember that a video was released in 2011, but there's simply no filter for it.


> I wish there was a possibility to blocklist certain domains

You can do that in this fork: https://github.com/searxng/searxng/blob/e839910f4c4b085463f1...
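If I read that fork's settings right, dropping a domain is a one-liner in settings.yml: map a hostname regex to false instead of a replacement. The exact keys are an assumption on my part, so go by the linked file rather than my memory:

    # settings.yml (sketch, searxng): mapping a hostname pattern to false is
    # supposed to drop matching results entirely. Verify against the linked file.
    hostname_replace:
      '(.*\.)?quora\.com$': false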


You do not need to trust third parties to keep you private and not track your every move, which is awesome.

The only way to avoid third parties is to run your own server ... but this "metasearch engine" is basically just an aggregation proxy. So every search can still be tracked back to your proxy server by Google, Bing or whoever is providing the actual results.


I run searx locally, have it set as my default search engine, and use it as a search engine client. It's nice to have the same interface across a variety of search engines.

Searx is provided as a service on NixOS, which makes this all simple to run.

I do the same for Nitter, a Twitter frontend, which supports RSS and behaves well while logged out.
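For anyone curious, the NixOS side is only a few lines in configuration.nix. A minimal sketch from memory; option names may have shifted between releases, so check the NixOS options search:

    # configuration.nix (sketch): run a local searx instance.
    services.searx = {
      enable = true;
      settings = {
        server.port = 8888;                 # hypothetical local port
        server.bind_address = "127.0.0.1";  # only listen locally
        server.secret_key = "change-me";    # better kept in a secrets file
      };
    };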


Metager is a non-profit, open-source search engine running fully on renewable energy. It also has a proxy for opening results anonymously

https://metager.de


https://metager.org/ for english users


That seems quite exclusive to Germany.


I had been considering using Searx (which I had known about before) lately since I have to use DDG+Google for getting satisfactory results.

Edit: I really like DDG bang and vim like nav keys tho


Someone here once posted a link to duckduckstart.com. I have it set up in my search bar now.

If you search, it goes through Startpage (Google results, more privately). If you search with a bang, it goes through Duckduckgo. It's probably close to what you're looking for.


saw it just now. wow thanks


Could this be installed on a Raspberry Pi? I am very happy with my Raspberry Pi-hole: would not mind adding a second pi for searching.



I believe rightdao.com is missing from the list. It has independent index (and also impressive speed).

Also not sure what the criteria for inclusion is, but search.marginalia.nu and teclis.com both have their own indexes.


That site is horrible on mobile, a good portion of the screen is taken up by the orange "download/view" infographic thing. Interesting though to see how connected the engines are, I would have thought DDG would be bigger with its bang option though I assume it's about what is natively included in the results.


Yes, it is horrible on mobile. All the syndicating search services are also shown at the same size. An update is overdue.

Complementary Twitter lists are maintained here: https://twitter.com/SearchEngineMap/lists


Just searched a couple of queries like "opencv rectangle", "python regex" - and it returned nothing


Which instance did you use?



Qwant is another (non-US) alternative


Doesn't it use Bing results, like DuckDuckGo does? I read it's hosted on MS Azure like Bing and DDG, so in the end it's somewhat just a rebranded Bing. Quite a shame for a European (Franco-German) search engine.


Even if they were just a proxy behind Bing, that would still be good, wouldn't it?


Well, claiming to offer an alternative to the GAFAM while depending on their products / data / infrastructure is a bit misleading.

So indeed they're doing okay privacy wise, but a lot of users feel cheated when they realize their "independent search engine" (DuckDuckGo) is just a Bing portal hosted on Azure.


My search engine puts a Pi symbol in the lower left corner of every web page I visit.


Search engines have been coming up lately, so maybe this is a good a place as any to discuss some back of the envelope calculations.

Let's say we wanted to recreate the web index made by Google. How much cost and engineering would it take?

Estimating the size of the web from worldwidewebsize.com [0], this is around 50 billion pages (50×10^9). The average web page size looks to be on the order of 1.5 MB (1.5×10^6 bytes) [1]. The nominal cost of hard disk space is about $0.02/GB [2].

So, roughly, that's 75 petabytes of data (~75×10^15 bytes). At a cost of $0.02/GB, that gives roughly $1.5M just to buy the hardware to store (a significant fraction of?) the web. The Hutter Prize exists [3], so maybe there's some confidence that we only need to actually store 1/10 of that, so around $150k in costs.
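For anyone who wants to poke at these numbers, the whole estimate fits in a few lines (same rough assumptions as above, including the guessed 10x compression factor):

    pages = 50e9           # ~50 billion indexed pages (worldwidewebsize.com)
    page_size = 1.5e6      # ~1.5 MB average page, in bytes
    cost_per_gb = 0.02     # ~$0.02 per GB of raw disk (Backblaze figures)

    raw_bytes = pages * page_size             # ~7.5e16 B = 75 petabytes
    raw_cost = raw_bytes / 1e9 * cost_per_gb  # ~$1.5M for bare drives
    compressed_cost = raw_cost / 10           # ~$150k if 10x compression holds

    print(f"raw: {raw_bytes / 1e15:.0f} PB, ${raw_cost / 1e6:.1f}M")
    print(f"compressed (1/10): ${compressed_cost / 1e3:.0f}k")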

For perspective, that's 10 multi-millionaire Silicon Valley types donating about $150k each, 100 "engineer types" at $15k each, or 1,000 to 10,000 pro-active citizens at $1.5k to $150 each (just for the hard disk space, discounting energy, bandwidth and other operating costs).

If we try to extrapolate falling hard disk costs and take the price halving time to be about 2.5 years, with a current (pessimistic?) cost of $0.02/GB, that's about 10-15 years before a petabyte-scale hard drive is available to the consumer for $1000.

From my perspective, I would ask "why hasn't a decentralized search index been created and/or is in wide use?". My guess is that figuring out a robust enough system that's cheap enough is still out of reach. $150 might not seem like a lot, but you have to convince 10k people to devote energy just to search.

Put another way, when does the landscape change enough so that decentralized search is a viable option? My guess is that when people can store a significant fraction of the web locally for nominal cost is the determining factor. Maybe some great compression and/or AI sentiment analysis can be done to bootstrap and maybe some type of financial incentives can help solve this issue, but my bet these will only provide a light push in the right direction and the needed technology is the underlying cheap disk space.

As a side note, the worldwidewebsize.com [0] shows the number of indexed pages by Google holding pretty constant over a five year period with a sharp decline somewhere in 2020. I wonder if this is the method of estimation or if Google has changed something significant in their back end to alter their search engine and storage.

[0] https://www.worldwidewebsize.com/

[1] https://www.pingdom.com/blog/webpages-are-getting-larger-eve....

[2] https://www.backblaze.com/blog/hard-drive-cost-per-gigabyte/

[3] http://prize.hutter1.net/


Whoogle is another alternative, that focuses on Google search results


People don’t want privacy. They want results.

Society programs us to think privacy is our top concern. Is it?


I do think you are mostly correct. Some people really care about privacy, but for most people it isn't a huge concern.

This doesn't make it any less important, but just means that if your main selling point is "we're the search engine that cares about privacy", then odds are you're not going to get a lot of users.


I agree with most of your points, although it may be a matter of time to grow that niche. IMO people who value it are willing to accept the sacrifice of less functional output.

Privacy is the most effective selling point when working with sensitive information.


If you are working with sensitive information, the last thing you probably want to do is broadcast that you are working with sensitive information.


Au contraire, it's the first thing you should broadcast, unless you're trying to scam people out of their PII.


I feel like privacy is a concern relative to the chance of my information being misused and the damage from the misuse of that information. If a search engine wants to collect random facts about me but that collection is extremely unlikely to affect me in any way I couldn’t care less


Alright, going to try searx.be for a while then.


I heard it could be a honeypot for the CIA. But feel free to prove me wrong.


If you don't want to use a single searx instance then feel free to use a random one automatically for each search thanks to this tool which can be used locally: https://searx.neocities.org/


I heard these searx instances could be linked together in a honeypot network run by the CIA.

But feel free to prove me wrong.


I had a searx instance running for a long time and it's great when it works, but the plugins for site-specific searches break all the time, and if you have more than 3-4 users with a high search frequency, Google blacklists your IP by throttling it.


May I ask what your idea is of preserving privacy while self-hosting a website that searches for specific terms from a presumably fixed IP, where with a 1/3 to 1/4 chance a search can be attributed to you?


The instance ran on a mobile connection not associated with any private information.

EDIT: It was located in a German street light 20km away from any of the users, just to get the geolocation question out of the way.

It was more of an experiment than anything else. There will be a talk about it and other Freifunk (open mesh network in Germany) related stuff at the next virtual CCC congress.


Did you renew the mobile IP from time to time?

Note: since version 1.0, searx stops sending requests for 1 day when a CAPTCHA is detected, which might help a little.

(I'm really interested by the results of your experiment)


We renewed every 12 hours. In the end we came to the conclusion that Google might discriminate against traffic from eastern European states. The SIM cards were mostly from Poland, Belarus and Ukraine. When we tested against German, French and Italian cards, captchas were way more frequent on the eastern cards and showed up way earlier. I will ping you as soon as the talk is online.


Thank you! To ping, see my profile or @dalf at GitHub


Or you could use the only independent search engine that actually has its own index, and search result quality comparable to that of Google: https://search.brave.com/



