The result is very strange. It says South Korea has the largest number of websites with the header, yet I don't see ANY search results in Korean. No writeup whatsoever. I wonder what those websites are.
Flying by the seat of my pants, this page of information has details we can guess at: 27,799 results are in South Korea and 27,690 are on Korea Telecom (so close that I'll call it a 1-to-1 match). Wikipedia tells me that as of 2015, KT ran more than 140,000 Wi-Fi hotspots.[1]
Further down, we see that 28,587 HTTP titles (almost the same number as above) are "Gargoyle Router Management Utility". Gargoyle is an open-source firmware from the OpenWrt world that patches the code to include the Clacks header.[2]
I'm going to conclude that these numbers are directly correlated (they all describe one and the same endpoint/device pattern) and that roughly 30,000 KT Wi-Fi hotspots across South Korea run this Gargoyle patch with their management UI open on the public interface rather than locked to the internal network, a VPN, etc.
And how representative are publicly accessible Redis/Valkey instances of Redis/Valkey usage in general? And can Shodan even tell Redis apart from a Valkey instance set up in a backwards-compatible way without being able to authenticate?
In absolute numbers, probably not highly representative, but the relative numbers are a meaningful measure of adoption. And no: the user has to have disabled authentication for Shodan to get the service details that differentiate Redis from Valkey. But again, you can compare unauthenticated Redis to unauthenticated Valkey to see how the percentages change over time.
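For anyone curious what "service details" means in practice, here is a minimal sketch of classifying an unauthenticated INFO reply. It assumes that Valkey's reply carries a `valkey_version` field (and `server_name:valkey`) next to the compatibility `redis_version` field; the function names and the sample replies are made up for illustration, and this is not a description of how Shodan does it.

```python
def parse_info(raw: str) -> dict[str, str]:
    """Parse the key:value lines of an INFO reply, skipping section headers."""
    fields = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def classify(raw: str) -> str:
    """Return 'valkey', 'redis', or 'unknown' for a raw INFO reply.

    Assumption: Valkey adds valkey_version / server_name on top of the
    redis_version field it keeps for backwards compatibility.
    """
    fields = parse_info(raw)
    if "valkey_version" in fields or fields.get("server_name") == "valkey":
        return "valkey"
    if "redis_version" in fields:
        return "redis"
    return "unknown"  # e.g. an auth error, where no fields are returned


# Hypothetical replies, shortened to the relevant lines.
REDIS_REPLY = "# Server\nredis_version:7.2.4\nredis_mode:standalone\n"
VALKEY_REPLY = "# Server\nredis_version:7.2.4\nserver_name:valkey\nvalkey_version:8.0.0\n"
AUTH_REPLY = "-NOAUTH Authentication required.\n"

if __name__ == "__main__":
    for reply in (REDIS_REPLY, VALKEY_REPLY, AUTH_REPLY):
        print(classify(reply))
```

With authentication enabled there are no fields to look at, which is why only the unauthenticated instances can be compared.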
We developed "geodns" for situations where you want to do DNS lookups from different regions around the world. For example, ycombinator.com returns different IPs depending on your location:
$ geodns ycombinator.com
108.156.133.117 Singapore
108.156.133.21 Singapore
108.156.133.25 Singapore
108.156.133.59 Singapore
108.156.39.26 London
108.156.39.61 London
108.156.39.62 London
108.156.39.64 London
13.32.27.123 Frankfurt am Main
13.32.27.47 Frankfurt am Main
13.32.27.51 Frankfurt am Main
13.32.27.80 Frankfurt am Main
13.35.93.12 Clifton
13.35.93.14 Clifton
13.35.93.46 Clifton
13.35.93.47 Clifton
18.239.94.100 Amsterdam
18.239.94.114 Amsterdam
18.239.94.33 Amsterdam
18.239.94.79 Amsterdam
99.86.20.42 Doddaballapura
99.86.20.54 Doddaballapura
99.86.20.64 Doddaballapura
99.86.20.96 Doddaballapura
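If you want to post-process that output, a rough sketch like the following groups the addresses by lookup location. It assumes each line is an IP address followed by a location name, as in the listing above; `group_by_location` is a hypothetical helper and not part of geodns.

```python
import ipaddress
from collections import defaultdict


def group_by_location(output: str) -> dict[str, list[str]]:
    """Group 'IP location' lines by location, ignoring anything else."""
    groups = defaultdict(list)
    for line in output.splitlines():
        parts = line.split(None, 1)  # location names may contain spaces
        if len(parts) != 2:
            continue
        ip, location = parts
        try:
            ipaddress.ip_address(ip)
        except ValueError:
            continue  # skips the "$ geodns ..." prompt line
        groups[location.strip()].append(ip)
    return dict(groups)


# A few lines copied from the output above.
SAMPLE = """\
$ geodns ycombinator.com
108.156.133.117 Singapore
108.156.39.26 London
13.32.27.123 Frankfurt am Main
13.32.27.47 Frankfurt am Main
"""

if __name__ == "__main__":
    for location, ips in group_by_location(SAMPLE).items():
        print(f"{location}: {len(ips)} address(es)")
```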
Is that because it's behind Cloudflare? I'm pretty sure it still runs primarily on a single server in a colo (i.e., except in times of hardware failure or other physical realities).
It's also possible to get a copy of HN from Cloudflare in addition to M5. I keep historical DNS data and can confirm there are Cloudflare IPs that continue to work.
whois returns AWS, and I don't see any of the normal CloudFront headers, but I do see a server header of nginx. So it doesn't look like Cloudflare to me. I'd guess they're just running some EC2 instances with nginx configured to give the exact behaviour they need (as I recall, they return cached pages to users who aren't logged in, which is why you can sometimes log out and get the page to load when they're having issues). I also see awsdns in their NS records, so it looks to me like they're just doing geo DNS in Route 53 to route to the closest instance they're running.
- Want to distribute data to users that don't want to manage a server? A lot of people don't want to manage a server and don't need the best possible performance.
- Want to take data with you on a thumb drive and work with it offline? It's extremely convenient to be able to use SQLite for an app that has to work offline.
- Does the app mostly just read from the database and fit in memory? It's undervalued to just put the entire database into memory so you don't hit the disk and don't introduce network latency. For example, the following website does all enrichment with in-memory SQLite databases: https://shdn.io/analyze?target=ycombinator.com
At Shodan, we distribute versions of our datasets as SQLite and they're a popular way to consume the data without having to manage infrastructure.
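As a rough illustration of the "put the entire database into memory" point, here is a minimal sketch using Python's built-in `sqlite3` module and its `Connection.backup` method. The `hosts` table, its rows, and the helper names are hypothetical and only exist to make the example runnable; this is not how any particular site is implemented.

```python
import os
import sqlite3
import tempfile


def build_demo_db(path: str) -> None:
    """Create a small on-disk database (stand-in for a downloaded dataset)."""
    conn = sqlite3.connect(path)
    with conn:  # commits on success
        conn.execute("CREATE TABLE hosts (ip TEXT PRIMARY KEY, location TEXT)")
        conn.executemany(
            "INSERT INTO hosts VALUES (?, ?)",
            [("108.156.39.26", "London"), ("13.32.27.123", "Frankfurt am Main")],
        )
    conn.close()


def load_into_memory(path: str) -> sqlite3.Connection:
    """Copy an on-disk SQLite database into a fresh in-memory connection."""
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # copies every page of the database into memory
    disk.close()
    return mem


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        db_path = os.path.join(tmp, "demo.db")
        build_demo_db(db_path)
        mem = load_into_memory(db_path)
    # The file is gone by now; queries are served from memory only.
    print(mem.execute("SELECT COUNT(*) FROM hosts").fetchone()[0])
    mem.close()
```

After the copy, reads never touch the disk or the network, at the cost of holding the whole dataset in RAM, so this only fits the read-mostly, fits-in-memory case described above.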