Who needs Google Maps and a browser when you can have interactive maps over SSH? Made with Rust, ratatui for terminal rendering on the backend, and xterm.js for terminal emulation on the frontend.
Yep, this is only for stuff that we've crawled, so we can't detect all of your links. Because we have limited crawling resources, we rate-limit the crawling by domain so we don't get stuck in spider traps.
The current visualization only shows the current state of the crawl, so it won't know about all of the posts.
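For a concrete picture of what "rate-limit by domain" can mean, here is a minimal Python sketch; it is not how the actual crawler works (which is presumably Rust and whose internals aren't described here), just one common approach: keep a per-domain timestamp and enforce a minimum delay between fetches to the same host. The delay value and file handling are placeholders.

    # Sketch of per-domain crawl rate limiting (illustrative, not the real crawler).
    import time
    from collections import deque
    from urllib.parse import urlparse

    MIN_DELAY_PER_DOMAIN = 10.0  # assumed seconds between fetches to one domain

    last_fetch = {}          # domain -> timestamp of last request
    frontier = deque()       # URLs waiting to be crawled

    def ready(url):
        """True if enough time has passed since we last hit this URL's domain."""
        domain = urlparse(url).netloc
        return time.monotonic() - last_fetch.get(domain, 0.0) >= MIN_DELAY_PER_DOMAIN

    def crawl_step(fetch):
        """Pop the first URL whose domain isn't cooling down and fetch it."""
        for _ in range(len(frontier)):
            url = frontier.popleft()
            if ready(url):
                last_fetch[urlparse(url).netloc] = time.monotonic()
                return fetch(url)    # fetch() would download the page and extract links
            frontier.append(url)     # still rate-limited; push it to the back
        return None                  # every queued domain is currently rate-limited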
Well, the results are quite strange. Mine (ploum.net) is said to have 7 links to youdoblog, which is false: there's only one link to that website in all my 900 blog posts.
The hairball was much worse before. I used a lot of techniques from this paper [1] to make it look decent and a bunch of other heuristics based on other papers to make it look informative.
I'd give it a shot: make node embeddings with Node2Vec [1] and then reduce them to 2D with UMAP [2]. I think it could help break apart the hairball, assuming the graph has a nicely clustered structure.
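For concreteness, that pipeline is roughly the following, sketched with the node2vec and umap-learn Python packages. The edge-list filename and all parameter values are placeholders, not anything from the actual project.

    # Sketch: embed the link graph with Node2Vec, then project to 2D with UMAP.
    import networkx as nx
    import umap
    from node2vec import Node2Vec

    G = nx.read_edgelist("links.edgelist", create_using=nx.DiGraph)  # hypothetical file

    # Random-walk based embeddings; dimensions/walk settings are illustrative.
    n2v = Node2Vec(G, dimensions=64, walk_length=30, num_walks=100, workers=4)
    model = n2v.fit(window=10, min_count=1)

    nodes = list(G.nodes())
    embeddings = [model.wv[str(n)] for n in nodes]

    # Reduce to 2D for layout; clusters in embedding space tend to stay grouped.
    coords = umap.UMAP(n_components=2, metric="cosine").fit_transform(embeddings)
    layout = dict(zip(nodes, coords))   # node -> (x, y) positions for the renderer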
To get their topics? I used a basic Louvain community detection algorithm, then put all the URLs from each community into GPT with some few-shot prompting tricks to get it to output a particular topic. There are some heuristics in there to break up giant communities / combine small communities too.
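Roughly, that step could look like the sketch below, using networkx's Louvain implementation and the OpenAI chat API. The prompt, few-shot examples, model name, and size threshold are all made-up placeholders, not the author's actual setup.

    # Sketch: Louvain communities over the link graph, then ask an LLM to name each one.
    import networkx as nx
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    G = nx.read_edgelist("links.edgelist")                 # hypothetical edge list
    communities = nx.community.louvain_communities(G, seed=42)

    FEW_SHOT = (
        "Domains: xkcd.com, smbc-comics.com, qwantz.com -> Topic: Webcomics\n"
        "Domains: arxiv.org, openreview.net, paperswithcode.com -> Topic: ML Research\n"
    )

    def label_community(domains):
        """Ask the model for a short topic label given a sample of the community's domains."""
        sample = ", ".join(sorted(domains)[:30])           # keep the prompt small
        resp = client.chat.completions.create(
            model="gpt-4o-mini",                           # assumed model
            messages=[{"role": "user",
                       "content": FEW_SHOT + f"Domains: {sample} -> Topic:"}],
        )
        return resp.choices[0].message.content.strip()

    labels = {i: label_community(c) for i, c in enumerate(communities) if len(c) >= 5}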
Interesting. I was curious what I would be categorized as, and it's "Whistleblowing and Leaks", which I suppose is what my content has lately been to some extent, but it was funny to see it written out.
My question for you: how can I see which sites link to me, as opposed to which sites I link to?
This is very cool but also not accurate, at least for jakeseliger.com. Henryn.ca lists 0 links from jakeseliger.com to nytimes.com, reason.com, and numerous others that a simple search demonstrates are linked to, for example: https://jakeseliger.com/?s=nytimes.com&submit=Search
I put up many links posts, so I probably link to an abnormally large number of sites.
> I scraped my favorite blogs and made a graph from the domains that each blog links to.
Nice analysis! However, I'm guessing these aren't your favorite blogs, as there are tens of thousands of entries! How did you decide which blogs to index? Did you use some central registry of blogs?
Very neat! So you wrote the graph visualization UI? I see in a prior project you used cytoscape - any motivation for doing it yourself this time (vs one of the available libraries)?
Yeah, I used cytoscape before, but it didn't have the full customization that I wanted. Besides the performance issues, there were some problems I couldn't have solved without a custom renderer:
- if many lines overlap, how should their colors blend?
- how to render circles so that they look nice both zoomed in / out
- how to avoid it looking like a hairball graph [1]
The nice thing about a personal project is that I can do whatever I like with no constraints, so I built one that's suited for this project and fits my tastes.
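On the first bullet, one generic answer (not necessarily what this renderer actually does) is to draw each edge with low alpha and composite back-to-front with standard Porter-Duff "over" blending, so dense bundles get darker without any single edge saturating the color. A tiny sketch, with made-up colors:

    # Sketch of "over" alpha compositing for overlapping semi-transparent lines.
    def over(dst, src):
        """Composite straight-alpha RGBA color `src` over `dst` (Porter-Duff 'over')."""
        sr, sg, sb, sa = src
        dr, dg, db, da = dst
        out_a = sa + da * (1 - sa)
        if out_a == 0:
            return (0.0, 0.0, 0.0, 0.0)
        def blend(s, d):
            return (s * sa + d * da * (1 - sa)) / out_a
        return (blend(sr, dr), blend(sg, dg), blend(sb, db), out_a)

    # Three overlapping edges, each drawn at 20% opacity over a white background:
    pixel = (1.0, 1.0, 1.0, 1.0)
    for edge_color in [(0.2, 0.4, 1.0, 0.2)] * 3:
        pixel = over(pixel, edge_color)
    print(pixel)   # gets progressively darker/bluer where more edges overlap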