Hacker News | blast's comments

What are those techniques? I'd like to learn more.

Mostly entropy in its various forms, like KL divergence. But it will also diverge in strange ways from the usual n-gram distributions for English text or even code-based corpora, which all the big scrapers will be very familiar with. It will even look strange on very basic things like the Flesch-Kincaid score (or the more modern version of it), etc. I assume that all the decent scrapers are using a combination of basic NLP techniques to build score-based ranks from various factors in a sort of additive fashion, where text is marked as "junk" when it crosses "x" threshold by failing "y" checks.
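To make that concrete, here is a rough sketch of what such an additive check could look like. It is not what any particular scraper runs. The function names, the thresholds, and the crude syllable counter are all my own assumptions for illustration; only the Flesch-Kincaid grade formula itself is the standard one.

```python
import math
import re
from collections import Counter


def words(text):
    # Crude lowercase word tokenizer (an assumption, not a real NLP tokenizer).
    return re.findall(r"[a-z']+", text.lower())


def char_entropy(text):
    # Shannon entropy of the character distribution, in bits per character.
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def kl_divergence(text, reference, alpha=1.0):
    # KL(text || reference) over word unigrams, in bits, with add-alpha
    # smoothing so words missing from one side do not give log(0).
    p_counts, q_counts = Counter(words(text)), Counter(words(reference))
    vocab = set(p_counts) | set(q_counts)
    if not vocab:
        return 0.0
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    kl = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        kl += p * math.log2(p / q)
    return kl


def flesch_kincaid_grade(text):
    # Standard grade-level formula; syllables are approximated by counting
    # vowel groups, which is only a heuristic.
    tokens = words(text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not tokens or not sentences:
        return 0.0
    syllables = sum(max(1, len(re.findall(r"[aeiouy]+", t))) for t in tokens)
    return 0.39 * len(tokens) / len(sentences) + 11.8 * syllables / len(tokens) - 15.59


def failed_checks(text, reference, max_kl=2.0,
                  entropy_range=(3.0, 5.0), grade_range=(0.0, 20.0)):
    # All thresholds here are made-up placeholders; a real pipeline would
    # tune them against a known-good corpus.
    failures = []
    if kl_divergence(text, reference) > max_kl:
        failures.append("kl")
    if not entropy_range[0] <= char_entropy(text) <= entropy_range[1]:
        failures.append("entropy")
    if not grade_range[0] <= flesch_kincaid_grade(text) <= grade_range[1]:
        failures.append("readability")
    return failures


def is_junk(text, reference, min_failures=2):
    # Additive rule: mark as junk once enough independent checks fail.
    return len(failed_checks(text, reference)) >= min_failures
```

In practice the reference would be a large sample of known-good text, and you would add n-gram checks beyond unigrams, but the shape stays the same: a handful of cheap metrics, each voting on whether the text looks off.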

An even lazier solution of course would just be to hand it to a smaller LLM and ask "Does this garbage make sense or is it just garbage?" before using it in your pipeline. I'm sure that's one of the metrics that counts towards a score now.

Humans have been analyzing text corpora for many, many years now and were pretty good at it even before LLMs came around. Google in particular is amazing at it. They've been making their living by being the best at filtering out web spam for many years. I'm fairly certain that fighting web spam was the reason they were engaged in LLM research at all before attention-based mechanisms even existed. Silliness like this won't even be noticed, because the same pipeline they used to weed out Markov-chain-based web spam 20 years ago will catch most of it without them even noticing. Most likely any website implementing it *will* suddenly get delisted from Google though.

Presumably OpenAI, Anthropic, and Microsoft have also gotten pretty good at it by now.


That's a bittersweet mix of troubles and sweetness. I hope the troubles sort out without too much heartache, and that 2026 brings many good things.


It will be a second dose of Streisand if they do.


> some random homeless guy

Was he homeless? I haven't seen that mentioned in the articles.



New York Post states it in a YouTube video titled 'All About Brown, MIT Shooting Suspect Claudio Neves Valente – who BARKED During Massacre'.


> John posted about the encounter on Reddit after the shooting

Anyone have the Reddit link? (I wonder why the article doesn't include it)




I feel sorry for this guy. His Reddit inbox is probably fucked, and he's absolutely going to get doxxed and hounded by news people, and I wouldn't be surprised if even worse things happened to him.

Good on him for reporting what he saw. He also went to the police the next day and reported it directly. But now the media machine is going to make him regret he ever said anything, which is unfortunate.


He’s already public, but he can make a new Reddit account.

> Now the media machine is going to make him regret he ever said anything

We’ll see how it turns out, but I don’t see why even the internet mob would hate him. He probably can’t live in Brown’s basement anymore, but maybe with the reward money and recognition he can find a real place.


This is admittedly only tangential, but as a non-native speaker / not a US-American, I found this sentence from the NYT reporting[0] a bit confusing:

> John said that the suspect’s clothing was inappropriate for the weather and that they had made eye contact.

Why is the report mentioning the eye contact? Is that culturally significant, as in, in the US you don’t normally make eye contact with strangers, and if a stranger does make eye contact, it’s suspicious?

[0]: https://www.nytimes.com/2025/12/19/us/brown-mit-shooting-inv...


I think the eye contact bit is useful as a signal that the witness got a very good look at the suspect's face.


I think the eye contact in question was a prelude to the two of them kind of following each other around and a minor verbal altercation, so the later context shows that it was probably kind of suspicious eye contact, rather than a friendly "what's up?"


I suppose that made eye contact = the face was clearly visible for a second or two, and thus recognized with more certainty.


I agree with the other comments that this sentence is just poorly written.

In cities people tend to not make eye contact while walking by each other, though in smaller towns it is more common to acknowledge each other in passing.

In neither case would it be accurate to find eye contact suspicious. The sentence appears to be a summation of several things the person saw, combining them poorly and creating the ambiguity.




I doubt that they know. It's too early to figure something like that out.


Seems to me that the obvious business model here is that they will need to have their AI inject their own ads into the DOM. Overall though, this feels like a feature, not a business.


To me the more obvious option is additional features that people pay for, i.e. freemium. But what do I know.


As a user, I'll never pay for software. Adblock for SaaS and pirated downloads for everything else is all I need.


Clearly there’s a tension on this venture-capital-run website between some people using their computer-nerd skills to save money and improve their experience, and other people hustling a business that requires the world to pay them.


> Clearly there’s a tension on this venture-capital-run website

Yeah. If they have a problem with that, they can kill HN. You can't have hackers/smart people in your forum and decide what they will do. Moderation can try to guide it, but there is a limit when dealing with smart + polite people.


Or, they do know and don't want to say. This project does seem to have funding so I assume there is a plan.


> If you’ve used Violentmonkey/Tampermonkey, Tweeks is like a next‑generation userscript manager


Have you seen a man eat his own head?

https://www.youtube.com/shorts/BFIO6OcbsTQ

