
Is the baseline assumption of this work that an erroneous citation is LLM hallucinated?

Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?


They explain in the article what they consider a proper citation, an erroneous one, and a hallucination, in the section "Defining Hallucitations". They also say that they have many false positives, mostly real papers that are not available online.

That said, I am also very curious what result their tool would give for papers from the 2010s and before.


If you look at their examples in the "Defining Hallucitations" section, I'd say those could be 100% human errors. Shortening authors' names, leaving out authors, misattributing authors, misspelling or misremembering the paper title (or having an old preprint title, as titles do change) are all things that I would fully expect to happen to anyone in any field where things get published. Modern tools have made the citation process more comfortable, but if you go back to the old days, you'd probably find those kinds of errors everywhere.

If you look at the full list of "hallucinations" they claim to have discovered, the only ones I'd not immediately blame on human screwups are the ones where a title and the authors got zero matches for existing papers/people.

If you really want to do this kind of analysis correctly, you'd have to take the claim in the text and verify it against the cited article. Because I think it would be even more dangerous if you could get claims accepted by simply quoting an existing paper correctly, while completely ignoring its content (which would have worked here).

> Modern tools have made the citation process more comfortable,

That also makes some of those errors easier. A bad auto-import of paper metadata can silently screw up some of the publication details, and replacing an early preprint with the peer-reviewed article of record takes annoying manual intervention.


I mean, if you’re able to take the citation, find the cited work, and definitively state ‘looks like they got the title wrong’ or ‘they attributed the paper to the wrong authors’, that doesn’t sound like what people usually mean by a ‘hallucinated’ citation. Work that is lazily or poorly cited but nonetheless attempts to cite real work is not the problem. Surely the main concern is work which gives itself false authority by claiming to cite works that simply do not exist?

Let me second this: a baseline analysis should include papers that were published or reviewed at least 3-4 years ago.

When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don’t think any of them ever made it to print, but it’s hard to be 100% sure.

For most journals, they actually partially check your citations as part of the final editing. The citation record is important for journals, and linking with DOIs is fairly common.
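For anyone curious what such a check looks like, here's a minimal sketch of verifying a cited title against Crossref's public REST API (assuming the requests library; the DOI and title in the usage line are made-up placeholders):

    import requests

    def title_matches(doi, cited_title):
        # Fetch the registered work record for this DOI from Crossref
        resp = requests.get("https://api.crossref.org/works/" + doi, timeout=10)
        if resp.status_code != 200:
            return False  # DOI not registered; flag for manual review
        titles = resp.json()["message"].get("title") or [""]
        # Crude comparison; a real checker would normalize punctuation etc.
        return titles[0].strip().lower() == cited_title.strip().lower()

    # Hypothetical usage:
    # title_matches("10.1234/made.up.doi", "Some Cited Paper Title")

Of course this only catches metadata mismatches, not a citation that misrepresents what the paper actually says.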


People commonly write off LLMs as unusable because they make mistakes. So do people. Books have errors. Papers have errors. People have flawed knowledge, often degraded through a conceptual game of telephone.

Exactly as you said, do precisely this to pre-LLM works. There will be an enormous number of errors with utter certainty.

People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.


Fabricated citations are not errors.

A pre-LLM paper with fabricated citations would demonstrate the author's will to cheat.

A post-LLM paper with fabricated citations: same thing. And if the authors attempt to defend themselves with something like "we trusted the AI", they are sloppy, probably cheaters, and not very good at it.


Further, if I use AI-written citations to back some claim or fact, what are the actual claims or facts based on? This started happening in law because someone writes the text and then wishes there were a source that is relevant and actually supportive of their claim. But if someone puts in the labor to check the supposedly real/extant sources, there's nothing backing the claim (e.g. the MAHA report).

>Fabricated citations are not errors.

Interesting that you hallucinated the word "fabricated" here where I broadly talked about errors. Humans, right? Can't trust them.

Firstly, just about every paper ever written in the history of papers has errors in it. Some small, some big. Most accidental, but some intentional. Sometimes people are sloppy keeping notes, transcribe the wrong row, get a name wrong, make an off-by-one error. Sometimes they just entirely make up data or findings. This is not remotely new. It has happened as long as we've had papers. Find an old, pre-LLM paper and go through the citations -- especially for a tosser target like this, where there are tens of thousands of low-effort papers submitted -- and you're going to find a lot of sloppy citations that are hard to rationalize.

Secondly, the "hallucination" is that this particular snake-oil firm couldn't find the given papers in many cases (they aren't foolish enough to think that means the papers were fabricated, but they're looking to sell a tool to rubes, so the conclusion is good enough), and in others that some of the author names are wrong. Eh.


> Firstly, just about every paper ever written in the history of papers has errors in it

LLMs make it easier and faster, much like guns make killing easier and faster.


LLMs are a force multiplier for this kind of error, though. It's not easy for a human to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.

Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.


>It's not easy to hallucinate papers out of whole cloth, but LLMs can easily and confidently do it, quote paragraphs that don't exist, and do it tirelessly and at a pace unmatched by humans.

But no one is claiming these papers were hallucinated whole, so I don't see how that's relevant. This study -- notably to sell an "AI detector", which is largely a laughable snake-oil field -- looked purely at the accuracy of citations[1] among a very large set of citations. Errors in papers are not remotely uncommon, and finding some errors is...exactly what one would expect. As the GP said, do the same study on pre-LLM papers and you'll find an enormous number of incorrect if not fabricated citations. Peer review has always been an illusion of auditing.

[1] Which is such a weird basis on which to sell an "AI detection" tool. Clearly the checking was mostly manual, given that they somehow only managed to cover a tiny subset of the papers; in all likelihood it was some guy going through citations and checking them on Google Search.


I've zero interest in the AI tool, I'm discussing the broader problem.

The references were made up, and this is easier and faster to do with LLMs than with humans. Easier to do inadvertently, too.

As I said, LLMs are a force multiplier for fraud and inadvertent errors. So it's a big deal.


I think we should see a chart of the percentage of "fabricated" references over the past 20 years. We should see a huge increase after 2020-2021. Does anyone have this chart data?

Quoting myself from just last night because this comes up every time and doesn't always need a new write-up.

> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.


Yeah that is what their tool does.

* terms and conditions may apply

Music publishing vs radio stations is a fascinating example - compulsory licensing, meaning radio stations are free to broadcast any music at all; even rules preventing radio stations and DJs from accepting payola from publishers to promote their records.

Compulsory licensing sounds interesting, but isn't there a fundamental problem in setting the price? Music tends not to have big budget differences. Should a show with a budget of $10k pay the same fee as a show with a budget of $1M? And who sets the price?

I think it is fine for the content maker to set the price ... as long as everyone gets the same price, and the content maker isn't also the distributor (streaming service).

Do you pay a different ticket price to go see a James Cameron movie at the cinema than you do for a Wes Anderson movie?

How does that work?


Eliminating time constraints is entirely reasonable. Leaving exams early is generally an option in most standardized testing systems - though usually with some minimum time you must remain present before leaving.

Taking what is currently scheduled as a three-hour exam (which many students already leave after two hours, and for which some have accommodations allowing them four) and simply setting aside up to five hours for everyone likely makes the exam a fairer test of knowledge, as opposed to a test of exam skills and pressured time management.

Once you’ve answered all the problems, or completed an essay, additional time isn’t going to make your answers any better. So you can just get up and leave when you’re done.


I think one challenge would be preventing professors from taking advantage of the extra time to extend the test. I suspect professors would generally like to make tests more comprehensive and are limited only by the time limit; tests will naturally expand to fill whatever default time is allotted.

> Leaving exams early is generally an option in most standardized testing systems

I didn't because I'd use the extra time to go over my answers again looking for errors.


‘Random’ configurations are going to be dominated by fixed-scale noise at roughly 50% density, which will have very common global evolutionary patterns - it’s almost homogeneous, so there’s little opportunity for interesting things to occur. You need to start with more scale-free noise patterns, so there are more opportunities for global structures to emerge.
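For anyone who wants to experiment, here's a minimal sketch (numpy; the octave count and weighting are just one plausible choice) of building a scale-free starting grid by layering random fields at multiple scales, so live cells clump at many sizes instead of forming a uniform 50% speckle:

    import numpy as np

    def scale_free_grid(n=256, octaves=6, density=0.5, seed=0):
        rng = np.random.default_rng(seed)
        field = np.zeros((n, n))
        for o in range(octaves):
            cells = max(n >> (octaves - 1 - o), 1)  # coarse -> fine layers
            layer = rng.random((cells, cells))
            layer = np.kron(layer, np.ones((n // cells, n // cells)))  # upsample
            field += layer / 2 ** o  # halve the weight at each finer octave
        cutoff = np.quantile(field, 1 - density)  # threshold to target density
        return (field > cutoff).astype(np.uint8)  # 1 = live cell

    grid = scale_free_grid()

Feed that into any Life stepper and compare it with rng.random((256, 256)) > 0.5 to see the difference in emergent structure.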

What people would print and what soldiers would say in the 1940s were likely somewhat divergent.


100%


Unclear. ‘Foo’ has a life and origin of its own and is well attested in MIT culture going back to the 1930s for sure, but it seems pretty likely that its counterpart ‘bar’ appears in connection with it as a comical allusion to FUBAR.


On the other hand you can at least expect that if someone causes an accident they will suffer consequences. Courts and insurance exist as mechanisms to transfer liability including for medical bills, and the criminal system (in many countries) can and does punish people for reckless driving.

Sure, you still need to look after yourself in the moment - but there are incentives in place to discourage drivers from misbehaving and those incentives do help reduce the likelihood that you will be a victim of an accident. They’re not great! Bad drivers get away with a lot, and cyclists are not adequately considered in many mechanisms, but they are better than nothing.

Yet ‘nothing’ is what we have with respect to online fraud, where the situation is more akin to one where driving laws don’t exist or aren’t enforced, nobody drives cars with license plates, you can’t get insurance, and if you are run off the road the police’s reaction is to tell you that roads are inherently dangerous places. Bad drivers will never be caught, and if they drive over you they get to steal your bike and sell it. Entire businesses are set up around forcing cyclists into streets where they can be mowed down with steamrollers, and the police claim to be powerless to stop them.

Numerous mechanisms exist that make it possible for us to share roads without inherent trust, and even those are inadequate. Fraudulent behavior online has none of the societal mechanisms that we have created to constrain driving.


There are plenty of countries where driving laws aren't enforced. Using the Internet anywhere is sort of like driving in a corrupt developing country. This creates a certain amount of cognitive dissonance in people who have spent their whole lives in functional developed countries where obeying the law is the default behavior.


Right. The internet is more like driving in the kind of country where people give you advice like ‘if you come across roadworks and a guy dressed as a cop tries to wave you over, you need to hit reverse and pull a J-turn out of there or you will die’.


Rule of thumb is that the estimated remaining duration of an outage is equal to the current elapsed duration of the outage.
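One way to justify that rule of thumb is the Copernican argument: if you assume you're checking in at a uniformly random point in the outage's lifetime, the median remaining time equals the elapsed time, regardless of how outage lengths are distributed. A quick simulation sketch (numpy; the lognormal outage lengths are a made-up illustration, and the ratio doesn't actually depend on them):

    import numpy as np

    rng = np.random.default_rng(0)
    total = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)  # hypothetical outage lengths
    u = rng.random(total.size)  # random fraction of each outage already elapsed
    elapsed, remaining = u * total, (1 - u) * total
    # remaining/elapsed = (1-u)/u, so the total duration cancels out
    print(np.median(remaining / elapsed))  # ~1.0: remaining ~= elapsed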


That doesn’t give you high availability; it doesn’t give you monitoring and alerting; it doesn’t give you hardware failure detection and replacement; it doesn’t solve access control or networking…

Managed databases are a lot more than apt install postgresql.

