The reasons listed in TFA - "confidentiality, sensitive data and compromising authors’ intellectual property" - make sense to discourage reviewers from using cloud-based LLMs.
There are also reasons for discouraging the use of LLMs in peer review at all: it defeats the purpose of the "peer" in peer review; hallucinations; criticism not relevant to the community; and so on.
However, I think it's high time to reconsider what scientific review is supposed to be. Is it really important to have so-called peers as gatekeepers? Are there automated checks we can introduce to verify claims or ensure quality (like CI/CD for scientific articles), and leave content interpretation to the humans?
Let's make the benefits and costs explicit: what would we be gaining or losing if we just switched to LLM-based review, and left the interpretation of content to the community? The journal and conference organizers certainly have the data to do that study; and if not, tool providers like EasyChair do.
Yes, there are often strong reasons to have peers as gatekeepers. Scientific writing is extremely information-dense. Consider a niche technical task that you work on. Now consider summarizing a day's worth of work in one or two sentences, designed to be read by someone else with similar expertise. In most scientific fields, the niches are pretty small. The context necessary to parse that dense scientific writing into a meaningful picture of the research methods often takes years or decades of work in the field to build. Only peers are going to have that context.
There are also strong reasons why the peers-as-gatekeepers model is detrimental to the pursuit of knowledge, such as researchers forming semi-closed communities that bestow local political power on senior people in the field, creating social barriers to entry or critique. This is especially pernicious given the financial incentives (competition for a limited pool of grant money; award of grant money based on publication output) that researchers are exposed to.
I think if you leave authors alone, they will be more likely to write in the first category rather than the second. After all, papers are mainly written to communicate your findings to your direct peers. So information-dense writing isn't bad, because the target audience understands it.
Of course, that makes it harder for outsiders to penetrate, but this also depends on the culture of the specific domain, and there are usually people writing summaries and surveys. Great task for grad students tbh (you read a ton of papers and summarize them, and by that point you should have a good understanding of what needs to be worked on in the field, rather than just being dragged through it by your advisor).
Agreed: information-dense isn't bad at all. It's a reason for peer review, though: people other than peers in the field have a much harder time reviewing an article for legitimacy, because they lack the context.
It's a fair point. In the ideal setting, peer review can really be a very informative and important gate. And who better to be the gatekeeper than someone who understands the context?
However, there are still big issues with how these peers perform reviews today [1].
For example, if there's a scientifically arbitrary cutoff (e.g., the 25% acceptance rate at top conferences), reviewers will be mildly incentivized to reject (what they consider to be) "borderline-accept" submissions. If the scores are still "too high", the associate editors will overrule the decision of the reviewers, sometimes for completely arbitrary reasons [2].
There are also a number of things reviewers should look out for but have neither the time, space, tools, nor incentives to do. For example, reviewers are meant to check whether the claims fit what is cited, but I have no way of knowing how many actually take the time to look at the cited content. Then there's checking for plagiarism, GenAI and hallucinated content, whether the evidence supports the claims, how the charts were generated, "novelty", etc. There are also things that reviewers shouldn't check, but that pop up occasionally [3].
However, you would be right to point out that none of this has to do with peers doing the gatekeeping, but with how the process is structured. But I'd argue that this structure is so common that it's basically synonymous with peer review. If it results in bad experiences often enough, we really need to push for the introduction of more tools and honesty into the process [4].
[1] This is based on my experience as a submitter and a reviewer. From what I see/hear online and in my community, it's not an uncommon experience, but it could be a skewed sample.
[3] Example things reviewers shouldn't check for or use as arguments: did you cite my work; did you cite a paper from the conference; can I read the diagram without glasses if I print out the PDF; do you have room to appeal if I say I can't access publicly available supplementary material; etc.
[4] Admittedly, I also don't know what the solution would be. Still, some mechanisms come to mind: open but guaranteed double-blind anonymous review; removal of arbitrary cutoffs for digital publications; (responsible, gradual) introduction of tools like LLMs and replication checks before a paper gets to the review stage; actually monitoring reviewers and acting on bad behavior.
> However, I think it's high time to reconsider what scientific review is supposed to be
I've been arguing for years we should publish to platforms like OpenReview and that basically we check for plagiarism and obvious errors but otherwise publish.
In the old days, the bottleneck was physically sending out papers. Now that's cheap. So make comments public. We're all on the same side. The people who leave reviews are more likely to actually be invested in the topic rather than doing review purely as a service. It's not perfect, but no system will be, and we currently waste lots of time chasing reviewers.
I agree. OpenReview is a good initiative, and while it has its own flaws, it's definitely a step in the right direction.
The arXiv and the derivative preprint repositories (e.g., bioRxiv) are other good initiatives.
However, I don't think it's enough to leave the content review completely to the community. There are known issues with researchers using arXiv, for example, to stake claims on novel things, or with readers jumping on claims made by well-known institutions in preprints, which may turn out to be overconfident or bogus.
I believe that a number of checks (beyond plagiarism) need to happen before the paper is endorsed by a journal or a conference. Some of these can and should be done in a peer review-like format, but it needs to be heavily redesigned to support review quality without sacrificing speed. Also, there are things that we have good tools for (e.g., checking citation formatting), so this part should be integrated.
Plus, time may be one of the bottlenecks, but that's partly because publishers take money from academic institutions, yet expect voluntary service. There's no reason for this asymmetry, IMO.
> There are known issues with researchers using arXiv, for example, to stake claims on novel things
I think this is more a function of the metrics we're evaluated on: we find that works get through review more easily when they're "novel", so the claim gets used over-zealously. But get rid of the formal review system and that goes too.
> which may turn out to be overconfident
This is definitely an issue but one we must maintain as forgivable. Mistakes must be allowed in science. Minimized, but allowed. Mistakes are far too easy to make when working at the edge of knowledge. I'd wager >90% of papers have mistakes. I can tell you that 100% of mine have mistakes (all found after publication) and I don't know another researcher who says differently.
> bogus
And these people should be expelled.
That's a problem the current system actually perpetuates, because when authors plagiarize, their papers get silently desk-rejected. Other researchers never learn of this and so can't take extra precautions with other work by the same authors. IMO fraud is one of the greatest sins you can commit in science. Science depends heavily on trust in authors (even more so because our so-called peer-review system places emphasis on novelty and completely rejects replication).
The truth is that no reviewer can validate claims by reading a paper. I can tell you I can't do that even for papers that are in my direct niche. But what a reviewer can do is invalidate. We need to be clear about that difference and the bias. Because we should never interpret papers as "this is the truth" but "this is likely the truth under these specific conditions". Those are very different things.
I agree that checking is better, but I don't believe it's absolutely necessary. The bigger problem I have right now is that we are publishing so much that it is difficult to get a reviewer who is a niche or sub-domain expert. More generalized reviewers can't properly interpret papers. It is too easy to over-generalize results and conclude that a paper is just doing the same thing as another work (I've seen this way too often), or to dismiss it as too incremental (almost everything is incremental... and it is going to stay that way as long as we have a publish-or-perish system). BUT the people who are niche experts tend to find the papers anyway, because they are seeking them out.
But what I think still needs to be solved is the search problem. It's getting harder, and frankly we shouldn't make scientists also be marketers. It is a waste of time and creates perverse incentives, as you yourself mentioned.
> because publishers take money from academic institutions,
And the government.
Honestly I hate how shady this shit is. I understand conferences, where there's a physical event, but paid-access journals are a fucking scam (I'd be okay with a small fee for server costs and such, but considering arXiv and OpenReview, I suspect this isn't very costly). They are double dipping: getting money from govs and from academics paying for access, while the literal "product" they are selling is given to them for free, and the "quality control" of that "product" is also done for free. And by "for free" I mean on the dime of academic institutions and government tax dollars.
This misses the point entirely. Science is a dialogue. Peer review is just a protocol to signal that the article has been part of some dialogue.
Anyone can put anything to paper. Now more than ever - see all the vibe physics floating around. Peer review is just an assurance that what you are about to read isn't just some write-only output of some spurious thought process.