Somewhat tangential to the article, but why is SQL considered a programming language?
I understand that's the convention according to the IEEE and Wikipedia [1], but the name itself - Structured Query Language - reveals that its purpose is limited by design. It's a computer language [2] for sure, but why programming?
[1] https://en.wikipedia.org/wiki/List_of_programming_languages
[2] https://en.wikipedia.org/wiki/Computer_language
"structured query language" is actually a backronym, SEQUEL is indeed a programming language and the only mainstream 4GL. consider the output of the compiler (query planner) is a program with specific behavior, just that your sql code is not the only source - the other inputs are the schema and its constraints and statistics. it's an elegant way to factor the sourcecode for a program, I wonder if Raymond Boyce didn't die young what kind of amazing technology we might have today.
With support for Common Table Expressions (CTE), SQL becomes a Turing complete language. To be honest, it makes me somewhat uncomfortable that a query sent to a DB server could be non-terminating and cause a server thread to enter an infinite loop. On the other hand, the practical difference between a query that contains an infinite loop and one that runs for days is probably negligible.
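To make the non-termination point concrete, here's a minimal sketch (it should behave this way on any engine with recursive CTE support, e.g. SQLite or PostgreSQL); with no condition to stop the recursion, the query runs until the server kills it or exhausts resources:

    -- Unbounded recursive CTE: nothing ever stops the recursion,
    -- and count(*) forces the full (infinite) result to be evaluated.
    WITH RECURSIVE loop_forever(n) AS (
        SELECT 1
        UNION ALL
        SELECT n + 1 FROM loop_forever
    )
    SELECT count(*) FROM loop_forever;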
To be honest, I'd like to chip in that it is technically possible to implement brainf*ck in SQL - an esoteric programming language, sure, but a programming language nonetheless.
Btw, this runs in SQLite; you can try it yourself if you're interested.
Source: I was once thinking of creating a programming-language paradigm along the lines of SQLite/Smalltalk, where resumed execution / CRIU-like capabilities were built in. Let me know if anyone knows of something like this. I kind of gave up on the project, but I remember there was one language that supported this paradigm; it was very complicated to understand and introduced a lot of other first-of-their-kind ideas, like the code itself / the AST being something like a database. But so the tangent goes.
Because stored procedures do exist, and there isn't a single production-quality RDBMS that doesn't go beyond DDL and DML by adding structured programming extensions.
Also, even within the standard itself, it allows for declarative programming.
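As a small illustration of those extensions (a sketch in PostgreSQL's PL/pgSQL, one common example; the function name here is made up), conditionals and procedural control flow sit directly on top of plain SQL:

    -- Toy PL/pgSQL function: IF/ELSIF control flow layered on SQL.
    -- classify_total is a hypothetical name used only for illustration.
    CREATE OR REPLACE FUNCTION classify_total(amount numeric)
    RETURNS text AS $$
    BEGIN
        IF amount > 1000 THEN
            RETURN 'large';
        ELSIF amount > 100 THEN
            RETURN 'medium';
        ELSE
            RETURN 'small';
        END IF;
    END;
    $$ LANGUAGE plpgsql;

    SELECT classify_total(250);  -- 'medium'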
Because "programming language" is an adjective or a descriptive term. Whatever looks like a programming language, can be called a programming language.
This is completely wrong. The SQL spec isn't Turing complete, but multiple DBs provide Turing complete extensions (like PL/pgSQL) that make it so. Also, even without the extensions, it is still very much a programming language by any definition of the term. Like most takes on SQL, it is more about your understanding (or lack thereof) of SQL.
I was under the impression that recursive CTEs make the SQL spec Turing complete. Not that it makes any practical difference, it's still very difficult to use for "general purpose" programming, and still very practical to use for data processing / management.
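As a small, terminating example of the kind of computation recursive CTEs allow (a sketch that should run in SQLite or PostgreSQL):

    -- Fibonacci numbers via a recursive CTE; the WHERE clause bounds
    -- the recursion so the query terminates after 20 steps.
    WITH RECURSIVE fib(n, a, b) AS (
        SELECT 1, 0, 1
        UNION ALL
        SELECT n + 1, b, a + b FROM fib WHERE n < 20
    )
    SELECT n, a AS fib_n FROM fib;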
Last year I read about a database researcher who implemented Advent of Code puzzles in pretty standard SQL.
While I agree that we need a word for this type of behavior, hallucinate is a wrong choice IMO.
Hallucinations are already associated with a type of behavior, roughly defined as "subjectively seeing/hearing things which aren't there". This is an input-level error, not the right umbrella term for the majority of errors happening with LLMs, many of which are at the output level.
I don't know what would be a better term, but we should distinguish between different semantic errors, such as:
- confabulating, i.e., recalling distorted or misinterpreted memories;
- lying, i.e., intentionally misrepresenting an event or memory;
- bullshitting, i.e., presenting a version without regard for the truth or provenance; etc.
I'm sure someone already made a better taxonomy, and hallucination is OK for normal public discussions, but I'm not sure why the distinctions aren't made in supposedly more serious works.
I mean, I think you're right that confabulation is probably a more correct technical term, but we all use hallucinate now, so it doesn't really matter. It might have been useful to argue about it 4 or 5 years ago, but that ship has long since sailed. [1]
And I think we already distinguish between types of errors -- LLMs effectively don't lie, AFAIK, unless you're asking them to engage in role-play or something. They mostly either hallucinate/confabulate, in the sense of inventing knowledge they don't have, or they just make "mistakes", e.g. in arithmetic, or in attempting to copy large amounts of code verbatim.
And when you're interested in mistakes, you're generally interested in a specific category of mistakes, like arithmetic, or logic, or copying mistakes, and we refer to them as such -- arithmetic errors, logic errors, etc.
So I don't think hallucination is taking away from any kind of specificity. To the contrary, it is providing specificity, because we don't call arithmetic errors hallucinations. And we use the word hallucination precisely to distinguish it from these run-of-the-mill mistakes.
My experience: I often thought that I didn't have the time to learn (hard) things, only to find out sooner or later that I actually did, and still do.
At work, this usually meant that I was giving myself tighter deadlines than necessary, or that I was putting too much effort into tasks nobody cared that much about. Over time, I learned that it's OK not to put 100% of my energy into the assigned task. Sometimes, it's even encouraged to use that extra energy to learn.
Arguably, I did have the privilege of starting out in salaried European office jobs, where there are more robust boundaries and opportunities. It's obvious how precarious physical work discourages this kind of learning. And reading comments like yours, it's clear how lucky I was to have managers and environments that didn't exploit my eagerness to put pressure on myself.
But if you do have an opportunity to make adjustments, I'd suggest putting less pressure on performing like an athlete, and channeling that energy into learning opportunities instead. Rarely will anyone carve out time for your learning, but they may be responsive to your request or boundaries.
Small correction to your point: they perhaps provide a reason for peer review to happen, but it's scientists themselves who coordinate and provide the actual peer review.
That is, unless ACM and Nature have a different approach to organizing peer review, in which case my correction is wrong. But I believe my point stands for many conferences and journals.
The reasons listed in TFA - "confidentiality, sensitive data and compromising authors’ intellectual property" - make sense as grounds to discourage reviewers from using cloud-based LLMs.
There are also reasons for discouraging the use of LLMs in peer review at all: it defeats the purpose of the "peer" in peer review; hallucinations; criticism not relevant to the community; and so on.
However, I think it's high time to reconsider what scientific review is supposed to be. Is it really important to have so-called peers as gatekeepers? Are there automated checks we can introduce to verify claims or ensure quality (like CI/CD for scientific articles), and leave content interpretation to the humans?
Let's make the benefits and costs explicit: what would we be gaining or losing if we just switched to LLM-based review, and left the interpretation of content to the community? The journal and conference organizers certainly have the data to do that study; and if not, tool providers like EasyChair do.
Yes, there are often strong reasons to have peers as gatekeepers. Scientific writing is extremely information-dense. Consider a niche technical task that you work on -- now consider summarizing a day's worth of work in one or two sentences, designed to be read by someone else with similar expertise. In most scientific fields, the niches are pretty small. The context necessary to parse that dense scientific writing into a meaningful picture of the research methods is often years or decades of work in the field. Only peers are going to have that context.
There are also strong reasons why the peers-as-gatekeepers model is detrimental to the pursuit of knowledge, such as researchers forming semi-closed communities that bestow local political power on senior people in the field, creating social barriers to entry or critique. This is especially pernicious given the financial incentives (competition for a limited pool of grant money; award of grant money based on publication output) that researchers are exposed to.
I think if you leave authors alone, they will be more likely to write in the first category rather than the second. After all, papers are mainly written to communicate your findings to your direct peers, so information-dense isn't bad, because the target audience understands it.
Of course, that makes it harder for people outside to penetrate, but this also depends on the culture of the specific domain, and there are usually people writing summaries and surveys. It's a great task for grad students, tbh: you read a ton of papers, summarize them, and by that point you should have a good understanding of what needs to be worked on in the field, rather than just being dragged through by your advisor.
Agreed: information-dense isn't bad at all. It's a reason for peer review, though: people other than peers in the field have a much harder time reviewing an article for legitimacy, because they lack the context.
It's a fair point. In the ideal setting, peer review can really be a very informative and important gate. And who better to be the gatekeeper than someone who understands the context?
However, there are still big issues with how these peers perform reviews today [1].
For example, if there's a scientifically arbitrary cutoff (e.g., the 25% acceptance rate at top conferences), reviewers will be mildly incentivized to reject (what they consider to be) "borderline-accept" submissions. If the scores are still "too high", the associate editors will overrule the decision of the reviewers, sometimes for completely arbitrary reasons [2].
There's also a whole number of things reviewers should look out for, but for which they have neither the time, space, tools, nor incentives. For example, reviewers are meant to check whether the claims fit what is cited, but I can't know how many actually take the time to look at the cited content. There's also checking for plagiarism, GenAI and hallucinated content, whether the evidence supports the claims, how the charts were generated, "novelty", etc. There are also things that reviewers shouldn't check, but that pop up occasionally [3].
However, you would be right to point out that none of this has to do with peers doing the gatekeeping, but with how the process is structured. But I'd argue that this structure is so common that it's basically synonymous with peer review. If it results in bad experiences often enough, we really need to push for the introduction of more tools and honesty into the process [4].
[1] This is based on my experience as a submitter and a reviewer. From what I see/hear online and in my community, it's not an uncommon experience, but it could be a skewed sample.
[3] Example things reviewers shouldn't check for or use as arguments: did you cite my work; did you cite a paper from the conference; can I read the diagram without glasses if I print out the PDF; do you have room to appeal if I say I can't access publicly available supplementary material; etc.
[4] Admittedly, I also don't know what would be the solution. Still, some mechanisms come to mind: open but guaranteed double-blind anonymous review; removal of arbitrary cutoffs for digital publications; (responsible, gradual) introduction of tools like LLMs and replication checks before it gets to the review stage; actually monitoring reviewers and acting on bad behavior.
> However, I think it's high time to reconsider what scientific review is supposed to be
I've been arguing for years that we should publish to platforms like OpenReview: basically, we check for plagiarism and obvious errors, but otherwise publish.
In the old days, the bottleneck was physically sending out papers. Now that's cheap, so make comments public. We're all on the same side. The people who will leave reviews are more likely to actually be invested in the topic, rather than doing review purely as a service. It's not perfect, but no system will be, and we currently waste lots of time chasing reviewers.
I agree. OpenReview is a good initiative, and while it has its own flaws, it's definitely a step in the right direction.
The arXiv and the derivative preprint repositories (e.g., bioRxiv) are other good initiatives.
However, I don't think it's enough to leave the content review completely to the community. There are known issues with researchers using arXiv, for example, to stake claims on novel things, or readers jumping on the claims made by well-known institutions in preprints, which may turn out to be overconfident or bogus.
I believe that a number of checks (beyond plagiarism) need to happen before the paper is endorsed by a journal or a conference. Some of these can and should be done in a peer review-like format, but it needs to be heavily redesigned to support review quality without sacrificing speed. Also, there are things that we have good tools for (e.g., checking citation formatting), so this part should be integrated.
Plus, time may be one of the bottlenecks, but that's partly because publishers take money from academic institutions, yet expect voluntary service. There's no reason for this asymmetry, IMO.
> There are known issues with researchers using arXiv, for example, to stake claims on novel things
I think this is more a function of the metrics system: we find that works get through review more easily when "novel", so novelty claims are used over-zealously. But get rid of the formal review system and that goes away too.
> which may turn out to be overconfident
This is definitely an issue but one we must maintain as forgivable. Mistakes must be allowed in science. Minimized, but allowed. Mistakes are far too easy to make when working at the edge of knowledge. I'd wager >90% of papers have mistakes. I can tell you that 100% of mine have mistakes (all found after publication) and I don't know another researcher who says differently.
> bogus
And these people should be expelled.
A problem that the current system actually perpetuates: when authors plagiarize, the papers get silently desk-rejected. Other researchers do not learn of this and cannot take extra precaution with other works by these authors. IMO, fraud is one of the greatest sins you can commit in science. Science depends a lot on trust in authors (even more so because our so-called peer-review system places emphasis on novelty and completely rejects replication).
The truth is that no reviewer can validate claims by reading a paper. I can tell you I can't do that even for papers that are in my direct niche. But what a reviewer can do is invalidate. We need to be clear about that difference and the bias. Because we should never interpret papers as "this is the truth" but "this is likely the truth under these specific conditions". Those are very different things.
I agree that checking is better, but I don't believe it's absolutely necessary. The bigger problem I have right now is that we are publishing so much that it is difficult to get a reviewer who is a niche expert, or even a sub-domain expert. More generalized reviewers can't properly interpret papers: it is too easy to over-generalize results and think a paper is just doing the same thing as another work (I've seen this way too often), or to see something as too incremental (almost everything is incremental, and it is going to stay that way as long as we have a publish-or-perish system). BUT the people who are niche experts will tend to find the papers, because they are seeking them out.
But what I think does need to be solved still is the search problem. It's getting harder and frankly we shouldn't make scientists also be marketers. It is a waste of time and creates perverse incentives, as you've even mentioned.
> because publishers take money from academic institutions,
And the government.
Honestly, I hate how shady this shit is. I understand conferences, where there's a physical event, but paid-access journals are a fucking scam (I'd be okay with a small fee for server costs and such, but considering arXiv and OpenReview, I suspect this isn't very costly). They are double-dipping: getting money from governments and from academics paying for access, while the literal "product" they are selling is given to them for free, and the "quality control" of that "product" is also done for free. And by "for free" I mean on the dime of academic institutions and government tax dollars.
This misses the point entirely. Science is a dialogue. Peer review is just a protocol to signal that the article has been part of some dialogue.
Anyone can put anything to paper. Now more than ever - see all the vibe physics floating around. Peer review is just an assurance that what you are about to read isn't just some write-only output of some spurious thought process.
First two sentences are key. The reason why HN is so much better than other fora (IMO) is that the mods don't allow lever-pulling and astroturfing to overtake regular contributions. Yet it's also popular, so you're bound to get some activity on most posts.
Sure, it can be frustrating if you're trying to promote a product or farm karma on posts. But the fact that mostly nobody cares about karma means that you can post something and have it be evaluated on its technical, economic, social merits.
Obviously, there are caveats to this - e.g., anything US- and FAANG-related is bound to get much more activity than otherwise - but the overall atmosphere of HN is refreshing compared to Reddit.
I can't know for sure. For me, it's just the "eyeball method" of comparing HN and different subreddits on Reddit.
As to how I (or anyone) could show this, here are a few example questions:
1. How many examples of stealthy but otherwise blatant promotion do you see in the comments? Not every astroturfing campaign will be successful or original, so you'd be able to notice some patterns. Plus, HN is already commercially oriented, and there's the "Show HN" option, so it reduces the incentives for astroturfing.
2. Alternatively, how much controversy is there around the specific type of forum? For some subreddits, for example, you'd be able to see counter-subreddits popping up when participants feel the mods are abusing their power to promote one type of opinion.
3. Is a certain type of political/brand-related opinion or interpretation always at the top of your comment feed? For example, if upvotes determine the order of the comments, do you consistently see fewer critical comments on things that you'd expect the community to react to in different ways?
4. Do you consistently see some contributors having more power in discussions over others? Other than the mods, obviously. If this is the case, karma (i.e., number of upvotes) often has more value.
I'd be shocked if it were, because the owners know this is a fragile thing: one word from Dan or Tom that this is or was the case, and half the participants here would walk. The owners are more than likely well aware of that risk and are not going to destroy the goose that lays the golden eggs.
Check out the treatment PG (and Garry Tan) got in the thread about defending YC's effective investment into Installmonetizer for a good example of news.ycombinator.com's response to such crap.
I think that's all true, plus, from my experience:
- it's now more difficult to identify a truly unexplored area of work within a relatively short amount of time (e.g., the first 2 explorative years of a PhD lasting 4-6 years).
- even if you find a niche where you could make a completely original contribution, you're disincentivized by how hard it is to convince your supervisor and peer reviewers - unless it's painfully obvious or you invest a lot of upfront effort to prove its worth.
- the media promote a fetishized version of original contributions (e.g., the theory of relativity leading to a paradigm shift), whereas scientists are taught to always justify their contribution with respect to existing work; this inevitably prunes many paths and ideas.
- although interdisciplinarity is promoted in opinion pieces, interdisciplinary contributions are often discouraged by the discipline-related communities.
None of this is an excuse, but they're certainly filters and pressure chambers.
Pretty nice. I've been using LLMs to generate different Python and JS tools for wrangling data for ontology engineering purposes.
More recently, I've found a lot of benefit from using the extended thinking mode in GPT-5 and -5.1. It tends to provide a fully functional and complete result from a zero-shot prompt. It's as close as I've gotten to pair programming with a (significantly) more experienced coder.
One functional example of that (with 30-50% of my own coding, reprompting and reviews) is my OntoGSN [1] research prototype. After a couple of weeks of work, it can handle different integration, reasoning and extension needs of people working in assurance, at least based on how I understood them. It's an example of a human-AI collab that I'm particularly proud of.
I looked at the original study [1], and it seems to be a very well-supported piece of research. All the necessary pieces are there, as you would expect from a Nature publication. And overall, I am convinced there's an effect.
However, I'm still skeptical of the effect, or at least of its size. First - a point that applies to the Massachusetts ballot on psychedelics in particular - putting views into percentages and getting accurate results from political polls are notoriously difficult tasks [2]. Therefore, any measured effect size is subject to whatever confounding variables make those tasks difficult.
Second, there could be some level of Hawthorne effect [3] at play here, such that participants may report being (more) convinced because that's what (they think) is expected of them. I'm not familiar with the recruiting platforms they used, but if they're specialized in paid or otherwise professional surveys, I wonder if participants feel an obligation to perform well.
Third, and somewhat related to the above, participants could state they'd vote Y after initially reporting a preference for X because they know it's a low-cost, no-commitment claim. In other words, they can claim they'd now vote for Y without fear of judgement, because it's a lab environment and an anonymous activity, but they can always go back to their original position once the actual vote happens. To show the size of the effect relative to other factors, researchers would have to raise the stakes, or follow up with participants after the vote and find out if/why they changed their mind (again).
Fourth, if a single conversation with a chatbot, lasting six minutes on average, could convince an average voter, I wonder how much they knew about the issue/candidate being voted on. More cynically for the study, there may be much more at play in actual vote preference than a single dialectic presentation of facts: for example, salient events in the period leading up to the election, emotional connection with the issue/candidate, or personal experiences.
Still, this does not make the study flawed for not covering everything. We can learn a lot from this work, and kudos to the authors for publishing it.
I guess vibe coding is fun as a meme, but it hides the power of what someone else on HN called language user interfaces (LUIs).
The author's point is correct IMO. If you have direct mappings between assembly and natural language, there's no functional need for these intermediate abstractions to act as pseudo-LUIs. If you could implement it, you would just need two layers above assembly: an LLM OS [1], and a LUI-GUI combo.
However, I think there's a non-functional, quality need for intermediate abstractions - particularly to make the mappings auditable, maintainable [2], understandable, etc. For most mappings, there won't be a 1:1 representation between a word and an assembly string.
It's already difficult for software devs to balance technical constraints and possibilities with vague user requirements. I wonder how an LLM OS would handle this, and why we would trust that its mappings are correct without wanting to dig deeper.
[1] Coincidentally, just like "vibe coding", this term was apparently also coined by Andrej Karpathy.
[2] For example, good luck trying to version control vectors.