> LLMs can also write prompts and self introspect to debug.
Why should we assume that won't lead to a rabbit hole of misunderstanding or outright hallucination? If it doesn't know what "correct" really is, even infinite levels of supervision and reinforcement might still be aimed at an incorrect goal.
But it's not just like humans. For one thing it's built differently, with a different relationship between training and execution. It doesn't learn from its mistakes until it gets the equivalent of a brain transplant, and in fact extant AIs are notorious for doubling down instead of accepting correction.

Even more importantly, the AI doesn't have real-world context, which is often helpful to notice when "correct" (to the spec) behavior is not useful, acceptable, or even safe in practice. This is why the idea of an AI controlling a physical system is so terrifying. Whatever requirement the prompter forgot to include will not be recognized by the AI either, whereas a human who knows about physical properties like mass or velocity or rigidity will intuitively honor requirements related to those. Adding layers is as likely to magnify errors as to correct them.
> But it's not just like humans. For one thing it's built differently
I'm referring to the behaviour, not the inner nature.
> in fact extant AIs are notorious for doubling down instead of accepting correction.
My experience suggests ChatGPT is better than, say, humans on Twitter.
I've had the misfortune of dealing with several IRL humans who were also much, much worse, but the problem was much rarer outside social media.
> Even more importantly, the AI doesn't have real-world context, which is often helpful to notice when "correct" (to the spec) behavior is not useful, acceptable, or even safe in practice.
Absolutely a problem. Not only for AI, though.
When I was a kid, my mum had a kneeling stool she couldn't use, because the woodworker she'd asked to reinforce it didn't understand it and put a rod where your legs should go.
I've made the mistake of trying to use RegEx for what I thought was a limited-by-the-server subset of HTML, despite the infamous StackOverflow post, because I incorrectly thought it didn't apply to the situation.
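To make that concrete, here's a minimal sketch of that kind of failure (Python, with made-up markup rather than the actual code from that incident): a pattern that looks adequate for the "limited" subset and quietly breaks the moment the input nests or grows attributes.

    import re

    # Naive pattern: capture the text inside <b>...</b> in a "simple" subset of HTML.
    BOLD = re.compile(r"<b>(.*?)</b>")

    print(BOLD.findall("<b>hello</b>"))
    # ['hello'] -- fine on the easy case

    print(BOLD.findall("<b>outer <b>inner</b> tail</b>"))
    # ['outer <b>inner'] -- wrong: a regex can't track nesting depth

    print(BOLD.findall('<b class="x">styled</b>'))
    # [] -- silently misses the tag once attributes appear

A real parser (Python ships html.parser in the standard library) handles all three cases; that gap is exactly what the StackOverflow post is warning about.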
There's an ongoing two-way "real-world context" mismatch between those who want the state to be able to pierce encryption and those who consider that an existential threat to all digital services.
> a human who knows about physical properties like mass or velocity or rigidity will intuitively honor requirements related to those
Yeah, kinda, but also no.
We can intuit within the range of our experience, but we had to invent counter-intuitive maths to make most of our modern technological wonders.
--
All that said, with this:
> It doesn't learn from its mistakes until it gets the equivalent of a brain transplant
You've boosted my optimism that an ASI probably won't succeed if it decided it preferred our atoms to be rearranged to our detriment.
> I'm referring to the behaviour, not the inner nature.
Since the inner nature does affect behavior, that's a non sequitur.
> we had to invent counter-intuitive maths to make most of our modern technological wonders.
Indeed, and that's worth considering, but we shouldn't pretend it's the common case. In the common case, the machine's lack of real-world context is a disadvantage. Ditto for the absence of any actual understanding beyond "word X often follows word Y", the kind of understanding that would allow it to predict consequences it hasn't seen yet. Because of these deficits, any "intuitive leaps" the AI might make are less likely to yield useful results than the same leaps in a human. The ability to form a coherent - even if novel - theory and an experiment to test it is key to that kind of progress, and it's something these models are fundamentally incapable of doing.
> Since the inner nature does affect behavior, that's a non sequitur.
I would say the reverse: we humans exhibit diverse behaviour despite similar inner nature, and likewise clusters of AI with similar nature to each other display diverse behaviour.
So, from my point of view, the fact that I can draw clusters based on similarities of failures, clusters that encompass both humans and AI, makes it a non sequitur to point to the internal differences.
> The ability to form a coherent - even if novel - theory and an experiment to test it is key to that kind of progress, and it's something these models are fundamentally incapable of doing.
Sure.
But, again, this is something most humans demonstrate they can't get right.
IMO, most people act like science is a list of facts, not a method, and most also mix up correlation and causation.
It’s like when you continually refine a Midjourney image. At first, refining it gets better results, but if you keep going the pictures start coming out…really weird. It’s up to the human to figure out when to stop, using some sort of external measure of aesthetics.