School incentives are not really aligned around maximizing the learning rate for every student. (E.g., that is why there is/was debate around teaching phonics.)
Cool experiment! My intuition says you would get a better result if you let the LLM generate tokens for a while before it gives you an answer. Another experiment idea could be to see what kind of instructions lead to better randomness. (And, extending this, whether those instructions help humans generate better random numbers too.)
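One way you could score an experiment like that (a sketch; the function name and the chi-square scoring choice are mine, not something from the comment) is to collect the digits each prompt produces and measure how far they are from uniform:

```python
from collections import Counter

def chi_square_uniform(samples, k=10):
    """Chi-square statistic of `samples` (values 0..k-1) against a
    uniform distribution; lower means closer to uniform/random."""
    n = len(samples)
    expected = n / k
    counts = Counter(samples)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(k))

# Hypothetical illustration: a clumped sequence (LLMs famously over-pick 7)
# scores far worse than an evenly spread one.
biased = [7] * 60 + [3] * 40
spread = list(range(10)) * 10
print(chi_square_uniform(biased))  # 420.0
print(chi_square_uniform(spread))  # 0.0
```

You'd run each prompt variant ("answer immediately" vs. "generate tokens first") many times and compare the scores.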
People still play chess, even though now AI is far superior to any human. In the future you will still be able to hand-write code for fun, but you might not be able to earn a living by doing it.
The live demos are using a very cheap and not very smart model. Don't update your opinion of AI capabilities based on the poor performance of gpt-4o-mini.
Lots of people are building on the edge of current AI capabilities, where things don't quite work, because in 6 months, when the AI labs release a more capable model, you will be able to just plug it in and have it work consistently.
And where is that product that was developed on the edge of current AI capabilities and is now, with the latest AI model plugged in, suddenly working consistently? All I am seeing is models getting better and better at generating videos of spaghetti-eating movie stars.
They're coming. I've seen the observability tools try to do this, but I still have to tweak it; it's just time-consuming. Empromptu.ai is the closest to solving this problem. They are the only ones with a library you install that does system optimization and evals for accuracy in real time.
In 6 months, when FSD is completed and we get robots in every home? I suspect we keep adding features because reliability is hard. I do not know what heuristic you would be looking at to conclude that this problem will eventually be solved by current AI paradigms.
This is the crux of the issue. Whether you think this is like extending a ladder to the moon, or more like we figured out how to get to the moon and are now aiming at Jupiter.
Claude Plays Pokemon is one person's side project to see how well Sonnet can play Pokemon. It is a neat LLM benchmark; it's not a serious attempt at making a Pokemon-playing AI.
It may not be serious, but it's a true display of an LLM's limitations. It's a bad look for Claude, and a missed advertising opportunity if someone can do better.
Timelines are very uncertain, and the definition of what would satisfy this claim of operating as a high-income knowledge worker is also very unclear. Is it for one task? Many tasks? Any task?
It's highly likely that these CEOs will continue to hype up singular examples and misrepresented claims that set outsized expectations. I'm already seeing expectations that all tasks are now possible, causing chaos in the corporate world as folks try to get on the bandwagon.
I also wonder if it hides the true value: that the symbiotic work of a human with a PhD-level AI assistant is going to outperform any autonomous agent for the foreseeable future.
I'd certainly question whether LLMs will. AI writ large, on an infinite timescale, who knows. But for LLMs I would be sceptical. The only knowledge-worker jobs they seem seriously likely to take over are writers of high-volume, low-quality bullshit (for instance, real estate ads, which have always had a bit of a problem with both stylistic suck and, well, reality), but those generally aren't particularly high-paid.