School incentives are not really aligned around maximizing the learning rate for every student. (E.g., that is why there is/was debate around teaching phonics.)
Cool experiment! My intuition says you would get a better result if you let the LLM generate tokens for a while before it gives you an answer. Another experiment idea could be to see what kind of instructions lead to better randomness. (And, extending this, whether those instructions help humans generate better random numbers too.)
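One way you could score an experiment like that (a sketch; the function name and the chi-square scoring choice are mine, not something from the comment) is to collect the digits each prompt produces and measure how far they are from uniform:

```python
from collections import Counter

def chi_square_uniform(samples, k=10):
    """Chi-square statistic of `samples` (values 0..k-1) against a
    uniform distribution; lower means closer to uniform/random."""
    n = len(samples)
    expected = n / k
    counts = Counter(samples)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(k))

# Hypothetical illustration: a clumped sequence (LLMs famously over-pick 7)
# scores far worse than an evenly spread one.
biased = [7] * 60 + [3] * 40
spread = list(range(10)) * 10
print(chi_square_uniform(biased))  # 420.0
print(chi_square_uniform(spread))  # 0.0
```

You'd run each prompt variant ("answer immediately" vs. "generate tokens first") many times and compare the scores.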
People still play chess, even though now AI is far superior to any human. In the future you will still be able to hand-write code for fun, but you might not be able to earn a living by doing it.
The live demos are using a very cheap and not very smart model. Don't update your opinion of AI capabilities based on the poor performance of gpt-4o-mini.
Lots of people are building on the edge of current AI capabilities, where things don't quite work, because in 6 months, when the AI labs release a more capable model, you will be able to just plug it in and have it work consistently.
And where is that product that was developed on the edge of current AI capabilities and is now, with the latest AI model plugged in, suddenly working consistently? All I am seeing is models getting better and better at generating videos of spaghetti-eating movie stars.
They're coming. I've seen the observability tools try to do this, but I still have to tweak it; it's just time-consuming. Empromptu.ai is the closest to solving this problem. They are the only ones with a library you install that does system optimization and evals for accuracy in real time.
In 6 months, when FSD is completed and we get robots in every home? I suspect we keep adding features because reliability is hard. I do not know what heuristic you would be looking at to conclude that this problem will eventually be solved by current AI paradigms.
This is the crux of the issue. Whether you think this is like extending a ladder to the moon, or more like we figured out how to get to the moon and are now aiming at Jupiter.
Claude Plays Pokemon is one person's side project to see how well Sonnet can play Pokemon. It is a neat LLM benchmark; it's not a serious attempt at making a Pokemon-playing AI.
It may not be serious, but it's a true display of an LLM's limitations. It's a bad look for Claude, and a missed advertising opportunity if someone can do better.
Timelines are very uncertain, and the definition of what would satisfy this claim of operating as a high-income knowledge worker is also very unclear. Is it for one task? Many tasks? Any task?
It's highly likely that these CEOs will continue to hype up singular examples and misrepresented claims that set outsized expectations. I'm already seeing expectations that all tasks are now possible, causing chaos in the corporate world as folks try to get on the bandwagon.
I also wonder if it hides the true value: that the symbiotic work of a human with a PhD-level AI assistant is going to outperform any autonomous agent for the foreseeable future.
I'd certainly question whether LLMs will. AI writ large, on an infinite timescale, who knows. But for LLMs I would be sceptical. The only knowledge-worker jobs they seem seriously likely to take over are writers of high-volume, low-quality bullshit (for instance, real estate ads, which have always had a bit of a problem with both stylistic suck and, well, reality), but those generally aren't particularly high-paid.