If you just say, "here is what the LLM said," then if it turns out to be nonsense you can say something like, "I was just passing along the LLM response, not my own opinion."
But if you take the LLM response and present it as your own, there is at least slightly more ownership of the opinion.
This is kind of splitting hairs but hopefully it makes people actually read the response themselves before posting it.
Taking ownership isn't the worst instinct, to be fair. But that's a slightly different formulation.
"People are responsible for the comments that they post no matter how they wrote them. If you use tools (AI or otherwise) to help you make a comment, that responsibility does not go away"
> At the very least the copy paster should read what the llm says, interpret it, fact check it, then write their own response.
Then write their own response using an AI to improve the quality of the response? The implication here is that an AI user is going to do some research, when using the AI was their research. To do the "fact check" you suggest would mean doing actual work, and clearly that's not something the user is up for, as indicated by their use of the AI.
So, to me, your suggestion is fantasy-level thinking.
I have a client who does this (pastes it into text messages! as if it will help me solve the problem they're asking me to solve), and I'm like, "that's great, I won't be reading it." You have to push back.
That's because "certain" and "know the answer" have wildly different definitions depending on the person; you need to be more specific about what you actually mean by that. Anything that can be ambiguous will be treated ambiguously.
Anything that you've mentioned in the past (like `no nonsense`) that still exists in the context will have a higher probability of being generated than other tokens.
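Easy enough to see for yourself, by the way. Here's a rough sketch (assuming the Hugging Face transformers library and GPT-2 as a small local stand-in; the prompts are made up purely for illustration) that compares the next-token probability of a word with and without a related phrase earlier in the context:

```python
# Rough sketch, assuming the Hugging Face transformers library and GPT-2 as a
# small stand-in model: compare the probability of a word as the next token
# when a related phrase appears earlier in the context vs. when it doesn't.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_prob(context: str, word: str) -> float:
    ids = tok(context, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]      # logits for the next position
    probs = torch.softmax(logits, dim=-1)
    word_id = tok.encode(word)[0]              # first sub-token if the word splits
    return probs[word_id].item()

# Made-up prompts, purely for illustration.
print(next_token_prob("You said no nonsense. This answer is pure", " nonsense"))
print(next_token_prob("This answer is pure", " nonsense"))
# The first probability is typically higher, because the word already sits in context.
```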
> By the numbers, Star Wars is far more grounded as science fiction than Star Trek, but people will insist the former is at best merely "science fantasy." It's really all just vibes.
Worked with it a bit last night! Seems quick. I did run into the same problem I often have with Gemini, where the response says something like, "I need to do x" or "I did x" and then nothing actually happens. The agent seems to think it actually finished the task, but it stops partway.
But I'm sure they will sort that out, as I don't have that issue with other Anthropic models.
This is interesting. I’ve had this same issue trying to build an agentic system with the smaller ChatGPT models, almost no matter the prompt (“think aloud” are magic words that help a lot, but it’s still flaky). Most of the time it would either perform the tool call before explaining it (the default) or explain it but then not actually make the call.
I’ve been wondering how Cursor et al solved this problem (having the LLM explain what it will do before doing it is vitally important IMO), but maybe it’s just not a problem with the big models.
Your experience seems to support that smaller models are just generally worse about tool calling (were you using Gemini Flash?) when asked to reason first.
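One workaround that might be worth trying (just a rough sketch, not claiming this is what Cursor does; it assumes the OpenAI Python SDK, and the `get_weather` tool and exact prompts are made up for illustration) is to split each turn into two requests: one with tool_choice="none" so the model can only explain its plan, then a second with tools enabled so it actually makes the call.

```python
# Minimal sketch of forcing "explain first, then call" with two passes.
# Assumes the OpenAI Python SDK; the get_weather tool is hypothetical.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool, purely for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [
    {"role": "system", "content": "Before using a tool, think aloud and explain what you are about to do."},
    {"role": "user", "content": "What's the weather in Oslo?"},
]

# Pass 1: tools are visible but disabled, so the model can only explain its plan.
plan = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    tool_choice="none",
)
messages.append({"role": "assistant", "content": plan.choices[0].message.content})

# Pass 2: now require a tool call; the explanation is already in the transcript.
action = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=tools,
    tool_choice="required",
)
print(action.choices[0].message.tool_calls)
```

It doesn't make the explanation itself any less flaky, but it does make it structurally impossible for the model to skip the explanation or skip the call, since each pass can only do one of the two.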