I tried using ChatGPT for some handwritten text I couldn't make out and it faile...

fnordpiglet · on Aug 3, 2024

Are you using 4o?

First, lying requires agency and intent, which LLMs don’t have, so they can’t lie.

Yes, it makes stuff up when you put garbage in, and the problem comes when you uncritically consume that garbage. It also confidently asserts things that are untrue or subtly off base. The key is not to look at it as an outsourcing of agency or an easy button, but as a tool that gets you started on stuff and a new way of interacting with computers. In a very real sense, this is a very early preview of the technology - a completely new computing technique that only reached bare minimum usability in the last two years. Would you rather not have early access and have to wait 20 years while accountants and product managers strangle it?

For OCR, I’m surprised anyone who has ever used it before would scan in illegible handwriting and expect it not to return a bunch of garbage without flagging that the garbage was semantically wrong. Frontier multimodal LLMs do an amazing job - compared to the state of the art a year ago. Do they do an amazing job compared to an ever-shifting goalpost? Have all the guardrails of a mature 30-year-old software technique even been discovered yet? No. But I’ll tell you, from the early days of things: the early days of HTTP were nothing like today. Was HTTP useless because it was so unreliable and flaky? No, it was amazing for those with the patience and the capacity to dream of building something truly remarkable at the time, like Google or Amazon or eBay.

The PDF issue you had is not expected. I upload PDFs all the time. For instance, when I’m working on something, like restringing some Hunter Douglas blinds in my house recently, I upload the instructions for the restring kit to a ChatGPT or Claude session, and it becomes something I can ask iteratively how to tackle what I’m working on as I hit challenging spots in the process. It’s not always right, and it confidently tells me subtly wrong things. But I pretty quickly realize what’s wrong and what isn’t as I work, and it’s usually something ambiguous in the instructions that requires a lot more context on something very specific and likely not documented publicly anywhere. But 80% of the time my questions get answered as I work. That’s -amazing- that I can scan a paper instruction sheet into a computer and get step-by-step guidance that I can interactively interrogate using my voice as I work, and it literally understands everything I ask and gives me cogent, if sometimes off, answers. This is like literally the definition of the future I was promised.