Most people don't realize that LLMs were not designed for document processing, data extraction, etc. For that, you would have to use a dedicated tool like Klippa DocHorizon, which built its own AI OCR from scratch. It also provides an API that you can use to send your documents and receive formatted data. It's less popular than, say, Textract or Tesseract, but it's far more accurate, especially if you're dealing with sensitive data that you wouldn't want an LLM to hallucinate.
Interesting concept. I tried it with a text written in Church Slavonic and it didn't work. I guess the documents can't be THAT old. It would also be nice if you could upload images individually instead of selecting everything from a folder. Either way, nice work.
Thanks! I haven't tried anything other than English, so I'm not sure how good the LLMs are at that. Did you use Tesseract or Gemini?
Once the page structure is set up from the images (via the directory upload), you can upload new images for each page, but I didn't include an option to just create all the pages manually. It's a good idea. Going to add that...
I've also tried Tesseract in the past with handwritten notes, and the results weren't very accurate. Then I started looking into commercial solutions and stumbled upon many different tools, but the only one that could handle my handwriting was Klippa DocHorizon: https://www.klippa.com/en/ocr/ It uses machine learning and OCR instead of just plain OCR like Tesseract does, so it might be an option to look into. You could also test it out at https://www.klippa.com/en/ocr/tools/
I've been using it for a while and would highly recommend it. Hopefully it works out for your use case.
Klippa user here. I've been using it for the past year or so and can also recommend OP use it for scanning and extracting data from business cards. The workflow builder is probably what OP is looking for: https://www.youtube.com/watch?v=1TZJxlaOiKo
Just out of curiosity, how did you find out about Klippa?
It’s possible. But if he thinks his company will benefit (like by launching the things into Space) then he probably won’t. Sad that this is where we are. Absolute power corrupts absolutely.
Love seeing vim and neovim being used in more open source projects. Despite the huge learning curve, once vim becomes second nature, it's hard to go back to using the mouse. At least, that's how it was for me. Thanks for creating this cheat sheet, I'm sure lots of beginners will find it useful!
Hey! I am sure it only takes time to become a master of it :) I think what makes people stick is exactly the fact that it's configurable, as the sheer process of messing with your setup is fun in itself, not just the flow it can possibly give you! I really do hope that others will indeed find it useful <3
Absolutely! I'm very confident people will find it useful. Are you also perhaps going to include a light mode version of the website? It's currently hard to look at when the sun is shining on my screen. I think other people would like it as well!
Sure! Actually it is already in there if your Chrome is in light mode, but I couldn't yet figure out how to toggle the themes with the theme-changer npm package. :D When I get back in the evening I'll write this down! Feel free to open an issue for it also!
Yeah, digitizing receipts is still a huge challenge for most companies, especially for expense reimbursements. Even though invoices are increasingly digital, employees still end up with physical receipts for work-related expenses. From what I've seen, there are some interesting contenders like Klippa that seem to solve exactly this problem [1].
Curious to know if anyone has heard of or used their OCR or a similar tool. Apparently it's not an LLM in disguise but an actual AI trained on gazillions of documents, so the risk of hallucination might be lower than with LLM-based OCR solutions like Mistral.