Now that Paperless-NG seems to be going unmaintained (last commit on 15th Sep 2021), Paperless-NGX has been created with a focus on an org, so that the continuity of the project can be maintained with a simple path for the original creators to join back if they want to.
I don't think the community could have handled this better!
As a user of Paperless, I highly recommend setting it up. It is incredibly useful as soon as you have to deal with any kind of paperwork-intensive activity (buying a house, getting a mortgage, paying taxes). Take literally everything you get and throw it into Paperless, then when someone asks for all of these documents it will take you minutes instead of hours to track them all down - when you need it, it's magic.
Does it still require you to press a button to tell it every single time it's pointed at a new sheet of paper?
I'm waiting for one of these apps to monitor a video stream from a camera and automatically determine when a new sheet is ready to scan.
I already have a mobile app that can find the edges of the paper, crop it, and OCR all its contents. But it still needs me there as a human to tell it "the page you're pointed at now is one you haven't seen before" when that determination should be far simpler than doing OCR.
I'm waiting for the day I can aim my phone or webcam and just flip through a bunch of pages and have it scan each one as it appears. It doesn't have to be super fast like Data from TNG or Johnny Five from Short Circuit, I just want to go from "flip, press, flip, press, flip, press" to "flip, flip, flip".
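For what it's worth, the "is this a page I haven't seen before" check can be sketched with plain frame differencing: wait until the picture stops moving, then compare it with the last page that was captured. This is only a rough sketch, not how any existing app works. The frame format (small grayscale grids), the function names and all thresholds are assumptions made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Sequence

# Hypothetical input format: a downscaled grayscale frame, pixel values 0-255.
Frame = Sequence[Sequence[int]]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two same-sized frames."""
    total = 0
    count = 0
    for row_a, row_b in zip(a, b):
        for pixel_a, pixel_b in zip(row_a, row_b):
            total += abs(pixel_a - pixel_b)
            count += 1
    return total / count if count else 0.0


def new_pages(
    frames: Iterable[Frame],
    still_threshold: float = 2.0,    # assumed: below this, nothing is moving
    change_threshold: float = 20.0,  # assumed: above this, it is a different page
    still_frames: int = 3,           # assumed: steady frames needed before capture
) -> Iterator[int]:
    """Yield the index of each frame that shows a new, motionless page."""
    last_captured: Optional[Frame] = None
    previous: Optional[Frame] = None
    still_run = 0
    for index, frame in enumerate(frames):
        if previous is not None and mean_abs_diff(previous, frame) < still_threshold:
            still_run += 1
        else:
            still_run = 0  # first frame, or something is moving
        previous = frame
        if still_run < still_frames:
            continue  # a hand or a page is still in motion
        if last_captured is None or mean_abs_diff(last_captured, frame) > change_threshold:
            last_captured = frame
            yield index
```

Each yielded index is the moment the existing crop-and-OCR step would run, so the human only has to flip. A real version would need to cope with lighting changes and pages that look alike, which this sketch ignores.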
If you have so much paper volume that pressing a button is tedious, you should be using a scanner with an ADF. Fujitsu makes a bunch and they've been on the market so long that finding a used one for less money is pretty easy.
If you don't have that much volume...just improve the physical UI. Whatever floats your boat: a foot pedal, capacitive switch, Clapper, etc.
It's not half important enough to me to justify buying more equipment.
Especially when there is no technological barrier that keeps us from having software do it automatically. I'm not going to buy gear to do something I fully expect the software to be capable of in the next year or two.
Until then, searching paper with my eyeballs once in a great while is less work than procuring and maintaining a document feeder or pressing a button three thousand times, no matter how convenient that button is.
It's better because it automatically OCRs everything and puts it into a search index. It means that finding documents now consists of typing a few keywords and immediately getting what you're looking for vs having to open a ton of PDFs.
Not the parent, but I suspect it's not better in the outcome, just in the amount of effort you have to go into creating and arranging vs just throwing documents at it and searching later.
1. If I need a browser to access them, that's very similar to "can't access them". Do you expect my programs to start browser sessions to access files? Will my file manager have to connect to that silly web server?
2. I can't effectively OCR them for many non-English languages (at least not unless OCR has made some advances for the languages where it's lacking). Also, if I do OCR anything, I then need to correct the mistakes, which takes a h-u-g-e amount of time even when the OCR works relatively well.
You seem to come at this useful tool with a very negative mindset.
- The scanned documents are stored in the filesystem in PDF format, so you can certainly access them without going through the web application.
- OCR doesn't have to be perfect, because it's just being used to locate a specific document when you need it. So if you have 1788 documents and you want to find your apartment lease, you search for "lease" and there's a very good chance that word will be found in the correct document. That's the whole point - storing documents with a low amount of manual effort, in a way that makes it very easy to find them when you need them.
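To illustrate why imperfect OCR is still good enough for lookup, here is a minimal keyword-index sketch. It is not how Paperless implements search; the file names, the sample text (with deliberate OCR mistakes) and the function names are all made up for the example.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(ocr_text_by_file: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every recognised word to the set of files that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for filename, text in ocr_text_by_file.items():
        for word in tokenize(text):
            index[word].add(filename)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the files that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    matches = set(index.get(words[0], set()))
    for word in words[1:]:
        matches &= index.get(word, set())
    return matches


# Hypothetical OCR output; "Apartrnent" and "agreernent" are misread words.
documents = {
    "0001.pdf": "Apartrnent LEASE agreernent between tenant and landlord",
    "0002.pdf": "Water bill for the first quarter",
}
index = build_index(documents)
print(search(index, "lease"))
```

Even though two words in the lease were misread, the word "lease" itself survived, so the search still lands on the right file. One correctly recognised keyword per document is usually all the lookup needs.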
I'm looking at this ecosystem and wish I'd seen it sooner. I've been using the Google Drive Android app to scan documents. It does auto-cropping and perspective correction, but it doesn't OCR, and I manually rename the files so that I can order them by date and cross reference with my banking transactions. With OCR I might at least be able to extract some useful data and avoid some tedious data entry.
As others have noted, the tagging, auto flagging, etc. are great, and it provides a decent, cross-platform, web-based interface.
Also, you can flexibly configure how it stores the actual documents. So at the end of the day if you don't like the program, it gracefully degrades to a bunch of PDFs organized into a nice folder hierarchy that's easy to back up.
Interesting - will check it out to see whether it is better than my current approach, in which I scan to PDF/A including OCR and let Spotlight do the indexing.
Some additions: I found that black-and-white scans at 300 dpi work for almost all documents, resulting in a small file size and decent readability. Occasionally I switch to gray and 200 dpi and rarely to color.
After looking up how long the originals of different types of documents need to be kept for legal reasons, I settled simply for the maximum time (6 years in my case) and file documents in a binder, sorted by the year in which I can discard them. Then, at the beginning of each year, I can get rid of one section of the documents which are older than 6 years. There is a second binder for active contracts (insurance etc). As soon as one of them ends, it goes into the first binder.
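The filing rule is simple enough to sketch. This assumes the 6 years are counted from the end of the year in which the document arrived, so that everything from one year can be thrown out together at the start of a later year. That counting rule, the function names and the file names are my assumptions, not legal advice.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List

RETENTION_YEARS = 6  # the maximum period mentioned above; check your own jurisdiction


def discard_year(received: date, retention_years: int = RETENTION_YEARS) -> int:
    """First calendar year at whose start the original is older than the period.

    Assumption: the period runs from the end of the year of receipt, so a
    document from any day in 2016 becomes disposable on 1 January 2023.
    """
    return received.year + retention_years + 1


def binder_sections(received_by_name: Dict[str, date]) -> Dict[int, List[str]]:
    """Group documents into binder sections keyed by the year they can go."""
    sections: Dict[int, List[str]] = defaultdict(list)
    for name in sorted(received_by_name):
        sections[discard_year(received_by_name[name])].append(name)
    return dict(sections)
```

At the start of each year, the section whose key equals the current year is the one that can be removed.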
I've started organizing my Downloads folder in a similar way - sorting stuff by when I think I can delete it (either because it's not relevant any more or because I simply never touched it), typically a few months in the future.
Both systems have helped to keep the clutter low and.
> I wrote this to make “going paperless” easier. I do not have to worry about finding stuff again. I feed documents right from the post box into the scanner and then shred them. Perhaps you might find it useful too.
I know this strongly depends per country, but could anyone else share real-life experiences of going paperless?
From my understanding, if there is a sheet of paper that has special meaning, presumably you must be able to produce the original, with wet stamps and signatures, upon request (since any copy can be created from scratch digitally). So I keep loads of papers I am afraid to dispose of.
I use paperless-ng to go mostly paperless, but I keep the paper:
I bought a simple self-incrementing stamp, which I stamp all incoming documents with. When adding the document to paperless-ng, I set the document's number accordingly, and finally I file the paper away purely sorted by ID (so I have a (physical) folder titled 1-101, one 102-231, etc.). When I need the original to a document, the lookup is very fast, and when I know I won't need the original ever again, I don't stamp it and tag it as "digital only" in paperless-ng.
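The reason the lookup is fast is that the folders form sorted, non-overlapping ID ranges, so finding the right one is a binary search. A small sketch, using the two folder ranges from the example above; the function name and the error handling are my own additions.

```python
from bisect import bisect_right
from typing import List, Tuple

# (first ID, last ID) per physical folder, in ascending order.
FOLDERS: List[Tuple[int, int]] = [(1, 101), (102, 231)]


def folder_label(doc_id: int, folders: List[Tuple[int, int]] = FOLDERS) -> str:
    """Return the label of the physical folder that holds the stamped ID."""
    starts = [first for first, _ in folders]
    position = bisect_right(starts, doc_id) - 1  # last folder starting at or before doc_id
    if position < 0:
        raise KeyError(f"no folder holds document {doc_id}")
    first, last = folders[position]
    if doc_id > last:
        raise KeyError(f"no folder holds document {doc_id}")
    return f"{first}-{last}"


print(folder_label(150))
```

With only a handful of folders a human does the same search by eye, which is why filing purely by ID needs no categories at all.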
Although I have not used paperless-ng (extensively) so far, I have been using almost the same method for many years and could not be happier. When I receive a document, I do the same every time:
1) Stamp it with the pagination stamp.
2) Scan it.
3) Dump the scan into a single large digital folder.
4) Dump the original into a physical folder (titled as you do).
It relieves the mental burden of categorising since you do the same every time – stamp, scan, dump. And repeat. In the very rare case you need to find the original it's also superior, thanks to OCR and the sequential numbering being represented on the scan.
This is the situation in Germany for example. You need to keep many paper originals for 10 years. You can follow a process called „ersetzendes Scannen“ ("replacement scanning") but I'm not quite sure how that works.
For me I just scan everything and then put it in a binder with a label (like 2022-1) and put the same label on the digital document. This way I still have the document and will be able to find it if needed, but I don’t have to worry about where to put it. They all just go into the same binder until it is full.
This is my understanding as an individual in Germany who is not self-employed: Original documents help you to make or defend against legal claims, so for the time being, I'd keep them during any limitation period which applies. Afterwards that paper is not really useful any more (but keeping a digital copy doesn't hurt). I suggest finding out which periods apply in your jurisdiction.
In my case, I simply settled for the maximum of all the different periods I encountered and file originals by the year after which I can trash them and use the digitized versions for actually working with them (e.g. my tax declaration is now much faster to do). For Germany, I found that 6 years grace period should be fine. 10 years, if you're self-employed.
IMHO, this seems to consist of two very separate pieces of software: Scanning, and complex management of an archive of files and documents.
Both are interesting challenges, but focusing on the second one - I don't see why it should be tied-in with scanning. It's actually something that I feel is missing in operating systems, or at least desktop environments: The ability to arrange your files in more than one way, rather than having to force them into a static hierarchy ... while also not having them lost behind some piece of software which prevents direct access.
> ...The ability to arrange your files in more than one way, rather than having to force them into a static hierarchy ... while also not having them lost behind some piece of software which prevents direct access...
I wish all common operating systems supported labels/tagging in a compatible, easy-to-migrate manner...so that a person could migrate from windows to linux or back again, and all their file meta data - like file labels, file tags! - was preserved.
There is DevonThink [1]. Has been around for decades, does OCR, syncs to a huge bunch of services including WebDav, has mobile clients.
It's a bit of an old paradigm, though - powerful, client-side, offers a lot of freedom and therefore wants you to put in manual work for setting up a system. I love it.
I've been moving more and more to apps like Devonthink. I use Obsidian and Zotero, which, while they have different use cases, follow a similar 'old paradigm'.
My thinking recently has been on storing items in a way that will last. It seems every year another SaaS gets bought and/or shut down. Text based storage (Obsidian/Markdown) and desktop/foss apps are one way to combat that.
If "old paradigm" means self-owned data and no dependence on a subscription that can go away at any time I'm all in on it. (Small) SaaS is not dependable.
DEVONThink is criminally underrated. The mobile apps are fantastic. It's almost a super power being able to pull up any document/receipt I've dropped in my scanner over the last decade on my phone from anywhere.
That would be pretty nice for a variety of reasons. Most people don't have the time to experiment with self hosting, so having some kind of installer that hosts the server locally, makes sure it runs on startup, and presents some kind of tray icon or shortcut to the web interface would greatly lower the barrier to using such amazing pieces of software.
This was one of the first apps I added to run on PikaPods. So free to use during beta and I hope to set up a revenue sharing agreement with the project soon:
PikaPods looks like an interesting idea. I invested some time setting up an account and trying to create a Baserow pod. I was greeted with "Failure while adding container. Our team has been informed."
Yeah, still in public beta and multi-container apps like Baserow are pretty new and marked as "experimental". Already looking into the error.
Edit: Should work, but takes several minutes before the web UI becomes available. We do provide the logs for such cases. Most other apps are easier to deal with. :-)
Baserow dev who worked on our new multi-container single image (baserow/baserow on dockerhub) here. I assume you are using the baserow/baserow image for pikapods (but perhaps not?).
- `SYNC_TEMPLATES_ON_STARTUP=false` to turn off the initial load of example Baserow templates. Our template collection is growing and the loading of this into the database is probably what is causing the several minute startup. This will be optimized in the future.
- `BASEROW_RUN_MINIMAL=true`. This will combine two of the backend async queue processes into a single one. This might result in higher priority async tasks getting stuck behind slower tasks (a large import/export might slow down the broadcast of realtime events for example). But this tradeoff is perfectly reasonable in most small self hosted environments.
- `BASEROW_AMOUNT_OF_GUNICORN_WORKERS=1` to reduce the number of concurrent backend api processes from the default of 3. Once again this might cause a degradation in performance for higher volumes of traffic, but a tradeoff worth considering if memory etc is a concern.
This image is brand new and because of its multi-process nature is hiding some interesting complexity. We are still working out the kinks and a sensible set of defaults. It's also very easy for us to offer different variants of this image with different defaults set. Any feedback, suggestions or questions are very welcome here or at http://community.baserow.io.
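For anyone trying this outside PikaPods, the three settings listed above are ordinary environment variables, so they could be passed to the container roughly like this. This is only a sketch that builds and prints a `docker run` command. The image tag is the one mentioned in this thread, the helper name is made up, and ports, volumes and any other settings a real deployment needs are left out.

```python
import shlex
from typing import Dict, List

# Values quoted from the list above.
LOW_MEMORY_ENV: Dict[str, str] = {
    "SYNC_TEMPLATES_ON_STARTUP": "false",
    "BASEROW_RUN_MINIMAL": "true",
    "BASEROW_AMOUNT_OF_GUNICORN_WORKERS": "1",
}


def docker_run_command(image: str, env: Dict[str, str]) -> List[str]:
    """Build (but do not execute) a docker run command with the given env vars."""
    command = ["docker", "run", "--detach"]
    for name, value in env.items():
        command += ["--env", f"{name}={value}"]
    command.append(image)
    return command


# Print the command so it can be reviewed before running it by hand.
print(shlex.join(docker_run_command("baserow/baserow:1.9.1", LOW_MEMORY_ENV)))
```

Keeping the settings in one dictionary makes it easy to drop a single one again if, say, the reduced worker count turns out to hurt under load.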
Wow! Thanks for the tips! Will look into adding those during the next review.
Using docker.io/baserow/baserow:1.9.1 currently.
For now the image is marked as "experimental". And I'm looking into channeling future issue reports from apps in a more structured way and maybe keeping app settings on GitHub for editing and reporting issues against. Without duplicating upstream efforts of course. Still lots of work to do here.
Thanks for looking into it. It's a missed opportunity; this application failure quite possibly cost you a customer. I am not sure when I'll be able to test it again.
I think you should cross-promote PikaPods more, and prominently, based on the popularity of your BorgBase. I noticed the following in my backup report, and that raised PikaPods' profile in my opinion; so I might cut PikaPods some slack, and try it again.
> The team behind BorgBase is launching a new container hosting service for open source apps...
It’s a trade-off between adding new apps quickly and testing them very well. Currently I tend to add more apps and just monitor for failures in Sentry. Mostly to see which categories are used and to provide enough choice.
Thanks for sticking around for now. I still have many improvements planned, like making it easy to report deployment bugs and suggest changes to settings. Like new env vars, ports, images, etc.
At one point I was using Evernote combined with a Fujitsu Scanner to scan in all my paperwork and Evernote would automatically OCR the scans and make it easy for me to search for this later. Evernote is a commercial offering.
Does this provide similar or, better yet, better functionality?
If I scan my day's mail in one go, say water bills, property expenses etc. in one batch, will this then sort the water bill from the telephone bill in what is saved?
P.S. Bills contrived for illustration; I receive much more of this online these days, but still paperwork abounds.
Just to save others from having to look it up, paperless-xxx uses OCRmyPDF to do the work. So you don't have to host anything if you are willing to touch a file or two...
What do you mean by hosting? Both paperless-NGX and OCRmyPDF can be installed completely locally. Why not just install paperless which takes care of the OCR integration for you?