Paperless-NGX

robinparriath · on March 31, 2022

If I understand this correctly, the original Paperless was archived [Archival notice](https://github.com/the-paperless-project/paperless/commit/9b...), so Paperless-NG was created.

Now that Paperless-NG seems to be going unmaintained (last commit on 15th Sep 2021), Paperless-NGX has been created with a focus on an org, so that the continuity of the project can be maintained with a simple path for the original creators to join back if they want to.

I don't think the community could have handled this better!

anaisbetts · on March 31, 2022

As a user of Paperless I highly recommend setting it up, it is so incredibly useful as soon as you have to deal with any kind of paperwork-intensive activity (buying a house, getting a mortgage, paying taxes). Take literally everything you get and throw it into Paperless, then when someone asks for all of these documents it will take you minutes instead of hours to track them all down - when you need it, it's magic.

addingnumbers · on March 31, 2022

Does it still require you to press a button to tell it every single time it's pointed at a new sheet of paper?

I'm waiting for one of these apps to monitor a video stream from a camera and automatically determine when a new sheet is ready to scan.

I already have a mobile app that can find the edges of the paper, crop it, and OCR all its contents. But it still needs me there as a human to tell it "the page you're pointed at now is one you haven't seen before" when that determination should be far simpler than doing OCR.

I'm waiting for the day I can aim my phone or webcam and just flip through a bunch of pages and have it scan each one as it appears. It doesn't have to be super fast like Data from TNG or Johnny Five from Short Circuit, I just want to go from "flip, press, flip, press, flip, press" to "flip, flip, flip".
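
A minimal sketch of that "have I seen this page before?" check, assuming frames arrive as flattened grayscale pixel lists. The frame format, function names, and both thresholds are hypothetical; this is plain frame differencing, not how any particular scanning app works.

```python
from typing import Iterable, Iterator, List, Optional

Frame = List[int]  # flattened grayscale pixels, 0-255 (assumed input format)


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equally sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def new_page_indices(frames: Iterable[Frame],
                     change_threshold: float = 20.0,
                     still_threshold: float = 2.0) -> Iterator[int]:
    """Yield the index of each frame where a new, steady page is in view.

    A page counts as new when the view has settled (two consecutive frames
    barely differ) and the settled frame clearly differs from the last
    captured page. Both thresholds are illustrative guesses.
    """
    last_captured: Optional[Frame] = None
    previous: Optional[Frame] = None
    for i, frame in enumerate(frames):
        settled = (previous is not None
                   and mean_abs_diff(frame, previous) < still_threshold)
        is_new = (last_captured is None
                  or mean_abs_diff(frame, last_captured) > change_threshold)
        if settled and is_new:
            last_captured = frame
            yield i
        previous = frame
```

Each yielded index is where the "press" would otherwise happen; the settle check is there to skip the blurry frames captured mid-flip.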

KennyBlanken · on March 31, 2022

If you have so much paper volume that pressing a button is tedious, you should be using a scanner with an ADF. Fujitsu makes a bunch and they've been on the market so long that finding a used one for less money is pretty easy.

If you don't have that much volume...just improve the physical UI. Whatever floats your boat: a foot pedal, capacitive switch, Clapper, etc.

addingnumbers · on March 31, 2022

It's not half important enough to me to justify buying more equipment.

Especially when there is no technological barrier that keeps us from having software do it automatically. I'm not going to buy gear to do something I fully expect the software to be capable of in the next year or two.

Until then, searching paper with my eyeballs once in a great while is less work than procuring and maintaining a document feeder or pressing a button three thousand times, no matter how convenient that button is.

einpoklum · on March 31, 2022

Can you explain how it is better than carefully scanning and arranging the scanned files yourself? (Not being facetious, I'm really asking.)

anaisbetts · on March 31, 2022

It's better because it automatically OCRs everything and puts it into a search index. It means that finding documents now consists of typing a few keywords and immediately getting what you're looking for vs having to open a ton of PDFs
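
To illustrate the "OCR everything, then search by keyword" idea, here is a minimal sketch of an inverted index over OCR text. It only shows the general technique; the function names and sample documents are made up, and Paperless's real search index is not implemented this way as far as this sketch is concerned.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(ocr_text_by_doc: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word to the set of documents whose OCR text contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_name, text in ocr_text_by_doc.items():
        for word in tokenize(text):
            index[word].add(doc_name)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the documents that contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    return set.intersection(*(index.get(word, set()) for word in words))


# Hypothetical OCR output, including a typical recognition error ("agreemcnt").
docs = {
    "scan-0001.pdf": "Apartment lease agreemcnt between landlord and tenant",
    "scan-0002.pdf": "City Water Company invoice for March",
}
index = build_index(docs)
print(search(index, "lease"))          # the lease scan
print(search(index, "water invoice"))  # the water bill scan
```

Note that the misrecognized word does not stop the lease from being found, because other words on the page were read correctly.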

einpoklum · on March 31, 2022

"Typing"? Typing where? What's the integration with file managers like? And what's the API for file lookup via that "index"?

KennyBlanken · on March 31, 2022

Why don't you expend some effort answering your own questions instead of expecting others to answer a combative person on the internet?

https://www.youtube.com/results?search_query=paperless-ng

vertis · on March 31, 2022

Not the parent, but I suspect it's not better in the outcome, just in the amount of effort you have to put into creating and arranging vs just throwing documents at it and searching later.

anthropodie · on March 31, 2022

You can categorise those documents, tag them, OCR them, search them. You can access them on your phone or any other device with a browser.

einpoklum · on March 31, 2022

1. If I need a browser to access them, that's very similar to "can't access them". Do you expect my programs to start browser sessions to access files? Will my file manager have to connect to that silly web server?

2. I can't effectively OCR them for many non-English languages (at least - unless it's made some advances in OCR for some languages for which it's lacking). Also, if I do OCR anything, I then need to correct the mistakes, which takes a h-u-g-e amount of time even when the OCR works relatively well.

ishi · on March 31, 2022

You seem to come at this useful tool with a very negative mindset.

- The scanned documents are stored in the filesystem in PDF format, so you can certainly access them without going through the web application.

- OCR doesn't have to be perfect, because it's just being used to locate a specific document when you need it. So if you have 1788 documents and you want to find your apartment lease, you search for "lease" and there's a very good chance that word will be found in the correct document. That's the whole point - storing documents with a low amount of manual effort, in a way that makes it very easy to find them when you need them.

KennyBlanken · on March 31, 2022

Yeah, that guy is being a tool. "WHY IS THIS THING I REFUSE TO SPEND EVEN A FEW MINUTES BOTHERING TO RESEARCH, SO BAD?!"

h0l0cube · on March 31, 2022

I'm looking at this ecosystem and wish I'd seen it sooner. I've been using the Google Drive Android app to scan documents. It does auto-cropping and perspective correction, but it doesn't OCR, and I manually rename the files so that I can order them by date and cross reference with my banking transactions. With OCR I might at least be able to extract some useful data and avoid some tedious data entry.

nojito · on March 31, 2022

How is that any different from Onedrive/iCloud/etc?

matthewmcg · on March 31, 2022

As others have noted, the tagging, auto flagging, etc. are great, and it provides a decent, cross-platform, web-based interface.

Also, you can flexibly configure how it stores the actual documents. So at the end of the day if you don't like the program, it gracefully degrades to a bunch of PDFs organized into a nice folder hierarchy that's easy to back up.

Semaphor · on March 31, 2022

The /r/selfhosted announcement [0] had some more information about it, including links to issues where the fork-plans were discussed.

This is a great reminder for me to migrate ;)

[0]: https://old.reddit.com/r/selfhosted/comments/tbcuf0/announci...

Semaphor · on March 31, 2022

For those who don't know, paperless is a Document Management System. More information in the docs [0].

It’s the 2nd fork, with -NG being the first one. They each rose up when the former version became unmaintained.

[0]: https://paperless-ngx.readthedocs.io/en/latest/

phkx · on March 31, 2022

Interesting - will check it out to see whether it is better than my current approach, in which I scan to PDF-A including OCR and let Spotlight do the indexing.

Some additions: I found that black-and-white scans at 300 dpi work for almost all documents, resulting in a small file size and decent readability. Occasionally I switch to gray and 200 dpi and rarely to color. After looking up how long the originals of different types of documents need to be kept for legal reasons, I settled simply for the maximum time (6 years in my case) and file documents in a binder, sorted by the year in which I can discard them. Then, at the beginning of each year, I can get rid of one section of the documents which are older than 6 years. There is a second binder for active contracts (insurance etc). As soon as one of them ends, it goes into the first binder. I've started organizing my Downloads folder in a similar way - sorting stuff by when I think I can delete it (either because it's not relevant any more or because I simply never touched it), typically a few months in the future. Both systems have helped to keep the clutter low and.

throwaway290 · on March 31, 2022

> I wrote this to make “going paperless” easier. I do not have to worry about finding stuff again. I feed documents right from the post box into the scanner and then shred them. Perhaps you might find it useful too.

I know this strongly depends per country, but could anyone else share real-life experiences of going paperless?

From my understanding, if there is a sheet of paper that has special meaning, presumably you must be able to produce the original, with wet stamps and signatures, upon request (since any copy can be created from scratch digitally). So I keep loads of papers I am afraid to dispose of.

L3viathan · on March 31, 2022

I use paperless-ng to go mostly paperless, but I keep the paper:

I bought a simple self-incrementing stamp, which I stamp all incoming documents with. When adding the document to paperless-ng, I set the document's number accordingly, and finally I file the paper away purely sorted by ID (so I have a (physical) folder titled 1-101, one 102-231, etc.). When I need the original to a document, the lookup is very fast, and when I know I won't need the original ever again, I don't stamp it and tag it as "digital only" in paperless-ng.
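
The lookup from a stamped number to the physical folder can be sketched as a binary search over the last ID in each folder. The folder boundaries below are the ones mentioned above (1-101 and 102-231); the function and variable names are hypothetical.

```python
from bisect import bisect_left
from typing import List

# Last stamped ID stored in each physical folder, in ascending order.
# These two boundaries correspond to folders "1-101" and "102-231".
FOLDER_LAST_IDS: List[int] = [101, 231]


def folder_label(doc_id: int, last_ids: List[int] = FOLDER_LAST_IDS) -> str:
    """Return the label of the folder that holds the paper with this ID."""
    if doc_id < 1:
        raise ValueError("stamped IDs start at 1")
    pos = bisect_left(last_ids, doc_id)
    if pos == len(last_ids):
        raise LookupError(f"ID {doc_id} is not filed in any folder yet")
    first = 1 if pos == 0 else last_ids[pos - 1] + 1
    return f"{first}-{last_ids[pos]}"


print(folder_label(57))   # 1-101
print(folder_label(102))  # 102-231
```

With only a handful of folders a linear scan would do just as well; the point is that sorting purely by ID makes the lookup mechanical.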

caeruleus · on March 31, 2022

Although I have not used paperless-ng (extensively) so far, I have been using almost the same method for many years and could not be happier. When I receive a document, I do the same every time:

  1) Stamp it with the pagination stamp.
  2) Scan it.
  3) Dump the scan into a single large digital folder.
  4) Dump the original into a physical folder (titled as you do).

It relieves the mental burden of categorising since you do the same every time – stamp, scan, dump. And repeat. In the very rare case you need to find the original it's also superior, thanks to OCR and the sequential numbering being represented on the scan.

flo123456 · on March 31, 2022

This is the situation in Germany for example. You need to keep many paper originals for 10 years. You can follow a process called „ersetzendes Scannen“ ("replacement scanning") but I'm not quite sure how that works.

For me I just scan everything and then put it in a binder with a label (like 2022-1) and put the same label on the digital document. This way I still have the document and will be able to find it if needed, but I don’t have to worry about where to put it. They all just go into the same binder until it is full.

phkx · on March 31, 2022

This is my understanding as an individual in Germany who is not self-employed: Original documents help you to make or defend against legal claims, so for the time being, I'd keep them during any limitation period which applies. Afterwards that paper is not really useful any more (but keeping a digital copy doesn't hurt). I suggest finding out which periods apply in your jurisdiction.

In my case, I simply settled for the maximum of all the different periods I encountered and file originals by the year after which I can trash them and use the digitized versions for actually working with them (e.g. my tax declaration is now much faster to do). For Germany, I found that 6 years grace period should be fine. 10 years, if you're self-employed.
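
Filing by discard year comes down to simple arithmetic, sketched below. The 6-year default is the figure from this comment; counting it as "document year plus 6" is an assumption, since the exact start and end of a retention period depends on the rules that apply to you. The file names are invented.

```python
from collections import defaultdict
from typing import Dict, List


def discard_year(document_year: int, retention_years: int = 6) -> int:
    """Year in which the paper original may go (assumed: year + retention)."""
    return document_year + retention_years


def group_by_discard_year(doc_years: Dict[str, int],
                          retention_years: int = 6) -> Dict[int, List[str]]:
    """Group document names into binder sections keyed by discard year."""
    sections: Dict[int, List[str]] = defaultdict(list)
    for name, year in doc_years.items():
        sections[discard_year(year, retention_years)].append(name)
    return {year: sorted(names) for year, names in sorted(sections.items())}


sections = group_by_discard_year({
    "tax-2016.pdf": 2016,
    "invoice-2016.pdf": 2016,
    "lease-2020.pdf": 2020,
})
print(sections)
```

At the start of a year, everything filed under a section key that is less than or equal to that year can be removed in one go.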

einpoklum · on March 31, 2022

IMHO, this seems to consist of two very separate pieces of software: Scanning, and complex management of an archive of files and documents.

Both are interesting challenges, but focusing on the second one - I don't see why it should be tied-in with scanning. It's actually something that I feel is missing in operating systems, or at least desktop environments: The ability to arrange your files in more than one way, rather than having to force them into a static hierarchy ... while also not having them lost behind some piece of software which prevents direct access.

mxuribe · on March 31, 2022

> ...The ability to arrange your files in more than one way, rather than having to force them into a static hierarchy ... while also not having them lost behind some piece of software which prevents direct access...

I wish all common operating systems supported labels/tagging in a compatible, easy-to-migrate manner...so that a person could migrate from windows to linux or back again, and all their file meta data - like file labels, file tags! - was preserved.

brunoqc · on March 31, 2022

I wish there was a standalone version, so my mom could use it.

Or maybe a desktop + a mobile version that are synced without a server, so you could still be able to use your phone to scan new documents.

uniqueuid · on March 31, 2022

There is DevonThink [1]. Has been around for decades, does OCR, syncs to a huge bunch of services including WebDav, has mobile clients.

It's a bit of an old paradigm, though - powerful, client-side, offers a lot of freedom and therefore wants you to put in manual work for setting up a system. I love it.

[1] https://www.devontechnologies.com/apps/devonthink

vertis · on March 31, 2022

I've been moving more and more to apps like Devonthink. I use Obsidian and Zotero, which, while they have different use cases, are a similar 'old paradigm'.

My thinking recently has been on the storing items in a way that will last. It seems every year another SaaS gets bought and/or shutdown. Text based storage (Obsidian/Markdown) and desktop/foss apps are one way to combat that.

solarkraft · on March 31, 2022

If "old paradigm" means self-owned data and no dependence on a subscription that can go away at any time I'm all in on it. (Small) SaaS is not dependable.

LeSaucy · on March 31, 2022

DEVONThink is criminally underrated. The mobile apps are fantastic. It's almost a super power being able to pull up any document/receipt I've dropped in my scanner over the last decade on my phone from anywhere.

operator-name · on March 31, 2022

That would be pretty nice for a variety of reasons. Most people don't have the time to experiment with self-hosting, so having some kind of installer that hosts the server locally, makes sure it runs on startup, and presents some kind of tray icon or shortcut to the web interface would greatly lower the barrier to using such amazing pieces of software.

tobbe2064 · on March 31, 2022

There's a mobile app: https://github.com/bauerj/paperless_app

brunoqc · on March 31, 2022

You still need to host paperless, right?

anthropodie · on March 31, 2022

Host it on a Raspberry Pi at home. That's what I did.

brunoqc · on March 31, 2022

Yeah, I have one at home. But it's not ideal for non-tech-savvy people like my mom.

anthropodie · on April 1, 2022

Your mom does not have to know what it is hosted on. Just give her the URL or use some shortcut page like jump by daledavies.

brunoqc · on April 1, 2022

But then she would depend on me. She doesn't depend on me to use Excel.

frameset · on March 31, 2022

I use Papermerge, which seems to be very similar software. Does anyone have experience with both and prefer Paperless?

m3nu · on March 31, 2022

This was one of the first apps I added to run on PikaPods. So free to use during beta and I hope to set up a revenue sharing agreement with the project soon:

https://www.pikapods.com/apps

gurjeet · on March 31, 2022

PikaPods looks like an interesting idea. I invested some time setting up account, and trying to create Baserow pod. I was greeted with "Failure while adding container. Our team has been informed."

m3nu · on March 31, 2022

Yeah, still in public beta and multi-container apps like Baserow are pretty new and marked as "experimental". Already looking into the error.

Edit: Should work, but takes several minutes before the web UI becomes available. We do provide the logs for such cases. Most other apps are easier to deal with. :-)

nigggle · on April 2, 2022

Baserow dev who worked on our new multi-container single image (baserow/baserow on dockerhub) here. I assume you are using the baserow/baserow image for pikapods (but perhaps not?).

We have a few environment variables (https://baserow.io/docs/installation%2Fconfiguration) that can be tweaked for this image to make it start-up faster and with a lower resource footprint:

- `SYNC_TEMPLATES_ON_STARTUP=false` to turn off the initial load of example Baserow templates. Our template collection is growing and the loading of this into the database is probably what is causing the several minute startup. This will be optimized in the future.

- `BASEROW_RUN_MINIMAL=true`. This will combine two of the backend async queue processes into a single one. This might result in higher priority async tasks getting stuck behind slower tasks (a large import/export might slow down the broadcast of realtime events for example). But this tradeoff is perfectly reasonable in most small self hosted environments.

- `BASEROW_AMOUNT_OF_GUNICORN_WORKERS=1` to reduce the number of concurrent backend api processes from the default of 3. Once again this might cause a degradation in performance for higher volumes of traffic, but a tradeoff worth considering if memory etc is a concern.

This image is brand new and because of its multi-process nature is hiding some interesting complexity. We are still working out the kinks and a sensible set of defaults. It's also very easy for us to offer different variants of this image with different defaults set. Any feedback, suggestions or questions are very welcome here or at http://community.baserow.io.
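
As a rough sketch of how the three variables above could be passed to the container, the snippet below only assembles a `docker run` argument list. It is not a complete Baserow deployment: volumes, ports, and any other required settings are deliberately left out, the helper name is hypothetical, and the image tag is simply the one mentioned elsewhere in this thread.

```python
from typing import Dict, List

# The three startup tweaks listed above, as environment variables.
STARTUP_TWEAKS: Dict[str, str] = {
    "SYNC_TEMPLATES_ON_STARTUP": "false",
    "BASEROW_RUN_MINIMAL": "true",
    "BASEROW_AMOUNT_OF_GUNICORN_WORKERS": "1",
}


def docker_run_command(image: str, env: Dict[str, str]) -> List[str]:
    """Build (but do not execute) a `docker run` command with env vars set."""
    cmd = ["docker", "run", "--detach"]
    for key, value in env.items():
        cmd += ["--env", f"{key}={value}"]
    cmd.append(image)
    return cmd


# Image tag taken from this thread; adjust to whatever you actually run.
command = docker_run_command("baserow/baserow:1.9.1", STARTUP_TWEAKS)
print(" ".join(command))
```

Passing the resulting list to `subprocess.run(command, check=True)` would start the container, assuming Docker is installed and the remaining configuration has been added.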

m3nu · on April 2, 2022

Wow! Thanks for the tips! Will look into adding those during the next review.

Using docker.io/baserow/baserow:1.9.1 currently.

For now the image is marked as "experimental". And I'm looking into channeling future issue reports from apps in a more structured way and maybe keep app settings on Github for editing and reporting issues against. Without duplicating upstream efforts of course. Still lots of work to do here.

gurjeet · on April 3, 2022

Thanks for looking into it. It's a missed opportunity; this application failure quite possibly cost you a customer. I am not sure when I'll be able to test it again.

I think you should cross-promote PikaPods more, and prominently, based on the popularity of your BorgBase. I noticed the following in my backup report, and that raised PikaPods' profile in my opinion; so I might cut PikaPods some slack, and try it again.

> The team behind BorgBase is launching a new container hosting service for open source apps...

m3nu · on April 3, 2022

It’s a trade-off between adding new apps quickly and testing them very well. Currently I tend to add more apps and just monitor for failures in Sentry. Mostly to see which categories are used and to provide enough choice.

Thanks for sticking around for now. I still have many improvements planned, like making it easy to report deployment bugs and suggest changes to settings. Like new env vars, ports, images, etc.

evolve2k · on March 31, 2022

At one point I was using Evernote combined with a Fujitsu Scanner to scan in all my paperwork and Evernote would automatically OCR the scans and make it easy for me to search for this later. Evernote is a commercial offering.

Does this provide similar or, better yet, superior functionality?

If I scan my day's mail in one go.. say water bills, property expenses etc in one batch.. will this then sort the water bill from the telephone bill in what is saved?

PS. Bills contrived for illustration; I receive much more of this online these days, but still paperwork abounds.

tenebrisalietum · on March 31, 2022

Yes. It uses tesseract for OCR and gathers the OCR text and puts it in a database, and you can then search.

So if you enter something you know is on the water bill then it'll come up in the search, such as the name of the water company.

I've never used Evernote so I'm not sure it's better but I had no problem finding stuff so far.

Also you can tag documents and search by tag so you have that too.

Paperless kicks ass.

upofadown · on March 31, 2022

Just to save others from having to look it up, paperless-xxx uses OCRmyPDF to do the work. So you don't have to host anything if you are willing to touch a file or two...
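
For the "touch a file or two" route, a minimal sketch of calling the OCRmyPDF command-line tool directly is shown below. It assumes the `ocrmypdf` CLI is installed and on the PATH; the helper names and file names are hypothetical, and this is not a description of how Paperless itself invokes it.

```python
import shutil
import subprocess
from typing import List


def ocrmypdf_command(input_pdf: str, output_pdf: str,
                     language: str = "eng", deskew: bool = True) -> List[str]:
    """Build an OCRmyPDF command line; this function runs nothing."""
    cmd = ["ocrmypdf", "--language", language]
    if deskew:
        cmd.append("--deskew")  # straighten crooked scans before OCR
    cmd += [input_pdf, output_pdf]
    return cmd


def ocr_pdf(input_pdf: str, output_pdf: str) -> bool:
    """Run OCRmyPDF if it is installed; return False when the CLI is missing."""
    if shutil.which("ocrmypdf") is None:
        return False
    subprocess.run(ocrmypdf_command(input_pdf, output_pdf), check=True)
    return True


print(" ".join(ocrmypdf_command("scan.pdf", "scan-ocr.pdf")))
```

The output PDF carries a searchable text layer, so a desktop search tool can index it without any server running.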

subeadia · on March 31, 2022

What do you mean by hosting? Both paperless-NGX and OCRmyPDF can be installed completely locally. Why not just install paperless which takes care of the OCR integration for you?

nbenitezl · on March 31, 2022

This is also an interesting one: https://openpaper.work/en/

moveax · on March 31, 2022

There is a linuxserver.io container image now, which should act as drop in replacement for the paperless-ng linuxserver image.

Haven't tried it yet, but it looked good so far

https://hub.docker.com/r/linuxserver/paperless-ngx

wfriesen · on March 31, 2022

I just migrated my instance by repointing the webserver container image at the one in this repo. So far, so good.

bravura · on March 31, 2022

This is cool, but can I pay for a managed service?

I clicked through the documentation and I don't understand how easy/hard it is to make cloud backups and do disaster recovery.

meibo · on March 31, 2022

Excited to see that this is the way the project will be going forward, been using Paperless-NG for quite a while now and it's a pleasure.

kloudleinc · on March 31, 2022

It's a very useful tool. I mainly use it to back up my documents instead of only keeping the paperless ones though.

gsich · on March 31, 2022

Automatic deskewing of images?