DIY Book Scanner

ethanpil · on Jan 7, 2024

Remember when Google was cool and not evil and released their book scanner project for free? https://code.google.com/archive/p/linear-book-scanner/

  "Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, transfer, and otherwise run, modify and propagate this design..."

mdaniel · on Jan 7, 2024

The link that's less likely to be sunset due to someone's random whims: https://github.com/google/linear-book-scanner#readme

zwayhowder · on Jan 7, 2024

I made one of these during my MBA. I spent close to AU$2,000, including the two Nikon mirrorless cameras I purchased. I am not particularly handy and made it out of spare 2x4 lumber I had, so it wasn't light.

But it worked. I scanned about two dozen short-term library books that I needed to reference frequently during my course, at a cost of about $85 per book. If I'd purchased the time-limited ebooks, they would have cost $125 each and been scattered across three different bookstore apps.

I would scan while watching TV and could do approximately 1,000 pages per hour.

I also learned that I should not do carpentry, and potentially saved tens of thousands by hiring a handyman or carpenter for home DIY…
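A quick sanity check of the arithmetic above, as a small Python sketch. The AU$2,000 rig cost, two dozen books, $125 ebook price and 1,000 pages per hour come from the comment; the 300-page book length is a hypothetical example, and the rig cost is treated as the only expense.

```python
# Figures from the comment; BOOK_PAGES is a hypothetical example length.
RIG_COST = 2000        # AUD, including the two cameras
BOOKS_SCANNED = 24     # "about two dozen"
EBOOK_PRICE = 125      # AUD per time-limited ebook
PAGES_PER_HOUR = 1000
BOOK_PAGES = 300       # hypothetical


def cost_per_book(rig_cost: float, books: int) -> float:
    """Rig cost amortized over the books actually scanned."""
    return rig_cost / books


def break_even_books(rig_cost: float, ebook_price: float) -> float:
    """Number of books after which the rig is cheaper than buying the ebooks."""
    return rig_cost / ebook_price


def minutes_per_book(pages: int, pages_per_hour: float) -> float:
    """Scanning time for one book at a steady page rate."""
    return pages * 60 / pages_per_hour


print(f"{cost_per_book(RIG_COST, BOOKS_SCANNED):.2f}")  # 83.33
print(break_even_books(RIG_COST, EBOOK_PRICE))           # 16.0
print(minutes_per_book(BOOK_PAGES, PAGES_PER_HOUR))      # 18.0
```

On those numbers, the ~$85-per-book figure is essentially the rig cost spread over 24 books, and the rig would have paid for itself against the ebook prices after about 16 books.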

cf1241290841 · on Jan 7, 2024

There is an amazing one using plastic plumbing pipes, still on YouTube:

https://www.youtube.com/watch?v=ns3jGFbJvXI

zwayhowder · on Jan 8, 2024

Someone at the time had a CNC aluminium one on eBay for $700 or so. I thought, "I can do that cheaper." I was very wrong. The actual parts weren't too bad (I still had to spend $1400 on the cameras, IIRC), but the number of tools I didn't own was a lot higher than I expected.

Still no regrets, I had a fun week of arts and crafts and got to stick it to Elsevier and other academic publishers :D

cf1241290841 · on Jan 8, 2024

It does sound like an amazing project you did there.

You are also making a very good point about the surprising effort and equipment this can take. I ran into trouble just fixing something on a heatsink the other day. Turns out that in addition to a drill, drill bit and tapping set, I really need something to keep the drill straight :)

nforgerit · on Jan 7, 2024

With the Covid lockdowns I got a bit depressed and decided to give up my flat to travel around. When I looked at my bookshelf, it broke my heart to give away or sell all my books, so I remembered the old story about Page/Mayer (AFAIR) at Google timing how long it takes to photograph a book.

I did the same with just my bare hands and my smartphone on a rather short book, and calculated it would take me ~2 weeks to create an imperfect (thumbs included) digitized copy of all my books. So that's what I did, eventually improving on things:

- took a grill from the oven, which stabilized the phone and relieved my arm

- created a couple of bash scripts to automate slicing and image compression

- ran tesseract-ocr (full-text search)

- used ghostscript to turn it into a PDF

All automated and improved over time. No big bang, just trial and error and tiny steps.

By now I have hundreds of books digitized. They're not perfect, but perfectly readable and searchable. Why am I telling you this? Keep it simple, unless you're more interested in engineering the "machine" than the actual product it's supposed to create.
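The scripts themselves aren't shown, so here is only a minimal sketch of a similar pipeline, written in Python rather than bash. It assumes ImageMagick (`convert`), `tesseract` and Ghostscript (`gs`) are installed, that each photo holds a two-page spread, and that the `pages/` directory already exists; the helper name `build_commands` and the file layout are hypothetical. It only builds the command lines so they can be inspected; pass each one to `subprocess.run(cmd, check=True)` to actually run it.

```python
def build_commands(spreads: list[str], out_pdf: str = "book.pdf",
                   workdir: str = "pages", quality: int = 85) -> list[list[str]]:
    """Return shell commands for a spread -> pages -> OCR -> PDF pipeline.

    Hypothetical helper: assumes each input photo is a two-page spread with
    the left page on the left half of the image.
    """
    cmds: list[list[str]] = []
    page_pdfs: list[str] = []
    for i, spread in enumerate(spreads):
        stem = f"{workdir}/spread{i:04d}"
        # Slice the spread into left/right halves and recompress as JPEG.
        cmds.append(["convert", spread, "-crop", "50%x100%", "+repage",
                     "-quality", str(quality), f"{stem}_%d.jpg"])
        for half in (0, 1):  # ImageMagick numbers the tiles 0 (left), 1 (right)
            base = f"{stem}_{half}"
            # With the "pdf" config, tesseract writes a searchable <base>.pdf.
            cmds.append(["tesseract", f"{base}.jpg", base, "pdf"])
            page_pdfs.append(f"{base}.pdf")
    # Merge the single-page PDFs into one book with Ghostscript's pdfwrite device.
    cmds.append(["gs", "-dBATCH", "-dNOPAUSE", "-q", "-sDEVICE=pdfwrite",
                 f"-sOutputFile={out_pdf}", *page_pdfs])
    return cmds


if __name__ == "__main__":
    for cmd in build_commands(["IMG_0001.jpg", "IMG_0002.jpg"]):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it easy to dry-run the pipeline on a new batch before committing to hours of OCR.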

yowlingcat · on Jan 8, 2024

That grill trick is a neat discovery. It makes sense that it would do the job as a stabilizer, but it's a clever reuse of something most of us have lying around the house. I bet you could use a similar setup with a baking cooling rack to mount a smartphone on motors and rotate it in multiple dimensions, but perhaps by that point there's a better approach.

whycome · on Jan 8, 2024

I cannot picture how a grill would be used.

nforgerit · on Jan 8, 2024

Well, you can lay the phone down on the grill without covering the camera lenses. Then you build some kind of pillars (anything; I just used two stacks of books on each side) and put the grill with the phone on top. Then with one hand I triggered the shutter, and with the other hand I turned the pages. It worked surprisingly well.

is_true · on Jan 8, 2024

Wasn't searching LibGen more efficient?

phs318u · on Jan 8, 2024

For anyone who thinks this is a facetious response, I've done exactly this - use LibGen to find far more useful (to me) formats of books that I own physically. I don't see why I should be punished because I purchased the book in an age where electronic copies were not available. The author and/or their estate have gotten their fair share from me already.

angra_mainyu · on Jan 8, 2024

LibGen and the associated SciHub are truly some of the best projects to ever exist.

BeetleB · on Jan 7, 2024

The original creator of this had to suspend all activities when he got a job at Apple.

https://news.ycombinator.com/item?id=27364737

SequoiaHope · on Jan 7, 2024

I believe Apple says this in their contracts, but in California (I don't know if this worker was in CA) it's illegal to prevent employees from working on projects on personal time using personal resources, aside from maybe carve-outs for competing work. It really sucks that Apple acts like workers can't moonlight; I've had some friends who didn't do it out of fear of retribution, despite the right being enshrined in CA law.

aidenn0 · on Jan 7, 2024

Well, what's a related field? Apple has a store that sells eBooks...

SequoiaHope · on Jan 9, 2024

This article says the exemptions include, among other things:

“The nature of moonlighting work is in direct conflict with the company. For example, working with or providing consultations to competitors, competing for an employer’s clients, or activities that harm the goodwill or reputation of your employer.”

https://www.aegislawfirm.com/blog/2023/01/california-moonlig...

I think it would be hard to argue that a book scanner competes with an online ebook store, since one is an archival tool and one is a commercial store. Someone could host illegal copies of copyrighted works and they would be competing with apple’s legitimate work, but those people would be the ones competing with apple, not the creator of a tool they used in the process.

aidenn0 · on Jan 11, 2024

There's not just the moonlighting law, there's also copyright assignment for software written. California also protects that, but under different rules.

bluGill · on Jan 8, 2024

Does the guy work in that bookstore? If he is in a different department, he is probably okay.

j16sdiz · on Jan 8, 2024

According to his website ( https://danreetz.com/resume.htm ), his employment at Apple ended in 2017.

grumpyfox96 · on Jan 8, 2024

There is this classic: very, very impressive, but to my knowledge not commercially available and probably never commercialized. 250 pages/min is astounding.

https://youtu.be/03ccxwNssmo

dang · on Jan 7, 2024

DIY book scanning - https://news.ycombinator.com/item?id=991897 - Dec 2009 (7 comments)

jbaiter · on Jan 7, 2024

Writing the software for an earlier version of this was one of the first open-source projects I ever did, using an early 0.1x release of React. Fond memories! So great to see the project is still alive.

daniel_reetz · on Jan 8, 2024

"alive" is a strong word but I am keeping it online as long as possible.

Miss ya, old friend.

textfiles · on Jan 8, 2024

Shout out to you, my boy

mdaniel · on Jan 7, 2024

The submission from 2021 with quite a bit of commentary: https://news.ycombinator.com/item?id=27361815

Syzygies · on Jan 7, 2024

This should be easier in software, reconstructing a 3D model of the relaxed open book from a stereo or multiple photos, then using AI to "upsample" to the scalable PDF document most likely to produce the modeled image.

I was part of a font consulting company during the PostScript/TrueType font wars, and we reconstructed fonts from scans or earlier digital formats. Most of the work was fixing bad data. This should all be easier now; think of Peter Jackson cleaning up the "Let It Be" sound, leading to the Beatles releasing one more track.

It baffles me that book images don't get this quality of attention. As a mathematician I spend a lot of time reading old journal articles that look terrible.
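The suggestion above starts from a 3D model of the open book. As a rough illustration of only the first ingredient, here is standard rectified-stereo triangulation, where depth is Z = f·B/d for focal length f in pixels, camera baseline B, and disparity d in pixels. The rig numbers are hypothetical, and calibration, rectification, dense matching and the actual dewarping and "upsampling" steps are not shown.

```python
def depth_from_disparity(focal_px: float, baseline_mm: float,
                         disparity_px: float) -> float:
    """Depth (mm) of a point seen by a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_mm / disparity_px


def page_height_profile(focal_px: float, baseline_mm: float,
                        disparities: list[float]) -> list[float]:
    """Height (mm) of each sampled page point above the farthest sample.

    A curled page shows up as disparities that vary across the page. Assumes
    the cameras look straight down, so depth differences approximate how far
    each point is lifted off the flattest part of the page.
    """
    depths = [depth_from_disparity(focal_px, baseline_mm, d) for d in disparities]
    far = max(depths)
    return [far - z for z in depths]


# Hypothetical rig: 3000 px focal length, 60 mm between the two cameras.
print(depth_from_disparity(3000, 60, 450))  # 400.0
```

A height profile like this is the kind of input a dewarping step would need; the glass-platen approach quoted from the site below sidesteps it by keeping every point at the same depth.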

jzb · on Jan 7, 2024

From the front page of the link: “The easiest way to avoid page curl in your images is to flatten the pages by pressing them against glass or acrylic. While there are some computer algorithms that can help dewarp the pages after capture, it is always more reliable to just capture flat pages in the first place.”

neilv · on Jan 7, 2024

> If you have a healthy budget, just buy DSLR cameras and use those.

Do these scanning rigs lock the mirror and shutter of the DSLR?

If not, what MTBF are they looking at, when prosumer DSLR shutter life might be around 50K actuations?
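To put the shutter-life question in numbers: in a two-camera rig each camera photographs one page per spread, so it fires roughly half a book's page count per book. A rough sketch, assuming the 50K-actuation figure above, a hypothetical 300-page average book, one actuation per page, and no retakes or test shots:

```python
import math


def books_per_shutter(shutter_life: int, pages_per_book: int,
                      cameras: int = 2) -> int:
    """Whole books one camera can scan before reaching its rated shutter life.

    Assumes each camera shoots an equal share of the pages, one actuation per
    page, with no retakes (a simplification).
    """
    actuations_per_book = math.ceil(pages_per_book / cameras)
    return shutter_life // actuations_per_book


print(books_per_shutter(50_000, 300))  # 333
```

That is on the order of a few hundred books per camera body, fewer if real shutters fail before their rating.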

Paul-Craft · on Jan 8, 2024

> A "good" shutter count varies depending on the camera model. Entry-level and mid-range DSLR cameras typically have a shutter count rating between 100,000 and 200,000, while professional-grade cameras can range between 400,000 and 500,000. When purchasing a second-hand camera, it's best to choose one with shutter count well below its rating.

https://checkshuttercount.com/nikon

neilv · on Jan 8, 2024

I don't know what's accurate.

Another site among the top search hits has "Average number of actuations after which shutter died" data for some older models.

Consumer (lowest, 69K): https://olegkikin.com/shutterlife/canon_eos1000d.htm

Prosumer (98K): https://olegkikin.com/shutterlife/canon_eos30d.htm

That's an average, so if that data is reasonably representative of units in the wild (I don't know), I'd think a trustworthy rating (and safe expectation) would be lower than that.

I mentioned shutter life because I wanted to know how scanning projects using DSLRs manage it, and also to suggest to anyone spending money on a DSLR for this that shutter life might be a cost consideration.

zoklet-enjoyer · on Jan 7, 2024

That guy posted this here when it was new. Also, he's from my town.

cf1241290841 · on Jan 7, 2024

https://diybookscanner.org/forum/viewtopic.php?f=1&p=9034

External power for the cameras is quite neat.

edit: In case anyone is curious, for battery replacements the term is (dual) "battery eliminator"

mvelbaum · on Jan 8, 2024

I used to use a flatbed scanner back in my undergrad; it was quite a painful experience :).

Nowadays I just use https://1dollarscan.com. Turns out to be rather expensive, but still beats all that manual work.

WalterBright · on Jan 7, 2024

Just slice the spine off, and run it through a sheet fed scanner. I know, I know - Sacrilege!

But you can also just set the book on a table, open it, and photograph it with your phone camera. The result is perfectly legible on your monitor.

lotophage · on Jan 8, 2024

I love the idea (as long as they are not rare, out-of-print books). Then burn the loose pages of all the books you've scanned in a bonfire to complete the ritual. The books have now transcended the physical realm, and it felt really wrong while doing it. History warns us that people will be next; are they right?

WalterBright · on Jan 8, 2024

Rare out of print books are, of course, rare. The vast bulk are relatively worthless in the used market.

Telemakhos · on Jan 8, 2024

I once asked a special collections librarian in the US to scan a rare book. This is a perfectly normal, common practice in some libraries; for example, I've gotten scans of rare books from a library in Salzburg e-mailed to me for something like twenty euros. This one librarian, however, hadn't heard of that, and the only method he could think of to do it was the one you suggest: slicing the spine and running the pages through a multi-page scanner, and he was horrified at the idea. A long and frustrating conversation ensued as I tried to convince him that I was not trying to get him to destroy rare books in order to digitize them.

KolmogorovComp · on Jan 7, 2024

Cool to see this reposted! I see people on the forums are using dedicated compact cameras. Are they still better than smartphones?

I also wonder if using the LIDAR from iPhones for example could significantly improve the flattening process.

pronoiac · on Jan 7, 2024

Yay, the forum is back online!

daniel_reetz · on Jan 8, 2024

Lots of work behind the scenes. Welcome back.

dah00pl3 · on Jan 8, 2024

I saw this and immediately looked to see if someone had finally posted instructions on making a GrumbleGear 3000 scanner (Robin Sloan's "Mr. Penumbra's 24-Hour Bookstore").

jerdthenerd · on Jan 7, 2024

I was interested to see how they solved automated page turning... but none of the designs appear to address this?

Sounds like a repetitive-motion injury waiting to happen by the time you're done scanning The Fountainhead.

cf1241290841 · on Jan 7, 2024

Fully automatic ones do exist, but they're generally avoided so as not to risk damaging the books. And without OCR of page numbers, you always risk missing pages.

https://www.youtube.com/watch?v=kvM-tjrS2-U

That said, a lot of automation processes in place do destroy the book by cutting the spine and scanning it.
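The page-number check mentioned above can be automated once the numbers are OCR'd. A minimal sketch, assuming a hypothetical list holding the page number read from each capture in order (None where OCR found no number, e.g. blank or unnumbered pages) and one page per capture; it flags ranges where fewer captures exist than the page numbers on either side require.

```python
from typing import Optional


def find_gaps(ocr_numbers: list[Optional[int]]) -> list[tuple[int, int, int]]:
    """Return (after_page, before_page, n_missing) for each suspicious gap.

    Captures with no recognized number still count as occupying a page slot,
    so an unnumbered page between 3 and 5 is assumed to be page 4.
    """
    known = [(i, n) for i, n in enumerate(ocr_numbers) if n is not None]
    gaps: list[tuple[int, int, int]] = []
    for (i, a), (j, b) in zip(known, known[1:]):
        expected = b - a   # captures that should separate pages a and b
        actual = j - i     # captures that actually do
        if actual < expected:
            gaps.append((a, b, expected - actual))
    return gaps


print(find_gaps([1, 2, 3, None, 5, 8]))  # [(5, 8, 2)]
```

Duplicated or misread numbers (where more captures exist than expected) are not flagged here; a real tool would probably want to report those too.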

westurner · on Jan 8, 2024

"Ask HN: What's the best out-of-box Document OCR/Analyzing/recognition API?" (2024) https://news.ycombinator.com/item?id=38829242 ; BetterOCR :

> Better text detection by combining multiple OCR engines (EasyOCR, Tesseract, and Pororo)

FWIU there's a way to image multiple stacked sheets of ancient scrolls without unrolling them?
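How BetterOCR actually merges its engines isn't described in the quote, so this is not its method. It is a simple stand-in for the idea of combining engines, assuming you already have the text each engine returned for the same region: pick the candidate that agrees most with all the others (a medoid), using `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher


def consensus(candidates: list[str]) -> str:
    """Return the OCR candidate most similar, in total, to the other candidates."""
    if not candidates:
        raise ValueError("need at least one candidate")

    def score(i: int) -> float:
        # Sum of similarity ratios (0..1) against every other engine's output.
        return sum(SequenceMatcher(None, candidates[i], other).ratio()
                   for j, other in enumerate(candidates) if j != i)

    best = max(range(len(candidates)), key=score)
    return candidates[best]


print(consensus(["The quick brown fox", "Tbe quick brown fox", "The quick brown fox"]))
```

Choosing whole strings can't fix an error that every engine makes; a character-level vote over aligned outputs would, at the cost of more code.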

indianmouse · on Jan 8, 2024

Is there a kit that can be bought online?

I may not have the time and energy to do a DIY, but have an immediate requirement to procure one. Any leads / directions / websites would be really appreciated.

Thank you!

treetalker · on Jan 8, 2024

Not a DIY kit, but in case LibGen is insufficient and you need a commercially available solution, ScanSnap scanners have a model for this purpose.

https://www.pfu-us.ricoh.com/scanners/scansnap/sv600

The software has automatic page-turn detection (so you don’t have to repeatedly press the scan button); has page-curve correction and deskew; and automatically removes fingers/thumbs from the image, in case you need to hold the pages down. Neat!

Like another commenter, I used 1dollarscan to digitize many books (to save space) but I agree that that process was more expensive than expected (and destroyed my physical books, which I have come to regret). If I had known about the scanner I just linked to, I probably would have invested in one instead.

Off-topic, but apparently Ricoh has acquired the ScanSnap brand from Fujitsu. (News to me, at least.) But unless Ricoh has changed something, in my experience it’s hard to go wrong with the ScanSnap brand for personal scanning needs.

(I have no affiliation with the companies or brands mentioned.)
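ScanSnap's own page-turn detection isn't documented in the thread, so the following is only one simple way to get a similar hands-free trigger, under that assumption: watch the mean difference between consecutive preview frames and fire once motion (a page being turned) has been followed by several still frames. The frame-difference input and all thresholds are hypothetical.

```python
def capture_points(diffs: list[float], motion_thresh: float = 10.0,
                   still_thresh: float = 2.0, still_frames: int = 3) -> list[int]:
    """Return indices of preview frames to save as page captures.

    diffs[k] is a (hypothetical) mean absolute pixel difference between
    preview frames k and k+1. After motion above motion_thresh, a capture
    fires once still_frames consecutive diffs fall below still_thresh.
    Starts armed so the first page is captured once the book is still.
    """
    captures: list[int] = []
    armed = True   # ready to capture the next still page
    still = 0      # consecutive still frame pairs seen so far
    for k, d in enumerate(diffs):
        if d > motion_thresh:
            armed, still = True, 0      # a page is being turned
        elif d < still_thresh:
            still += 1
            if armed and still >= still_frames:
                captures.append(k + 1)  # later frame of the last still pair
                armed, still = False, 0
        else:
            still = 0                   # small jitter: not still, not a turn
    return captures


print(capture_points([0.5, 0.5, 0.5, 20, 15, 1, 1, 1, 0.5]))  # [3, 8]
```

Requiring motion before re-arming is what keeps a hand resting on the page from producing duplicate captures.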

ChadNYC · on Jan 8, 2024

https://diybookscanner.org/forum/viewforum.php?f=28

indianmouse · on Jan 8, 2024

Thanks and really appreciate it.

But none of them are currently available or for sale.

Also, the forum looks dead/defunct.

No noticeable activity on any thread! Sad to see such a vibrant community go extinct.

I know it's not an active area like other DIY fields, and demand is also very low. But still...!

jareklupinski · on Jan 7, 2024

with the methods that hopefully come out of that research project that aims to read those really old scrolls that can't be touched or opened...

maybe we'll find a method for "CT scanning" a book and using imaging techniques to reconstruct the text inside without needing to flip each page?

rrr_oh_man · on Jan 8, 2024

> maybe we'll find a method for "CT scanning" a book and using imaging techniques to reconstruct the text inside without needing to flip each page

YES! Quite literally what's happening here, albeit in a different context:

https://www.nytimes.com/2023/10/12/arts/design/herculaneum-s...

emtrw · on Jan 8, 2024

It's getting to the point that we need this again.

Ebooks are basically all EPUB, and that's completely useless for typesetting. I've had to contact textbook authors to try to get the LaTeX source so I can read what the damned thing says on a screen.