"Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, transfer, and otherwise run, modify and propagate this design..."
I made one of these during my MBA. I spent close to $2000AUD including the two Nikon mirrorless cameras I purchased. I am not particularly handy and made it out of spare 2x4 lumber I had so it wasn’t light.
But it worked. I scanned about two dozen short-term library books that I needed to reference frequently during my course, at a cost of about $85 per book. If I’d purchased the time-limited ebooks, they would have cost $125 each and been scattered across 3 different bookstore apps.
I would scan while watching TV and could do approx. 1000 pages per hour.
I also learned that I should not do carpentry and potentially saved tens of thousands by hiring a handyman or carpenter for home diy…
Someone at the time had a CNC aluminium one on eBay for $700 or so. I thought "I can do that cheaper". I was very wrong. The actual parts weren't too bad (I still had to spend $1400 on the cameras, IIRC), but the number of tools I didn't own was a lot higher than I expected.
Still no regrets, I had a fun week of arts and crafts and got to stick it to Elsevier and other academic publishers :D
It does sound like an amazing project you did there.
You are also making a very good point about the surprising effort and equipment this can take. I ran into trouble just fixing something on a heatsink the other day. Turns out that in addition to a drill, drill bit and tapping set, I really need something to keep the drill straight :)
With Covid lockdowns I got a bit depressed and decided to give up my flat to travel around. When I looked at my bookshelf it broke my heart to give away/sell all my books, so I remembered the old story about Brin/Mayer (afair) at Google trying out how long it takes to photograph a book.
I did the same just with my bare hands and my smartphone with a rather short book and calculated it would take me ~2 weeks to create an imperfect (thumbs included) digitized copy of all my books. So that's what I did, eventually improving upon things:
- took a grill from the oven which stabilized the phone and relieved my arm
- created a couple of bash scripts to automate slicing and image compression
- ran tesseract-ocr (for full-text search)
- used ghostscript to make it a PDF
All automated and improved over time. No big bang, just trial and error and tiny steps.
Meanwhile I have hundreds of books. They're not perfect but perfectly readable/searchable. Why am I telling you this? Keep it simple. Unless you're more interested in engineering the "machine" than the actual product it's supposed to create.
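For anyone who wants to try the same chain (slice, compress, OCR, merge), here is a rough sketch of what such scripts might look like. It is not the poster's actual scripts: the function names, the file naming, the JPEG quality of 60 and the assumption that each photo is a two-page spread are all made up for illustration. It only builds the command lines; it assumes ImageMagick (`convert`), Tesseract and Ghostscript (`gs`) are installed if you go on to run them.

```python
from pathlib import Path


def page_commands(photo: Path, work: Path) -> list[list[str]]:
    """Build the commands for one photo of a two-page spread (hypothetical layout)."""
    stem = photo.stem
    halves = [work / f"{stem}-0.jpg", work / f"{stem}-1.jpg"]
    cmds = [
        # ImageMagick: cut the spread into left/right halves and recompress.
        # The %d in the output name makes it write <stem>-0.jpg and <stem>-1.jpg.
        ["convert", str(photo), "-crop", "50%x100%", "+repage",
         "-quality", "60", str(work / f"{stem}-%d.jpg")],
    ]
    for half in halves:
        # Tesseract: the trailing "pdf" asks for <base>.pdf with a searchable text layer.
        cmds.append(["tesseract", str(half), str(half.with_suffix("")), "pdf"])
    return cmds


def merge_command(page_pdfs: list[Path], out: Path) -> list[str]:
    """Build the Ghostscript command that joins the per-page PDFs into one book."""
    return ["gs", "-q", "-dBATCH", "-dNOPAUSE", "-sDEVICE=pdfwrite",
            f"-sOutputFile={out}", *[str(p) for p in page_pdfs]]
```

To actually run it, pass each list to `subprocess.run(cmd, check=True)` in order. On ImageMagick 7 the binary is usually `magick` rather than `convert`.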
That grill trick is a neat discovery. Makes sense that it would do the job as a stabilizer, but it's a clever reuse of something most of us have lying around the house. I bet you could use a similar kind of setup with a baking drying rack to mount a smartphone on motors and rotate the thing in multiple dimensions, but perhaps by that point there's a better approach.
Well, you can lay the phone down on the grill without the camera lenses getting covered. Then you build some kind of pillars (anything; I just used 2 stacks of books on each side) and put the grill with the phone on top. Then with one hand I released the trigger and with the other hand I turned the pages. It worked surprisingly well.
For anyone who thinks this is a facetious response, I've done exactly this - use LibGen to find far more useful (to me) formats of books that I own physically. I don't see why I should be punished because I purchased the book in an age where electronic copies were not available. The author and/or their estate have gotten their fair share from me already.
Apple, I believe, says this in their contracts, but in California (idk if this worker was in CA) it’s illegal to prevent employees from working on projects on personal time using personal resources, aside from maybe carve-outs for competing things. It actually really sucks that Apple acts like workers can’t moonlight; I’ve had some friends who didn’t do it out of fear of retribution despite it being enshrined in CA law.
This article says the exemptions include, among other things:
“The nature of moonlighting work is in direct conflict with the company. For example, working with or providing consultations to competitors, competing for an employer’s clients, or activities that harm the goodwill or reputation of your employer.”
I think it would be hard to argue that a book scanner competes with an online ebook store, since one is an archival tool and one is a commercial store. Someone could host illegal copies of copyrighted works and they would be competing with Apple’s legitimate work, but those people would be the ones competing with Apple, not the creator of a tool they used in the process.
There's not just the moonlighting law, there's also copyright assignment for software written. California also protects that, but under different rules.
There is this classic – very very impressive but to my knowledge not commercially available and probably never commercialized. 250 pages/min is astounding.
Writing the software for an earlier version of this was one of the first open source projects I ever did, with an early 0.1x release of React, fond memories
So great to see the project is still alive.
This should be easier in software, reconstructing a 3D model of the relaxed open book from a stereo or multiple photos, then using AI to "upsample" to the scalable PDF document most likely to produce the modeled image.
I was part of a font consulting company during the PostScript / TrueType font wars, and we reconstructed fonts from scans or earlier digital formats. Most of the work was fixing bad data. This all should be easier now; think of Peter Jackson cleaning up the "Let It Be" sound, leading to the Beatles releasing one more track.
It baffles me that book images don't get this quality of attention. As a mathematician I spend a lot of time reading old journal articles that look terrible.
From the front page of the link: “The easiest way to avoid page curl in your images is to flatten the pages by pressing them against glass or acrylic. While there are some computer algorithms that can help dewarp the pages after capture, it is always more reliable to just capture flat pages in the first place.”
> A "good" shutter count varies depending on the camera model. Entry-level and mid-range DSLR cameras typically have a shutter count rating between 100,000 and 200,000, while professional-grade cameras can range between 400,000 and 500,000. When purchasing a second-hand camera, it's best to choose one with shutter count well below its rating.
That's an average, so if that data is reasonably representative of units in the wild (I don't know), I'd think a trustworthy rating (and safe expectation) would be lower than that.
The reason I mentioned shutter life was that I wanted to know how the scanning projects using DSLRs managed it, and also to suggest to anyone dropping money on a DSLR for this that shutter life might be a cost consideration.
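To put rough numbers on that, here is a back-of-envelope sketch. The rating comes from the quoted range above and the 1000 pages per hour from the two-camera rig mentioned earlier in the thread. Everything else is my assumption: one shutter actuation per page, each camera shooting only its own side of the book, and 300 pages per book.

```python
def shutter_budget(rated_actuations: int, cameras: int = 2,
                   pages_per_hour: int = 1000, pages_per_book: int = 300) -> dict:
    """Estimate how much scanning fits inside the cameras' rated shutter life.

    Assumes one actuation per page and that pages are split across the cameras,
    so the rig as a whole can capture rated_actuations * cameras pages.
    """
    pages = rated_actuations * cameras
    return {
        "pages": pages,
        "hours": pages / pages_per_hour,
        "books": pages // pages_per_book,
    }


# Low end of the quoted entry-level rating.
budget = shutter_budget(100_000)
print(budget)
```

Under those assumptions the low-end rating covers about 200,000 pages, or roughly 200 hours of scanning, so for a personal library the shutter is unlikely to be the first thing to wear out. A used body that is already near its rating changes that picture.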
I love the idea (as long as they are not rare out of print books). Then burn the loose pages of all the books you've scanned in a bonfire to complete the ritual. The books have now transcended the physical realm and it felt really wrong while doing it. History warns us that people will be next, are they right?
I once asked a special collections librarian in the US to scan a rare book. This is a perfectly normal, common practice in some libraries; for example, I've gotten scans of rare books in a library in Salzburg e-mailed to me for something like twenty euros. This one librarian, however, hadn't heard of that, and the only method he could think of to do it was the one you suggest—slicing the spine and running the pages through a multi-page scanner, and he was horrified at the idea. A long and frustrating conversation ensued as I tried to convince him that I was not trying to get him to destroy rare books in order to digitize them.
I saw this and immediately looked to see if someone had finally posted instructions on making a GrumbleGear 3000 scanner (Robin Sloan's "Mr. Penumbra's 24-Hour Bookstore").
Fully automatic ones do exist, but it's generally not done, to avoid risking damage to the books. And without OCR of page numbers you always risk missing pages.
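The missing-page check itself is simple once the page numbers have been read. A minimal sketch, assuming you already have one OCR'd integer per scanned page (the function name is hypothetical, and real books have unnumbered or roman-numeral pages and OCR misreads that this ignores):

```python
def missing_pages(seen: list[int]) -> list[int]:
    """Return page numbers absent between the lowest and highest number seen."""
    if not seen:
        return []
    present = set(seen)
    return [n for n in range(min(present), max(present) + 1) if n not in present]
```

For example, `missing_pages([1, 2, 3, 5, 6, 9])` returns `[4, 7, 8]`, which tells the operator which spreads to go back and reshoot.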
I may not have the time and energy to do a DIY, but have an immediate requirement to procure one. Any leads / directions / websites would be really appreciated.
The software has automatic page-turn detection (so you don’t have to repeatedly press the scan button); has page-curve correction and deskew; and automatically removes fingers/thumbs from the image, in case you need to hold the pages down. Neat!
Like another commenter, I used 1dollarscan to digitize many books (to save space) but I agree that that process was more expensive than expected (and destroyed my physical books, which I have come to regret). If I had known about the scanner I just linked to, I probably would have invested in one instead.
Off-topic, but apparently Ricoh has acquired the ScanSnap brand from Fujitsu. (News to me, at least.) But unless Ricoh has changed something, in my experience it’s hard to go wrong with the ScanSnap brand for personal scanning needs.
(I have no affiliation with the companies or brands mentioned.)
It's getting to the point that we need this again.
Ebooks are basically all EPUB, and that's completely useless for typesetting. I've had to contact authors of textbooks to try and get the LaTeX source so I can read what the damned thing says on a screen.