
I also wrote an alternative solution (not a public repo), but I found that relying on site maps and other link lists generally gave unsatisfactory results. Instead, my solution navigated as a user and actually used next chapter links. While that slowed it down (+ 10 seconds between requests to be polite), it could handle very large books, with the largest I used being 700+ chapters at the time (5000 pages).
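To make the idea concrete, here is a minimal sketch of following "next chapter" links with a pause between requests. It is not the commenter's code (that repo is not public); the names `NextLinkParser`, `fetch`, and `crawl`, the link labels, and the `max_pages` cap are all assumptions for illustration. Only the 10-second delay comes from the comment above.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class NextLinkParser(HTMLParser):
    """Records the href of the first <a> with rel="next" or a next-like label."""

    def __init__(self, labels=("next chapter", "next")):  # labels are assumed
        super().__init__()
        self.labels = labels
        self.next_href = None
        self._href = None  # href of the <a> currently being read
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        if "next" in (attrs.get("rel") or "").lower().split():
            self.next_href = href
        else:
            self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split()).lower()
            if self.next_href is None and text.startswith(self.labels):
                self.next_href = self._href
            self._href = None


def fetch(url):
    """Download a page as text (hypothetical helper, stdlib only)."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch, delay=10.0, max_pages=10000, sleep=time.sleep):
    """Yield (url, html) per chapter, following next links until none is left."""
    seen = set()
    url = start_url
    while len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        parser = NextLinkParser()
        parser.feed(html)
        parser.close()
        if parser.next_href is None:
            break
        next_url = urljoin(url, parser.next_href)
        if next_url in seen:  # guard against link loops
            break
        sleep(delay)  # be polite between requests
        url = next_url
```

Passing `fetch` and `sleep` as parameters is only there so the loop can be tried against canned pages without touching the network.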


This is almost the same approach I used for Bloxp[0]. I keep a list of common "Previous Post" link markups and try to navigate from the last post in a blog, one by one, back to the first. I also let users manually specify the HTML markup to use for crawling a given blog, in case it doesn't match any of the common ones.

I put the site up 10 years ago (at first because it was useful to me) and have made almost no changes since then, but many people still use it as a simple way to export a full blog into an ePub.

[0] http://www.bloxp.com
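
A rough sketch of the "common markups plus manual override" idea, for illustration only. This is not Bloxp's actual code: the rule format `(tag, attribute, token)`, the entries in `COMMON_RULES`, and the names `RuleMatcher` and `find_previous_post` are all assumptions.

```python
from html.parser import HTMLParser

# Hypothetical defaults: (tag, attribute, token) rules for markup that
# blog themes might use on their "previous post" link.
COMMON_RULES = [
    ("a", "rel", "prev"),
    ("link", "rel", "prev"),
    ("a", "class", "previous-post"),
    ("a", "class", "nav-previous"),
]


class RuleMatcher(HTMLParser):
    """Stores the href of the first element that matches any rule."""

    def __init__(self, rules):
        super().__init__()
        self.rules = rules
        self.href = None

    def handle_starttag(self, tag, attrs):
        if self.href is not None:
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        for rule_tag, attr, token in self.rules:
            # Match the token as one whitespace-separated word of the attribute.
            if tag == rule_tag and token in (attrs.get(attr) or "").split():
                self.href = href
                return


def find_previous_post(html, custom_rule=None):
    """Return the previous-post href, or None if no rule matches.

    A user-supplied custom_rule replaces the common rules for that blog.
    """
    rules = [custom_rule] if custom_rule else COMMON_RULES
    matcher = RuleMatcher(rules)
    matcher.feed(html)
    matcher.close()
    return matcher.href
```

For a blog whose theme matches none of the defaults, a call such as `find_previous_post(html, custom_rule=("a", "id", "older"))` would stand in for the manual markup setting described above.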


Yeah, I had that idea before too, but I didn't want to maintain a bunch of different "next chapter" finders.

But I do agree it would be a more reliable way of doing things.



