
For PDFs, I've always found something along these lines useful:

  pdftotext -layout file.pdf - | grep -E ...

Good to see a Swiss Army knife utility for all sorts of files, though!


rga uses pdftotext (from poppler) internally for PDFs, but wraps it in parallelization and a very fast cache layer, since you usually want to run multiple queries per file :)
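
To make the "parallel extraction plus cache" idea concrete, here is a rough Python sketch. It is not how rga is implemented; the function names (`pdftotext_extract`, `cached_text`, `grep_pdfs`), the cache key (path, size, mtime), and the one-text-file-per-PDF cache layout are all assumptions made for illustration. The only external call is `pdftotext -layout <file> -`, where the trailing `-` sends the text to stdout.

```python
import hashlib
import re
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def pdftotext_extract(pdf: Path) -> str:
    """Run `pdftotext -layout <pdf> -` and return the text from stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", str(pdf), "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def cached_text(pdf: Path, cache_dir: Path, extract=pdftotext_extract) -> str:
    """Return extracted text, reusing the cached copy while the file is unchanged."""
    stat = pdf.stat()
    # Hypothetical cache key: path, size and mtime, so an edited file is re-extracted.
    key = hashlib.sha256(
        f"{pdf.resolve()}:{stat.st_size}:{stat.st_mtime_ns}".encode()
    ).hexdigest()
    cache_file = cache_dir / f"{key}.txt"
    if cache_file.exists():
        return cache_file.read_text(encoding="utf-8")
    text = extract(pdf)
    cache_dir.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(text, encoding="utf-8")
    return text


def grep_pdfs(pattern, pdfs, cache_dir, extract=pdftotext_extract, workers=4):
    """Search several PDFs in parallel; return (path, line) for each matching line."""
    regex = re.compile(pattern)

    def search(pdf):
        text = cached_text(Path(pdf), Path(cache_dir), extract)
        return [(str(pdf), line) for line in text.splitlines() if regex.search(line)]

    # Threads are enough here: the heavy work happens in the pdftotext subprocess.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [hit for hits in pool.map(search, pdfs) for hit in hits]
```

With this, a second query over the same files only reads the cached text and never invokes pdftotext again, which is the payoff when you run several searches per file.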



