Sounds like a job for RAG.
We're about to prototype a system at work that points at all our Confluence, Google Docs, Jira tickets, GitLab source, and handwritten KB articles, so adding a section for interesting links should turn the RAG system into a CBO.
The first thing I did when I saw this thread was Ctrl-F for DocLayNet :)
I've been at this problem since 2013, and a few years ago turned my findings into more of a consultancy than a product. See HTTPS://pdfcrun.ch
However, due to various events, I burned out recently and took a permie job. I'd love to stick my head in the sand and play video games in my spare time, but I was secretly hoping you'd see this so I could hear about your work.
DocLayNet is the easy part: at triple the usual resolution, the previous gen of YOLO models has solved document segmentation for every document I've looked at.
The hard part is the table segmentation. I don't have the budget to do a proper exploration of hyperparameters for the GridFormer models before starting a $50,000 training run.
This is a back-burner project, along with speaker diarization. I have no idea why those haven't been solved, since they are very low-hanging fruit that would release tens of millions in productivity when deployed at scale. Regardless, I can't justify buying an Nvidia DGX H200 and spending two months exploring architectures for each.