Maybe you should look up MathML. Structured (tagged) PDFs can use MathML, just like HTML does, to represent math formulae.
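To illustrate why a structured representation matters: MathML encodes the formula's tree (base, exponent, numerator, denominator) rather than just positioned glyphs, so a program can walk it directly. The toy sketch below handles only a handful of Presentation MathML elements (`mi`, `mn`, `mo`, `mrow`, `msup`, `mfrac`) and turns them into a linear text form. The function name `to_linear` and the output notation are hypothetical choices for illustration. It does not pull MathML out of a PDF, which would need a PDF library.

```python
import xml.etree.ElementTree as ET

# Sample Presentation MathML fragments (E = mc^2 and (a + b)/2).
EMC2 = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mrow><mi>E</mi><mo>=</mo><mi>m</mi><msup><mi>c</mi><mn>2</mn></msup></mrow>
</math>"""

FRAC = """<math xmlns="http://www.w3.org/1998/Math/MathML">
  <mfrac><mrow><mi>a</mi><mo>+</mo><mi>b</mi></mrow><mn>2</mn></mfrac>
</math>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.split("}", 1)[-1]


def to_linear(el: ET.Element) -> str:
    """Convert a small subset of Presentation MathML into linear text."""
    name = local_name(el.tag)
    if name in ("mi", "mn", "mo"):
        # Token elements: identifiers, numbers, operators.
        return (el.text or "").strip()
    children = list(el)
    if name == "msup":
        base, exponent = children
        return f"{to_linear(base)}^{to_linear(exponent)}"
    if name == "mfrac":
        numerator, denominator = children
        return f"({to_linear(numerator)})/({to_linear(denominator)})"
    if name in ("math", "mrow"):
        # Containers: join their children with spaces.
        return " ".join(to_linear(child) for child in children)
    raise ValueError(f"unsupported MathML element: {name}")


if __name__ == "__main__":
    print(to_linear(ET.fromstring(EMC2)))
    print(to_linear(ET.fromstring(FRAC)))
```

Nothing comparable is possible when the formula exists only as a sequence of glyphs or paths; you would first have to reconstruct this tree with OCR.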
The real problem is that most people don't produce a correctly structured (tagged) PDF in the first place. I don't even want to think about all the problems MS Word might cause here, or which parts of the spec it probably violates.
Vectorized PDFs also don't use embedded fonts and content-flow instructions; instead they contain a bunch of randomly sequenced glyphs that make no sense without very good OCR. Vector-based PDFs are usually garbage for automated processing, and they are useless for assistive purposes (e.g. a screen reader, or a converter that targets the DAISY format or similar).
So yeah, I'd argue that PDF is the wrong serialization format. There are standardized alternatives that would be easier to parse, communicate, and license.
And to counter your argument about portability: MHTML and WARC are very portable. MHTML, for example, is the format Chrome on Android uses when you save a page. Both are single files that contain all the resources needed to display the page.
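As a rough illustration of how easy MHTML is to take apart: it is an ordinary MIME `multipart/related` message, so Python's standard `email` package can list the embedded resources without any third-party library. The sample document, the `Resource` class, and the `list_mhtml_resources` helper below are hypothetical and kept minimal; real saved pages carry more headers and usually quoted-printable HTML, which the same parser decodes.

```python
import email
import email.policy
from dataclasses import dataclass
from typing import List, Optional

# A tiny hand-written MHTML file: one HTML page plus one embedded PNG.
SAMPLE = b"""MIME-Version: 1.0
Content-Type: multipart/related; type="text/html"; boundary="----boundary"

------boundary
Content-Type: text/html
Content-Location: https://example.com/

<html><body><img src="https://example.com/a.png"></body></html>
------boundary
Content-Type: image/png
Content-Transfer-Encoding: base64
Content-Location: https://example.com/a.png

iVBORw0KGgo=
------boundary--
"""


@dataclass
class Resource:
    content_type: str
    location: Optional[str]  # original URL from the Content-Location header
    size: int                # decoded payload size in bytes


def list_mhtml_resources(data: bytes) -> List[Resource]:
    """Return every leaf MIME part of an MHTML document."""
    msg = email.message_from_bytes(data, policy=email.policy.default)
    resources = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # skip the multipart/related container itself
        payload = part.get_payload(decode=True) or b""
        location = part.get("Content-Location")
        resources.append(
            Resource(
                content_type=part.get_content_type(),
                location=str(location) if location is not None else None,
                size=len(payload),
            )
        )
    return resources


if __name__ == "__main__":
    for res in list_mhtml_resources(SAMPLE):
        print(res.content_type, res.location, res.size)
```

The base64 payload in the sample is just the 8-byte PNG signature, standing in for a real image.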