>> You are 250 lines of C# code away from creating a fully functional PDF invoic...

pathartl · on Jan 18, 2023

Coming from the web dev space into backend on a project that heavily relies on PDF generation, I would say that something like a PDF often cannot be expressed with just markup and a stylesheet. There's a large difference between something like the web (which must be expressed with some fluidity of layout) and a very static document like a PDF. Page breaks, readability, print supply, watermarks, paging, etc. all have to be considered.

aidos · on Jan 18, 2023

I feel like all of that can be done in markup.

kgwxd · on Jan 18, 2023

I've worked with PDF markup tools built on libraries like this for 20 years, both third-party and in-house custom. It usually takes 10 minutes to find out the markup doesn't support what is required for the task. With third-party tools you have to find a hack or drop it altogether. In-house you can maybe add something in, but you'll have to do it fast, and if you can't break it down into a general-purpose feature (which you probably can't, because the fundamental philosophy of your "easy" markup language wasn't designed with anything like this in mind), you'll just have to uglify the markup language even more or, again, drop it altogether.

Code is the only sane way.

aidos · on Jan 18, 2023

Well sure, as ever, it depends on your use case.

PDF is an insanely complex spec (I’ve spent more time reading it than most because I need to know bits of it for my job and I just generally find it fascinating). But a lot of devs just need to put some content on the screen to match a template they were given. In my experience, a complete enough markup language allows you to bang out and maintain those templates better than code.

I know it doesn’t suit every need, but it’s just a way of representing the data so it’s closer to the final output than imperative code is. Definitely take your point though about the limitations becoming dealbreakers.

MarcinZiabek · on Jan 19, 2023

What if the code resembles the markup language in terms of readability, but still gives you access to more advanced features? Surely there is space for various approaches; it all depends on your task and requirements.

layer8 · on Jan 18, 2023

It can, but it’ll become something complex and Turing-complete like LaTeX.

styx31 · on Jan 18, 2023

Webpages and PDFs (paged documents) are fundamentally different: you won't be able to easily support headers and footers, page breaks, and orphans on a webpage. You can create basic invoices on webpages, but anything more complex (and by that I mean any serious Word document) will require you to twist HTML. Try to get column headers to repeat on each printed page of an HTML page.

aidos · on Jan 18, 2023

The markup doesn’t need to be HTML, and would be better not to be. The point is more that templating languages are great for formatting data as markup, and markup is great for driving layout. With this library as a backend you can make something super usable.

SigmundA · on Jan 18, 2023

I believe browsers have been repeating table headers on printed output for some time.

Paged media CSS is designed for this, although most browsers don't fully support it. PrinceXML is the go-to for full paged media support.

IMO they are not fundamentally different; they are both document formats. PDF just has a fixed, paged rendering layout baked in, while HTML can flow and adjust to the rendering target. The main issue is the lack of full print CSS support in HTML rendering engines.

https://www.w3.org/TR/css-page-3/

https://www.princexml.com
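
To make the `thead` and paged media point concrete, here is a minimal sketch, written in Python only so there is something runnable. It builds an HTML table whose column titles sit in a `<thead>` (the element browsers repeat on each printed page) and attaches an `@page` rule from CSS Paged Media. The function name `build_print_table`, the A4 size, and the 2cm margin are illustrative assumptions, and how much of the stylesheet a given browser honors when printing will vary.

```python
from html import escape

# Hypothetical print stylesheet: page size and margin are illustrative values.
PRINT_CSS = """
@page { size: A4; margin: 2cm; }
thead { display: table-header-group; }  /* the default; stated for clarity */
tr { break-inside: avoid; }             /* ask not to split a row across pages */
"""

def build_print_table(headers, rows):
    """Return an HTML document whose header row lives in <thead>."""
    head = "".join(f"<th>{escape(str(h))}</th>" for h in headers)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(c))}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<!DOCTYPE html><html><head>"
        f"<style>{PRINT_CSS}</style></head><body>"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
        "</body></html>"
    )
```

Printing the resulting page from a browser, or feeding it to a paged media renderer, is where the header repetition would actually show up; the sketch only produces the markup.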

styx31 · on Jan 18, 2023

You are right about the thead repeatable header.

Still, to switch back to the previous point, it seems it's more a divergence between using markup or code to design a document. Both have valid usage and benefits depending on your case.

In my case and my apps, I often need to handle complex conditions that fit better, imo, in procedural code (complex invoices and agreements). In other cases (reports), I prefer to use a markup language.

SigmundA · on Jan 18, 2023

There are lots of procedural tools for generating HTML. If modern browsers fully supported print CSS, you could use them for complex PDF generation or direct printing, either client-side or headless on the server.

If your app is a web app this is a no-brainer: the user's browser could simply do the print or PDF conversion as needed.

I do see a use for more direct libraries in native apps, although if every native client had a browser control with full print CSS support even then it might not be such an issue.

Scarbutt · on Jan 18, 2023

> If your app is a web app this is a no-brainer: the user's browser could simply do the print or PDF conversion as needed.

That's arguable. IME, most would prefer to just get the PDF file with one click (which is also a better UX) than to deal with additional browser dialogs. Not everyone knows how to print to PDF or even knows it exists.

Or do you mean browsers expose print-to-pdf functionality as an API?

SigmundA · on Jan 18, 2023

Hitting print in the browser or calling Window.print() if you want to force the dialog.

If you serve a PDF you still need to hit print or use a dialog to save; you can use a headless browser server-side to serve that if needed.

I do think browsers could use better print APIs, but you're not getting around that with server-side PDFs unless the server prints directly to on-site printers or something.

MarcinZiabek · on Jan 19, 2023

I am not sure it is a good idea to think about webpage and PDF content as the same. After all, they serve different purposes, and their layout should be optimized for the use case.

lazyeye · on Jan 18, 2023

None of these things are difficult at all with html. Plus you have the benefit of having the document viewable in a web browser too. You use the exact same html layout for both with specific css (heights, widths mainly) for each.

bob1029 · on Jan 18, 2023

We do a lot of dynamic report gen PDFs and this is something we'd prefer.

Right now, we basically emulate this technique w/ HTML->PDF. We build chunks of report HTML with various string interpolation methods and then compose those to obtain our final HTML output.

Raw, declarative HTML is nice if you don't have an undefined # of things to describe with it. When you are looping and projecting domain types into a report, things get a lot trickier.
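
A rough sketch of that chunk-and-compose approach, in Python rather than our actual stack and with made-up names (`render_row`, `render_section`, `render_report`, and the `name`/`qty`/`price` keys are all hypothetical). It assumes each chunk is rendered by string interpolation and the number of rows is only known at run time, which is exactly where static HTML stops being enough.

```python
from html import escape

def render_row(item):
    """One table row for a single line item (a dict with hypothetical keys)."""
    return (
        f"<tr><td>{escape(item['name'])}</td>"
        f"<td>{item['qty']}</td>"
        f"<td>{item['price']:.2f}</td></tr>"
    )

def render_section(title, items):
    """A titled section; the number of rows is only known at run time."""
    if not items:
        return f"<section><h2>{escape(title)}</h2><p>No items.</p></section>"
    rows = "".join(render_row(i) for i in items)
    total = sum(i["qty"] * i["price"] for i in items)
    return (
        f"<section><h2>{escape(title)}</h2>"
        f"<table>{rows}</table>"
        f"<p>Total: {total:.2f}</p></section>"
    )

def render_report(sections):
    """Compose independently rendered chunks into the final HTML string."""
    body = "".join(render_section(t, items) for t, items in sections)
    return f"<html><body>{body}</body></html>"
```

The resulting string would then go to whatever HTML-to-PDF converter is in use; that step is left out here because it depends entirely on the tool.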

amithegde · on Jan 18, 2023

I used https://github.com/Antaris/RazorEngine to generate all sorts of complex HTML, email bodies, etc. back in the day. Since it follows Razor syntax, loops etc. work well.

bob1029 · on Jan 18, 2023

We actually used this exact library at one point, but it fell out of favor for some reason I cannot recall.

jaywalk · on Jan 18, 2023

There are PDF generators that work just like that. As a web developer who uses C# on the backend, QuestPDF is exactly what I want.

naasking · on Jan 18, 2023

Will those basic web pages be less than 250 lines for the equivalent look? I'm skeptical.

MarcinZiabek · on Jan 19, 2023

There are many good reasons for choosing a programming language over a markup language. C# has countless features, both functional and syntactic: conditions, loops, methods, formatting, iteration, recursion, etc. Additionally, each of those features is well supported by all major IDEs. Writing your presentation layer in a proper programming language not only builds on your existing skills but also gives you access to tools such as code completion and IntelliSense. Moreover, using a fluent API helps keep the code concise and easy to change.

At the end of the day, it all depends on how you use the technology, doesn't it?

password4321 · on Jan 18, 2023

https://docs.aspose.com/pdf/net/working-with-xml/

Starting at $3600 for use on a web server.

aidos · on Jan 18, 2023

Not familiar with .NET, but I’d imagine this would be fairly easy to build on top of this library (and I agree, XML is often a much better way to generate reports).

I’ve done something similar, but in Python and generating Excel documents. I use Jinja for templating to create the XML, then parse that and convert it to commands that drive the library that creates the final document.
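
Roughly, the pipeline looks like the sketch below. To keep it self-contained it swaps Jinja for the standard library's `string.Template`, and the `<sheet>`/`<row>`/`<cell>` vocabulary, the `bold` attribute, and the command tuples are all invented for illustration; none of it is a real library's format.

```python
import xml.etree.ElementTree as ET
from string import Template
from xml.sax.saxutils import escape

# Stand-in for a Jinja template; the element and attribute names are made up.
ROW = Template("<row><cell>$name</cell><cell bold='$bold'>$amount</cell></row>")

def render_xml(records):
    """Template step: format the data as layout markup."""
    rows = "".join(
        ROW.substitute(
            name=escape(r["name"]),
            amount=r["amount"],
            bold="1" if r["amount"] < 0 else "0",  # highlight negative amounts
        )
        for r in records
    )
    return f"<sheet name='Report'>{rows}</sheet>"

def xml_to_commands(xml_text):
    """Parse step: walk the XML and emit abstract drawing commands."""
    sheet = ET.fromstring(xml_text)
    commands = [("add_sheet", sheet.get("name"))]
    for r, row in enumerate(sheet.findall("row")):
        for c, cell in enumerate(row.findall("cell")):
            commands.append(("write", r, c, cell.text, cell.get("bold") == "1"))
    return commands
```

A real backend would map each tuple to a call on whatever spreadsheet or PDF library sits underneath, so the template only ever has to describe layout.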

Genmutant · on Jan 19, 2023

You can use XSLT to XSL-FO if you want that. I haven't found it very nice to use.

wvenable · on Jan 18, 2023

But then you need a template language to generate the markup from the data.

aidos · on Jan 18, 2023

That's a well-trodden path in most languages. A cursory search surfaced this library that looks like it would probably do the job:

https://github.com/scriban/scriban

wvenable · on Jan 18, 2023

A programming language referencing a template language library for processing a markup language to generate another markup language (PDF) sounds just about right.

aidos · on Jan 18, 2023

Nitpick, but it’s a stretch to classify PDF as a markup language. A PDF is a graph of nodes that can encapsulate myriad different types of data, including things that are probably even Turing-complete, like fonts. Even the graphics streams inside PDFs aren’t markup.

We build abstractions for a reason. I think we can all agree that templating markup for layouts has been a reasonable success story of the web generation.