Text layout is a loose hierarchy of segmentation (2020)

nicoburns · on Dec 30, 2022

It's two years since this was posted, and the amazing news is that we now have not just one but two projects implementing text layout in Rust that look like they might be viable:

- https://github.com/pop-os/cosmic-text/ - which is still has a way to go but already complex scenarios involving asian and arabic scripts, which is impressive considering it's only a couple of months old. This one is backed by System76 for use in their new desktop environment

- And https://github.com/dfrg/parley which looks abandoned but is already impressively complete and the author has signalled they intend to revisit at the start of this year. This one is being used in the Druid toolkit mentioned in the article (and it's successor Xilem) and also the text editor https://github.com/lapce/lapce which is based on that toolkit.

raphlinus · on Dec 30, 2022

A clarification: Druid proper uses the text layout capabilities of the platform (DirectWrite on Windows, Core Text on macOS, and Pango on Linux), while Xilem does indeed use Parley. Lapce uses a fork of Druid that, as parent states, uses Parley for text layout.

It does look like a very good year coming up! I'm writing a reflection/wishes post as we speak.

nicoburns · on Dec 30, 2022

Ah yes. I only really started following the Druid project closely recently, and I've focussed more on the newer sub-projects so my knowledge of Druid proper is limited (indeed partly because I am much more interested in "rust native" UI (to the extent that that is possible) than I am in toolkits that rely heavily on closed-source OS provided APIs.

moloch-hai · on Dec 31, 2022

It is kind of tragic when so much work gets wrapped into libraries coded in obscure languages, and so never gets out to affect much in the world.

Doubtless, enormous brilliance went into Symbolics OS and UI code, all washed away when Symbolics tanked. On the up side, untold gigabytes of bad Java code were flushed in the big tech meltdown of 2000. When Twitter and Facebook go flat like inflatable lawn decorations in a cold snap, will much of anything be lost? (Anyway zstd and lz4 will survive.) How much stuff is coded in Wolfram?

Obscure code today goes onto GitHub repositories rather that just evaporating, but it is more manual labor to find and transcribe to a live language than can usually be mustered.

bfgeek · on Dec 30, 2022

One interesting/fun problem with text layout is that:

  width(A) + width(B) != width(A+B)

...which some basic text layout engines assume. The post touches on this, and is one reason why line-breaking is so difficult. If you add some text to a line, the width of the line may be longer or shorter(!) than if you measured (shaped) the parts separately.

This occurs for many reasons, the post mentions splitting a ligature, can also happen with kerning (spaces can have kerning applied for example).

E.g. many text layout engines will incrementally "add" to a line. However to do this correctly for all cases you need to re-shape (measure) the entire line (or from a known "safe to break" point in the shape result see: https://github.com/harfbuzz/harfbuzz/issues/224 ). This is typically fine for latin cases, but begins to become slow for more complex scripts (Thai).

Chromium kinda works backwards. It tries to fit everything in the paragraph on a single line, shapes (measures), then finds a line-break opportunity within the (potentially large) shape result which would "fit" on the line without re-shaping. Then given that line-break will reshape the entire line (using the "safe to break" API), (and remaining content for the next line), see if it will fit and repeats the process[1].

By always "removing" content from the line you end up with a correct implementation which works for all the crazy things which can happen with text rendering.

[1] The content in the line may become bigger after taking a line-break, hence this needs to happen in a loop until there is one "unbreakable" piece of content in the line. This loop typically only goes through one iteration.

amelius · on Dec 31, 2022

Didn't Knuth solve this problem in his Computers & Typesetting books?

raphlinus · on Dec 31, 2022

No. Most line breaks in English are at spaces, and the Computer Modern fonts don't have any complex shaping behavior across spaces (it's mostly complex scripts such as Nastaliq that have this behavior). I haven't checked to see whether it takes into account the kerning value with the hyphen added.

The first system I know of that does solve this very well is DirectWrite. Sergey Malkin describes the solution here: https://github.com/harfbuzz/harfbuzz/issues/1463#issuecommen...

It sounds like Chromium has since also adapted the "safe to break" API. Great to see people engaging these hard problems!

taeric · on Dec 31, 2022

I doubt Knuth would claim for "solving" the problem, but I do get the impression that it is a solution. In particular, it does tackle hyphenation of words. Is where I learned of the re-cord versus rec-ord problem.

I'm curious what you mean regarding complex shapes across spaces. If it is related to the displayed bug regarding how words like "office" would be split, I don't think TeX ever had the bug as described. Though, I could be wrong, easily. (Incidentally, what a fun bug. Kudos to who found that one!)

Edit: I also had the impression that TeX would adjust intercharacter spacing as a whole to keep things looking a bit more uniform. Essentially, the goal was to act a lot like a human would when setting out the characters. If needed, you could adjust spacing on all characters within a margin of "not going to be noticed" so that you could expand a line of text to take up the full width. Without just having more space between a few words.

raphlinus · on Dec 31, 2022

By "shaping across space" I mean that the shapes of the words might be very different depending on whether there's a space or whether they're on second lines. That basically doesn't happen in Latin script (though one could possibly imagine a very fancy cursive or a stunt). So I'll give an example in Nastaliq. One of the example images[1] on Wikipedia is "خط نستعلیق" (meaning "Nastaliq script") where the "خط" is tucked under the other word when they're on the same line. There's a space between them, so it's a valid line break. If they were on two separate lines, the total width would be a lot wider because of that shaping behavior.

This is not conceptually a difficult problem, you can just shape all possible substrings between candidate line breaks to get their widths, at which point Knuth-Plass will give the optimal line breaks (relative to your objective function). But given that shaping is expensive, you really want to avoid that in the 99.99% or so cases where the width of the word isn't altered by other words beyond space boundaries. That's what the "safe to break" logic is about - letting you know when you can make that assumption, as opposed to needing to reshape to get the precise metrics.

[1]: https://en.wikipedia.org/wiki/Nastaliq#/media/File:Khatt-e_N...

taeric · on Dec 31, 2022

Sadly, I don't understand the example. Is there a meaning/logic to whether the word is tucked under? I know you said it rarely happens in latin, so I'm assuming it is almost akin to how 2nd is supposed to be narrower, as the "nd" should not be the same as the 2? Definitely not a typographic convention that happens in traditional text, but almost akin to dropped caps? Feels like it is a convention that is common in logos and such? (Though, scanning, I can't see too many logos that stack words, anymore. Olive Garden is about the best example I can find.)

That said, logos are a good example where TeX is not suited. Odds are high that it will not help stack words in a way that works for slogans and such.

aebtebeten · on Dec 31, 2022

Pedantry: note that "safe to break" optimisations can occur vertically (where I encountered them with latin scripts) as well as horizontally.

Also, the original HTML tables algorithm was evidently meant for hardcopy output and not relayout-upon-typed-input, as it was horrendously slow without making some safe to break optimisations.

taeric · on Dec 31, 2022

Pretty sure TeX considers horizontal optimizations, as well? I know it tries not to dangle a single sentence to a page, at least. (Well, I "know" that... I would be far from shocked to find I'm wrong.)

bfgeek · on Dec 31, 2022

A lot of folks think that Knuth-Plass is the optimal solution for good looking text, for all text. It really only considers Latin, and even then has restrictions.

Some fonts have the space character in the kerning table, and possibly (although super rare) in the "ligature" table (gsub). (https://www.sansbullshitsans.com/ is an example of a font with a space in gsub, "paradigm shift" maps to a single glyph).

If you want to go really deep on how complex some scripts are take a look at: https://r12a.github.io/scripts/arabic/arb.html#shaping

taeric · on Dec 31, 2022

Fair, a lot of folks do treat it with probably more praise than makes sense. I'm sure I'm one of those people. :D

That said, my understanding is that there really was no "optimal" for the best way to arrange text. You can, effectively, make an objective function that you can optimize on; but there is no global "this will make text look good" algorithm.

Indeed, even ligatures are... debatable in utility. I personally like them; but I would scoff at any claims for any objective superiority of them. They are fun and a bit of a "flex" for laying out text on a computer. Any other claim is going to be tough to hold up.

taeric · on Dec 31, 2022

Another fun problem of hyphenation in English, that I didn't see on my read through this, is that how something is hyphenated may depend on how the word is used. Easy example is "record", as a noun it will hyphenate differently than it will as a verb.

blkhp19 · on Dec 31, 2022

I’m trying to figure out when “record” would ever be hyphenated. If it doesn’t fit on one line, just put it on the next, right? No need to do some weird “rec-ord” / “re-cord” hyphenation.

dan-robertson · on Dec 31, 2022

If you’re justifying text, there may be more constraints like not wanting a big change in interword spacing between lines or avoiding spaces lining up across lines. These constraints might apply to the current line or to following lines affected by your choice of where to break.

Though I think the main reason would be limited horizontal space, e.g. a table cell or narrow newspaper column.

taeric · on Dec 31, 2022

To add to this, the point it is it trying a full page optimization, not just single lines or words. Will it be likely? Probably not, but it can happen.

And I only used "record" as it was the example given to me. I'm sure there are others. Is a good example for how/why counting syllables can be harder than folks think. Well, kind of a good example; since you will get the right count, regardless.

hsn915 · on Dec 31, 2022

My problem with this kind of essay is, even if it might contain useful information, it presents the problem as "omg this is way too complicated it's almost insurmountable".

This is a common trend I keep seeing in technical blogs, and I don't understand why they do it.

Look, it's not that. It's just a series of small problems. Solve each problem one at a time, and you'll approximate a complete solution as you iterate further and further. Your first iteration will not be perfect, but so what?

csande17 · on Dec 31, 2022

The underlying assumption of all these "handling text is impossible" blog posts is that you're trying to write a text renderer that handles every language in the world. This is indeed very hard, as many languages require their own typesetting features and special cases. It's hard to solve this iteratively when you don't speak all of the languages involved.

In some cases this is just an unspoken assumption because the blog posts are looking at, say, Firefox, and Firefox supports a lot of languages. In other cases, it's framed as a moral or professional imperative -- you're a bad developer if you don't consider the needs of Arabic users, if not a bad person. (I'm not sure how I feel about the globalist/colonialist worldview that seems to drive this idea that Arabic users won't have any software if Silicon Valley doesn't write it for them. I guess if you work at a big global megacorporation, it's your job to write big global megasoftware.)

But yeah, if you aren't trying to do that, then text is a much easier problem than these posts describe. Tons and tons of embedded devices, video games, and so on get away with having much simpler text rendering and layout code that only supports a few languages. In English, you really can start with a very simple layout system, then add features like kerning and ligatures and hyphenation and watch the text get progressively nicer-looking!

hsn915 · on Dec 31, 2022

Supporting Arabic is not that hard. It's not as complicated as these blog posts are presenting it to be.

csande17 · on Dec 31, 2022

I don't think any individual language is too hard—after all, humans can learn to read and write them—but doing all of them at once is tricky.

hsn915 · on Dec 31, 2022

That hard part is already done by HarfBuzz; segmenting text for feeding it into harfbuzz is not that hard. I've done it with hardly any prior experience (and even written a blog post about it)

https://hasen.substack.com/p/imgui-text

gernb · on Dec 31, 2022

Or if you're smart you won't reinvent the wheel. It's already been done. use one of them

danielvaughn · on Dec 31, 2022

First paragraph mentions how he couldn’t write a book on it, but honestly I’d buy that book in a heartbeat.

raphlinus · on Dec 31, 2022

Aw thanks. It remains a life goal to write a book on 2D graphics, including text layout and rendering, but I don't see any way to find time for it in the next year at least. Meanwhile I do have some pretty exciting stuff lined up, including some writing (just not a book) and will publish a blog post on my 2023 plans tomorrow.

splatzone · on Dec 30, 2022

I like this way of formatting ideas hierarchically.

Is there an Obsidian plugin for using this kind of structure? I know there’s the graph view for seeing how notes are linked together