What is the point of having different codepoints for FONTS in Unicode? What a lo...

Arnt · on Nov 19, 2014

Unicode generally includes these things because an older encoding did, in the name of round-trip compatibility. I expect some older font encoding did it to cater to people who need more than 26 symbols in their maths papers. Let 𝒉 be the...

Unicode's name for 𝒉 explains it all, really.
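For anyone who wants to check the name themselves, here is a minimal Python sketch using the standard `unicodedata` module. It assumes the character quoted above is U+1D489 (that is my reading of the glyph), and the helper name `describe` is made up for illustration.

```python
import unicodedata


def describe(ch: str) -> dict:
    """Return the code point, Unicode name, decomposition and NFKC form of one character."""
    return {
        "codepoint": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # Compatibility decomposition as recorded in the Unicode character database.
        "decomposition": unicodedata.decomposition(ch),
        # NFKC folds compatibility characters back to their plain equivalents.
        "nfkc": unicodedata.normalize("NFKC", ch),
    }


# Assumption: this is the character quoted in the comment above.
H = "\U0001D489"

if __name__ == "__main__":
    for key, value in describe(H).items():
        print(f"{key}: {value}")
```

The `<font>` tag in the decomposition is how the Unicode data marks a character as a font variant of a plain letter, which fits the compatibility explanation above.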

lmm · on Nov 19, 2014

And yet the Unicode consortium went with Han unification, which is still blocking adoption for a significant potential userbase (pretty much any software that needs to display Japanese names).

Arnt · on Nov 21, 2014

I went to a Unicode meeting about a decade ago, and asked one of the luminaries about this over beer one night. He told me that they did some practical research, including reading newspapers and talking to editors. In Japan they would ask questions like "I see that you mention Shanghai in today's paper, and you use Japanese glyphs for the city's name, not the ones Chinese newspapers use. Why?". The answer was generally "that's how we write Shanghai here", and out of that came Han unification.

I suspect that if you could find a couple of mainstream publishers in Taiwan or Japan that prefer to print the names of mainland Chinese using the same glyphs as are used in mainland China instead of the glyphs used in Taiwan or in Japan, you might be able to reopen the discussion of Han unification.

Arnt · on Nov 21, 2014

Or even better: a directive from someone's ministry of education decreeing deunified Han in school books, so at least one country's pupils would actually learn to read deunified Han.

Now wouldn't that be fun: "When history textbooks cover the civil war in 1927-50, they shall use traditional Chinese for the names of then KMT-held cities and simplified Chinese for the names of then communist-held cities."

bentley · on Nov 19, 2014

Well, the original reasoning behind Han unification was the (horrendously impractical) idea of storing all of Unicode in 16 bits. Most of these characters were added later; you can tell because their codepoints are greater than U+FFFF.
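A rough Python sketch of the 16-bit point, for the curious. The helper names `fits_in_16_bits` and `utf16_code_units` are made up for illustration, and U+1D489 is only assumed to be one of the characters under discussion.

```python
def fits_in_16_bits(ch: str) -> bool:
    """True if the character's code point is at most U+FFFF."""
    return ord(ch) <= 0xFFFF


def utf16_code_units(ch: str) -> int:
    """Number of 16-bit code units needed to encode the character in UTF-16."""
    # "utf-16-be" writes no byte order mark, so the byte length is exactly 2 bytes per code unit.
    return len(ch.encode("utf-16-be")) // 2


if __name__ == "__main__":
    samples = {
        "plain h": "h",
        "unified Han, U+4E0A": "\u4e0a",
        "maths letter, U+1D489 (assumed example)": "\U0001D489",
    }
    for label, ch in samples.items():
        print(f"{label}: U+{ord(ch):04X}, "
              f"fits in 16 bits: {fits_in_16_bits(ch)}, "
              f"UTF-16 code units: {utf16_code_units(ch)}")
```

Anything above U+FFFF takes two 16-bit code units (a surrogate pair) in UTF-16, so it could not have been part of a strictly 16-bit design.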

TazeTSchnitzel · on Nov 19, 2014

They're not encoding different fonts. They're encoding distinct character forms, often necessary for historical texts and such. Some of these are actually symbols, too.