
Wow, I never knew this. This is really bad, especially now that emojis are so prevalent.

MySQL should deprecate utf8 and give a warning if you try to use it.



It is deprecated:

> You should also be aware that the utf8mb3 character set is deprecated and you should expect it to be removed in a future MySQL release. Please use utf8mb4 instead.

https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8...


How the heck did someone downvote you? Basically your link is exactly what I was looking for and I'm glad they're making this deprecation.


Bad idea. If they deprecate "utf8" they will deprecate a standard. It was designed and specified at a time when the Unicode code space had a 21 bit limit. There are some technical (storage space-related) considerations with the suggestion to "just use utf8mb4 everywhere instead" because of how InnoDB's indices work.


MySQL "utf8" is still "utf8mb3", which is not a standard anywhere except MySQL. It cannot store the full range of 21-bit code points; I don't know why you keep repeating this. The maximum codepoint in 3-byte UTF-8 is 0xFFFF, which is 16 bits.
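To make the 0xFFFF cutoff concrete, here is a minimal Python sketch that checks whether a string would fit in a 3-byte-per-character encoding. The helper names `fits_in_utf8mb3` and `offending_characters` are made up for illustration and are not part of MySQL or any library. The sketch assumes the only constraint is the code point ceiling described above.

```python
# Hypothetical helpers, not a MySQL API: they only test the code point ceiling
# (U+FFFF) that a 3-byte UTF-8 sequence can reach.

MAX_THREE_BYTE_CODE_POINT = 0xFFFF


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character is at or below U+FFFF."""
    return all(ord(ch) <= MAX_THREE_BYTE_CODE_POINT for ch in text)


def offending_characters(text: str) -> list[str]:
    """Return the characters that need a 4-byte UTF-8 sequence."""
    return [ch for ch in text if ord(ch) > MAX_THREE_BYTE_CODE_POINT]


if __name__ == "__main__":
    for sample in ["héllo €", "hi 😀"]:
        print(repr(sample), fits_in_utf8mb3(sample), offending_characters(sample))
```

Running something like this over user input before it hits a "utf8" column is one way to find out early which rows contain emoji or other supplementary-plane characters.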


> If they deprecate "utf8" they will deprecate a standard.

No, they won't. They will be deprecating something they created that never conformed to the UTF-8 standard. The fact that Unicode didn't have codes beyond 21 bits at that point is pretty irrelevant.

It was an invalid implementation from day 1, at least with the name of "utf8".


> It was designed and specified at a time when the Unicode code space had a 21 bit limit

21 bits is the current Unicode limit. Unfortunately, UTF-8 in 3 bytes only has 16 usable bits: the first byte starts with 1110 and the two continuation bytes each start with 10, so it's 4 + 6 + 6 = 16 bits.
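The same arithmetic can be sketched in a few lines of Python, generalized to sequences of 1 to 4 bytes. `payload_bits` is a name invented for this example. It assumes the standard UTF-8 layout: a lead byte with n one-bits and a zero, followed by n - 1 continuation bytes that carry 6 bits each.

```python
def payload_bits(n: int) -> int:
    """Usable code point bits in an n-byte UTF-8 sequence (n = 1..4)."""
    if not 1 <= n <= 4:
        raise ValueError("UTF-8 sequences are 1 to 4 bytes long")
    if n == 1:
        return 7  # 0xxxxxxx
    lead_bits = 8 - (n + 1)          # lead byte: n ones, one zero, the rest is payload
    continuation_bits = 6 * (n - 1)  # each continuation byte is 10xxxxxx
    return lead_bits + continuation_bits


if __name__ == "__main__":
    for n in range(1, 5):
        print(n, "byte(s):", payload_bits(n), "bits")

    # Python's own encoder agrees on where the 3-byte range ends.
    print(len(chr(0xFFFF).encode("utf-8")))   # last code point that fits in 3 bytes
    print(len(chr(0x10000).encode("utf-8")))  # first code point that needs 4 bytes
    print(len("😀".encode("utf-8")))          # U+1F600, an emoji
```

Four bytes give 3 + 6 + 6 + 6 = 21 bits, which is where the 21-bit figure comes from. Unicode itself stops at U+10FFFF, so not every 21-bit value is a valid code point.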



