Not related to the issue raised in the article, but using a translation tool for individual words is going to give you the wrong answer in many situations by definition. The meaning of a word depends on its context.
For translating a single word, a dictionary that lists all possible meanings in the target language is the right tool. A translator (human or algorithm) can only guess or give you the list of possible translations.
Google Translate has come up with a suitable proxy solution by adding the "More Translations" pane.
I'm a Westerner who has lived in Taipei for almost three years. I use Google Translate every day of my life. It's helped me book restaurant reservations and talk to my doctor.
I also grew up in Canada, have British parents, and lived in the USA for seven years, so I'm used to switching from "Traditional English" to "Simplified English" too ;-)
I mean, ya, sometimes Google Translate uses Mainland China words (e.g., I just typed Bicycle and it returned 自行車 instead of 腳踏車). But I just tried typing 土豆片 (potato chips) and translating it to "English" and it returned "potato chips" instead of the British "crisps" too. This is not a big deal at all.
自行車 was in use long before mainland China's influence became significant, although 腳踏車 has indeed always been more colloquial. Rejecting 自行車 is like rejecting "winter squash" as American English only because you grew up calling it pumpkin. As the parent said, the author really seems to have shockingly little experience with non-colloquial Taiwanese Chinese to be confidently publishing such a page.
I mean it’s on a par with using simplified Chinese here. People understand what you mean, but you get disapproving looks. A foreigner using G translate would probably be fine, though :)
Is having "lived in Taipei for almost three years," really enough to confidently claim that "this is not a big deal at all?" Especially if you "use Google Translate every day of [your] life," presumably not because you're proficient in the local language?
As a matter of fact, a special "Cross-Strait" dictionary was developed to deal with all the language differences, with nearly 6,000 words and 30,000 phrases:
https://taiwantoday.tw/news.php?unit=10&post=19596
It hardly seems like "no big deal" to those who should know best.
I was in Taiwan and trying to direct a cab. I looked up the words left and right.
It translated to left and correct.
What followed was the driver asking me “left or right?” and me saying “correct!” He would turn left in confusion, and I would start shouting “correct, correct” while getting more frustrated. He was like “great, I’m doing good” while I was pointing the other way.
Good times. I eventually had a chat with him and looked up “turn right” after driving around a block twice using left turns lol
I'd say perhaps 5 or so of the ~100 entries on the list are somewhat debatable. The rest seems pretty obvious and indisputable, and among these I would count the "通過"/"透過" that you mentioned. If anything, perhaps the whole list is too focused on the IT lexicon. But it's pretty much solid work, doesn't really deserve a "mixed opinion" - in my opinion.
My point is this site doesn't provide any context. It's very weird to say "mega should be translated to 兆/百萬/超級/巨型" without specifying the context first. A Reddit "megathread" is definitely not a 百萬討論串.
*Edit: my word choice was wrong in the parent comment. I meant to say "every context", not "any context".
Literal translations often have that problem, and even human translators need context.
The initial comment looked like there wasn't any context, which means "mega" could just as easily mean «SI million prefix» as «huge».
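To make the "no context, no single answer" point concrete, here is a minimal sketch in Python. The table and its context labels are hypothetical and hand-written for illustration; the candidate translations are just the ones mentioned in this thread, and which SI rendering goes with which locale is an assumption based on the comments here.

```python
# Hypothetical lookup table: the context labels are made up for illustration,
# and the translations are the candidates quoted in this thread.
MEGA_TRANSLATIONS = {
    "si_prefix_taiwan": "百萬",  # SI prefix as reportedly used in Taiwan
    "si_prefix_other": "兆",     # SI prefix some other users may expect (assumption)
    "intensifier": "超級",       # "super"
    "size": "巨型",              # "giant"
}


def translate_mega(context=None):
    """Return one translation when the context is known, else every candidate."""
    if context is None:
        # Without context, the honest answer is the full list, like a dictionary.
        return sorted(set(MEGA_TRANSLATIONS.values()))
    return [MEGA_TRANSLATIONS[context]]


print(translate_mega("size"))
print(translate_mega())
```

With a context the function behaves like a translator; without one it can only behave like a dictionary and hand back every candidate.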
In written English, the word "minute" in isolation and without context may be the noun for the time period of 60 seconds, or it may be the adjective «very small».
And even if it is representing a moment of time, it may not be representing exactly sixty seconds. It could in fact be an idiom representing an indeterminate longer interval:
This particular usage has been around, colloquially, for a minute.
Slightly beside your point, but there's a disagreement on Chinese SI prefixes for mega and tera. Most users might use Traditional Chinese in a Taiwan context, in which 百萬 is correct, but it's possible for someone in Macau to use Traditional Chinese and expect mega to translate to 兆. This is why Chinese Wikipedia has more specific settings than just Simplified vs Traditional. It doesn't surprise me to see Google Translate struggle with this term.
In any case, Bing and DeepL both agree with Google on translating it to 巨型 (giant).
I have friends who work there, but afaik they all work(ed) for Chromebook-related projects. I'm not even sure if anyone in Google Taiwan is in charge of Google Translation.
> I think this pales in comparison to how you can't translate from or into Cantonese!
I agree. It doesn't translate from or into Taiwanese (Taigi) either, though.
Honest question, why haven't china moved to a western (?) alphabet based writing? With my ignorant knowledge, it seems crazy that you are using that way of writing today. Using 24 letter combinations vs memorizing 3000 (min) to 20.000 icons!
When I was a kid, my best friend was Chinese, and we could never do anything on Saturdays because he had to go to a school to learn Chinese, and learning how to write in Chinese was very hard and time-consuming.
Do you know? Care to explain?
EDIT: I am not from the US, but from Argentina.
EDIT2: I am talking about the effort to learn how to draw (not only read but draw!) very complex looking 3000 icons vs. the effort of learning 24/26 and some rules. With a western language, by just knowing a few pronunciation rules, you can read any word. At least in Spanish. I am not implying that Chinese are bad, or that the language should die or anything like that.
I honestly believe you asked this question out of your ignorance.
Despite the apparent difficulty maybe the people of China like their language nevertheless.
Languages vary widely in their expressivity.
Also, I think language (and the script) is closely related to the culture as well. The people of China might not want to let go of that.
And what about the literature and everything that's already written down in their native language? It's not just a question of translating everything. Trust me, things get lost in translation.
Also, if we're going to have everybody in the world work with the same script, there are arguably better candidates such as Arabic and Sanskrit.
In Sanskrit, for example, there's only about 50 letters (49?), it's impossible to mispronounce, the whole gender-neutralization situation becomes irrelevant (because well, things like car, village, fruit, pencil etc have genders (of which there are 3 btw — masculine, feminine, and neuter, but they aren't used in the way you would think)).
> why haven't china moved to a western (?) alphabet based writing?
By the way, here's how this appears to us non-westerners — okay here comes this Western hero who thinks the rest of the world is nonsense and should be replaced by the superior Western™ system.
I'm not a historian, but historically there was in fact an attempt to "romanize Chinese".[1] And it failed. It's a complex issue, mostly a political one, and to say there is one single reason that it failed would be a gross simplification.
My anecdote is:
1. Chinese is a very "easy-to-read, hard-to-write" writing system. It's compact and information-dense compared to alphabet-based systems.
2. There are too many homophones. In a daily conversation, you provide extra context with body language and your tone. But in a written context you have no such tools. All you have is "icons".
Chinese writing is not phonetic, while a western alphabet would necessarily be phonetic.
One may think that is a disadvantage for Chinese, but since there is such a wide variety of spoken Chinese languages, Chinese writing acts as a unifying framework across different spoken languages.
This allows a (more or less, ignoring traditional and simplified) common script for one billion people.
So a person in Taiwan can reasonably read a newspaper in Taiwan, Hong Kong, Shanghai, and Beijing. But that same person would probably have trouble speaking Cantonese with Hong Kongers.
There is a similar thing going on with written Arabic, where Ḥarakāt diacritics indicate short vowels, long consonants, and some other vocalizations, but these are generally left out of writing except in the Qur'an. So you have a classic phonetic key with which to recite the Qur'an, but almost all written text besides that can be read by wildly different speakers.
I'm amazed at how two (Mainland) Chinese people can always communicate.
The level of shared cultural foundation is beyond what we have in Europe, I believe.
While they use the same script, they are not interchangeable. A person from Taiwan would recognize characters in a HK or Japanese newspaper but would not necessarily understand it. A good half would only make sense to someone who speaks Cantonese.
Also, several languages have moved from a script of iconographic characters to phonetic ones: for example Vietnamese and Korean, the former adopting a phonetic western alphabet with accents, and the latter developing a new phonetic script, hangul. Japanese developed two new syllable-based scripts, hiragana and katakana, which are mixed with the old word characters (kanji). All these used Chinese-style characters prior to the switch.
No reason Chinese couldn't do the same. In fact, it did… pinyin is a formal phonetic alphabet for Mandarin Chinese that's based on the western alphabet, with additional marks denoting the tones of the words. It's only used in an educational setting, but you could use it anywhere!
Someone from Taiwan can read a HK newspaper no problem.
The pinyin phonetic alphabet only works for Mandarin, while a unified written script applies beyond Mandarin.
Learning written Chinese not only connects you across varying spoken Chinese languages, but also connects you with the rich history of Classical Chinese text.
The formal pinyin system is great, and should be used alongside the actual Chinese characters. But there is no reason to replace the rich written Chinese characters, which connect across space and time, with a narrow and hollow substitute.
It's for a similar reason that the US hasn't switched from imperial units, just on a much bigger scale.
The current system works, replacing it would be a massive multi-decade long change with a lot of opposition. Cultural pride/arrogance plays a big role. It's not inconceivable to think that the Chinese regime might fall in the process if it tried to pull it off. Given all these, the benefits just don't outweigh the costs/risks, at least in the horizon of the next 100 years.
If you want to write Chinese text on a computer - you still have to use Latin alphabet, because it's not possible to put thousands of characters on a keyboard.
So a different writing system for Chinese has already been developed.
Some other languages also have more than one writing system. (Serbian and Azeri).
When mobile phones became accessible to the general public then many people used Latin alphabet to write in Russian, because SMS didn't support Cyrillic properly.
Eh, you can find at least three (Traditional) Chinese input methods on any given modern system, and Pinyin is only one of them. Cangjie and Bopomofo require no understanding of the Latin alphabet. Not to mention there are other, less-used methods such as Boshiamy.
As I understand - Pinyin is still the most popular one. It also is compatible with all standard Latin keyboards and doesn't require touchscreens or microphones. So pretty much all educated Chinese people are familiar with the Latin alphabet.
But for most people outside China - the Chinese alphabet looks like gibberish.
Pinyin is used in regions where Simplified Chinese is the predominant written language, but other systems are generally used in regions where Traditional Chinese is the written language, e.g. Hong Kong, Macau and Taiwan.
In English I use speech to text a lot (including completely substituting it for typing for a year when I had wrist issues). If your microphone is right next to your mouth and you're using good software it works in surprisingly noisy environments: I can talk quietly directly into my mic on the subway without issues. And when I'm at my desk I use a boom mic next to my mouth, with similar benefits.
Quiet environments were, for me, more of an issue: it combines very poorly with open plan offices since you bother the people around you.
>If you want to write Chinese text on a computer - you still have to use Latin alphabet,
Non-native Chinese-Script language learner - I use touchpad stroke-input on my macbook 50% of the time (and touchscreen stroke input on phone 100% of the time).
This is an interesting question. Computers deal in numbers, but they originated in a Latin-alphabet world and are optimized around an alphabet style of processing. We managed to jam Chinese into Unicode by treating it as if it were just a big alphabet, but this is not entirely correct, and a lot of information is lost in the process[1]. What would a native Chinese computer system have looked like?
However, note that perhaps the simplicity of the Latin alphabet (a small set of separated characters) lent itself well to a simple implementation on early computers, which got them out of purely numerical processing earlier than if they had had to speak Chinese, or even something like Arabic, where the written language has advanced to the point where the connected cursive form is the only correct form, the rendering of which would be tricky for early computers. Note that English has a connected cursive form, the art of which has just about been destroyed by computers.
1. I don't read (or speak) Chinese, but the characters are composed of sub-characters and a stroke order that may provide hints or insight into the nature of the character. Or they may not; I don't know, I don't read Chinese.
Not the original commenter, but I live in Singapore where I am surrounded by Chinese characters and several Chinese dialects. The first issue is that there isn't one "Chinese" language. Mandarin is dominant as the official language in China, but even here in SG one can still hear Cantonese, Hokkien, Hakka, and a few other dialects. While these each have unique grammars, the characters for the most part can still be understood across dialects (and for languages like Hokkien, have been mapped to those characters). Also, for Mandarin, there are very few distinct sounds (phonemes), even accounting for tones, so there are many, many more homonyms than in, say, English. The characters help disambiguate, whereas spoken or phonetically-written Mandarin relies heavily on context.
Not sure why you're being dumped on here. Written Chinese is objectively harder to learn than any other written language still in use today for exactly the reason that people need to memorize thousands of characters.
There is perhaps a benefit to having several mutually unintelligible languages all "sort of" compile back to the same written text, but that benefit is increasingly being eroded as the version of Mandarin spoken in mainland China becomes the lingua franca, not just inside China but also in parts of the diaspora.
If everyone is more-or-less able to understand spoken Mandarin, then it's no big leap to codify a phonetic written representation of that pronunciation, whether using Zhuyin or Pinyin or something else. It's a totally achievable goal, and we know it's achievable because Vietnam, Korea and Japan already did it. Claiming it's impossible is just Chinese exceptionalism.
The real question is not whether it's possible, or whether it would make the language easier to learn, it's whether Chinese-speaking people - and in particular the government with an authoritarian rule over the education of the overwhelming majority of Chinese-speaking people - want to do it. And the answer is they do not. And so it persists.
Also - should the thing we optimise for in a language be how easy it is to learn? You learn a language in a small fraction of the duration you will actually use it for.
English is a terribly inconsistent language, btw. If you know others, you realize there's a big gap between pronunciation and the word based on historical reasons. People who learn struggle a lot to communicate effectively due to these inconsistencies. Maybe some of them can't go play with their friends too.
If we used something else instead of English as the lingua franca, some say Lojban, maybe we could be more effective. Let's all switch.
I hope the point is clear. It's not all about the most efficient from your perspective but cultural, historical, pragmatic reasons, including that languages and ways of communicating mutate by themselves instead of being pushed top down.
Information density: writing the characters takes a while, but reading them is super efficient; a short line of characters can replace a small paragraph in alphabet-based writing. We Chinese readers could never "appreciate" the character limits of then-Twitter: our language is so dense that 12 Chinese characters (which occupy 24 single-spaced characters) can describe a full event. What can alphabet-based writing do with 24?
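For anyone who wants to check the "12 characters, 24 columns" arithmetic, here is a small sketch using Python's standard `unicodedata` module. The sample sentence is my own (roughly "I rode a bicycle to work this morning"), and the column rule is an assumption: wide or fullwidth characters are counted as two columns, everything else as one, which is how most monospaced terminals behave.

```python
import unicodedata


def display_columns(text):
    """Count monospace columns, treating wide/fullwidth characters as two."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Illustrative sample sentence of exactly 12 Chinese characters.
sample = "我今天早上騎腳踏車去上班"

print(len(sample))                  # number of characters
print(display_columns(sample))      # number of single-width columns occupied
print(len(sample.encode("utf-8")))  # number of bytes in UTF-8
```

Note that the three counts differ: characters, display columns, and encoded bytes are separate things, which is why a limit defined in "characters" treats scripts so differently.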
I'm not sure if you're aware, but asking why an ethnos/nation hasn't significantly changed their language/culture to appease someone who finds it difficult to grasp comes across as incredibly rude.
I am not asking for me. I never interact with Chinese writing, so it is not my problem; I am asking for them, as I imagine that learning a minimum of 3000 icons would take a lot of time. Computers struggle with Chinese writing (or at least, last time I talked with my friend, he had these keyboards where they had hundreds of combinations).
I find that treating a person who is trying to understand something and asking a respectful question as rude is itself incredibly rude and ignorant.
But you can make the same argument for any non-English language today - any language is more difficult to use than English when it comes to computers. For the general case, if your friend was living in China, it'd be the norm, there wouldn't be any Saturday classes. The diaspora has to make do. Moreover, in China, it is the norm, they do not perceive their writing as being problematically difficult, they've made do with it for some years now and it works in China.
I do want to stress that any language that is old enough will eventually contort into a state where the writing lags behind the spoken language, which develops faster, and eventually lots of room for optimization starts to appear, lots of legacy to remove. As the culture develops and the spoken language simplifies and words get added, ambiguity seeps in, and rudimentary language constructs start to appear unfamiliar to the common speaker. The vocabulary now consists of a large mix of old and new, with some redundancy and barbarisms thrown in for good measure. And now, there's lots of room for improvement. And sometimes a nation (or its authorities) decides that it's time to simplify things, as mainland China and Sweden and others have done. I wish someone did this with English, but since there's a multiplicity of English-speaking countries, there will never be a meaningful overhaul that doesn't turn into a massive mess.
The icons are made up of strokes and there are only a handful of strokes in Chinese writing. The combination of strokes and the location is arguably as complicated as spelling in English.
When I read anything, be it Mandarin or Japanese or English, I attach meanings to words first. In fact, I am attaching meaning to logical structures and phrases, and then the individual words make the detail. Converting words into sounds seems to be a different skill from converting words into meaning. It really doesn’t matter whether the words are made of strokes or letters of the alphabet, the breakdown of the little details is a separate mechanism from comprehension.
How many words did you memorize in English?
All its inconsistencies, like "butcher", where the pronunciation isn't recognizable from the spelling.
Is it read or read?
I doubt you can simply map the pronunciation of the Chinese spoken languages to the Latin alphabet.
And that's only the words, you also need grammar and different languages have tenses that don't even exist in other languages. Just compare Hebrew to English.
After that you need to train a billion people, you still need to conserve the knowledge of language or all the historical written texts are lost.
But memorizing words is relatively easy. Adding the drawing to that is not. I am not saying it is impossible; obviously it is not. My mother language is Spanish, so I also have 26 letters.
>林 (lín): This character means “forest” or “woods” and is composed of two 木 (mù) characters side by side. It represents a small forest or a group of trees. 森 (sēn): This character also means “forest” but represents a larger, denser forest than 林 (lín). It is composed of three 木 (mù) characters arranged in a triangle.
That's more logic between the symbols of tree, wood and forest than between the same English words, and it even looks like some kind of tree.
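The composition in the quote above can be written down as a tiny lookup, just to show how regular it is. This is a sketch with a hand-written, hypothetical table containing only the two characters from the quote; it is not a real decomposition database, and real characters often mix meaning and sound components in less tidy ways.

```python
# Hand-written table for illustration only; it covers just the quoted examples.
COMPONENTS = {
    "林": ["木", "木"],        # two trees: woods
    "森": ["木", "木", "木"],  # three trees: dense forest
}


def primitive_components(char):
    """Recursively expand a character into components not listed in the table."""
    parts = COMPONENTS.get(char)
    if parts is None:
        # Not in the table, so treat it as a primitive building block.
        return [char]
    expanded = []
    for part in parts:
        expanded.extend(primitive_components(part))
    return expanded


print(primitive_components("林"))
print(primitive_components("森"))
```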
Spanish has lots of those accent marks, so it's more than 26 letters, it's also accent marks.
Spanish has accents on the a, e, i, o, u. I never use them, and it is still understandable. If you really believe that writing Chinese is easier than writing in Spanish, you are absolutely wrong.
Speaking as another native Spanish speaker, I also basically never use accents. I sometimes add them by applying spellchecker suggestions in formal writing (work emails, etc.) when I see the red squiggles.
I work with a friend who is Chinese and she often takes notes while learning things. She is at least as fast as I am at longhand writing in English AND she has to translate to Chinese on the fly. Also she ends up with WAY more compact notes than would be possible in an alphabet.
I think maybe it's English that would benefit from being written in ideograms! The popularity of emoji is partly due to their compactness.
Yep, Chinese characters are terrifyingly complicated, but the Chinese language is more compact sentence-wise and text-wise. Actually, the Chinese were acutely aware of how inconvenient the heavily-stroked characters are. That's why they invented simplified Chinese.
Let's instead have the English-speaking community switch to a Devanagari alphabet-based writing system.
It is so much more comprehensive - you never have to fumble about with the right way to pronounce something. Any particular combination of letters has one definitive way to pronounce it - so if you can read a phrase you can also speak it out without any ambiguity.
So much better than the mess English is. </sarcasm>
For what it's worth, one of modern China's most famous writers, Lu Xun, was adamantly in favor of eliminating the characters. Your question is by no means stupid.
There are many reasons the characters have not been replaced by an alphabet. Ultimately, it comes down to the weight of tradition. Chinese has had basically the same writing system for 2000+ years. The characters are deeply embedded in Chinese culture, and the structure of the language itself is closely bound up with the characters. Only young kids use the alphabet, as a stepping stone to learning the characters, so proposing to use the alphabet comes across as if you want to dumb down the language. There are over a billion people who have learned how to read/write the characters, so the system has huge inertia and buy-in. As others have noted, this is like switching from imperial to metric, but a million times harder and emotionally fraught.
Hot take: we also just memorize icons, they just happen to be made out of 26 symbols. But to read fast, you absolutely do use some high-level icon-based parsing, IMO
Definitely, but you can read a word you never came across (in a language where the Latin alphabet mapping is actually sane, like Italian or Spanish) by just knowing a few pronunciation rules that depend on the language.
Clearly English is not a good example for this, I know.
The same is true in Mandarin Chinese, to some extent. A native speaker or someone who speaks the language very well can deduce a lot about a word's possible meaning and pronunciation from its component radicals. There are fewer of them per character than the average English word has letters, but a similar system exists where a Chinese person who comes across a new word in a paragraph will have a lot of context clues to tell them what that word might mean and how it might be pronounced.
One thing to note though is that this is easier with traditional Chinese characters than with simplified characters because part of the simplification was often removing components of a character or simplifying them so that they no longer have an obvious meaning.
The simplification did make it easier to write by hand (which is becoming less and less relevant with computers) but doesn't necessarily make it easier to learn new characters because of this.
That I didn't know and it's actually fascinating! Can you point me to some resource that explains this with some examples, for people who cannot read Mandarin Chinese? TIA!
Well, I still think Latin alphabet (or the Arabic one, but I don't know it enough to be sure about it) is more optimized / easier to grasp conceptually when you don't know the word. For example one of the compounds illustrated mixes 2 pictographs, but one is used for the meaning and the other for the sound of the final word (the "to wash oneself" example); that doesn't sound easy to me.
There are enough radicals that it's reasonably common you can indicate both the likely correct pronunciation and a bit of the meaning with one of them, and radicals can also be compounded/nested (e.g. combine two radicals to make a character, then that character is used as a radical in another character). It's why the standardised vocabulary test, HSK, is so important, because very often more technical or less common characters are built from more common characters that you do know. Sort of like English, where I could tell you that I majored in "lemonadology", and even though that's not a word you've ever seen you would probably have a decent idea how to pronounce it, and that it's probably got something to do with studying lemons, likely in the form of lemonade. This is informed by a cultural context where we both know and understand what "lemonade" is and means, as well as just the knowledge of the "ade" and "ology" suffixes, and the "lemon" noun. The same thing happens with Chinese characters, sort of. Those characters contain a lot of information, they are very much not just tens of thousands of letters. Also, new characters are invented occasionally by people combining previous characters and radicals, although that's less common than in English. And "word" in Chinese can encompass both many individual characters that can be considered words on their own, but also combinations of characters that form a distinct word. Like 老公 means husband, so it's a word, but if you interpreted it as two words you could read it as "old man"; no Chinese speaker would do so seriously, it's not ambiguous in its "husband" meaning, but it comes into play when wondering why 老公 is different from 丈夫, which is a different way to say husband with a different cultural context.
For reference, Chinese people think their language is very easy and that English is absurdly hard (for example, that we have an absolutely unnecessary yet mandatory number of ways to indicate tenses and plurality, and if you screw them up we think you're stupid even though there's no legitimate functional use for the differences - to say nothing of the fact that even with simple pluralisation like adding an "s", that's not one sound, that's three distinct sounds that a Chinese person would need to learn). A nationalistic Chinese on Weibo might air the opinion that the Chinese script is more optimised/easier to grasp conceptually when you aren't familiar with the concept than English, where you have to have familiarity with the suffixes and prefixes of two separate non-English languages (one of them dead!) as table stakes, plus a lot of familiarity with French. Of course, this wouldn't be correct, most people don't really understand a new word they come across in that much detail (in English or Chinese), it's just because you're not a speaker of the language and it's in a different script that it seems so hard to you. I promise you, Chinese is not objectively harder than English to learn. Chinese children have the same language development timeline as English and Arabic speakers, and Chinese scientists are very prolific and accomplished. Chinese is just quite difficult to learn for an English speaker.
First of all, I'm pretty sure that my PoV is biased by the fact that I'm European and grew up in a Latin alphabet world. With that said, I'm not saying that Chinese writing hinders Chinese people's capability or development - China has been, and is again becoming, the leading country in the world in many respects - but what holds for programming language syntax also applies to human languages. Python syntax is simpler and less verbose than old-school Java, for example, but that doesn't mean that shitty software can't be written in Python or that great software can't be written in Java, or that someone cannot learn Java as their first programming language.
Maybe there are some mental tricks and shortcuts that kick in with pictographs that I'm not aware of because I'm not a language expert or a Chinese speaker, and that make everything simpler for the brain than what I can imagine from my point of view.
Thanks for the interchange anyway, I learned something new :)
> Honest question, why haven't china moved to a western (?) alphabet based writing?
I wanted to link an article written by a linguist, but I couldn't find it. One of the hurdles would be tones: in order to express them, one would have to use plenty of diacritics and westerners would not be able to pronounce them correctly anyway.
Because it's part of the cultural heritage. Very much like the weird spelling of many English words is.
In particular, the phonetics of Mandarin Chinese underwent several waves of simplification to the point that many characters are pronounced pretty much the same - in particular, there are a lot of syllables pronounced /yi/ or /shi/.
So, transition to a purely alphabetic writing system would mean losing access to all the sophisticated texts of culture. There is even a poem illustrating that phenomenon, and taking it to the extreme: https://en.wikipedia.org/wiki/Lion-Eating_Poet_in_the_Stone_...
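A quick way to see the homophone problem is to take the five characters in the title of that poem and strip the tone marks from their pinyin. This is only a sketch: the small table is typed in by hand (the readings are the standard Mandarin ones), and tone removal is done with Unicode decomposition from Python's standard `unicodedata` module.

```python
import unicodedata

# Title of the poem, 施氏食獅史, with the pinyin reading of each character.
PINYIN = {"施": "shī", "氏": "shì", "食": "shí", "獅": "shī", "史": "shǐ"}


def strip_tones(syllable):
    """Remove tone diacritics by decomposing and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", syllable)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


toneless = {char: strip_tones(reading) for char, reading in PINYIN.items()}

print(sorted(set(PINYIN.values())))    # distinct readings with tones
print(sorted(set(toneless.values())))  # distinct readings without tones
```

Five different characters have four distinct readings with tone marks and only one without them, so an alphabet that drops tones loses almost everything here, and one that keeps them still cannot tell 施 from 獅.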
More practically, everyone learns their first language as a child, and at that point does not get to decide whether something is too "crazy" to learn or not, since nobody asks their opinion.
Further simplification was attempted at some point by the Communists (also as a means to increase adult literacy) but they rolled it back quickly.
Also, it's not 20,000 "icons" to learn. There are a couple of hundred composing elements ("radicals," although it's not entirely correct to call all of them this), which just repeat themselves in different arrangements, and there are some rules to it. Beyond these, only a hundred or so characters have purely unique elements.
> A lot of times the translation would sound like how Chinese is used in China instead.
Ignorant person here (sorry if I'm asking a dumb question), but given the larger population of China to Taiwan, why is that not the correct thing to do? Or is this "Traditional Chinese" a language spoken only in Taiwan?
When translating to Dutch, I wouldn't expect it to output Dutch as it is spoken in Belgium, Suriname, or Limburg either; I'd expect it to output Dutch as the main body of speakers speak it. They should get their own language designation instead, if the software wants to support translating into variants of the language
Chinese generally has two writing systems: traditional and simplified. From the 1950s to the 1980s, mainland China developed Simplified Chinese in a bid to make the language easier to write. Note that these are just different ways of writing the same characters: think cursive versus block. They are not different languages -- the vocabulary and grammar can be identical.
Traditional Chinese is used in Taiwan, Hong Kong, Macau, Singapore, and to one degree or another in various Chinese immigrant communities around the world. Taiwan is obviously the 800 pound gorilla here.
Taiwan has come to develop its own style of written Chinese -- think US vs British English but more nationalistic. So the question is: when Google Translate converts to traditional characters should it also be translating into Taiwan's written patois [not to be confused with the Taiwanese dialect]? After all, Google Translate does not differentiate between Britain and the US in English (it sadly uses British). And there are other groups which use traditional without this patois (Hong Kong, say, though likely not for long), albeit in fewer numbers.
The author does have one really good point, and that is that Google Translate is mistakenly describing traditional as zh-TW, that is "traditional as used in Taiwan". As long as it insists on doing that, it should also be translating into Taiwan's written patois. But what it should be doing instead is describing traditional as zh-Hant.
"Traditional characters" and "Simplified characters" are writing systems, not languages. You can write many variants of Chinese with either system.
Traditional characters are still used in Taiwan and Hong Kong, while simplified characters are standard on the mainland and in Singapore. However, traditional characters are still used in some niche cases on the mainland (such as decorations, names of shops and restaurants).
Whether a translation to "Traditional characters" should default to Taiwanese-flavored Mandarin is an open question.
I don't think the downvotes are fully justified and I think it's a valid point to make. Taiwan itself, for example, for the longest time had an extremely unintuitive local pinyin version. They are at least no longer as strict in enforcing it versus the now official one that came from the Mainland. It is sometimes the cause of going to the wrong place on Uber trips.
However, it is worth noting that there was basically an iron curtain between China and Taiwan for a long time after Chiang Kai-shek went there. A lot of IT terminology, for example, developed separately in Taiwan and China, so a lot of technical terms are different. The following website lets you look up scientific terms and convert them between mainland Chinese and Taiwanese usage.
Based on a sibling comment from raincole, Traditional Chinese isn't used in China at all. Shouldn't the translator then simply output the Taiwanese (and, according to them, Hong Kongese) variants when selecting that language, rather than the language having two possible words to use?
Thanks. So essentially the site is saying that Google Translate outputs what I'd call (for simplicity) Taiwanese/Hong Kongese with a foreigner's "accent" (but in writing)? It sounds odd to me. If China doesn't speak Traditional Chinese, why would GTranslate output text with China's usage patterns?
Be on the lookout for misinformation obscuring the fact that Cantonese (a.k.a. Yue), spoken in Hong Kong, is a language. It’s not mutually intelligible with Mandarin (unlike, say, Mandarin spoken in Taiwan, which is more or less Mandarin with a different accent and slightly different vocabulary) or other languages of Chinese family. Cantonese uses different transliteration systems as well (Pinyin doesn’t work for it).
Who’s mainly responsible for there to this day being no ISO standard for transliterating Cantonese, and its conspicuous absence from Google Translate (despite the whopping 86 million speakers—consider that Translate supports, say, Welsh, Basque, Dhivehi, and many others that are spoken by fewer than even one million), is probably obvious. That entity is much helped by the issue being opaque to most foreigners, who are sufficiently baffled by a completely different writing system to just not care and bundle all languages written in Chinese script into one basket.
> Who’s mainly responsible for there to this day being no ISO standard for transliterating Cantonese (like Pinyin), and its conspicuous absence from Google Translate (despite having 86 million speakers—consider that Translate supports, say, Welsh, Basque, Dhivehi, and many others that are spoken by fewer than even one million), is probably obvious.
There's no ISO standard because there's no actual standard. Even just within Hong Kong there are multiple systems in use.
Let’s not mix cause and effect. There are multiple systems because there’s no standard, and there’s no standard because there’s strong opposing interest. The Cantonese-speaking world would strictly benefit from this standardisation; the lack of it impedes business and takes away from the language’s legitimacy.
I wouldn’t be surprised if there were attempts to introduce a standardised way of romanising Yue, but certain groups with many votes considered this national interest as they try to maintain authority across a very diverse set of people speaking different languages.
Can you point to a specific instance of a proposed ISO standard that was shot down by committee?
Because otherwise the default explanation for why there's no ISO standard for Cantonese romanization is the same as why there's no ISO standard for romanizing Burmese, or Malaysian Jawi etc. There's little demand internationally for such a standard.
And if the Hong Kong government's standard, or the Guangdong government's standard (this one presumably preferred on the Mainland) is officially blessed by ISO, would people currently using something else really switch just because of ISO?
Myanmar, while spoken by fewer people than Yue, benefits from an official standard for romanization from its own government (not ISO), and Jawi is a dialect of a language spoken by fewer than a thousand people total.
When saying “little demand internationally”, could you clarify what international demand was there for a standard for romanizing Mandarin and why that is somehow not applicable to Yue?
> Myanmar, while spoken by fewer people than Yue, benefits from an official standard for romanization from its own government (not ISO)
Said standard getting ignored by most Burmese people in favor of ad-hoc romanizations, similar to Cantonese speakers in Hong Kong and Guangdong mostly ignoring various standards published by their respective governments.
> Jawi is a dialect of a language spoken by fewer than a thousand people total.
Malaysian Jawi https://en.wikipedia.org/wiki/Jawi_script is not a dialect, but an Arabic-derived writing system for Malay. While not all Malay speakers use it, it's certainly more than a few thousand.
> what international demand was there for a standard for romanizing Mandarin and why that is somehow not applicable to Yue?
Mandarin is an official language of the UN, whereas Yue isn't.
> So essentially the site is saying that Google Translate outputs what I'd call (for simplicity) Taiwanese/Hong Kongese with a foreigner's "accent" (but in writing)? It sounds odd to me.
Taiwanese is another language... but let's use your simplified terms for now. Well, yes, pretty much this. It's not that different from writing "fish and chips" as "fish and fries" in British English.
It'd be like if Belgium used Dutch but with the Cyrillic alphabet. It's still the same language but with another way of writing it and some regional differences. (To be clear, this is about Mandarin as used in Taiwan; Taiwanese and Cantonese are different languages, which Google Translate does not support.)
British vs. American English seems like a good analogy given that they also have distinct language codes. Would Google translate camión to lorry vs. truck depending on en-GB vs. en-US?
> Ignorant person here (sorry if I'm asking a dumb question), but given the larger population of China to Taiwan, why is that not the correct thing to do? Or is this "Traditional Chinese" a language spoken only in Taiwan?
They addressed this:
> NOTE: The Google Translate menu only says "Chinese (Traditional)". However, if you pick the option, you will see the language code reflected in the URL is zh-TW, which means "Traditional Chinese as being used in Taiwan". The alternative option for Google to fix this problem is to officially drop zh-TW support and switch to an appropriate language code instead, such as zh-Hant.
The whole post is based on the fact that "TW" appears in their URL. But I think this is really just an internal implementation detail of Google Translate, which is not intended to be displayed to users (e.g. it is completely hidden in their app). Everywhere in their UI it shows "Chinese (Traditional)". That is what they are trying to communicate to users.
Sure, zh-TW is somewhat misleading. But they never say that the parameter conforms to ISO 639 or RFC 5646.
Please read the rest of the comment. I was assuming that what is called "Traditional Chinese" is used in China, based on its name (Chinese) and that GTranslate outputs it. Urdu and Tamil aren't spoken in China and aren't what GTranslate outputs when you select a Chinese language, afaik
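For anyone unsure what the difference between zh-TW and zh-Hant actually is: in RFC 5646 style tags, "TW" is a region subtag and "Hant" is a script subtag. Here is a rough Python sketch that splits a tag into those parts. It only handles the simple language[-Script][-REGION] shape (real tags allow variants, extensions and so on), and `parse_tag` is just a name I made up for illustration.

```python
import re

# Simplified pattern: language[-Script][-REGION].
# Full RFC 5646 tags allow more subtags; this sketch deliberately ignores them.
TAG_RE = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_tag(tag):
    """Split a simple language tag into language, script and region subtags."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError(f"not a simple language tag: {tag!r}")
    parts = match.groupdict()
    # Apply the conventional casing: zh, Hant, TW.
    return {
        "language": parts["language"].lower(),
        "script": parts["script"].title() if parts["script"] else None,
        "region": parts["region"].upper() if parts["region"] else None,
    }


for tag in ("zh-TW", "zh-Hant", "zh-Hant-TW"):
    print(tag, parse_tag(tag))
```

So zh-TW names a region and says nothing explicit about the script, zh-Hant names a script and no region, and zh-Hant-TW carries both.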
Taiwan absolutely does not call itself China. At best the government that holds sovereignty in Taiwan is called the Republic of China, and yet daily everyone calls the country itself "Taiwan," and in fact even the passports were recently changed to say "Taiwan" in enormous letters to remove any confusion about what the people want. The government would change its name to also be Taiwan, or maybe Republic of Taiwan or something like that, but it can't do so without changing the constitution, which Xi Jinping has threatened to invade immediately if this occurs.
So it's both incorrect and disingenuous to say "Taiwan calls itself China."
> “We don’t have a need to declare ourselves an independent state,” Tsai told the BBC. “We are an independent country already and we call ourselves the Republic of China, Taiwan.”[0]
So it's not factually incorrect. Of course the issue is fraught and just saying "the country on Formosa is called the Republic of China" would be completely deceptive, but nobody is doing that.
Talk about Chinese politics is always heated but I don't appreciate being accused of being disingenuous.
"Republic of China" is a dramatically different name than "China." To claim that both the PRC and Taiwan call themselves the same name is to muddy the waters and contribute to the PRC's cultural imperialism through linguistic mechanism.
Furthermore, though Tsai Ing-Wen is the president of the RoC, as I said, there's a rising independence movement separate from the settler-colonial government of the RoC. Among these people, and the majority of Taiwanese, Taiwan is "Taiwan," and "RoC" is at best a formality, at worst an unchosen government underwritten by an unchangeable constitution.
If you're unintentionally muddying the waters that's one thing, but I react strongly because I strongly oppose anybody that assists the PRC's cultural imperialism.
What do people living in Taiwan call the 'other' China? Is "mainland China" a correct thing to say, or is the word "main" there implying that Taiwan is lesser and not an equal/peer state?
I used the word China in other comments because I didn't know what to call it otherwise, hoping that in the context of Taiwan it is clear what I mean
They call it 中國 "Middle Country," or "Middle State" or "Middle Kingdom," (bear in mind that the meaning 國 "guo" has changed over time) the name for the territory and empires encompassing basically the current territory of the PRC since the Western Zhou dynasty.
Under no circumstances is Taiwan ever seriously referred to as "China"; not even "Republic of China," the official name in the constitution, is really used anymore. The passports now prominently say "Taiwan," and polling indicates that "Chinese" identity in Taiwan is dying swiftly with the settler colonialist KMT.
The sibling comment may be referring to political discussions between politicians, especially when talking with PRC officials. I'm not sure, I've almost never heard those terms used except by really weird super-KMT taxi drivers.
The parent comment that "Taiwan calls itself China" is simply incorrect, and the government mentioned, in the 70s, was a KMT totalitarian government that's been essentially overthrown as of the 90s.
It's a pointed issue that you'll often find heated responses from because Taiwan, for basically the first time in its history as a globally participant nation, is finally getting to establish its own identity, separate from the Dutch, the Qing, the Japanese, and the ROC/KMT settler-colonialists. There's great fear that the CPC, having failed to win a culture war here with han chauvinism / han supremacy, will simply resort to violence to imperialise the nation.
"Mainland" means "continental" in general; the other context it comes up in is "mainland Europe". The term is ... not completely uncontentious in its own right, but mostly on the "assumes the other China is another China" dimension.
I'm not sure what the last part means about the other China being another China, but continental sounds like a good alternative actually. No concept of main versus spare/alternate/lesser, just not an island
Yes, but the island of Hainan is considered part of "Mainland China," while the mainland areas of Hong Kong and Macau are not, so in this context it's a purely political term, trying to shape reality as opposed to just describing it.
It is not. Traditional Chinese is not used in mainland China. It was superseded by Simplified Chinese in areas controlled by the Chinese Communist Party.
This is a rather basic piece of knowledge when discussing Taiwan and Chinese relations.
... The exact point being that Taiwan by and large does not use simplified Chinese. They are different countries. They speak effectively different languages.
Effectively different languages? No, the mainland and Taiwan both speak Mandarin, although with distinctive slang and accents. I grew up learning simplified and I can guess a lot of traditional characters reasonably well, although I hear going from traditional -> simplified is much harder.
They speak the same language in China and Taiwan: Mandarin Chinese. The question is whether to use more Taiwanese-sounding phrases when traditional characters are requested.
A very similar (just more technical/less known) point that's pretty close would be enforcing Sanskritised Hindi written in devanagari on anyone trying to use Persian-influenced words or Urdu. (Or vice versa.)
This seems purely political and obviously divisive: Taiwan (& HK) has its own history and China doesn't want it to.
Be careful when you don't speak the target language and you run something significant through a machine translation. The results can often surprise you and your audience.
I wrote a speech in English, and had Google Translate it to Spanish. This was around 2010. One of the lines was "There is electricity in the air." The translation read, "No hay electricidad en el aire." I speak Spanish well, and upon proofreading, I freaked out that it would negate the sense of the whole sentence, but it can, and it did.
That is on the basic assumption those are incorrect and required to be "fixed".
What makes you think Google "would" fix it?
Facebook sided with Mainland China during the 2019 protests in Hong Kong simply because there are more Chinese working inside Facebook than people from HK.
YouTube has sided with mainland China to censor words in Chinese comments simply because the censorship team employs, of course, Chinese staff.
Just because Google doesn't operate in China doesn't mean its influence is not there.
( Don't even get me started on Apple about its supply chains )
It's incorrect per the language code they're using, zh-tw. They obviously side with the CPC on political issues, but that's a separate issue, they should just change the code to the more accurate "zh-Hant." zh-tw exists to specifically identify Taiwanese Mandarin, which is a defined separate dialect (we really should stop using "Chinese" as a word to describe the various flavors of Mandarin, let alone including entirely different languages (NOT dialects!) such as Hakka, Cantonese, Shanghainese, Ninghainese, etc).
Google should rid itself of the notion that “Traditional Chinese” and “Simplified Chinese” are anything but complete misnomers if you’re talking about languages. They are scripts, and scripts don’t have a 1:1 mapping to languages.
Imagine a world in which Google Translate has one entry called “Latin” for anything using Latin script (Welsh, Irish, you name it) and another entry “Simplified Latin” for English. Just like that, translation “from TC to TC” (e.g., Cantonese to Hakka) is very much a thing!
This is not a Latin situation at all, it’s similar to British English vs American English. I just checked and Google Translate doesn’t have separate Englishes for American, British, Australian, New Zealand, Scottish, Indian, etc.
Your comment would be true if American and British English were linguistically classified as distinct, mutually unintelligible languages (beyond amusing accent jokes).
A Mandarin speaker in Taiwan has a different accent and slightly different vocabulary than Mandarin speaker in Beijing, but neither of them would be able to converse with a Yue speaker any more than a Russian speaker would be able to converse with a Polish speaker.
It’s pretty clear Google Translate only outputs Mandarin in Simplified and Traditional Chinese. Find a person from either side of the strait, as long as they’re literate and have been taught some version of Mandarin, they should understand either version of at least 90% of phrases on the linked page (do a verbatim T2C/C2T first if they have trouble recognizing some characters), only one version may be slightly weird.
Also, you’re very mistaken if you think all versions of English are mutually intelligible.
Second, the fact that they also speak Mandarin (because it was a mandatory lesson at school or they were otherwise forced by circumstances to learn it) does not mean there should be no support in Translate for their native language, which incidentally dwarfs many languages that Translate does support in number of speakers.
If you think that’s viable logic, you should try applying it to other languages with predominantly multi-language speakers. Start with Irish (98% of people in Ireland speak English, after all) and move on to Catalan, see where that gets you.
You totally can, but we are talking about Google Translate here, how it mislabels scripts as languages and how it doesn’t support a major language (for what is likely political reasons).
The problem is that detecting the different variants to label the training data requires understanding the semantics (detecting French vs English can be done purely lexically), which has historically been a hard problem. With modern language modeling it’s tractable.
I think Google has the resources required to detect different languages in Chinese script (different combinations of characters and vocabulary, even specific Unicode characters, etc.) but there is 1) a lack of incentives and 2) opposing political interests. It doesn’t help that there is a bit of a mess when it comes to Chinese characters, like a single Unicode code point being rendered differently depending on the font.
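To make the "specific characters" idea concrete, here is a deliberately naive Python sketch. It counts a handful of characters that are typical of written Cantonese against a handful that are typical of written Mandarin. The marker sets are my own small hand-picked selection, not a real dataset, and a production detector would need far more than this; "yue", "cmn" and "und" are the ISO 639 codes for Cantonese, Mandarin and "undetermined".

```python
# Hand-picked marker characters, for illustration only (an assumption, not a real dataset).
YUE_MARKERS = set("嘅咗喺冇佢哋嘢唔")  # common in written Cantonese
CMN_MARKERS = set("的了在沒他們這不")  # common in written Mandarin


def guess_language(text):
    """Return 'yue', 'cmn' or 'und' based on which marker set occurs more often."""
    yue_hits = sum(ch in YUE_MARKERS for ch in text)
    cmn_hits = sum(ch in CMN_MARKERS for ch in text)
    if yue_hits > cmn_hits:
        return "yue"
    if cmn_hits > yue_hits:
        return "cmn"
    return "und"  # tie or no markers at all


print(guess_language("佢哋喺邊度"))  # "where are they" written the Cantonese way
print(guess_language("他們在哪裡"))  # the same question written the Mandarin way
```

Both sample sentences use the same script, which is the whole point: the script alone tells you nothing, but the vocabulary does.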
Author here. Surprised to see this website made it on to Hacker News
Glad to see some debates on the vocabs listed there:
* Yes, it is very IT oriented.
* Yes, I have been in some academic environment. (but seriously, 文本 is rarely heard in my life!)
* No, it is not supposed to be an exhaustive list of all things wrong nor _the_ correct list.
At the end of the day, it is a short list of what I'd like to see changed, and thus I tracked it. We may not all agree on the translations, but I think we can agree that 1/55 is a pretty sad score to have.
--
Lastly, feel free to fork and run your copy if you'd like a different set of words / translations to be tracked!
Simplified and Traditional are different (but closely related) writing systems that can be used to express the exact same text. They are not different linguistic variants of Chinese.
It looks like regardless of which writing system you choose, Google still translates into the most widely used variant of Chinese: Standard Chinese, as spoken on the mainland. It just writes the result using the requested characters. It doesn't decide to use more characteristically Taiwanese phrases if you select traditional characters.
In order to address the complaint here, Google could allow one to select the writing system (Simplified vs. Traditional) independently from the regional variety (Mainland, Taiwan, etc.).
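A minimal Python sketch of what "independent" could look like: one step picks the regional vocabulary, a separate step picks the script. The two lookup tables are tiny hypothetical stand-ins (the bicycle pair is the one mentioned upthread, the software pair is another commonly cited difference), and `render` is an invented name. It assumes the input is already in traditional characters with mainland-leaning vocabulary. I have no idea how Google's pipeline is actually structured.

```python
# Hypothetical toy tables, for illustration only.
TW_LEXICON = {"自行車": "腳踏車", "軟件": "軟體"}  # mainland-leaning term -> Taiwan term
TRAD_TO_SIMP = {"車": "车", "腳": "脚", "軟": "软", "體": "体"}  # per-character script map


def render(text, region, script):
    """Apply the regional vocabulary first, then the requested script."""
    if region == "TW":
        for mainland_term, taiwan_term in TW_LEXICON.items():
            text = text.replace(mainland_term, taiwan_term)
    if script == "Hans":
        # Characters missing from the toy table are passed through unchanged.
        text = "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)
    return text


for region in ("CN", "TW"):
    for script in ("Hant", "Hans"):
        print(region, script, render("自行車", region, script))
```

The four combinations give four different strings, which is the case the current "Simplified vs. Traditional" menu cannot express.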
If, as you are claiming, the Taiwanese don't have their own Taiwanese Mandarin, with their language being just the same as the Mandarin used in China, how can it be that at the same time they also don't get to have a say in what constitutes the "standard" of that "common" language they share with the "mainland?"
Also, there is no reason for constantly calling it "Mainland China" where you could just as well call it "China," unless it is to further a political agenda.
Anyway, this is all beside the point. The concept of a "standard" language is political not linguistic. If Google is run as a business, not a political entity, their linguistic choices should reflect the language actually being used in any given market, and not be based on purported "standards" promulgated elsewhere. The same simple concept that somehow already works well for other language pairs that could be construed as similar to the point of being the same should also be applied here.
> If, as you are claiming, the Taiwanese don't have their own Taiwanese Mandarin, with their language being just the same as the Mandarin used in China
This is like saying the Canadians and the Americans each have their own unique language. They speak the same language, with small dialectal differences, and small differences in official standards (semi-official, in the case of the US). The internal differences in Mandarin as spoken in different regions of China are far larger than the differences between the ROC and PRC standards.
In the case of the PRC and ROC (now commonly known as "Taiwan"), the PRC standard is derived from the ROC standard, so emphasizing the differences is somewhat strange. They're very closely related to one another.
> Also, there is no reason for constantly calling it "Mainland China" where you could just as well call it "China," unless it is to further a political agenda.
"Mainland China" and "China" are not synonymous. Mainland China comprises the provinces of the mainland and Hainan. However you define China, at a minimum, it also contains Hong Kong and Macau, which are not part of "Mainland China." Hong Kong notably uses traditional characters, and it has slightly different standards than Taiwan.
> If Google is run as a business, not a political entity, their linguistic choices should reflect the language actually being used in any given market, and not be based on purported "standards" promulgated elsewhere.
Google's Chinese translations are so utterly terrible that this entire discussion is almost moot. I seriously doubt that Google is trying very hard to adhere to any particular standard version of Chinese. If you want decent Chinese <-> English translations, use DeepL.
Whether Taiwanese Mandarin is a separate language or not is a matter of opinion. Delving further into this debate will not lead to an interesting discussion as the definition of what constitutes a language is ultimately blurry. So I'm not making a claim either way, just an observation that you can't have it both ways: if the languages are separate, whatever goes on in China is irrelevant. If it is the same language, then the way it is spoken in Taiwan is no less "standard." It seems you agree with the latter, so let's leave it at that.
I don't think the website is trying to "emphasize the differences," just point out the issues with Google Translate: namely that the output for "zh-tw" does not reflect the language actually used by people in Taiwan, and to that end it betrays the trust of the user. Of course it only lists where the problems are, so it's not a balanced view by definition. It focuses on what needs to be fixed.
In particular, as similar as the two languages or variants are to each other, nearly all the vocabulary relating to modern technologies developed separately, and is fairly distinct. I've never tried it but I can imagine a Google-translated text heavy on computer-related vocabulary can easily end up being unintelligible to a Taiwanese, which constitutes poor quality of service on Google's part.
Taiwan is a separate market for Google, and from the business perspective they would do best not to alienate their users there. Of course it is a free service with no reasonable expectation of quality. But if someone went to the trouble of listing all the issues, the problem might be worth addressing even for purely reputational reasons. I read through the whole word list and I'd say it's at least 95% accurate. Frankly, I'm surprised it sparked such a debate.
As for "Mainland China," you are technically correct about the scope. The term has its use in certain contexts if one is aiming to be very precise (or pedantic). But here it's tangential to the discussion.
> Whether Taiwanese Mandarin is a separate language or not is a matter of opinion.
There are dialect pairs where the question of whether they are separate languages is blurry. Taiwanese Mandarin and the official Mandarin of the PRC are nowhere near the level of difference where this question even arises. Mutual intelligibility is unproblematic.
> the output for "zh-tw" does not reflect the language actually used by people in Taiwan, and to that end it betrays the trust of the user.
If users were actually selecting "Taiwanese Mandarin," as opposed to "Chinese (Traditional)," and if Google were doing a decent job of translating into Chinese in the first place, I would agree with you. But neither is the case.
> Google could allow one to select the writing system
I'm not sure their models have a concept of writing systems. I noticed that neither Google Translate nor DeepL understands the equivalence of katakana and hiragana in Japanese. Probably every character is just an independent token to them.
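For what it's worth, the katakana/hiragana equivalence is almost mechanical at the Unicode level: the main katakana range U+30A1..U+30F6 sits exactly 0x60 above the matching hiragana range U+3041..U+3096. A small Python sketch of that mapping follows (the function name is mine; the long-vowel mark and other characters outside the range are deliberately left untouched):

```python
KATAKANA_START = 0x30A1  # ァ
KATAKANA_END = 0x30F6    # ヶ
KANA_OFFSET = 0x60       # distance between the katakana and hiragana ranges


def katakana_to_hiragana(text):
    """Map katakana in U+30A1..U+30F6 to hiragana; leave everything else as is."""
    return "".join(
        chr(ord(ch) - KANA_OFFSET)
        if KATAKANA_START <= ord(ch) <= KATAKANA_END
        else ch
        for ch in text
    )


print(katakana_to_hiragana("カタカナ"))
```

Whether a translation model benefits from being told about this is a separate question, but the mapping itself is not hard to provide.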
To clarify, "Standard Chinese" is the English term to describe the official language of the PRC. There is no country called "mainland," nor is there a country called "mainland China," nor is there a language that can be accurately called "Chinese."
It's much more clear in Mandarin itself, such as 官話. Note that some might translate this as "Traditional Chinese," separate from the typical meaning, when trying to separate "simplified characters" from "traditional characters." In English it's all a mess.
Also, you can write guanhua or beifanghua or "Standard Chinese" in both Simplified and Traditional character sets. They are almost 1:1. You also used to be able to write Japanese, Vietnamese, Korean, and other languages in Traditional Chinese characters.
That's why we should be careful to separate Guanhua ("Standard Chinese") from Taiwanese Mandarin, etc. IMO, using the word "Chinese" in the descriptor just sows further confusion and in fact furthers the political goals of the CPC to cast a political blanket over all things even remotely describable as "Chinese," be they language, culture, race, heritage, or nationality.
The Mainland is a well established concept when talking about China.
"Standard Chinese," "Mandarin Chinese," or however you want to call it (there are several terms in Chinese itself that refer to essentially the same thing) is the official language both on the mainland and in Taiwan. They have slightly different variants, but both standards grew out of the same historical movement to standardize Chinese. Both the Republic of China and the People's Republic of China wanted to create a national standard to enable universal communication in the country, and the PRC essentially adopted the standard the ROC had been working on.
This is an accident of history and a quirk of the English translation. Though the official language in English is described as "Standard Chinese," in Mandarin it's written as 國語, which just means "national language." The National Language described is Taiwanese Mandarin, 華語, which translates in English usually as Mandarin Language. By some measures it's very similar but it's different enough to deserve a different name, and the efforts of Taiwanese people to separate their culture from that of the PRC, and their ROC ancestors that came from the territory of the PRC, is a valid effort and should be acknowledged. Give it another 40 years or so and the languages will be quite distinct, as Taiwanese consciously incorporate more Hokkien, Hakka, and the various indigenous languages into their version of Mandarin to form a unique cultural identity.
> The Mainland is a well established concept when talking about China.
And people will happily, and inaccurately, describe the UK as "Britain" or "England," much to the annoyance of people who prefer to maintain their own unique cultural identity within the UK. In Taiwan this issue is even more critical as the sovereignty of the nation is under attack from many angles, including linguistically and culturally. There is no country on earth called The Mainland, and there is no country on earth called China, and CPC efforts to engage in cultural imperialism by casting a wide net around the concept of China and Chinese, as well as lay claims to historical Chinese imperial territories, and furthermore to imply that Taiwan is a PRC territory by calling the PRC the "mainland" (thus Taiwan as a province or colony), should be actively resisted.
> This is an accident of history and a quirk of the English translation.
It's not an accident of history. The Nationalists and the Communists had many similar influences (e.g., Sun Yat-Sen is revered by both). One of the ideas they shared in common was the idea of creating a national language. The PRC basically adopted the standardized form of Chinese developed by the ROC.
> Give it another 40 years or so and the languages will be quite distinct
I'm not so sure about this, given the amount of cultural interchange across the strait. Speaking in a more Taiwanese way is actually quite popular in mainland China, where it's considered cute.
> There is no country on earth called The Mainland
I'm not going to get into the entire nationalist dispute here. "The Mainland" marks out a well defined geographic area that's not quite synonymous with the PRC.
Google Maps in Taiwan also made me ride on the highway on a motorbike, which isn't allowed, and I got a nice ticket for following Google's directions even with "avoid highways" enabled.
Google Maps is particularly useless in this sense. I desperately wish they'd open source it so locals could just add features as needed. In Taiwan, we don't have "highway," "motorcycle roads," etc., as Google understands them (with its notoriously difficult to activate "motorcycle route option"). There are specific rules that should just be implemented by name, with flags like "no flower road," "no highspeed highway (red plate and car OK, yellow and below not OK)," "no highspeed highway (red, yellow, car OK, white and green plate not OK)," or maybe just "white scooter only" or even "greenplate only" route modes. Better if you could just input your exact vehicle type per Taiwan law and get a legal route that way. The information is quite available on data.gov.tw for those that want to have machines read it. Can't speak for other countries.
This is on top of the fact that I'm pretty sure Google Maps uses some USA-centric constant for traffic light timer estimations. Traffic light timers are very, very long in Taiwan compared to other countries. One time I had to wait 500 seconds (they had to add an extra big countdown timer to account for this). Usually it's more than 99 (the maximum the 2 digit counters can display). When you're in a car, motorcycle, or scooter, it's safe to multiply travel times by 1.5x to 2x. As for bicycle, I have no idea why Google thinks bicycling is so fast in Taipei, but always multiply by at least 2x whatever time Google tells you.
Edit: the 500 seconds was an extreme outlier. Around 99 seconds is the usual in major Taipei intersections.
That can make sense when you're coming from a tiny side road onto a major busy highway.
It doesn't make sense to delay thousands of cars on the highway by 30 seconds so that one car can cross it a few minutes faster. Better to make that one car wait 10 mins (and perhaps by now it is a queue of 3 cars), and save 30 seconds * thousands on the main road.
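To put rough numbers on that trade-off, here is a minimal sketch of the arithmetic. The figure of 2,000 main-road cars is an assumption standing in for "thousands"; the 30 seconds, the 10 minutes, and the queue of 3 cars come from the comment above.

```python
def total_delay_seconds(vehicles: int, delay_per_vehicle_s: float) -> float:
    """Aggregate delay: number of vehicles times the delay each one experiences."""
    return vehicles * delay_per_vehicle_s


# Assumption: "thousands" of cars is taken as 2,000 for illustration only.
MAIN_ROAD_CARS = 2000
main_road_delay = total_delay_seconds(MAIN_ROAD_CARS, 30)  # 30 s each
side_road_delay = total_delay_seconds(3, 10 * 60)          # 3 cars, 10 min each

print(f"main road: {main_road_delay / 3600:.1f} vehicle-hours")
print(f"side road: {side_road_delay / 3600:.1f} vehicle-hours")
```

Under these assumed numbers the main-road delay is far larger in aggregate, which is the point the comment is making; a real signal plan would of course weigh more than total delay.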
Other countries normally build bridges or highway merges to avoid this problem.
Google Maps didn't make you do anything. It's a dumb tool, made by people in the USA, that is not always correct. You decided to forgo basic due diligence in a foreign country.
It's important, especially when saying something is dumb or telling someone they're foregoing basic due diligence, to spell correctly, or the statement can lose its poignancy.
Seems better but it's definitely easy to spot when something has been machine translated, even when using DeepL. Did a test now and if you translate from English into a target language, the corresponding text is definitely not the way a native speaker would formulate it.
It currently supports exactly 4 non-European languages. It won't be a proper Google Translate alternative until it supports at least 3x as many languages.
Their Spanish-English is significantly more idiomatic than Google's, but as others have said, far fewer languages are supported. I suspect DeepL will continue to be better than Google as they grow.
One funny thing that I found not long ago is that you can traverse a long chain of word translations by re-feeding the translated word. You start with a word in, let's say, Italian and get it translated to Spanish. Now you grab that Spanish word and translate it back to Italian. You sometimes get a different word. You do this again and again. There doesn't seem to be a one-to-one correspondence in certain cases. I don't know if this is a known issue or if someone else has already spotted it.
Yes it is well known, but I wouldn't call it an issue per se, but rather a demonstration of the ambiguity of natural language. In any given language, many (most?) words have multiple meanings. Even if one of the underlying meanings of each word in your translation chain is "equivalent" (this could be a deep philosophical discussion of semantics on its own), each word-to-word correspondence for each language-pair has a different most-probable choice in the translation model (especially without context to help disambiguate).
[Un]fortunately, ja-en machine translation is dramatically better than it was when this website was launched, I guess it was more than 10 years ago. It used to devolve into really bizarre stuff.
It is essentially a fan-out tree across the thesauruses of the languages, so very unlikely to converge, unless there is one overwhelmingly preferred common meaning in all the languages.
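The chain described above can be sketched with a toy model. Everything here is hypothetical: the tokens `a1`, `b1`, etc. are placeholders rather than real words, and each table stands in for "the single most probable translation" in one direction. The point is only that when the two directions are not inverses of each other, re-feeding the output wanders to new words instead of returning to the start.

```python
def round_trip_chain(start, forward, backward, max_steps=20):
    """Follow top-1 translations A -> B -> A -> ... until a word repeats or is missing."""
    chain = [start]
    seen = {start}
    tables = (forward, backward)  # alternate direction on every step
    word = start
    for step in range(max_steps):
        nxt = tables[step % 2].get(word)
        if nxt is None or nxt in seen:
            break
        chain.append(nxt)
        seen.add(nxt)
        word = nxt
    return chain


# Hypothetical top-1 choices; note that BACKWARD does not invert FORWARD.
FORWARD = {"a1": "b1", "a2": "b2", "a3": "b2"}   # language A -> language B
BACKWARD = {"b1": "a2", "b2": "a3"}              # language B -> language A

print(round_trip_chain("a1", FORWARD, BACKWARD))
```

In this toy lexicon the chain starting from `a1` never comes back to `a1`; it only stops once it reaches a pair (`a3` and `b2`) that happen to map to each other.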
On the "Late Night with Jimmy Fallon" show, they take a popular song, translate it into some language and back, and have the guest sing the massacred version. Search for "jimmy fallon google translate".
I wish there were tools for diagramming out sentences like this in languages you don't understand. I'm sure it wouldn't help much for understanding the meaning, but it'd be nice to get an exact/terse explanation of its structure.
I'm using ChatGPT... who needs Google for translation? Either you know the target language a little and are able to spot wrong translations (both in Google Translate and ChatGPT), or you don't know enough about the target language and you trust the translation that is given to you (and in this case, ChatGPT is probably giving you a better translation).
Thing is, Google forces their translations upon everyone. For example if I want to check Taipei restaurant reviews on Google Maps with my interface language set to English, each of them shows up translated (poorly), and I have to click or tap every time to "see the original."
Translation quality is one issue, e.g. "加油!" being translated to 'add oil!' instead of something like 'keep up the good work!' which is what it is supposed to mean in this context figuratively. But there is also a bigger cultural issue in that Google developers do not allow for people being multilingual: just because I made a choice for the interface language doesn't mean I need everything else translated to it (with no opt-out).
Google Translate has felt stagnant for the past few years now. I've used Linguee in the past, but DeepL has become my de facto translation tool now, offering much better translations.
I am not so sure about the list. I found that many are sometimes used pretty much interchangeably in Taiwan: 便利店 vs 便利商店, 每秒幀數 vs 每秒影格數 (I actually hear the former more from YouTubers), 高清 vs 高畫質 (again, I actually hear the former more from YouTubers), etc.
That's another thing we currently struggle with: some of them start seeping into our surroundings.
And part of the reason comes from these translation tools: there are many translated materials (websites, documentation, etc.) produced using tools that do not correctly handle the difference, and those content creators don't really care about that.
I'm really curious how people view "cultural pollution" so differently. There are some words/usages we take directly from Japan and no one complains, for example 一枚入 and 定食. Do we need to change those back to what we would have said before being influenced by Japan?
What I mean is, culture and language slowly evolve; they aren't stagnant. As long as the world is progressing, as long as we are interacting with each other, language changes.
But this is also how culture works: we can and want to choose what to accept and what to reject, and we just don't want to be the same as the ones who keep acting hostile to us.
Another thing, related to what the OP mainly focused on, is that they have a tendency to oversimplify words and merge words that are not related, which causes problems when we communicate.
Take '質量' as the most commonly cited example: it only means mass (of matter) in zh_TW, but it may also mean quantity and quality in zh_CN, and the list goes on and on.
And about those Japanese words: they don't pollute our tools like the ones the OP listed; we have to choose to use them.
I now use ChatGPT for translating English to Traditional Chinese and have it give me some variations. It definitely works better than Google Translate.
For example for "I want to build a good gym routine":
我想建立一個良好的健身計畫。
Wǒ xiǎng jiànlì yīgè liánghǎo de jiànshēn jìhuà.
Here are some variations of the sentence along with their English translations:
我想制定一套有效的健身計劃。
Wǒ xiǎng zhìdìng yī tào yǒuxiào de jiànshēn jìhuà.
I want to create an effective workout plan.
我希望安排一個健康的健身日程。
Wǒ xīwàng ānpái yīgè jiànkāng de jiànshēn rìchéng.
I hope to set up a healthy fitness schedule.
PS: I am building an app to make learning Chinese easier. Feel free to ping me privately for testing :)
I also do this with translations between Swedish, Danish and English.
I find it fascinating that ChatGPT is better at translating (a lot better) than a tool specifically built for it, when (if I understand correctly) ChatGPT was in no way designed to translate, and is only in some way predicting text, one word at a time.
How come Google hasn't leveraged better existing tech to make the translations better? Is it too computationally expensive?
The Google Translate offline translation datasets are absolutely tiny, like 20 MB in size for French. Obviously this is heavily quantised and so on, but it's not that surprising that a 30-60+ billion parameter language model outcompetes Google Translate handily.
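For a sense of scale, here is a back-of-the-envelope comparison. The 20 MB and the 30 to 60 billion parameter figures come from the comment above; the 2 bytes per parameter (16-bit weights) is an assumption, since the comment does not say how such a model would be stored.

```python
MB = 10**6  # decimal megabytes, for simplicity

OFFLINE_PACK_BYTES = 20 * MB  # figure quoted in the comment
BYTES_PER_PARAM = 2           # assumption: 16-bit weights


def model_bytes(params: int, bytes_per_param: int = BYTES_PER_PARAM) -> int:
    """Approximate storage for the weights alone."""
    return params * bytes_per_param


for params in (30 * 10**9, 60 * 10**9):
    size = model_bytes(params)
    ratio = size // OFFLINE_PACK_BYTES
    print(f"{params // 10**9}B params: ~{size // 10**9} GB, ~{ratio}x the offline pack")
```

Even with more aggressive quantisation than assumed here, the gap stays at several orders of magnitude.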
I assume you’re right - that Google translate hasn’t been updated to take advantage of much bigger, more computationally complex models. I suppose for direct translation it’s probably not needed, but being able to ask chatgpt to explain the translation (and any cultural nuances involved) is a game changer when you’re trying to learn a language.
But they could easily be using a larger dataset online?
Which goes two ways: maybe this line of reasoning doesn't mean anything; or well yes exactly, but why so small online when they have all this space and also offer Bard.
They don't charge anywhere near enough to do that, I'd imagine; and likely couldn't at the scale they operate at (I mean, they are even embedded into many apps to help instantly translate banal things like comments). Imagine trying to translate a long news article with a sequence of max-length LLM inferences.
Yeah, I'm not actually suggesting it run through Bard or an LLM. I just mean that small dataset size is surely a design requirement for offline translation on space-constrained devices; it doesn't necessarily mean they use the same datasets online. And if they do... why? Because it seems to be enough?
(It's a bit confusing to talk about because surely it is just an older version of the same sort of thing, it's a less large language model right? I just think it could/should/would be a bit larger in the online hosted version.)
You can check if translations are also better on Bard, that would partly answer that question
I assume the low latency you get from Google Translate is not feasible with current LLMs like ChatGPT. Translate is used to translate sentences on the go, live videos, and entire web pages; all of these would be too expensive (and slow) with an LLM... but things might change in the coming months/years as the tech improves.
Google isn’t using the supposedly highly multi-lingual Bard 2 for their translation service. That should tell you how confident they are in their own AI products
I only ever use ChatGPT for translation now. GPT 4 especially blows everything else out of the water in quality.
I’ve done some insane round-trips through chains of totally unrelated languages and it retained 95% of the meaning. That’s superhuman quality.
I often need to translate Arabic, and I just use ChatGPT for translation now, it's much better.
Google Translate has been very broken for a while now and will often just completely refuse to translate, or only partially translate or translate in ways that are incoherent
I think there's just a large profit incentive for corporations to treat Taiwan like it's part of China. I found a LeapFrog globe in Walmart the other day and it told me Taiwan wasn't a country and was, in fact, just China.
Is it just me or is translation in Android completely broken as well? In the past I could select a word and translate it with the translate button. Now, most of the time, it triggers a Google search and adds words I didn't select.
Because you may have enabled that. I just tried this and it gave me an option to enable the feature to include surrounding text for better context or something.
As for translation, I get a tooltip to translate, web search or Wikipedia search
Must be something else, don't you think? If there's a company in the world who can just "adjust their model to make it bigger" that's Google. But they haven't done so. Perhaps Google doesn't care.
Google Translate is a relatively small product with a relatively small budget. It doesn't bring in a huge amount of users or revenue.
The engineers working on it seem to be focussing on supporting small languages (i.e., languages that only a few million people speak; Africa and India have lots of those, and both are big growth markets). They're also working on better apps, for example being able to translate text inside images, text from a video feed, live translation of captions, etc.
It makes sense that they probably wouldn't be deploying a huge chunk of compute there to get a few marginal quality gains.
"no" is an abbreviation for "number" and "č" is the correct translation for that interpretation: https://en.wiktionary.org/wiki/%C4%8D%2E (though Wiktionary only has Czech)
Why not roll out your own translation service to translate English to the specific language used on Taiwan? This way you don't have to sit in the waiting room.
>Google Translate has not been accurately translating into Traditional Chinese (as used in Taiwan) for a while now. A lot of times the translation would sound like how Chinese is used in China instead.
Isn't Taiwan a part of China, though? I am not trying to justify Google not caring about Traditional Chinese as used in Taiwan. I am merely commenting on the author's view of China and Taiwan as two different entities, when almost all countries don't recognize Taiwan as a separate entity.
Whatever your standpoint on that, the site proceeds to explain there is a specific locale that suggests a Taiwan-specific translation as opposed to one that is commonly used on the mainland, but it does actually result in something that is used in Taiwan.
- mega → 巨型 (百萬)
Really? Do you really want to translate mega to 百萬 (million) in every context?
*Edit: every, not any
- text → 文本 (文字)
Seriously? 文本 is the formal, academic way to say it, even in Taiwan. Has this person never been in any academic environment?
- through → 通過 (透過)
Seriously???