Hacker News
Citrine: Localized Programming Language (citrine-lang.org)
53 points by MindGods on July 18, 2020 | 71 comments


I don't get it at all. Nothing stops you from writing your code in your own language in most other prog langs (as long as they have Unicode support). The reason a lot of us chose to write in English is because we want to share our code and English is the most universal lang we have right now. As a non-native English speaker I most definitely see the problem in this, but it is also an imperfect world and communication is one of those problems which will never have a perfect solution.

Programming languages should imo be steps ahead of this from the outset. They have simple rules that can be parsed without understanding the words themselves. I could probably understand some Rust code written in Mandarin after studying it for a while. In our line of work, clear communication is important. Spoken languages are not good ways of accomplishing that.


I've said too much about this subject in the past [1], but my litmus test for "localized" programming languages is Korean support (both because I speak it natively and because it is very different from most Indo-European languages). Citrine fails it spectacularly. At the very least it is evident that Citrine only ever cares about languages with prepositions (e.g. `x on: 'greet:' do: { ... }`).

[1] https://news.ycombinator.com/item?id=21352775


You don't have to go as far as Korean - that BS fails H A R D even with other Germanic languages.

It's like whoever did the translations just took the English version and chose the first entry in the dictionary without even a second thought.

The translations are definitely machine translations and not done by native speakers. This results in hilarious constructs that are hard to decipher even when knowing the context.

As soon as I saw inconsistent grammar and even simple words being completely mistranslated, I dismissed the whole thing.


“Citrine only ever cares about languages with prepositions”

You’re confusing keyword/identifier localization for natural language. I agree with everyone who says NL is the wrong problem to solve, but that’s not what Citrine is attempting to do.

Look at it this way: Citrine code is NOT natural English either. At most, it is a sort of “pidgin English” where individual words can be readily understood but the grammar is wholly artificial.

For instance, a Chinese or Japanese dialect should continue to use whitespace to separate identifiers. It’s a programming language, not a natural one, and that difference is the point. Trying to “fake it” leads to dead-end mistakes like AppleScript. Trying to do it for real creates exactly the sort of ambiguities and imprecision that a programming language is meant to eliminate.

It’s about finding the right compromise between human accessibility and machine precision, and anything that breaks down barriers between the traditional US English-dominated programming world and the billions of humans who speak something else has got to be a step in the right direction.


I definitely agree that localized languages can only reflect a subset of human languages (and it's of course true even for English-based languages), but that doesn't make my point moot. If the author did really care about other languages and wanted to pursue the fixed syntax, every identifier should have been strictly nouns or verbs and nothing else; for example (say) `animate(source: X, target: Y)` instead of `animate(from: X, to: Y)`.


“every identifier should have been strictly nouns or verbs and nothing else”

Totally agree. TBH I don’t think the authors’ enthusiasm, while admirable, is quite matched by their knowledge and experience. A bit more up-front learning could probably save them disappearing down some obviously wrong paths.

As you say, they should focus on verbs and nouns, and getting those to machine-translate precisely. Nailing that would be a big deal just in itself. If they later want to experiment with adding prepositions to make code read “more naturally”, make the system add those on a per-language basis according to per-language rules laid down by native-language speakers.

Thus in English the method signature might present as `animate(from: source, to: target)`, because that’s what English readers expect and like. But that doesn’t mean that `from` and `to` should be translated to every other language, because that’s just an exercise in cutesy-clever nonsense, burying what is significant under what is not, and software already suffers from more than enough of that.
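
A minimal Python sketch of that idea, assuming a hypothetical design in which the canonical argument roles are nouns and any prepositions live in a per-language presentation table. The `PRESENTATION` table and `render` helper are invented for illustration; only the `animate`, `source`/`target`, and `from`/`to` names come from the comments above.

```python
# Hypothetical illustration: canonical argument roles are nouns, and any
# language-specific labels (such as English prepositions) live in a
# per-language presentation table maintained by native speakers.
CANONICAL_SIGNATURE = ("animate", ("source", "target"))

# Assumed presentation table; only the English entries come from the thread.
PRESENTATION = {
    "en": {"source": "from", "target": "to"},
}

def render(signature, lang):
    """Render a signature, falling back to the canonical noun when a
    language defines no special label for a role."""
    name, roles = signature
    labels = PRESENTATION.get(lang, {})
    params = ", ".join(f"{labels.get(role, role)}: {role}" for role in roles)
    return f"{name}({params})"

print(render(CANONICAL_SIGNATURE, "en"))  # animate(from: source, to: target)
print(render(CANONICAL_SIGNATURE, "ko"))  # animate(source: source, target: target)
```

A language without an entry simply shows the canonical nouns, so nothing forces a preposition onto a language that has none.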


There's little reason, I suppose, why internationalizing the code of a software program is any less challenging than internationalizing an application.

That is to say: language keywords, function and variable names, comments, documentation, and perhaps even multi-token structures might all require associated translation strings.

And as with translating an application (or any text, really), the source material might be written in one native language, and only later be translated into others based on demand and interest.

Software projects have the interesting property that partial translations might make sense under some circumstances. A commonly-used API function could be translated first, for example.

It'd be impressive to work on a project where multiple developers are each individually contributing to literally the same codebase, but each reading and writing it in their own native language(s).

(clearly a situation like that would require a lot of input from translators too - ideally with context for the software language and the project itself)


Korean is special because it still has the one remaining Unicode bug in the spec: U+FFA0 HALFWIDTH HANGUL FILLER is still a valid identifier character. http://perl11.org/blog/unicode-identifiers.html

Almost nobody got that right.
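
One way to check how a given runtime treats that code point is to ask it directly. The sketch below uses only Python's standard `unicodedata` module and `str.isidentifier()`; the `describe` helper is made up for this example, and the result reflects Python's own identifier rules and bundled Unicode data rather than the Unicode spec itself.

```python
import unicodedata

def describe(ch):
    """Return the Unicode name, general category, and whether Python
    itself accepts the character as a complete identifier."""
    return {
        "name": unicodedata.name(ch, "<unnamed>"),
        "category": unicodedata.category(ch),
        "python_identifier": ch.isidentifier(),
    }

# Compare an ordinary letter with the filler character from the comment.
for ch in ("a", "\uffa0"):
    print(f"U+{ord(ch):04X}", describe(ch))
```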


The site isn’t clear on it, but from downloading the translation package (http://citrine-lang.org/downloads/headers.tar.gz), it seems translation is a simple string replacement. I don’t think that’s enough to cover all languages.

https://en.wikipedia.org/wiki/Non-English-based_programming_... has a list of languages that do at least as well (I think AppleScript tried to do better, but can’t find examples)

Of course, switching language for the programming language itself won’t translate identifiers, so multi-language teams still will have to decide on the language(s) used for those.
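
To make the limits of plain string replacement concrete, here is a small Python sketch. The keyword table is illustrative (two Dutch words chosen for the example) and is not taken from Citrine's translation files; the thread does not show how Citrine actually applies them.

```python
import re

# Illustrative keyword table (not taken from Citrine's translation files).
KEYWORDS = {"if": "als", "while": "zolang"}

def naive_replace(source):
    """Plain substring replacement: also rewrites text inside identifiers."""
    for english, local in KEYWORDS.items():
        source = source.replace(english, local)
    return source

def token_replace(source):
    """Replace only whole word tokens that exactly match a keyword."""
    return re.sub(r"\w+", lambda m: KEYWORDS.get(m.group(0), m.group(0)), source)

code = "if notify: while_count = 1"
print(naive_replace(code))  # als notalsy: zolang_count = 1
print(token_replace(code))  # als notify: while_count = 1
```

Even the token-level version only fixes the mechanical problem. It still knows nothing about word order, string literals, or words whose translation depends on context.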


“it seems translation is a simple string replacement. I don’t think that’s enough to cover all languages.”

Honestly, simple string replacement isn’t good enough to cover any languages. Homonyms and synonyms, anyone? I suspect machine translation will be harder with code simply because there’s a lot less contextual information around individual words compared to ordinary prose with which to make a best guess as to meaning.

“I think AppleScript tried to do better, but can’t find examples”

Dr Cook’s HOPL paper on AppleScript gives an example of French and Japanese dialects (p20; can’t paste it here as it’s an image):

www.cs.utexas.edu/~wcook/Drafts/2006/ashopl.pdf

Not really the same thing as Citrine as it relied on manual localization of an application’s resources (custom keywords defined in AETE terminology, vs labels and tooltips on GUI controls). And, as you say, it did not do anything to localize user-defined words.

In any case, AppleScript is a really good example of how NOT to design an accessible language syntax. All the rigidity and tolerance of a machine language, with all the complexity and ambiguity of a natural one faked on top. (Plus all the fun that comes with arbitrary keyword injection—argh.)

This is why I say artificiality is good. Humans aren’t after Shakespeare, they’re after high-level understanding of program code.


“Homonyms and synonyms, anyone?”

I think, for this case, that’s mostly solvable by being careful to not make assumptions. For example, you should create separate translation strings for the ‘for’ in “for…in” and in “for…each”, and for the ‘define’ in “define function” and “define procedure”.

Also, when (not if; if your language is successful, it will happen) translators report problems, you should ‘simply’ not hesitate to introduce new ‘clones’ of to-be-translated words.

Even ignoring languages with different word order, that wouldn’t make things perfect. For example, there will be languages where the correct way to say “define” depends on the (perceived) plurality or gender of the function name.

I think the correct way to handle this is by letting the translator produce a grammar for your language that produces the same tokens as the ‘original’. That probably is beyond many would-be translators, though.
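
The "separate translation strings" suggestion can be sketched as a catalogue keyed by the word together with the construct it appears in. Everything here is hypothetical: the construct names and the placeholder strings stand in for whatever a real translator would supply.

```python
# Hypothetical catalogue keyed by (keyword, construct) rather than by the
# bare English word, so each use can be translated independently.
# The values are placeholders, not real translations.
CATALOGUE = {
    ("for", "for-in"): "FOR_A",
    ("for", "for-each"): "FOR_B",
    ("define", "function"): "DEFINE_A",
    ("define", "procedure"): "DEFINE_B",
}

def translate(keyword, construct, catalogue):
    """Look up a keyword in context; fall back to the original word
    when the translator has not provided an entry yet."""
    return catalogue.get((keyword, construct), keyword)

print(translate("for", "for-in", CATALOGUE))    # FOR_A
print(translate("for", "for-each", CATALOGUE))  # FOR_B
print(translate("while", "loop", CATALOGUE))    # while
```

Adding a new "clone" when translators report a clash then means adding one more key, not changing the lookup.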


This is just silly. The whole point of having an abstraction like a higher level programming language is so that we can all work together on tasks instead of spending time on language details.

Unless you are working on an isolated system in an isolated setting that will be destroyed rather than merged, none of this helps. It's the same as localised programming some people do today where you simply can't work together because you lost the most important common denominator.


The Chinese version looks like just bad machine translation.

`Object` is translated to `宾语` which means the grammar component object in subject, verb, object, etc.

The `power` operator is translated to `功率` which means A measure of the effectiveness that a force producing a physical effect has over time.

`Ceil` is translated to `细胞` which means Cell ???

:-(


> `Object` is translated to `宾语` which means the grammar component object in subject, verb, object, etc.

Sounds right (though OOP treats objects artificially as grammatical subjects, they are the things on which functions operate, and thus more like what would normally be grammatical objects; they are the patients rather than the agents of actions.)


There is an established, domain-specific translation for this and many other terms of art (in this case, 对象)
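
As a rough illustration of why the dictionary matters, the Python sketch below prefers a domain glossary and flags anything that falls through to a general-purpose dictionary. The entries are the ones quoted in this thread; the two-table layout and the `lookup` helper are assumptions for the example, not a description of Citrine's tooling.

```python
# Entries quoted in the thread: the general-purpose renderings that were
# criticised, and the established term of art for "Object".
GENERAL = {"Object": "宾语", "power": "功率"}
DOMAIN = {"Object": "对象"}

def lookup(term):
    """Prefer the domain glossary; mark general-dictionary results so a
    human translator can review them."""
    if term in DOMAIN:
        return DOMAIN[term], "domain"
    if term in GENERAL:
        return GENERAL[term], "general (needs review)"
    return term, "untranslated"

for term in ("Object", "power", "Ceil"):
    print(term, lookup(term))
```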


Ouch. Sounds like they’re using a general-purpose, not domain-specific, dictionary for their translations. That might suffice for an initial private proof-of-concept, but not for a public audience.

Imagine translating a medical or legal textbook without knowing the proper professional medical/legal terminology. Those target audiences will tear it a new one, and quite right too.

+1 for authors need to do their homework better.


The word "object" in object-oriented programming is best translated as "thing" or "entity". It has nothing to do with grammar; it's representative of "things" with properties and capabilities. If you want to get grammatical, what of ergativity?


> Citrine is one of the first∗ embeddable∗∗, general purpose, localized programming languages (...) designed to allow every man to write code in his mother tongue

This concentration of asterisks had me laugh out loud. Anyhow, this reminds me of so-called “auxiliary” constructed languages, which are supposed to be easy to learn for as many people as possible all over the world. In reality though, they’re almost without exception wholly Eurocentric, built mostly on Germanic and Romance languages, which barely covers one of the main language families just in Europe... likewise, this Citrine promises grossly more than it delivers.

Though we shouldn’t stop at that; if we look away from what it claims to be, what it is seems pretty neat. For education, maybe this is great.


I once wrote an internationalized parser for another (more established) language. As long as you can output an AST, this works.

User feedback was interesting. Some non-native speakers of English objected to this, on the basis that it was a lot easier for them to distinguish language keywords (in English) from user symbols (typically in the user's language); language acted as a kind of syntax highlighter for them.

Also, they felt it disorienting to express in their own native language programming constructs that they'd learned in English. For example when one translates "a `while` loop", the keyword `while` is often left untranslated as there is no equivalent domain-appropriate word in the target language, and the generic translations don't "sound" right.
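
A toy version of the "as long as you can output an AST" approach, assuming the localization happens in the lexer: each language maps its own surface words to the same language-neutral token kinds, so everything downstream is shared. The keyword tables are illustrative and are not taken from the parser described above.

```python
import re

# Illustrative surface forms; the canonical token kinds are what the
# parser (and therefore the AST) would see.
SURFACE = {
    "en": {"while": "WHILE", "if": "IF"},
    "nl": {"zolang": "WHILE", "als": "IF"},
}

def tokenize(source, lang):
    """Return (kind, text) pairs; keywords become language-neutral kinds."""
    keywords = SURFACE[lang]
    tokens = []
    for text in re.findall(r"\w+|[^\w\s]", source):
        if text in keywords:
            tokens.append((keywords[text], None))
        elif text.isdigit():
            tokens.append(("NUMBER", text))
        elif text[0].isalnum() or text[0] == "_":
            tokens.append(("NAME", text))
        else:
            tokens.append(("OP", text))
    return tokens

# Both spellings produce the same token stream.
print(tokenize("while x < 10", "en"))
print(tokenize("zolang x < 10", "nl"))
```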


“they felt it disorienting to express in their own native language programming constructs that they'd learned in English”

Trust me, going from natural-language English to programming-language English is just as discombobulating. (Been there, done that; made a deliberate point of remembering that experience.)

If you’re only testing it with existing programmers, you’re bound to skew your results because anything new and different to what they’re already used to is a disruption to their established flow. As in any scientific research, designing the right control for your trial is critical to avoiding GIGO.

A better test would be taking groups of English and non-English non-programmers, and teaching both of them entirely from scratch. (This assumes, of course, that each group’s teaching materials have been human-localized to the same standard, and all trainers follow the same script.) That way you avoid polluting your results with preconceptions and existing biases.

Out of interest, is there any more public information of your work here? iris-script[1], my own end-user-friendly language project and obvious anagram, has a way to go before I can begin to explore localizability, but it’s on the TODO list so I am collecting links to relevant material. Ta.

--

[1] https://github.com/hhas/iris-script


> A better test would be taking groups of English and non-English non-programmers, and teaching both of them entirely from scratch.

In a way, we already have this test today. Most of the world's programmers are non-native speakers and, by and large, they learn programming languages where keywords are based on English. I've heard people thinking that language localization would be a cool thing, but I haven't heard anyone complain about it not being available.

> Out of interest, is there any more public information of your work here?

No, my comment is all the public info there is about it. My $.02, localizability wouldn't be what I would worry about for a proof-of-concept language.


It also is bad for googling. Imagine using a version for a not so popular human language. Chances are you would need to know the English and/or Mandarin versions of language constructs anyways. If so, how much does writing in your native language add?


But isn’t that true when googling for ANY information?

It’s not a coincidence that big search engine vendors are also busy on machine translation. It’s not a zero-sum game/catch-22. Rather than look at where we are in 2020, consider where we should be in 2030. As automatic on-the-fly translation of general search results improves, programming-specific search results will naturally improve too.


That’s true for any information, but why would you introduce that problem for your (certainly initially) fringe programming language?


Hi, I am the creator of the language, my name is Gabor. You can ask questions if you like.

To answer some:

- Yes, we use machine translations, they serve as an example, they are far from perfect. Some language files are translated by native speakers. I think the website needs to be more clear about this.

- I use gendered language because coding is a men's job, women belong in the kitchen! ;-). No, just joking. Women are also welcome to become Citrine users. I just think the opening sentence is beautiful, it combines the concepts of male and female in a lovely, natural way ignoring today's PC-bullshit.

- No, Emoji-language is not allowed in the core. I only support natural languages. Endangered languages (EGIDS6 and higher) are also welcome. There is no limit.

I understand that there will be a lot of hate because of this language. I even received death threats over it. When a young developer I worked with brought up the idea I even laughed at him. However as I thought it over, the idea began to grow on me and I longed for a purely Dutch programming language (I had created one as a child for the C64 by just overriding the BASIC tokens). I figured that, if I longed for such a thing, maybe others do as well. I decided to share my code after some years just to give anyone interested some kind of basis or just discuss it.

It is important to realize that Citrine is trying to strike a balance. Programs will never read like a book. However, having a programming language using your own words and grammar just feels better and makes me more productive; I also tend to make fewer mistakes. The problem with just mixing Dutch with English programming languages is that it is extremely ugly, and you never know when it's justified to use Dutch or English, especially when interacting with established English conventions, 3rd party software libraries or embedded languages in code (like shell or SQL). The other solution, translating everything into English, is just horrible. I have encountered so many bugs that stemmed from miscommunication because of translation issues to English that I believe this will become a dead end eventually. One technique I am working on that might help to improve the readability even further is simple macro processing, so you can say 'create a new Object' instead of 'Object new'.

Anyway, if you have any questions let me know, always happy to answer ;-)
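
For readers wondering what "simple macro processing" could look like, here is a minimal Python sketch of a textual rewrite pass. It is a guess at the general technique, not Citrine's implementation; the `MACROS` table and `expand` function are hypothetical, and only the 'create a new Object' / 'Object new' pair comes from the comment above.

```python
import re

# Hypothetical macro table: each pattern captures a name and rewrites the
# phrase into the message-send form quoted in the comment above.
MACROS = [
    (re.compile(r"\bcreate a new (\w+)\b"), r"\1 new"),
]

def expand(source):
    """Apply every macro in order and return the rewritten source."""
    for pattern, replacement in MACROS:
        source = pattern.sub(replacement, source)
    return source

print(expand("create a new Object"))  # Object new
```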


“I use gendered language because coding is a men's job, women belong in the kitchen! ;-). No, just joking. Women are also welcome to become Citrine users. I just think the opening sentence is beautiful, it combines the concepts of male and female in a lovely, natural way ignoring today's PC-bullshit.”

Dude, really. If you gave a crap about effective communication you would not just have said that.

Hell, you would not even have thought of saying that; never mind typing it, reading it back to yourself, and then hitting Post cos you still think it’s a good idea. SMH

Your enthusiasm is admirable but your limited expertise is clearly showing. Instead of saying that you’re here to answer our questions, you should be the one who’s listening to our criticisms and then asking searching questions of us. A bit more humility and a lot less hubris. You and your product will be a lot better for it.


If you have technical criticisms please share.


Go read the entire thread then, because that’s all you’ll be getting out of me and probably quite a few others now thanks to your awful attitude.

Programmers like you are the reason I finally taught myself how to code, so I would never have to depend on your sort for anything. You’re a smug, condescending martinet with a grossly inflated sense of your own specialness, and the sooner you grow up/the world kicks you to the curb, the better.

So here’s me expressing my freedom of thought and expression by having nothing more to do with you.


I applaud your effort. Programmers, of all people, should understand the effect that language has on shaping thoughts.

My biggest question is - why couple a new programming language and the translation system? Either system alone would be difficult to get a foothold. Wouldn't you be better off picking an existing programming language that you really like - one with good semantics and lots of libraries - and implementing the translation system on top of that? Some sort of Lisp would seem to be a natural fit. What does your language offer to someone who is already comfortable with English-based programming languages?

(sorry for double-commenting, but I didn't want to start a merged thread for two very different topics)


When I created Citrine the original purpose was not to make it localized, just as readable as possible. However, the language evolved into something different. At a certain point I figured that the grammar allows for a lot of flexibility which could be used to turn it into a localized language.


>I just think the opening sentence is beautiful, it combines the concepts of male and female in a lovely, natural way ignoring today's PC-bullshit.

Honestly, I am astonished that you can be so conscious of the effect of language on thinking in the context of programming, and yet so tone-deaf when writing your copy. This is not "today's PC bullshit", it is an effort to eradicate structural sexism going back at least half a century. It is not lovely and natural to use "man" as a default for "men and women" - it rings strangely to the modern ear, and you have attracted a number of comments about it. Maybe 40 years ago taking this position would have been forgivable, but in 2020 you are very much on the wrong side of this battle.

I commend this article by Douglas Hofstadter to you. It was written in 1982: http://leeclarke.com/courses/intro/readings/Hofstadter_Chang...


Love it. I phrased it this way just to trigger you. There is no structural sexism except maybe in some Islamic countries - but I don't hear anyone about that. Women are free to do whatever they want. If they feel offended by words they probably don't belong outside the house.


Well congrats on being a Grade-A douche. I strongly recommend you go find a new hobby—one you can play by yourself—because with salesmanship like that your project is already dead.

https://jaxenter.com/wp-content/uploads/2017/04/women-in-com...


The Citrine license does not forbid women from using it, that would be true sexism. Abolishing words because of 'gendered language' is not something I believe in. I believe in freedom of speech and freedom of thought. Citrine is a project that cannot live without a free mind. As such, conforming to groupthink would kill Citrine and there would be nothing left to share anyway.


>I phrased it this way just to trigger you.

This is unfriendly.

>If they feel offended by words they probably don't belong outside the house.

I'm afraid that with this you move from "tone-deaf" to "overtly sexist".


I admire your efforts, but it seems to me it would be easier just to get everyone on Earth to agree to speak the same language. That would merely be impossible.


The grammars of languages are so different that I can't imagine how this should work in practice.

Sadly the site is dead (503).


Works fine for me (the site, that is).

Spoiler alert: it doesn't work at all in practise.

Grammar is one thing, but the "translation" even fails with single words.

It's basically complete garbage.


This is comical. In Japanese nil is translated to ナイル, like Nile River. It looks like their machine translation software aggressively corrects "errors" in input.


In Java (among others), you can use any unicode names. Yes, reserved words stay the same ("if", "public", etc), but most of the code still reads like native.

Also, I've seen business ERP languages that are completely localized. Those who program in them usually laugh at it; they say it's not hard at all to get used to English keywords, business logic is what's hard. Even better, the fact it's English means it's built-in.
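
The same holds for Python 3, which accepts non-ASCII identifiers while keeping its keywords in English. A tiny example with German names, chosen only for illustration:

```python
# Keywords (def, return) stay English, but names may use any characters
# that Unicode classifies as valid in identifiers.
def fläche(breite, höhe):
    """Area of a rectangle: Fläche = Breite * Höhe."""
    return breite * höhe

print(fläche(3, 4))  # 12
```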


Is anyone aware of tooling or languages in existence that allow for maintaining codebases with multiple supported languages? I'm imagining something similar to a git hook that performs translation at an AST level, where a library can provide a "strings database" that maps its public members into arbitrarily many languages. I could imagine that for multinational companies it might even be worth hiring dedicated translators to maintain manual translations as part of such a system, although it would definitely be desirable for it to fall back to machine translation.

That way, anyone could check out a version of the code that's localized to their native language, including language primitives, standard library calls, variable names, comments, etc. without fracturing the whole library ecosystem by language boundaries.
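
A rough Python sketch of what AST-level translation with a "strings database" might look like, using the standard `ast` module. The `STRINGS` table, the `Renamer` class and the `localize` helper are hypothetical, and Dutch is used purely as an example target. Note that `ast.unparse` needs Python 3.9+ and drops comments, so a real tool would need a comment-preserving parser.

```python
import ast

# Hypothetical "strings database" for one library, mapping its public
# names into a target language (Dutch here, purely as an illustration).
STRINGS = {"total": "totaal", "price": "prijs", "count": "aantal"}

class Renamer(ast.NodeTransformer):
    """Rename identifiers at the AST level, leaving string literals alone."""

    def __init__(self, table):
        self.table = table

    def visit_Name(self, node):
        node.id = self.table.get(node.id, node.id)
        return node

    def visit_arg(self, node):
        node.arg = self.table.get(node.arg, node.arg)
        return node

    def visit_FunctionDef(self, node):
        node.name = self.table.get(node.name, node.name)
        self.generic_visit(node)  # also rename arguments and body
        return node

def localize(source, table):
    """Parse, rename, and print the code back out (Python 3.9+)."""
    tree = Renamer(table).visit(ast.parse(source))
    return ast.unparse(tree)

source = "def total(price, count):\n    label = 'price'\n    return price * count\n"
localized = localize(source, STRINGS)
print(localized)

# Translating back with the inverted table recovers the original names.
inverse = {local: english for english, local in STRINGS.items()}
print(localize(localized, inverse))
```

Because the rename works on names and not on text, the string literal `'price'` is left untouched, which is the main advantage over search-and-replace.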


>variable names

Ouch. Naming things is, famously, one of the two Hard Problems of computer science. How much harder if you have to name everything, down to the last loop iterator, thousands of times - once for each language that humans speak?


This is cool! Has anyone here taken a computer science course in another language?

I have wondered if non-English-speaking universities have to teach their CS courses in 25-50% English, just due to the syntax of programming languages generally being in English.


I took a CS course in Polish and had no problem at all with programming languages syntax in English.


I have! I actually really enjoy taking computer classes to learn computer languages that I already know in languages that I am studying, e.g. I took a Java class "Initiation à la programmation (en Java)" at the École Polytechnique Fédérale de Lausanne on Coursera. It is a great way to learn new words and build up your vocabulary. I am looking for more German courses now.


I think a localized programming language like this could be useful as a stepping stone while teaching gifted kids, for instance (before moving on to a “real” programming language like Python or Java).


The grammar of a localized programming language should be specifically designed. Only replacing the keywords and operators seems strange in some sense.

Besides, the example in my first language is unreadable.


This actually looks cool! Besides the "localized" stuff, it borrows some good parts from Smalltalk, JavaScript and Lisp. I'm curious now.


They lost me at dynamic scope.


I'd love to have this sort of i18n for Python or JavaScript, and a good kid-friendly environment. Scratch does i18n well. Text-based programming doesn't. There's a gap in age between programming and English fluency.

What's odd to me is that the page isn't internationalized. Their explanation: "As a developer, you have to know some English. Nobody is going to change that anytime soon."


The first? Algol 68 allowed implementations to have language-specific versions of keywords. I remember seeing a French language Algol 68 book in the stacks at the OU library that had listings with French keywords. I don't remember what they did with the portmanteau words that Algol 68 (and much more so in the Revised Report) favored, like "ouse" and "elif".


I remember using Université de Grenoble ALGOL 60 circa 1968 on an IBM 7044. All of the keywords were in French. I don't remember if anyone hacked it to add English equivalents. It was kind of fun to say 'DEBUT' for begin.


If my memory doesn't fail me, Microsoft Office Basic in the 90s had localized keywords. Word, Access, Excel had English keywords in the US edition and Italian keywords in the Italian one. Probably the same in all the other countries. I don't remember if programs were portable across languages. VBA is 100% English now and has been for a long time.


This is the first language and open source project I have contributed to. I would never have believed that one day my name would land on a page of a programming language site. Thanks to open source, thanks to Gabor, and to all that contributed to making me reach this level.



I first wondered why there is no English version of this language, because there was no “en”… apparently Citrine thinks “us” is a language…


I just looked on the translations to my native language and it is hilarious. Need more popcorn.


Weird how a language that exists to be more welcoming and accessible uses needlessly gendered language on its website.

"Designed to allow every man to write code in his mother tongue" could easily be written as "designed to allow every person to write code in the language they are most comfortable with."


It did say that, before they ran the site through Citrine.


that's actually quite funny ;-)


> designed to allow every man to write code in his mother tongue. Hopefully, by doing so, Citrine will make coding accessible to a wider audience.

One day that audience might extend to women.


My thoughts exactly! But it looks like OP is Dutch, in which language Google Translate tells me "mens" can mean "person", so hopefully it is just an ESL thing.


Which is just a perfect example of why anything "universal" is quite hard in practice when these misunderstandings happen even between English and Dutch.


Not really, it is more like a counter-example; if the author had been writing in his native language this might not have been an issue. So it supports the claim that allowing people to work with their native languages would reduce mistakes.

Of course this also implies that the translations should be done by professional translators, instead of the authors themselves.


Which I'm sure the authors would agree with:

> The Citrine community is working hard to provide translation files. We use machine translations if we can't find a translator yet. We appreciate any help to improve language support!


Well, no, 'mens' translates to 'human' and wouldn't normally be used in this sentence (it'd put unnecessary emphasis on the species of who would be programming, which just feels weird).

I can't think of a Dutch version of the same sentence that wouldn't make the subject explicitly male (well, there are some archaic versions that would use 'alleman' but I can't think of one that wouldn't sound weird).


Supposing, of course, that this is an ESL problem, I'd expect the original Dutch sentence to have used the Dutch word 'men', which means the same as the English 'one' as in "one must not ...", or the 'man' in 'mankind'.


Ah, but can it speak Emoji? Because, bless them.

Not the first time I’ve seen a “let’s make coding more accessible” project make this mistake, alas.


[insert your own crying-with-laughter emojis here, cos HN’s comment form ate mine]


“Man” obviously means “a human being” in this context.


Sorry to be dismissive, but it's almost always essential to allow international contributors, and English is by far the most common international language.



