Mini – The Minimal Language (minilanguage.com)
171 points by aethertap on July 19, 2023 | 85 comments


I note that with a vocabulary of 1,000 words it is roughly 10x the size of Toki Pona, a conlang which also aims for minimalism.

That said, Toki Pona's goal is to help clarify thought, whereas this seems to intend to prioritize communication more highly.

https://en.wikipedia.org/wiki/Toki_Pona


There's an extended discussion about the relationship and differences between the two here: https://minilanguage.medium.com/mini-the-minimal-language-3f...

"The spark that led me to create Mini was realizing that a micro-language like TP could actually work: there’s no reason in principle a language with a limited word-count couldn’t have a simple, complete, and unambiguous grammar alongside a vocabulary based on intelligible word roots designed to handle most aspects of everyday discourse...."


Though if you speak Spanish and English you'll probably be able to guess most of the words, so you may find it easier to read initially than Toki Pona.


I've been making a game system in which the core mechanic is using a limited language to describe magical effects, so it's a long way away from the intended purpose for both Toki Pona and Mini. However, I've found that Mini is far easier to work with and more expressive for my purposes because it's easier to put structure into the statements using the particles to indicate what part of speech is intended for each word. The selection of words also seems to be surprisingly well-chosen, because most of my use cases have been pretty straightforward to express.

I haven't really tried to limit the system to just Mini Kore (which is also 120 words, like Toki Pona, and would be a more direct comparison), mostly because Mini's current size actually seems to have the right feel. It might be an interesting experiment though.


Help clarify thought? While being actively hostile to numbers?


I’m not acquainted with TP; can you clarify what you mean here or provide a link about this?


https://lipu-sona.pona.la/11.html

I suppose I can agree with this page saying it "simplifies" thought, but not in the good way.


I think the concept of "simplest naturalistic language" may be intrinsically broken -- a "naturalistic language" is not simple. Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common). This tension is partly about how much a language user must know/consider when speaking/listening and how efficiently you can say things.
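To make the "regular rule plus exceptions for common cases" idea concrete, here is a minimal sketch: a small lookup table for the frequent irregular verbs named above, with a default -ed rule for everything else. The table and function name are hypothetical, and the spelling handling is deliberately simplified (no consonant doubling, no -y to -ied), so it only illustrates the structure of the tradeoff.

```python
# Hypothetical toy model of "regular rule + exceptions for frequent words".
# Only the irregular forms mentioned in the comment are listed.
IRREGULAR_PAST = {
    "go": "went",
    "be": "was",
    "have": "had",
    "make": "made",
    "do": "did",
}


def past_tense(verb: str) -> str:
    """Return a simple past form: check the exception table first, then apply -ed."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"  # bake -> baked
    return verb + "ed"  # walk -> walked (simplified: no consonant doubling)


if __name__ == "__main__":
    for v in ["walk", "bake", "go", "make", "borrow"]:
        print(v, "->", past_tense(v))
```

A speaker (or program) has to store the whole exception table, but in exchange the most frequent verbs get short, memorized forms.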

I cannot find a citation quickly, but I recall years ago reading a paper about simulated agents "evolving" a language in a game context where agents had to indicate items to one another, by sending messages which were subject to a noisy channel. Items had multiple attributes (think "small red square", "big green triangle" etc), and experimenters could vary both the noise in the channel, and the entropy of the distribution over items. Naturally if "small red square" is 99% of the things you have to communicate, and there is low noise, agents invent an abbreviation for it. If there's a huge amount of noise and a relatively even distribution over items, then "small small green green triangle triangle" or similar becomes more likely. Languages very naturally reflect both the things people discuss and the environment in which they discuss them.
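This is not a reconstruction of that paper (no citation was found). It is a sketch using two textbook coding ideas that match the two regimes described: Huffman coding, where a very frequent item ends up with the shortest code (the "abbreviation" case), and a repetition code with majority-vote decoding, where sending each word several times lowers the error rate on a noisy channel. The item names, probabilities, and function names are hypothetical. Note that doubling each word, as in "small small green green", can detect a disagreement but cannot settle it, so the sketch uses odd repeat counts.

```python
import heapq
from itertools import count
from math import comb


def huffman_code_lengths(freqs: dict[str, float]) -> dict[str, int]:
    """Return each item's codeword length under a binary Huffman code."""
    tie = count()  # tie-breaker so heapq never has to compare the dicts
    heap = [(p, next(tie), {item: 0}) for item, p in freqs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        return {item: 1 for item in freqs}
    while len(heap) > 1:
        # Merge the two least likely subtrees; every item inside gets one bit longer.
        p1, _, a = heapq.heappop(heap)
        p2, _, b = heapq.heappop(heap)
        merged = {item: depth + 1 for item, depth in {**a, **b}.items()}
        heapq.heappush(heap, (p1 + p2, next(tie), merged))
    return heap[0][2]


def repetition_error_rate(p: float, n: int) -> float:
    """Chance that majority-vote decoding of n copies (n odd) gets the word wrong.

    Each copy is independently garbled with probability p. Simplification:
    decoding succeeds only if the intact copies form a strict majority.
    """
    return sum(comb(n, k) * (1 - p) ** k * p ** (n - k) for k in range(n // 2 + 1))


if __name__ == "__main__":
    skewed = {"small red square": 0.97, "big green triangle": 0.01,
              "small green triangle": 0.01, "big red square": 0.01}
    uniform = {item: 0.25 for item in skewed}
    print(huffman_code_lengths(skewed))   # the frequent item gets the shortest code
    print(huffman_code_lengths(uniform))  # all items get equal-length codes
    for n in (1, 3, 5):
        print(n, "copies:", round(repetition_error_rate(0.2, n), 4))
```

With a 20% chance of garbling each copy, one copy fails 20% of the time, while three copies with a majority vote fail about 10.4% of the time: redundancy trades message length for reliability.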


Your general point is a good one, but I don't think irregular verbs are the best example of error-correcting redundancy or evolved shortcutting. In most cases they are just a relic of genealogy and don't serve those purposes:

> Most English irregular verbs are native, derived from verbs that existed in Old English. Nearly all verbs that have been borrowed into the language at a later stage have defaulted to the regular conjugation.

https://en.wikipedia.org/wiki/English_irregular_verbs#Develo...


Irregular verbs (go/went, and so on) conjugate (change according to tense and subject) using rules just like regular verbs, except that they have different rules. The irregular verbs use Germanic conjugations (cf. man/men, child/children) whereas the regular verbs use grammatical constructions from other source languages.


In every language, for every word, there will be some history and source. And one can always declare a "different rule" around exceptional cases ... but that's kind of vacuous, and speakers have to remember which words are subject to a minority "rule", so claiming they aren't "exceptions" seems disingenuous.

But if you look at the words in English for which we have "different rules", and you look at which words in other languages which have "different rules" ... they typically line up with frequency. You'll note that the small list of verbs listed above also happen to be irregular verbs in a lot of languages.


That assertion about frequency really needs some data to support it, because the only example I can think of that is very common in that regard is the word "to be", which is special in many ways other than its frequency.


While completely true, I think this misses the point which makes minimal "natural" language interesting. Sure you don't use one of these constructed languages in practice the same way you don't build your websites with Turing machine tapes. The question of interest is not one of practice but of theory, what is the equivalent of Turing completeness for natural language? What is the minimum criteria of grammar and vocabulary needed to span the space of conversational ability? In other words, what is the minimum needed for a language to even theoretically be "naturalistic" (even if no naturally occurring language ever looks like it in practice)?


Not saying those papers are wrong, but after 136 years, with millions of speakers from _most_ countries, Esperanto's speakers seem just fine without adding irregular verbs.


Turkish might be a better example: it's a real, natural language and highly regular.


Same with Japanese, apparently a completely unrelated, differently structured language.


Fun fact: A subset of the linguistic community conjectures that Japanese and Korean are part of the same family as the Turkic languages. https://en.wikipedia.org/wiki/Altaic_languages


Yes, but it's a really remote relationship, if it exists.


> Natural languages balance between regular rules (e.g. in English, we often add -ed to make the past tense of a verb) and exceptions especially for common cases ("went", "was", "had", "made", "did" because going, being, having, making, doing are all so common).

Yes, but different natural languages resolve this tension differently.

For example, Turkish is much more regular in its verbs (and in general) than English or German.


> The vowels are pronounced like they are in Spanish, Italian, German, and many other languages

... ok, this is annoying. Can't speak for Italian and Spanish, but in German vowels are pronounced differently depending on context. Later, it says the 'o' is meant to be pronounced like in "moment". Moment is pronounced differently in American and UK English. And neither are like Italian "momento" or like German "Moment".

> All of the consonants (b d f g j k l m n p r s t v) are pronounced exactly the same as they are in English. Phew!

Not helpful.


The article doesn’t say it, but the pronunciation is identical to Japanese, which I am fond of in general.

In college Japanese class we were taught the phrase “ah, we soon get old” for a, i, u, e, and o respectively. I found it to be simple and satisfying.


Lots of the world's languages have exactly five vowels corresponding to [a], [e], [i], [o], [u], but Japanese is a bit unusual in that the Japanese [u] is unrounded, so it can be more precisely (narrowly) transcribed as [ɯ]. Spanish has a more "typical" set of five vowels. You would presumably be understood all right if you used Spanish vowels in Japanese but you wouldn't sound like a native, so pronouncing [ɯ] correctly usually wouldn't be one's first priority in learning Japanese. In Russian and Turkish, on the other hand, you would have to make a distinction between [u] and [ɯ]. (I'm not an authority on any of this; I just dabble in phonetics.)


That clarifies a bit, but still leaves me confused at some of the choices made.

Why include both sounds "r" and "l", when they can be tricky to distinguish for some speakers, and then use Japanese as pronunciation guide? The sounds "m/n" are also easy to mix up. Same with "b/v", which are pretty much interchangeable to a lot of Spanish speakers. I think the number of consonants could have been reduced considerably.

I like how the language flows though. It seems like a goal has been to avoid consonant clusters. It feels kind of like Swahili, though I don't speak that at all. The only input I would have on this point is that the verb/noun/adjective markers "i/a/e" would be hard to distinguish against words ending in a vowel, which seems to happen a lot. In rapid speech I see that becoming a problem that would cause it to flow less well, or breed forth a need for a de facto fixed word order for clarity.

What if every word started with a consonant and ended in a vowel, including those three markers? What if we completely got rid of problem pairs like "rl/mn/bv", by removing one or both in each pair? Could we get by using mainly voiced consonants? I kind of want to fork this project and try it out.
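The proposed fork's phonotactics can be stated as a simple check: a valid word is one or more consonant-vowel syllables. The sketch below is hypothetical: it starts from the consonants the Mini site lists (b d f g j k l m n p r s t v), drops l, n and v as one arbitrary way of resolving the "rl/mn/bv" pairs, and assumes the five vowels a, e, i, o, u.

```python
# Hypothetical inventory for the proposed fork: Mini's listed consonants minus
# l, n and v (an arbitrary choice of one member from each problem pair).
CONSONANTS = set("bdfgjkmprst")
VOWELS = set("aeiou")


def is_cv_word(word: str) -> bool:
    """True if the word is one or more consonant-vowel syllables (CV, CVCV, ...)."""
    if not word or len(word) % 2:
        return False
    return all(
        word[i] in CONSONANTS and word[i + 1] in VOWELS
        for i in range(0, len(word), 2)
    )


if __name__ == "__main__":
    for w in ["kore", "buku", "a", "nama", "kor"]:
        print(w, is_cv_word(w))
```

Under this rule a bare vowel marker like "a" is rejected, which is exactly the point of the proposal: the i/a/e markers would need a leading consonant so they can't blur into a preceding vowel-final word.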

To be clear, while I am being critical in this comment, I want to explicitly say also that it is an impressive job to have made a new language, and refine it to this level of minimalism. Perhaps I am wary after having "wasted" a lot of time on Esperanto.


If you noticed, a ton of lexical material in Mini comes from Germanic and Romance languages, so even basic knowledge of English / German / Spanish / French is immediately helpful. Ditching the R/L distinction, or B/V distinction, or M/N distinction would kill most of this familiarity.


Indeed - it is pretty easy to "get" the language as it is (for Europeans). But I'm sympathetic to the suggestion of ditching those distinctions - not everyone has an Indo-European language background. My Japanese wife still can't distinguish those sounds, even after a decade of trying. She has to watch me closely to separate those sounds. Plus F/H, which is also a problem.


In pretty much any language there's no single point on the vowel chart that actually identifies a vowel - it's a spectrum with numerous allophones. Conlangs like this one are generally constructed in such a way as to allow maximally wide spectrum that is still distinctive. So if you pronounce it the way you speak English, that's still fine.

If you want more precision, generally speaking, the value of the character in IPA will match the actual sound value, except for "j".


You are going to get a lot of weirdness mixing a, e and i up.


Yeah, I mixed up two unrelated things that I shouldn't have. They don't recommend using English as a guide for vowels for that exact reason - it's one of the most nonsensical orthographies in that respect. But the website only says "like English" for consonants, where it kinda sorta works (although they forgot to say that "g" is always hard).


That's true going between English and Romance languages anyway


I think the best way to see it is like this: vowels like in Spanish and consonants like in English. The Duolingo Stories have pronunciation with a TTS engine https://duostories.org/mini-en and the dictionary has pronunciation with an actual human voice (mine) https://jprogr.github.io/buku-name


> All of the consonants (b d f g j k l m n p r s t v) are pronounced exactly the same as they are in English. Phew!

"T" as in "Trent" is the same as "T" as in "butter" for this person?

"S" as in "pass" is the same as "S" as in "passion" too?

"G" as in "go" is the same as "G" as in "gel" as well?

There's a reason humans invented the IPA.


That sounds like they mean the default pronunciation, and those are pretty clear.

C and G have some ambiguity, but they didn't include C and it should be obvious that G is not going to be the same as J.

> There's a reason humans invented the IPA.

99% of people can't use IPA without an example chart.

But also: "Each letter matches its International Phonetic Alphabet pronunciation with the exception of J, which is the English /dʒ/."
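The quoted rule is simple enough to write down as a letter-by-letter transcription. This is a sketch under an assumption: the consonants are the ones listed on the site (b d f g j k l m n p r s t v) and the vowels are a, e, i, o, u. The exception matters because in IPA the symbol [j] is the "y" sound of English "yes", not the "j" of "jam".

```python
# Assumption: Mini's letters are the vowels a, e, i, o, u plus the listed consonants.
LETTERS = set("aeiou") | set("bdfgjklmnprstv")
# Per the quoted rule, every letter is its own IPA symbol except J.
EXCEPTIONS = {"j": "dʒ"}


def to_ipa(word: str) -> str:
    """Transcribe a Mini word letter by letter into broad IPA."""
    out = []
    for ch in word.lower():
        if ch not in LETTERS:
            raise ValueError(f"not a Mini letter: {ch!r}")
        out.append(EXCEPTIONS.get(ch, ch))
    return "".join(out)


if __name__ == "__main__":
    print(to_ipa("kore"))
    print(to_ipa("j"))
```

Letters outside the inventory (such as "c", which the list above doesn't include) raise an error rather than guessing a sound.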


> 99% of people can't use IPA without an example chart.

How many % of people who are interested in constructed languages?


"G" as in ".gif", of course! And why not "T" as in "potion". :)


> > The vowels are pronounced like they are in Spanish, Italian, German, and many other languages

> ... ok, this is annoying. Can't speak for Italian and Spanish, but in German vowels are pronounced differently depending on context. Later, it says the 'o' is meant to be pronounced like in "moment". Moment is pronounced differently in American and UK English. And neither are like Italian "momento" or like German "Moment".

I listened to all four (UK/US English, German, Italian), and the 'o' in moment sounded the same to me.


In English, that "o" is a diphthong, for starters: usually something like [oʊ].

Whether it sounds the same to you or not in different languages/dialects depends on how many "o-like" sounds your native language has. If it's just one, then e.g. [o] and [ɔ] can be hard to distinguish, because you're used to treating them as the same thing manifesting in different contexts.


And that's fine, a lot of sounds are hard to distinguish if you're not familiar with that language. For English, here's a dictionary entry: https://dictionary.cambridge.org/dictionary/english/moment Showing IPA for both US and UK: it's a slightly different diphthong. For German and Italian (I think), it shouldn't be a diphthong at all. Not everyone will hear a difference, which makes it even more helpful to precisely define what the sound should be. Or just don't put any rules in your instructions, or tell people that it's flexible or whatever.


Reminds me of some similarities to Arabic. Arabic uses root words, usually 3 consonants, that mean many similar things with surrounding letters. K-T-B means writing. Kitab means book. Kitaba is writing.

The script is hard, and you have to learn enough of the roots and recognize them to get the meaning. Indonesian is slightly similar: tinju is boxing and petinju is boxer. Prefixes on roots to build up and guess meaning from context.
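The root-and-pattern idea can be sketched as template filling: the pattern holds the vowels and affixes, and numbered slots take the root consonants in order. The function name and slot notation are hypothetical, and the transliteration is simplified (long vowels written as a doubled letter, so "kitab" above appears as "kitaab"). Indonesian-style prefixing is plain concatenation only in easy cases like tinju → petinju; the prefix often changes the root's first sound, e.g. tulis → penulis.

```python
def apply_pattern(root: tuple[str, ...], pattern: str) -> str:
    """Fill the digit slots 1, 2, 3 in a pattern with the root consonants."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


KTB = ("k", "t", "b")  # the root associated with writing

# Simplified transliteration: long vowels written doubled (aa) instead of ā.
PATTERNS = {
    "1i2aa3": "book",      # kitaab
    "1i2aa3a": "writing",  # kitaaba
    "1aa2i3": "writer",    # kaatib
    "ma12a3": "office",    # maktab
}


if __name__ == "__main__":
    for pattern, meaning in PATTERNS.items():
        print(apply_pattern(KTB, pattern), "=", meaning)
    print("pe" + "tinju")  # Indonesian: boxing -> boxer (simple case only)
```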


I like Arabic's diacritic system, which makes the pronunciation of a word you've only previously read predictable.

I remember once pronouncing "stoic" as "stoyc" instead of "stow-ik" in English, for example. My limited knowledge of Arabic indicates that one diacritic produces "aah"-like vowels, another produces "ooh"-like vowels and another "iih"-like vowels, and even though some other modifiers come into play later, it's still predictable how a word is pronounced just from reading it. I'd be happy to be corrected if I am wrong.


If you're a fan of diacritics, you can write stoïc in English. Apparently the New Yorker uses them still. https://www.newyorker.com/culture/culture-desk/the-curse-of-...


Redundancy in a natural language is not necessarily a bug. It can be considered a feature. Speech is transmitted over a noisy channel (as anybody knows who has ever tried talking or screaming to a friend on a busy street or at a concert), so it needs to contain redundancy for error-correction purposes. A lot of that is context (there are only a handful of things my friend could be screaming at me at a given moment), but a lot is that hearing parts of a sentence is enough to infer what it's about.
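Here is a toy sketch of how context plus partial hearing can recover a message: treat context as a short list of phrases the friend could plausibly be shouting, mark the parts lost to noise, and pick the candidate that agrees with the most of what was actually heard. Characters stand in for sounds, and the phrases and function name are hypothetical.

```python
def best_match(heard: str, candidates: list[str], erased: str = "?") -> str:
    """Pick the candidate phrase that agrees with the most heard characters.

    Characters lost to noise are marked with `erased` and count as neither a
    match nor a mismatch. Candidates of a different length are skipped.
    """
    def score(candidate: str) -> int:
        return sum(
            1 if h == c else -1
            for h, c in zip(heard, candidate)
            if h != erased
        )

    same_length = [c for c in candidates if len(c) == len(heard)]
    if not same_length:
        raise ValueError("no candidate has the heard length")
    return max(same_length, key=score)


if __name__ == "__main__":
    context = ["watch out", "over here", "come here"]
    print(best_match("w?tc? ?ut", context))
    print(best_match("?ver ?e?e", context))
```

A third of the characters can be lost and the message still comes through, because the small candidate set (the context) makes the surviving fragments redundant.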

Many different contexts make use of this redundancy. Air traffic communications is another example where synonyms are chosen to minimize misunderstandings yet still be concise.

Minimizing redundancy also minimizes synonyms, which can be undesirable. Another example is poetry.


  Like Wilkins’ Real Character, a priori languages attempt to decompose the elements of thought into distinct atomic units and build up larger linguistic constructs from those simpler units.
  
  A posteriori languages like Esperanto take a very different approach: rather than starting from scratch with a set of basic concepts, they attempt to pave over the unnecessary grammatical quirks and complications of natural language to create something which is simple and easier to learn.
  
  Mini’s goal is to fully realize both of these visions: to have, at once, a set of linguistic primitives which can be combined to discuss any topic, while ensuring that those primitives are themselves borrowed as directly from natural languages as possible.
https://minilanguage.medium.com/mini-the-minimal-language-3f...

Yeah, I don't get it. In Esperanto you don't use particles, but you change the endings of the words, according to their roles in the sentence. How is Mini fundamentally different?


Looks reminiscent of pidgin English.

https://www.bbc.com/pidgin


Toki Pona is another one.

I made this 4 letter language: http://move.rupy.se/file/talk.txt


Maybe I'm just dumb, but it claims simple phonetics, and even after reading the pronunciation ("Say it like you mean it") section I still don't really understand how it's supposed to sound.

The singular "a" between verbs and objects, is that a long or short vowel sound - do vowels have short and long distinctions at all? It says all the consonants are pronounced how they are in English - but consonants don't have just one sound?


There's no short/long distinction.

For consonants, when they say that, what it usually means is "consonant by itself" (i.e. not a part of a digraph like "th", and not followed by a vowel like "e" or "i").


Vowels sound a bit more like in Spanish. The dictionary has pronunciations https://jprogr.github.io/buku-name/


[flagged]


Breaking the site guidelines like this will get your account banned, so please don't.

If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it.


Kore cult (fanatics) lost "Mini" momentum: https://minilanguage.medium.com/speaking-mini-kore-552f787db... "minimal" needs criteria given speech coding & compression & associated parameters.


What is the Kore cult?



Seems interesting as a project, though I doubt the stated goals can be achieved with any new constructed language. I mean, congratulations on the great work, thanks for sharing this with the world, and I wish you all the luck in achieving these goals, sure.

Let’s say this is technically the best of the simplest naturalistic languages conceived so far for use as an international auxiliary language. That is an ecological niche that is already crowded.

Providing the best technical solution, as we know, is only the optional cherry on the tip of the iceberg. What really matters for the stated goals is the community. You can definitely attract a few conlang lovers with some elegant proposals, but that’s about it.

So what’s the plan for getting Mini adopted by a large, sustainable community? What ideals, values and social goals is it attached to? What does Mini bring to the table, to help its target community thrive, that no previous constructed language already provides for people whose sole and primary consideration isn't ease of learning?


Interesting, this reminded me of Toki Pona ( https://en.wikipedia.org/wiki/Toki_Pona), but it has different goals (Toki Pona was not intended to be an auxiliary language)


I don't think many conlangs will ever succeed because of the relatively huge popularity of Esperanto.


Conlangs are fun, but what will probably happen is we will evolve more efficient communication protocols as bandwidth increases via multimedia communication, Neuralink, etc.


LLMs will talk to each other using embeddings.


FTR, LLM embeddings are a lot less efficient than ASCII for communication and are likely to stay that way.


absolutely not. Transformer layers already communicate using embeddings, and ASCII would be absolutely less efficient there.


And how many bits are in an embedding vector?


12k for gpt3.


It is not bits, but weights


So somehow ascii is less information dense than 12k 32-bit floats per token?
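The raw-size comparison behind this exchange is quick arithmetic. The assumptions are labeled in the code: 12,288 dimensions (the hidden size of the largest GPT-3 model, matching the "12k" above), 32-bit floats as stated in the comment, and roughly 4 characters of English text per token at 8 bits per ASCII character. This only compares bits on the wire, not how much information each representation actually carries.

```python
# Assumptions: 12,288-dim embeddings (largest GPT-3), 32-bit floats,
# ~4 characters of English per token, 8 bits per ASCII character.
EMBEDDING_DIMS = 12_288
BITS_PER_FLOAT = 32
CHARS_PER_TOKEN = 4
BITS_PER_ASCII_CHAR = 8

embedding_bits = EMBEDDING_DIMS * BITS_PER_FLOAT    # raw size of one embedding
ascii_bits = CHARS_PER_TOKEN * BITS_PER_ASCII_CHAR  # raw size of one token as text

print(f"embedding: {embedding_bits:,} bits per token")
print(f"ASCII:     {ascii_bits} bits per token")
print(f"ratio:     {embedding_bits // ascii_bits:,}x")
```

So per token, the embedding is about 12,000 times larger on the wire. Whether that extra size buys proportionally more meaning is the actual disagreement in this thread.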


Mini's logo appears to be the M from MIT's logo.


It’s very similar and would easily be confused, but looks like it’s a bit wider and more sparse than the MIT one


Came here to say that! I thought it was one of MIT's projects!


It tickled me to see a changelog for a language there, so I had GPT write one for English too.

Change Log for English Language Evolution

Version 1.0: Proto-English (450 CE)

    Initial release of Proto-English, a West Germanic language spoken by Anglo-Saxon tribes.
    Basic grammar and vocabulary established.
    Development primarily led by "Linguistic Trailblazers."
Version 1.1: Viking Invasion Patch (850 CE)

    Introducing Old Norse influence due to Viking invasions.
    Added Norse loanwords and grammatical structures.
    Integration efforts led by the "Language Fusion Guild."
Version 2.0: The Great Vowel Shift (1400 CE)

    Major phonological update affecting long vowels and diphthongs.
    Unprecedented vowel sound migrations across the language.
    Executed by the "Phonetic Alchemists Consortium."
Version 2.1: Shakespearean Lexical Expansion (1600 CE)

    Extensive vocabulary enrichment, drawing inspiration from literary works by William Shakespeare.
    Introduction of numerous idiomatic expressions.
    Collaborative effort involving "Poetic Linguists Guild."
Version 3.0: British Empire Localization (1800 CE)

    Localization effort to adapt English for various regions within the British Empire.
    Incorporation of local dialects and vocabulary.
    Localization project overseen by the "Imperial Language Commission."
Version 4.0: American Revolution Fork (1776 CE)

    Creation of American English variant with notable vocabulary and spelling differences.
    Introduction of simplified grammar rules and new expressions.
    Led by the "Patriotic Language Architects."
Version 5.0: Globalization Update (20th Century)

    English becomes an international language due to global interactions.
    Inclusion of loanwords and phrases from various languages.
    A collaborative effort by the "Cultural Linguistic Exchange Taskforce."
Version 6.0: Digital Age Upgrade (Late 20th Century)

    Vocabulary expansion to encompass computer science and technology terms.
    Introduction of internet slang and acronyms.
    Driven by the "Cyber Lexicographers Consortium."
Version 7.0: Modern Dialect Divergence (21st Century)

    Increasing divergence between regional dialects due to globalization and migration.
    Emergence of unique vocabulary and idiomatic expressions in different English-speaking communities.
    Monitored by the "Dialectologists Guild."


The change from 1.1 to 2.0 completely missed the Norman invasion which introduced Norman French as the new language in town. Everything which happened later was heavily influenced by that.


Now translate the 20 minute tutorial to mini.


> Mini is a man-made language designed to be as simple as possible.

Okay. Now I want to know about non-man made languages.


Well, not everything we're involved in and accidentally contribute to is necessarily _made_ by us, in the sense of being intentionally and purposefully brought into existence.

We just happen to learn one or more languages, and pass down a few of our mistakes and innovations to others, the vast majority of which have no effect on anybody.


"Made" and "evolved" aren't the same thing.


You can say that most languages aren't designed by people, but they were definitely made by people.


The original human language? Whale and bird songs?


I wonder how hard it would be to write a prompt that would get gpt 4 to translate back and forth to Mini?


I'm excited about the language but it's impossible to google anything about it :(


Yeah, it's hard to Google with that name. I have put together a page with info about the language https://jprogr.github.io/mini-resources


What would a programming language matching those design criteria look like?


The logo looks very similar to MIT's logo.


Chinese grammar is far simpler.


(2020)


More writeup from the dev when they did a Show HN: https://news.ycombinator.com/item?id=24386863


By sheerest chance I read the same anonymous Japanese basket weaving forum where a link to this was posted just before its appearance here. Well played OP.


Which forum? May I have a link?


It's from a ~daily Japanese language thread, the comment in question is simply 'reverse nihongo just dropped https://minilanguage.com/' and did not spur any further discussion. I just happened to notice it at the time and thought 'mildly interesting, maybe I should submit this to HN'. When I visited HN later in the day I was amused to see it already here.

/jp/ is not an adult imageboard but still somewhat NSFW, consider yourself warned https://boards.4channel.org/jp/thread/44068025


reddit.com



