Xs: a concatenative array language inspired by kdb+ and FORTH (cryptm.org)
84 points by kick on June 6, 2020 | hide | past | favorite | 27 comments


Sometimes I feel that, instead of APL/J/K being standalone languages, we should standardize a mini-language, as we did with printf, date formatting, or regexes, and embed it in all general-purpose languages.


Why a mini language instead of a library?

Python has Numpy to satisfy most array programming needs, for example.


Good point, and I'm not a big fan of DSLs. I think most of the time we should make a better API instead.

But regexes have been a success: they are rarely used, but when they are, you're glad you have them.

In fact you probably need to do both: re in Python is a library. A library that implements the regex DSL.

Maybe there is a way to create an API around numpy that translates a J/K/APL like DSL into numpy operations.

Just like regexes, this is not something you want to use often.

But I assume that, just like I don't want to write a whole parser when I just need to match/split/extract text, people doing array processing probably don't want to spell out every single step of very common operations in some cases.


I did exactly that a couple of years ago, turned out it wasn't hard to translate J expressions into the equivalent numpy code and then eval it in the current context. I'm afraid I've lost the code, otherwise I'd post it.
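For illustration, here is a minimal sketch of what such a translator could look like (the `eval_j` function and `VERBS` table are hypothetical, and this covers only a few monadic J-ish primitives, with none of real J's trains, adverbs, or rank machinery):

```python
import numpy as np

# Hypothetical mapping from a few J-like primitives to numpy operations.
VERBS = {
    "i.": lambda x: np.arange(int(x)),   # integers 0..n-1
    "+/": lambda x: np.sum(x),           # sum-reduce
    "|.": lambda x: x[::-1],             # reverse
    "#":  lambda x: np.size(x),          # tally
}

def eval_j(expr):
    """Evaluate a whitespace-separated verb chain right to left, J-style.
    The rightmost token must be a number (the noun)."""
    tokens = expr.split()
    value = float(tokens[-1])
    for tok in reversed(tokens[:-1]):
        value = VERBS[tok](value)
    return value

eval_j("+/ i. 5")  # sum of 0 1 2 3 4
```

The translation direction the parent describes (emitting numpy source and eval'ing it in the caller's context) would work the same way, just producing strings like `"np.sum(np.arange(5))"` instead of applying the functions directly.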


Numpy is a mini-language, effectively a DSL. It's just that Python offers some features which allow for embedding the DSL into the language, making it an internal DSL. In languages with less flexible/overridable syntax - for example, if indexing would be limited to a single expression only - you'd have to write all the indexing, searching, and assignment operations "long-hand", with method calls. Well, depending on the API this may still be workable, but it would be much less convenient, slower, and more error-prone. The external DSLs are a way around this.

TL;DR: if the language is permissive enough, a library may well provide a DSL for whatever it's doing, it's not an "either lib or DSL" situation.
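To make that concrete (my own example, not from the thread): numpy leans on Python's overridable indexing and comparison operators, and the same operation gets noticeably wordier once you spell it with plain function calls:

```python
import numpy as np

a = np.array([3, -1, 4, -1, 5])

# Internal-DSL style: overloaded comparison plus boolean indexing.
clipped = a.copy()
clipped[clipped < 0] = 0

# Long-hand style, roughly what you'd be stuck with in a language
# without overridable indexing (these are real numpy functions):
clipped2 = np.where(np.less(a, 0), 0, a)
```

Both produce `[3, 0, 4, 0, 5]`; only the first reads like a sub-language.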


To me, you get a DSL either by having a different parser, or by using a macro system.

If you just use the flexibility of your language syntax, you just create an API, not a DSL.

The L in DSL stands for language, which numpy is, IMO, not.


Well then, what is the difference between a DSL made using a macro system and one made with dynamic language features, reflection, operator overloading, and similar? Both approaches embed a sub-language into their parent language: both are DSLs, of the internal kind. A quick google for "internal DSL" returns, for example, this article from Martin Fowler: https://www.martinfowler.com/bliki/InternalDslStyle.html which seems to support my take on this.

> The L in DSL stands for language

I think you use too narrow a definition of a language, then. Any (sub)set of semantics can be called a language, with or without syntax. DSLs encode a set of semantic rules which are best suited for expressing solutions to problems in a given domain. Once you have the semantics worked out, you have to think about the environment where the language will be used. Sometimes it's useful to have the DSL code separate, in which case you end up with an external DSL, like HTML or YAML. On the other hand, sometimes you need tighter integration with some environment, in which case you end up with an internal DSL, like JSX, Datalog, Kanren, Rake, Gradle, Gulp, various URL dispatchers, BDD testing frameworks, and yes, Numpy.

The syntax of a DSL ("a different parser" vs. not) is secondary at best, and the implementation strategy (macros vs. runtime code generation vs. reflection and syntax overloading) is simply irrelevant to something being a DSL or not.

Take a look at PyParsing or LPEG (for Lua) or scala.util.parsing or Parsec (for Haskell). These are all embedded DSLs meant for creating parsers. You work with them in exactly the same way you'd use YACC: you describe the grammar and get a parser for that grammar. That grammar description is done with a language, in all cases. That language - implemented either externally, or internally with host language features - is a DSL.
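To sketch the idea without pulling in any of those libraries (the combinator names `lit`, `seq`, and `many` below are made up; PyParsing's real API differs, but the shape is the same): the grammar description is itself ordinary host-language code, yet it reads as a language for describing grammars.

```python
# Minimal parser combinators: each parser takes (text, position) and
# returns (value, new_position) on success, or None on failure.

def lit(s):
    def p(text, i):
        return (s, i + len(s)) if text.startswith(s, i) else None
    return p

def seq(*parsers):
    def p(text, i):
        out = []
        for q in parsers:
            r = q(text, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return p

def many(q):
    def p(text, i):
        out = []
        while (r := q(text, i)) is not None:
            v, i = r
            out.append(v)
        return out, i
    return p

# The "grammar" for one-or-more 'ab' pairs, written as plain expressions:
ab_pairs = many(seq(lit("a"), lit("b")))
```

`ab_pairs("abab", 0)` consumes the whole input and returns the parsed pairs, exactly as a YACC-generated parser would for the equivalent grammar, just expressed internally.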


The tutorial says that 2 + 5 yields 5. I originally thought it was some gotcha about trying to use infix operators in a concatenative language, but the documentation says that it's correctly re-written internally, so I think it's just a typo.

(Incidentally, I tried to C&P the relevant section from the documentation, and I can't highlight in the text at all. How does that work, and why?)


Come to Homer's BBBQ. The extra B is for BYOBB.


It appears the author was not inspired by the small binary size of k or FORTH interpreters.


Probably because this is a debug build, but yeah, 25 MB for the executable is quite a lot, considering gforth is 225 KB and J ~2 MB.

Well, Xs looks to be very early in development, so I think the large list of dependencies is due to an initial effort to get something up and running quickly. It'll probably get better over time. I think a production build of Haxe (also written in OCaml) was about 2 MB, so there's definitely room for improvement :)


I'm the author, yeah I need to ditch Core. It's fucking massive. I would have thought Jane Street gave a shit about binary size, but I guess not..


Looks really interesting! I will definitely play with it tomorrow. I used Forth for a bit in the past, and played with Factor a bit; my impression is that Xs looks a bit like the latter, with some parts from J on top (I don't know K, sadly, so I can't say what features Xs borrows from it).


Same impression.


What is this obsession with terse languages that look like a composition of random characters? Didn't we learn from Perl?


I don’t think that “terseness” is “the problem with Perl,” any more than “too many parentheses” is the problem with Lisp.

APL is a terrific language that is terse. It might not be the right tool for a job where you are constantly recruiting interns to hack on a massive code base, but it’s a fine language for its designed purposes.

Which tells me that terseness in-and-of-itself is not a problem. It presents a tradeoff that may not be right for everyone.


APL came before Perl, and APL is regarded by many to be one of the best languages overall.

I'd recommend reading Iverson's "Notation as a Tool of Thought" for a new perspective.

Plenty of people, from Richard Stallman to Alan Perlis, have found great value in APL. Even if just looking at terseness, Chuck Moore, the creator of FORTH, is considered to be one of the best, if not the best, programmers of all time. He credits it all to terseness.


Perl is a beautiful language written by a linguist who thought hard about salience. This is why it’s such a shame that most other languages don’t have sigils, and that some prefer ugly hacks like Hungarian notation instead :smacks forehead:


Terse languages are a must for single or few-user experiments, e.g. research. Not everyone is implementing a commercial website that needs to be understood by 50 people.


I still use good variable names in projects where I'm the only person reading the code. I wouldn't remember what things mean otherwise.


We have a different definition of terseness.


What do you mean?


I mean that programming in a terse language has nothing to do with obscure variable names. The point of terse languages is that knowing the primitives lets you write concise programs comfortably. It doesn't mean you have to use single-letter variables, which, unlike primitives, do change between programs.


Isn't the way those languages make concise programs by having short symbols instead of descriptive names?

Comparing NumPy to APL, it looks that way.

https://analyzethedatanotthedrivel.org/2018/03/31/numpy-anot...
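A concrete comparison (my own example): J's average is the train `(+/ % #)`, i.e. sum divided by tally. The numpy version composes the same three primitives, just spelled as words:

```python
import numpy as np

prices = np.array([3.0, 1.0, 4.0, 1.0, 5.0])

# J:      avg =: +/ % #      NB. sum divided-by tally
# numpy:  the same composition, with named primitives
avg = np.sum(prices) / np.size(prices)
```

Either way you must know what the primitives mean; the symbols just make the composition shorter to write and rewrite.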


Well yeah, referring to my comment above: the primitives are symbols, and it's expected that the user knows them, just as you're supposed to know the keywords in any language.

Good (professional) APL, though, will intersperse those primitives with meaningful variable names.

Whether one thinks it's worth the effort to learn the symbols vs. the keywords is a matter of opinion. But if one dares to call oneself, say, a Python programmer, I'd argue that anybody would expect you to know the keywords anyway.


Indeed.


I don't think this comment should get downvoted this much - it may sound a bit provocative, but the question is an interesting one, and worth getting answered.

> what is this obsession with terse languages

It's not an obsession at all. It's a recognition of the fact that there are certain areas where the terseness is helpful, sometimes exceptionally so. It doesn't fit everywhere, and indeed there are domains where it would be downright detrimental. Still, people who prefer compact languages know this and are sure to carefully evaluate the circumstances before choosing APL or Forth. There are "rabid fans" - who would like to rewrite the whole world itself in their beloved language - in every language community, but in my experience, they're scarce among APL, J, or Forth users.

> that look like a composition of random characters,

Most of the tersest programming languages are meant for interactive, REPL-driven development. In PLs like APL, Forth, J, K, Cat, and now Xs, you are supposed to build your programs as a series of short expressions that you create and evaluate interactively in the interpreter. In such a setting, where you're likely to rewrite and evaluate every expression often, it's faster and more convenient to work with as concise a syntax as possible. It's simply a different development mode, and it has its strengths and, like every other programming model, its weaknesses.

> didn't we learn from Perl

First, Perl is nowhere near the terseness of APL or J, so I'm not sure what it is you think we should have learned from it. Further, Perl sigils are actually an elegant solution to the problem of type conversions in a dynamically typed language. Take a look at JavaScript and PHP, with all their implicit type-related shenanigans, then come back to Perl - chances are you'll appreciate the sigils quite a lot.

The most important thing with APL descendants is that they all have a minimal amount of core operations, which compose incredibly well. Xs is at a very early stage of development, and it will undoubtedly grow, but for now, it has but 5 functions bound to single-character symbols. You can check out the J vocabulary to see a mature, well-developed array language: https://code.jsoftware.com/wiki/NuVoc

It might look overwhelming at first, but it's essential to realize that what you see on that page is the entirety of J - there's nothing more to the whole language. It's not English-based, but math notation also isn't, and it's similarly concise. Plus, for a significant portion of the planet's inhabitants, English-based languages are just as foreign as J is to you.

To sum it all up: the conciseness of the APL-related languages, which comes from both heavy use of symbols for function names and the tiny amount of core concepts which compose effortlessly, opens the door to a style of programming which fits certain domains and circumstances very well. It may be hard to grasp without first putting some effort and playing with them for a bit... Still, the array and concatenative languages are powerful, and in the right circumstances, can provide an indispensable boost to programmers' productivity.




