A more practical reason, speaking as a language designer, is that C and C++-style type-before-name syntax is a nightmare to lex and parse, as you can't tell whether
A * B;
is a multiplication or a variable declaration, or whether
A<B, C> D;
is two comparisons joined by a comma operator or a templated variable declaration, without first knowing the names of all declared types.
This means in practice that you have to declare types before they are used in a file (which means forward declarations if they are defined later), that you can't separate lexing and parsing because the parser has to provide constant feedback to the lexer, and that misspelling a type name can lead to a syntax error! C++, not content with merely inheriting C's problems, throws in the “most vexing parse” as a bonus: `A B(C());` declares a function named B, not a variable constructed from a temporary C.
It's a historical problem in C. Originally, C had only built-in types. Structs were declared with
struct foo { int x; int y; };
which is easy to parse. Then came typedef. With user-defined types,
foo*bar;
isn't parseable with a LALR(1) parser until you've seen the definition of "foo".
Even the ordinary case
foo bar;
needs more than one token of lookahead to parse.
This is a headache for compilers, and a huge headache for anything that wants to work on single source files without seeing the included files.
It's a big win if files are parseable without their dependencies. The Pascal/Modula/Ada family all are. I think Go is. Not sure about Rust, but it probably is. C and C++, no.
(The hacks for template syntax in C++ are painful to think about.)
Yeah, the easiest way to do it is to return a different token type from the lexer for type names, which means the lexer has to look identifiers up in the symbol table as it sees them.
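As a rough sketch of that idea (hypothetical token and table types here, not any particular compiler's internals), the classifier boils down to a symbol-table lookup:

use std::collections::HashSet;

enum Token {
    TypeName(String), // identifier the symbol table says names a type
    Ident(String),    // any other identifier
}

// The classic "lexer hack": the lexer consults the symbol table that the
// parser maintains, so the same spelling lexes differently depending on
// which typedefs have been seen so far.
fn classify(name: &str, declared_types: &HashSet<String>) -> Token {
    if declared_types.contains(name) {
        Token::TypeName(name.to_string())
    } else {
        Token::Ident(name.to_string())
    }
}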
Amusingly, the Delphi dialect of Pascal has an ambiguity in its grammar from how it made function pointers work:
function g: Integer;
// ...
f(g);
Depending on the definition of f, this could be passing the function g by reference (as a function pointer), or passing the result of calling g.
Delphi looks at the argument type to resolve the ambiguity, but it stays awkward in overload scenarios, when there's a choice between Integer and function-pointer arguments.
At some point around about 2009, I added the ability to specify explicitly that you want the call, to eliminate the ambiguity.
Rust has made a point of being parser-friendly, to support tooling. The entire reason the turbofish ::<> exists is to avoid ambiguous grammar. (This operator seems to vex people, I don't get the hate.)
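For anyone unfamiliar: without the `::`, generic arguments in expression position would collide with the comparison operators, which is exactly the C++ `A<B, C> D` ambiguity again (example mine):

fn main() {
    // `(0..5).collect<Vec<i32>>()` could parse as chained comparisons,
    // so expression position requires the turbofish form instead:
    let v = (0..5).collect::<Vec<i32>>();
    assert_eq!(v, vec![0, 1, 2, 3, 4]);
}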
Being parser-friendly is not at all the same thing as being user-friendly. The difficulty of writing parsers is way overblown.
One thing to avoid is depending on the existence of the symbol table to parse correctly. (C++ has this problem.) Needing a symbol table makes life hard for tools like source-code formatters and syntax highlighters.
Another is to avoid raw string literals that reach back and try to unwind the results of previous "phases of translation".
Agree with a caveat: if the syntax has many ambiguities that are resolved by a clever parser, it'll be confusing for users, especially newbies, because they'll tend to get errors unrelated to what they were trying to achieve.
Both humans and computers parse code. Humans can handle ambiguity more easily than computers, but that doesn't somehow make the ambiguity user-friendly.
Do note that Rust the language is regular and easy to parse, but rustc the compiler actually performs bounded lookahead and contextual parsing on parse errors to detect common typos and invalid syntax that would otherwise be vexing to users. The nightly-only type ascription `:` syntax is a good example, because it crops up in several different contexts as a one-substitution typo and used to be terrible (may it burn in a fire). The turbofish is another case where we now suggest the correct syntax, but there are many, many more.
Surely that's an example of how parser-friendly syntax empowers tooling? I imagine that it would be much harder to guess intention and fixes from lookahead and context if those were already necessary to make sense of correct code?
I love the turbofish so much, not because it looks nice in code or anything, but because the name is a bit of harmless fun that makes the language feel inviting.
Ah, C not originally having typedefs explains why the syntax wasn't thrown out as too annoying to begin with. Syntax creep I guess.
An easy solution to the problem would be to do what e.g. Haskell does, where the capitalization of a name determines whether it is a type. For better or worse, C didn't do that.
On the other hand, doesn't forward declaration make sense? The code is declaring that it depends on something defined “somewhere else”. In a sense it is documenting the classes/structs better.
Years ago, I stumbled upon a Visual C++ 6.0 bug (if memory serves me right), just by playing around trying to understand C/C++ decls. It would crash on a stray:
*c;
at the beginning of a translation unit.
Didn't K&R have a tiny, tiny C declaration parser (printing out 'human readable' equivalents) as an example in their book? I think the caveat was that it needed to assume it was dealing with a declaration...
I was just thinking about pointer syntax today, and if you ask me, there are a lot of problems that could be avoided if language designers took a page out of Guido van Rossum's book and extended his idea of forced spacing.
Take the first example you've given, and let's just talk about variable declaration:
int* a;
int * a;
int *a;
That's all the same. That's wrong. Obviously, you can write a lexer and parser that doesn't give a crap, but the human mind does. If you think it doesn't, that's only because you've internalized the various cases.
It should be this:
int* a;
The type we're talking about is a pointer to a variable of type int: in other words, "a" is an int-pointer. If you were to create a macro, you'd do something like this:
#define int_ptr int*
Do you see what I'm getting at? The star in this case is a suffix, equivalent (in our minds) to "_ptr". Conceptually, it doesn't belong anywhere else than attached to the type. It's a compound type, conceptually.
Now, take the star being used in a different context:
int b = 10;
int* a = &b;
printf("%d\n", *a);
There, though we see the same character, it's a completely different thing. It's a dereference operator. Conceptually, it belongs attached to the pointer variable it is dereferencing; and since the star is already used in one context as a suffix, here it should be used as a prefix.
This is no good:
printf("%d\n", * a);
It doesn't matter that "this compiles." That's not what this is about.
Many C programmers (and programmers in other languages, even Python) are used to writing things like this:
int c = x*y;
That's wrong. Sure, the lexer and parser don't care. But that makes the language worse, for the human operator. "But it saves space!" Spare me.
The thing with this one example, using the star, is that what we have is the equivalent of a homonym. We have one sign that is actually three different words. Mandating spacing removes the ambiguity you're complaining about.
C is what it is, but if we imagine someone were going to write it today, they should incorporate the above and mandate spacing. For the sake of the humans. "ident type" is not the only solution.
There's a perspective that doesn't seem to be noted yet, which is that
int *f
is declaring that
*f
will be an int.
This perspective addresses why the star belongs with the name, why it's star instead of ampersand, why you need to repeat the star for multiple variables, why the brackets go after the name for arrays, and it will sort of get you where you need to go with function pointers (although there's an automatic promotion which means things will work at the call site that won't work for the type).
This isn't to say alternative constructions mightn't be a better choice in a new language, but it's much more parsimonious when considering C than memorizing a bunch of special cases.
I don't think that actually breaks things. A typedef isn't purely syntactic substitution like a macro.
If we have
typedef float *floatp;
then the following compiles:
float x, y;
floatp xp = &x, yp = &y;
while the following does not:
float x, y;
float* xp = &x, yp = &y;
You have defined a new type (which happens to be equivalent to an old type - note that we don't get nominal type-checking from typedefs); the above logic still holds.
People have been arguing about where the asterisk should be for a very long time. The main counter-argument to the construction you used is that multiple variable declaration immediately looks ugly given the current rules of C variable declaration:
int* a,* b; // ??
And that's usually why most C style guidelines use
int *a, *b;
Now, if we treated "int<asterisk>" as the full type name in the syntax then you get a much nicer result:
int* a, b; // much nicer!
Though that does have its own tradeoffs. But that would require actually changing the C syntax, at which point you might as well make it the postfix syntax.
I must disagree... the following sends the wrong message to the reader:
int* a, b;
Though I also agree that dereferencing is best without a space. Perhaps the correct suggestion is to not declare pointers and instances on the same line, but that's a convenience that many seem to enjoy.
C is just riddled with mistakes and this one finally gelled with me. Python taught me that whitespace is great for syntax and, in this case, I'm quite convinced that it should be used more firmly in most languages.
Python's syntax works against a lot of things, like first-class lambdas, case statements (yes, they can be emulated with if/elif, but it's not the same, even without fallthrough), pattern-matching in general, or assigning the result of a method-chaining pipeline to a variable.
I use python at $dayjob, and I run into the limitations of whitespace as syntax all the time.
I think a better (subjective of course) lesson might be to enforce a style.
I have been thinking about making a toy compiler (I wanted to write a borrow checker) that treats bad code as an error, solely aimed at numerical code - I have recently had "Scientific Programming in Python for physics etc." inflicted on me.
Slight tangent, but I think if Haskell enforced some kind of whitespace a la Python it would be much more approachable in real codebases. (Haskell is usually quite readable if you are just translating mathematics into code, but to me at least it feels dreadful as a productive language, because of the way a lot of functions seem to be dumped into the text editor in a lot of the code I have read.)
Not really relevant to the broader discussion, but Haskell's whitespace sensitivity is defined in terms of automatic insertion of braces and semicolons, and you can write it that way instead if you want. A few people do write Haskell that way (SPJ maybe?) but the community is mostly united around using whitespace. That said, it's useful to know about the braces and semicolons if you're generating Haskell code, because that's often easier. Also occasionally at the GHCi prompt, where it will let you keep things on one line.
If I'm describing a hypothetical language, then I'm actually describing a hypothetical language where you wouldn't declare more than one variable per line.
It's been a long, long time since C required a programmer to declare his variables at the top, and best practice argues that you declare a variable as close to its use as possible. So, there really isn't the same case for multiple variables on one line, whereas it may have been a little more forgivable, once upon a time.
Moreover, when I was first learning C, I learned from an O'Reilly book by Steve Oualline. I remember him saying, when it came to operator precedence, that coding style that relied on the rules was a really bad idea. I think he said something like, "Multiplication and division come before addition and subtraction, and use parentheses for everything else."
My bottom line is C has a little too much "convenience" to it. (Granted, to my taste.) It goes back to my van Rossum comment. I'm against special cases and loosey-goosey stuff.
The problem in C is more than just the `*`. Consider arrays, where C uses `int xs[];` instead of `int[] xs;`. There is no way to use #define to declare an int_array type like you did with int_ptr. (Function types also exhibit a similar problem.)
foo*bar; // multiplies foo and bar
foo* bar; // declares bar as pointer-to-foo
foo *bar; // declares *bar as foo (so bar is still a foo*)
foo * bar; // multiplies foo and bar
These are all unambiguous and there are only two semantics between them. You also have:
foo* bar,baz; // baz is a pointer
foo *bar,baz; // baz is a foo
foo *bar,*baz; // baz is pointer again
foo* bar,*baz; // baz is now a pointer *to* a pointer
This is all obvious - or at worst unambiguous - to a person reading the actual code (without trying to correct for the idiosyncrasies of a parser), so the language should either match that or spit out a warning about unsupported spacing.
A pointer to an int "should" be &int, not int*. That we use *, the dereference operator, to indicate pointers is wrong. * in a type means it's an address, but * in a value means it's not an address. That's nuts! Make it consistent and use & in both places. If you need to have a distinction between refs and pointers, it should be that pointers are nullable refs: &int? or some such.
I think you may be right, &int would be better than int*.
I did not want to stray too far from what C does now, for the sake of argument. I just wanted to say that the ambiguity in C is often because it's so loosey-goosey with whitespace.
More generally this falls into a discussion of context-sensitive vs context-free grammars, of which C++ falls into the former and Java into the latter.
Pretty much every programming language has a nicely parseable context-free "rough syntax" (my term I just invented) that can be written down formally for the language documentation and a parsing tool. And then every language also has a notion of "well-formed programs", which introduces a whole bunch of additional constraints on what programs should actually be accepted by compiler frontend.
Well-formedness includes type checking. But even without full type checking that can be done later, it also includes things like being aware, in C, of whether a given identifier is declared as a typedef in the current scope. So while C has a nice context-free "rough syntax" formally specified in the standard, its actual input language is context sensitive.
As for Java, the first example that comes to mind is that constructors must have the same name as the class they belong to. This "choose whatever identifier you like, but at some later point repeat that exact same identifier" is a very typical example of something that is not context-free.
You might disagree whether this constraint is part of what you consider Java's "grammar". So the answer to your question depends on what language level you are thinking of. But whichever level you apply to Java, you should apply the same to C++. C++ also has a context-free "rough syntax" in its standard.
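A Rust flavor of the same distinction, as a made-up example: the line below is perfectly fine "rough syntax" and gets past the parser; it is only rejected later, by the type checker.

fn main() {
    // Parses fine under the context-free grammar; rejected afterwards by
    // type checking (error[E0308]: mismatched types), not by the parser.
    let x: i32 = "hello";
}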
It's Turing complete because it could simulate a Turing machine, not because of metaprogramming. The language brainfuck is Turing complete, for example.
Checking if a Brainfuck program is well formed (i.e. can be run) is a linear time operation. In C++ this can take forever. They have different complexities.
The original comment was about Turing completeness, and it was defined incorrectly. I was giving an example of a dead-simple language that was Turing complete, because the claim was that metaprogramming made C++ Turing complete.
The claim wasn’t that C++ is Turing complete, that’s trivially true. The claim was that C++’s grammar is Turing complete. I don’t know if that’s exactly the right way to phrase it, but C++’s template expansion stuff is Turing complete.
Yes, but it is the difference from other programming languages.
In C you cannot encode a Turing machine that is executed by the compiler at compile time. In Brainfuck you cannot encode a Turing machine that is executed by the compiler at compile time. In C++ you can encode a Turing machine that is executed by the compiler at compile time.
No. You would need the ability to write unbounded loops or unbounded recursion. You don't have that with the C preprocessor.
Yes, you can do a lot with the C preprocessor. You can also do a lot in languages that only have bounded loops and are therefore not Turing complete. You can either express nonterminating computations (Turing completeness), or you can't (still powerful, but dramatically less poweful). This question is binary. There is no fuzziness, there is no approximation, there is no "quite close".
Context-free means something different from what you're saying here. This is a good discussion of this topic; I never knew C++ was so irregular and informal.
Arguably a programming language is ultimately a user interface, and the more intuitive the interface, the better.
In 2020, not sure we should care that much about how hard compilers have to work to achieve this. Computers and software are here to support us--we're not here to support them.
A parser tends to get confused in the same places where a less experienced human would get confused. Making a language that's easy to parse dovetails with making one that's easy to read.
That's a reasonable point. As long as it's the "good for humans" that's driving things, this makes total sense. It's "good for computers" but "bad for humans" that needs to go away these days.
C++ is stuck in this trap where because it's slow to compile, the compiler maintainers increase the amount of optimizations the compiler does to make it faster. Which of course makes the compiler even slower. Which motivates them to increase the amount of optimizations. Which makes the compiler yet slower.
I feel part of the problem with Rust is it doesn't have a quick-and-dirty mode that's fast and a production-ready mode that is slow but does all the checks. I do this with my C programs, using various formal analysis tools, which are slow, to vet the code before releasing it.
Checks in Rust are fast. In fact, you can run 'cargo check' to type-check without generating code, and that can finish in less than half a second.
Most editors did this on save before LSP came along.
For logic, this is fast enough because the type system catches enough mistakes and I don't need to run tests all the time.
For UI though, a faster iteration cycle would be nice...
The thing that will massively speed up rustc is the current re-architecting of it. We’re at the point of “few percent here, few percent there” with the current design. These add up over time, of course, but batch compilers are inherently slower than the newer style ones (after an initial compile).
To add to Steve’s point, Rust adds more checks in debug (non-release) mode, which can make the IR larger. So it’s not always the case that debug mode is faster to compile (though it generally is).
I'd consider compile time as part of the UI, so in that sense I think we agree. (If the compile is long because the implementation is poor, that needs to be fixed.)
Not sure what you mean on the dichotomy. If someone says that a language needs to have X because that will make things simpler for the computer, I say that they are wrong. The goal, the only reasonable goal, is to make things better for humans.
> If the compile is long because the implementation is poor, that needs to be fixed
The compile time could be long because the implementation is poor. But it's also possible that the specific requirements do not allow for a significantly faster compile time.
That's why the requirements matter. They determine the space of possible implementations. If the requirements eliminate all "fast" implementations, then the resulting user experience will be poor because of slow compile times.
The best example of this problem is probably SPARK, a variant of Ada which permits formal verification.
Its verifiers are awfully finicky, and tuning the parameters (including selecting the most appropriate verifier) can mean the difference between successful completion in a few seconds, and outright non-termination/timeout-with-failure.
It's true that the answer is to have better verifiers, but that's not just a matter of tweaking the verifier code, it's a serious research challenge. One of the most serious problems with formal methods is the ability to scale.
`int * A` and `A * int` both seem ambiguous unless a) you require "int-pointer" naming instead of "pointer-int", and/or b) you have a separator (like the title uses) so this becomes `A * B` vs `A: * B` which is indeed unambiguous but in a very different way.
i.e. without extra rules you can't tell either way, so it comes down to the extra rules. `A * B` is unambiguous for type-before-name if you require "pointer-int" since it would need to be `* A B` for A to be a type.
(quite possibly there are counter-examples when you get into weirder corners, but my point is that it's not as simple as presented)
in an unusual way. If it were a multiplication, the result would be thrown away, which would be pointless. Hence, it is a declaration.
But what if A overloads the * operator and the side effects are desired rather than the result? D has a philosophy that arithmetic overloads should be for arithmetic-like operations, not I/O, template metaprogramming, or other nonsense. Hence if you try it like that, too bad, so sad, it'll be treated as a declaration.
I wouldn’t mind if such a statement in isolation were a compiler error, and if languages in general were much stricter. So far, all languages I have seen that are very forgiving in terms of syntax (PHP comes to mind) seem to breed sloppy programmers, and software written in them often has stupid little bugs caused by typos.
This particular issue has not caused any "silly typo disease" problems that I'm aware of in D. In fact, pretty much nobody notices it, it just works the way people expect it to.
(There are some other things in D that are designed to discourage trying to overload arithmetic operators for non-arithmetic purposes. For example, < <= > >= cannot be overloaded individually, only as a group.)
Strong disagreement here. "type ident" flows with the data during assignment, doesn't confuse the infix operators, and doesn't misuse ":" from a human-language standpoint.
For example:
val x: String = "hello"
The type interrupts the flow of data from "hello" to x, so one thing that pops into mind is that this is typecasting the value to a string before storing it. Nope.
Another possibility I instinctively see this as is doing a comparison and assigning the result (either true or false in this case) to x. Nope.
And human-language wise, colon is "description: explanation" (or more generally: general to specific), which actually fits this syntax better:
val String: x = "hello"
...and at that point, just remove the extraneous stuff:
String x = "hello"
> The type interrupts the flow of data from "hello" to x, so one thing that pops into mind is that this is typecasting the value to a string before storing it. Nope.
> Another possibility I instinctively see this as is doing a comparison and assigning the result (either true or false in this case) to x. Nope.
You can write it as
val x = "hello" : String
if you prefer. In fact that's a great advantage of this syntax: any expression can be optionally ascribed with a type. If you write the type first then it becomes too intrusive (and too much like a typecast, which absolutely should be intrusive).
> And human-language wise, colon is "description: explanation"
True enough, but what other syntax would fit in postfix position? In human language we'd probably use commas ("Bob, chef"), but that seems a bit too ambiguous in a programming language.
I agree, though I could see the merit for a standalone declaration:
val x: String
x = "hello"
The type at this point is almost like a comment.
For declaration and assignment though, I agree that reading "ident: Type" is harder for me.
Perhaps an interesting idea would be to have the type at the end of the expression. Like so:
val x = "hello": String
Essentially, you're making a type assertion on an expression. Since it's an assignment expression (the value of which would be the assigned variable), it also type-checks the variable.
Most statically typed languages don't even need the type assertion in a case like this, though. A literal has a definite type (hopefully), so the type of x can be inferred.
val x = "hello"
Standalone declarations are the most important problem to solve here.
Yeah, type inference is preferred (and pretty common). I'm just saying that, if I had (or wanted to) set the type, at the end of the expression would be my preferred place.
Your examples have the things on the right and the type on the left, and then you used them to describe why examples with the [name] on the left and [type] on the right make sense...
The point of the list isn't to say "eggs and bacon is a kind of breakfast", it's to say "breakfast is eggs and bacon". It's "name: details about name", not "type: instance".
My intention was solely to counter the idea that using ":" to mean "is" or "is a" is somehow inconsistent with written English. I don't think it's always necessary or even desirable to match the use of symbols in programming with English orthography, in any case.
The use of a declaration with initialization as the example here muddies the water. Whatever syntax you use has to work for uninitialized declaration of variables, function arguments, and structure members:
val breakfast: String
fun serveBreakfast(breakfast: String)
struct MealPlan {
breakfast: String
}
Declaration with initialization just needs to be consistent with these.
Having a keyword for variable initialization makes parsing easier and less ambiguous. "let", "var", etc. are also used for this purpose, but I was going along with the example.
It is presented here as a matter of fact that name-before-type is easier to read. I’m not so sure. In math, or in languages where type info is optional, we often write “x = 5”. When type info is required, it is natural to evolve to “int x = 5”. Readers naturally focus on the latter part. When we write “x: int = 5”, the type info is in the middle. We cannot skip it even when we just want to focus on the name and value.
Many languages allow you to elide the type, which is another nice thing about the type following the identifier.
In Scala, in particular, types are not the assigned type like in C (where they also serve as the storage specification) -- they are assertions, that the compiler will check are compatible with the code.
So `val x: int = "hello"` is no good and the compiler can cut it short right there; this is especially useful as call-site documentation.
Conversely, a lot of languages will infer type from first assignment, so in e.g. TypeScript "let x = 5", x is inferentially typed as 'number' and the type checker will throw if the implicit constraint is later violated. This reduces the need for explicit type annotations, clearing up a lot of the visual and cognitive noise.
There is still a distinction between primitives and object types in Scala, so it's not correct to say that types in Scala aren't used for storage specification.
It's also the case that there are plenty of languages where all types imply storage specification and they support plenty of type elision.
> Sadly it's nowhere near as elegant for string literals.
What do you mean? For string literals you just do
let foo = "bar";
and that's it.
BTW, in Rust you can omit most variable type annotations since the compiler is able to infer them. You have to give type annotations to functions though.
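A small illustration of where the annotation can be elided and where it still earns its keep (my example):

fn main() {
    let x = 5;              // inferred as i32 (the integer default)
    let mut v = Vec::new(); // element type unknown at this point...
    v.push("hello");        // ...so v is inferred as Vec<&str> from use
    let n: u64 = 5;         // explicit annotation to override the default
    println!("{} {} {:?}", x, n, v);
}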
> I’m more fond of using .into(). It requires adding type hints in some cases, but for most cases it is shorter than the alternatives. Especially when passing a string literal to a function that requires String.
> Now that specialization for str::to_string() has landed, we can safely say that to_string() has the same performance as to_owned(), and thus to_string() should be used since it’s more clear
> I now strongly prefer to_owned() for string literals over either of to_string() or into().
You could argue that Rust strings are complex, because they are: having 8 "string-like" types (owned strings vs string slices (references), CStr/CString, Path/PathBuf, OsStr/OsString) is complex, but having three different methods doing exactly the same thing isn't.
Yes it's redundant, but what would you rather have: str as the only easily convertible type, with no to_string method? The only reference type without to_owned? No ability to use into for strings while it works everywhere else? Obviously, redundancy is better than these alternatives.
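Concretely, the three spellings under discussion all build an owned String from a literal; these are standard library methods, shown side by side:

fn main() {
    let a: String = "hello".to_string(); // via ToString (specialized for &str)
    let b: String = "hello".to_owned();  // via ToOwned on str
    let c: String = "hello".into();      // via From/Into; needs the annotation
    assert_eq!(a, b);
    assert_eq!(b, c);
}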
Interestingly, I find ident: Type significantly more difficult to read. Having the type information helps me contextualize what I'm about to read -- it narrows the mental search space I need to explore when parsing the name.
For example, knowing something is a float, double, int, or string can make an ident named "releaseTime" mean different things.
I also find that whitespace is more consistent when using Type ident, you get rivers where the spaces all line up, so all the type declarations AND ident declarations align. Whereas with ident: Type, I find it much more difficult because of the variable length of identifiers. (Yes, one could fix this by using tabs, but if idents vary in length by more than one tab stop, it becomes difficult to read horizontally.)
This feels a little nitpicky/idealistic, I don't think the post does a good job of conveying why it's more beneficial.
> This means that the vertical offset of names stays consistent, regardless of whether a type annotation is present (and how long it is) or not.
Why is this necessarily desirable? Strong typing systems have very expressive types, to the point where if something is typed correctly, most of the time my property names are just an alternative casing of the type. Types can be just as expressive or even more expressive than variable names.
> The i: Int syntax naturally leads to a method syntax where the inputs (parameters) are defined before the output (result type), which in turn leads to more consistency with lambda syntax (whose inputs are also defined before its output).
Maybe this is nice in theory? But `Int` really isn't an output here, and the value being assigned isn't either. Rather this seems more like `f(i, Int, value) -> assignment`. It seems just as arguable that `f(Int, i, value) -> assignment` is appropriate.
It seems like some of these are rooted in a "pure mathematical" approach which I can surely appreciate, but ultimately lambda calculus is as much a language as any other programming language, saying "lambda syntax does it this way" doesn't convince me very much.
I've been using Rust a lot recently, which puts names before types and inputs before outputs, and I will absolutely attest to how much mental work is saved by ordering things this way. Skimming or reading Rust comes twice as easy as reading Java, and I do a lot of both. Sure it's an anecdotal report, but I have a real sense here that I feel compelled to report.
As other posters have stated, this order makes parsing easier. But I also suggest this benefit extends to your own brain's parsing ability as well. The old order is indirect and suboptimal and makes you think harder.
Typescript also orders its parameters this way. Between all the Rust and Typescript vs Java and C++ code I've written, I really haven't found that either is better than other, it just seems like a largely arbitrary choice.
Making parsing easier for the compiler is a convincing benefit, would have been nice to see that mentioned in the article. I think that's a substantially stronger reason to prefer types after names. I'm not sure if I parse either faster or slower though.
> and I will absolutely attest to how much mental work is saved by ordering things this way.
As you yourself noted, personal anecdotes are really not an argument. Someone could say they find Java easier to skim than Rust and we'd be nowhere. Like arguing which end of a boiled egg to crack first.
> As other posters have stated, this order makes parsing easier.
Programming languages don't exist to make themselves easier to parse. They exist to make it easier for programmers to program. Otherwise, we wouldn't have things like syntactic sugar. Hell, we would just write in machine code and do away with assembly and higher-level programming languages. And parsing is a simple and superficial one-time step. Being a tad more difficult is not a convincing argument.
> But I also suggest this benefit extends to your own brain's parsing ability as well.
Based on what evidence?
This is the problem with tech evangelism. It has the same problems as religions, lots of claims, no evidence.
> As you yourself noted, personal anecdotes are really not an argument.
Then what is? If you're looking for a randomized sampling of programmers with sufficient sample size, you're not going to find it here.
> Programming languages don't exist to make itself easier to parse.
No, but a fine example is that of C++: the difficulty in parsing means that if you make a typo, the error message you get might be bizarre and confusing. A compiler for a language that's easier to parse will have a much better idea of the programmer's intent and can provide a much better error message. I find it astounding how often rustc can figure out exactly what I wanted to do and suggest it as a note after the error message.
I would think that more-useful error messages pass your test of "make it easier for programmers to program".
While we're talking about making it easier to program, "name: Type" makes it possible to avoid typing out "Type" at all, and letting the compiler infer it (no, this isn't good and readable in all situations, but often it's fine). If you have the "Type name" style and try to add the ability to infer types, you end up with Java's "var" abomination.
Regardless, I'm in agreement: I find "name: Type = blah" much easier to read. I read it as "name is a Type that is equal to blah". This also is an improvement in parameter lists, when they're lined up vertically:
def foo(bar: String,
baz: Int,
quux: Foo)
I find that much easier to mentally parse to determine parameter order than
void foo(String bar,
int baz,
Foo quux)
Worse, imagine that all three parameters were of the same type, requiring a scan to the right to read the names. The important information to me at a glance is the name of the parameter, not its type.
As someone who cut his teeth on C and later Java, much later learning Scala and Rust, I immediately liked the style of the latter two much better. Lately I've been doing a lot of Java and get constantly annoyed at the "backwards" order.
> This is the problem with tech evangelism. It has the same problems as religions, lots of claims, no evidence.
I suppose you could argue that what I've written above is just personal preference, but I see it as a bit stronger than that.
> Then what is? If you're looking for a randomized sampling of programmers with sufficient sample size, you're not going to find it here.
Evidence. Maybe a study showing programmers have a natural preference? Or scientific evidence? Anything more convincing than "Rust evangelist" anecdotes.
> No, but a fine example is that of C++: the difficulty in parsing means that if you make a typo, the error message you get might be bizarre and confusing.
Difficulty parsing? If it didn't parse and found an error, then it means it didn't have any difficulty parsing. That has more to do with the complexity of the language itself than parsing. Parsing is a very simple matter. Or maybe the compiler for one language is better? Also, I thought we were comparing Rust to Java?
> I would think that more-useful error messages pass your test of "make it easier for programmers to program".
It does, but once again all you've done is provide anecdotes without any examples or evidence.
> Regardless, I'm in agreement: I find "name: Type = blah" much easier to read.
I don't. The most important part of "name: Type = blah" is the Type. So it's nice to have it first. But then again, there are people who love dynamic programming languages. So once again personal preferences and personal anecdotes aren't convincing arguments.
> As someone who cut his teeth on C and later Java, much later learning Scala and Rust
Yeah, I too fanboy over new languages I learn. But then I get over it and move on with my life. My guess is you just wrote toy programs in scala and rust and nothing substantive.
> I immediately liked the style of the latter two much better. Lately I've been doing a lot of Java and get constantly annoyed at the "backwards" order.
So then use Rust? Why are you using Java?
> I suppose you could argue that what I've written above is just personal preference, but I see it as a bit stronger than that.
I don't have to argue it. All you've provided is personal preference. "I find "name: Type = blah" much easier to read. " is personal preference. It's no more a convincing argument of anything than you prefering chocolate over vanilla shows that chocolate is better than vanilla.
> Maybe this is nice in theory? But `Int` really isn't an output here, and the value being assigned isn't either. Rather this seems more like `f(i, Int, value) -> assignment`. It seems just as arguable that `f(Int, i, value) -> assignment` is appropriate.
The point is that you want variable declarations and function signatures to be consistent, so you either write
val i : Int
def f(x: Int) : String
Or
Int i
String f(Int x)
And if you do the latter then you have a confusing syntax because the output type comes before the input type, and it's very hard to do lambdas in a way that looks consistent.
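Rust makes the same choice, and the consistency is easy to see in a sketch (mine, not the parent's): declarations, functions, and closures all put inputs before outputs:

fn f(x: i32) -> String { // parameters first, result type last
    x.to_string()
}

fn main() {
    let i: i32 = 42;                              // name first, type second
    let g = |x: i32| -> String { x.to_string() }; // closure mirrors fn syntax
    assert_eq!(f(i), g(i));
}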
> String or int are very rarely appropriate variable names
I very much agree, but that only undermines the point if they are often appropriate type names. In the sorts of languages the GP was trying to restrict that sentence to, I don't think that's the case. I even have some doubts that it's true in C.
I think the author misses the single biggest advantage of `identifier: Type`.
The moment `Type identifier` syntax encounters higher order functions and types, you end up with messes of parenthesis. Figuring out what a type means then involves bouncing back and forth across the type definition.
With `identifier: Type` complex higher order types still parse linearly left to right.
It's enough of a UI issue that people will end up avoiding higher order functions in `Type identifier` languages simply because they're a mess to express.
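A small Rust example of the difference (names mine): the higher-order parameter type reads straight through after the colon, whereas C's declarator for the same thing, `int (*op)(int, int)`, wraps around the name:

fn add(a: i32, b: i32) -> i32 { a + b }

// The whole function type sits after `op:` and reads left to right.
fn apply(op: fn(i32, i32) -> i32, a: i32, b: i32) -> i32 {
    op(a, b)
}

fn main() {
    assert_eq!(apply(add, 2, 3), 5);
}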
Yup; especially with structural typing as in TypeScript, when you don't have aliases for all your type constraints, having identifier:{complex:mess, of:{nested:stuff}} is easier than the other way around.
Language Design: This stuff doesn't matter that much. Focus on more important things.
Syntax isn't unimportant, but don't waste energy on trivial matters like these. Just pick something and people will get used to it. Focus on the semantics of your language - that's what really matters.
Language design affects how good autocomplete and error messages can be. That is hugely important.
Having said that, this article doesn’t advocate “ident: Type”, it advocates ”marker ident: Type”.
That marker is essential for ease of parsing and thus for autocomplete (it won’t try to autocomplete the ‘ident’ part by looking at variables in scope or function names, for example) and error messages (it could signal when name shadowing occurs, for example)
Sure, but there's no arguing to be done there. Is the grammar context free? Ideally, can it be parsed with small constant lookahead? Yes? Cool, no further discussion needed.
I'd be more open to this kind of discussion if there was an ounce of actual research behind what makes syntax more/less readable. As it is, it's just a bunch of people arguing endlessly about their very specific preferences. Just pick something sensible and move on.
I didn't say "syntax doesn't matter, pick any ridiculous thing you want". That's what I mean by "not unimportant", though I admit it's not exactly clear that's what I meant. My point is that within the space of reasonable, comprehensible syntaxes, there are no demonstrable differences worth arguing about.
>there are no demonstrable differences worth arguing about.
That is a big claim. It is very easy to believe (I believed it before, when I knew just 3 or 4 programming languages; now it's more than 12).
But it is clearly false, and that is easy to show:
async/await
go chan
fn sort<T>(of:list<T>...)
try/catch
match
All the above are just small things that have a HUGE impact on how we develop programs. Also, in the matter of "small" stuff that could look insignificant:
[1, 2, 3] + 1 = [2, 3, 4]
this one is a huge deal in certain niches. Also, another "small" and insignificant thing:
SELECT ... FROM source
source SELECT ...
All these are just small things, not at all obvious at the time. Remember how, in the days of GOTO, the idea of more specialized control flow was unthinkable in the minds of many.
Syntax MATTERS MOST, because it is OUR interface. The space for improvement is not super-big, true, but its impact is huge.
Also, when done correctly, it makes the semantics fit like a glove.
Another obvious example: do concurrency without syntax help (just using threads). Or do performant, safe, concurrency-friendly, zero-GC systems programming without what Rust and other languages have bridged.
I also know many languages (which is hardly some grand accomplishment) and it’s my firm opinion that syntax MATTERS LEAST. You spend some time getting used to it and it never really bothers you again. Semantics matter most - syntax is just an interface to the important stuff.
The difference between Python, C++, Haskell, Common Lisp, Prolog, and SQL isn’t syntax. If it was, everyone would pick their favorite syntax and use it all the time. What matters is how well the semantics (and their potential performance implications) match your problem. The syntax just needs to be a decent enough interface to the semantics. Frankly, it seems to me like most of your “counterexamples” are about language semantics, not syntax.
Here’s the thing. Would I like every language to have a consistent, beautifully designed syntax backed by UX research and testing? Absolutely. But language designers have bigger fish to fry. There’s little value in wasting energy talking about syntax once it reaches a basic state of acceptability.
I do amend my statement - you’re right that it’s a big, unsubstantiated claim. There are no _demonstrated_ differences. I haven’t seen an ounce of evidence that it makes a difference beyond familiarity. Furthermore, even if it did, that wouldn’t make it top priority. It would just make arguments about it sensible.
> The difference between Python, C++, Haskell, Common Lisp, Prolog, and SQL isn’t syntax
Ok, let's try: Do SQL without the SQL syntax.
P.S.: I don't think we are that much in disagreement ("The syntax just needs to be a decent enough interface to the semantics"); it's that the claim "syntax doesn't matter" makes it look like syntax is just an irrelevant aspect of the language. How relevant can be argued, but after years in this trade, go to the C++ community (for example) and tell them to change their syntax to Lisp syntax and see how well that succeeds.
Syntax is 100% tied to paradigms, idioms, and such. It is intrinsic to the language we use.
It's worth reiterating the point of my initial comment (which I admit I may not have conveyed well). I never said "syntax doesn't matter", because that's not my point. My point is that it's almost never worth arguing about. Just pick something (or accept what already exists) and move on. Language designers (and you) have more worthwhile things to do.
My issue is with unproductive, endless debates about syntax minutiae like the original post. Syntax doesn't matter enough to be worth it, and such debates devolve into everyone shouting about their personal preferences anyway (see: many of these comments).
> Do SQL without the SQL syntax
I'm not sure what you're saying here. The syntax of SQL is completely arbitrary - I'm sure you could think of a completely different syntax that works just fine. Let me know if I'm missing something, but it seems extremely obvious to me that the biggest difference between C++ and SQL programs isn't how they look - it's how they behave. One wouldn't dream of replacing one with the other and that has nothing to do with their syntaxes.
> go to the C++ community (for example) and tell them to change the syntax to lisp syntax and see how much it will succeed
Obviously it'll fail - good. Even if lisp syntax was way better, they've gotten used to C++ syntax and have much more important things to spend their time on.
There you go: you have the exact semantics of a traditional SQL query (a 1:1 mapping) and only the syntax is different.
Now, one may argue that the syntax is "ugly", less familiar, that the ` are hard to type or whatever, but this is just taste. One simply gets used to it. The expressiveness and semantics are the same as in SQL.
> Syntax is 100% tied to paradigms, idioms, and such. It is intrinsic to the language we use.
I think then we don't have the same definition of syntax.
The way I understand it is that the syntax is just the way to represent these idioms and paradigms visually.
What the parent is saying is that these paradigms and idioms are what is important, but the exact way they are written, not as much (as long as it is within reason).
I was going to talk about the SQL stuff, but I think it would be wasted as long as we stay blind to the fact that syntax IS semantics.
However this:
> but the exact way they are written, not as much (as long as it is within reason)
Then what is "within reason"? Is it more logical to only have GOTO than IF? Is it better to have ELSEIF or nested IFs? What happens if my language says that null is the same as Option.None? What if generics use [] and not <>?
Does whitespace matter: yes or no?
Allow Unicode?
CamelCase, snake_case, or what? What if all constants are lowercase, types mixed-case, and the rest UPPERCASE?
For some, APL syntax makes more sense than ALGOL's.
Talking about why is the point of this kind of discussion.
It is VERY easy to shrug this kind of stuff off. VERY. I WAS in that camp before. But now I am trying to build my own language (relational), and DAMN, it has become much clearer why syntax matters, even "the exact way they are written", because switch this for that and suddenly my language is ANOTHER paradigm (or worse, will be CONFUSED as one).
Naming is one of the hard things in computer science.
---
I understand why it is easy to dismiss this as irrelevant. Sometimes I don't see why some people are so upset about typography and font selection, or why my professional brother complains about framing in photographs. But go and SEE what the DESIGNERS of languages say about this stuff and you will notice that for them even these apparently less significant things matter. You can even win a prize in the field for showing the importance of syntax (http://www.eecg.toronto.edu/~jzhu/csc326/readings/iverson.pd...)!
If that means most will not notice, GREAT! That is the mark of good design.
One thing that I've found myself doing a lot after using SQL for a while is writing SQL to produce SQL. Maybe the syntax could be more oriented towards that, which would not be just a matter of taste.
You'd have to show me some data that one is easier to work with than the other, because fundamentally types _are_ names, and I find them just as expressive as variable names (many of which are just named after types, lets be honest). Even if I didn't, I don't think my brain struggles to read things in either order (or indeed in languages where types are rarely mentioned).
I mean, most programming languages put some sort of punctuation between names. E.g., function calls are punctuated by parens; imports are punctuated by `::` or `.`.
There’s a difference between “things that are implemented in popular programming languages” and “things that are implemented in most programming languages”
Yes but you’re clearly making some sort of argument by authority and I think that works both ways. I just don’t see any reason to hold firm beliefs either way on such a trivial issue unless someone has a citation that I’m missing that proves one form increases comprehension or productivity, or reduces bugs.
I think it also makes more human sense. The parameter name should in some sense be telling you more than just the type. Like "size: Size" is kind of repetitive.
They are not cryptic. They are trying really hard to come up with good syntaxes and semantics actually. I think that modern programming languages tend to have very clean syntaxes.
Another very nice thing about this is that it is much, much easier to parse, because only one kind of thing can go in each position of the phrase. Simplicity in parsing is something that I think is underrated in language design; the harder it is for a computer to parse, the harder it is for a human to parse, and parsing code is 90% of the programmer's work (the other parts being 9% debugging and 1% authoring new code).
I replied elsewhere that I found the opposite to be true. So I suspect that different people will find different styles to be easier/harder to parse.
> the harder it is for a computer to parse, the harder it is for a human to parse,
I don't think this is true -- assembly (or bytecode) is very easy for the computer to parse, but much, much harder for humans to parse. English is much easier for humans to parse, but pretty difficult for computers to parse.
I disagree. The syntax design should flow from the design of the language itself and whether or not you use prefix or postfix notation for type annotations depends heavily on what makes sense within the semantics of the type system.
Granted that the pathological case of the error you warn against is Perl, and that should be enough of a cautionary tale for anyone. But a language is a user interface for programmers, too. Some affordance is merited, especially in a case like this where prefix vs. postfix may affect ease of parsing, but seems most unlikely to influence how the type system actually behaves.
You're right, I just don't care for the author's notes on language design because they're all on syntax design, which is an impossible task to do in morsels without knowing anything about the rest of the language or how it is supposed to work.
I do prefer postfix because I think it flows very nicely "this is-a thing assigned-to that" is nicer than "thing called this assigned-to that."
In terms of the impact on the language, optional postfix annotation makes it a bit trickier if you want to make the identifier optional, and in languages that support it you tend to see special syntax to deal with that case (which breaks the author's fetish for self consistency).
Personally I think ordering of the trio of "alias" "thing" "value" should be consistent across the language, which extends far past variable assignment, and any one of the trio can be left out.
What are examples of language semantics which are better served by pre/postfix type annotations? Also, what exactly do you mean by making the identifier optional?
To me, this is unclear because it is hard to tell where the type ends and the actual function begins.
Last example.
const foo: { [string]: number} = {"hello": 3}
I believe says that foo is an object with string keys and number values.
What does this look like in a Type ident language?
const { [string]: number} foo = {"hello": 3}
I think all of the Type ident examples are more confusing because it's hard to tell where the type ends and the name begins (this is most clear in the first example). This probably makes syntax highlighting worse, parsing more complicated, and is tougher on the user. With ident: Type, it is very clear that the type starts after the ":" and ends before the "=" sign.
Some of the things have been addressed (`extern crate`).
Many of the issues I disagree with: `Buf` is strictly better than `Buffer` (less typing, like `fn`). I have no issue with mixing `CamelCase::snake_methods`, and actually find it to be quite beautiful. The good parts of being Pythonic.
I would like to see the alternatives to turbofish. What exactly is the author suggesting? And what's wrong with `println!` and `format!` ? It isn't articulated.
`[]` misuse is bad, semicolons aren't consistent, `PathBuf` is inconsistently named, etc. Agree. `io::Result`, ...
Maybe there will be some cleanup in a future language edition.
> This means that the vertical offset of names stays consistent
This is also an argument for using keywords of the same length for introducing a variable and a constant. If that’s desirable, it rules out the obvious choices `var` and `const`.
Possibilities include `var` and `val`, which may be too similar-looking, and `var` and `let` - but are people used to (from JavaScript) `let` being mutable? Any other options?
“Let” is mutable in Basic, too, but the part of the population that is used to that is shrinking.
As to short, equal length options for ‘let’ and ‘val’: one could consider using punctuation. Forth uses colons instead of ‘fun’, and I think, in a concise language, one could get used to using, say, ‘!’ for immutable and ‘~’ for mutable. Unfortunately, they aren’t easy to type. An alternative could be to always assume immutability and only use ~ in the rare cases where one needs to mutate.
So, a simple
foo = 3
or, if one wants to simplify parsing:
= foo 3
introduces a new immutable variable, and
~ foo = 3
or
~ foo 3
a mutable one. If we allow leaving out spaces:
~foo 3
that starts to look like using sigils to indicate mutable state. I think that might be a good option in a mostly immutable language.
I think I would use Forth’s colon instead of ‘=‘. That would make ‘=‘ available for equality testing, allowing us to get rid of ‘==‘.
One downside of the 'ident: Type' approach is the extra colon character.
The major downside of the 'Type ident' approach is that if 'Type' is optional, the parser can't be sure whether it's parsing the 'Type' or the 'ident' when encountering the first token. In practice this isn't too hard to solve, however; it can be handled with some backtracking.
In my language, Winter, I have chosen the 'Type ident' approach, mostly due to similarity with C, C++ and Java. I do sometimes wonder if I made the right choice, however. Maybe it could be an option? :)
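A toy sketch of that lookahead difference (made-up token types, not Winter's actual parser): with a leading keyword the first token settles it, while an optional 'Type' needs a second token or backtracking.

enum Tok { Let, Ident, Eq }

// Keyword style: the very first token decides the production.
fn is_decl_keyword_style(toks: &[Tok]) -> bool {
    matches!(toks.first(), Some(Tok::Let))
}

// Type-first style with optional types: two identifiers in a row signal
// `Type ident ...`; a lone identifier starts an assignment instead.
fn is_decl_type_first_style(toks: &[Tok]) -> bool {
    matches!(toks, [Tok::Ident, Tok::Ident, ..])
}

fn main() {
    assert!(is_decl_keyword_style(&[Tok::Let, Tok::Ident, Tok::Eq]));
    assert!(is_decl_type_first_style(&[Tok::Ident, Tok::Ident, Tok::Eq]));
    assert!(!is_decl_type_first_style(&[Tok::Ident, Tok::Eq]));
}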
I’m surprised this didn’t touch on IDE autocomplete suggesting variable names. In Java you would have something like `LocationBuilder locationBuilder`, which lets users just tab-complete the variable name to quickly have access to a variable. The argument in this article was about names being prioritized, and I think removing autocompletion of the variable name would force the developer to be slightly more descriptive than just a re-cased copy of the class name.
In Kotlin, IntelliJ has no problem with this. As you type a new value: `val id`, and you have `IdentName` defined in scope, the value `identName` is suggested automatically.
Not all IDEs are the same, though, and I'm not sure how sophisticated this feature was to implement.
Maybe a little orthogonal -- I could easily imagine IDEs still doing something like transforming the input ": FooType" into "fooType: FooType" w/ the name selected and ready to be tabbed past.
> The ident: Type syntax lets developers focus on the name by placing it ahead of its type annotation.
If this were true, we'd have to conclude that speakers of name-then-honorific languages like Japanese ("Graham-san") are better at remembering and focusing on people's names than speakers of honorific-then-name languages like English ("Mr. Graham.")
The most important result of this design is that the syntax unambiguously determines whether you are referencing the type or value axis, and enables you to split them accordingly. Having worked with Scala and been forced to return to a C-style language, this is probably one of Scala's most overlooked features.
One additional reason why it is beneficial is that you can then naturally extend typing to any expression, not just identifiers. This can help type inference (and can also serve as documentation), which is (IMHO) a must in a modern programming language.
Yeah I'm pretty sure the real reason for this is that it is way easier to parse types if they are after the name. Especially complex ones like functions.
The first point is the most appealing to my brain at least. Type inference is a really useful feature (when paired with a nice IDE) and having a single standardized prefix to declare variables regardless of what type it is can help the mental model. This is especially true with more complex non-obvious types, where you may not know exactly what type you have without the hint from your environment
If consistency is so important, why do we have: function, func, fun, fn, def, etc... depending on the author?
For clarity, use "function", for simplicity use "fn", other forms are just fancy.
Sorry, but this article starts off with an excellent example of why this is horrible:
val x: String = "hello"
String x = "hello"
The first line reads: "value X is of type String and contains hello"
The second line reads: "String x contains hello"
val and : are fluff and add nothing. Arguments about it being tougher to parse would have some merit if this wasn't all figured out almost 50 years ago.