Hacker News
Ante: a compile-time language (github.com/jfecher)
122 points by mabynogy on June 16, 2017 | 37 comments


Hello, I have been working on this project for a while now and would love to take any questions anyone may have.


What checking is done on code generated by the compiler extensions? Do you manually have to build the code out of well-typed components, or does the compiler check again after each pass?

As an aside, it's very cool to see this implemented, though it does give me the depressing thought that none of my ideas are original anymore. Hopefully more languages follow suit and allow for this kind of utility.


The Ante code to generate each portion of llvm-ir is checked as normal, but no checking is done on the llvm-ir that is generated; each extension that produces IR is expected to emit valid output. That being said, one of the cool things about Ante (imo) is that if you don't like this, you can write an extension that does check for validity.
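
In fact, LLVM ships a verifier such an extension could call; a minimal C++ sketch (the wrapper function here is my own illustration, not Ante's API):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/Verifier.h"
  #include "llvm/Support/raw_ostream.h"

  // Returns true if the generated IR is well-formed.
  // llvm::verifyFunction returns true when it *finds* errors,
  // printing diagnostics to the given stream.
  bool checkGeneratedIR(llvm::Function &F) {
      return !llvm::verifyFunction(F, &llvm::errs());
  }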


It looks like you use LLVM as a backend - it seems to be the norm these days. What drove you to use it, and what has been your experience with it?


I decided to use it largely because of its proven track record, optimization passes, and its ability to act as a JIT compiler. LLVM is very competitive with other compilers' backends in terms of speed (see clang and gcc), and it is well established, so there is plenty of help and documentation available. The built-in optimization passes mean I can spend more time on the language itself rather than the IR, and the JIT, while not the fastest, is usable as is and is extensible should I need it. Compared to using C as an intermediate language I would say there is a slight learning curve, but overall I found llvm easy to learn.

One of the few disadvantages is that because llvm is so large it requires some odd build steps that can be a pain to set up, especially on Windows. The llvm-config tool, however, manages many of the required flags for you. Also, because there are so many libraries, compilation of the compiler itself can be rather slow.


Meh. Looks like a fun toy, but it seems unlikely that this adds much value over directly modifying the LLVM IR. If you've understood enough about the LLVM model to write these constructs you're better off using the C++ API. That evolves pretty quickly and is very extensive, so I'm having a hard time imagining this being nearly as powerful. If it offered a significantly more elegant model that would be one thing, but the README example seems to add only an extra layer of APIs to learn. And based on the test suite it looks like the feature list is aspirational.


> you're better off using the C++ API

Using it for what? If I understand this project correctly, it allows you to put an application and corresponding application-specific language features/optimizations into one source file. You can then always be sure that those language features and optimizations are available.

I have a hard time seeing how this would work with LLVM's C++ API in a comparable way. There you would have to (1) write the application in an LLVM-supported language, (2) use some other transformation language to add features to your surface syntax, (3) separately compile your LLVM extension pass written in C++, (4) make sure your LLVM extension is used when compiling the application.
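
To make step (3) concrete, a standalone LLVM function pass in C++ looks roughly like this (a sketch against the new pass manager; the pass name is invented):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/PassManager.h"

  // Skeleton of a separately-compiled extension pass.
  struct MyExtensionPass : llvm::PassInfoMixin<MyExtensionPass> {
      llvm::PreservedAnalyses run(llvm::Function &F,
                                  llvm::FunctionAnalysisManager &) {
          // ...transform F here...
          return llvm::PreservedAnalyses::none();
      }
  };

And you would still need to arrange steps (1), (2), and (4) around it.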


It's not easy to talk specifics, since as far as I can tell the example in the README is currently unimplemented - there's no comparable test and there doesn't seem to be any code exporting the APIs used there.

But from what I can glean from that short example, the proposed API is essentially LLVM's IRBuilder class, with a slightly different interface. This seems like a strategy that's unlikely to be practical, because of all the complexities involved in reimplementing another project's API, particularly one that changes as frequently as LLVM's.
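
For comparison, here's a self-contained sketch of direct IRBuilder use that builds the IR-level equivalent of a goto (the module, function, and block names are mine):

  #include "llvm/IR/IRBuilder.h"
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/Support/raw_ostream.h"

  using namespace llvm;

  int main() {
      LLVMContext Ctx;
      Module M("demo", Ctx);
      FunctionType *FT = FunctionType::get(Type::getVoidTy(Ctx), false);
      Function *F = Function::Create(FT, Function::ExternalLinkage, "f", &M);

      BasicBlock *Entry = BasicBlock::Create(Ctx, "entry", F);
      BasicBlock *Label = BasicBlock::Create(Ctx, "my_label", F);

      IRBuilder<> B(Entry);
      B.CreateBr(Label);         // the IR-level `goto my_label`
      B.SetInsertPoint(Label);
      B.CreateRetVoid();

      M.print(outs(), nullptr);  // dump the generated IR
  }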

I'm not sure I understand your breakdown of four steps. As far as I know, it would be very hard to make a backend pass that would have enough information encoded purely in IR from a frontend to compile a construct like goto. This is the sort of thing that needs to be deeply integrated into the compiler, so the author of that feature would necessarily need to understand the data structures and architecture of the compiler. At that point, it's probably easier and more reliable to write it as a compiler extension.

I do like the general idea of being able to extend the operation of a compiler at application compile time (rather than compiler compile time), but I am skeptical that the solution involves directly mucking about with LLVM. That's too low level, the language needs to provide a useful model of abstraction above that to be better than just modifying the code of the compiler itself. As others have mentioned here, Lisp is a great example of offering such a model.


Hello, author of the language here. You are correct that the project is in an early state; I wasn't planning on spreading the word publicly for some time, but since it was posted here I can answer some questions.

You can extend the compiler's functions at compile time by doing more than just messing with llvm-ir. You are free to implement an automatic linter or a garbage collector, for example. You can even do an optimization pass without operating on the llvm-ir: you could walk the parse tree of a function before it compiles and rewrite any common node patterns you are looking for before it is translated to llvm-ir. Alternatively, you can muck around in the ir and do the same thing.
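
As a toy example of the parse-tree route, folding x + 0 down to x before any IR is emitted (the Node type here is invented for the sketch; Ante's real tree differs):

  #include <memory>

  // Toy parse-tree node, invented for this sketch.
  struct Node {
      enum Kind { Int, Add } kind;
      long value = 0;                  // used when kind == Int
      std::unique_ptr<Node> lhs, rhs;  // used when kind == Add
  };

  // Rewrite `x + 0` to `x` everywhere in the tree,
  // before any llvm-ir is generated.
  std::unique_ptr<Node> simplify(std::unique_ptr<Node> n) {
      if (n && n->kind == Node::Add) {
          n->lhs = simplify(std::move(n->lhs));
          n->rhs = simplify(std::move(n->rhs));
          if (n->rhs->kind == Node::Int && n->rhs->value == 0)
              return std::move(n->lhs);
      }
      return n;
  }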

The goto construct in the example provided does not need to be deeply integrated because the existing API is set up largely separate from the internal structure of the compiler. For example, the function ctStore stores a variable in a compile-time container separate from the scope-separated table used internally by the compiler for other variables. The primary advantage of Ante as a language is that these functions enable compiler extensions to be made within the program, and thus allow swapping out, e.g., a garbage collector without recompiling the compiler itself. This lets each application developer decide what features they need for their application. Instead of changing languages for a gc or a desired optimization pass, just swap out a library.
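
To sketch that in C++ (ctStore is the real function name mentioned above; the map, the helpers, and the lookup step are my illustration, not Ante's implementation):

  #include "llvm/IR/Function.h"
  #include "llvm/IR/IRBuilder.h"
  #include <string>
  #include <unordered_map>

  // Compile-time container, separate from the compiler's own symbol tables.
  static std::unordered_map<std::string, llvm::BasicBlock*> ctLabels;

  // `label name` creates a block, stores it (the ctStore step),
  // and falls through into it.
  void emitLabel(llvm::IRBuilder<> &B, llvm::Function *F,
                 const std::string &Name) {
      llvm::BasicBlock *BB =
          llvm::BasicBlock::Create(F->getContext(), Name, F);
      ctLabels[Name] = BB;
      B.CreateBr(BB);
      B.SetInsertPoint(BB);
  }

  // `goto name` looks the block up and branches to it.
  // (Forward gotos would additionally need backpatching.)
  void emitGoto(llvm::IRBuilder<> &B, const std::string &Name) {
      B.CreateBr(ctLabels.at(Name));
  }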

I share your concern about the LLVM library constantly updating and the need to update bindings with it. I plan at some point to use the Clang API to make a C++ -> Ante converter program, although I expect numerous difficulties in its implementation.


Hi! Always fun to see new language ideas and watch them grow.

I think the general idea is interesting. However, I suspect that you're underestimating the complexity of the things you've mentioned. I don't mean to be too negative, but have you implemented garbage collection? Getting it right for just one language takes lots of serious effort. With the added complication of needing to deal with other arbitrary user-designed language constructs, it might be an intractable problem.

Brainstorming ideas for language design is great. Trying new things is fantastic. But it's important to be humble and realistic, and to recognize that language design requires making thoughtful tradeoffs.


That's completely fine; negativity isn't inherently bad in the first place. I have done some work with a garbage collector for an interpreted language I worked on previously. It was very basic as far as garbage collectors go: just a simple mark-sweep algorithm, nothing too complex. I'm not denying that creating a garbage collector would be quite an ambitious undertaking; my goal as a language developer is just to give the user enough control over the language to make that possible.
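
For reference, the mark-sweep idea itself fits in a few lines; a minimal C++ sketch (all names invented, and a real collector also needs root discovery, allocation hooks, and care about recursion depth):

  #include <vector>

  // Toy object: a mark bit plus outgoing references.
  struct Obj {
      bool marked = false;
      std::vector<Obj*> refs;
  };

  std::vector<Obj*> heap;   // every allocated object
  std::vector<Obj*> roots;  // globals, stack slots, ...

  void mark(Obj *o) {
      if (!o || o->marked) return;
      o->marked = true;
      for (Obj *r : o->refs) mark(r);
  }

  void collect() {
      for (Obj *r : roots) mark(r);      // mark phase
      std::vector<Obj*> live;
      for (Obj *o : heap) {              // sweep phase
          if (o->marked) { o->marked = false; live.push_back(o); }
          else delete o;
      }
      heap.swap(live);
  }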

That being said, by no means do I claim that all extensions will work together perfectly, or at all. For example, a garbage collector would be completely incompatible with a plugin that frees all variables once they go out of scope. Compatibility is left up to the plugin's author in the case of multiple plugins having conflicting goals.

I'm not sure where I stated I was opposed to making tradeoffs; I have made several (albeit mostly syntactic) already. I knew the project was ambitious when I first started it; that is part of the fun of working on it.


> This is the sort of thing that needs to be deeply integrated into the compiler, so the author of that feature would necessarily need to understand the data structures and architecture of the compiler. At that point, it's probably easier and more reliable to write it as a compiler extension.

That's exactly what this language allows you to do, except that it's nicer than C++ and can be interleaved with the application you want to compile with the extended compiler.

> That's too low level, the language needs to provide a useful model of abstraction above that to be better than just modifying the code of the compiler itself.

I agree that a higher-level API would be useful, too. For high-level things, but not necessarily for adding goto.


> it's nicer than C++

What do you mean by nicer? The tradeoff here seems to be exchanging useful functionality for a leaner, more limited interface. I'm not sure I agree that the right decision is to forsake the full power of the LLVM interface.

> can be interleaved with the application you want to compile with the extended compiler.

On this we can agree - abstractly, that would be a very powerful thing. Practically, though, what's the cost? How would developers expressively write useful extensions? How do these extensions interact? Answering the hard questions about how this would actually work is going to take some serious thought.

> I agree that a higher-level API would be useful, too. For high-level things, but not necessarily for adding goto.

I'm not sure goto is all that great of an example, except for the dimension of implementability in a README file.


> What do you mean by nicer?

People have lots of more or less subjective preferences concerning programming languages. To me, this programming language looks more productive and more comfortable to use than C++. You might reasonably disagree.

> Answering the hard questions about how this would actually work is going to take some serious thought.

And/or experimentation. No programming language design was ever gotten right on the first try. Programming languages evolve with use, in unplanned and unplannable directions. It's completely fine to do this and see how it comes out.

In a cousin comment, you wrote "Always fun to see new language ideas and watch them grow." This is exactly that.


I think he was referring to the fact that you can't use any of LLVM's pass manager infrastructure, which is in C++, even if your language-level optimization could benefit from it.


That's not what I meant, but that is also a good observation.


This is awesome, don't know how I missed it before. Kind of like a more functional Terra. I'll have fun tonight playing around with this.


Ideally you wouldn't directly expose the compiler to make optimizations. You would create an IR that the compiler could use, and munge around with the IR instead of the compiler directly.


Why? Are there other languages that take this approach? What is the advantage?


Yep, this is how anything using LLVM works: http://llvm.org/docs/LangRef.html


So what are the advantages and disadvantages of either approach?


One advantage is that multiple languages can target the same IR and benefit from the same optimizations. Another is that more development effort is focused on one IR instead of spreading it across all of the languages targeting it.

One possible disadvantage: you can't gain information by automatically converting code to another representation, but you could potentially lose information that would be relevant to an optimization. I don't know of any practical examples of this happening with LLVM, but I believe Java's type erasure carries a very slight penalty.

Another possible disadvantage is if you have a language designed to be compiled extremely fast, then adding an intermediate step could slow it down unacceptably.


I'll add to mcbit's excellent description that there are also advantages in verification. Passes on a simple intermediate language can more easily be equivalence-checked by symbolic (e.g. KLEE) or formal (e.g. VeLLVM) methods to ensure the transformations didn't break the program somehow. So you get piles of people contributing optimizations, plus potential improvements in verifying that they work.

https://klee.github.io/

http://www.pgbovine.net/PhD-memoir/ucklee-cav-2011.pdf

https://www.cis.upenn.edu/~stevez/vellvm/


The advantage is that you are tied to the IR and not the compiler (if the language designs its own IR); the disadvantage is the same, since the compiler optimizes via passes on the IR. What would be needed is a way to 'hold' optimizations on the IR so that you aren't playing a game of will-it-optimize (or, worse, will-it-not).


Wait, this "optimization" stuff seems to be a tangent. So the basics: "hello world" for Ante is a "goto" macro. Forgive me but I'm having trouble seeing how mucking with IR yada yada produces "goto" in the target language so succinctly. Ante's README demonstrates this marvelously. How would this be done in the IR approach? It seems like it would have to be much more complex.


Umm... it IS that IR nonsense. This example directly manipulates LLVM IR.


Could someone explain this simply?


Looks like a "preprocessor" on steroids. Imagine the C preprocessor, but instead of simple macros and constants you can define expressions that will be expanded into much more complex things. The example they give is pretty good at conveying the idea: you can define "goto" to emit the intermediate LLVM opcodes that jump to a label, then use it immediately after. This is great for defining your own DSL.


It's three things:

* a syntax, including literals

* first class support for llvm IR and primitives

* a macro system allowing you to create new IR-level primitives with the aforementioned syntax.

It seems like a Lisp without homoiconicity and with a more complex syntax (compared to sexps). Think: greenfield compiler development, or maybe a DSP DSL.


Or a Lisp that forsakes those things to gain the advantage of compile-time type checking, etc.


You don't need to forsake any of that to get compile-time type checking.


If you are into things like that, check out Jai, an experimental language being developed by Jonathan Blow (of Braid and The Witness fame).

https://www.youtube.com/playlist?list=PLmV5I2fxaiCKfxMBrNsU1...



Remember that meta is the Greek word for "after", which is why metaprograms run before programs.


> Meta (from the Greek preposition and prefix meta- (μετά-) meaning "after", or "beyond") is a prefix used in English to indicate a concept which is an abstraction behind another concept, used to complete or add to the latter.

https://en.wikipedia.org/wiki/Meta

Most usages of 'meta' that I have encountered target the 'beyond' meaning, i.e. raising the abstraction level of something.


Clever, but not entirely accurate. May as well be pedantic: In some Lisps, you can set up the program so that it recompiles chunks of the program during execution. Which is either after the program runs or before the next evolution of the program, depending on how you look at it.


In most lisps, macroexpand is before eval.



