The statement is something you provide; the search is what you can have the LLM do. If this works for math, it will immediately make code much higher quality via the same tools.
It's not that they can't. It's that it's a waste of time in most cases.
Compilers are moving targets because hardware changes. There's considerable maintenance upkeep in a compiler.
So if you are interested in programming language semantics, you can opt to skip the compiler part. This lets you iterate on language designs without the added baggage of translating programs to machine code.
You can also argue there's no need. If you present your programming language in operational semantics, then it's trivial to write that up as a Prolog program and run it on a Prolog interpreter. Then you can employ a partial evaluator, and the first Futamura projection gives you a compiler. You can choose to host your Prolog program in a programming language which already has access to a partial evaluator, and you are essentially done before you even started.
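For a rough feel for the first projection, here is a sketch in OCaml rather than Prolog: a tiny interpreter, then the "specialise it to one program" step. Plain partial application only hints at what a real partial evaluator does (it would also unfold the match on the now-static program), and all names here are invented for the example.

```ocaml
(* A toy expression language. *)
type expr =
  | Num of int
  | Var of string
  | Add of expr * expr

(* The interpreter: a direct transcription of operational semantics. *)
let rec eval (prog : expr) (env : (string * int) list) : int =
  match prog with
  | Num n -> n
  | Var x -> List.assoc x env
  | Add (a, b) -> eval a env + eval b env

(* First Futamura projection, in spirit: specialise the interpreter to a
   fixed program and what remains is a function of the input alone,
   i.e. a "compiled" version of that program. *)
let compiled : (string * int) list -> int = eval (Add (Var "x", Num 1))

let () = Printf.printf "%d\n" (compiled [ ("x", 41) ])  (* prints 42 *)
```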
I'm someone who has used Prolog in the past, but this is the first time I'm learning of Futamura's work[1]. I knew Prolog was great for building executable grammars, but I hadn't ever really tried to do so, and thus have no real knowledge of the usual techniques. What an absolutely fascinating methodology; I can see exactly how it maps to Prolog.
It's important to note that not every research effort ends up as a surface language, and oftentimes research projects remain in progress for a long time. There does exist a freely available research implementation of a 1ML interpreter (though slightly behind the language's formalization) offered by the author:
The thing is that this is a research prototype, not a real compiler. It's not usable to the same degree as a language like SML or Haskell. There is a lot more work beyond a grammar that goes into creating a compiler for a high-level language.
[Here, ML means "Meta Language", not "Machine Learning". ML is used as an important building block inside some theorem provers and proof assistants]
The key thing with 1ML is that it merges the core and module system.
The ML family has historically had two systems: core and module. They are stratified in the sense that they are separate languages. Modules can contain core expressions, but not the other way around.
1ML blends module and core. This means you have first-class modules in the core, which leads to a pretty nice language design.
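To make "first-class modules in the core" concrete, here is a rough OCaml sketch (not 1ML): OCaml retrofits first-class modules onto a stratified design, so module values must be explicitly packed with (module ...) and unpacked again, whereas in 1ML that distinction disappears and modules are ordinary values. The names below are invented for the example.

```ocaml
(* A module signature. *)
module type COUNTER = sig
  type t
  val zero : t
  val incr : t -> t
  val read : t -> int
end

(* Packing a structure into a core-level value of type (module COUNTER). *)
let int_counter : (module COUNTER) =
  (module struct
    type t = int
    let zero = 0
    let incr n = n + 1
    let read n = n
  end)

(* A core-level function that takes a module as an ordinary argument. *)
let bump_twice (module C : COUNTER) : int =
  C.read (C.incr (C.incr C.zero))

let () = Printf.printf "%d\n" (bump_twice int_counter)  (* prints 2 *)
```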
Furthermore, this being Andreas Rossberg, the rigor with which this is carried out is very high. There are proofs of type safety and correctness along the way, generally to the same high bar as Standard ML (SML).
LLMs have had a couple of years by now to show their usefulness, and while hype can carry them for a while, it's now getting to the point where hype alone can't. They need to provide tangible results for people.
If that tangible result doesn't occur, then people will begin to criticize everything. Rightfully so.
I.e., the future of LLMs is now wobbly. That doesn't necessarily mean a phase shift in opinion, but wobbly is a prerequisite for a phase shift.
(Personal opinion at the moment: LLMs need a couple of miracles in the same vein as the discovery/invention of transformers. Otherwise, they won't be able to break through the current fault barrier, which is too low at the moment for anything useful.)
I think you should use a language with a highly expressive type system. That can be assembly too; see TAL (typed assembly language) from the 1990s. I also think you should use a language with a very expressive module system.
The reason is that you want to have some kind of guidance from a larger perspective in the long run. And that is exactly what types and module systems provide. The LLM has to create code which actually type checks, and it can use type checking as an important part of verification.
If you push this idea further: use Lean, Agda or Rocq. Let the LLM solve the nitty-gritty details of the proofs, but use the higher-level theorem formation as the vessel for doing great things.
If you ask for a red-black tree, you get a red-black tree. If you ask for a red-black tree where all the important properties are proven, you don't have to trust the LLM anymore. The proof is the witness of correctness. That idea is extremely powerful, because it means you can suddenly lift software quality by an order of magnitude without having to trust the LLM at all.
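A minimal Lean 4 sketch of that last point, using a toy list function rather than a red-black tree (the function and theorem names are made up): the theorem statement is the specification, and once the kernel accepts the proof, you no longer have to trust whoever or whatever wrote the code.

```lean
-- Toy function standing in for "the code the LLM wrote".
def insertSorted (x : Nat) : List Nat → List Nat
  | []      => [x]
  | y :: ys => if x ≤ y then x :: y :: ys else y :: insertSorted x ys

-- One of "the important properties", stated by a human:
-- insertion grows the list by exactly one element.
-- The tactic proof is the nitty-gritty part an LLM could fill in;
-- the kernel checking it is what removes the need for trust.
theorem insertSorted_length (x : Nat) (l : List Nat) :
    (insertSorted x l).length = l.length + 1 := by
  induction l with
  | nil => simp [insertSorted]
  | cons y ys ih => simp [insertSorted]; split <;> simp [ih]
```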
We currently don't do this. I think it's because proving software correct is something like 50x more work, and it moves too slowly. But if you could get an amplifier (an LLM) to help out, it's possible this becomes feasible for a lot of software.
Go is strong. You get something where writing a solution doesn't take too much time, you get a type system, you can brute-force problems, and the usual mind-numbingly boring data manipulation fits well into the standard tools.
OCaml is strong too. Stellar type system, fast execution and sane semantics, unlike 99% of all programming languages. If you want to create elegant solutions to problems, it's a good language.
For both, I recommend coming prepared. Set up a scaffold and create a toolbox which matches the typical problems you see in AoC. There's bound to be a 2D grid among the problems, and you need an implementation. If it can handle out-of-bounds access gracefully, things are often much easier, and so on. You don't want to hammer your head against the wall solving parsing problems instead of the actual problem. Having a combinator-parser library already in the project will help, for instance.
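For instance, here is a minimal OCaml sketch of the grid piece of such a toolbox (one possible representation, names invented), with out-of-bounds access handled gracefully via option:

```ocaml
module Grid = struct
  type 'a t = { cells : 'a array array; width : int; height : int }

  (* Build a char grid from the puzzle input's lines. *)
  let of_lines (lines : string list) : char t =
    let cells =
      lines
      |> List.map (fun l -> Array.init (String.length l) (String.get l))
      |> Array.of_list
    in
    { cells;
      height = Array.length cells;
      width = (if Array.length cells = 0 then 0 else Array.length cells.(0)) }

  (* Out-of-bounds reads return None instead of raising, which keeps
     neighbour-scanning code short. *)
  let get g ~x ~y =
    if x >= 0 && y >= 0 && x < g.width && y < g.height
    then Some g.cells.(y).(x)
    else None

  (* The four orthogonal neighbours that actually exist. *)
  let neighbours4 g ~x ~y =
    List.filter_map
      (fun (dx, dy) -> get g ~x:(x + dx) ~y:(y + dy))
      [ (1, 0); (-1, 0); (0, 1); (0, -1) ]
end
```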
Any recommendations for Go? Traditionally I've gone for Python or Clojure with an 'only builtins or things I add myself' approach (e.g. no NetworkX), but I've been keen to try doing a year in Go. However, I was a bit put off by the verbosity of the parsing and didn't want to get caught spending more time futzing with input lines and err than solving puzzles.
Naturally, later problems get more puzzle-heavy, so the ratio of input-handling to puzzle-solving code changes, but it seemed a bit off-putting for the early days, and while I like a builtins-only approach, it seems like the input handling would really benefit from a 'parse, don't validate' type of approach (goparsec?).
It's usually easy enough in Go that you can just roll your own for the problems at hand. It won't be as elegant as having access to a combinator-parser, but not all of the AoC problems are parsing problems.
Once you have something which can "load \n separated numbers into an array/slice" you are mostly set for the first few days. Go has verbosity. You can't really get around that.
The key thing in typed languages is to cook up the right data structures. In something without a type system, you can just wing things and work with a mess of dictionaries and lists. But trying to do the same in a typed language is just going to be an uphill battle, as you don't have the tools to manipulate the mess.
Historically, the problems have had some interlinkage. If you built something on day 3, then it's often used on days 4-6 as well. Hence, you can win by spending a bit more time on elegance on day 3, which makes the work on days 4-6 easier.
Mind you, if you just want to LLM your way through, then this doesn't matter since generating the same piece of code every day is easier. But obviously, this won't scale.
> It won't be as elegant as having access to a combinator-parser, but not all of the AoC problems are parsing problems.
Yeah, this is essentially it for me. While it might not be a 'type-safe and correct regarding error handling' approach with Python, part of the interest of the AoC puzzles is the ability to approach them as 'almost pure' programs: no files except for puzzle input and output, no awkward areas like date/time handling (usually), absolutely zero frameworks required.
> you can just wing things and work with a mess of dictionaries and lists.
Checks previous years' type-hinted solutions with map[tuple[int, int], list[int]]
Yeah...
> but not all of the AoC problems are parsing problems.
I'd say that for the first ten years, at least, the first ten-ish days are 90% parsing and 10% solving ;) But yes, I agree, and maybe I'm worrying over a few extra visible err's in the code that I shouldn't be.
> if you just want to LLM your way through
Totally fair point if I constrain LLM usage to input handling and the things that I already know how to do but don't want to type. Although I've always quite liked being able to treat each day as an independent problem: no bootstrapping of any code, no 'custom AoC library', just the minimal program required to solve the problem.
In a modern image chain, capture is more often than not HDR.
These images are then graded for HDR or SDR. I.e., sacrifices are made to the image data such that it is suitable for a given display standard.
If you have an HDR image, it's relatively easy to tone-map it into SDR space; see e.g. BT.2408 for an approach in video.
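Not the BT.2408 procedure itself, but a sketch in OCaml of the general shape of the operation: a global tone-mapping curve (extended Reinhard here) that maps a chosen HDR peak down to SDR reference white while rolling highlights off instead of clipping them. Luminance is in relative units with 1.0 = SDR reference white.

```ocaml
(* Extended Reinhard curve: maps [0, hdr_peak] into [0, 1], with the
   peak landing exactly on 1.0 and deep shadows left almost untouched.
   A generic operator, not what BT.2408 specifies. *)
let tone_map ~hdr_peak y =
  y *. (1. +. y /. (hdr_peak *. hdr_peak)) /. (1. +. y)

let () =
  (* A pixel at 4x reference white, mapped against a 10x peak,
     ends up comfortably inside SDR range. *)
  Printf.printf "%.3f\n" (tone_map ~hdr_peak:10. 4.)  (* ~0.832 *)
```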
The underlying problem here is that the Web isn't ready for HDR at all, and I'm almost 100% confident browsers don't do the right things yet. HDR displays have enormous variance, from "slightly above SDR" to experimental displays at Dolby Labs. So to display an image correctly, you need to render it properly to the display's capabilities. Likewise if you want to display an HDR image on an SDR monitor. I.e., tone mapping is a required part of the solution.
A correctly graded HDR image taken of the real world will have something like 95% of the pixel values falling within your typical SDR (Rec.709/sRGB) range. You only use the "physically hurt my eyes" values sparingly, and you will take the room conditions into consideration when designing the peak value. As an example: cinema using DCI-P3 peaks at 48 nits, because the cinema is completely dark. 48 nits is more than enough for a pure white in that environment. But take that image and put it on a display sitting indoors during the day, and it's not nearly enough for a white. Add HDR peaks into this, and it's easy to see that in a cinema, you probably shouldn't peak at 1000 nits (which is about 4.4 stops of light above the DCI-P3 peak). In short: rendering to the display's capabilities requires that you probe the light conditions in the room.
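The stop figure is just a log2 ratio of luminances; a quick check in OCaml:

```ocaml
(* Stops between two luminance levels: each stop doubles the light. *)
let stops ~from_nits ~to_nits = log (to_nits /. from_nits) /. log 2.

let () =
  Printf.printf "%.2f stops\n" (stops ~from_nits:48. ~to_nits:1000.)
  (* prints 4.38 stops *)
```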
It's also why you shouldn't be able to manipulate brightness on an HDR display: we need that adjustment to be part of the image rendering chain so that the right decisions can be made.
AVIF is trying to be a distribution format for the Web. JPEG XL is trying to be a complete package for working with image data. JPEG XL can replace OpenEXR in many workflows. AVIF simply cannot.
There's a lot of power in not having to convert for distribution.
It's not. The fragments you can execute are limited if you do it right. A client isn't allowed to just execute anything it wants, because the valid operations are pre-determined. The client sends a reference which executes a specific pre-planned fragment of code.
In development, you let clients roam free, so you have full access to the API. Deployments then lock down the API. If you just let a client execute anything it wants in production, you get into performance trouble very easily once a given client decides to be adventurous.
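A rough sketch of that lock-down in OCaml (the operation names and the stub executor are made up; this pattern is often called persisted or allowlisted queries): the production client sends only an operation id plus variables, and anything outside the pre-planned table is rejected.

```ocaml
(* The pre-planned fragments a production client may reference. *)
let persisted_operations =
  [ ("UserProfile.v3",
     "query ($id: ID!) { user(id: $id) { name avatarUrl } }");
    ("OrderHistory.v1",
     "query ($id: ID!) { user(id: $id) { orders { id total } } }") ]

(* Stand-in for whatever GraphQL executor the server actually uses. *)
let run_graphql query variables =
  Printf.sprintf "executed %s with variables %s" query variables

(* Production entry point: the client sends a reference, not query text. *)
let execute ~operation_id ~variables =
  match List.assoc_opt operation_id persisted_operations with
  | Some query -> Ok (run_graphql query variables)
  | None -> Error "unknown operation: clients cannot run arbitrary queries"

let () =
  match execute ~operation_id:"UserProfile.v3" ~variables:{|{"id": "42"}|} with
  | Ok result -> print_endline result
  | Error msg -> prerr_endline msg
```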
GraphQL is an execution semantics. It's very close to a lambda calculus, but I don't think that was by design. I think that came about by accident. A client is really sending a small fragment of code to the server, which the server then executes. The closest thing you have is probably SQL queries: the client sends a query to the server, which the server then executes.
It's fundamental to the idea of GraphQL as well. You want to put power into the hands of the client, because that's what allows a top-down approach to UX design. If you always have to change the server side whenever a client wants to change its call structure, you've lost.
No one exposes SQL to clients, though. I think where GraphQL differs from SQL is that it's at a higher level. SQL bleeds performance and data layout (e.g. normalization, limits); GraphQL does not.
It’s not clear if it’s high enough to abstract knowledge from storage. In the end it’s tension between enabling client to wander around productively vs being a bull in a china shop.