Haskell, Ada, C++, Awk: An Experiment in Prototyping Productivity (1994) [pdf] (yale.edu)
86 points by Akronymus on Dec 10, 2022 | 49 comments


Related:

Haskell, Ada, C++: An Experiment in Prototyping Productivity (1994) [pdf] - https://news.ycombinator.com/item?id=19570776 - April 2019 (55 comments)

Haskell vs. Ada vs. C++ an Experiment in Software Prototyping Productivity (1994) [pdf] - https://news.ycombinator.com/item?id=14267882 - May 2017 (59 comments)

Haskell vs. Ada vs. C++ vs. Awk vs (1994) [pdf] - https://news.ycombinator.com/item?id=13275288 - Dec 2016 (68 comments)

Haskell, Ada, C++, Awk: An Experiment in Prototyping Productivity (1994) [pdf] - https://news.ycombinator.com/item?id=7050892 - Jan 2014 (24 comments)

Haskell v Ada v C++ v Awk... An Experiment in Software Prototyping Productivity - https://news.ycombinator.com/item?id=7029783 - Jan 2014 (23 comments)


So if I'm parsing the results correctly, they only have a single data point per language? Seems like they are testing individual programmers' styles and speed just as much as they are testing the languages.


And individual familiarity with their language.

But if you think you can get better data, we'd all love to see it...


I haven't reviewed the methodology of this first hit in detail, but here's perhaps some better data: https://www.researchgate.net/profile/Charles-Knutson/publica...

I don't think it's this one but I do remember reading a very high quality study arriving at the same result as well.


They have 2 data points for Haskell, since one of the authors wrote a Haskell version which happened to have the fewest lines and the highest documentation-to-code ratio.


They do mention that the version you're referring to was developed in literate programming style, where the program is a LaTeX document.

I wonder what kind of an impact literate programming has on maintenance. Can't imagine rewriting 40 lines of commentary to suit a 2 line code change to be particularly enjoyable.
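
For anyone who hasn't seen it, the LaTeX flavour of literate Haskell looks roughly like this. This is a made-up minimal sketch, not the paper's actual source:

    \section{Computing the area of a region}

    A region is given by its corner points, in order; the area
    follows from the shoelace formula.

    \begin{code}
    area :: [(Double, Double)] -> Double
    area [] = 0
    area ps = abs (sum terms) / 2
      where
        terms = zipWith cross ps (tail ps ++ [head ps])
        cross (x1, y1) (x2, y2) = x1 * y2 - x2 * y1
    \end{code}

Everything outside the \begin{code}/\end{code} blocks is treated as prose by GHC (with the .lhs extension), which is exactly why a two-line code change can drag a lot of surrounding commentary along with it.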


I've never used literate programming in a large project, but I have used it a few times at work and my experience is mostly that you are right. In a team where you'll have accumulated knowledge among team members about a codebase you are working in regularly, the benefit to literate programming seemed to be very quickly outstripped by the maintenance burden.

There is an exception though: runbook automation. When I write scripts to automate work that comes up when I'm on call, I try to write them in a literate style when I can, and commit them to a runbooks repo. The extra documentation is a lot more beneficial because they tend to be things that people aren't looking at regularly, and someone might be looking at it because of a 3am page or some urgent outage, so having a lot of documentation up front can be a big benefit.


Really smart putting the runbook automation and the docs all in one place.

Yeah, when I was on call, I'd have an IPython notebook up and running with access to all the systems I needed to extract data and information from for investigative analysis. I feel like my intelligence drops 10% when a P1 comes in. Being able to map IDs in one system to another so you can access a specific log can reduce much of the chaos.


> I wonder what kind of an impact literate programming has on maintenance. Can't imagine rewriting 40 lines of commentary to suit a 2 line code change to be particularly enjoyable.

Reflecting on and expressing the motivation of a task, and providing the rationale for how you accomplished it, the alternatives you considered, etc., will obviously bring a higher cognitive load with it than just "winging it". My guess is that for non-trivial systems the benefits generally outweigh the costs. In most environments you're likely to work in, the extra "upfront" effort is unlikely to get recognized, not by one's bosses, often not even by oneself.

I put "upfront" in quotes because I believe you don't just profit in the long term: explicitly thinking about the task and discussing / documenting the thought process generally leads to considering the problem more closely.

Having a well-designed and documented system will probably also cut down on those "2 line bodge fixes" that are too cumbersome to document in the long run.

That said, it obviously depends on context, trivial code, throwaway code, boilerplate code exists...


> Can't imagine rewriting 40 lines of commentary to suit a 2 line code change to be particularly enjoyable.

I have the expectation that a "simple" 2 line change that took 24 hours to resolve shouldn't require more than 4 lines of commentary.

However I fully admit it's not rational and try to work to change it.

It helps me to remember the end goal isn't just fixing the problem, but also preserving the biggest gotchas, limitations, and most important lessons learned.


This. I've absolutely left 20 line comments on 2 line code changes when the reason for the change is non-obvious, subtle and fairly important. I look at it as a favor for future me.


The need to document your changes would help with copy-paste, in my opinion. You would be forced to extensively document why you repeat these 70 lines here in your code instead of using the same 70 lines elsewhere.


30 years ago Haskell was the winner. Conciseness was a major feature.

Has anyone put this to the test on larger projects?

10,000 lines of Haskell have the same functionality as 100,000 lines of C++?


I work at one of the largest Haskell-in-production companies in the world

There are issues outside of the actual authorship of code you need to take into account if you want to choose Haskell. I'll leave it at that.

It's a fantastic language undoubtedly.

But if you've never experienced the maintenance and upgrade of a very large Haskell codebase over a period of years, especially if you have a big dependency-tree, and need to interface with external tools (database drivers, etc.) I'd urge you to talk to someone who has for a view into the experience. Also ask about the state of things like profiling and compiler bugs/memory leaks.

EDIT: I want to note that the state of the general Haskell ecosystem/tooling has improved at a dramatic pace in the last ~3-4 years, with the advent of HLS and recent GHC releases.


Yeah, an important lesson for programmers is that for big projects the ecosystem around a language is usually more important, for any given metric (productivity, quality, performance), than the language itself. Python being used to do high-performance math all the time is a good example of this. Python the language isn't at all good at that, but the ecosystem has nice libraries with high-performance C code underneath them.

If one comes to believe there is a language that really is a lot better, then you have to get to work building out the ecosystem around it to realize that, and it isn't easy!


The most successful Haskell project I've seen in the wild is shellcheck [0].

Here is cloc run against the repo:

    $ cloc .
          75 text files.
          75 unique files.
          12 files ignored.
    
    github.com/AlDanial/cloc v 1.90  T=0.08 s (788.1 files/s, 271069.0 lines/s)
    --------------------------------------------------------------------------------
    Language                      files          blank        comment           code
    --------------------------------------------------------------------------------
    Haskell                          29           2381           1321          15440
    Markdown                          5            393              0           1088
    Bourne Again Shell               12             99             39            438
    SVG                               1             32              0            262
    YAML                              3             39             23            160
    Dockerfile                        6             35             54            104
    Bourne Shell                      8              3              4             99
    --------------------------------------------------------------------------------
    SUM:                             64           2982           1441          17591
    --------------------------------------------------------------------------------
I wouldn't say that this is a compressed version of 100,000 lines of C++. I'm not adept enough at evaluating a Haskell source base to do a translation to C++, either.

[0]: https://github.com/koalaman/shellcheck


I bet pandoc [1] is even more successful, in the sense of pretty wide use.

[1]: https://github.com/jgm/pandoc


That is amazing! "Every good regulator of a system must be a model of that system" [1] So the Haskell code encodes not only a shell, but also a type inferencer and type checker, all in 18kloc.

3-5kloc more and you would have a shell compiler!

[1] https://libgen.is/scimag/10.1080%2F00207727008920220


Hasura (https://github.com/hasura/graphql-engine) is quite a big project, although it should be less than 100K LOC.

Only once did I see a problem that could be considered a bug.


I worked on https://github.com/input-output-hk/cardano-node if you want to see a large Haskell codebase.


Haskell is too special to be used in a typical large project.

But other functional languages are in wide use in real projects (trying to list the most popular first): Erlang (telco, finance), Lisp (finance, CAD), Scala (basically universal), Scheme (basically universal), Elixir (frontend), Clojure (don't know much, looks like also frontend).

But in most cases they are used as a sort of DSL, because the functional paradigm is much better in some niches, but not so good in others.


Haskell was a mere 4 years old in 1994.


Sounds great except it takes 10x as much time to read/write the Haskell code versus C++.


The chart did indicate that the 85 lines of code took a while to write, but the 1100 lines of C++ likely took longer. Many developers claim that Haskell code has fewer bugs. Program correctness counts for something.

Anyway, a modern competition with Python, Swift, Java, Modern C++, Rust, etc would be interesting.

Java 1.0 wasn’t even available when this paper was written.


Fun fact: Python was 3 years old at this point.


Page 9 shows Haskell scoring very well for development time when compared against Ada. C++ development time was unfortunately not reported.


Ada projects tend to start slow, but readability and flexible namespacing make refactoring and maintenance much easier.


Oh, yes. I've touched Ada through VHDL (which uses Ada declaration syntax); declarations in Ada are very long, but extremely powerful.

I have not seen anything comparable in popular languages (except Modula-3, which is for the same domain); the nearest thing, for me, is Erlang's guard expressions.


Reading it, it sounds like some of the code for certain languages wasn't even executed or tested? They just had people take a shot at it, and then other people reviewed the code informally? It doesn't sound like there were very stringent test cases written beforehand. Also, this paper was published by the people who wrote the Haskell version, and the whole thing, once you see that, reads like these guys did a better job in way less time and maybe understood the goals better without having attended the first conference.

I wish there was documentation of what the actual program and input data looked like. Something this text based.... it almost sounds like you could give it as an Advent of Code day 23 puzzle and somebody will have a Python solution done for it in 45 minutes.


I am very impressed that when most languages have a development time in the double digits (including the initial Haskell implementation), the Lisp version was developed in 3 hours.

If the codebases are publicly available it would be interesting to see how understandable they are.


When you don't have to prove your code is correct, and just hope it generally is (a well-grounded hope for an experienced developer), highly flexible and dynamic languages like Lisp or Python or Ruby or Smalltalk will shine. Mistakes which an experienced developer makes are usually trivial and surface during interactive testing, and the development happens in a REPL. Also, simple algorithms and data structures suffice for prototyping like this.

The larger the code base, and the stricter the correctness and performance requirements, the more you want to use something like Haskell, or F#, or Typescript, which allow you to exactly pin down and statically ensure certain things.
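
To make "pin down and statically ensure" concrete, here's a toy Haskell example (the names are made up purely for illustration):

    -- Distinct newtypes: a UserId can't be passed where an OrderId is expected.
    newtype UserId  = UserId  Int deriving (Eq, Show)
    newtype OrderId = OrderId Int deriving (Eq, Show)

    lookupOrder :: OrderId -> Maybe String
    lookupOrder (OrderId n) = if n == 42 then Just "widgets" else Nothing

    main :: IO ()
    main = do
      print (lookupOrder (OrderId 42))
      -- print (lookupOrder (UserId 7))  -- rejected at compile time, not at 3am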


What do you mean by "double digit times"? Your whole statement is almost completely lost because I can't tell if you think Lisp is fast or slow. Generally people say Lisps are good at prototyping, and I'd agree; Clojure is super fast.


I'm talking about the table which shows development time in hours, as well as lines of code and lines of documentation.

Come to think of it, I wonder if they counted docstrings as documentation or code?


I would guess that they used a very declarative style of Lisp programming, where much of the code looks like a specification. Here they used Common Lisp enhanced with a relational programming model, like an in-core database with an embedded query language.

The developer(s) were probably quite experienced with it.
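
For a rough feel of what an "in-core database with an embedded query language" can look like in a functional language, here's a Haskell sketch using list comprehensions as the query language. This is only an illustration, not the Lisp extension the authors actually used:

    -- Facts held in memory as a plain relation (a list of tuples).
    friends :: [(String, String)]
    friends = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]

    -- A "query": join the relation with itself to find friends of friends.
    friendsOfFriends :: String -> [String]
    friendsOfFriends p =
      [c | (a, b) <- friends, a == p, (b', c) <- friends, b' == b]

    main :: IO ()
    main = print (friendsOfFriends "alice")   -- ["carol"]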


Relational programming is really quite effective for high-order problem solving.

I've been using Prolog recently and found it really quite good. I've found myself thinking in terms of ontologies rather than the hierarchies which most other languages encourage.

In case it was missed, the lisp in the paper is a rational lisp.


Errata: "rational lisp" should read "relational Lisp".


The small sample size and the age of the experiment make it not very relevant today.


It's interesting to think about how this was a selection of the best languages available at the time. Python and Java were brand new!


C++ in 1994 was very primitive. It didn’t even have a standard string class. The STL with all the containers and algorithms did not come until the C++ standard in 1998.


Generics had been defined, but it wasn’t uncommon to see compilers crash(!) when using them seriously. It took a while for them to support all of STL.


All C++ compiler vendors shipped their own containers.

In 1994 Borland had already gone through BIDS, Turbo Vision and OWL.

Apple was shipping AppToolbox alongside Metrowerks PowerPlant.

IBM had CSet++.

Microsoft was shipping MFC since 1993.

In fact, 30 years later those frameworks still win over many missing features in the standard library, like networking and graphics.


You are partially right.

It is really hard to write portable C++ code even for small projects, because strings are different on nearly all major platforms.

But if you're tied to one platform, platform-specific strings are usually good enough for heavy usage.


Vendors didn't wait until 1998; C++ toolchains touted draft features before then.


There is no actual code, and even the problem description is omitted, in the most cryptic way. Supposedly, the description is in a referenced paper that readers are encouraged to read. However, the citation itself warns us that the paper which contains it is a 400-page document marked "Unpublished".

Problem description and code, or it didn't happen.


Page 5: "The geo-server specification was by nature ambiguous and imprecise, thus leading to some variation in the functionalities of developed prototypes. (On the other hand, the specification is probably typical of that found in practice, especially during requirement acquisition.)"

Nevertheless, chapter 3, aptly named "Problem description", starts on page 6.

The Haskell code for a solution to the problem is on page 15. It is not complete, but it contains examples of what are called "combinators", from which it is easy to deduce how the authors approached the solution.

But one needs to be aware of the notion of combinators and how they are usually constructed. For that, take a look at [1], a construction of parsing combinators. They start with a simple higher-order functional approach, just like the solution exemplified in the paper we are all discussing here (see the sketch below the reference).

[1] https://www.researchgate.net/publication/222837975_Combinato...
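
For intuition, here is a minimal Haskell sketch of the combinator idea, assuming regions are modelled as predicates on points. The names are illustrative, not taken from the paper:

    type Point  = (Double, Double)
    type Region = Point -> Bool               -- a region is just a predicate on points

    circle :: Double -> Region                -- disc of a given radius around the origin
    circle r (x, y) = x * x + y * y <= r * r

    above :: Double -> Region                 -- everything at or above a given latitude
    above y0 (_, y) = y >= y0

    -- Combinators build new regions out of existing ones.
    intersectR, unionR :: Region -> Region -> Region
    intersectR a b p = a p && b p
    unionR     a b p = a p || b p

    outsideR :: Region -> Region
    outsideR a = not . a

    main :: IO ()
    main = print (intersectR (circle 5) (above 0) (3, 1))   -- True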


This reads bizarrely. Awk for creating a server prototype? Haskell as an easy-to-learn rapid prototyping language? The "lines of documentation" column -- what is that even? Apparently the Lisp version only requires 12 lines of documentation whilst the first Haskell version requires 450+ lines?


If the server can work with just a text input and output stream, and can then be used as a coprocess, it can be written in anything that has basic I/O facilities. If events have to be multiplexed from multiple sources, that's done outside of the server (and this is clear from the context diagram in the paper). The "Sensors" and "Object Tracking" are outside of the "Geo-Server" block.

In the diagram, the Geo-Server takes input from Object Tracking and produces some output as well as a Log. It also takes input from a User Interface, and produces output back to it. Because of that extra multiplexing complexity, a front-end harness layer could be developed: something which interacts with the User Interface, Log, Object Tracking and Deployment, and loads the Geo-Server as basically a plug-in.
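
As a sketch of what "just a text input and output stream" means in practice, here's a toy Haskell coprocess that answers one request per line on stdout. The protocol is invented purely for illustration:

    import System.IO (BufferMode (LineBuffering), hSetBuffering, stdout)

    -- One request in, one reply out.
    respond :: String -> String
    respond "ping" = "pong"
    respond other  = "unknown request: " ++ other

    main :: IO ()
    main = do
      hSetBuffering stdout LineBuffering   -- flush each reply so the parent sees it
      interact (unlines . map respond . lines)

Run from a shell, or driven as a coprocess by awk or a UI harness, it just reads lines and writes lines; all the multiplexing stays outside, as described above.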


Some things I like in this paper: 1) the AWK programmer tried to fit as many statements as possible into an 80-column row (must have loved one-liners in Perl), 2) the Lisp write-up just describes the guy being at the kickoff for 4 hours (doing what?).


Are there any more recent studies?



