Hacker News | harpocrates's comments

GHC also uses Damerau-Levenshtein [0]. The fuzzy lookup function gets reused in a couple of other places too (including whatever you encountered in GHCi).

[0]: https://github.com/ghc/ghc/blob/25977ab542a30df4ae71d9699d01...
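
For the curious, the restricted variant of Damerau-Levenshtein (optimal string alignment) is small enough to sketch directly. This is an illustrative reimplementation of the idea, not GHC's actual code:

```haskell
import Data.Array

-- Restricted Damerau-Levenshtein (optimal string alignment) distance:
-- the allowed edits are insertion, deletion, substitution, and
-- transposition of two adjacent characters.
restrictedDamerau :: String -> String -> Int
restrictedDamerau a b = table ! (m, n)
  where
    m  = length a
    n  = length b
    xs = listArray (1, m) a
    ys = listArray (1, n) b

    -- Lazily-built dynamic programming table.
    table = listArray ((0, 0), (m, n))
              [ dist i j | i <- [0 .. m], j <- [0 .. n] ]

    dist i 0 = i
    dist 0 j = j
    dist i j = minimum $
        [ table ! (i - 1, j)     + 1     -- deletion
        , table ! (i, j - 1)     + 1     -- insertion
        , table ! (i - 1, j - 1) + cost  -- substitution (or match)
        ] ++
        [ table ! (i - 2, j - 2) + 1     -- adjacent transposition
        | i > 1, j > 1, xs ! i == ys ! (j - 1), xs ! (i - 1) == ys ! j
        ]
      where cost = if xs ! i == ys ! j then 0 else 1
```

The transposition case is what makes `restrictedDamerau "teh" "the"` come out as 1 instead of plain Levenshtein's 2, which is why swapped-letter typos still suggest the right identifier.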


For anyone curious about the difference between the two: Hughes' pretty printer works really well for pretty-printing Haskell code, but Wadler's is more flexible for pretty-printing C-style languages where there is a dedented closing brace at the end of scopes.

It is 100% true that Hughes' paper is a better introduction to the idea of having algebraic document combinators. One can skim Hughes' and then feel at ease using a library that implements Wadler's printer - the bigger differences between the two libraries are tucked away in the layout algorithms and the APIs are similar.
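
For readers who haven't seen either paper, the core of Wadler's algebra fits in a few lines. This is a heavily simplified sketch (the published algorithm uses a more careful representation to stay linear), with hypothetical names:

```haskell
-- A stripped-down core of Wadler's design: a document is text, newlines
-- with indentation, and a choice node introduced by 'group', which asks
-- the layout algorithm to flatten the document onto one line if it fits.
data Doc
  = Empty
  | Text String Doc
  | Line Int Doc    -- newline followed by this much indentation
  | Union Doc Doc   -- flattened alternative `or` broken alternative

group :: Doc -> Doc
group d = Union (flatten d) d

-- Replace every line break with a single space.
flatten :: Doc -> Doc
flatten Empty       = Empty
flatten (Text s d)  = Text s (flatten d)
flatten (Line _ d)  = Text " " (flatten d)
flatten (Union d _) = flatten d

-- Resolve every Union: keep the flat alternative only if its first line
-- fits in the remaining width (w is the page width, k the current column).
best :: Int -> Int -> Doc -> Doc
best _ _ Empty       = Empty
best w k (Text s d)  = Text s (best w (k + length s) d)
best w _ (Line i d)  = Line i (best w i d)
best w k (Union x y) =
  let x' = best w k x
  in if fits (w - k) x' then x' else best w k y

fits :: Int -> Doc -> Bool
fits w _ | w < 0   = False
fits _ Empty       = True
fits w (Text s d)  = fits (w - length s) d
fits _ _           = True  -- a Line ends the first line; Unions are gone

layout :: Doc -> String
layout Empty       = ""
layout (Text s d)  = s ++ layout d
layout (Line i d)  = '\n' : replicate i ' ' ++ layout d
layout (Union _ d) = layout d  -- should not happen after 'best'

render :: Int -> Doc -> String
render w = layout . best w 0

-- A hypothetical example: break after the condition only when needed.
example :: Doc
example = group (Text "if cond" (Line 2 (Text "then x" Empty)))
```

With a wide page, `render 80 example` yields the one-liner `if cond then x`; with `render 10 example` the group doesn't fit, so the break and its 2-space indent are kept.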


> Hughes' pretty printer works really well for pretty-printing Haskell code, but Wadler's is more flexible for pretty-printing C-style languages where there is a dedented closing brace at the end of scopes.

That's very interesting! So far I've only used Wadler-Leijen-style pretty printers.

Are Hughes-style pretty printers "better" in some way for Haskell-like documents? If so, why?


As I've already posted elsewhere in this thread, such a tool has been developed in parallel with (and with the intent to consume the output of) c2rust. Development on that occurs here: https://github.com/immunant/c2rust/tree/master/c2rust-refact....


Thanks, I hadn't seen this by the time I posted. Very cool.


Step 2 has been happening concurrently, and pretty much since the start of the project. See https://github.com/immunant/c2rust/tree/master/c2rust-refact.... It is just a lot tougher to get right, so it gets less publicity.


Ah awesome! Thanks for the link.


Safety in the form of tightly-fitting types is great, but the signatures definitely suffer a bit in readability (otherwise, Haskell signatures are generally quite helpful when deciphering a function's purpose). I understand why it needs to be this way, but this signature is nasty looking:

    makePCATypeSafe
      :: forall d x y. (AllConstrained SingI '[d,x,y], d <= x ~ 'True)
      => Proxy (d :: Nat) -> DimMatrix D y x Double -> TypeSafePCA
In pseudocode, dependent types would really help out a lot:

    makePCATypeSafe
      :: forall (d :: Natural) (x :: Natural) (y :: Natural). d <= x
      => Proxy d -> DimMatrix D y x Double -> TypeSafePCA
It sometimes feels like GHC is a kitchen sink for type-level extensions. I wish the bar for entry for new extensions of this sort was higher and the language a bit more cohesive.


There is a plan to add proper dependent types to GHC: https://gitlab.haskell.org/ghc/ghc/wikis/dependent-haskell. No exact timeline on when it will be ready though.


> It's available in Haskell as a first class citizen.

Not really. Haskell's laziness makes it easier to write functions that are memoized, but it does not automagically memoize functions. And how could it without incurring a non-trivial runtime space/time cost?

Haskell's purity is what makes it easier to write libraries that facilitate building memoized versions of functions in a transparent way (and that are obviously correct). For instance, I usually reach for [data-memocombinators][0], which happens to have a Fibonacci example at the top of its docs:

    import qualified Data.MemoCombinators as Memo

    fib = Memo.integral fib'
       where
       fib' 0 = 0
       fib' 1 = 1
       fib' x = fib (x-1) + fib (x-2)

  [0]: http://hackage.haskell.org/package/data-memocombinators-0.5.1/docs/Data-MemoCombinators.html


But is there any reason the Haskell compiler couldn't decide to memoize a function call itself if it determined that it would speed things up? To me that sounds hard for the compiler to do heuristically but a Haskell compiler should still have more scope to do optimizations like that than a C compiler has.


Haskell is call-by-need, so you get sharing guarantees you wouldn't otherwise. It's not the same as memoization, but it often gives you what you want anyway.
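
To illustrate the distinction: simply naming a lazy data structure is enough to get sharing, so the classic trick below memoizes the Fibonacci sequence without any memoization machinery. A minimal sketch:

```haskell
-- Call-by-need guarantees a named thunk is evaluated at most once, so
-- defining the whole sequence as one shared lazy list memoizes it "for
-- free": each element is computed once and reused thereafter.
fibs :: [Integer]
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)

fib :: Int -> Integer
fib n = fibs !! n
```

The sharing here comes from `fibs` being a single named value; the compiler never decided to memoize `fib` for you, which is the point made upthread about the space/time cost of doing that automatically.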


I heartily recommend [splain][0] to anyone debugging non-trivial implicits. It is a scalac compiler plugin that, among other things, will swap out the horribly unhelpful "implicit not found" or "diverging implicit expansion" messages for an indented and annotated representation of the search tree that the compiler went through before giving up.

`-Xlog-implicits` is good to use every now and then, but it quickly becomes unreadable for any decent-sized project.

  [0]: https://github.com/tek/splain


Mature libraries use the `@implicitNotFound` annotation to generate better, library-controlled compiler errors, with advice on what to do. It has its limitations of course, but it has proved very useful in many situations.

A search tree is cool, but it's not going to explain how to get a value when it has restrictions (dependencies).


`@implicitNotFound` still only gives you a top-level failure message. Also, my comment was aimed more at developers of libraries with complex implicit derivations not the consumers of such libraries.


I can't understand why this is not built into Scala.

It is utterly essential if you are doing anything with implicits. In particular with libraries like Circe.


I wish I knew about this when I wrote Scala (albeit briefly). It's one of those "ill-posed problems" when debugging: which implicit from a third-party library, not currently in scope, would make this work?


FWIW we've spent a lot of time trying to re-engineer the control-flow translation to be heuristic-friendly in c2rust. We've extended and tweaked the Relooper algorithm in hopes of preserving more of the initial control flow. It still isn't great, but it is getting better.

Offhand, here are some sizeable additions:

* Keep track of some extra information from the C source about which basic blocks were in loops or at branch points, then use that information to try to extract back out similar-looking Rust

* Support translating `switch` to `match`, complete with collapsing some patterns together

* Properly handle initialization

That said, c2rust can be invoked _without_ the relooper enabled if you so wish. In that case, it will simply refuse to translate code containing gotos.


I have no idea what I'm talking about, but if you have a control flow translation pass that is intended to translate constructs like goto and produces "uglier" code, why not automatically disable it for functions (or whatever granularity works) with no such constructs?


I love that the relooper can be disabled! I think that option will be really useful while the heuristics are being tuned, since many codebases avoid goto as a matter of style and can probably be translated without it, or with minor tweaks.


The idea is that this is a first step towards safe Rust. First, you convert to unsafe (but semantically preserving) Rust, then you refactor. The refactor stage probably will involve changing some semantics (read: fixing bugs), or perhaps proving some properties with an SMT solver before applying certain transformations (converting a `libc::c_int` to an `i32`, or a `*const i32` to a `&i32`).


C2Rust basically compiles C into a very low-level program in Rust. You've then lost the C idioms. Now you have to decompile the low-level Rust with pointer arithmetic into Rust abstractions. That's very hard, probably harder than converting C idioms to Rust idioms, checking to see if the result is equivalent, and falling back to low-level compilation only when absolutely necessary.

The key to this is figuring out the comparable representation for data. Mostly this is a problem with arrays, since C's array/pointer system lacks size info. All C arrays have a size; it's just that the language doesn't know it. The trick is to figure out how the program is representing the size info. Somewhere, there was probably a "malloc" which set the size, and you may have to track backwards to find it. Then you can replace the C array with a Rust array that carries size information, and maybe eliminate variables which carry now-redundant size info.

That would produce readable Rust. But it requires whole-program analysis. That's OK, that's what gigabytes of RAM are for.


I suspect it's not possible for most interesting programs, even with whole program analysis. As soon as you start storing pointers behind other pointers, it's (very) hard to keep track of where they came from.

There's more discussion in the replies to https://news.ycombinator.com/item?id=17382464 that you may've missed.


Yes, I know; I started that discussion.


I see, I'm sorry. It just seemed like you might not have noticed additional replies to your comment, since you didn't reply there and essentially didn't change what you said.


I would really love to talk to you about this; we are doing some work on it for a project related to converting C to a safe C superset.


I'm a primary contributor to c2rust, and I may be the person "stolen" away from corrode. I'd like to apologize if it feels like we ripped off ideas without giving due credit - the project wasn't really supposed to be discovered so soon. The website was a throwaway idea, a means of easily sharing our work in a limited circle while avoiding both the DARPA approval process and the sub-par build process (you have to build a huge chunk of Clang).

So here is me acknowledging Jamey's work: I personally did take inspiration from Corrode, and I was expecting to work on Corrode proper when I joined Galois. I've re-read the CFG module of Corrode several times (as well as the Mozilla paper, and some older literature).

All that said, I also want to point out that Corrode hasn't had any activity at all since last April - and that's not for want of PRs piling up. I'm not criticizing here, since I understand that managing an open source project can be quite time-consuming and stressful, but I feel like this also needs to be mentioned. Also, c2rust can be freely forked. Once the DARPA funding runs out, it is my hope that the Rust community will become the maintainers.

Finally, regarding the many improvements that can be automated: that is next up on our plate!

