Melody – a language that compiles to regular expressions

draegtun · on March 2, 2022

I like using `parse` in Rebol / Red - http://www.rebol.com/r3/docs/functions/parse.html

Here's the parse rule for Batman:

  [
      16 "na"
      2 [space "batman"]
  ]

And a complete example for the semantic version:

    digit: charset "1234567890"

    if parse "0.2.5" [
        opt "v"
        copy major [some digit]
        "."
        copy minor [some digit]
        "."
        copy patch [some digit]
    ][
        print [major minor patch]
    ]
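
For readers who don't know Rebol, here is a rough JavaScript regex equivalent of the two rules above. It is only a sketch: it assumes `parse` has to consume the whole input (hence the `^` and `$` anchors) and that `space` is a single literal space. The names `batmanRule`, `semverRule`, and `parseSemver` are made up for illustration.

```javascript
// Assumed equivalent of: [16 "na"  2 [space "batman"]]
const batmanRule = /^(?:na){16}(?: batman){2}$/;

// Assumed equivalent of the semantic version rule; a digit is 0-9 only.
const semverRule = /^v?([0-9]+)\.([0-9]+)\.([0-9]+)$/;

// Returns [major, minor, patch] as strings, or null when the input does not match.
function parseSemver(text) {
  const m = semverRule.exec(text);
  return m ? [m[1], m[2], m[3]] : null;
}

console.log(batmanRule.test("na".repeat(16) + " batman batman"));
console.log(parseSemver("0.2.5"));
```

The regex version is shorter, but the parse rule names each captured part where it is matched, which is the readability argument being made here.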

chubot · on Feb 17, 2022

I made a wiki page for similar alternative regex syntax projects:

https://github.com/oilshell/oil/wiki/Alternative-Regex-Synta...

(including my own https://www.oilshell.org/release/latest/doc/eggex.html which is built into Oil)

Banana699 · on Feb 17, 2022

You might want to add Wolfram's Language String Patterns[1].

Perl 6 (which is now actually called Raku, for what it's worth) has BNF-style Grammars as a first class citizen of the language (a virtually unheard of thing AFAIK), and programmers are encouraged to use them for complex parsing tasks instead of regex. I don't know whether that falls under "alternative syntax for regex"; they are closely related to regexes but much more powerful and readable, and include regexes as a subset. But adding them might drag you into adding every context-free or parsing expression grammar tool out there, things like Bison, YACC, and ANTLR.

The language SNOBOL[2][3][4] is one of the earliest languages with text matching and processing as a first class citizen (in fact, the only citizen). Being designed (1962) before the first software implementation of regular expressions* (1968), the pattern language it uses is not based on regular expressions, and in some cases actually exceeds them (e.g. matching balanced parentheses, which mathematical regexes can't do, but some variants of practical regexes can with special non-regular constructs).
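
To make the balanced-parentheses point concrete: a true regular expression has no counter, so it can't track arbitrary nesting depth, while a few lines of ordinary code can. A minimal JavaScript sketch follows; the function name `isBalanced` is just for illustration and has nothing to do with SNOBOL's own pattern syntax.

```javascript
// Checks that every "(" has a matching ")" by tracking the nesting depth.
// A finite automaton cannot do this for unbounded depth, because it has no counter.
function isBalanced(text) {
  let depth = 0;
  for (const ch of text) {
    if (ch === "(") {
      depth += 1;
    } else if (ch === ")") {
      depth -= 1;
      if (depth < 0) return false; // closing paren with no opener
    }
  }
  return depth === 0;
}

console.log(isBalanced("(a(b)c)"));
console.log(isBalanced("(a(b)c"));
```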

This thread[5] in Retrocomputing stack exchange discusses what is the earliest language with string pattern matching capabilities, and finds hidden gems in the process.

[1] https://reference.wolfram.com/language/guide/StringPatterns....

[2] https://en.wikipedia.org/wiki/SNOBOL

[3] https://dl.acm.org/doi/10.1145/800025.1198417

[4] https://www.snobol4.org/

[5] https://retrocomputing.stackexchange.com/questions/658/what-...

* : As opposed to the mathematical definition of regular expression, which dates back to 1956 and perhaps even before.

chubot · on Feb 17, 2022

Thanks for the links, I added Wolfram.

I think SNOBOL might count, but it's a bit different in that I think it's a "dead" language now? e.g. it doesn't appear to have a user base or implementations. But feel free to add it or anything else if there are good links.

There's definitely a fuzzy line between things like LPeg or Rosie and YACC/ANTLR ... it's less about the power of the language and what kind of tasks people use it for, I suppose. If it's "scripting friendly".

nefitty · on Feb 17, 2022

That's so weird. I literally just started my own exploration of alternative Regex syntax this morning. The simulation rears her head again. The Dude abides.

blahgeek · on Feb 17, 2022

Emacs lisp provides a similar tool (with better syntax IMO): https://www.gnu.org/software/emacs/manual/html_node/elisp/Rx...

yoav_lavi · on Feb 16, 2022

Author here, thanks for posting Melody! This is my first attempt at a language and I'm learning Rust, so any input would be appreciated

fouc · on Feb 17, 2022

Minor comment: I personally find it harder to type < > than : because the < > keys are a lot closer to the shift key and cause more wrist strain.

Have you considered borrowing the :emoji: convention that slack/discord/github use? :space: :feed: etc..

fire · on Feb 17, 2022

(Not OP, and not disregarding the issue at hand.) Have you tried practicing using the shift key on the opposite side from the key you want to press? Learning to make that change was hard for me, but it is one of, if not the largest, improvements I've made over the years in both typing speed and general typing comfort.

zdragnar · on Feb 17, 2022

Assuming a standard QWERTY layout, shouldn't you be using the left shift when typing < or >? It'll significantly reduce the contortion effort in your right hand. Same goes for most chords- use opposing hands for the modifier and symbol keys.

(Oddly enough, I don't bother doing this with ctrl/cmd + a/s/f/z/x/c/v, but I think that is mostly because keys to the right of the space bar vary so much between laptops and keyboards that I never bothered trying to stick with it).

dotancohen · on Feb 17, 2022

Hello Yoav! In my opinion the match keyword is not needed. When the parser gets to an opening bracket that should start whatever methodology the match keyword is doing. As a heavy regex user, I understand that you want consistency with the capture keyword behaviour. But if we assume that the user is a programmer but does not know regex, it makes more sense to view {<space>;"batman";} as an array (delimited by curly brackets).

In fact, you might want to go a step further and consider using [] for match and {} for capture (thus eliminating the capture keyword as well). Using [] for match would be natural for Javascript programmers.

dokem · on Feb 16, 2022

I feel like there should be a way to group a portion of the regex, by name, for extraction later. Otherwise I like it.

ie

    group $first_name { some of <letter> }
    1 of <space>
    group $last_name { some of <letter> }

ozzmotik · on Feb 16, 2022

capture name {} is a thing it listed
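
For reference, JavaScript regexes already support named capture groups, so a `capture name { ... }` block has something natural to compile to. The sketch below is hand-written, not Melody output; it assumes `<letter>` means `[a-zA-Z]` and that `<space>` becomes `\s`.

```javascript
// Hand-written guess at what the first_name / last_name example could compile to.
const fullName = /(?<first_name>[a-zA-Z]+)\s(?<last_name>[a-zA-Z]+)/;

const match = fullName.exec("Bruce Wayne");
if (match) {
  // Named groups are exposed on match.groups.
  console.log(match.groups.first_name, match.groups.last_name);
}
```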

maximilianroos · on Feb 17, 2022

A bit orthogonal but something I would love to see:

A library which takes a regex and shows some examples that pass and some that fail. I would find that the easiest way of understanding a regex, rather than changing the language itself. (Though Melody looks v promising and I'm keen to see it develop).

It wouldn't be trivial to build — particularly for the "fail" examples, you'd want them fairly close to passing. For example, with `(/*\.csv\.gz)` you'd want `foo.csv.gz` rather than `aoseutn` as an example of a failure.

all2 · on Feb 17, 2022

There's a python library called xeger [0] that allows you to generate strings from regular expressions. I've used this at work to generate large quantities of "valid" test data.

[0] https://pypi.org/project/xeger/

dotancohen · on Feb 17, 2022

This looks great as a password generator, username generator, etc. Thank you!

tgv · on Feb 17, 2022

The fail bit is harder indeed, especially for larger regexps, but not totally impossible. The easiest way towards that goal seems to be constructing the DFA first and then generating illegal single edits (insertion, deletion, substitution). Generating a positive example is possible without it.
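
A cheap approximation of this, sketched in JavaScript: skip the DFA, start from one known-passing string, apply every single-character edit, and keep the variants the regex rejects. This is a brute-force stand-in for the DFA-based approach, not an implementation of it; `nearMisses` and its `alphabet` parameter are hypothetical names.

```javascript
// Generates strings one edit away from `passing` that `regex` rejects.
// Brute force: tries every deletion, substitution, and insertion.
function nearMisses(regex, passing, alphabet = "ab.") {
  // Strip the g/y flags so that test() carries no lastIndex state between calls.
  const re = new RegExp(regex.source, regex.flags.replace(/[gy]/g, ""));
  const candidates = new Set();
  for (let i = 0; i <= passing.length; i++) {
    const head = passing.slice(0, i);
    if (i < passing.length) {
      const tail = passing.slice(i + 1);
      candidates.add(head + tail); // deletion of character i
      for (const ch of alphabet) candidates.add(head + ch + tail); // substitution
    }
    for (const ch of alphabet) candidates.add(head + ch + passing.slice(i)); // insertion
  }
  // Keep only the variants that no longer match.
  return [...candidates].filter((s) => !re.test(s));
}

console.log(nearMisses(/^.*\.csv\.gz$/, "foo.csv.gz"));
```

Edits that still match (for example, changing a letter in `foo`) are filtered out, so everything returned is a failure exactly one edit away from a pass. The weakness is that the result depends on the starting string and the alphabet, which is where a DFA would do better.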

nathancahill · on Feb 17, 2022

Like American Fuzzy Lop for regex. I dig it.

parksy · on Feb 17, 2022

Some thoughts I had for the developer:

Does it (or are there plans to) reverse compile? If I could input my regex and output Melody script, one could create an excellent interactive learning tool, and also, more selfishly, help with adoption in teams with crusty old devs like me who like our magic rituals and prefer typing our regex by hand.

Also are there plans to support runtime compiling in JS? Something like...

    someMelodyObject = <initialise and configure melody>
    String.replace(someMelodyObject.toRegexp(), someString)

This I think would make it a compelling library for inclusion into projects assuming it were fairly efficient and lightweight. Not sure how or if you'd have to deal with performance and caching but it would probably go a ways to improving adoption among web developers at least.

Anyways good luck with the project. Regex is often considered a dark art when it's actually fairly concise and expressive, opening it up to more people at a higher level could lead to greater understanding of regex in general. Also what an interesting and challenging project to undertake, definitely a nontrivial challenge all told.

yoav_lavi · on Feb 17, 2022

Author here,

1. A reverse compiler is one of the 'maybe' features (see the table in the README). It's something I'd like, but it would essentially be an entire compiler, so it's nontrivial.

2. The plan is to make Melody available as a compile step (like e.g. SASS) with no runtime overhead or as a Rust crate. You could do the compilation at runtime but other than including variables in the pattern I'm not sure if it'd have a benefit over compile time transforming, + it'd have a performance impact.

Thank you!

parksy · on Feb 17, 2022

No dramas, understood re 1 it's no doubt beyond trivial to create a bidirectional transpiler. I wouldn't even know where to begin so good work on what you've managed so far.

Re 2, shouldn't be a problem as we already have build processes in place. Most projects I work on have npm build steps, I'm not sure how that figures in with rust (I really need to get off my butt and check it out sometime), but if it could be pulled in as an npm dependency that would work. If it could be done inline even better (e.g. inline melody within a JS file, compiles to the expression inline...)

Anyway good job again so far, have followed the repo, all the best once more!

yoav_lavi · on Feb 18, 2022

Thank you!

>If it could be done inline even better

I've been thinking of using a tagged template based mechanism (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...) so no special editor support is needed and then transpile to a normal regex (similar to how htm handles its syntax with no runtime, https://www.npmjs.com/package/babel-plugin-htm) but I'm not working on that part of the process yet so it may change.

If Melody does end up using the tagged template mechanism it may end up looking something like this:

  import melody from "melody";

  const batmanRegex = melody`
    16 of "na";

    2 of match {
      <space>;
      "batman";
    }
  `;

That would transpile to:

  const batmanRegex = /(?:na){16}(?:\sbatman){2}/;

Backticks would need escaping but otherwise it'd be normal Melody syntax

ZeroGravitas · on Feb 16, 2022

I always liked the Perl style commented regexes, would be nice if this could generate those, though I guess that needs language support.

https://stackoverflow.com/questions/15463257/commenting-regu...

Some interesting workarounds mentioned here that might pair well with melody type languages.

A way to specify example strings that match or don't as a mini unit test would be cool too.
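
One such workaround, sketched in JavaScript, which has no `/x` flag: keep each fragment next to its comment, join the fragments into a single `RegExp`, and list example strings that must or must not match. The helper names (`commented`, `checkExamples`) are made up for this sketch.

```javascript
// Builds a RegExp from [fragment, comment] pairs; the comments are documentation only.
function commented(parts, flags = "") {
  return new RegExp(parts.map(([fragment]) => fragment).join(""), flags);
}

const batman = commented([
  ["^", "start of input"],
  ["(?:na){16}", "sixteen times 'na'"],
  ["(?:\\sbatman){2}", "whitespace + 'batman', twice"],
  ["$", "end of input"],
]);

// Mini unit test: returns the examples whose result differs from what was expected.
function checkExamples(regex, shouldMatch, shouldNotMatch) {
  return [
    ...shouldMatch.filter((s) => !regex.test(s)),
    ...shouldNotMatch.filter((s) => regex.test(s)),
  ];
}

const failures = checkExamples(
  batman,
  ["na".repeat(16) + " batman batman"],
  ["na".repeat(15) + " batman batman", "batman"]
);
console.log(failures.length === 0 ? "all examples behave as expected" : failures);
```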

kdtop · on Feb 16, 2022

I have never taken the time to learn RegEx stuff. It seems like it would be great if I could keep all the syntax in my head. So the idea of Melody seems great. I don't like that the github description claims it to be unstable currently. I hope this project continues and flourishes.

yoav_lavi · on Feb 16, 2022

Author here, thank you! The reason I stated that Melody is unstable is that the project is very young (days), so some of the syntax is still being considered and may change (although the general idea and direction will remain), and not everything is implemented yet. I'm also considering changing the way the parsing works (but that wouldn't affect end users in terms of expected results for valid code).

thedevelopnik · on Feb 16, 2022

Even as someone who has invested time in learning to write regexp, they are still hard to read and maintain. This project looks super cool!

2OEH8eoCRo0 · on Feb 17, 2022

You could learn it in a few hours. There aren't many rules.

gowld · on Feb 16, 2022

Just use one of the many generators

manual: https://regexr.com/

AI: https://regex-generator.olafneumann.org/

nixpulvis · on Feb 16, 2022

I'm a big fan of https://rubular.com.

SonOfLilit · on Feb 19, 2022

Shameless plug: I'm working on a similar project, with a strong focus on providing a painless migration path. Author: let's chat and perhaps join forces?

Please don't submit to HN until I release the vscode plugin :-)

https://github.com/sonoflilit/kleenexp

sivizius · on Feb 17, 2022

So, a RegEx (Melody syntax) to RegEx (unspecified syntax) compiler? I mean, the syntax is nice, but 1. please specify which kind of regular expression it compiles to, 2. are those really regular expressions or a language higher in the Chomsky hierarchy? 3. I suggest adding a graphical output of the state machine, e.g. with Graphviz.

golf_mike · on Feb 17, 2022

As for 1: "The current goal is supporting the JavaScript implementation of regular expressions." Right on the readme :). 2: I couldn't tell you, but does it matter if it has a practical use? I for one never understood why regexes have the notation they have, and always struggle because I use them next to never. This looks like an attempt to make something that would suit me better. 3. What do you mean exactly?

makach · on Feb 17, 2022

Excellent, now we need a language that compiles into Melody. Can I have a GUI with that?

d--b · on Feb 17, 2022

“And now you have 3 problems”.

Sorry I had to make the joke! This is fine. Regex are a pain.

asicsp · on Feb 17, 2022

See also: https://github.com/VerbalExpressions

politician · on Feb 17, 2022

I wouldn’t write regex in this, but it would be interesting to use it to debug/read complex expressions.