Agreed, this is why I like the idea of searching parsed files, for example using scope selectors against code tokenized with a sublime-syntax grammar. Unfortunately, it means the code needs to be parsed/indexed first, which is slower than a plain regex search.
I wonder if the AST built by Tree-Sitter could also help with this type of search - does anyone know of any existing solutions for this?
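To make the scope-selector idea concrete, here is a rough sketch of what I have in mind. It assumes the tokenizer has already run: the `TOKENS` list is hand-written for illustration, and `scope_matches` / `search` are hypothetical helpers, not part of Sublime Text or any library. Real scope selectors also support descendant matching, exclusions and operators; this only does plain dot-separated prefix matching.

```python
import re

# Hypothetical pre-tokenized input: (text, scope stack) pairs, roughly the
# shape a sublime-syntax tokenizer produces. Hand-written for this example.
TOKENS = [
    ("# TODO fix", ["source.python", "comment.line.number-sign.python"]),
    ("todo", ["source.python", "variable.other.python"]),
    ("=", ["source.python", "keyword.operator.assignment.python"]),
    ('"TODO"', ["source.python", "string.quoted.double.python"]),
]


def scope_matches(selector, scope):
    """True if `selector` is a dot-separated prefix of `scope`."""
    sel = selector.split(".")
    return scope.split(".")[:len(sel)] == sel


def search(tokens, pattern, selector):
    """Return token texts matching `pattern` whose scope stack matches `selector`."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [
        text
        for text, scopes in tokens
        if rx.search(text) and any(scope_matches(selector, s) for s in scopes)
    ]


# A plain regex search hits the comment, the variable and the string.
plain_hits = [text for text, _ in TOKENS if re.search("todo", text, re.IGNORECASE)]

# Restricting the search to the `comment` scope keeps only the comment.
comment_hits = search(TOKENS, "todo", "comment")
```

The expensive part is producing `TOKENS` in the first place, which is the parse/index cost mentioned above; the matching itself is cheap.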
> does anyone know of any existing solutions for this?
https://semgrep.dev. Although it's mostly an analysis tool, it can be used as a search tool. IIRC it's not super fast, but for the cases where there is no way to really contort a regex into something suitable (the regex has way too many false positives and/or negatives), it works rather well.
> I like the idea of searching parsed files, for example using scope selectors against code tokenized with a sublime-syntax grammar.
I've been working on a trigram-based search engine with support for exactly this (via github.com/trishume/syntect) over the past several months and plan on open-sourcing it soon. Cool to see someone else with this idea!
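For anyone unfamiliar with trigram indexes, the general technique looks roughly like the sketch below. This is not the engine mentioned above, just a minimal illustration; `DOCS`, `build_index` and `search` are made-up names, and a real engine would add compression, regex support and the scope filtering on top.

```python
from collections import defaultdict


def trigrams(text):
    """Return the set of all 3-character substrings of `text`."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_index(docs):
    """Map each trigram to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for tri in trigrams(text):
            index[tri].add(name)
    return index


def search(index, docs, query):
    """Find documents containing `query` as a substring.

    The index narrows the candidates to documents containing every trigram
    of the query; a final substring check removes false positives.
    """
    candidates = set(docs)
    # Queries shorter than 3 characters have no trigrams, so nothing is
    # filtered here and every document gets the substring check.
    for tri in trigrams(query):
        candidates &= index.get(tri, set())
    return sorted(name for name in candidates if query in docs[name])


# Hypothetical corpus, for illustration only.
DOCS = {
    "a.rs": "fn parse_scope(input: &str) {}",
    "b.rs": "fn render(scope: Scope) {}",
    "c.py": "def main(): pass",
}
INDEX = build_index(DOCS)
```

With this corpus, `search(INDEX, DOCS, "scope")` only has to run the substring check on the two `.rs` files, since `c.py` shares no trigram with the query.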
You might also like https://comby.dev - it is aware of code structure without parsing files
> I've been working on a trigram-based search engine with support for exactly this (via github.com/trishume/syntect) over the past several months and plan on open-sourcing it soon.
Awesome, I look forward to that - you'll post a "show HN" for it, I hope? :)
Does syntect's lack of support for the newest sublime-syntax features cause you any problems?
> Awesome, I look forward to that - you'll post a "show HN" for it, I hope? :) Does syntect's lack of support for the newest sublime-syntax features cause you any problems?
Show HN -> Yep :)
Lack of support for newest sublime-syntax features: less than you would expect, but a little for sure.
Most of the syntax definitions in the wild today don't really use the newer features. At Sourcegraph I wrote a little Rust HTTP server wrapping syntect[1], and we have used it for all our syntax highlighting for the past several years. I would say it works on about 95% of code files, even if you include a lot of additional syntaxes that are untested with Syntect[2]. That said, it does barf hard on some specific files - either taking a really long time to do the work or getting completely stuck in a busy-waiting loop for some reason. Even so, it's still the 2nd best syntax highlighter out there (second only to Sublime itself).
One of my hopes for this side project is that I'll be able to contribute more time upstream with e.g. a more extensive test suite for Syntect against a much larger number of syntax definitions from the wild instead of just Sublime's built-in ones.