to do that kind of canonicalization/folding/cse you effectively need to embed a whole-ass tensor ops interpreter into your compiler. note this isn't so far-fetched, stablehlo ships a reference interpreter:
https://github.com/openxla/stablehlo/blob/main/docs/interpre...
the problem is the interpreter needs to be performant as well, since it runs inside the compile pipeline. so what some people do is jit these canonicalizations using the compiler itself (yo dawg i heard you like compiling...).
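
to make the "interpreter inside the compiler" point concrete, here's a toy sketch in python. everything in it (`Const`, `Param`, `Op`, `KERNELS`, `fold`) is made up for illustration and is not stablehlo's or any real compiler's api; tensors are just flat tuples. the point is that the folding pass can't do anything without `KERNELS`, which is a tiny op interpreter:

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Const:
    value: Tuple[float, ...]  # a 1-d "tensor" known at compile time


@dataclass(frozen=True)
class Param:
    name: str  # a runtime input, value unknown to the compiler


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul" (hypothetical op set)
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Const, Param, Op]

# the embedded interpreter: reference semantics for every op we want to fold
KERNELS = {
    "add": lambda a, b: tuple(x + y for x, y in zip(a, b)),
    "mul": lambda a, b: tuple(x * y for x, y in zip(a, b)),
}


def fold(e: Expr) -> Expr:
    """Constant-fold bottom-up: evaluate any op whose operands are both Const."""
    if isinstance(e, Op):
        lhs, rhs = fold(e.lhs), fold(e.rhs)
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            # this is where the compiler has to actually *run* the op
            return Const(KERNELS[e.kind](lhs.value, rhs.value))
        return Op(e.kind, lhs, rhs)
    return e


# x + (const * const): the mul folds away, the add stays because x is unknown
expr = Op("add", Param("x"), Op("mul", Const((1.0, 2.0)), Const((3.0, 4.0))))
folded = fold(expr)
```

scale `KERNELS` up to a real op set (convs, reductions, broadcasting, dtypes) and slow python-style loops over big constants become a compile-time bottleneck, which is why people end up compiling the folding kernels too.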