> not things like multiplication

to do that kind of canonicalization/folding/CSE on tensor ops you effectively need to embed a whole-ass tensor ops interpreter into your compiler. note this isn't so far-fetched:

https://github.com/openxla/stablehlo/blob/main/docs/interpre...

the problem is that the interpreter needs to be performant as well. so what some people do is jit these canonicalizations using the compiler itself (yo dawg i heard you like compiling...).
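
to make the "interpreter inside the compiler" point concrete, here's a rough sketch of constant folding over a toy tensor IR. everything in it is made up for illustration (`Const`, `Param`, `Op`, `KERNELS`, `fold` are hypothetical names, tensors are flat 1-D tuples, and this is not how StableHLO or any real compiler structures it). the thing to notice is that `fold` can only fold ops it has a reference kernel for, so the more ops you want to fold, the more of an interpreter you end up writing:

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple, Union


@dataclass(frozen=True)
class Const:
    values: Tuple[float, ...]  # assumption: flat 1-D tensor, no shapes/dtypes


@dataclass(frozen=True)
class Param:
    name: str  # runtime input, value unknown at compile time


@dataclass(frozen=True)
class Op:
    kind: str
    args: Tuple["Node", ...]


Node = Union[Const, Param, Op]

# The embedded "interpreter": one reference kernel per op kind.
# Hypothetical op names; a real op set would be far larger.
KERNELS: Dict[str, Callable[..., Tuple[float, ...]]] = {
    "add": lambda a, b: tuple(x + y for x, y in zip(a, b)),
    "multiply": lambda a, b: tuple(x * y for x, y in zip(a, b)),
    "negate": lambda a: tuple(-x for x in a),
}


def fold(node: Node) -> Node:
    """Bottom-up constant folding: evaluate any op whose operands are all constants."""
    if not isinstance(node, Op):
        return node
    args = tuple(fold(a) for a in node.args)
    if node.kind in KERNELS and all(isinstance(a, Const) for a in args):
        # Folding means actually running the op at compile time.
        return Const(KERNELS[node.kind](*(a.values for a in args)))
    return Op(node.kind, args)


# x + (c1 * c2): the multiply has constant operands, the add does not.
expr = Op("add", (Param("x"), Op("multiply", (Const((2.0, 3.0)), Const((4.0, 5.0))))))
folded = fold(expr)
```

the multiply gets folded into a single `Const` and the add stays put because `x` is a runtime input. these pure-python kernels are also exactly the slow part: on big constants you'd want them compiled, which is where the jit-the-canonicalizer trick comes in.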