For example, to evaluate something as simple as 1.3+2.4, you need an exact model of your target architecture's fp unit.
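To make that concrete, here's a tiny host-only illustration (plain C++, nothing LLVM-specific): the same textual addition rounds to different bits depending on the format it's evaluated in, so a folder that just leans on whatever the host happens to do can bake the wrong constant into the binary.

    #include <cstdio>

    int main() {
      // Same source-level addition, three different results, because each
      // format rounds 1.3, 2.4 and their sum differently.
      float f = 1.3f + 2.4f;
      double d = 1.3 + 2.4;
      // What "long double" even means is target-dependent: x87 80-bit on
      // x86 Linux, plain double on MSVC, "double double" on older PowerPC
      // ABIs. That variability is exactly the problem for a constant folder.
      long double ld = 1.3L + 2.4L;
      std::printf("%.25f\n%.25f\n%.25Lf\n", f, d, ld);
    }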
If you want to cross-compile, i.e. generate code for x86 while running the compiler on anything other than x86, you need to implement 80-bit long double fp in software.
If you want to cross-compile for PowerPC, you get to build software support for its bespoke "double double" format, which is not IEEE 754 128-bit but a totally different thing.
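(For what it's worth, this is roughly what LLVM's APFloat exists for: do the arithmetic entirely in software, in the target's format, and never touch the host FPU. Here's a sketch of the idea, not the actual folding code; foldAdd is a helper name I made up.)

    // Host-independent folding with llvm::APFloat (software floating point).
    // foldAdd is a made-up helper, not an existing LLVM entry point.
    #include "llvm/ADT/APFloat.h"
    using llvm::APFloat;
    using llvm::fltSemantics;

    APFloat foldAdd(const fltSemantics &Sem, const char *Lhs, const char *Rhs) {
      APFloat A(Sem, Lhs);
      APFloat B(Sem, Rhs);
      // All rounding happens in Sem, regardless of what the host FPU does.
      A.add(B, APFloat::rmNearestTiesToEven);
      return A;
    }

    // The same folder models each target just by picking its semantics:
    //   foldAdd(APFloat::IEEEdouble(),        "1.3", "2.4");
    //   foldAdd(APFloat::x87DoubleExtended(), "1.3", "2.4");  // x86 long double
    //   foldAdd(APFloat::PPCDoubleDouble(),   "1.3", "2.4");  // "double double"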
Some GPUs (and some CPUs) flush fp denormals to zero, so you also have to handle that in your constant folding.
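(Same story for flushing: the folder has to reproduce it, otherwise the constant folded at compile time differs from what the hardware computes at run time. Something along these lines, where flushIfNeeded and the target query are both made up for illustration.)

    #include "llvm/ADT/APFloat.h"
    using llvm::APFloat;
    using llvm::fltSemantics;

    // If the target runs in flush-to-zero mode, mimic it after folding.
    // Whether the sign is preserved, and whether denormal *inputs* are also
    // zeroed (DAZ), varies by target; this sketch only flushes outputs.
    void flushIfNeeded(APFloat &Result, const fltSemantics &Sem,
                       bool TargetFlushesDenormals) {
      if (TargetFlushesDenormals && Result.isDenormal())
        Result = APFloat::getZero(Sem, Result.isNegative());
    }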
And all that is just to handle addition, one small part of constant folding.
You're making assumptions about the input language's guarantees about floating-point math here. That illustrates the challenge with LLVM: it needs to be complex because it aims to support a very large set of both inputs and outputs. What you say is true in the general case. It may or may not be true for a given compiler, depending on the input language and output architectures, and, indeed, on whether you need to care about cross compilation.
It makes an important point: one needs to be precise about the actual requirements when evaluating complexity.