
Sure, but everything you say about inc is true of add as well, yet add double-fuses fine (by "double-fuse" I mean it is 2/4 uops in the fused/unfused domains, unlike inc which is 3/4). In general many RMW instructions (double) fuse, and most (all?) of them also modify the flags.

I doubt there is a virtual register for the 1 really - sure there is some storage for it somewhere in the ROB or the scheduler or whatever, but it doesn't need to be "renamed" in the usual sense since it's not visible anywhere. In any case, the add case is "worse" since it can have a real architectural register there, not just an implied immediate 1.

Yes, there is definitely a limit on the number of inputs a uop can have, and you can see this in the effect of "unlamination", which is where a uop is fused in the uop cache but then unfuses after that, and so mostly acts like an unfused uop (except for uop cache space). This shows up with indexed addressing modes.

For example:

    add [rax], 1
fully double-fuses, but:

    add [rax + rbx], 1
Double-fuses only in the uop cache (counts as 2 there), but unlaminates once after that (counts as 3 in the rest of the fused domain).

Interestingly though this guy:

    add [rax], rbx
Still fully double-fuses everywhere, despite having the same number of input registers as the add [rax + rbx], 1 case. Probably it's easier for the renamer though, because the registers are spread across the uops more evenly rather than being concentrated in the load uop?

Moving away from RMW to load-op, there are other indications that flags aren't a problem: things like BMI shrx/shlx/rorx with a memory operand don't fuse, even though they don't update flags at all. On the other hand, ANDN, which is also in BMI, is also a 3-arg instruction (distinct src/dest), and updates flags, does fuse! So actually I'd say updating the flags in a consistent way makes it more likely to fuse.

Maybe that's the answer then?

Anything that updates the flags in the "standard way" - i.e., SF,ZF,CF,OF all set to something, can (potentially) micro-fuse. Anything which doesn't - whether that is updating fewer flags (inc) or no flags (shrx) or updating them "weirdly" (shl and friends) isn't eligible. Interesting theory and still consistent in broad strokes with your "it's the flags!" claim.
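A rough way to sanity-check this theory against the cases mentioned so far is to write it down as a predicate. The sketch below is Python; the names (`OBSERVED`, `eligible`, `counterexamples`) and the flag categories are made up for illustration, and the fuse/no-fuse entries are only what is claimed in this thread, not independent measurements.

```python
# Flag behaviour and micro-fusion of the memory-operand forms, as claimed in
# this thread. Hypothetical table for illustration, not measured data.
OBSERVED = {
    # mnemonic: (flag_update, fully_fuses)
    "add":  ("standard", True),
    "andn": ("standard", True),
    "inc":  ("partial", False),   # leaves CF untouched
    "shrx": ("none", False),
    "shlx": ("none", False),
    "rorx": ("none", False),
    "shl":  ("weird", False),     # implied by the theory above, not a stated measurement
}

def eligible(flag_update):
    """The theory: only 'standard' flag updaters can (potentially) fuse."""
    return flag_update == "standard"

def counterexamples(observed):
    """Instructions that fuse even though the theory says they are not eligible.

    The check is one-directional: an eligible instruction that fails to fuse
    does not contradict a theory that only says it *can* fuse.
    """
    return sorted(
        name for name, (flags, fuses) in observed.items()
        if fuses and not eligible(flags)
    )

print(counterexamples(OBSERVED))
```

With only the instructions listed here the result is an empty list, so the theory holds for this sample. Testing another instruction is a matter of adding a row with its flag behaviour and whether it fuses.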



This theory is cool, but I don't think it works, all things considered. PDEP and PEXT should have the same unfused behavior as SHLX, since they also do not change any flags, but they _do_ fuse. BEXTR should (or could) fuse, but doesn't. So I don't know.


You are right, so yeah I can't explain really why certain ops fuse and some don't. There doesn't seem to be a strong pattern.



