
+1, rounding f32 to bf16 is helpful. For the other direction, the approach we take in Highway/gemma.cpp is to load a full vector of bf16, then use a shift or an AND to isolate the even or odd elements and convert them to float. The shift/AND instructions can execute two per cycle, whereas promoting 16->32 bit is often just one per cycle (though on a different port than FMA).
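To make the bit trick concrete, here is a minimal scalar sketch of both directions. It is not the Highway/gemma.cpp code: the function names are hypothetical, one `uint32_t` stands in for a single 32-bit vector lane, and it assumes little-endian packing (even element in the low 16 bits) and round-to-nearest-even for the f32 -> bf16 direction. It relies only on the fact that a bf16 is the upper 16 bits of the corresponding f32.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret between float and its 32-bit pattern without aliasing issues.
static inline float BitsToFloat(uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof(f));
  return f;
}
static inline uint32_t FloatToBits(float f) {
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));
  return bits;
}

// f32 -> bf16 with round-to-nearest-even (assumed rounding mode).
static inline uint16_t F32ToBF16(float f) {
  const uint32_t bits = FloatToBits(f);
  // NaN: keep the top bits and force a mantissa bit so it stays a NaN.
  if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
    return static_cast<uint16_t>((bits >> 16) | 0x40u);
  }
  // Adding 0x7FFF plus the lowest kept bit rounds ties to even.
  const uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
  return static_cast<uint16_t>((bits + bias) >> 16);
}

// One 32-bit lane holding two packed bf16 values.
// Even element: low 16 bits, moved into the upper half by a shift.
static inline float EvenBF16ToF32(uint32_t packed) {
  return BitsToFloat(packed << 16);
}
// Odd element: already in the upper half, so an AND clears the rest.
static inline float OddBF16ToF32(uint32_t packed) {
  return BitsToFloat(packed & 0xFFFF0000u);
}
```

In a vector version, the same shift and AND would be applied to every 32-bit lane at once, producing one vector of even elements and one of odd elements. The results come out deinterleaved, which is fine for something like a dot product where both operands are split the same way.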

