
Better programmable gather/scatter (as described here: https://www.nextplatform.com/2017/09/14/shedding-light-dark-...) could open up a much wider range of applications to vectorization.


What do you mean by programmable gather/scatter? GPUs already do efficient gather and scatter operations, and I think the AVX-512 on Knights Landing even has efficient gather and scatter instructions.
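For reference, the basic semantics of a gather load and a scatter store can be sketched as a scalar loop. This is a plain Python model, not the behaviour of any specific instruction (it ignores masking and hardware ordering details); the function names are hypothetical, and with duplicate scatter indices this sequential model simply lets the last lane win.

```python
from typing import List, Sequence


def gather(base: Sequence[float], indices: Sequence[int]) -> List[float]:
    """Scalar model of a gather load: vector lane i receives base[indices[i]]."""
    return [base[i] for i in indices]


def scatter(base: List[float], indices: Sequence[int], values: Sequence[float]) -> None:
    """Scalar model of a scatter store: base[indices[i]] = values[i], in lane order."""
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    for i, v in zip(indices, values):
        base[i] = v  # a later lane overwrites an earlier one on duplicate indices


if __name__ == "__main__":
    table = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
    print(gather(table, [7, 0, 3, 3]))  # [17.0, 10.0, 13.0, 13.0]
    scatter(table, [1, 5], [-1.0, -5.0])
    print(table)  # [10.0, -1.0, 12.0, 13.0, 14.0, -5.0, 16.0, 17.0]
```

The addressing itself is trivial; the cost question is how many memory locations (and cache lines) those indices end up touching.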


Read the linked article, and the paper linked from there. Basically, gather/scatter can be very inefficient from a cache and bandwidth perspective: in the worst case every element lives in a different cache line, so you pull in a whole line to use a single element. The idea is to "move" the scatter/gather engine to the memory controller and pack the vectors densely already in the cache, rather than assembling them in the register file.
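A rough back-of-the-envelope model of the worst case: count the distinct cache lines a core-side gather has to fill, compare that with the bytes actually used, and compare with how many lines the same elements would occupy if packed densely near memory. The 64-byte line and 4-byte element sizes are assumptions, the helper names are hypothetical, and the model only counts cache-fill traffic; it says nothing about what the memory controller itself pays to read DRAM.

```python
import math
from typing import Sequence

CACHE_LINE_BYTES = 64  # assumption: a common cache line size
ELEM_BYTES = 4         # assumption: 32-bit elements (e.g. float32)


def lines_touched(indices: Sequence[int], elem_bytes: int = ELEM_BYTES,
                  line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Distinct cache lines a core-side gather must bring into the cache."""
    return len({(i * elem_bytes) // line_bytes for i in indices})


def core_side_utilization(indices: Sequence[int], elem_bytes: int = ELEM_BYTES,
                          line_bytes: int = CACHE_LINE_BYTES) -> float:
    """Fraction of fetched cache-line bytes that are actually used by the vector."""
    useful = len(set(indices)) * elem_bytes  # count each distinct element once
    fetched = lines_touched(indices, elem_bytes, line_bytes) * line_bytes
    return useful / fetched


def packed_lines(n_elems: int, elem_bytes: int = ELEM_BYTES,
                 line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Lines needed if the gathered elements arrive densely packed (near-memory gather)."""
    return math.ceil(n_elems * elem_bytes / line_bytes)


if __name__ == "__main__":
    # 16 lanes with a stride of 16 elements (64 bytes): one element per line.
    strided = list(range(0, 16 * 16, 16))
    print(lines_touched(strided), core_side_utilization(strided), packed_lines(len(strided)))
    # 16 0.0625 1
    contiguous = list(range(16))
    print(lines_touched(contiguous), core_side_utilization(contiguous))  # 1 1.0
```

In the strided case the core-side gather fills 16 lines to use 64 bytes (6.25% utilization), while a densely packed result would fit in a single line; that gap in cache capacity and on-chip bandwidth is what moving the gather toward the memory controller is meant to close.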

Will it work in reality? No idea, but it's certainly an interesting idea worth exploring.
