
Surprisingly: no.

1. Cache-oblivious data structures work no matter the cache size. They aren't 100% optimized for a particular cache size, but they are proven to be "efficient" across many different cache sizes.

2. For an example of #1, matrix multiplication is one of the most optimized functions of modern computing. One major step is to transpose the second matrix, so that its "vertical" traversal turns into a "horizontal" one. The horizontal movement, which is "address + 1", lets the cache work with your loop, while vertical movement across the matrix is "address + width" and is not cache-efficient.
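A minimal sketch of that idea in C, assuming square row-major matrices (the function and parameter names here are illustrative, not from any particular library):

    #include <stddef.h>

    /* Multiply two n x n row-major matrices. B is transposed into Bt
     * first, so the inner loop walks BOTH operands with "address + 1"
     * strides, and each 64-byte line fetched serves several iterations. */
    void matmul_transposed(size_t n, const double *A, const double *B,
                           double *Bt, double *C)
    {
        for (size_t i = 0; i < n; i++)          /* one-time O(n^2) transpose */
            for (size_t j = 0; j < n; j++)
                Bt[j * n + i] = B[i * n + j];

        for (size_t i = 0; i < n; i++)
            for (size_t j = 0; j < n; j++) {
                double sum = 0.0;
                for (size_t k = 0; k < n; k++)  /* both reads are sequential */
                    sum += A[i * n + k] * Bt[j * n + k];
                C[i * n + j] = sum;
            }
    }

The O(n^2) transpose is cheap next to the O(n^3) multiply, and it converts the "address + width" reads of B into "address + 1" reads of Bt.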

------------

3. I think it is safe to start making some degree of cache assumptions on modern systems. All caches today have 64-byte cache lines (or greater): ARM, Intel, and AMD CPUs all use 64-byte cache lines, and AMD GCN GPUs do too. AMD RDNA GPUs use 128-byte cache lines (and Intel is rumored to "pair up" cache lines into 128-byte segments at L3).

This means that every memory operation to main memory reads or writes at least 64 bytes of data at a time, on all modern systems. DDR4 RAM itself uses Burst Length 8: each burst moves 8 beats of 8 bytes across the 64-bit channel, so 64 bytes is literally the smallest unit of data that can be transferred. It isn't surprising that all CPUs / GPUs are working with such a large number these days.
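One practical consequence, sketched in C under the 64-byte-line assumption above (the struct name and field are illustrative): aligning per-thread data to the line size keeps each thread's writes on its own line, avoiding "false sharing" where two cores ping-pong one line back and forth.

    #include <stdalign.h>
    #include <stdint.h>

    /* Each counter gets its own 64-byte cache line, so two threads
     * incrementing their own counters never invalidate each other's line. */
    struct padded_counter {
        alignas(64) uint64_t value;
    };

    struct padded_counter counters[8];  /* e.g., one per thread */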

-----------

Given the 64-byte cache line (or 128-byte, for Intel's L3 cache or AMD RDNA GPUs), "unrolling" your linked lists, using B-trees with larger nodes (instead of the cache-inefficient binary tree), or using d-heaps (instead of binary heaps) plays to the strengths of the cache and helps out a lot.
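For instance, here is a rough sketch of an "unrolled" linked list node in C, sized under the 64-byte-line assumption (the exact layout is illustrative; on a typical 64-bit system the fields below total 64 bytes):

    #include <stddef.h>

    /* Packing several elements per node means one cache miss fetches
     * a dozen elements instead of one, and far fewer pointers to chase. */
    struct unrolled_node {
        struct unrolled_node *next;  /* 8 bytes */
        size_t count;                /* 8 bytes: elements used in this node */
        int elems[12];               /* 48 bytes of payload */
    };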

You don't need to know the specific size of the cache line to know that d-heaps are more efficient than binary heaps on modern, cache-heavy systems. You just need to know that sequential memory accesses land on cache lines already sitting in L1.
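The d-heap indexing arithmetic that makes this work, as a minimal sketch in C (the choice of D = 8 and the function names are illustrative; 8 children of 8-byte elements fit in one 64-byte line when suitably aligned):

    #include <stddef.h>

    #define D 8  /* arity: all D children of a node sit contiguously */

    /* k-th child of node i, for k in [0, D) */
    static size_t child(size_t i, size_t k) { return D * i + k + 1; }

    static size_t parent(size_t i) { return (i - 1) / D; }

Because all D children are contiguous, a sift-down compares them with roughly one cache-line fetch, whereas a binary heap's deeper tree costs more scattered accesses per operation.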


