Staff engineer with over a decade of experience leading products to successful delivery and beyond. Recently focused on high-performance streaming analytics at Sight Machine (I led the design and implementation of their stream processing engine, and before that, their data acquisition product), I can also work as a generalist backend developer or team lead.
Location: San Francisco, CA
Remote: Yes
Willing to relocate: Yes
Technologies: Java, C, C++, SQL, Python, Kubernetes, Linux, perf, PostgreSQL, OCaml
Résumé/CV: http://macdonell.net/resume.pdf
Email: brendan@macdonell.net
Unfortunately none of the hardware used for testing supports FP16 arithmetic. Between Intel and AMD, the only platform that supports AVX512-FP16 is currently Sapphire Rapids.
Value types still require allocation for types larger than 128 bits if the value is either nullable or atomic — that seems like a reasonable trade-off to me.
Keep in mind that you still need to send a print job to the fake printer to trigger the exploit. If you send the job to your real printer, nothing happens.
As the comment you replied to indicates, both of those APIs perform bounds-checking. In certain tight loops, this can add up to quite a bit of overhead [1]. It's not documented, but if you really know what you are doing, you can convince the JIT to elide the bounds checks for MemorySegments [2].
Reading between the lines, it sounds as if they're using mmap. There is no "append" operation on a memory mapping, so the file would need to be preallocated before mapping it.
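To illustrate the preallocate-then-map pattern, here is a minimal Python sketch. The helper name map_preallocated and the file name are made up for the example, and I'm assuming a Unix-like system; it is not a claim about what their code actually does.

```python
import mmap
import os
import tempfile


def map_preallocated(path, size):
    """Set the file's length to `size` bytes, then map all of it read/write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)       # the file must be long enough before mapping
        return mmap.mmap(fd, size)   # shared, read/write mapping by default on Unix
    finally:
        os.close(fd)                 # mmap duplicates the descriptor, so this is safe


with tempfile.TemporaryDirectory() as tmp:
    m = map_preallocated(os.path.join(tmp, "example.bin"), 4096)
    m[0:5] = b"hello"                # writes inside the mapped range succeed
    try:
        m[4096] = 0                  # one byte past the end of the mapping
    except IndexError:
        print("cannot write past the mapped length")
    m.close()
```

The mapping covers exactly the length chosen up front, so growing the file means extending it and then remapping (or resizing the mapping) rather than appending through the existing one.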
If the preallocation is done using fallocate or just writing zeros, then by default it's backed by blocks on disk, and readahead must hit the disk since there is data there. On the other hand, preallocating with fallocate using FALLOC_FL_ZERO_RANGE or (often) with ftruncate() will just update the logical file length, and even if readahead is triggered it won't actually hit the disk.
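A rough way to see the difference is to compare the logical size with the allocated size reported by stat. This is only a sketch: the helper names are mine, the 16 MiB size is arbitrary, and the numbers depend on the filesystem (one that compresses or detects zero blocks may report both files as equally small).

```python
import os
import tempfile

SIZE = 16 * 1024 * 1024  # 16 MiB, arbitrary size for the comparison


def allocated_bytes(path):
    """Storage actually allocated to the file (st_blocks is in 512-byte units)."""
    return os.stat(path).st_blocks * 512


def make_sparse(path, size):
    """Only set the logical length; typically no data blocks are allocated."""
    with open(path, "wb") as f:
        f.truncate(size)


def make_zero_filled(path, size):
    """Write real zeros, so the filesystem normally allocates blocks for them."""
    with open(path, "wb") as f:
        f.write(b"\0" * size)
        f.flush()
        os.fsync(f.fileno())  # make sure the allocation has actually happened


with tempfile.TemporaryDirectory() as tmp:
    sparse = os.path.join(tmp, "sparse.bin")
    dense = os.path.join(tmp, "dense.bin")
    make_sparse(sparse, SIZE)
    make_zero_filled(dense, SIZE)
    for name in (sparse, dense):
        print(os.path.basename(name), os.path.getsize(name), allocated_bytes(name))
```

Both files report the same logical length, but on most Linux filesystems only the zero-filled one has blocks behind it for readahead to fetch.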
I understand the case where the file is entirely preallocated, but for the file hole case I'm not sure I understand why you'd get such high disk activity.
If the index block also got evicted from the page cache, then could reading into a file hole still trigger a fault? Or is the "holiness" of a page for a mapping stored in the page table?
I suspect page-sized, page-aligned file holes could be backed by a read-only zero page via the PTE as an optimization, but they might not be (I'm not as familiar with Linux mmap/filesystems as with FreeBSD).
It is quite possible the filesystem caches, e.g., the file extent tree (including holiness) separately from the backing inode/on-disk sectors for the tree.
I think you may have linked to the wrong graph — while that graph does have a spike on it, the spike happens when the index was broadened in May 2020 to include savings and money market accounts.
Remote: Yes
Willing to relocate: Yes
Technologies: Java, C, C++, SQL, Python, Kubernetes, Linux, perf, PostgreSQL, OCaml
Résumé/CV: http://macdonell.net/resume.pdf
Email: brendan@macdonell.net
Staff engineer with over a decade of experience leading products to successful delivery and beyond. Recently focused on high-performance streaming analytics at Sight Machine (I led the design and implementation of their stream processing engine, and before that, their data acquisition product), I can also work as a generalist backend developer or team lead.
Shoot me an email if you want to chat!