That approach is harder to use in practice than it sounds. It's not like people haven't tried it.
The OS only lets you allocate in large granules, like 4k or 16k, and the vast majority of methods are significantly smaller than that, meaning a JIT must colocate multiple methods in the same allocation block or waste a significant amount of memory.
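To put rough numbers on the granule problem, here is a minimal sketch in C. The function names are made up for illustration, and the packed variant ignores alignment padding between methods, so treat the figures as a lower bound on waste rather than what any particular JIT sees.

```c
#include <stddef.h>

/* Bytes left unused when every method gets its own page-granular allocation. */
size_t wasted_one_per_page(size_t page_size, size_t method_size, size_t n_methods) {
    size_t pages_per_method = (method_size + page_size - 1) / page_size;
    return n_methods * (pages_per_method * page_size - method_size);
}

/* Bytes left unused when methods are packed back to back in shared pages
   (alignment padding between methods is ignored in this sketch). */
size_t wasted_packed(size_t page_size, size_t method_size, size_t n_methods) {
    size_t total = method_size * n_methods;
    size_t pages = (total + page_size - 1) / page_size;
    return pages * page_size - total;
}
```

With a hypothetical 100 methods of 200 bytes each and 4k pages, one method per page wastes 389600 bytes, while packing wastes 480.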
We could get around that by remapping memory between read/write and read/execute and have the OS solve the problem for us. Except for a couple of small details: modifying a memory mapping is very expensive and we're, well, in the performance business; and Mono is multi-threaded, so one thread might be executing code from the exact page we just made non-executable.
This approach, IIRC, was tried by Firefox as it has some security advantages, but discarded due to the measurable performance impact - and they don't have the second problem as JS is single threaded.
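For reference, the remapping idea looks roughly like this on a POSIX system. This is only a sketch: `install_code_wx` is a hypothetical helper, not anything from Mono or Firefox, it gives each blob its own mapping (which is exactly the memory waste discussed above), and it does not execute the code or do the cache maintenance a real JIT still needs.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical helper: copy `len` bytes of generated code into a fresh
   read/write mapping, then flip it to read/execute so it is never writable
   and executable at the same time. Returns NULL on failure. */
void *install_code_wx(const unsigned char *code, size_t len) {
    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;
    memcpy(mem, code, len);
    /* This mprotect call is the expensive step the comment refers to. */
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return NULL;
    }
    return mem;
}
```

Once methods share a page, adding another method means making that page writable again, which is where the multi-threading problem comes in.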
How is this safe in the multithreaded case anyway? Suppose a process has just written a new JITted method and is flushing the i$ on the CPU it's executing on, but gets scheduled away part-way through the flush. If you were very unlucky, couldn't another thread then get scheduled on that CPU and try to execute the just-written method, which was never fully flushed from that CPU's cache?
Multi-threaded safety is simply due to the JIT controlling the visibility of the newly compiled code. First flush, then make it visible for execution; you can't go wrong with that, and scheduling won't matter.
Things get a lot more complicated when it comes to code patching, but the principle is similar.
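The "first flush, then make it visible" ordering can be sketched like this. It assumes GCC or Clang (for `__builtin___clear_cache`) and C11 atomics; `published_method`, `publish_method` and `lookup_method` are hypothetical names, not Mono's actual code. On some architectures the executing thread may also need its own instruction synchronization barrier, which is not shown.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical slot that other threads read to find the compiled method.
   It stays NULL until the code behind it has been written and flushed. */
static _Atomic(void *) published_method;

/* Write the code, flush it, and only then make the pointer visible. */
void publish_method(void *dst, const void *code, size_t len) {
    memcpy(dst, code, len);
    /* GCC/Clang builtin: cleans the data cache and invalidates the
       instruction cache for this range where the architecture needs it. */
    __builtin___clear_cache((char *)dst, (char *)dst + len);
    /* Release store: nobody can observe the pointer before the steps above. */
    atomic_store_explicit(&published_method, dst, memory_order_release);
}

/* Other threads only learn about the method through this load. */
void *lookup_method(void) {
    return atomic_load_explicit(&published_method, memory_order_acquire);
}
```

Since no thread can obtain the address before the release store, it does not matter where the flushing thread gets descheduled.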
I don't think that helps - the point is that the flush might not be effective if the flushing thread gets scheduled away from the core which has the stale I$ before it manages to fully issue the flush.
Or is the flush guaranteed to flush all cores' caches? That would be a fairly unusual design.
IC IVAU instructions are broadcast to all cores in the same 'inner shareable domain' (all cores running the same OS instance are in the same inner shareable domain).
Firefox is shipping W^X for jitcode, as far as I know. Or at least https://bugzilla.mozilla.org/show_bug.cgi?id=1215479 is marked as fixed and I don't see any obvious bugs blocking it that are fallout from that change.
And the "Firefox (Ion)" and "Firefox (non writable jitcode)" lines on https://arewefastyet.com/ seem to coincide...
But yes, actually making this work in practice is not at all simple.
I see the argument about the page size potentially being large relative to the size of a jitted function. But,
> modifying a memory mapping is very expensive
It used to be true that operations on memory mappings were appallingly expensive. However, the advent of virtualization has driven a significant change in performance. IIRC, ARMv8 has a TLB invalidation operation that is per-entry, addressed by the virtual address being invalidated. You don't need to flush the entire TLB.
Core migration doesn't need to reach a global synchronization point, just enough synchronization that the two cores in question agree with each other. This can be done without requiring global visibility of all operations of the source core.
It reads some coprocessor registers, and those reads are instructions that cannot be reordered. They slow down everything on an OOO core. After that it's just some bitmasking (not slow).
For 64-bit ARMv8 (ie AArch64) system registers are in general reorderable; software must provide explicit synchronization (typically via barrier instructions) where it does not want the reordering, except for a few registers which have implicit synchronization. Since CTR_EL0 is entirely constant there's no inherent reason why it shouldn't be reorderable pretty freely, though it's an implementation detail how fast or otherwise it is in practice. (Benchmark if it matters to you!)
(This is all documented in the v8 ARM ARM section "Synchronization requirements for AArch64 System Registers".)
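For anyone wondering what the "read a register, then bitmask" step looks like, here is a small sketch in C. The field positions follow the CTR_EL0 layout (IminLine in bits [3:0], DminLine in bits [19:16], each the log2 of the line size in 4-byte words); the helper names are made up, and the register read only compiles on AArch64.

```c
#include <stdint.h>

/* Smallest instruction cache line in bytes, from CTR_EL0.IminLine (bits [3:0]). */
unsigned icache_line_bytes(uint64_t ctr) {
    return 4u << (ctr & 0xF);
}

/* Smallest data cache line in bytes, from CTR_EL0.DminLine (bits [19:16]). */
unsigned dcache_line_bytes(uint64_t ctr) {
    return 4u << ((ctr >> 16) & 0xF);
}

#if defined(__aarch64__)
/* Read the cache type register. The asm is not marked volatile because the
   value is constant, so the compiler is free to move or merge the read. */
static inline uint64_t read_ctr_el0(void) {
    uint64_t ctr;
    __asm__("mrs %0, ctr_el0" : "=r"(ctr));
    return ctr;
}
#endif
```

For a sample value of 0x84448004 (chosen for illustration), both helpers return 64. Since the value never changes, a runtime could also read it once at startup and cache the decoded sizes.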
Moving to a startup when under H-1B could be a very risky and possibly disastrous choice.
If those are companies that are illiterate on the most basic things, such as the difference between an H-1B application and a transfer, then there's a good chance they will fuck it up on the important parts.
Important things like H-1B renewal schedule, green card sponsorship, visa renewal expenses and so on.
You change jobs and the next thing you know is that you're stuck in the USA for up to 6 months because one of the overworked founders didn't apply for the renewal early enough.
The above is an easily manageable one compared to all the horror scenarios that a mismanaged H-1B immigration can be.
Plus there's no discussion against zstd itself and its container format.