ZCS – An Entity Component System in Zig (gamesbymason.com)
6 points by AndyKelley 9 months ago | 3 comments


Nice,

I've found that using an iterator for this often generates quite a bit of extra code and generally prevents vectorization, which is why I switched to an API using inversion of control for the `forEach` case (I don't have iterators).

Working on one item at a time (which the iterator forces) resulted in quite a bit of overhead, partially defeating the gains from the more compact memory layout (SoA), with a more complex code path, while preventing the use of SIMD across multiple components at a time.

Have you observed this issue in this implementation, and what is the general design space this implementation targets? If I remember correctly, how it works now is partly how Mach had it at least a year or more ago, back when they still had an ECS.


That's a great question.

I was concerned about this, so I provide the following two APIs:

* https://docs.gamesbymason.com/zcs/#zcs.Entities.chunkIterato...

* https://docs.gamesbymason.com/zcs/#zcs.Entities.forEachChunk

Both of these iterate over chunks instead of entities, and as a result the iterator only needs to advance once per chunk of contiguous entities. The caller is then responsible for calling chunk.view (https://docs.gamesbymason.com/zcs/#zcs.chunk.Chunk.view) to get slices of components from the current chunk, and from there they can either rely on autovectorization or implement the vectorization themselves, since they're now working with actual slices of data instead of an iterator.
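To make the shape of this concrete, here's a rough sketch of the chunk-level pattern. The names (`chunkIterator`, `view`, the field names on the view) are approximations of the linked docs, not the exact zcs API:

```zig
// Sketch: iterate per chunk, then pull plain component slices out of
// each chunk. Signatures are illustrative, not the real zcs API.
var it = entities.chunkIterator(.{ Position, Velocity });
while (it.next()) |chunk| {
    const view = chunk.view(.{ Position, Velocity });
    // `view.positions` / `view.velocities` are ordinary slices, so the
    // loop below is a candidate for autovectorization, or the caller
    // can vectorize it by hand.
    for (view.positions, view.velocities) |*pos, vel| {
        pos.x += vel.x;
        pos.y += vel.y;
    }
}
```

The point is that the iterator advances once per chunk rather than once per entity, so the per-step bookkeeping amortizes across the whole chunk.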

I haven't actually checked whether the optimizer is smart enough to work back from the higher-level API and realize that it's just an abstracted for loop. I inline some things to encourage this, but my assumption is that it's unrealistic to expect, hence the lower-level API. Providing the per-chunk API isn't really a maintenance burden either, since the high-level API is implemented in terms of it anyway.

There's an argument to be made that I should only provide the lower-level API since it's likely faster, but IMO it's less friendly, and switching to it in the rare instances where it turns out to be a bottleneck should be easy enough.

Is my chunk iterator API similar to where you ended up, or did you take it in a different direction? Feel free to link your project if you want; I'm always interested to see how other people handle this stuff!


I initially ended up with a similar-ish chunk iterator [1][2] after a few iterations; however, even with that I found that the optimizer wouldn't remove the bookkeeping required for maintaining the iterator state itself. This is where things shifted towards using inversion of control and inlining [3] to get better results [4][5].

From my testing (I should have kept the results), the optimizer isn't always smart enough and requires a bit of hand-holding to get there. And as long as you cannot express `noalias` on slices independently of function parameters (maybe a missing Zig proposal? I don't know how to propose this one), it might never auto-vectorize your code, since it cannot be sure that another mutation isn't taking place (if you call a function), even if you have const slices. As an example, switching from generating an `@call` tuple to passing a struct of slices lost auto-vectorization for AECS, as it isn't possible to specify `noalias` there.
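For illustration, the limitation is that Zig's `noalias` attaches only to function parameters, so there's nowhere to write it once slices travel inside a struct. A minimal sketch (hypothetical function names):

```zig
// `noalias` on slice *parameters*: the compiler may assume the slices
// don't overlap, which makes auto-vectorization of this loop safe.
fn integrate(noalias pos: []f32, noalias vel: []const f32) void {
    for (pos, vel) |*p, v| p.* += v;
}

// But when the slices arrive bundled in a struct (e.g. a view handed
// to a `forEach` callback), there is no place to write `noalias`, so
// the compiler must assume `view.pos` and `view.vel` might alias.
const View = struct { pos: []f32, vel: []const f32 };
fn integrateView(view: View) void {
    for (view.pos, view.vel) |*p, v| p.* += v; // may not vectorize
}
```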

The stronger argument for only having the chunk API is that you gain the ability to treat each update of an archetype as a transaction, which ends once you move on to process the next archetype. This makes it easier to see the units of work that could be split over multiple threads; archetype-aware tooling (e.g. an editor) has an easier time hooking into "begin/end transaction"; and (similar to the editor) it's easier to dynamically handle different paths for different archetypes that match a given query. I'd argue this is more friendly, as users can write the same scalar-style code, with the only difference being a multi-item for loop over component slices rather than a while loop. The same applies to `forEach`.
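Sketching that contrast (hypothetical names): the loop body stays scalar-style either way; only the loop shape changes:

```zig
// Single-item iterator: one entity per `next()`, with per-step
// iterator bookkeeping the optimizer has to see through.
while (iter.next()) |entity| {
    entity.pos.x += entity.vel.x * dt;
}

// Chunk API: the same scalar-looking body, but as a multi-item for
// loop directly over the chunk's component slices.
for (chunk.pos, chunk.vel) |*pos, vel| {
    pos.x += vel.x * dt;
}
```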

The user can always build a single-item iterator from the chunk API themselves if they need one, but then they have to justify such a change and its effect on the performance of their application, since one-item-at-a-time is further removed from the problem at hand, where many-at-a-time is the default for archetype-style ECS (the single-item iterator API communicates the wrong mental model, if that makes sense?).

[1]: https://gist.github.com/tauoverpi/89c506dda247a2848cda46dfc9...

[2]: https://gist.github.com/tauoverpi/6f4832a406e49ad112a88395fa...

[3]: https://codeberg.org/tauoverpi/game/src/commit/f9dc913a805dc...

[4]: https://codeberg.org/tauoverpi/game/src/commit/f9dc913a805dc...

[5]: https://codeberg.org/tauoverpi/game/src/commit/f9dc913a805dc...
