Sometimes people are surprised that our muvm stack for Asahi Linux runs everything as emulated x86/x86_64 code, including the whole Mesa graphics driver. Yes, an Apple GPU driver compiled for Intel! That must be slow, right?
But... Rosetta on macOS does exactly the same thing!
How do we know? A few reasons, but an important one is that Apple implemented some custom x86 compatibility features in the M1 CPU as global, privileged switches, which cannot be changed without going through the kernel (and that is too slow to do often). This means that a Rosetta process on macOS can only run x86_64 code (under a purpose-built emulator); it is not compatible with standard ARM64 code! In particular, NEON behaves differently in that mode, in an incompatible way.
This is also why FEX can't use that particular feature: it relies on standard ARM64 infrastructure and libraries, and those would not work in that mode. We'd need a statically compiled FEX, with FEX itself and every dependency built with a custom compiler ABI/mode. On the flip side, FEX does support running the GPU driver and a few other cherry-picked things as native ARM64 code, and we'll support that in muvm eventually.
I believe the M4 implements a standardized version of this x86 compatibility feature, one that does allow changing the behavior efficiently at runtime, so FEX can use it with mixed emulated x86_64 and native ARM64 code.
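If you want to check whether a given Linux system advertises such a feature, here is a minimal sketch. It assumes the standardized feature is Arm's FEAT_AFP (alternate floating-point behavior), which is a guess on my part and not something I have confirmed for the M4. Linux on arm64 reports that feature as the "afp" flag on the "Features" line of /proc/cpuinfo. The helper name has_cpu_feature is made up for this example.

```python
def has_cpu_feature(cpuinfo_text, flag):
    """Return True if `flag` is listed on any 'Features' line of /proc/cpuinfo text.

    Hypothetical helper for illustration; only exact flag names match,
    and lines other than 'Features' are ignored.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Features" and flag in value.split():
            return True
    return False


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not on Linux, or /proc is unavailable: treat as "no flags known".
        text = ""
    # "afp" is the assumed flag name (FEAT_AFP); see the caveat above.
    print("afp advertised:", has_cpu_feature(text, "afp"))
```

This only tells you whether the kernel exposes the flag to userspace, not whether FEX or anything else actually makes use of it.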