You need hardware acceleration for any kind of performance. Without it you are just emulating, which means executing many host instructions for each guest instruction.
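To make that concrete, here is a toy interpreter step in C (my own sketch, nothing like QEMU's TCG): executing a single guest add means a fetch, a decode, a few register loads and stores, and a program counter update on the host.

    #include <stdint.h>

    /* Toy guest CPU state and a one-instruction interpreter step.
     * Purely illustrative: the encoding is made up.  The point is
     * that one guest "add rd, rs1, rs2" costs a dozen or more host
     * operations before any acceleration is involved. */
    struct cpu {
        uint64_t regs[32];
        uint64_t pc;
        const uint32_t *mem;                  /* guest instruction memory */
    };

    static void step(struct cpu *c)
    {
        uint32_t insn = c->mem[c->pc / 4];    /* fetch */
        uint32_t op   = insn >> 26;           /* decode (invented encoding) */
        uint32_t rd   = (insn >> 21) & 31;
        uint32_t rs1  = (insn >> 16) & 31;
        uint32_t rs2  = (insn >> 11) & 31;

        switch (op) {
        case 0:                               /* ADD: one guest instruction */
            c->regs[rd] = c->regs[rs1] + c->regs[rs2];
            break;
        /* ... other opcodes ... */
        }
        c->pc += 4;                           /* advance guest PC */
    }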
Yes indeed. I develop QEMU at work, mainly implementing new hardware as needed by my employer. It has a software emulator, but it's not very good. It's acceptable.
The instruction generation backend does not seem to prioritize performance. Instead, it prioritizes accuracy and ease of maintenance. There is low-hanging fruit for making it faster, but there isn't much interest in doing so for the TCG backend. The attitude seems to be that it's good enough.
For a small example, you may find it interesting that QEMU does not implement floating point acceleration. Floating point is done in software even though the host has floating point instructions. It usually doesn't attempt to use the host's floating point hardware and instead executes many hundreds of instructions per operation using the Berkeley SoftFloat implementation. The bit-exactness this buys almost never matters, but it costs a lot of performance. Compare this to the translation performed by projects like FEX and Box64, which do use host floating point and blow QEMU out of the water for specific use cases.
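To give a feel for the gap, here is a compilable sketch of the two paths. The names are invented for illustration and do not match QEMU's helpers or the real SoftFloat API; the software float routine is stubbed so the example builds.

    #include <stdint.h>

    typedef struct { uint64_t v; } sf_float64;    /* raw IEEE-754 bits */

    /* Stand-in for a softfloat add.  The real routine unpacks both
     * operands, handles NaN/Inf/subnormals, aligns exponents, adds
     * the mantissas, renormalizes and rounds -- easily hundreds of
     * integer instructions.  Stubbed here so the sketch compiles. */
    static sf_float64 sf_float64_add(sf_float64 a, sf_float64 b)
    {
        union { uint64_t i; double d; } ua = { a.v }, ub = { b.v }, ur;
        ur.d = ua.d + ub.d;                       /* stub only */
        return (sf_float64){ ur.i };
    }

    /* Emulator-style path: generated code calls out to a C helper,
     * which calls the software implementation. */
    uint64_t helper_fadd64(uint64_t a_bits, uint64_t b_bits)
    {
        sf_float64 a = { a_bits }, b = { b_bits };
        return sf_float64_add(a, b).v;
    }

    /* Translator-style path (the FEX/Box64 idea): just use the host's
     * floating point -- this is a single addsd on x86-64. */
    double native_fadd64(double a, double b)
    {
        return a + b;
    }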
Another place the emulator could be improved is its handling of executable pages and the cached output of the code-generation backend. The executable code caching mechanism is very simple and could probably be much more aggressive on today's systems.
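For anyone who hasn't looked at how this works, a translated-code cache is conceptually a map from guest PC to generated host code, plus a buffer that holds that code; the simplest policy is to throw everything away when the buffer fills. A rough sketch of that idea (mine, not QEMU's actual translation block cache):

    #include <stdint.h>
    #include <stddef.h>

    #define TB_CACHE_BITS 12
    #define TB_CACHE_SIZE (1u << TB_CACHE_BITS)

    struct tb_entry {
        uint64_t guest_pc;               /* guest address the block starts at */
        void    *host_code;              /* pointer into code_buf, or NULL */
    };

    static struct tb_entry tb_cache[TB_CACHE_SIZE];
    static uint8_t code_buf[16 << 20];   /* fixed 16 MiB of generated code */
    static size_t  code_used;

    static void *tb_lookup(uint64_t guest_pc)
    {
        struct tb_entry *e = &tb_cache[(guest_pc >> 2) & (TB_CACHE_SIZE - 1)];
        return (e->guest_pc == guest_pc) ? e->host_code : NULL;
    }

    static void tb_flush_all(void)
    {
        /* Blunt policy: drop every translation once the buffer is full. */
        for (size_t i = 0; i < TB_CACHE_SIZE; i++) {
            tb_cache[i].host_code = NULL;
        }
        code_used = 0;
    }

    static void *tb_insert(uint64_t guest_pc, size_t code_len)
    {
        if (code_used + code_len > sizeof(code_buf)) {
            tb_flush_all();
        }
        void *host = &code_buf[code_used];
        code_used += code_len;

        struct tb_entry *e = &tb_cache[(guest_pc >> 2) & (TB_CACHE_SIZE - 1)];
        e->guest_pc = guest_pc;
        e->host_code = host;
        return host;                     /* caller emits host instructions here */
    }

A less naive design would size the buffer relative to available memory and evict selectively instead of flushing wholesale, which is roughly what I mean by "more aggressive on today's systems".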
If you examine the change logs, TCG really doesn't get much TLC, at least as of the last time I checked. It could be a better emulator, but performance outside of the KVM use case is not as important to the project.
QEMU absolutely will use hardware floating point where it can, but only when it will give the correct results. FEX and Box64 are user-mode emulators which achieve their speed by avoiding emulation where they can, thunking at API boundaries.
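For readers who haven't run into the term, thunking here means redirecting the guest's calls into known libraries to the host's native build of those libraries, so nothing inside the call gets emulated at all. A sketch of the idea (the names and table are invented; this is not FEX's or Box64's actual mechanism):

    #include <stddef.h>
    #include <string.h>
    #include <math.h>

    typedef double (*host_fn_d_d)(double);

    struct thunk {
        const char  *symbol;          /* guest-visible symbol name */
        host_fn_d_d  host_impl;       /* native host implementation */
    };

    /* Table built when the guest binary resolves its imports. */
    static const struct thunk thunks[] = {
        { "cos",  cos  },             /* host libm, runs at native speed */
        { "sqrt", sqrt },
    };

    /* Reached when translated guest code calls an imported function:
     * hand the argument to the host implementation instead of
     * translating the guest library's own code. */
    static double dispatch_thunk(const char *symbol, double arg)
    {
        for (size_t i = 0; i < sizeof(thunks) / sizeof(thunks[0]); i++) {
            if (strcmp(symbol, thunks[i].symbol) == 0) {
                return thunks[i].host_impl(arg);
            }
        }
        return 0.0;   /* placeholder: fall back to translating guest code */
    }

That trick only works in user-mode emulation, where the library boundary is visible, which is part of why those projects emulate single binaries rather than whole systems.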
Call it a difference of opinion: I don't believe it should try to be bit-accurate for floating point. But it's a valid position to take, and there are many use cases for QEMU. In the cases where we do emit some host instructions, I believe that still happens inside a helper function rather than inline in the generated code, which is not ideal. To me, guest code using floating point in the first place implies that some degree of inaccuracy is permissible, and that is the position some cross-architecture game emulators take. But again, I suppose it depends what code you wish to run.
Is there a faster emulator? In my experience QEMU is pretty performant. It is especially fast when you use KVM or Hyper-V acceleration.
With KVM, performance will be quite good, but what about when you need to emulate across architectures? I don't think there are many alternatives that emulate an entire VM; I only know of user-space tools that are focused on emulating a single binary.