Excellent investigation! The F4 should have the FPU enabled; all the chips we use have an FPU. I’ll enable it tomorrow.
Honestly, I don’t think we will ever go for a chip without an FPU for Arcade anymore. Initially we wanted to use a bigger version of the STM32F103, but those are either not big enough or more expensive than the F401.
It would be nice to somehow expose the FPU, but I’m not sure exactly how. We need a uniform data representation, so floats would probably have to be boxed, negating most of the perf advantage.
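To make the boxing concern concrete, here is a rough sketch in C of what a uniform value representation with boxed floats could look like. This is not the actual runtime code: `value_t`, `boxed_float_t`, `box_float`, `value_add` and the low-bit tagging scheme are all made up for illustration, and I'm assuming single-precision floats.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical uniform value: one machine word.
 * Low bit 1 -> small integer stored inline (unboxed).
 * Low bit 0 -> pointer to a heap object (here: always a boxed float). */
typedef uintptr_t value_t;

typedef struct {
    float f; /* payload lives on the heap, not in the value word */
} boxed_float_t;

static int is_int(value_t v) { return (v & 1u) != 0; }

static value_t from_int(intptr_t i) { return ((uintptr_t)i << 1) | 1u; }

/* Relies on arithmetic right shift for negative values, which common
 * compilers provide but the C standard leaves implementation-defined. */
static intptr_t to_int(value_t v) { return (intptr_t)v >> 1; }

/* Boxing: every float result costs a heap allocation. */
static value_t box_float(float f) {
    boxed_float_t *b = malloc(sizeof *b);
    if (b == NULL) abort();
    b->f = f;
    return (value_t)b; /* malloc alignment keeps the low bit clear */
}

/* Unboxing: a tag check plus a memory load before the FPU sees anything. */
static float to_float(value_t v) {
    return is_int(v) ? (float)to_int(v) : ((boxed_float_t *)v)->f;
}

/* Generic add: integers stay inline (overflow ignored in this sketch),
 * anything involving a float pays for two unboxes and one allocation.
 * The sketch never frees boxes; a real runtime would use GC or refcounts. */
static value_t value_add(value_t a, value_t b) {
    if (is_int(a) && is_int(b))
        return from_int(to_int(a) + to_int(b));
    return box_float(to_float(a) + to_float(b));
}
```

The FPU add itself is a single instruction, but in a scheme like this it sits between tag checks, loads and an allocation, which is where most of the advantage would go.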
I do intend to optimize the fx8 math though.
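For anyone following along, here is a minimal sketch of the kind of math involved, assuming fx8 means fixed point with 8 fractional bits stored in a 32-bit integer. The names (`fx8_t`, `fx8_mul`, and so on) are hypothetical and this is not the existing implementation, just an illustration of where the cost comes from.

```c
#include <stdint.h>

/* Hypothetical fixed-point type: value = raw / 256. */
typedef int32_t fx8_t;

#define FX8_ONE 256 /* 1.0 in fx8 */

static fx8_t fx8_from_int(int32_t i) { return i * FX8_ONE; }

/* Truncates toward negative infinity on compilers with arithmetic shift. */
static int32_t fx8_to_int(fx8_t a) { return a >> 8; }

/* Add and subtract are plain integer operations. */
static fx8_t fx8_add(fx8_t a, fx8_t b) { return a + b; }

/* Multiply: widen to 64 bits so the intermediate product cannot overflow,
 * then drop the extra 8 fractional bits. */
static fx8_t fx8_mul(fx8_t a, fx8_t b) {
    return (fx8_t)(((int64_t)a * b) >> 8);
}

/* Divide: pre-scale the numerator by 256 to keep 8 fractional bits.
 * The caller must make sure b is not zero. */
static fx8_t fx8_div(fx8_t a, fx8_t b) {
    return (fx8_t)(((int64_t)a * FX8_ONE) / b);
}
```

For example, with 0.5 represented as 128, `fx8_mul(fx8_from_int(3), 128)` gives 384, which is 1.5 in this format.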