In Meowbit hardware floating point support, or "how I spent an entire weekend adding two numbers together" - #2 by mmoskal, @mmoskal said:
The F4 should have FPU enabled - all the chips we use have FPU. I’ll do that tomorrow.
It would be nice to somehow expose the FPU but I’m not sure how exactly. We need a uniform data representation so the floats would have to be boxed probably negating most of the perf advantage.
I do intend to optimize the fx8 math though.
Instead of dealing internally with fixed-point math and its annoying overflow behavior, would it be worth revisiting native single-precision float math on hardware targets? Especially if there were a reasonably efficient way to convert the floats to fixed point on input or output.
Having a “boxed” representation for transform matrices and for vectors (or even better, arrays of vectors) would help for 3D rendering.
Some useful operators would be:
- matrix * vector => boxed vector
- perspective transform
- vector length (assuming there’s an efficient float square root available on hardware)
- dot product
- squared vector length (that’s just dot product of a vector with itself)
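To make the list above concrete, here is a rough sketch of what the boxed types could look like in plain TypeScript. All of the names (`Vec3`, `Mat4`, `transform`, `perspective`, and so on) are made up for illustration and don't exist in MakeCode, and the perspective transform assumes a simple "scale x and y by focal / z" projection, which may not match what a real renderer needs.

```typescript
// Hypothetical boxed vector type; components are ordinary floats.
class Vec3 {
    constructor(public x: number, public y: number, public z: number) {}

    // Dot product of this vector with another one.
    dot(other: Vec3): number {
        return this.x * other.x + this.y * other.y + this.z * other.z
    }

    // Squared length is just the dot product of the vector with itself.
    lengthSquared(): number {
        return this.dot(this)
    }

    // Length needs a square root, which is where a hardware FPU would help.
    length(): number {
        return Math.sqrt(this.lengthSquared())
    }
}

// Hypothetical boxed 4x4 matrix, stored row-major in a flat array of 16 numbers.
class Mat4 {
    constructor(public m: number[]) {}

    // matrix * vector => boxed vector, treating v as (x, y, z, 1).
    // Only the first three rows are used; the result stays a 3D vector.
    transform(v: Vec3): Vec3 {
        const m = this.m
        return new Vec3(
            m[0] * v.x + m[1] * v.y + m[2] * v.z + m[3],
            m[4] * v.x + m[5] * v.y + m[6] * v.z + m[7],
            m[8] * v.x + m[9] * v.y + m[10] * v.z + m[11]
        )
    }
}

// Assumed perspective transform: scale x and y by focal / z, keep z for depth.
function perspective(v: Vec3, focal: number): Vec3 {
    const scale = focal / v.z
    return new Vec3(v.x * scale, v.y * scale, v.z)
}
```

An array-of-vectors version would presumably store the components in one flat buffer instead of one object per vector, so that a whole mesh could be transformed in a single call.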
The scalar-valued functions could all return fixed-point numbers, provided the internal math is done in a way that avoids overflow. The Fx8 22.8 bit number range is probably fine for results in typical use cases. I don’t remember how significant the precision issues were that led me to use Fx14 instead for the kart demo. If it is a problem, maybe add a shift as a function parameter?
dot(a, b) => return integer part
dot(a, b, 8) => scaled to Fx8
dot(a, b, 14) => scaled to Fx14
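A minimal sketch of how that shift parameter might behave, in plain TypeScript. Vectors are plain `[x, y, z]` float arrays here just to keep it short, and the `dot` signature is only the proposal from above, not an existing API. The result is truncated with `| 0`, which is an assumption about the rounding you'd want.

```typescript
// Hypothetical dot product that does its math in floats and returns a
// fixed-point integer with `shift` fractional bits (0 = plain integer part).
function dot(a: number[], b: number[], shift: number = 0): number {
    // Internal math stays in floating point, so intermediate products
    // don't overflow the way fixed-point multiplies can.
    const d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
    // Scale by 2^shift, then truncate toward zero to a 32-bit integer.
    return (d * (1 << shift)) | 0
}

const v = [1.5, 0, 0]
const asInt = dot(v, v)       // 2.25 -> 2
const asFx8 = dot(v, v, 8)    // 2.25 * 256 -> 576
const asFx14 = dot(v, v, 14)  // 2.25 * 16384 -> 36864
```

The final result can still overflow 32 bits if the shifted value is too large, so the shift only moves the problem to the output; it doesn't remove the need to pick a sensible range.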
If I have time I could try to go back to my renderer to see what other operations I’m doing with numbers. I’m not sure offhand which other operations could turn into performance bottlenecks - I suspect it’s likely the clipping stage that happens between applying the transform matrix and doing the perspective divide. That involves a lot of comparisons and calculating intersections.
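For reference, this is roughly the kind of work the clipping stage does, sketched for a single line segment against a single near plane in camera space. It's a simplified illustration, not the code from my renderer: `Point3`, `clipSegmentToNear`, and the `z >= near` convention are all assumptions, and a real clipper would handle triangles and possibly more planes.

```typescript
// Hypothetical camera-space point after the transform matrix has been applied.
interface Point3 {
    x: number
    y: number
    z: number
}

// Clips segment a-b against the plane z = near, keeping the part with z >= near.
// Returns two points, or an empty array if the whole segment is behind the plane.
function clipSegmentToNear(a: Point3, b: Point3, near: number): Point3[] {
    const aInside = a.z >= near
    const bInside = b.z >= near
    if (aInside && bInside) return [a, b]
    if (!aInside && !bInside) return []

    // Exactly one endpoint is inside, so a.z != b.z and the division is safe.
    // t is the fraction of the way from a to b where the segment crosses the plane.
    const t = (near - a.z) / (b.z - a.z)
    const hit: Point3 = {
        x: a.x + t * (b.x - a.x),
        y: a.y + t * (b.y - a.y),
        z: near
    }
    return aInside ? [a, hit] : [hit, b]
}
```

Each intersection costs a divide and a couple of multiplies per crossing edge, on top of the comparisons, which is why this stage looks like a plausible hot spot.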