What is the *most efficient* way to draw a single pixel to the screen?

In Meowbit hardware floating point support, or "how I spent an entire weekend adding two numbers together" - #2 by mmoskal, @mmoskal said:

The F4 should have FPU enabled - all the chips we use have FPU. I’ll do that tomorrow.

It would be nice to somehow expose the FPU but I’m not sure how exactly. We need a uniform data representation so the floats would have to be boxed probably negating most of the perf advantage.

I do intend to optimize the fx8 math though.

Instead of internally dealing with fixed-point math and its annoying overflow behavior, would it be worth revisiting native single-precision float math on hardware targets? Especially if there were a reasonably efficient way to convert the floats to and from fixed point on input or output.

Having a “boxed” representation for transform matrices and for vectors (or even better, arrays of vectors) would help for 3D rendering.

Some useful operators would be:

  • matrix * vector => boxed vector
  • perspective transform
  • vector length (assuming there’s an efficient float square root available on hardware)
  • dot product
  • squared vector length (that’s just dot product of a vector with itself)
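
To make the list above concrete, here is a rough sketch in plain TypeScript of what those operators could look like. All of the names (`Vec3`, `Vec4`, `Mat4`, `vecDot`, `vecLengthSquared`, `vecLength`, `matTransform`, `perspectiveDivide`) are made up for illustration and are not an existing MakeCode API; the row-major matrix layout is also just an assumption.

```typescript
// Hypothetical boxed types for illustration only (not a MakeCode API).
type Vec3 = [number, number, number];
type Vec4 = [number, number, number, number];
// Assumed layout: row-major 4x4 matrix stored as 16 numbers.
type Mat4 = number[];

// Dot product of two 3-vectors.
function vecDot(a: Vec3, b: Vec3): number {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// Squared length is the dot product of a vector with itself.
function vecLengthSquared(v: Vec3): number {
    return vecDot(v, v);
}

// Length needs one square root on top of the squared length.
function vecLength(v: Vec3): number {
    return Math.sqrt(vecLengthSquared(v));
}

// matrix * vector, treating v as (x, y, z, 1); returns homogeneous (x, y, z, w).
function matTransform(m: Mat4, v: Vec3): Vec4 {
    const out: Vec4 = [0, 0, 0, 0];
    for (let row = 0; row < 4; row++) {
        const i = row * 4;
        out[row] = m[i] * v[0] + m[i + 1] * v[1] + m[i + 2] * v[2] + m[i + 3];
    }
    return out;
}

// Perspective divide: homogeneous coordinates to 3D by dividing by w.
// The caller is assumed to have clipped away points with w close to 0.
function perspectiveDivide(p: Vec4): Vec3 {
    return [p[0] / p[3], p[1] / p[3], p[2] / p[3]];
}
```

For example, `matTransform` with a pure translation matrix just offsets the vector and leaves `w` at 1, so `perspectiveDivide` returns the offset point unchanged.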

The scalar-valued functions could all return fixed-point numbers, provided the internal math is done in a way that avoids overflow. The Fx8 22.8 bit number range is probably fine for results in typical use cases - I don’t remember how significant the precision issues were that led me to use Fx14 instead for the kart demo. If precision is a problem, maybe add a shift as a function parameter?

dot(a, b) => return integer part
dot(a, b, 8) => scaled to Fx8
dot(a, b, 14) => scaled to Fx14
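
A minimal sketch of that shift parameter in plain TypeScript, assuming the inputs are ordinary float vectors and the result is a raw scaled integer. The name `dotScaled` and the choice of rounding down with `Math.floor` are my assumptions here, not anything that exists in MakeCode.

```typescript
// Hypothetical helper: float dot product, returned as an integer scaled by 2^shift.
// shift = 0 gives the integer part, 8 gives an Fx8-style raw value, 14 an Fx14-style one.
function dotScaled(a: number[], b: number[], shift: number = 0): number {
    // Internal math stays in floating point, so intermediate products
    // don't hit fixed-point overflow.
    const d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    // Assumption: round toward negative infinity; truncation would also be defensible.
    return Math.floor(d * (1 << shift));
}
```

With `a = b = [1.5, 0, 0]` the float result is 2.25, so the three variants give 2, 576 (2.25 * 256), and 36864 (2.25 * 16384).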

If I have time I could try to go back to my renderer to see what other operations I’m doing with numbers. I’m not sure offhand which other operations could turn into performance bottlenecks - I suspect it’s likely the clipping stage that happens between applying the transform matrix and doing the perspective divide. That involves a lot of comparisons and calculating intersections.
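
To illustrate the kind of work the clipping stage does, here is a simplified sketch (not my renderer's actual code) that clips one edge in homogeneous coordinates against a single plane `w = wMin`, so the later perspective divide never sees `w` near zero. The names `ClipVec4`, `clipEdgeAgainstW`, and `wMin` are hypothetical, and a real clipper would handle all frustum planes and whole polygons.

```typescript
// Hypothetical homogeneous point (x, y, z, w) after the transform matrix.
type ClipVec4 = [number, number, number, number];

// Point where edge a-b crosses the plane w = wMin (linear interpolation).
// Only called when the endpoints are on opposite sides, so b[3] !== a[3].
function intersectAtW(a: ClipVec4, b: ClipVec4, wMin: number): ClipVec4 {
    const t = (wMin - a[3]) / (b[3] - a[3]);
    return [
        a[0] + t * (b[0] - a[0]),
        a[1] + t * (b[1] - a[1]),
        a[2] + t * (b[2] - a[2]),
        wMin
    ];
}

// Returns the visible part of the edge, or null if it is entirely clipped.
function clipEdgeAgainstW(a: ClipVec4, b: ClipVec4, wMin: number): [ClipVec4, ClipVec4] | null {
    const aInside = a[3] >= wMin;
    const bInside = b[3] >= wMin;
    if (!aInside && !bInside) return null; // both behind the plane
    if (aInside && bInside) return [a, b]; // nothing to clip
    const p = intersectAtW(a, b, wMin);    // one comparison pair, one intersection
    return aInside ? [a, p] : [p, b];
}
```

Each edge costs a couple of comparisons plus, when it straddles the plane, one division and a few multiply-adds, which is why this stage adds up quickly over many triangles.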
