What is the most efficient way to draw a single pixel to the screen?

richard · April 15, 2025, 5:21pm

i don’t think we can make the jump to making everything 64 bit all the time (nor should we). tbh, the vector proposal was mainly just a way to surface some sort of “boxed” number where precision could be maintained while doing repeated calculations (like in the physics engine) and also avoiding overflow issues in multiplication. if those boxed numbers were accessed in non-native code, we would convert them back into the usual 32 bits.
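To make the multiplication overflow concrete, here is a minimal sketch. It assumes Fx8 means "integer holding value * 256" and simulates 32-bit arithmetic with `| 0`; the helper names (`toFx8`, `mulFx8Wrapping`, `mulFx8Wide`) are made up for illustration and are not part of any existing API. The final result fits in 32 bits, but the intermediate product does not.

```typescript
// Hypothetical helpers; Fx8 here means an integer holding value * 256.
const FX8_SHIFT = 8;

function toFx8(x: number): number {
    return Math.round(x * (1 << FX8_SHIFT)) | 0;
}

// 32-bit style multiply: "| 0" truncates the intermediate product to a
// signed 32-bit value before the shift, as a 32-bit register would.
// (Exact for this demo because the true product stays below 2^53.)
function mulFx8Wrapping(a: number, b: number): number {
    return ((a * b) | 0) >> FX8_SHIFT;
}

// Wider intermediate: plain JS numbers are exact up to 2^53, standing in
// for a 64-bit or "boxed" intermediate value.
function mulFx8Wide(a: number, b: number): number {
    return Math.floor((a * b) / (1 << FX8_SHIFT));
}

const lhs = toFx8(300);                    // 76800
const rhs = toFx8(300);                    // 76800
const wrapped = mulFx8Wrapping(lhs, rhs);  // 6262784, i.e. 24464.0 (wrong)
const wide = mulFx8Wide(lhs, rhs);         // 23040000, i.e. 90000.0 (right)
```

The correct answer (90000) is well inside the Fx8 range; only the intermediate product overflows, which is the case a wider boxed value would cover.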

…and of course, i’d need to do testing to make sure this sort of thing didn’t tank perf. i actually have no idea what the relative performance of 64 bit and 32 bit integers is on the hardware we target.

kwx · April 15, 2025, 6:09pm

In Meowbit hardware floating point support, or "how I spent an entire weekend adding two numbers together" - #2 by mmoskal, @mmoskal said:

The F4 should have FPU enabled - all the chips we use have FPU. I’ll do that tomorrow.

It would be nice to somehow expose the FPU but I’m not sure how exactly. We need a uniform data representation so the floats would have to be boxed probably negating most of the perf advantage.

I do intend to optimize the fx8 math though.

Instead of internally dealing with fixed-point math and its annoying overflow behavior, would it be worth revisiting using native single-precision float math on hardware targets? Especially if there would be a reasonably efficient way to turn them into fixed point on input or output.

Having a “boxed” representation for transform matrices and for vectors (or even better, arrays of vectors) would help for 3D rendering.

Some useful operators would be:

matrix * vector => boxed vector
perspective transform
vector length (assuming there’s an efficient float square root available on hardware)
dot product
squared vector length (that’s just dot product of a vector with itself)

The scalar-valued functions could all return fixed-point numbers if they can do their internal math in a way that avoids overflow. The Fx8 22.8 bit number range for results is probably fine for typical use cases - I don’t remember how significant the issues were with precision that led me to using Fx14 instead for the kart demo. If it is a problem, maybe add a shift as a function parameter?

dot(a, b) => return integer part
dot(a, b, 8) => scaled to Fx8
dot(a, b, 14) => scaled to Fx14
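A rough sketch of what that signature could look like, assuming the boxed vectors hold plain floats internally. The `dot` function and its `shift` parameter are hypothetical here, not an existing API. Note that this version rounds toward negative infinity, which matches what an arithmetic right shift would do in fixed point.

```typescript
// Hypothetical: a and b are 3-component vectors stored as plain floats.
// Returns dot(a, b) scaled by 2^shift and rounded down to an integer.
// shift = 0 gives the integer part, 8 gives Fx8, 14 gives Fx14.
function dot(a: number[], b: number[], shift: number = 0): number {
    // Do the internal math in floating point so the products cannot
    // overflow the way 32-bit fixed-point products can.
    const d = a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
    return Math.floor(d * (1 << shift));
}

const u = [1.5, 2, 0];
const v = [2, 0.25, 0];
dot(u, v);      // 3     (integer part of 3.5)
dot(u, v, 8);   // 896   (3.5 * 256)
dot(u, v, 14);  // 57344 (3.5 * 16384)
```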

If I have time I could try to go back to my renderer to see what other operations I’m doing with numbers. I’m not sure offhand which other operations could turn into performance bottlenecks - I suspect it’s likely the clipping stage that happens between applying the transform matrix and doing the perspective divide. That involves a lot of comparisons and calculating intersections.

kwx · April 15, 2025, 6:18pm

For inspiration, I’d recommend checking out Brandon Jones’s JS library gl-matrix (https://glmatrix.net/). It’s very efficient, and its data types could be a good match for the boxed types that would be useful to support.

The API is very focused on performance, for example it generally uses an out parameter instead of returning values to avoid allocating objects. I think it would be good to follow that.
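As an illustration of the out-parameter convention, here is a minimal sketch of a matrix * vector operation with the perspective divide folded in. The function name `transformPoint` and the plain `number[]` storage are assumptions for this example; it follows the general shape of that style of API rather than reproducing the library's code. The matrix is assumed to be 16 numbers in column-major order.

```typescript
// Hypothetical helper: out = perspectiveDivide(m * (x, y, z, 1)).
// m is a 4x4 matrix stored as 16 numbers in column-major order.
// The result is written into `out` so no new object is allocated.
function transformPoint(out: number[], m: number[], v: number[]): number[] {
    // Read the inputs first so that passing the same array as out and v is safe.
    const x = v[0], y = v[1], z = v[2];
    let w = m[3] * x + m[7] * y + m[11] * z + m[15];
    if (w === 0) w = 1; // arbitrary fallback for this sketch to avoid dividing by zero
    out[0] = (m[0] * x + m[4] * y + m[8] * z + m[12]) / w;
    out[1] = (m[1] * x + m[5] * y + m[9] * z + m[13]) / w;
    out[2] = (m[2] * x + m[6] * y + m[10] * z + m[14]) / w;
    return out;
}

// Usage: one scratch vector is reused for every vertex in the loop.
const translate = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 10, 20, 30, 1];
const scratch = [0, 0, 0];
const vertices = [[1, 2, 3], [4, 5, 6]];
for (const vertex of vertices) {
    transformPoint(scratch, translate, vertex);
    // ...draw using scratch[0], scratch[1], scratch[2]...
}
```

Returning `out` as well as writing to it keeps call chaining possible without giving up the allocation-free path.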

Kiwiphoenix364 · April 17, 2025, 12:08pm

Ooh that sounds very nice, textured polys could be huge! I just skimmed this thread but from what I’ve seen one of the major issues with 3d is collision and depth sorting, which require in-depth knowledge of 3d math. These aren’t easy problems to solve (especially since clips still happen in modern games), but I believe these are the major roadblocks for “full 3d games” by competent-but-not-insanely-experienced-or-educated developers.

kwx · April 17, 2025, 6:06pm

Just to add to the complication, “clipping” and “culling” are two distinct concepts. Clipping in the sense of cleanly cutting off geometry at the sides of the view frustum is comparatively straightforward, but doesn’t always have the effect you’d want. For example, clipping includes cutting off objects at the “near” view plane intersection, so getting too close to objects can result in being able to see inside them. This gets very disturbing if you then see the inside of an NPC’s head. In this case clipping is working as intended, but the game is missing a collision check to keep you from intersecting the object.
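For reference, near-plane clipping of a single line segment can be sketched like this. It assumes view space with the camera looking along +z, so anything with z below `near` is too close and gets cut; the `P3` type and the function name `clipSegmentToNear` are made up for the example. Triangle clipping follows the same idea but has more cases.

```typescript
type P3 = { x: number; y: number; z: number };

// Clips the segment a-b against the plane z = near.
// Returns null if the whole segment is closer than the near plane.
function clipSegmentToNear(a: P3, b: P3, near: number): [P3, P3] | null {
    const aIn = a.z >= near;
    const bIn = b.z >= near;
    if (!aIn && !bIn) return null;   // fully clipped away
    if (aIn && bIn) return [a, b];   // nothing to cut
    // Exactly one endpoint is too close, so b.z - a.z cannot be zero.
    // Find where the segment crosses z = near by linear interpolation.
    const t = (near - a.z) / (b.z - a.z);
    const hit: P3 = {
        x: a.x + t * (b.x - a.x),
        y: a.y + t * (b.y - a.y),
        z: near
    };
    return aIn ? [a, hit] : [hit, b];
}
```

This only does the geometric cut. Keeping the camera out of the NPC's head in the first place is a separate collision check.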

It gets even more complicated in VR - you can’t stop the user’s head from moving, and decoupling the viewpoint movement from head movement is nauseating so you can’t prevent the view from moving along with the head. So apps need to use alternate methods such as fading to black when your head is inside objects or outside a wall.

Culling means discarding geometry before even trying to draw it, for example because an object is entirely outside the field of view (pretty easy) or hidden by a solid wall (much trickier). Games have had issues where poor culling has a big performance impact due to trying to draw objects that end up being covered up. Getting it wrong in the other direction can lead to objects suddenly popping into view.
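The "pretty easy" case can be as small as a bounding-box test before drawing. This is only a sketch under some assumptions: the object's projected screen-space bounds are already known, and the `Box` type and `isOffscreen` name are hypothetical. It does nothing for the much harder hidden-behind-a-wall case.

```typescript
// Hypothetical screen-space bounds of an object, in pixels (inclusive edges).
interface Box { left: number; top: number; right: number; bottom: number; }

// True if the box lies entirely outside a screen of the given size,
// in which case the object can be skipped without drawing anything.
function isOffscreen(b: Box, width: number, height: number): boolean {
    return b.right < 0 || b.left >= width || b.bottom < 0 || b.top >= height;
}

// Usage, assuming a 160x120 screen:
const bounds: Box = { left: 170, top: 10, right: 190, bottom: 30 };
if (!isOffscreen(bounds, 160, 120)) {
    // ...draw the object...
}
```

Making the test too aggressive (for example, using bounds that are smaller than the object really is) is what produces the popping-into-view effect.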
