What is the *most efficient* way to draw a single pixel to the screen?

Hello, new to the forums, but I’m a middle/high school comp sci teacher who likes to push the limits of my teaching tools.

I plan to make a 3D renderer from scratch in Makecode Arcade. I know this has been done before in many forms, but I wanted to make sure I was handling this in the most efficient way from the ground up.

That being said… what is the most efficient way to draw a pixel to the screen in Makecode Arcade? I’ve done this in p5.js before by writing data to an image’s pixel array, which to my understanding is the fastest way with that library, but I wanted to ask the wizards here if they had any idea what works best in Arcade.

A couple things I’ve found with preliminary research:

  • The image data type lets you create an image from a string. This can then presumably be rendered to the screen at a desired position. I could write a function that sets characters at specific points in the string, similar to how I would set the rgba values of a color on a p5.js image.
  • Similarly, I can technically set a specific pixel with the setPixel function, though I imagine this might be slightly slower than directly modifying image data myself.
  • There are a few other setPixel implementations using onShade and createRenderable.
  • I’ve found a couple of people mentioning Screen.blitRow, though since the equivalent wasn’t the most efficient way to draw to the screen in p5.js, I have a suspicion the same may be true here.

And that covers everything I’ve personally found in a few hours of research. Performance is really important to me. I want to create my own implementations of drawing lines and triangles because I’d eventually like to be able to make custom fill options to handle lighting and shading on the fly (by filling triangles with dithered gradients based on normal direction and scene lighting), and potentially eventually do vertex colors or textures. Most other 3D MakeCode Arcade projects that I’ve found use techniques that other threads have stated are suboptimal (one of the devs mentions in one of the posts linked above that drawing to scene.backgroundImage, which is what many current implementations do, is not performant).

Any thoughts?

5 Likes

When it comes to drawing basic flat one-color triangles or polygons, the built-in triangle function is going to give the best performance, because it lives in the C++ layer of MakeCode. But for the lighting you are looking for, it is probably best to build your own. As far as I know, the fastest way to draw to an image is image.setRows(x, buf).
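For example, something like this (a minimal sketch of my reading of the API, not code from an actual project - double-check the docs):

    // setRows copies the buffer into the image, one byte per pixel,
    // running down the column(s) starting at column x.
    const img = image.create(160, 120)
    const col = control.createBuffer(img.height)  // one full column of pixels
    col.fill(7)                                   // e.g. color 7 for the whole column
    img.setRows(40, col)                          // write it into column x = 40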

When it comes to performance and 3D, drawing to the screen is not usually the issue when the screen size is so small; the issue is depth-sorting. I am lazy and use the painter’s algorithm with quicksort, but @kwx has had success using BSP trees, as seen in his game Space Rocks Revenge!
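The painter’s algorithm part is conceptually just this (a rough sketch with a made-up Tri type, not my actual code):

    // Painter's algorithm sketch: sort triangles back-to-front by depth, then draw.
    // Tri and drawTri are placeholders, not types from any real project.
    interface Tri { z: number }   // e.g. average camera-space depth of the triangle

    function drawSorted(tris: Tri[], drawTri: (t: Tri) => void) {
        tris.sort((a, b) => b.z - a.z)   // farthest first, so near triangles overdraw far ones
        for (const t of tris) drawTri(t)
    }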

3 Likes

Thank you for the sage reply! Your work and @kwx’s have informed a lot of my early decision making with this engine. I actually deconstructed and played around with your dino model from the MakeCode Arcade home page for a while before realizing I really wanted to play around with custom dithering. I’ve made a lot of advances thanks to the knowledge gained from that and the thread following Space Rocks’ development.

Early discoveries


After playing around with the built-in triangle function, this was my first step towards customization. It is heavily based on the Space Rocks code, though mine is more stripped down. As far as I can tell, this is the simplest, most efficient way to set the color of any given pixel on the screen, and you can see here where I’ve stress tested it with some marginally more strenuous math to give me an idea of what the performance might look like when I get around to coding the 3D engine.
This was actually giving me significantly better framerates than the built-in setPixel function; however, I found it to be very inflexible. You have to set the pixel within this double for loop, or else manually set it within some similar structure. With a smart 3D rendering pipeline, it’s usable (as evidenced by Space Rocks), but if you’re trying to build an engine from scratch with no experience, not being able to easily implement your own setPixel function is a non-starter.

By the way, I tried using createRenderable with this same technique as some have suggested, and it’s significantly slower than just directly modifying the background image. Awesome if you want to conveniently render things behind your 3d geometry, but it tanks performance. No clue why.

Next Iteration


This looks very similar, so what’s the difference? Well, in order to save myself the headache of trying to set the color of any given pixel on the screen (so I could eventually write my own line and fillTri functions), I created an array of buffers, one for each column of pixels on the screen. Now I can very easily take an x and y coordinate and set the pixel at that index to whatever I want:
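Roughly like this (a minimal sketch of the idea; the names are mine, not the exact code from my screenshot):

    // One Buffer per screen column, one byte per pixel (Arcade's screen is 160x120).
    const columns: Buffer[] = []
    for (let x = 0; x < 160; x++) {
        columns.push(control.createBuffer(120))
    }

    function setPixelFast(x: number, y: number, color: number) {
        columns[x].setUint8(y, color)   // no bounds checks here, for speed
    }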
And despite feeling like there’s got to be a better way of storing that data, I decided that a few extra KB of RAM couldn’t be that bad, though I’m not really familiar with my Meowbit’s hardware limitations. In any case, this is still more performant than the built-in setPixel, at least on my laptop, though it is noticeably slower than the previous technique.

And that leaves us here
I’m following a pseudocode guide on creating computer graphics from scratch that I found at my local library. Since I’ve spent a lot of time since this was posted digging further through the forums and experimenting with y’all’s existing 3D implementations, I haven’t made it super far, but I’m currently working on optimizing a line drawing algorithm - if anyone has tips for me, please let me know! I’m educated in C#, so I’m almost certainly breaking some common JavaScript conventions or missing simple stuff.

For lines, I’ve figured that having special cases for horizontal/vertical lines is probably optimal for performance, as any other type of line requires many more comparisons and operations. Since I plan to use horizontal lines for my fill function, this hopefully is a big performance help.


But what I’d really like some feedback on is my possibly scuffed implementation of Bresenham’s algorithm for drawing lines, which my book offhandedly mentions is a more performant technique than what they’re teaching, so I tried to follow Wikipedia’s explanation and got this:



Which works, but is now actually slower than the built-in drawLine function… :frowning:
I particularly hate how redundant the drawLineLow and drawLineHigh functions look, and I am certain there’s a smart way to refactor them into one neat function, but tbh I started to go cross-eyed at the explanation of how the function works after a while and am struggling to refactor it myself as a result. I still need to finish changing * to Math.imul in drawLineHigh, for now.

If anyone can think of any more-efficient line drawing algorithms or has suggestions for my implementation, please chime in. There may be others interested in taking my path in the future.

4 Likes

If you want to render triangles using blocks, you can use Richard’s advanced blocks extension; here is the link:
https://github.com/riknoll/arcade-advanced-blocks
It basically adds some JavaScript features (such as polygon/triangle rendering and blit) which should really help you make a 3D engine. Also, I’ve done entire screen effects using setPixel and getPixel (or whatever it’s called :laughing:) and I’ve had no performance issues.

I think it’s expected that a JS-coded line drawing function will be slower than the C++ version. Out of curiosity, why do you want to draw lines? Are you going for wireframe graphics? For polygon-based 3D graphics, you’ll definitely need a specialized pixel fill function instead.

Note that the renderer I wrote predated the native triangle fill function. If I were to implement something now, I’d probably go with using that instead of implementing my own, though it does come at the cost of losing dithering. With my JS approach, there really wasn’t a performance cost for dithering since it just switched out which of the specialized fill functions it used. It just needed versions for ABABAB and BABABA pixel patterns in addition to single color, since the 1/4 and 3/4 patterns just alternate between the 50% pattern and single color on alternating columns.

In case you haven’t seen it, Space Rocks 3D also included some ad-hoc profiling code which I used to check where the code is spending most of its time. One major gotcha for the hardware targets is that you need to be extremely careful to stick to integer (or fixed point) math for all inner loops. Having a floating point value creep in, for example due to exceeding the expected value range, will wreck performance. The profiling helped catch some of those, and also kept me on track for optimizing the parts that mattered.
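If anyone wants to roll something similar themselves, the core idea is just accumulating elapsed milliseconds per section; a crude sketch, not the actual Space Rocks profiler:

    // Crude ad-hoc profiling sketch: accumulate time spent in one code section
    // and report an average every 100 frames.
    let fillMs = 0
    let frames = 0

    function profiledFill(doFill: () => void) {
        const t0 = control.millis()
        doFill()
        fillMs += control.millis() - t0
        frames++
        if (frames % 100 == 0) {
            console.log("fill avg ms: " + fillMs / frames)
        }
    }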

In case you haven’t seen it, check out the Carrier 3D (unfinished work in progress) thread. That’s the most recent version of the renderer, and the thread also has some implementation notes and additional resource links.

Good luck on your project!

2 Likes

Ooh the profiler sounds very intriguing. I saw some of the older threads but I think I missed this one - I’ll be sure to check it out!

Right now, drawing lines is just where I’m at in this tutorial, and I suspect, as you mention, that it will be replaced with specialized functions as I progress. Though having the option to do wireframe very much appeals to me, and I’ve elected not to use the native triangle function because I want to support a range of dither patterns and I aim to make something more colorful.

I was made aware of the floating point math issue thanks to some of the other posts I’ve run across - you can see in my drawLineLow snippet that I’ve used integer multiplication. I found @mmoskal’s post on the MakeCode Kart thread that details speed comparisons.

Thank you so much for your reply! I’ll be taking a look at some of your resources and continuing my own progress later today.

2 Likes

Hi all, wanted to come back with my new line drawing algorithm. Somewhere in that massive wall of text you didn’t read, I complained that there’s certainly a way to refactor my Bresenham Line implementation… And there totally was!

Over 80 lines of code shortened to 21. The intent behind this implementation of Bresenham’s algorithm is to totally eliminate floating point values from line calculation, per the advice of @kwx. If you are coding your own line drawing algorithm, this should be the fastest way of drawing lines of an arbitrary angle in an update loop (given you’ve coded an efficient drawPixel function). That being said… it still runs about half as fast as the native line drawing tool :frowning_with_open_mouth:

(Press A to toggle between my implementation, which rotates clockwise, and the native tool, which rotates counter-clockwise)
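For anyone following along, the general shape of what I ended up with is roughly this (a sketch of the standard all-octant integer Bresenham, not the exact code in the project above):

    // Integer-only Bresenham covering every octant (a sketch).
    // `plot` stands in for whatever per-pixel writer you are using.
    function drawLineBresenham(x0: number, y0: number, x1: number, y1: number, color: number,
        plot: (x: number, y: number, c: number) => void) {
        const dx = Math.abs(x1 - x0)
        const dy = -Math.abs(y1 - y0)
        const sx = x0 < x1 ? 1 : -1
        const sy = y0 < y1 ? 1 : -1
        let err = dx + dy
        while (true) {
            plot(x0, y0, color)
            if (x0 == x1 && y0 == y1) break
            const e2 = 2 * err
            if (e2 >= dy) { err += dy; x0 += sx }   // step in x
            if (e2 <= dx) { err += dx; y0 += sy }   // step in y
        }
    }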

As for why I wanted to do this? Wireframes are of course nice (particularly for testing), but eventually the tris of a 3d object will always be filled with horizontal lines - so not at an arbitrary angle and not with this algorithm. However, I intend to use a modified version of this line drawer to generate arrays of x starting points and x ending points for each horizontal line that makes up a filled tri. There is probably a better way of doing it, but I can’t think of one that sticks to integer math!

Thank you for reading, I hope this helps someone out there :slight_smile:

4 Likes

I suspect you also have significant performance impact from calling drawPixel in the inner loop. Last I checked, function calls added quite a bit of overhead, and you’ll also have redundant math (for example calculating offsets) when each pixel is drawn independently. That’s the main reason why specialized pixel fill functions can be much faster: they pretty much just need to increment a location going from one pixel to the next instead of having to calculate the destination location from scratch. Also, that way you can minimize conditionals in the inner loop. (I used loop unrolling in my renderer’s pixel fill methods to do groups of pixels without inner loop conditionals.)

1 Like

Oh it’s not looking good right now, lol. Despite my efforts, the fill triangle is even worse lol

Yet I trudge on! In my current WIP build, the points of the triangle can have intensity values, with 255 being the highest and 0 being the lowest. They are evenly divided into 15 portions at the moment, so they work with grayscale, but I can easily modify this later.

I also changed my buffer to be a single huge buffer instead of 160 buffers of length 120. I don’t know how much more efficient it is, but that’s how it’s done in p5.js. This took an hour or so of debugging, but I hope it’s better for RAM. I think it will enable me to change some math to bitwise stuff later, when I compel myself to understand bitwise operations.
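For reference, the lookup into the single column-major buffer is just this (a sketch with my own naming; 160 × 120 = 19200 bytes):

    // Column-major 160x120 screen buffer: one byte per pixel, 19200 bytes total.
    const screenBuf = control.createBuffer(160 * 120)

    function setPixelBuf(x: number, y: number, color: number) {
        // index = x * 120 + y; Math.imul keeps the multiply in integer math on hardware
        screenBuf.setUint8(Math.imul(x, 120) + y, color)
    }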

Next is actually getting into the 3D perspective. I hope to have made decent progress by the end of the day.

1 Like

@MrJones it’s going to be pretty difficult to beat the native implementation; there’s a lot of overhead in the generated JavaScript code in the browser.

BTW, if you are interested in seeing those native implementations, the browser ones are in this file:

If you do want to stick with your own implementation, make sure to use native functions wherever possible in your code. For example, I see you’ve got functions for drawing horizontal/vertical lines in your screen buffer; at least one of those should be able to be implemented using Buffer.fill() instead of setting each pixel individually.
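With the column-major layout, a vertical line is one contiguous run of bytes, so it could look something like this (a sketch, assuming a 160×120 column-major buffer like the one described above):

    // Vertical line sketch: pixels (x, y0)..(x, y1) are adjacent bytes in a
    // column-major buffer, so a single fill() covers the whole run.
    function vLine(buf: Buffer, x: number, y0: number, y1: number, color: number) {
        buf.fill(color, Math.imul(x, 120) + y0, y1 - y0 + 1)
    }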

1 Like

Thank you, this gives me a lot of hints. I have a lot of functions I’m calling for now, but eventually I’d like to take the code from them and put it directly into the loop, eliminating the call. It’s this way now for sake of convenience, but it’s on my optimization to-do list. Having some indication of where I can start optimizing is exactly why I came here for help - thank you so much!

Just for comparison, here’s the function I’m using to draw a vertical column of two alternating colors as part of the dithering implementation. It first draws pixels in blocks of eight, with no conditionals or JS function calls within that loop, and then fills in the remaining pixels in blocks of 4/2/1 as needed. It’s from Carrier 3D file triangle.ts.

(Hm, I just realized I may have messed up the math here - the y1end = y1 - 7 should probably have been something like y1end = y0 + (y1 - y0) & ~7 to just clear the bottom three bits. But I haven’t tested if that would actually have been faster.)

    export function drawTriLineAlternating(tri: ActiveTrapezoid, buf: Buffer, col0: number, col1: number) {
        const y0 = Fx.toIntFloor(tri.a_y)
        const y1 = Fx.toIntFloor(tri.b_y)

        let y = y0
        if (y & 1) {
            // Swap the colors so that col0 is consistently on even Y
            const tmp = col0
            col0 = col1
            col1 = tmp
        }

        let y1end = y1 - 7
        while (y <= y1end) {
            buf.setUint8(y, col0)
            buf.setUint8(y + 1, col1)
            buf.setUint8(y + 2, col0)
            buf.setUint8(y + 3, col1)
            buf.setUint8(y + 4, col0)
            buf.setUint8(y + 5, col1)
            buf.setUint8(y + 6, col0)
            buf.setUint8(y + 7, col1)
            y += 8
        }
        if (y <= y1 - 3) {
            buf.setUint8(y, col0)
            buf.setUint8(y + 1, col1)
            buf.setUint8(y + 2, col0)
            buf.setUint8(y + 3, col1)
            y += 4
        }
        if (y <= y1 - 1) {
            buf.setUint8(y, col0)
            buf.setUint8(y + 1, col1)
            y += 2
        }
        if (y <= y1) {
            buf.setUint8(y, col0)
            //++y
        }
        tri.a_y = Fx.add(tri.a_y, tri.a_dydx)
        tri.b_y = Fx.add(tri.b_y, tri.b_dydx)
    }
2 Likes

@richard
I was actually curious what that looked like; I spent an hour trying to find all the native implementations but got lost in the nest. With my current shaded triangle, I tried replacing my setPixel with the native one and it halved my framerate, but maybe I can find a clever way to use the native line drawer instead of ever touching setPixel. This is what my implementation looks like right now:


I replaced my array of buffers with a single buffer of size 19200 that contains the color for every pixel on the screen. I use math to set it at the correct location, assuming the center of the screen is 0,0 (more or less). Then this is what my update loop looks like

I don’t intend on storing points like this in the long run, though I haven’t decided definitively how I’d like to. I know each {} allocation is pretty hefty, per @mmoskal’s post in the MakeCode Kart thread.
As I mentioned in my response to @kwx, I also intend on removing the function calls and placing the code directly into my update loop, to remove a little bit of overhead. I have them separate for now for convenience. I’ve also done my best to avoid floating point math where possible, but I’m sure there are a few equations sneaking around that I’ve missed.

Here’s the shaded triangle code I have right now, as I’m starting the perspective projection code.
A toggles wireframe, B randomizes point locations and h values.

2 Likes

At the risk of being a broken record on this, it’s not “a little bit of overhead”, at least on hardware targets. I’d guess that the time needed for doing a function call, four conditionals for bounds checking, and the index math probably make this 5-10x slower than an inline version that just directly writes data to adjacent pixels and that can skip all the per-pixel bounds checks and most of the index math. This requires restructuring the code to make such optimizations possible.

In other words, I think my answer to your question in this topic is “don’t draw single pixels”. You either need efficient specialized code to draw groups of pixels together, or use native code to do so for you.

2 Likes

@kwx @MrJones @Brohann Out of curiosity, what native functions would you most like to see added for image manipulation? Or would things like vector/matrix math be more useful?

1 Like

I’m about to dive into making my own matrix implementation, but for future’s sake that would be awesome!

I see. I do plan on doing this eventually, but right now having more control of exactly where pixels go is important to the way I’m programming this. I went into this topic with the incorrect assumption that there couldn’t possibly be a more efficient way than drawing a single pixel at a time, but I’ve learned a lot from talking with you, thank you. When I settle on a method for shading, I’ll work on refactoring what I have to a more specialized solution, almost certainly similar to what you’ve done. I have some ideas in mind, but I’m honestly pretty inexperienced with bitwise operations and I need more practice before I settle on a technique.

1 Like

It would be nice if there were a “distort (image) to [(x1,y1),(x2,y2),(x3,y3),(x4,y4)]” block. It would probably use code similar to Sega Saturn “3D” sprite distortion. Not only could it be used to easily create textured polygons in a 3D space, but it could also be used for a variety of cool screen effects.

2 Likes

That’s an interesting question. Back when I was working on the 3D games I would really have liked native triangle and quad drawing functions, so I think that was one of the big gaps which is now covered.

A big obstacle for running on hardware devices was that fixed point math is clunky - the Fx8 class helps but is rather verbose to use, and it’s easy to accidentally exceed range limits in calculations. For example, doing a dot product requires multiplying components. Fx8 uses 8 bits after the decimal point (“binary point”?) out of 30 usable integer bits, so numbers need to stay below 127.0 (using 15 bits for a 7.8 bit fixed point number) to be able to multiply them without overflowing. (This is from memory, I may have gotten some details wrong.)
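Spelling the multiplication limit out (with the same caveat; this is just a back-of-the-envelope sketch, not the actual Fx implementation):

    // An Fx8 value x is stored as raw = round(x * 256), so multiplying two raws
    // gives a * b * 65536. That intermediate product has to stay inside the
    // ~30-bit small-integer range, so a * b must stay under about 2^14 - i.e.
    // two similarly-sized operands each need to be below roughly 127.
    function fx8MulSketch(aRaw: number, bRaw: number) {
        return Math.imul(aRaw, bRaw) >> 8   // shift back down to Fx8 scale
    }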

Apparently some of the target devices would natively support 32-bit single precision float arithmetic, but from what I remember there’s not really any reasonable way to make use of that from TypeScript, which is based on 64-bit double precision floats. I don’t really know of any straightforward way to get around this. I suspect many users would be fine with focusing on the emulator, which has good performance for double-precision float arithmetic, and ignoring performance on hardware devices.

blitRow can be a useful helper for a handcoded triangle renderer, but it’s a bit limited. It can be used to mix colors, but that needs some slightly clunky setup to make a source image that has vertical strips containing all the needed color mixes, and that would also be a bit wasteful as far as memory is concerned. I think it currently doesn’t really work for texture mapping since it doesn’t have support for rescaling or general transforms.

Would it be feasible to add a generalized blit function that can apply affine transforms + perspective when looking up the source texture? This could be used for effects similar to SNES “mode 7” graphics. This would have been really helpful for the MakeCode Kart project (see MakeCode Kart - #166 by kwx), and could also be used to do some simple texture mapped 3D. That’s a bit of a slippery slope though - the next feature request would be applying lighting and shading to the textures, and that would be hard to do with the limited available color palette.

I have mixed feelings about adding vectors or matrix math. If code is intended to work on hardware devices, it pretty much needs to be based on fixed point arithmetic for the performance critical parts, and that is hard to generalize. It works if the high-level game logic does floating point math and then feeds coordinates of a handful of individual objects to a 3D renderer which then internally uses fixed point (that’s how I did it in my games), but it still needs to be used carefully to avoid exceeding the numerical range limits for fixed point values.

Adding a full native 3D renderer to MakeCode Arcade doesn’t sound like a good fit, especially since the APIs to do so in a generalized way would get very complicated.

Sorry, this turned out a bit more rambling than intended :slight_smile:

6 Likes

Oh yeah, I don’t think 3D will ever be a mainline scenario for Arcade. I’m primarily asking because I don’t personally have much experience with graphics and wanted to make sure there aren’t any huge gaps in our current API. And I hear you on the fixed point stuff; not only is the code unreadable, but the limits and rounding errors are such a pain. If I did do some sort of native vector/matrix extension, it would definitely be fixed point on hardware and perhaps have an option for using 64-bit integers in addition to 32.

On a side note, @eanders has been working on a “draw texture to quad” function that will (hopefully) ship at some point! I think that should cover most of the texture mapping stuff.

5 Likes

About fixed point: I used an Fx14 class for my MakeCode Kart floor texture renderer; it’s in the fx14.ts source file. It had a few specialized multiplication functions based on the operand precision, and a rather nasty division routine.

    // Multiply two 16.14 numbers (the slowest version)
    export function mulLL(a: Fx14, b: Fx14) {
        return (
            (Math.imul((a as any as number) & 0x3fff, (b as any as number) & 0x3fff) >> 14) +
            Math.imul((a as any as number) & 0x3fff, (b as any as number) >> 14) +
            Math.imul((a as any as number) >> 14, (b as any as number) & 0x3fff) +
            (Math.imul((a as any as number) >> 14, (b as any as number) >> 14) << 14)
        ) as any as Fx14
    }
    // Multiply a 16.14 number by a 0.14 number (range -1..1)
    export function mulLS(a: Fx14, b: Fx14) {
        return (
            (Math.imul((a as any as number) & 0x3fff, (b as any as number)) >> 14) +
            Math.imul((a as any as number) >> 14, (b as any as number))
        ) as any as Fx14
    }
    // Multiply two 0.14 numbers (both in the range -1..1)
    export function mulSS(a: Fx14, b: Fx14) {
        return (Math.imul((a as any as number), (b as any as number)) >> 14) as any as Fx14
    }
// [...]
    // Divide a 9.14 number by a 9.14 number
    export function divMM(a: Fx14, b: Fx14) {
        //console.log("float a=" + F14.toFloat(a) + " / b=" + F14.toFloat(b) + " = " + F14.toFloat(F14.div2(a, b)))
        let x = a as any as number
        let y = b as any as number
        //console.log("x=" + x + " y=" + y)
        let high = Math.idiv(x, y)
        x -= Math.imul(high, y)
        //console.log("high=" +high + " x=" + x)
        x <<= 7
        let mid = Math.idiv(x, y)
        x -= Math.imul(mid, y)
        //console.log("mid=" + mid + " x=" + x)
        x <<= 7
        let low = Math.idiv(x, y)
        x -= Math.imul(low, y)
        //console.log("low=" + low + " remainder=" + x)
        return (
            (high << 14) + (mid << 7) + low
        ) as any as Fx14
    }

If I remember right, a main reason I did that was that Fx8’s limited precision caused some noticeable jerkiness for small-angle rotations. I think having 16.14 fixed point numbers available would be fine for most use cases; it would provide a nice -65535..65535 value range with pretty good sub-integer accuracy. 64-bit math would be really helpful for implementing multiplications and divisions more conveniently to ensure that the result stays in the expected range, but for representing vector or matrix values it wouldn’t really be necessary.

There are some special cases such as vector squared length or dot products where the result of a calculation exceeds this range, so those could either be represented as 64-bit or use a different scale for the resulting value. However, the needed scale depends on the use case - for an “is this object outside the play area” check you need the high bits to cover the full range (i.e. dot(Fx16.14, Fx16.14) => Fx30.0), but for lighting calculations you’d need the lower bits (dot(Fx0.14, Fx0.14) => Fx0.14). Using method variants with different precision results could work around this, but at some point I guess we’d just be reinventing floating point math…

What do you think would be a useful approach here? Using 64-bit integers to represent 50.14 bit fixed point numbers everywhere should work for all use cases (with a recommendation to keep coordinates in the 16.14 range to avoid overflow), but would be a bit wasteful of memory and would do extra arithmetic in places where it’s not really needed. Is there a sane way to have the API use a mix of 32-bit and 64-bit?

3 Likes