Help with custom audio clips...again

For some reason, I decided to try to play custom audio clips in MakeCode Arcade, completely ripping off @chembot’s idea.

Instead of using SPEAR or other complicated audio analysis tools, I decided to use scipy’s signal.spectrogram function to decompose the audio into sine waves and play those back, which is a more basic version of the same idea. You can see the GitHub repo here.

What I need help with, or would like other people’s input and ideas on, is this: I’ve tried two methods of playing the sine waves, and I want to improve the sound quality of the second one… (both only tested in Chrome)


:interrobang: The first method is storing two giant number[][] arrays of the most prominent frequencies and their respective amplitudes, which actually turned out decently (although it is not playing at full speed, more like ~92% speed…):

(make sure to interact with the page so sound can play. Also it’s quite laggy for the first few seconds as stuff loads in)

I generated it with the code in commit 04cb7a76510edee890c0f37265cff0d13940a6d3, with the period set to 25. (20 sounded worse, but I accidentally left it as the default.)
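
For anyone following along, here is a minimal sketch of how the two arrays could be built from per-frame spectra. It is not the code from the repo: the `{ freq, amp }` frame shape and the names `topFrequencies` and `toParallelArrays` are made up for illustration, and it is written in JavaScript rather than the Python the converter uses.

```javascript
// Hypothetical helper: pick the `count` loudest bins from one spectrogram frame.
// `frame` is an array of { freq, amp } objects (an assumed shape, not the repo's).
function topFrequencies(frame, count) {
    return [...frame]                    // copy so the caller's array is untouched
        .sort((a, b) => b.amp - a.amp)   // loudest first
        .slice(0, count);
}

// Build the two parallel number[][] arrays: one row of frequencies and one row
// of amplitudes per frame.
function toParallelArrays(frames, count) {
    const freqs = [];
    const amps = [];
    for (const frame of frames) {
        const top = topFrequencies(frame, count);
        freqs.push(top.map(s => s.freq));
        amps.push(top.map(s => s.amp));
    }
    return { freqs, amps };
}

// Made-up demo data: two frames with three bins each.
const demoFrames = [
    [{ freq: 220, amp: 0.2 }, { freq: 440, amp: 0.9 }, { freq: 880, amp: 0.5 }],
    [{ freq: 220, amp: 0.7 }, { freq: 440, amp: 0.1 }, { freq: 880, amp: 0.4 }],
];
const demoArrays = toParallelArrays(demoFrames, 2);
console.log(demoArrays.freqs); // frame 1 keeps 440 and 880, frame 2 keeps 220 and 880
console.log(demoArrays.amps);
```

Note that a given column in these arrays is not tied to one frequency range: the loudest bin can sit at a completely different pitch from one frame to the next.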


:poop: The second method is using multiple long sound instructions to play the sine waves, which I thought would be better, but it sounds much worse: it’s “wobbly” and has a lot of extra sounds and noises:

I generated it with the code in commit 7cee335b09a4ecfd57332e53af4c4897c429f1e4. You can see that I generate the sound instructions according to the format specified in the docs.

I was wondering if anyone had any ideas on why the sound instruction method is much worse and if there are any ways to improve the quality. I’ve tried different frequencies of sine waves, which wasn’t much better. Maybe there are some limitations I don’t know about or ways I’m abusing the sound instruction API incorrectly. (honestly, there’s a good chance that there is a simple tweak and I’m just incompetent lol)

@richard or anyone else on the dev team, (sorry for the ping) if you have any ideas or thoughts, I would love to hear them! Especially on my abuse of the sound instruction API / format.

Note: I don’t envision this actually being used in games, as it probably takes up a significant amount (if not almost all) of the simulator’s processing power just to play the sine waves, which wouldn’t be very good in a game that also needs to process other stuff. It’s also just lower quality in general, and it would probably be better to use the Song API and MIDI conversion for games instead. (speaking of which, I should probably get to work on that some day… :thinking:) And this would probably be a PITA for moderation of games. I’m just trying to see what’s possible with MakeCode Arcade. :grin:


nice converter!

I hope this gets to a point where it can be used for sound effects in games (not background music obviously)


I was bored again


@UnsignedArduino neat project!

first off, the clipping noisy sounds are being caused by rapidly switching the amplitude. you’ll get better results if you smoothly transition from the previous volume to the next for each sample instead of just abruptly changing from one volume to the next (i.e. for each instruction, set the start volume to the end volume of the previous instruction)
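
As a rough sketch of that suggestion (the helper name `toVolumeRamps` and the starting volume of 0 are assumptions for illustration, not part of any MakeCode API), each instruction's start volume is simply the previous instruction's end volume:

```javascript
// Hypothetical helper: turn per-step target volumes into { start, end } ramps
// so each instruction begins at the volume where the previous one finished.
function toVolumeRamps(targetVolumes) {
    const ramps = [];
    let previous = 0; // assumption: the sound starts from silence
    for (const target of targetVolumes) {
        ramps.push({ start: previous, end: target });
        previous = target;
    }
    return ramps;
}

// Made-up target volumes on the 0..1023 scale.
const volumeRamps = toVolumeRamps([512, 1023, 256]);
console.log(volumeRamps);
// First ramp goes 0 -> 512, second 512 -> 1023, third 1023 -> 256.
```

With this, no instruction starts at a different volume than the one before it ended, so there is no abrupt step at the boundary.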

secondly, an interesting experiment might be to divide the sound into frequency bands instead of just taking the X highest amplitudes at each point in time.

what i mean by this is to basically divide the spectrum into buckets and then for each bucket choose the highest amplitude sample that is within that frequency range. this is how old school vocoders work. for example, here’s a classic piece of music tech:

each of those columns in that image represents a frequency in Hz. these numbers i believe were chosen to cover a pleasing range for the human ear, but you might experiment with different cutoff points for each of the buckets (or programmatically determine them based on the source audio).
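
If you want to try generating the cutoff points programmatically, one simple option is to space them evenly on a logarithmic scale. This is only a sketch: `logSpacedEdges` and its parameters are hypothetical names, and log spacing is one choice among many, not necessarily how the values in the image were derived.

```javascript
// Hypothetical helper: `count` band edges spaced evenly on a log scale between
// `lowHz` and `highHz`. Assumes lowHz > 0, highHz > lowHz and count >= 2.
function logSpacedEdges(lowHz, highHz, count) {
    const ratio = Math.pow(highHz / lowHz, 1 / (count - 1));
    const edges = [];
    for (let i = 0; i < count; i++) {
        edges.push(Math.round(lowHz * Math.pow(ratio, i)));
    }
    return edges;
}

// Seven edges from 100 Hz to 6400 Hz: each edge is double the previous one.
const octaveEdges = logSpacedEdges(100, 6400, 7);
console.log(octaveEdges); // 100, 200, 400, 800, 1600, 3200, 6400
```

More edges over the same range give narrower buckets, at the cost of more simultaneous sine waves to play.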

I don’t know if that will produce better results, but it’s worth a try!


Tried it, produced slightly better results but still the noise is there.

The bucket method seemed like a good idea (so I put it in a branch) but this result is…interesting.


THIS IS SO AWESOMEEE!!! I love custom audio in makecode, this has so many applications! I hope you can get it worked out!


I also tried generating the sound instructions at runtime by taking the list of frequencies and amplitudes and building a buffer in the program itself, but it sounds as bad as the Python-generated hex buffers. So maybe I’m messing something up with sound instructions.


@UnsignedArduino I actually did some experiments of my own last night and I found the frequency bucket method works really well; you just need to make sure that each instruction buffer is dedicated to a single bucket. the wobbliness is caused by big shifts in frequency between steps, so if you keep each “thread” of audio within a limited frequency range you get better results. I also don’t have the clipping noise issues in mine
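
To illustrate why pinning a thread to one bucket helps, here is a small sketch that measures the largest step-to-step frequency change in a thread. The function name `largestJump` and both sample threads are made up for illustration.

```javascript
// Largest step-to-step frequency change (in Hz) within one thread of samples.
function largestJump(thread) {
    let worst = 0;
    for (let i = 1; i < thread.length; i++) {
        worst = Math.max(worst, Math.abs(thread[i].freq - thread[i - 1].freq));
    }
    return worst;
}

// Hypothetical data: a thread that follows the loudest peak overall can hop
// between distant frequencies from one step to the next...
const loudestOverall = [{ freq: 180 }, { freq: 2900 }, { freq: 210 }];
// ...while a thread pinned to a 159-200 Hz bucket can only move within it.
const singleBucket = [{ freq: 180 }, { freq: 165 }, { freq: 195 }];

console.log(largestJump(loudestOverall)); // 2720
console.log(largestJump(singleBucket));   // 30
```

A bucketed thread can never jump by more than the width of its bucket, which bounds how far the pitch can move between consecutive instructions.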


to generate the fft data, I used sox, like so:

```
sox input.wav -n stat -freq &> file.dat
```

and here’s the script (in JavaScript):

```javascript
import * as fs from "fs";
import * as path from "path";

// Numeric format IDs used by setNumber below.
const NumberFormat = {
    Int8LE: 1,
    UInt8LE: 2,
    Int16LE: 3,
    UInt16LE: 4,
    Int32LE: 5,
    Int8BE: 6,
    UInt8BE: 7,
    Int16BE: 8,
    UInt16BE: 9,
    Int32BE: 10,
    UInt32LE: 11,
    UInt32BE: 12,
    Float32LE: 13,
    Float64LE: 14,
    Float32BE: 15,
    Float64BE: 16,
};

// Output of the sox command: lines of "<frequency> <amplitude>".
const source = path.resolve("file.dat");
const data = fs.readFileSync(source, "utf-8").split("\n");
// Length of the source audio in seconds (hard-coded for the clip used here).
const sourceLength = 39.594271;

// Upper edge (in Hz) of each bucket; bucket j covers [buckets[j - 1], buckets[j]).
const buckets = [
    50,
    159,
    200,
    252,
    317,
    400,
    504,
    635,
    800,
    1008,
    1270,
    1600,
    2016,
    2504,
    3200,
    4032,
    5080,
    7000,
    9000
]

const fft = [];      // one spectrum (array of { freq, amp }) per time slice
let spectrum = [];

let maxAmp = 0;      // loudest amplitude picked, used to normalize volumes

for (const line of data) {
    if (!line.trim()) continue;
    // Trim first so leading whitespace doesn't produce an empty first field.
    const [rawFreq, rawAmp] = line.trim().split(/\s+/);
    if (!rawFreq || !rawAmp) continue;

    const freq = parseFloat(rawFreq) * 2;
    const amp = parseFloat(rawAmp);

    // Skips non-numeric lines (e.g. any other text in the file).
    if (isNaN(freq) || isNaN(amp)) continue;

    // A frequency of 0 marks the start of the next spectrum.
    if (spectrum.length && freq === 0) {
        fft.push(spectrum);
        spectrum = [];
    }

    spectrum.push({
        freq, amp
    });
}
fft.push(spectrum);

// Duration of each spectrum in milliseconds.
const timePerSample = (sourceLength / fft.length) * 1000;

// One "thread" of sound instructions per bucket.
const numThreads = buckets.length;
const threads = [];

for (let i = 0; i < numThreads; i++) {
    threads.push([]);
}

for (let i = 0; i < fft.length; i++) {
    let prev = 0;
    for (let j = 0; j < buckets.length; j++) {
        // Fall back to a silent sample when no FFT bin lands in this bucket.
        const [ sample = { freq: buckets[j], amp: 0 } ] = pickHighest(fft[i], prev, buckets[j], 1);
        maxAmp = Math.max(sample.amp, maxAmp);
        threads[j].push(sample);
        prev = buckets[j];
    }
}

const bufs = threads.map(t => samplesToBuffer(t));

// The generated MakeCode program: one hex buffer per bucket, all queued together.
const out = `
namespace music {
    //% shim=music::queuePlayInstructions
    export function queuePlayInstructions(timeDelta: number, buf: Buffer) { }
}

const instructions = [
${bufs.map(b => `    hex\`${b}\`,`).join("\n")}
]

for (const thread of instructions) {
    music.queuePlayInstructions(100, thread);
}
`

console.log(out);

// Return the `num` loudest samples with min <= freq < max, quietest first.
// The result is shorter than `num` (possibly empty) if the range has too few samples.
function pickHighest(spectrum, min, max, num) {
    spectrum = spectrum.filter(s => s.freq >= min && s.freq < max);

    // Seed with the first `num` samples (slice avoids undefined entries).
    const out = spectrum.slice(0, num);
    out.sort((a, b) => a.amp - b.amp);

    for (let i = num; i < spectrum.length; i++) {
        const current = spectrum[i];

        // Replace the quietest kept sample when a louder one turns up.
        if (out[0].amp < current.amp) {
            out.shift();
            out.push(current);
            out.sort((a, b) => a.amp - b.amp);
        }
    }

    return out;
}

// Encode one thread as a hex string of 12-byte sound instructions.
function samplesToBuffer(samples) {
    const buf = new Uint8Array(samples.length * 12);

    let prevAmp = 0;
    for (let i = 0; i < samples.length; i++) {
        const { freq, amp } = samples[i];
        // Normalize to the 0..1023 volume range (guarding against all-silent input).
        const scaledAmp = maxAmp > 0 ? (amp / maxAmp) * 1023 : 0;
        // Ramp from the previous volume to this one to avoid abrupt volume changes.
        addNote(buf, i * 12, timePerSample, prevAmp, scaledAmp, 3, freq, freq)
        prevAmp = scaledAmp;
    }

    return toHex(buf);
}

function toHex(bytes) {
    let r = ""
    for (let i = 0; i < bytes.length; ++i)
        r += ("0" + bytes[i].toString(16)).slice(-2)
    return r
}

// Write one instruction: waveform (1 byte), padding (1 byte), then five
// little-endian 16-bit fields: start Hz, duration in ms, start volume,
// end volume, end Hz. Fractional values are truncated by setNumber.
function addNote(sndInstr, sndInstrPtr, ms, beg, end, soundWave, hz, endHz) {
    if (ms > 0) {
        setNumber(sndInstr, NumberFormat.UInt8LE, sndInstrPtr, soundWave)
        setNumber(sndInstr, NumberFormat.UInt8LE, sndInstrPtr + 1, 0)
        setNumber(sndInstr, NumberFormat.UInt16LE, sndInstrPtr + 2, hz)
        setNumber(sndInstr, NumberFormat.UInt16LE, sndInstrPtr + 4, ms)
        setNumber(sndInstr, NumberFormat.UInt16LE, sndInstrPtr + 6, beg)
        setNumber(sndInstr, NumberFormat.UInt16LE, sndInstrPtr + 8, end)
        setNumber(sndInstr, NumberFormat.UInt16LE, sndInstrPtr + 10, endHz);
    }
    return sndInstrPtr
}


// Size in bytes per format: negative means signed, times 10 means big-endian.
function fmtInfoCore(fmt) {
    switch (fmt) {
        case NumberFormat.Int8LE: return -1;
        case NumberFormat.UInt8LE: return 1;
        case NumberFormat.Int16LE: return -2;
        case NumberFormat.UInt16LE: return 2;
        case NumberFormat.Int32LE: return -4;
        case NumberFormat.UInt32LE: return 4;
        case NumberFormat.Int8BE: return -10;
        case NumberFormat.UInt8BE: return 10;
        case NumberFormat.Int16BE: return -20;
        case NumberFormat.UInt16BE: return 20;
        case NumberFormat.Int32BE: return -40;
        case NumberFormat.UInt32BE: return 40;

        case NumberFormat.Float32LE: return 4;
        case NumberFormat.Float32BE: return 40;
        case NumberFormat.Float64LE: return 8;
        case NumberFormat.Float64BE: return 80;
    }
}

// Unpack fmtInfoCore's encoded value into size, signedness and byte order.
function fmtInfo(fmt) {
    let size = fmtInfoCore(fmt)
    let signed = false
    if (size < 0) {
        signed = true
        size = -size
    }
    let swap = false
    if (size >= 10) {
        swap = true
        size /= 10
    }
    let isFloat = fmt >= NumberFormat.Float32LE
    return { size, signed, swap, isFloat }
}

// Write the number `r` into `buf` at `offset` using the given format.
function setNumber(buf, fmt, offset, r) {
    let inf = fmtInfo(fmt)
    if (inf.isFloat) {
        let arr = new Uint8Array(inf.size)
        if (inf.size == 4)
            new Float32Array(arr.buffer)[0] = r
        else
            new Float64Array(arr.buffer)[0] = r
        if (inf.swap)
            arr.reverse()
        for (let i = 0; i < inf.size; ++i) {
            buf[offset + i] = arr[i]
        }
        return
    }

    // Integers: emit the low byte first, reversing the position for big-endian.
    for (let i = 0; i < inf.size; ++i) {
        let off = !inf.swap ? offset + i : offset + inf.size - i - 1
        buf[off] = (r & 0xff)
        r >>= 8
    }
}
```
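
To sanity-check a generated buffer, it can help to decode the hex back into readable fields. The sketch below assumes the same 12-byte layout that `addNote` writes above; `decodeInstructions` and the field names are hypothetical and not part of MakeCode.

```javascript
// Decode a hex string of 12-byte sound instructions back into objects,
// mirroring the field offsets addNote writes (16-bit fields are little-endian).
function decodeInstructions(hex) {
    const bytes = new Uint8Array(hex.length / 2);
    for (let i = 0; i < bytes.length; i++) {
        bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
    }
    const view = new DataView(bytes.buffer);
    const decoded = [];
    for (let p = 0; p + 12 <= bytes.length; p += 12) {
        decoded.push({
            waveform: view.getUint8(p),
            startHz: view.getUint16(p + 2, true),
            ms: view.getUint16(p + 4, true),
            startVolume: view.getUint16(p + 6, true),
            endVolume: view.getUint16(p + 8, true),
            endHz: view.getUint16(p + 10, true),
        });
    }
    return decoded;
}

// One hand-built instruction: waveform 3, 440 Hz, 92 ms, volume 0 -> 1023.
const decodedDemo = decodeInstructions("0300b8015c000000ff03b801");
console.log(decodedDemo);
```

Scanning the decoded list for large differences between one instruction's `endVolume` and the next one's `startVolume`, or between consecutive frequencies, is a quick way to spot the abrupt changes discussed earlier in the thread.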

not posting the audio sample because it’s copyrighted and I’m too lazy to look up something public domain to use :stuck_out_tongue:


No one seems to be noticing that we can add voices to our games now
