What to do with large data files?

Wanted to get some opinions, particularly from the devs, on a rather ridiculous idea that I have.

I’ve been thinking about word games lately, and I found myself down a rabbit hole this weekend. I thought, “what about a generic word lookup extension?” So, I created a proof of concept:

The program works just fine in a browser. The data file contains the entire YAWL (“Yet Another Word List”) with some 200,000+ entries. The data file itself is about 5 MB on my file system. In contrast, the word list that I used for What’s My Word, with 2,000+ entries, clocks in at around 30 kB. This clearly is ridiculous, and I really can’t think of a reason why I would need the entire YAWL loaded into a project. But, for giggles … what if I did?

The strings are already in hex format, so is it better to place these into buffers somehow and access the data that way? Or does the compiler already know to place constants into the game file and make them accessible to the runtime? Is the game file limit still 512 kB for hardware, or has that been expanded?

Would love your thoughts.

For a word list, if you’re OK with some non-words being accepted as words (false positives), you could use a Bloom filter as a space-efficient alternative. It only supports membership checks: you can test whether a word is in the set, but you can’t list or retrieve the words that were stored. So it would work for Wordle’s list of valid guesses, but not for the list of target words.
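To make the idea concrete, here is a minimal TypeScript sketch of a Bloom filter. Everything in it is an assumption for illustration: the names (`bloomCreate`, `bloomAdd`, `bloomMightContain`), the simple polynomial hash, and the double-hashing trick are not part of any MakeCode API, and a real project would probably want a stronger hash function.

```typescript
// Hypothetical Bloom filter sketch; names and hash choice are illustrative only.
interface Bloom {
    bits: number[];     // one byte (0..255) per array entry
    numBits: number;    // total number of bits in the filter
    numHashes: number;  // how many bit positions each word sets
}

function bloomCreate(numBits: number, numHashes: number): Bloom {
    const bits: number[] = [];
    const numBytes = Math.ceil(numBits / 8);
    for (let i = 0; i < numBytes; i++) bits.push(0);
    return { bits: bits, numBits: numBits, numHashes: numHashes };
}

// Simple polynomial string hash, kept in the unsigned 32-bit range.
function polyHash(word: string, multiplier: number): number {
    let h = 0;
    for (let i = 0; i < word.length; i++) {
        h = (h * multiplier + word.charCodeAt(i)) >>> 0;
    }
    return h;
}

// Derive numHashes bit positions from two base hashes (double hashing).
function bloomPositions(f: Bloom, word: string): number[] {
    const h1 = polyHash(word, 31);
    const h2 = (polyHash(word, 131) | 1) >>> 0; // force odd so the step is never 0
    const out: number[] = [];
    for (let i = 0; i < f.numHashes; i++) {
        out.push((h1 + i * h2) % f.numBits);
    }
    return out;
}

function bloomAdd(f: Bloom, word: string): void {
    for (const pos of bloomPositions(f, word)) {
        f.bits[pos >> 3] |= 1 << (pos & 7);
    }
}

// false means "definitely not in the set"; true means "probably in the set".
function bloomMightContain(f: Bloom, word: string): boolean {
    for (const pos of bloomPositions(f, word)) {
        if ((f.bits[pos >> 3] & (1 << (pos & 7))) === 0) return false;
    }
    return true;
}

const demoFilter = bloomCreate(1024, 7);
const demoWords = ["apple", "crane", "slate"];
for (const w of demoWords) bloomAdd(demoFilter, w);
```

Every word that was added will always test as present. A word that was never added usually tests as absent, but can occasionally come back as present, which is the false-positive trade-off described above. In practice the filter would be built offline and only the resulting bytes shipped with the game.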

According to an online calculator, a 2,000-word list with a 0.1% false positive rate would need about 3.51 KiB. For 10,000 words with 0.01% false positives, it would be 23.4 KiB.
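Those figures can be reproduced with the standard Bloom filter sizing formulas, m = -n·ln(p) / (ln 2)² bits and k = (m/n)·ln 2 hash functions. The sketch below is only a back-of-the-envelope check; the function names are made up for this example, and the online calculator may round slightly differently.

```typescript
// Hypothetical helpers for estimating Bloom filter size; names are illustrative.

// Optimal number of bits for n items at false-positive rate p.
function bloomSizeBits(n: number, p: number): number {
    return Math.ceil(-n * Math.log(p) / (Math.LN2 * Math.LN2));
}

// Optimal number of hash functions for n items stored in the given number of bits.
function bloomHashCount(n: number, bits: number): number {
    return Math.max(1, Math.round((bits / n) * Math.LN2));
}

// Convert a bit count to KiB (1 KiB = 1024 bytes).
function bitsToKiB(bits: number): number {
    return bits / 8 / 1024;
}

const smallListKiB = bitsToKiB(bloomSizeBits(2000, 0.001));   // about 3.51
const largeListKiB = bitsToKiB(bloomSizeBits(10000, 0.0001)); // about 23.4
```

Plugging in 200,000 words at a 0.1% false positive rate gives roughly 351 KiB by the same formula, so the approach shrinks the full list considerably, but it is still only usable for validating guesses.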

Well, I suppose this answers one of my questions. :laughing:

[image]

@kwx TIL! this is very cool

Also @AlexK, no need to pack strings into buffers! The compiler does indeed include strings in a space-efficient way.

Maybe compress it and store in base85?

I’ve recently been working on a project that uses a word list too.
It’s a word-reciting game. The first vocabulary list has about 1,500 words, each with two fields (spelling and meaning), stored in a 2D string array.
When I downloaded it to the device (Meowbit), I got a compile error. After I commented out the last 600+ words, leaving about 900, it works, but it still hits a 021 error quite often, especially when the USB cable is connected.
(The only extensions imported are Arcade Text / Sprite Text.)
