Game Programming :: The one Stack Overflow question I keep coming back to :: Author celes

2026-03-07T12:21:15+00:00

There's this Stack Overflow question I keep getting back to

https://stackoverflow.com/questions/172 ... babilities

Some of you may recognize the poster (it's a me!). Feel free to lurk, there's not much but there was a time when StackOverflow was good, and I think a naiver younger version of me asked some good questions.

Anyway, probabilities. There's a reason I keep coming back to this question: It shows up all the time in gamedev. For those of you who don't follow StackOverflow links (wouldn't blame you), or in case the link is no longer accessible: Here's the original question:

I have an array of N elements (representing the N letters of a given alphabet), and each cell of the array holds an integer value, that integer value meaning the number of occurrences in a given text of that letter. Now I want to randomly choose a letter from all of the letters in the alphabet, based on his number of appearances with the given constraints:

If the letter has a positive (nonzero) value, then it can be always chosen by the algorithm (with a bigger or smaller probability, of course).

If a letter A has a higher value than a letter B, then it has to be more likely to be chosen by the algorithm.

Now, taking that into account, I've come up with a simple algorithm that might do the job, but I was just wondering if there was a better thing to do. This seems to be quite fundamental, and I think there might be more clever things to do in order to accomplish this more efficiently. This is the algorithm i thought:

Add up all the frequencies in the array. Store it in SUM

Choosing up a random value from 0 to SUM. Store it in RAN

While RAN > 0, Starting from the first, visit each cell in the array (in order), and subtract the value of that cell from RAN

The last visited cell is the chosen one

So, is there a better thing to do than this? Am I missing something?

I'm aware most modern computers can compute this so fast I won't even notice if my algorithm is inefficient, so this is more of a theoretical question rather than a practical one.

I prefer an explained algorithm rather than just code for an answer, but If you're more comfortable providing your answer in code, I have no problem with that.

Let's take a moment to appreciate a time where StackOverflow was filled with good, insightful questions (such as mine ) and answers instead of burned-out mods closing questions they don't understand as duplicate of other questions they never bothered checking.

I don't even know what the original use case for this was, but if I had to take a guess, I was messing with the dark arts, trying to implement a markov chain generator. Either way I kept coming back to this problem and many times it has to do with designing gameplay systems, like having a boss pick its next attack, or picking where I'll spawn my next enemy where some areas are "hotter" than others, which was today's problem, actually!

Let's take a look at the accepted solution

The idea:

Iterate through all the elements and set the value of each element as the cumulative frequency thus far.

Generate a random number between 1 and the sum of all frequencies

Do a binary search on the values for this number (finding the first value greater than or equal to the number).

Example:
Code: Select all
Element    A B C D
Frequency  1 4 3 2
Cumulative 1 5 8 10
Generate a random number in the range 1-10 (1+4+3+2 = 10, the same as the last value in the cumulative list), do a binary search, which will return values as follows:
Code: Select all
Number   Element returned
1        A
2        B
3        B
4        B
5        B
6        C
7        C
8        C
9        D
10       D

Now, I must admit, even today, this answer flies slightly over my head. I wouldn't blame you if putting that binary search in there makes you feel uneasy, especially when you still have to iterate through all the elements so the complexity of the whole thing would be linear anyway. I have implemented this version at least once in the past and it seems to work, and things like binary searches tickled my past self's brain in the right way, so I marked it as accepted.

But these days, the answer I've converged towards is actually the algorhm I initially suggested in my question, not the answer. I've implemented it many times across many programming languages. It's good enough, and I don't think we're getting anything from the binary search over a linear scan for the small lists I typically deal with.

This was the latest iteration of it, I wrote it today as a utility function because there were at least three inline instances of it in Iris Blade and while I appreciate ths fun kata I keep coming back to, I think it's time to encapsulate it in a tiny utility function:

image.png

So anyway, this is the StackOverflow *question*, not answer, I keep coming back to ^^

Attachments

image.png (89.86 KiB)

Game Programming :: Re: The one Stack Overflow question I keep coming back to :: Reply by Sugui

2026-03-07T16:14:32+00:00

Hmm, for what I understood, the version with the binary search could be more efficient in the case you need to get multiple random items from the same set of weighted items already calculated (where you already calculated the frequencies and cumulatives and such), so this last part could be O(log(n)), instead of being O(n/2) (as average).

But I always think that, unless optimization is crucial, it's always better a code that is more easy to understand than another one that is 0.01% more efficient and requires more effort to read. You can always return to the answer if this somewhat becomes a bottleneck (a very large map, an aggresive spawn of enemies...).

Nice post, never thought I would find you in a StackOverflow question.

Game Programming :: Re: The one Stack Overflow question I keep coming back to :: Reply by celes

2026-03-07T20:57:04+00:00

Yup! That makes sense, I didn't consider you can reuse the same accumulated value, this would make sense to speed up when you have the same table and want to do many lookups, and even if it doesn't do much for my tiny lists, for something like a markov chain based on n-grams I guess it would speed up things quite a bit

Carrot Games

Game Programming :: The one Stack Overflow question I keep coming back to :: Author celes

Game Programming :: Re: The one Stack Overflow question I keep coming back to :: Reply by Sugui

Game Programming :: Re: The one Stack Overflow question I keep coming back to :: Reply by celes