Let's take a moment to appreciate a time when StackOverflow was filled with good, insightful questions (such as mine):
I have an array of N elements (representing the N letters of a given alphabet), and each cell of the array holds an integer value: the number of occurrences of that letter in a given text. Now I want to randomly choose a letter from the alphabet, based on its number of appearances, with the following constraints:
- If the letter has a positive (nonzero) value, then it can always be chosen by the algorithm (with a bigger or smaller probability, of course).
- If a letter A has a higher value than a letter B, then it has to be more likely to be chosen by the algorithm.
Now, taking that into account, I've come up with a simple algorithm that might do the job, but I was just wondering if there was a better thing to do. This seems to be quite fundamental, and I think there might be more clever ways to accomplish this more efficiently. This is the algorithm I thought of:
- Add up all the frequencies in the array. Store the result in SUM.
- Choose a random value from 0 to SUM. Store it in RAN.
- Starting from the first cell, while RAN > 0, visit each cell in the array (in order) and subtract the value of that cell from RAN.
- The last visited cell is the chosen one.
So, is there a better thing to do than this? Am I missing something?
I'm aware most modern computers can compute this so fast I won't even notice if my algorithm is inefficient, so this is more of a theoretical question than a practical one.
I prefer an explained algorithm rather than just code for an answer, but if you're more comfortable providing your answer in code, I have no problem with that.
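For reference, the linear scan described in the question could be sketched roughly like this in Python. The function name `pick_linear` and the `rng` parameter are made up for illustration. One deliberate deviation, which is my assumption and not part of the original question: the random value is drawn from 1 to SUM rather than 0 to SUM, because a draw of 0 would skip the loop entirely and select nothing.

```python
import random


def pick_linear(freqs, rng=random):
    """Return an index chosen with probability proportional to freqs[index]."""
    total = sum(freqs)  # SUM in the question
    if total <= 0:
        raise ValueError("at least one frequency must be positive")
    # Assumption: draw from 1..SUM (inclusive) instead of 0..SUM,
    # so that every draw lands on exactly one cell.
    ran = rng.randint(1, total)  # RAN in the question
    for index, count in enumerate(freqs):
        ran -= count
        if ran <= 0:
            # This is the "last visited cell".
            return index
    raise AssertionError("unreachable when total > 0")


# Example: frequencies for a hypothetical 5-letter alphabet.
letter_index = pick_linear([1, 4, 0, 3, 2])
```

A cell holding 0 subtracts nothing, so it can never be the cell that brings the running value down to zero, which satisfies the first constraint.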
Now, I must admit, even today, this answer flies slightly over my head. I wouldn't blame you if putting that binary search in there makes you feel uneasy, especially when you still have to iterate through all the elements, so the complexity of the whole thing would be linear anyway. I have implemented this version at least once in the past and it seems to work, and things like binary searches tickled my past self's brain in the right way, so I marked it as accepted.
The idea:
- Iterate through all the elements and set the value of each element to the cumulative frequency thus far.
- Generate a random number between 1 and the sum of all frequencies.
- Do a binary search on the values for this number (finding the first value greater than or equal to the number).
Example:
Element     A  B  C  D
Frequency   1  4  3  2
Cumulative  1  5  8  10
Generate a random number in the range 1-10 (1+4+3+2 = 10, the same as the last value in the cumulative list), then do a binary search, which will return values as follows:
Number  Element returned
1       A
2       B
3       B
4       B
5       B
6       C
7       C
8       C
9       D
10      D
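The accepted answer's idea can be sketched in Python with the standard library's `itertools.accumulate` and `bisect.bisect_left`. The helper names `build_cumulative`, `pick_at` and `pick_random` are hypothetical, chosen only for this illustration. This is a sketch of the idea, not the answerer's own code.

```python
import bisect
import itertools
import random


def build_cumulative(freqs):
    """Step 1: replace each frequency with the running total so far."""
    return list(itertools.accumulate(freqs))


def pick_at(cumulative, number):
    """Step 3: binary search for the first value >= number."""
    return bisect.bisect_left(cumulative, number)


def pick_random(cumulative, rng=random):
    """Steps 2 and 3: draw a number in 1..sum, then look it up."""
    return pick_at(cumulative, rng.randint(1, cumulative[-1]))


letters = "ABCD"
cumulative = build_cumulative([1, 4, 3, 2])

# Reproduce the lookup table from the example for the numbers 1 through 10.
table = "".join(letters[pick_at(cumulative, n)] for n in range(1, 11))
print(cumulative)  # [1, 5, 8, 10]
print(table)       # ABBBBCCCDD
```

Building the cumulative list is still linear, but it only has to happen once. Each later draw against the same list costs only the binary search, which is where this version pays off when many letters are picked from the same frequencies.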