I had been looking back at this other thread where a number of generators were discussed, and I was particularly impressed with the ones Arlet posted, as they were short and simple in structure but seemed to give good results. This evening I installed PractRand and had a go at making something based on Arlet's method, and I have ended up with this one, which seems to do a very good job:
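CLC             ; start with carry clear
LDA #$0081      ; constant increment (16-bit accumulator assumed)
ADC state+0     ; state+0 += $0081
STA state+0
ADC state+2     ; the carry from each add feeds the next one
STA state+2
ADC state+4
STA state+4     ; end of the three-word base sequence
ADC state+6     ; output filter: two more words
STA state+6
ADC state+8
STA state+8
ADC state+6     ; feedback: add state+6 and state+4 back in
ADC state+4
STA state+6     ; A holds the value that is also stored to state+6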
So I think this RNG should be quite solid, at least for use cases that match the target platform! 128GB is way beyond the target I set myself at the start of the exercise. Given that it costs about 60 cycles to calculate the next 16-bit value, it would take a 20MHz computer over a day of continuous computation to get beyond 64GB of results.
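The timing estimate above can be checked with a few lines of arithmetic. This is only a sketch: it assumes "GB" means 2**30 bytes (a decimal gigabyte gives a slightly smaller figure) and takes the quoted 60 cycles per value at face value. The names are made up for illustration.

```python
BYTES_PER_VALUE = 2        # one 16-bit result per call
CYCLES_PER_VALUE = 60      # approximate cost quoted in the post
CLOCK_HZ = 20_000_000      # 20MHz
GIB = 2 ** 30              # assumption: GB counted as 2**30 bytes


def seconds_to_generate(total_bytes):
    """Seconds of continuous computation needed to produce total_bytes of output."""
    values = total_bytes // BYTES_PER_VALUE
    return values * CYCLES_PER_VALUE / CLOCK_HZ


days = seconds_to_generate(64 * GIB) / 86_400
print(f"{days:.2f} days")  # prints "1.19 days"
```

So 64GB works out to a little over a day, in line with the estimate.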
The logic behind this structure - taken from what Arlet was doing in the other thread - is that the first three add/store pairs (state+0, state+2 and state+4) form a strict sequence that ensures those three words cycle through every possible combination before repeating. On top of that, two more words (state+6 and state+8) are used in a similar arrangement, but this time with feedback to mix the results further - these form a kind of "output filter" on top of the underlying sequence. This is the part you experiment with to find combinations that lead to good results.
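For experimenting off the target, the structure described above can be modelled in a few lines of Python. This is a sketch, not the code that was actually fed to PractRand. It assumes binary (non-decimal) ADC with a 16-bit accumulator, and the names `step` and `base_period` are made up for illustration. The `bits` parameter is an extra, so that the "every possible combination" property of the first three words can be checked by brute force at a reduced word size. At the real 16-bit width that would mean 2**48 steps, so it is not attempted here.

```python
def step(state, bits=16, inc=0x81):
    """Advance the five-word state in place and return the new output word."""
    mask = (1 << bits) - 1
    a, carry = inc, 0                  # CLC / LDA #$0081

    def adc(value):
        # Add with carry, as ADC does in binary mode.
        nonlocal a, carry
        total = a + value + carry
        a, carry = total & mask, total >> bits

    for i in range(5):                 # ADC/STA pairs for state+0 .. state+8
        adc(state[i])
        state[i] = a
    adc(state[3])                      # ADC state+6
    adc(state[2])                      # ADC state+4
    state[3] = a                       # STA state+6
    return a


def base_period(bits, inc):
    """Count steps until the first three words return to all zero."""
    state = [0] * 5
    count = 0
    while True:
        step(state, bits, inc)
        count += 1
        if state[:3] == [0, 0, 0]:
            return count


state = [0] * 5
print([hex(step(state)) for _ in range(2)])   # ['0x183', '0x993'] from a zero seed
print(base_period(bits=4, inc=1))             # 4096, i.e. 16**3
```

With 4-bit words and an increment of 1 (both chosen only to keep the search small), the three base words take 16**3 = 4096 steps to repeat, so they do visit every combination at that size.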
I'm initialising the state vector to zero - I don't think the specific initial conditions will matter, so it should be fine to use any source you like to seed it.
I would highlight that I'm not an expert in this, I'm just following the patterns that Arlet established, and happy to see that it seems to lead to good results!
I will have a look at generating 32 bits as well. You could just pull one of the other state vector entries for the extra bits, but they might not be random enough, so it's likely to work better with a few more stages of output filtering and at least one extra state vector entry. As Arlet noted, a big advantage of generating more at a time is efficiency: a few more adds can yield a disproportionately large number of extra random bits, reducing the overall cost per bit. Even if you don't need wider numbers, you can split them into several narrower numbers, so that you don't need to perform the main calculation as often.
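The idea of getting several narrower numbers from one wider result could look like the sketch below. `byte_stream` and `fake_next_value` are hypothetical names. The stub stands in for a 32-bit generator that does not exist yet, and nothing here says whether the individual bytes would be random enough.

```python
def byte_stream(next_value, width_bits=32):
    """Yield 8-bit values, calling next_value() only when the last result is used up."""
    while True:
        value = next_value()
        for shift in range(0, width_bits, 8):
            yield (value >> shift) & 0xFF      # low byte first


calls = []

def fake_next_value():
    # Stand-in for the real generator: records each call, returns a fixed pattern.
    calls.append(1)
    return 0x12345678


stream = byte_stream(fake_next_value)
first_eight = [next(stream) for _ in range(8)]
print(first_eight, len(calls))   # [120, 86, 52, 18, 120, 86, 52, 18] 2
```

Eight bytes cost only two runs of the main calculation here, which is the saving described above.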