How good (or bad) is the random number generator in WB?

Started by snowsnowsnow, October 21, 2020, 06:52:43 AM

Previous topic - Next topic

snowsnowsnow

In this thread, I'm looking for the following:

1) An authoritative answer as to how good it is.
2) An alternative way to get good random numbers from one run to the next of a program.

I've got a program that plays a randomized sequence of videos and I run it every night (think of it as being like a screensaver that displays pictures in a random order - more about this later).  There is a very large collection of videos to choose from, but, surprise!  surprise! it seems certain videos come up, shall we say, more frequently than they ought to.  In fact, the user is convinced that it isn't random at all.  This isn't really true - it doesn't, of course, generate exactly the same sequence every time - it is different each time - but, as noted, it is true that certain videos seem to come up a lot.

Now, we all know about how bad random number generators have historically been.  We all know how somebody, probably back in the 50s or 60s wrote a "reference implementation" for what a random number generator might look like - never intending for it to be actually used - but somehow, that reference implementation has found its way into many language development ecosystems.

Anyway, I know about IntControl(81) - but I'm not sure a) if it would make things any better and b) Exactly what value I should pass to it.  Suggestions are welcome.

Note, BTW, regarding screensavers.  Note that this problem isn't limited to WB.  One of our systems is running a Windows screensaver that displays random pictures, and it is running in a directory with about 3000 files (literally), yet the same 4 or 5 pictures seems to show up with amazing frequency...

They're probably using that same "reference implementation" for their random numbers...


td

Reality and perception of reality are two different things. If it is that important, you might consider developing a metric to determine if the perception of bias in the file selection is actual bias.  The WIL Random function is a pseudo-random number generator. That is, given the same seed value, it produces a predictable pattern. Of course, the way to prevent a repeating pattern is to use IntControl 81 to change the seed.  The result will not follow a predictable pattern but isn't cryptography grade either.

You could use the dotNet "System.Security.Cryptography" class with CLR hosting to get results approaching the requirements of cryptography but that would necessitate an upgrade to a newer version of WinBatch. You could also consider the Windows CNG crypto random number generator. It would require a few DllCalls to implement.

https://docs.microsoft.com/en-us/windows/win32/api/bcrypt/nf-bcrypt-bcryptgenrandom
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

kdmoyers

<<given the same seed value, it produces a predictable pattern>>
This is a great feature!  Very useful.
-Kirby
The mind is everything; What you think, you become.

snowsnowsnow

Well, one of the questions I have is:

Assuming you don't use IC(81), it is true that different runs of a program will generate different sequences of numbers.  This implies that (as is common with Rand() implementations of various programming ecosystems) the generator is seeded with some sort of random starting point automatically.  How, exactly, is this done in WB?  What is used to do this seeding?

And, this, then implies, the next question, which is, would I get better results if I use IC(81)?  Better than I get by default, that is.  And, if so, what should I use as the seed value?

I have considered using GetTickCount() as the seed value (for IC(81)).  Would that work well?

td

Like many things in computer science there are a known set of algorithms to perform specific tasks. These algorithms have a basis in mathematical proofs or other scholarly disciplines like linguistics. Creating pseudo random numbers is no exception.  WIL uses a proprietor variation on a common standard algorithm. It is not intended to be at the level required for encryption or other advanced tasks.  Basically, if you perform statistical analysis on the output of the Random function you can find a pattern.   The answer to your question is that you will have to decide the "good (or bad)" for yourself because it depends on how random your random needs to be.   A Web search may help you find the how-to of analyzing output. 

The documentation for IntControl 81 is fairly clear about what the p1 values use for seeding. Again, which one if any you use is dependent on how satisfied you are with the output of the Random function.  We do not keep a set of statistics on how all possible seeds perform.

And as mentioned earlier there are alternatives available for WIL scripts.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

stanl

Quote from: td on October 22, 2020, 08:58:15 AM
And as mentioned earlier there are alternatives available for WIL scripts.


Yeah. Like stripping decimals from a Guid.

td

The random number generator used by WIL is a modified version of something similar to this:

https://en.wikipedia.org/wiki/Linear_congruential_generator
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

snowsnowsnow

OK, I'm good with everything posted so far.

But the question I still have is: If I don't use IC(81), what does it use to seed the RNG on each run of the program?

I.e., and just to be clear, I'm not so much concerned with the "goodness" of the sequence of numbers generated within a single run (I trust that that is fine), but, rather, with the randomness across multiple runs of the program.  I.e., it seems like we get similar (but not identical) numbers in successive program runs.

td

Why not just experiment a bit with IntControl 81 until you find some combination you like. For example, you could consider "persisting" (MSFT speak) your last seed and using it to create your next seed via some bit mushing with the current time or something.  But keep in mind that pseudo-random number generators are far from perfectly random.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

stanl

Quote from: snowsnowsnow on October 22, 2020, 10:29:06 PM
  I.e., it seems like we get similar (but not identical) numbers in successive program runs.


You have a valid point: I iterated the code below and it appeared each appearance of i was often in the same realm


Code (WINBATCH) Select


a=""
For i = 1 to 10
   b = (random(100000)+1) / (random(20)+1)
   a = a:@LF:b
Next


Message("",a)





[EDIT]:
By iterate, I meant I ran the loop to fill a 100,000 row access 2019 table, then looked for when the position of the loop was in proximate range.  I found that positions 1 and 5 would often have a close proximity with successive runs, i.e. position 1 might be 4376 on first run then 4122 on second; position 5 would be 11252 on first run and 11288 on second. Nothing statistically significant. But then changing random(20) to random(10) and proximate values no longer found.

td

This quick and dirty example with almost no testing illustrates one way to get a seed for a random number generator that may introduce some entropy into the process of creating pseudo-random numbers.  I haven't tested this approach to determine if it is actually effective in producing results that give the appearance of not possessing a predictable pattern.   Mostly it is just an example of how to decipher Win 32 function prototypes for DllCalls.     

Code (winbatch) Select
#DefineFunction GetRngSeed()
   nSeed = 0
   hCrypt = DllLoad("Bcrypt.dll")
   hBuf = 0
   nAlgo = 0

   ; Get a provider.
   hAlgo = BinaryAlloc(4)
   strAlgo = "RNG" ; Should be OK on most systems. Can change to any supported provider.
   nStatus = DllCall(hCrypt, long:'BCryptOpenAlgorithmProvider', lpbinary:hAlgo, lpwstr:strAlgo, lpnull, long:0)
   if nStatus then goto BcryptError
   BinaryEodSet(hAlgo, 4)
   
   ; Get a random number to use as a seed.
   nAlgo = BinaryPeek4(hAlgo, 0)
   nSize = 4
   hBuf = BinaryAlloc(nSize)
   dwFlag = 0 ; Could try 2 here.
   nStatus = DllCall(hCrypt, long:'BCryptGenRandom', long:nAlgo, lpbinary:hBuf, long:nSize, long:dwFlag)
   if nStatus then goto BcryptError
   
   nStatus = DllCall(hCrypt, long:'BCryptCloseAlgorithmProvider', long:nAlgo, long:0)
   nAlgo = 0
   if nStatus then goto BcryptError
   
   BinaryEodSet(hBuf, 4)
   nSeed = BinaryPeek4(hBuf, 0)
   BinaryFree(hBuf)
   BinaryFree(hAlgo)
   DllFree(hCrypt)
   Pause('Random Seed', nSeed)
   
   return nSeed

:BcryptError
   if nAlgo then DllCall(hCrypt, long:'BCryptCloseAlgorithmProvider', long:nAlgo, long:0)
   if hBuf then BinaryFree(hBuf)
   if hAglo then BinaryFree(hAlgo)
   DllFree(hCrypt)
   Pause( 'Crypto Error', 'Returned status = ':nStatus)

   return nSeed
#EndFunction
   
; Example of usage
nSeed = GetRngSeed()
IntControl(81, nSeed, 0 , 0, 0)
nRng = Random(3000)

exit
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

td

A simple RNG to maybe play around with.

Code (winbatch) Select

#DefineFunction CryptoRandom(_max)
   hBuf = BinaryAlloc(4)
   DllCall("Bcrypt.dll", long:'BCryptGenRandom', lpnull, lpbinary:hBuf, long:4, long:2)
   BinaryEodSet(hBuf, 4)
   nReturn = ((BinaryPeek4(hBuf, 0) * 214013 + 2531011) >> 16) & 32767
   BinaryFree(hBuf)
   Return nReturn mod (_max+1)
#EndFunction

;Example of usage
nRnum = CryptoRandom(3000)


[edit] Added missing call to BinaryFree...
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

td

As a follow-up calculated the Chi-square on 10000 calls to the WIL Random function with the range number range set to 99.  The results show support for the null hypothesis for the function. In other words, the function's output is fairly random for a pseudo-random number generator.

Since Snow++'s question had was about repeating number patterns between usages, I counted the pattern matches from ten different runs where each run called the function 1000 with a range of 0 to 999.  Compared each of the 10 runs with each other and found between 5 and 12 cell matches total.  So it doesn't appear that pattern repitition is much of a problem given this test's constrants.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

snowsnowsnow

Thank you very much - you did a lot of work/research for this.

I will get back to you once I've had a chance to digest (i.e., test) all of this.

In the meantime, I have gone a different route with my underlying problem - and I am hoping my user is happier with the results the new method produces.

td

Quote from: snowsnowsnow on October 29, 2020, 05:32:13 PM
Thank you very much - you did a lot of work/research for this.

To be honest your question intersected with a fascination with statistics and an interest in random number generators. The interest in random number generators stems from an argument with a Russian programmer that occurred many years ago. Russian programmers tend to be better mathematicians than most.... Also, we use Random number generators a lot in WinBatch function performance testing so we have a lot of scripts with various iterations of an RNG sitting around.
"No one who sees a peregrine falcon fly can ever forget the beauty and thrill of that flight."
  - Dr. Tom Cade

snowsnowsnow