CMWC4096 - a new random number generator

General FreeBASIC programming questions.
Post Reply
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

CMWC4096 - a new random number generator

Post by deltarho[1859] »

Bug found - I'll be back
Last edited by deltarho[1859] on Jun 15, 2018 14:22, edited 2 times in total.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

I'm back <smile>

When we reference a variable in asm we have to use Dword Ptr like this:

Code: Select all

mov Dword Ptr [TestVar], 123
I ported some code from PowerBASIC and did not know that we have to do the same when referencing function parameters. I was using WinFBE and if any code uses asm then we should always open 'View Output Window' after every compilation otherwise we will not know if anything untoward has happened. I am seriously considering using poseidonFB if asm is used.

Anyway, where was I? <laugh>

I have ben meaning to do this for sometime but there was always something else going on.

The following is cross platform so you can drop it into either a Windows application or a Linux application.

A rule of thumb is that if we want 2^n random numbers in an application session then we should use a generator with a period of at least 2^2n. PowerBASIC's RND has a period of 2^32 so the rule of thumb will break if we wanted more than 65,536 random numbers. Doing nothing other than generate random numbers my machine will see PB's RND start repeating it's sequence in less than one minute. Having said that PB's RND will be fine for many applications. For other applications a period in excess of 2^32 will be needed.

If we jump straight into a period of 2^64 then we are in a very different environment given that 2^64 = 2^32 * 2^32.

Sebastiano Vigna wrote: "In other words: any generator with a period beyond 2^256 has a period that is sufficient for every imaginable application." He wrote similarly, but not quite as strong, with regard his 2^128 generators before moving up to his newer 2^256 generators.

However, there are some who require much larger periods and have been using the Mersenne Twister (MT) with a period of 2^19937-1. MT was introduced in 1997 and it had a much better quality random numbers than a Linear Congruential Generator (LCG) which was introduced in 1951. It has been found that there is a slight bias in the Mersenne Twister but it takes a while to manifest itself. At 256GB it fails a PractRand test, one of the modern random number testing suites.

The algorithm presented here uses what is called a lag-array where we move through the array to the end and start over. This provides the means to achieve very large periods. Lag~4096 is used, with 4096 * Ulongs, giving a period of 2^131086 which dwarves MT's period of 2^19937-1. In decimal terms that is greater than 10^39460. If you are into large periods this will give you some serious bragging rights. <smile> The algorithm is Complimentary Multiply With Carry (CMWC) devised by George Marsaglia. There were issues with MWC and addressed by Marsaglia himself. The Wikipedia entry here gives a good background. Marsaglia was obsessed with a generator's period and put out a challenge, not long before he died in 2011, for a lag~8192 or a lag~16384. I never did find out why but I suspect that he figured as the period tends to infinity the generator tends toward a cryptographic generator. That is the best that I can come up with.

For me, a generator is 'top drawer' if it manages to get to the 1TB test with PractRand and the author of PractRand says as much. Both my CryptoRndII and PCG32II do just that and now CMWC4096 has done. Even Vigna's xorshift128+ and xoroshiro128+ fail PractRand at 64GB. His later revisions pass and, it seems, his new 256 bit generators pass as well. In fact, I went out for the day and when I got back the tests had gone beyond 2TB. Here are the last three tests:

Code: Select all

rng=RNG_stdin32, seed=0x3d92e4fc
length= 512 gigabytes (2^39 bytes), time= 5316 seconds
  no anomalies in 213 test result(s)
 
rng=RNG_stdin32, seed=0x3d92e4fc
length= 1 terabyte (2^40 bytes), time= 10611 seconds
  no anomalies in 220 test result(s)
 
rng=RNG_stdin32, seed=0x3d92e4fc
length= 2 terabytes (2^41 bytes), time= 21278 seconds
  no anomalies in 229 test result(s)
It was a long test, taking over 11.8 hours, and, as you can see, the 2TB itself took over 5.9 hours.

There was only five unusual anomalies in total, at PractRand's bottom scale, and there is nothing unusual about that for any generator.

So, CMWC4096 is 'top drawer'.

I am no longer testing for speed with a simple loop to determine a MHz output. My RndMT is faster than FB's generator #3 when using a simple loop test but in 'real world' tests they are reversed. I have a graphic test which also has RndMT being slower. The difference is, however, not significant. I suspect that with simple loops gcc optimizing produces code which does not reflect the BASIC code being compiled. RndMT is a MT but gives access to the whole period and not just a tiny 2^32 portion of it.

What I am now using is the core statements of a Monte Carlo test.
x = RND : y = RND : z = Sqr( x*x + y*y)
and putting that in a loop.

Instead of timing a For/Next construct the tests are given 20 seconds and the number of loops achieved are noted. This is along the lines of a suggestion made by coderJeff in another thread.

Here is a comparison of the FB's generators and some of mine. The values are the number of loops in MHz. Several runs were undertaken on Windows but only one shown, the results were found to be very stable.

Code: Select all

2          409.87
PCG32II    409.82
CMWC4096   406.58
4          401.39
CryptoRnd  400.96
1          384.20
3          376.93
RndMT      366.60
5            1.58
My interest in this thread is how CMWC4096 compares with FB's #3 and my RndMT.

If your reason for using MT is it's large period then, obviously, CMWC4096's period is a plus. CMWC4096 qualifies as 'top drawer' but, as mentioned above, MT fails a PractRand test at 256GB. CMWC4096 is much faster than MT.

So, I propose that CMWC4096 be used as a direct replacement for MT.

As a side note it is worth looking at the table above in general.

FB's #2 generator is the fastest using a LCG and is the first one at the Wikipedia entry here. However, it's speed is almost identical to PCG32II and is decidedly not 'top drawer'. If it's speed was noticaby faster than PCG32II and 'top drawer' was not an issue then it would be a contender as it is easily used by simply declaring it with 'Randomize , 2'.

FB's #4 is fast but has a "low degree of randomness" according to the docs and since it is slower than PCG32II and CMWC4096, both 'top drawer, I see little need to use it even though it is easily added to our code. FB's #1 'the C runtime library's rand() function' is not held in much esteem by anyone that I can find and since it is not that fast should not, in my opinion, be given the time of day. FB's #5 in Windows was designed to fill buffers and not be called one number at a time.

This leaves us with CryptoRnd and PCG32II. If 'top drawer' is an issue and a large period is not a requirement then PCG32II is the best of the bunch. It is more compact than CryptoRnd, is not platform dependent and is thread safe.

Usage

At the moment CMWC4096 returns Ulongs, Double and Range using CMWCdw, CMWCse and CMWCRange respectively.

The 'se' in CMWCse means single extended and is terminology that I have used previously to indicate that although the generator works on 32 bits it outputs Doubles so that the 32 bit granularity is maintained. This is exactly what the FB generators do.

There is a facility to repeat a sequence. With RndMT we supply a string which is then subject to hashing and gives us access to the whole period in the same way as random seeding does. However, the hashing is via Microsoft's APIs leaving Linux users to write their own code for that job. With CMWC4096 I wanted to keep the code cross platform. The question is do we need, in this case, to be able to access the whole period. I would argue that we do not and if we restricted the entry points for repeat sequencing to 2^32 then we can use Rnd; which is cross platform.

For a random sequence we use RandomizeCMWC( "" ); the same method that RndMT uses.

Originally I used

Code: Select all

Declare Function MyRandomInt Lib "Advapi32.dll" Alias "SystemFunction036" _
 ( RandomBuffer As Any Ptr, RandomBufferLength As ULong ) As Byte
for randomly seeding the lag array and carry. However, Linux users would need to supply an equivalent. I refuse point blank to use a PRNG for this so use FB's generator #5 which is a CPRNG, cryptographic, and cross platform. There is a penalty for being stubborn - on my machine RandomizeCMWC( "" ) takes about 25 milli-seconds. That is an awful long time on a modern computer but I doubt that anyone will be conscious of the wait and it should only be employed once. In fact, RandomizeCMWC( "" ) is used in CMWC4096.bas so we can ignore seeding unless we want fixed sequences.

For fixed sequences we use Randomize( seed As String ) where seed is of the form "<Double>". We then use 'dblSeed = Val( seed ) : Randomize dblSeed, 3'. That is, MT with a fixed seed is used for the initial lag array with the index and carry being fixed. For a particular index the multiply in MWC is undertaken and the carry of the previous index is used. In a nutshell this is how MWC works.

The fixed sequence code is only a few lines and is good enough for me. I rarely use fixed sequencing and do not need the sophistication used in RndMT. If you are not happy with this then be my guest and rewrite this section. <smile>

That is it!

Following is some example usage code and CMWC4096.bas

ForumTest.bas

Code: Select all

#Include "CMWC4096.bas"
 
Dim As Long i
 
For i = 1 To 6
  Print CMWCdw
Next
Print
 
For i = 1 To 6
  Print CMWCse
Next
Print
 
For i = 1 To 6
  Print CMWCRange(0,9)
Next
 
Print
print "Average 10^9 CMWCse"
Dim totse As Double
For i = 1 To 10^8
  totse += CMWCse
Next
Print totse/10^8
 
Print
Print "Average 10^9 CMWCRange(1,6)"
Dim As Ulongint totr
For i = 1 To 10^8
  totr += CMWCRange(1,6)
Next
Print totr/10^8
 
print
print "Fixed sequencing"
print
RandomizeCMWC( "3.141592654" )
for i = 1 to 10
  print CMWCse
Next
print
RandomizeCMWC( "3.141592654" )
for i = 1 to 10
  print CMWCse
Next
 
Sleep

CMWC4096.bas

Code: Select all

Declare Sub RandomizeCMWC( As String )
Declare Function CMWCdw As Ulong
Declare Function CMWCse As Double
Declare Function CMWCRange( As Long, As Long ) As Long
 
#Macro EngineCode
  mov eax, index
  mov ecx, 4095
  Add eax, 1
  And eax, ecx
  mov index, eax
  mov ecx, dword Ptr [q0Address] ' Address of Q(0)
  mov eax, [ecx+eax*4]           ' Contents of Q(index)
  mov edx, 18782
  mul edx                        ' Multiplier * Q(index)
  mov ecx, carry
  Add eax, ecx                   ' + carry
  adc edx, 0                     ' In Case there Is a carry of 1, add it
  mov carry, edx                 ' Store carry
  Add eax, edx
  jnc Short 1f
  cmp eax, -1
  jne Short 2f
1:
  Add eax, 1
  Add dword Ptr [carry], 1
2:
  ' The CMWC aspect (Base-1) - x
	mov edx, &H0FFFFFFFE
  Sub edx, eax                   ' edx = &H0FFFFFFFE - x
  mov eax, index
  mov ecx, dword Ptr [q0Address] 'Address of Q(0)
  mov [ecx+eax*4], edx           ' Store Q(index)
#Endmacro
 
Dim Shared As Ulong Q(0 To 4095)
Dim Shared As Ulong index, carry
Dim Shared As Ulong Ptr q0Address
Dim Shared As Ulong TempVar
 
Private Sub RandomizeCMWC( seed As String )
Dim As Double dblSeed
Dim As Long i
 
  If seed = "" Then ' Random seeding
    index = Int( Rnd * 4096 )
    Randomize , 5 ' Cryptographic
    For i = 0 To 4095
      ' seeds should be less than 2^32-1
      Q(i) = Cast( Ulong, Rnd*4294967295 )
    Next
    ' Make 0 < carry < Multiplier = 18782
    carry = Int( Rnd*18781 + 1)
   Else             ' Fixed seeding
    index = 4095
    dblSeed = Val( seed )
    Randomize dblSeed, 3 ' Mersenne Twister
    For i = 0 To 4095
      ' seeds should be less than 2^32-1
      Q(i) = Cast( Ulong, Rnd*4294967295 )
    Next
    carry = 12345
  End If
 
End Sub
 
Private Function CMWCdw As Ulong
  Asm
    EngineCode
    mov [Function], edx
  End Asm
End Function
 
Private Function CMWCse As Double
  Asm
    EngineCode
    mov dword Ptr [TempVar], edx
  End Asm
  Return TempVar/4294967296.0
End Function
 
Private Function CMWCRange( Byval One As Long, Byval Two As Long ) As Long
  Asm
    EngineCode
    ' Range Asm by John Gleason at PowerBASIC forums
    mov ecx, dword ptr [One]        ' Now, evaluate parameters.
    mov eax, dword ptr [Two]
    cmp ecx, eax        ' Is One > Two?
    jl 3f         ' jump over Swap, already Two > One
    xchg eax, ecx       ' Had To switch them, Now ecx has lower bound As it should
  3:
    Sub eax, ecx        ' Now eax Is Range    [ Range (before +1) ]
    inc eax                ' Add 1 To Range. Result Is correct For two params, And 0 If max Range
    jz 4f                    ' jump If we incremented &hFFFFFFFF up To 0; we have maximum Range
    ' At This Point ECX = lower,  EAX = Range, EDX = Random
    mul edx               ' Random * Range+1 == edx:eax. Use edx As result As If /(2^32).
    Add edx, ecx        ' Add lower bound To rand result In edx
  4:
    mov [Function], edx
  End Asm
End Function
 
q0Address = Cast( Ulong Ptr, Varptr( Q(0) ) ) ' Address of Q(0)
RandomizeCMWC( "" )                           ' Random seeding as default
 
Last edited by deltarho[1859] on Jun 15, 2018 14:24, edited 3 times in total.
counting_pine
Site Admin
Posts: 6323
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Re: CMWC4096

Post by counting_pine »

Hi, any chance you could retitle the thread to explain what CMWC4096 actually is?
EDIT: Thanks for doing that.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

Changed.

However, I believe that the original title was fine. I reckoned it would draw more people in. Once in they would see that CMWC4096 was a reasonable replacement for Mersenne Twister. Any other title and they may not bother and, in so doing, would miss out. Just applying a little psychology but, heh, I could be wrong <smile> Quite a views though in a short space of time just before I changed the title, so......
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

Just out of interest here is 64 bit:

Code: Select all

PCG32II   875.96
2         866.13
4         817.09
CryptoRnd 753.40
CMWC4096  699.78
3         683.32
RndMT     550.56
1         370.23
5         1.90
Slight change in ordering but the spread has increased quite a bit. Conclusion on CMWC4096 is the same as with PCG32II.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

gcc 8.1.0 vs gcc 5.2.0

To date I have not seen much difference between 8.1 and 5.2

gcc 8.1 32 bit has taken a shine to RndMT and then 'murdered it' with 64 bit. CMWC4096 gets 'pipped' by #3 with gcc 8.1.0 64 bit.

With these figures 8.1 is faster than 5.2.

It is not possible to predict the outcomes. PCG32II, on the other hand, does well with all tests.

These figures contradict earlier tests where I was using a simple loop. It would be interesting to see a disassemble of the simple loop to see what the compilers are up to.

Here are the four tables:

Code: Select all

gcc 5.2.0 32 bit

2          409.87
PCG32II    409.82
CMWC4096   406.58
4          401.39
CryptoRnd  400.96
1          384.20
3          376.93
RndMT      366.60
5            1.58

gcc 5.2.0 64 bit

PCG32II   875.96
2         866.13
4         817.09
CryptoRnd 753.40
CMWC4096  699.78
3         683.32
RndMT     550.56
1         370.23
5         1.90

gcc 8.1.0 32 bit

PCG32II   869.84
RndMT     602.90
2         483.30
CryptoRnd 474.72
4         472.12
CMWC4096  460.88
3         435.80
1         422.93
5         1.59

gcc 8.1.0 64 bit

PCG32II   1364.99
2         1052.38
4         1006.85
CryptoRnd 950.18
3         827.21
CMWC4096  802.16
RndMT     731.59
1         533.57
5         1.90
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

In another thread MrSwiss and Ed Davis were usung -fpu sse. I wrote Xorshift128+ and Xoroshiro128+ using sse at PowerBASIC. I had to, PowerBASIC has no idea what 64 bit unsigned is. <laugh>

Anyway, I have only dabbled with sse in regard to compiler switches at FreeBASIC.

The four tables above use the X87. The following four tables use sse.

Glory at last - the top four with gcc 5.2 and gcc 8.1 are all mine. I did not see that coming. 64 bit is a different story.

There is a pattern:

gcc 32 5.2 sse > fpu
gcc 64 5.2 sse < fpu
gcc 32 8.1 sse > fpu
gcc 64 8.1 sse < fpu

The 32 bit and 64 bit are clearly different animals.

Of the 8 tables I would go for 32 bit 5.2 sse followed by 32 bit 8.1 sse for the extra 'grunt'.

Maybe I am biased but the evidence seems to point that way. 64 bit seems very 'flaky' to me.

PCG32II uses a lot of shifts in the engine which may be favoured by sse. Oh, wait - RndMT and CMWC4096 are almost 'wall to wall' assembler and there is a lot of asm in CryptoRndII.

So, I finish a little project and test 8 binaries because I have no idea how they will behave.

David, have you tried this switch? Not another one. Where is my DOS 6.2 book, I need to write a batch file.

Bit of an eye opener, I must admit.

Code: Select all

gcc 5.2.0 32 bit
 
PCG32II   592.69
CMWC4096  565.02
CryptoRnd 551.19
RndMT     434.03
2         392.54
4         385.39
1         371.90
3         362.16
5         1.50
 
gcc 5.2.0 64 bit
 
2         879.65
PCG32II   846.03
4         817.59
3         683.78
CMWC4096  679.16
CryptoRnd 589.63
1         486.37
RndMT     469.18
5           1.92
 
gcc 8.1.0 32 bit
 
PCG32II   838.12
CryptoRnd 658.23
CMWC4096  621.59
RndMT     587.88
2         475.29
4         464.50
3         427.20
1         413.12
5         1.59
 
gcc 8.1.0 64 bit
 
PCG32II   1365.00
2         1053.78
4         1019.60
3          826.44
CryptoRnd  686.71
CMWC4096   612.02
RndMT      575.46
1          534.91
5            1.91
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

I mentioned in another thread about the limitations of generators when it came to shuffling citing the case where "it is impossible for a generator with less than 226 bits of internal state to produce all the possible permutations of a 52-card deck, for example".

CMWC4096's period of 131086 will cover all permutations up to 10,940 elements. It seemed crazy not to have a Knuth Shuffle function in CMWC4096.

Add this to the declarations:

Code: Select all

Declare Function KnuthShuffle( Byval As String, As Ulong, As Ulong ) As String
Here is the function:

Code: Select all

Private Function KnuthShuffle( Byval sText As String, Byval ulStart As Ulong, Byval ulLength As Ulong ) As String
' Returns a shuffled string but sText remains unshuffled
' If we use '1, Len( sText )' then we have no margins and we get a full shuffle on sText.
' With 'n, 0', for some n, then we get no shuffling.
' With 'n, 1', for some n, character 'n' shuffles with itself so, effectively, no shuffling.
Dim As Ulong i, temp
  temp = ulStart + ulLength - 3
  For i = ulStart - 1 To temp
    Swap sText[i], sText[( CMWCRange( i, temp + 1 ) )]
  Next
  Return sText
End Function
KnuthShuffle shuffles a sub-string within a main string.

With ulStart = 1 and ulLength = Len(main string) then the whole string is shuffled.

The function returns a shuffled sub-string but the main string remains unshuffled. We could, of course, assign the main string to the shuffled string.

Example:

Code: Select all

Dim sText As String
sText = "HousesOfParliament"
Print KnuthShuffle( sText, 1, Len(sText))
Print sText
which could deliver

Code: Select all

oOneHsulaPtisearfm
HousesOfParliament
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: CMWC4096 - a new random number generator

Post by deltarho[1859] »

There is something which I should have explained better. If the lag~array was populated with a 2^32 LCG then we would be restricted to only 2^32 possible lag~arrays and find ourselves in the same boat as a standard Mersenne Twister implementation. We have no such restriction when we populate the lag~array with cryptographic random numbers. It is so easy to miss something out which may have been mentioned in several other threads. Just for the record, this is why I introduced RndMT which populates the state vector with cryptographic random numbers.
Post Reply