CMWC4096 - a new random number generator
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
CMWC4096 - a new random number generator
Bug found - I'll be back
Last edited by deltarho[1859] on Jun 15, 2018 14:22, edited 2 times in total.
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
I'm back <smile>
When we reference a variable in asm we have to use Dword Ptr like this:
I ported some code from PowerBASIC and did not know that we have to do the same when referencing function parameters. I was using WinFBE and if any code uses asm then we should always open 'View Output Window' after every compilation otherwise we will not know if anything untoward has happened. I am seriously considering using poseidonFB if asm is used.
Anyway, where was I? <laugh>
I have ben meaning to do this for sometime but there was always something else going on.
The following is cross platform so you can drop it into either a Windows application or a Linux application.
A rule of thumb is that if we want 2^n random numbers in an application session then we should use a generator with a period of at least 2^2n. PowerBASIC's RND has a period of 2^32 so the rule of thumb will break if we wanted more than 65,536 random numbers. Doing nothing other than generate random numbers my machine will see PB's RND start repeating it's sequence in less than one minute. Having said that PB's RND will be fine for many applications. For other applications a period in excess of 2^32 will be needed.
If we jump straight into a period of 2^64 then we are in a very different environment given that 2^64 = 2^32 * 2^32.
Sebastiano Vigna wrote: "In other words: any generator with a period beyond 2^256 has a period that is sufficient for every imaginable application." He wrote similarly, but not quite as strong, with regard his 2^128 generators before moving up to his newer 2^256 generators.
However, there are some who require much larger periods and have been using the Mersenne Twister (MT) with a period of 2^19937-1. MT was introduced in 1997 and it had a much better quality random numbers than a Linear Congruential Generator (LCG) which was introduced in 1951. It has been found that there is a slight bias in the Mersenne Twister but it takes a while to manifest itself. At 256GB it fails a PractRand test, one of the modern random number testing suites.
The algorithm presented here uses what is called a lag-array where we move through the array to the end and start over. This provides the means to achieve very large periods. Lag~4096 is used, with 4096 * Ulongs, giving a period of 2^131086 which dwarves MT's period of 2^19937-1. In decimal terms that is greater than 10^39460. If you are into large periods this will give you some serious bragging rights. <smile> The algorithm is Complimentary Multiply With Carry (CMWC) devised by George Marsaglia. There were issues with MWC and addressed by Marsaglia himself. The Wikipedia entry here gives a good background. Marsaglia was obsessed with a generator's period and put out a challenge, not long before he died in 2011, for a lag~8192 or a lag~16384. I never did find out why but I suspect that he figured as the period tends to infinity the generator tends toward a cryptographic generator. That is the best that I can come up with.
For me, a generator is 'top drawer' if it manages to get to the 1TB test with PractRand and the author of PractRand says as much. Both my CryptoRndII and PCG32II do just that and now CMWC4096 has done. Even Vigna's xorshift128+ and xoroshiro128+ fail PractRand at 64GB. His later revisions pass and, it seems, his new 256 bit generators pass as well. In fact, I went out for the day and when I got back the tests had gone beyond 2TB. Here are the last three tests:
It was a long test, taking over 11.8 hours, and, as you can see, the 2TB itself took over 5.9 hours.
There was only five unusual anomalies in total, at PractRand's bottom scale, and there is nothing unusual about that for any generator.
So, CMWC4096 is 'top drawer'.
I am no longer testing for speed with a simple loop to determine a MHz output. My RndMT is faster than FB's generator #3 when using a simple loop test but in 'real world' tests they are reversed. I have a graphic test which also has RndMT being slower. The difference is, however, not significant. I suspect that with simple loops gcc optimizing produces code which does not reflect the BASIC code being compiled. RndMT is a MT but gives access to the whole period and not just a tiny 2^32 portion of it.
What I am now using is the core statements of a Monte Carlo test.
x = RND : y = RND : z = Sqr( x*x + y*y)
and putting that in a loop.
Instead of timing a For/Next construct the tests are given 20 seconds and the number of loops achieved are noted. This is along the lines of a suggestion made by coderJeff in another thread.
Here is a comparison of the FB's generators and some of mine. The values are the number of loops in MHz. Several runs were undertaken on Windows but only one shown, the results were found to be very stable.
My interest in this thread is how CMWC4096 compares with FB's #3 and my RndMT.
If your reason for using MT is it's large period then, obviously, CMWC4096's period is a plus. CMWC4096 qualifies as 'top drawer' but, as mentioned above, MT fails a PractRand test at 256GB. CMWC4096 is much faster than MT.
So, I propose that CMWC4096 be used as a direct replacement for MT.
As a side note it is worth looking at the table above in general.
FB's #2 generator is the fastest using a LCG and is the first one at the Wikipedia entry here. However, it's speed is almost identical to PCG32II and is decidedly not 'top drawer'. If it's speed was noticaby faster than PCG32II and 'top drawer' was not an issue then it would be a contender as it is easily used by simply declaring it with 'Randomize , 2'.
FB's #4 is fast but has a "low degree of randomness" according to the docs and since it is slower than PCG32II and CMWC4096, both 'top drawer, I see little need to use it even though it is easily added to our code. FB's #1 'the C runtime library's rand() function' is not held in much esteem by anyone that I can find and since it is not that fast should not, in my opinion, be given the time of day. FB's #5 in Windows was designed to fill buffers and not be called one number at a time.
This leaves us with CryptoRnd and PCG32II. If 'top drawer' is an issue and a large period is not a requirement then PCG32II is the best of the bunch. It is more compact than CryptoRnd, is not platform dependent and is thread safe.
Usage
At the moment CMWC4096 returns Ulongs, Double and Range using CMWCdw, CMWCse and CMWCRange respectively.
The 'se' in CMWCse means single extended and is terminology that I have used previously to indicate that although the generator works on 32 bits it outputs Doubles so that the 32 bit granularity is maintained. This is exactly what the FB generators do.
There is a facility to repeat a sequence. With RndMT we supply a string which is then subject to hashing and gives us access to the whole period in the same way as random seeding does. However, the hashing is via Microsoft's APIs leaving Linux users to write their own code for that job. With CMWC4096 I wanted to keep the code cross platform. The question is do we need, in this case, to be able to access the whole period. I would argue that we do not and if we restricted the entry points for repeat sequencing to 2^32 then we can use Rnd; which is cross platform.
For a random sequence we use RandomizeCMWC( "" ); the same method that RndMT uses.
Originally I used
for randomly seeding the lag array and carry. However, Linux users would need to supply an equivalent. I refuse point blank to use a PRNG for this so use FB's generator #5 which is a CPRNG, cryptographic, and cross platform. There is a penalty for being stubborn - on my machine RandomizeCMWC( "" ) takes about 25 milli-seconds. That is an awful long time on a modern computer but I doubt that anyone will be conscious of the wait and it should only be employed once. In fact, RandomizeCMWC( "" ) is used in CMWC4096.bas so we can ignore seeding unless we want fixed sequences.
For fixed sequences we use Randomize( seed As String ) where seed is of the form "<Double>". We then use 'dblSeed = Val( seed ) : Randomize dblSeed, 3'. That is, MT with a fixed seed is used for the initial lag array with the index and carry being fixed. For a particular index the multiply in MWC is undertaken and the carry of the previous index is used. In a nutshell this is how MWC works.
The fixed sequence code is only a few lines and is good enough for me. I rarely use fixed sequencing and do not need the sophistication used in RndMT. If you are not happy with this then be my guest and rewrite this section. <smile>
That is it!
Following is some example usage code and CMWC4096.bas
ForumTest.bas
CMWC4096.bas
When we reference a variable in asm we have to use Dword Ptr like this:
Code: Select all
mov Dword Ptr [TestVar], 123
Anyway, where was I? <laugh>
I have ben meaning to do this for sometime but there was always something else going on.
The following is cross platform so you can drop it into either a Windows application or a Linux application.
A rule of thumb is that if we want 2^n random numbers in an application session then we should use a generator with a period of at least 2^2n. PowerBASIC's RND has a period of 2^32 so the rule of thumb will break if we wanted more than 65,536 random numbers. Doing nothing other than generate random numbers my machine will see PB's RND start repeating it's sequence in less than one minute. Having said that PB's RND will be fine for many applications. For other applications a period in excess of 2^32 will be needed.
If we jump straight into a period of 2^64 then we are in a very different environment given that 2^64 = 2^32 * 2^32.
Sebastiano Vigna wrote: "In other words: any generator with a period beyond 2^256 has a period that is sufficient for every imaginable application." He wrote similarly, but not quite as strong, with regard his 2^128 generators before moving up to his newer 2^256 generators.
However, there are some who require much larger periods and have been using the Mersenne Twister (MT) with a period of 2^19937-1. MT was introduced in 1997 and it had a much better quality random numbers than a Linear Congruential Generator (LCG) which was introduced in 1951. It has been found that there is a slight bias in the Mersenne Twister but it takes a while to manifest itself. At 256GB it fails a PractRand test, one of the modern random number testing suites.
The algorithm presented here uses what is called a lag-array where we move through the array to the end and start over. This provides the means to achieve very large periods. Lag~4096 is used, with 4096 * Ulongs, giving a period of 2^131086 which dwarves MT's period of 2^19937-1. In decimal terms that is greater than 10^39460. If you are into large periods this will give you some serious bragging rights. <smile> The algorithm is Complimentary Multiply With Carry (CMWC) devised by George Marsaglia. There were issues with MWC and addressed by Marsaglia himself. The Wikipedia entry here gives a good background. Marsaglia was obsessed with a generator's period and put out a challenge, not long before he died in 2011, for a lag~8192 or a lag~16384. I never did find out why but I suspect that he figured as the period tends to infinity the generator tends toward a cryptographic generator. That is the best that I can come up with.
For me, a generator is 'top drawer' if it manages to get to the 1TB test with PractRand and the author of PractRand says as much. Both my CryptoRndII and PCG32II do just that and now CMWC4096 has done. Even Vigna's xorshift128+ and xoroshiro128+ fail PractRand at 64GB. His later revisions pass and, it seems, his new 256 bit generators pass as well. In fact, I went out for the day and when I got back the tests had gone beyond 2TB. Here are the last three tests:
Code: Select all
rng=RNG_stdin32, seed=0x3d92e4fc
length= 512 gigabytes (2^39 bytes), time= 5316 seconds
no anomalies in 213 test result(s)
rng=RNG_stdin32, seed=0x3d92e4fc
length= 1 terabyte (2^40 bytes), time= 10611 seconds
no anomalies in 220 test result(s)
rng=RNG_stdin32, seed=0x3d92e4fc
length= 2 terabytes (2^41 bytes), time= 21278 seconds
no anomalies in 229 test result(s)
There was only five unusual anomalies in total, at PractRand's bottom scale, and there is nothing unusual about that for any generator.
So, CMWC4096 is 'top drawer'.
I am no longer testing for speed with a simple loop to determine a MHz output. My RndMT is faster than FB's generator #3 when using a simple loop test but in 'real world' tests they are reversed. I have a graphic test which also has RndMT being slower. The difference is, however, not significant. I suspect that with simple loops gcc optimizing produces code which does not reflect the BASIC code being compiled. RndMT is a MT but gives access to the whole period and not just a tiny 2^32 portion of it.
What I am now using is the core statements of a Monte Carlo test.
x = RND : y = RND : z = Sqr( x*x + y*y)
and putting that in a loop.
Instead of timing a For/Next construct the tests are given 20 seconds and the number of loops achieved are noted. This is along the lines of a suggestion made by coderJeff in another thread.
Here is a comparison of the FB's generators and some of mine. The values are the number of loops in MHz. Several runs were undertaken on Windows but only one shown, the results were found to be very stable.
Code: Select all
2 409.87
PCG32II 409.82
CMWC4096 406.58
4 401.39
CryptoRnd 400.96
1 384.20
3 376.93
RndMT 366.60
5 1.58
If your reason for using MT is it's large period then, obviously, CMWC4096's period is a plus. CMWC4096 qualifies as 'top drawer' but, as mentioned above, MT fails a PractRand test at 256GB. CMWC4096 is much faster than MT.
So, I propose that CMWC4096 be used as a direct replacement for MT.
As a side note it is worth looking at the table above in general.
FB's #2 generator is the fastest using a LCG and is the first one at the Wikipedia entry here. However, it's speed is almost identical to PCG32II and is decidedly not 'top drawer'. If it's speed was noticaby faster than PCG32II and 'top drawer' was not an issue then it would be a contender as it is easily used by simply declaring it with 'Randomize , 2'.
FB's #4 is fast but has a "low degree of randomness" according to the docs and since it is slower than PCG32II and CMWC4096, both 'top drawer, I see little need to use it even though it is easily added to our code. FB's #1 'the C runtime library's rand() function' is not held in much esteem by anyone that I can find and since it is not that fast should not, in my opinion, be given the time of day. FB's #5 in Windows was designed to fill buffers and not be called one number at a time.
This leaves us with CryptoRnd and PCG32II. If 'top drawer' is an issue and a large period is not a requirement then PCG32II is the best of the bunch. It is more compact than CryptoRnd, is not platform dependent and is thread safe.
Usage
At the moment CMWC4096 returns Ulongs, Double and Range using CMWCdw, CMWCse and CMWCRange respectively.
The 'se' in CMWCse means single extended and is terminology that I have used previously to indicate that although the generator works on 32 bits it outputs Doubles so that the 32 bit granularity is maintained. This is exactly what the FB generators do.
There is a facility to repeat a sequence. With RndMT we supply a string which is then subject to hashing and gives us access to the whole period in the same way as random seeding does. However, the hashing is via Microsoft's APIs leaving Linux users to write their own code for that job. With CMWC4096 I wanted to keep the code cross platform. The question is do we need, in this case, to be able to access the whole period. I would argue that we do not and if we restricted the entry points for repeat sequencing to 2^32 then we can use Rnd; which is cross platform.
For a random sequence we use RandomizeCMWC( "" ); the same method that RndMT uses.
Originally I used
Code: Select all
Declare Function MyRandomInt Lib "Advapi32.dll" Alias "SystemFunction036" _
( RandomBuffer As Any Ptr, RandomBufferLength As ULong ) As Byte
For fixed sequences we use Randomize( seed As String ) where seed is of the form "<Double>". We then use 'dblSeed = Val( seed ) : Randomize dblSeed, 3'. That is, MT with a fixed seed is used for the initial lag array with the index and carry being fixed. For a particular index the multiply in MWC is undertaken and the carry of the previous index is used. In a nutshell this is how MWC works.
The fixed sequence code is only a few lines and is good enough for me. I rarely use fixed sequencing and do not need the sophistication used in RndMT. If you are not happy with this then be my guest and rewrite this section. <smile>
That is it!
Following is some example usage code and CMWC4096.bas
ForumTest.bas
Code: Select all
#Include "CMWC4096.bas"
Dim As Long i
For i = 1 To 6
Print CMWCdw
Next
Print
For i = 1 To 6
Print CMWCse
Next
Print
For i = 1 To 6
Print CMWCRange(0,9)
Next
Print
print "Average 10^9 CMWCse"
Dim totse As Double
For i = 1 To 10^8
totse += CMWCse
Next
Print totse/10^8
Print
Print "Average 10^9 CMWCRange(1,6)"
Dim As Ulongint totr
For i = 1 To 10^8
totr += CMWCRange(1,6)
Next
Print totr/10^8
print
print "Fixed sequencing"
print
RandomizeCMWC( "3.141592654" )
for i = 1 to 10
print CMWCse
Next
print
RandomizeCMWC( "3.141592654" )
for i = 1 to 10
print CMWCse
Next
Sleep
CMWC4096.bas
Code: Select all
Declare Sub RandomizeCMWC( As String )
Declare Function CMWCdw As Ulong
Declare Function CMWCse As Double
Declare Function CMWCRange( As Long, As Long ) As Long
#Macro EngineCode
mov eax, index
mov ecx, 4095
Add eax, 1
And eax, ecx
mov index, eax
mov ecx, dword Ptr [q0Address] ' Address of Q(0)
mov eax, [ecx+eax*4] ' Contents of Q(index)
mov edx, 18782
mul edx ' Multiplier * Q(index)
mov ecx, carry
Add eax, ecx ' + carry
adc edx, 0 ' In Case there Is a carry of 1, add it
mov carry, edx ' Store carry
Add eax, edx
jnc Short 1f
cmp eax, -1
jne Short 2f
1:
Add eax, 1
Add dword Ptr [carry], 1
2:
' The CMWC aspect (Base-1) - x
mov edx, &H0FFFFFFFE
Sub edx, eax ' edx = &H0FFFFFFFE - x
mov eax, index
mov ecx, dword Ptr [q0Address] 'Address of Q(0)
mov [ecx+eax*4], edx ' Store Q(index)
#Endmacro
Dim Shared As Ulong Q(0 To 4095)
Dim Shared As Ulong index, carry
Dim Shared As Ulong Ptr q0Address
Dim Shared As Ulong TempVar
Private Sub RandomizeCMWC( seed As String )
Dim As Double dblSeed
Dim As Long i
If seed = "" Then ' Random seeding
index = Int( Rnd * 4096 )
Randomize , 5 ' Cryptographic
For i = 0 To 4095
' seeds should be less than 2^32-1
Q(i) = Cast( Ulong, Rnd*4294967295 )
Next
' Make 0 < carry < Multiplier = 18782
carry = Int( Rnd*18781 + 1)
Else ' Fixed seeding
index = 4095
dblSeed = Val( seed )
Randomize dblSeed, 3 ' Mersenne Twister
For i = 0 To 4095
' seeds should be less than 2^32-1
Q(i) = Cast( Ulong, Rnd*4294967295 )
Next
carry = 12345
End If
End Sub
Private Function CMWCdw As Ulong
Asm
EngineCode
mov [Function], edx
End Asm
End Function
Private Function CMWCse As Double
Asm
EngineCode
mov dword Ptr [TempVar], edx
End Asm
Return TempVar/4294967296.0
End Function
Private Function CMWCRange( Byval One As Long, Byval Two As Long ) As Long
Asm
EngineCode
' Range Asm by John Gleason at PowerBASIC forums
mov ecx, dword ptr [One] ' Now, evaluate parameters.
mov eax, dword ptr [Two]
cmp ecx, eax ' Is One > Two?
jl 3f ' jump over Swap, already Two > One
xchg eax, ecx ' Had To switch them, Now ecx has lower bound As it should
3:
Sub eax, ecx ' Now eax Is Range [ Range (before +1) ]
inc eax ' Add 1 To Range. Result Is correct For two params, And 0 If max Range
jz 4f ' jump If we incremented &hFFFFFFFF up To 0; we have maximum Range
' At This Point ECX = lower, EAX = Range, EDX = Random
mul edx ' Random * Range+1 == edx:eax. Use edx As result As If /(2^32).
Add edx, ecx ' Add lower bound To rand result In edx
4:
mov [Function], edx
End Asm
End Function
q0Address = Cast( Ulong Ptr, Varptr( Q(0) ) ) ' Address of Q(0)
RandomizeCMWC( "" ) ' Random seeding as default
Last edited by deltarho[1859] on Jun 15, 2018 14:24, edited 3 times in total.
-
- Site Admin
- Posts: 6323
- Joined: Jul 05, 2005 17:32
- Location: Manchester, Lancs
Re: CMWC4096
Hi, any chance you could retitle the thread to explain what CMWC4096 actually is?
EDIT: Thanks for doing that.
EDIT: Thanks for doing that.
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
Changed.
However, I believe that the original title was fine. I reckoned it would draw more people in. Once in they would see that CMWC4096 was a reasonable replacement for Mersenne Twister. Any other title and they may not bother and, in so doing, would miss out. Just applying a little psychology but, heh, I could be wrong <smile> Quite a views though in a short space of time just before I changed the title, so......
However, I believe that the original title was fine. I reckoned it would draw more people in. Once in they would see that CMWC4096 was a reasonable replacement for Mersenne Twister. Any other title and they may not bother and, in so doing, would miss out. Just applying a little psychology but, heh, I could be wrong <smile> Quite a views though in a short space of time just before I changed the title, so......
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
Just out of interest here is 64 bit:
Slight change in ordering but the spread has increased quite a bit. Conclusion on CMWC4096 is the same as with PCG32II.
Code: Select all
PCG32II 875.96
2 866.13
4 817.09
CryptoRnd 753.40
CMWC4096 699.78
3 683.32
RndMT 550.56
1 370.23
5 1.90
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
gcc 8.1.0 vs gcc 5.2.0
To date I have not seen much difference between 8.1 and 5.2
gcc 8.1 32 bit has taken a shine to RndMT and then 'murdered it' with 64 bit. CMWC4096 gets 'pipped' by #3 with gcc 8.1.0 64 bit.
With these figures 8.1 is faster than 5.2.
It is not possible to predict the outcomes. PCG32II, on the other hand, does well with all tests.
These figures contradict earlier tests where I was using a simple loop. It would be interesting to see a disassemble of the simple loop to see what the compilers are up to.
Here are the four tables:
To date I have not seen much difference between 8.1 and 5.2
gcc 8.1 32 bit has taken a shine to RndMT and then 'murdered it' with 64 bit. CMWC4096 gets 'pipped' by #3 with gcc 8.1.0 64 bit.
With these figures 8.1 is faster than 5.2.
It is not possible to predict the outcomes. PCG32II, on the other hand, does well with all tests.
These figures contradict earlier tests where I was using a simple loop. It would be interesting to see a disassemble of the simple loop to see what the compilers are up to.
Here are the four tables:
Code: Select all
gcc 5.2.0 32 bit
2 409.87
PCG32II 409.82
CMWC4096 406.58
4 401.39
CryptoRnd 400.96
1 384.20
3 376.93
RndMT 366.60
5 1.58
gcc 5.2.0 64 bit
PCG32II 875.96
2 866.13
4 817.09
CryptoRnd 753.40
CMWC4096 699.78
3 683.32
RndMT 550.56
1 370.23
5 1.90
gcc 8.1.0 32 bit
PCG32II 869.84
RndMT 602.90
2 483.30
CryptoRnd 474.72
4 472.12
CMWC4096 460.88
3 435.80
1 422.93
5 1.59
gcc 8.1.0 64 bit
PCG32II 1364.99
2 1052.38
4 1006.85
CryptoRnd 950.18
3 827.21
CMWC4096 802.16
RndMT 731.59
1 533.57
5 1.90
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
In another thread MrSwiss and Ed Davis were usung -fpu sse. I wrote Xorshift128+ and Xoroshiro128+ using sse at PowerBASIC. I had to, PowerBASIC has no idea what 64 bit unsigned is. <laugh>
Anyway, I have only dabbled with sse in regard to compiler switches at FreeBASIC.
The four tables above use the X87. The following four tables use sse.
Glory at last - the top four with gcc 5.2 and gcc 8.1 are all mine. I did not see that coming. 64 bit is a different story.
There is a pattern:
gcc 32 5.2 sse > fpu
gcc 64 5.2 sse < fpu
gcc 32 8.1 sse > fpu
gcc 64 8.1 sse < fpu
The 32 bit and 64 bit are clearly different animals.
Of the 8 tables I would go for 32 bit 5.2 sse followed by 32 bit 8.1 sse for the extra 'grunt'.
Maybe I am biased but the evidence seems to point that way. 64 bit seems very 'flaky' to me.
PCG32II uses a lot of shifts in the engine which may be favoured by sse. Oh, wait - RndMT and CMWC4096 are almost 'wall to wall' assembler and there is a lot of asm in CryptoRndII.
So, I finish a little project and test 8 binaries because I have no idea how they will behave.
David, have you tried this switch? Not another one. Where is my DOS 6.2 book, I need to write a batch file.
Bit of an eye opener, I must admit.
Anyway, I have only dabbled with sse in regard to compiler switches at FreeBASIC.
The four tables above use the X87. The following four tables use sse.
Glory at last - the top four with gcc 5.2 and gcc 8.1 are all mine. I did not see that coming. 64 bit is a different story.
There is a pattern:
gcc 32 5.2 sse > fpu
gcc 64 5.2 sse < fpu
gcc 32 8.1 sse > fpu
gcc 64 8.1 sse < fpu
The 32 bit and 64 bit are clearly different animals.
Of the 8 tables I would go for 32 bit 5.2 sse followed by 32 bit 8.1 sse for the extra 'grunt'.
Maybe I am biased but the evidence seems to point that way. 64 bit seems very 'flaky' to me.
PCG32II uses a lot of shifts in the engine which may be favoured by sse. Oh, wait - RndMT and CMWC4096 are almost 'wall to wall' assembler and there is a lot of asm in CryptoRndII.
So, I finish a little project and test 8 binaries because I have no idea how they will behave.
David, have you tried this switch? Not another one. Where is my DOS 6.2 book, I need to write a batch file.
Bit of an eye opener, I must admit.
Code: Select all
gcc 5.2.0 32 bit
PCG32II 592.69
CMWC4096 565.02
CryptoRnd 551.19
RndMT 434.03
2 392.54
4 385.39
1 371.90
3 362.16
5 1.50
gcc 5.2.0 64 bit
2 879.65
PCG32II 846.03
4 817.59
3 683.78
CMWC4096 679.16
CryptoRnd 589.63
1 486.37
RndMT 469.18
5 1.92
gcc 8.1.0 32 bit
PCG32II 838.12
CryptoRnd 658.23
CMWC4096 621.59
RndMT 587.88
2 475.29
4 464.50
3 427.20
1 413.12
5 1.59
gcc 8.1.0 64 bit
PCG32II 1365.00
2 1053.78
4 1019.60
3 826.44
CryptoRnd 686.71
CMWC4096 612.02
RndMT 575.46
1 534.91
5 1.91
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
I mentioned in another thread about the limitations of generators when it came to shuffling citing the case where "it is impossible for a generator with less than 226 bits of internal state to produce all the possible permutations of a 52-card deck, for example".
CMWC4096's period of 131086 will cover all permutations up to 10,940 elements. It seemed crazy not to have a Knuth Shuffle function in CMWC4096.
Add this to the declarations:
Here is the function:
KnuthShuffle shuffles a sub-string within a main string.
With ulStart = 1 and ulLength = Len(main string) then the whole string is shuffled.
The function returns a shuffled sub-string but the main string remains unshuffled. We could, of course, assign the main string to the shuffled string.
Example:
which could deliver
CMWC4096's period of 131086 will cover all permutations up to 10,940 elements. It seemed crazy not to have a Knuth Shuffle function in CMWC4096.
Add this to the declarations:
Code: Select all
Declare Function KnuthShuffle( Byval As String, As Ulong, As Ulong ) As String
Code: Select all
Private Function KnuthShuffle( Byval sText As String, Byval ulStart As Ulong, Byval ulLength As Ulong ) As String
' Returns a shuffled string but sText remains unshuffled
' If we use '1, Len( sText )' then we have no margins and we get a full shuffle on sText.
' With 'n, 0', for some n, then we get no shuffling.
' With 'n, 1', for some n, character 'n' shuffles with itself so, effectively, no shuffling.
Dim As Ulong i, temp
temp = ulStart + ulLength - 3
For i = ulStart - 1 To temp
Swap sText[i], sText[( CMWCRange( i, temp + 1 ) )]
Next
Return sText
End Function
With ulStart = 1 and ulLength = Len(main string) then the whole string is shuffled.
The function returns a shuffled sub-string but the main string remains unshuffled. We could, of course, assign the main string to the shuffled string.
Example:
Code: Select all
Dim sText As String
sText = "HousesOfParliament"
Print KnuthShuffle( sText, 1, Len(sText))
Print sText
Code: Select all
oOneHsulaPtisearfm
HousesOfParliament
-
- Posts: 4292
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: CMWC4096 - a new random number generator
There is something which I should have explained better. If the lag~array was populated with a 2^32 LCG then we would be restricted to only 2^32 possible lag~arrays and find ourselves in the same boat as a standard Mersenne Twister implementation. We have no such restriction when we populate the lag~array with cryptographic random numbers. It is so easy to miss something out which may have been mentioned in several other threads. Just for the record, this is why I introduced RndMT which populates the state vector with cryptographic random numbers.