Mersenne Twister: A new twist.

Windows specific questions.
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

Solved the assembler problem. The author uses the stack for the state space and starts with 'Push 0' and ends with 'pop ebx'. I didn't think it was relevant with my usage. At some point we have 'inc DWORD PTR [esp]. If we didn't have that Push then we would jump past the first element of the state space. The ebx in the Pop is not important - it just balances the stack. When the state space exhausted we effectively got a buffer overrun. So, not unlike using 'i = -1' before a Loop construct and then increment i inside the loop.

Having got it working I did a spot of optimising and am now getting 91.6MHz, beating the FreeBASIC option 3, during the same run, of 85.8MHZ.

It is not big enough to gloat but it is on the right side for me to say that RndMT gives the maximum entry points, 2^19937 - 1, to the Mersenne Twister sequence compared with option 3's 2^32 entry points and it is faster than option 3.

We are still behind option 2 at 105MHz but this exercise was not about speed, it was about trying to get a better Mersenne Twister.

We have an interesting anomaly though.

You cannot run this code but it generates random 32 bit integers from the FreeBASIC engine and then from the Assembler engine.

Code: Select all

#Include "RndMT.bas"
 
Dim As Long i, lTest
Dim as UInteger uValue
 
lTest = 15575
 
Print "FreeBASIC"
RandomizeRndMT( "10 Downing Street" )
For i = 1 to lTest
  uValue = genrand_int32()
next
Print i;" ";genrand_int32();" ";mti
Print i+1;" ";genrand_int32();" ";mti
Print
Print "Assembler"
RandomizeRndMT( "10 Downing Street" )
For i = 1 to lTest
  uValue = ASMgenrand_int32()
next
Print i;" ";ASMgenrand_int32();" ";mti
Print i+1;" ";ASMgenrand_int32();" ";mti
 
Print "Done"
 
Sleep
with a typical output of

Code: Select all

FreeBASIC
 15576 1557749086  600
 15577 2253664216  601
 
Assembler
 15576 1557749086  600
 15577 1274588068  601
Done
They agree up to 15,576 numbers and then at 15,577, element 601 of the current state space, they drift apart. It is the same for any user seed.

So, who is throwing a 'wobbly'? FreeBASIC or Assembler. I have just done an average of 100,000,000 doubles and got 0.5000030400061463 with the Assembler version. Admittedly not a proof that all is well but it would have been proof of something not being well had I not got that. When I have time I will do a 1TB PractRand test on the Assembler version. If it passes that then we will have a very viable alternative to option 3.

I need to be satisfied that all is well before I publish the Assembler version but I am a bit happier than I was earlier. <no longer grrrrh!>
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

Further optimisation - now getting 95.3MHz.

I reckon that is as good as it will get, and stay readable.

PractRand up to 64GB - no concerns yet.
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

The Assembler version got through 1TB PractRand analysis with a small anamoly during the 256MB test. So, passed with flying colours.

Below is the source RndMT.bas but here is a link, to save you the trouble, to RndMT.zip which includes a folder RndMT which, in turn includes, RndMT.bi and libRndMT.a.

I have limited experience with building libraries but so far the code seems to take a performance hit. Not so with RndMT - the library is as fast as the raw source code.

Here is a copy of the bi file:

Code: Select all

Declare Function RndMTS() As Single
Declare Function RndMTD() As Double
Declare Function RndMTR( As Long, As Long ) As Long
Declare Function RndMTUL() As ULong
Declare Sub RandomizeRndMT( As String )
#inclib "RndMT"
So, we have Single, Double, Range and a new one UL. There is probably not much use for RndMTUL but since the generator churns out 32 bit integers for a living it seemed a rightful addition.

RandomizeRndMT( "" ) ==> System seeding
RandomizeRndMT( "Some string" ) ==> User seeding

Example usage:

Code: Select all

#Include Once "RndMT.bi"
RandomizeRndMT( "" )
Print RndMTS
Print RndMTD
Print RndMTUL
Print RndMTR(1,100)
Print "Done"
Sleep

Code: Select all

                 RndMT      FreeBasic
Entry points  2^19937 - 1     2^32
Speed           95.3MHz      85.8MHz  (i7-3770K 3.50GHz)

Don't forget, Windows Vista or later. However, you have the source code so you can do whatever you like with it.

Have fun.

RndMT.bas

Code: Select all

'   The C-program for MT19937, with initialization improved 2002/1/26
'   was coded by Takuji Nishimura and Makoto Matsumoto.
'
'   Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura,
'   All rights reserved.                          
'
' Translated to FreeBASIC by Greg Lyon - April 2006
'
' UInteger conversion to Single and Double, Range and entry point
' to Mersenne Twister maximised by David Roberts - March 2017

Const N As uInteger = 624
Const M As uInteger = 397
Const MATRIX_A   As uInteger = &H9908b0dfUL ' constant vector a
Const UPPER_MASK As uInteger = &H80000000UL ' most significant w-r bits
Const LOWER_MASK As uInteger = &H7fffffffUL ' least significant r bits

Dim Shared mt(N) As uInteger ' the array For the state vector
Dim Shared mti As Long = N + 1 ' mti == N+1 means mt[N] Is Not initialized   

' initializes mt[N] with a seed
Private Sub init_genrand(ByVal s As uInteger)
  mt(0)= s And &HffffffffUL
  For mti = 1 To N-1
    mt(mti) = ( 1812433253UL * ( mt(mti-1) Xor ( mt(mti-1) Shr 30 ) ) + mti ) 
    ' See Knuth TAOCP Vol2. 3rd Ed. P.106 For multiplier.
    ' In the previous versions, MSBs of the seed affect  
    ' only MSBs of the array mt[].                       
    ' 2002/01/09 modified by Makoto Matsumoto            
    mt(mti) = mt(mti) And &HffffffffUL
    ' For >32 Bit machines 
  Next mti
End Sub

'-------------------------------------------------------

' Adaptation of code by Sterling Stuart Stein
Private Function genrand_int32() As UInteger
Static As UInteger Ptr ptrMT
Static mag01(1) As UInteger = {&h00000000, &h9908b0df}
Static As UInteger Ptr ptrmag01
Static As Long Counter = 0

ptrMT = @mt(0)
ptrmag01 = @mag01(0)

Asm
  mov edi, dword ptr [ptrMT]
  cmp dword ptr [mti], N
  jb XB
  mov esi, edi
SB:
  mov eax,[esi]
  and eax, UPPER_MASK
  mov ebx,[esi+4]
  and ebx, LOWER_MASK
  or eax,ebx
  mov ecx,eax
  shr eax,1
  mov edx,esi
  add edx,(M*4)
  mov ebx,[edx]
  xor eax,ebx
  and ecx,1
  mov ebx, dword ptr [ptrmag01]
  xor eax, [ebx + ecx*4]
  mov [esi],eax
  add esi,4
  inc dword ptr [Counter]
  cmp dword ptr [Counter],(N-M)
  jnz SB
UB:
  mov eax,[esi]
  and eax,UPPER_MASK
  mov ebx,[esi+4]
  and ebx,LOWER_MASK
  or eax,ebx
  mov ecx,eax
  shr eax,1
  mov edx,esi
  add edx,((M-N)*4)
  mov ebx,[edx]
  xor eax,ebx
  and ecx,1
  mov ebx, dword ptr [ptrmag01]
  xor eax, [ebx + ecx*4]
  mov [esi],eax
  add esi,4
  inc dword ptr [Counter]
  cmp dword ptr [Counter],(N-1)
  jnz UB
  mov dword ptr [Counter], 0
  
  ' Added by David Roberts
  'y = (mt((N - 1)) and UPPER_MASK) or (mt(0) and LOWER_MASK)
  '---------------------------
  mov eax, [esi]
  and eax,UPPER_MASK
  mov ebx, [edi]
  and ebx, LOWER_MASK
  or eax, ebx
  mov ecx,eax
  shr eax, 1
  '---------------------------
  
  mov edx,edi
  add edx,(M-1)*4
  mov ebx,[edx]
  xor eax,ebx
  and ecx,1
  mov ebx, dword ptr [ptrmag01]
  xor eax, [ebx + ecx*4]
  mov [esi],eax
  mov dword ptr [mti], 0
XB:
  mov esi,edi
  mov eax, dword ptr [mti]
  inc dword ptr [mti]
  shl eax,2
  add esi,eax
  mov eax,[esi]
  mov ebx,eax
  shr eax,11
  xor ebx,eax
  mov eax,ebx
  shl eax,7
  and eax,&H9d2c5680
  xor ebx,eax
  mov eax,ebx
  shl eax,15
  and eax,&Hefc60000
  xor ebx,eax
  mov eax,ebx
  shr eax,18
  xor eax,ebx
  mov dword ptr [Function], eax
End Asm

End Function

'-------------------------------------------------------

' generates a Random number On [0,1)-real-interval
Public Function RndMTS() As Single
'Function genrand_real2() As Single
Static As UInteger TempVar

  TempVar = genrand_int32()
  
' Adaptation of code by Wilbert @ PureBASIC forums
Asm
  mov eax, dword ptr [TempVar]
  movd xmm0, eax
  psrlq xmm0, 9
  mov eax, 1
  cvtsi2ss xmm1, eax
  por xmm0, xmm1
  subss xmm0, xmm1
  movd [Function], xmm0
End Asm
  
  'Function = genrand_int32() * 1.0#/4294967296# 
  ' divided by 2^32
End Function
'-------------------------------------------------------

' generates a random number on [0,1) with 53-bit resolution
'Function genrand_res53() As Double
Public Function RndMTD() As Double  
Static As UInteger TempVar1, TempVar2

  TempVar1 = genrand_int32()
  TempVar2 = genrand_int32()

' Adaptation of code by Wilbert @ PureBASIC forums
Asm
  mov eax, dword ptr [TempVar1]
  movd xmm0, eax
  mov eax, dword ptr [TempVar2]
  movd xmm1, eax
  punpckldq xmm0, xmm1
  psrlq xmm0, 12
  mov eax, 1
  cvtsi2sd xmm1, eax
  por xmm0, xmm1
  subsd xmm0, xmm1
  movq [Function], xmm0
End Asm

'Dim a As uInteger = genrand_int32() Shr 5 
'Dim b As uInteger = genrand_int32() Shr 6
  'Function = (a * 67108864.0# + b) * (1.0# / 9007199254740992.0#) 
End Function
'------------------------------------------------------- 

'The following code added by David Roberts
 
#include once "windows.bi"
#include once "win/bcrypt.bi"
#inclib "bcrypt"
 
Declare Function MyRandomInt Lib "Advapi32.dll" Alias "SystemFunction036" _
 ( RandomBuffer As Any Ptr, RandomBufferLength As ULong ) As Byte

Private Function SHA512( sText as String ) as String
Dim as Byte Ptr phalg, phhash
Dim as ULong lhashlength, lresult
Dim sbinhash As String
   
  BCryptOpenAlgorithmProvider Varptr(phalg), wstr("SHA512"), 0, 0 ' we want phalg
  BCryptCreateHash phalg, @phhash, null, 0, 0, 0, 0 ' we want phhash
  BCryptHashData( phhash, Strptr( stext), len( stext ), 0 )
  BCryptGetProperty phalg, bcrypt_hash_length, Cast ( Puchar, @lhashlength ), 4, @lresult, 0
  sbinhash = String$( lhashlength, 0 )
  BCryptFinishHash phhash, Strptr( sbinhash ), lhashlength, 0
  BCryptDestroyHash phhash 
  BCryptCloseAlgorithmProvider phalg, 0
  Return sbinhash

End Function
'-------------------------------------------------------

' Entry point to RndMT.bas. Next statement will be executed
' Initialize state vector in case RandomizeRndMT is not employed
init_genrand(5489UL)
'-------------------------------------------------------

Public Sub RandomizeRndMT( seedString As String )
Dim BinHash As String * 64
Dim As Ulong Ptr BinHashPtr, BaseBinHashPtr
Dim arrayPtr as Byte Ptr
Dim As Long i, j

  If seedString <> "" Then ' User seed
    BinHashPtr = Cptr(Ulong Ptr, StrPtr( BinHash ))
    BaseBinHashPtr = BinHashPtr
    BinHash = SHA512( seedString )
    ' j = 0 to 38 x i = 0 to 15 ==> 39 x 16 = 624 ( = N )
    For j = 0 to 38
      BinHash = SHA512( BinHash ) ' Hash the hash
      BinHashPtr = BaseBinHashPtr
      For i = 0 to 15
        mt(j*16 + i) = PeeK( Ulong, BinHashPtr ) ' Populate state vector
        BinhashPtr += 1
      Next
    Next
  Else ' System seed
    arrayPtr = Cast( Byte Ptr, @mt(0) )
    MyRandomInt( arrayPtr, 4*N ) ' Populate state vector
  End If
  
  mti = N ' Signal state vector needs to be 'refurbished'
  
End Sub
'------------------------------------------------------- 

Public Function RndMTR( One As Long, Two As Long ) As Long
Static As UInteger TempVar

  TempVar = genrand_int32() 
  
  ' Adaptation of code by John Gleason @ PowerBASIC forums 
  Asm
    mov edx, dword ptr [TempVar]
    mov ecx, dword ptr [One]
    mov eax, dword ptr [Two]
    cmp ecx, eax
    jl Now1LT2
    xchg eax, ecx
Now1LT2:
    Sub eax, ecx
    inc eax
    jz doTheRnd
    mul edx
    Add edx, ecx
doTheRnd:
    mov dword ptr [Function], edx
  End Asm

End Function
'-------------------------------------------------------

Public Function RndMTUL() As ULong
  Function = genrand_int32()
End Function
'-------------------------------------------------------
PractRand results

Code: Select all

Microsoft Windows [Version 10.0.14393]
(c) 2016 Microsoft Corporation. All rights reserved.
 
C:\Users\deltarho>F:
 
F:\>cd msvc12_32bit
 
F:\msvc12_32bit>My_Rng | RNG_test stdin32
RNG = RNG_stdin32, PractRand version 0.91, seed = 0xd5bb4d5f
test set = normal, folding = standard (32 bit)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 128 megabytes (2^27 bytes), time= 3.0 seconds
  no anomalies in 117 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 256 megabytes (2^28 bytes), time= 6.3 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low1/32]DC6-9x1Bytes-1           R=  +6.3  p =  3.1e-3   unusual
  ...and 123 test result(s) without anomalies
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 512 megabytes (2^29 bytes), time= 12.5 seconds
  no anomalies in 132 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 1 gigabyte (2^30 bytes), time= 24.8 seconds
  no anomalies in 141 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 2 gigabytes (2^31 bytes), time= 48.7 seconds
  no anomalies in 148 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 4 gigabytes (2^32 bytes), time= 95.6 seconds
  no anomalies in 156 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 8 gigabytes (2^33 bytes), time= 190 seconds
  no anomalies in 165 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 16 gigabytes (2^34 bytes), time= 379 seconds
  no anomalies in 172 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 32 gigabytes (2^35 bytes), time= 751 seconds
  no anomalies in 180 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 64 gigabytes (2^36 bytes), time= 1509 seconds
  no anomalies in 189 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 128 gigabytes (2^37 bytes), time= 3015 seconds
  no anomalies in 196 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 256 gigabytes (2^38 bytes), time= 6001 seconds
  no anomalies in 204 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 512 gigabytes (2^39 bytes), time= 12037 seconds
  no anomalies in 213 test result(s)
 
rng=RNG_stdin32, seed=0xd5bb4d5f
length= 1 terabyte (2^40 bytes), time= 24056 seconds
  no anomalies in 220 test result(s)
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

Oh, wow! I was going through the assembler adding comments to see how it related to the C code. I got stuck and realised that there was something missing from it.

This is a segment of the 'engine' of the C code:

Code: Select all

for (;kk<N-1;kk++) {
    y = (mt[kk]&UPPER_MASK)|(mt[kk+1]&LOWER_MASK);
    mt[kk] = mt[kk+(M-N)] ^ (y >> 1) ^ mag01[y & 0x1UL];
}
y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);
mt[N-1] = mt[M-1] ^ (y >> 1) ^ mag01[y & 0x1UL];
 
mti = 0;

That 'y = (mt[N-1]&UPPER_MASK)|(mt[0]&LOWER_MASK);' was missing from the assembler code.

I inserted this into the assembler code:

Code: Select all

'y = (mt((N - 1)) and UPPER_MASK) or (mt(0) and LOWER_MASK)
mov eax, [esi]
and eax,UPPER_MASK
mov ebx, [edi]
and ebx, LOWER_MASK
or eax, ebx
mov ecx,eax
shr eax, 1
Earlier I showed where the FreeBASIC and Assembler output started to drift at 15577 numbers.

I run the same program and got agreement this time so did a test on 100 million numbers and got this.

Code: Select all

FreeBASIC
 100000001 3145756092  257
 100000002 3996574010  258
 
Assembler
 100000001 3145756092  257
 100000002 3996574010  258
I reckon that if there is no drift at this point then it ain't going to happen.

The initialisation in the C code was improved 2002/1/26 but no mention of that missing line being added since the original code. The assembler code was published 2002/Jan/2.

Obviously that missing line is there for a reason but it doesn't appear to be a large influence otherwise 1TB of PractRand should have spotted something.

Anyway, the code posted above has been corrected and the library download has been updated.

It was a good exercise for me because I had to understand what was going on before the insertion and understand what was going on after the insertion. To say that I was pleased to see synchronisation still going on at 100 million numbers is a bit of an understatement. <phew!>
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

Very simple tweak to do with how functions return values. RndMT is now beating option 2 and is 30% faster than option 3.

Code: Select all

1:       68.0 MHz
2:       92.1 MHz
3:       85.8 MHz
4:       83.7 MHz
RndMTS: 111.9 MHz
I won't publish yet as there is one more thing to try.
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

When the C code 'engine' was replaced with assembler the following two assignments were added to the 'engine' function.

Code: Select all

ptrMT = @mt(0)
ptrmag01 = @mag01(0)
I then got bogged down trying to get the assembler to work and then got bogged down with the missing statement. When everything was working the 'tweak' in the last post involved removing an assignment from the head of all the subsidiary functions.

It then dawned on me that although these two assignments aided readability there was no need to have them executed every time a random number was requested. Pretty obvious really but not that important with functions not called much. Passing them millions of times a second would be like putting the brakes on.

Pulling them out and putting them in the code's main body gave this.

Code: Select all

1:       69.4 MHz
2:       92.8 MHz
3:       85.9 MHz
4:       86.0 MHz
RndMTS: 124.7 MHz
The object of this thread was a better method of seeding. It was a bonus to eventually see RndMT going past option 3 in the speed stakes. It was never on the table to take on option 2 but we are now 45% faster than option 3 and 34% faster than option 2, the previous fastest option of the bunch. RndMT is now twice as fast as the opening post.

A few posts into Greg Lyon's translation thread he wrote: "It could be sped up with some inline assembly. There are versions in asm that are pretty darn fast." He was spot on.

We now have

Code: Select all

                 RndMT      FreeBasic MT
Entry points  2^19937 - 1       2^32
Speed           124.7MHz       85.9MHz  (i7-3770K 3.50GHz)
From my first post in this thread.
The above is all good news. Yes, there is some bad news, if it can be called bad news.
To make sure that RndMT still does what it says on the box I got PractRand out. I got four consecutive failures at 256GB. The 1xTB success was before the missing statement was included. I did a search for the missing statement and found many instances including a publication in 2012. So, either that missing statement is a mistake or PractRand is 'playing up'. I then remembered that at the PractRand website the author tests many popular PRNGs but I never read that section. I went back and Mersenne Twister had been tested. It got a good rating but there was an update.
update: PractRand 0.91 now includes a binary rank test in the core test
set. With this change, mt19937 now fails PractRand standard after 256
gigabytes. Oddly enough it doesn't just fail matrix sizes over 19937,
it also fails low-bit tests using smaller matrices (12 thousand). I'm
not sure what that means.
I am using PractRand 0.91.

It is worth noting that Xorishiro128+ also fails binary rank tests. Blackman and Vigna wrote:
Beside passing BigCrush, this generator passes the PractRand test suite
up to (and included) 16TB, with the exception of binary rank tests,
which fail due to the lowest bit being an LFSR; all other bits pass all
tests. We suggest to use a sign test to extract a random Boolean value.
Blackman and Vigna don't question the binary rank tests and, since Vigna has a Laurea in Mathematics and a Ph.D in computer science, then I won't either.

I am not going to question Mersenne Twister either even though I find it very spooky that an apparent oversight by the assembler author allows Mersenne Twister to pass PractRand with flying colours.

So, where do we go from here? I am inclined to leave the missing statement alone. 256GB (64GBs of ULongs) is a lot of data and most of us will never request anything like that. If you think that you will need more than that the simple solution is to go against the grain and reseed. <laugh>

The latest, and now very speedy <smile>, RndMT.bas has been posted in the opening post with a date next to it. If you want the library and bi file here is a link: RndMT.zip

Added: Forgot to mention that UInteger has been replaced with ULong. MrSwiss will, no doubt, approve of that.
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

Previous speed was 124.7 MHz. I am now getting 135.5 MHz. I normally regard significant as being >= 7%. Just scraped over with 8.7%. <smile>

Will now compile with '-gen gcc -Wc -O3'. -O3 was too big a hurdle but we can now get over that with temporary labels.

Code in opening post updated with today's date.
deltarho[1859]
Posts: 4308
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Mersenne Twister: A new twist.

Post by deltarho[1859] »

When I was able to use -O3 I forgot to check that the asm code for Single still had the edge on FreeBASIC. It doesn't and is faster using FreeBASIC; as was found with PCG32II and CryptoRndII.

I am now getting 165MHz. I have changed the output to Double giving 32 bits of granularity and changed the function name from RndMTS to RndMTSE to indicate extended single because it is neither a Single or Double in the classical sense.
Post Reply