A fast CPRNG

Windows specific questions.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Found a bug in RdRand FillBuffer(). Speed has reduced further. I was storing eax + 0 instead of rax. I was moving through the buffer 8 bits at a time instead of 4 - got that right. <smile>. There is now so much conditional assembly going on in RdRand's FillBuffer it is getting close to having two versions of FillBuffer, one for 32 bit and one for 64 bit.

Code above corrected.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Deleted
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

CryptoRndII

CryptoRndII is essentially the same as CryptoRnd but with one exception: CryptoRnd uses ThreadCreate whereas CryptoRndII uses Thread Pooling.

rpkelly, aka Rick Kelly, like me, is an immigrant from PowerBASIC and brought the subject up of Thread Pooling in Windows Thead Pool. I had never heard of it but what caught my eye was not the pooling aspect but the fact that threads were effectively reusable. With FB's ThreadCreate, effectively a wrapper for the Windows API CreateThread, once a secondary thread of execution completes it gets destroyed. It we wanted to repeat a job but use a different parameter then we would need to create another thread. The thread creation overhead may be small compared with the work done by a secondary thread of execution so the exercise is worthwhile. However, if the work being done by a thread is small then it may not be worthwhile. Where we have a large number of thread creations that run for a short time then it may still be a worthwhile exercise but the accumulation of the thread creation overhead does not come cheap. With reusable threads there is still an overhead but it is small compared with thread creation.

Here is a comparison between the FB's PRNGs, CryptoRnd and CryptoRndII.

Code: Select all

Time in seconds to generate 100 million random numbers
 
Default   0     1.301
CRT       1     1.440
Fast      2     0.977
Mersenne  3     1.293
Qbasic    4     1.036
CryptoRnd       0.705
CryptoRndII     0.362 
Here is a comparison between CryptoRnd and CryptoRndII.

Code: Select all

CryptoRnd
Requested 10485760
BufferSize 131072 NumberOfBuffers 80
67.358ms Time to crunch
155 Million per second
Stutter 0.5871898950504352
 
CryptoRndII
Requested 10485760
BufferSize 131072 NumberOfBuffers 80
36.103ms Time to crunch
290 Million per second
Stutter 0.1915569701375841
The exhaustion stutter is mentioned in the opening post of this thread. Using a 128MB buffer, recommended, the stutter should not be noticeable by a user. A 2/3rds reduction in stutter should put to rest any concerns about stutter.

The throughput increase was a surprise: CryptoRndII is nearly twice as fast as CryptoRnd which was fast to start with.

If we generate 100 million random floats using any of the above generators which should average out close to 0.5. If we get 0.3, for example, then we know we have something wrong with our implementation. On the other hand if we do get close to 0.5 this does indicate that our implementation is correct. 100 million x 0.5 will give an average of 0.5 but 100 million x 0.5 is not a set of random numbers. This is where an application such as PractRand comes in.

I did a few simple tests with CryptoRndII and got close to 0.5 on the average test. PractRand may raise anomalies ranging from unusual to FAIL!!! with quite a few descriptions in between. My first test of CryptoRndII using PractRand produced more FAIL!!! anomalies than I had ever seen before. The conversion from ThreadCreate to Thread Pooling was fairly easy but because of the way thread pooling work objects are created I needed one more statement in the buffer switching subroutine. After correcting that omission PractRand passed CryptoRndII was flying colours.

Here is the PractRand output up to one Terabyte.

Code: Select all

rng=RNG_stdin32, seed=0xf2281c93
length= 128 megabytes (2^27 bytes), time= 2.8 seconds
  no anomalies in 117 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 256 megabytes (2^28 bytes), time= 6.0 seconds
  no anomalies in 124 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 512 megabytes (2^29 bytes), time= 12.0 seconds
  no anomalies in 132 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 1 gigabyte (2^30 bytes), time= 23.6 seconds
  no anomalies in 141 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 2 gigabytes (2^31 bytes), time= 46.3 seconds
  no anomalies in 148 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 4 gigabytes (2^32 bytes), time= 90.8 seconds
  no anomalies in 156 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 8 gigabytes (2^33 bytes), time= 181 seconds
  no anomalies in 165 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 16 gigabytes (2^34 bytes), time= 360 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]DC6-9x1Bytes-1           R=  -3.7  p = 0.988     unusual
  [Low8/32]FPF-14+6/16:cross        R=  -2.4  p =1-4.1e-4   unusual
  ...and 170 test result(s) without anomalies
 
rng=RNG_stdin32, seed=0xf2281c93
length= 32 gigabytes (2^35 bytes), time= 713 seconds
  no anomalies in 180 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 64 gigabytes (2^36 bytes), time= 1430 seconds
  Test Name                         Raw       Processed     Evaluation
  [Low8/32]Gap-16:A                 R=  -4.1  p =1-1.7e-3   unusual
  ...and 188 test result(s) without anomalies
 
rng=RNG_stdin32, seed=0xf2281c93
length= 128 gigabytes (2^37 bytes), time= 2855 seconds
  no anomalies in 196 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 256 gigabytes (2^38 bytes), time= 5680 seconds
  no anomalies in 204 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 512 gigabytes (2^39 bytes), time= 11406 seconds
  no anomalies in 213 test result(s)
 
rng=RNG_stdin32, seed=0xf2281c93
length= 1 terabyte (2^40 bytes), time= 22798 seconds
  no anomalies in 220 test result(s)
There are only a few 'unusual' anomalies and we can expect this kind of output when analysing quantum random numbers. To put that another way there is no evidence in the above to suggest that we are not analysing quantum random numbers.

In use example

Code: Select all

const _WIN32_WINNT = &h0602
#Include Once "CryptoRndII.bas"
 
Dim i As Long
 
For i = 1 to 10
  Print CryptoS
Next
 
CleanUpCryptoRndIIBuffer
 
Sleep
If we do not include the first line the Thread Pooling routines will not get loaded. The CleanUp line is important - it closes the Microsoft random number provider and clears the Thread Pooling resources used - memory leak if not done.

64 bit mode

CryptoRnd took a performance hit in 64 bit mode and I did not know why. It improved measurably using the compiler options '-gen gcc -Wc -O1'. I may be wrong but ThreadCreate may not do as well in 64 bit mode as it does in 32 bit mode.

Here is a comparison between the FB's PRNGs, CryptoRnd and CryptoRndII in 64 bit mode using '-gen gcc -Wc -O1'

Code: Select all

Time in seconds to generate 100 million random numbers
 
Default   0     0.592
CRT       1     1.459
Fast      2     0.378
Mersenne  3     0.601
Qbasic    4     0.362
CryptoRnd       0.556
CryptoRndII     0.229
Although CryptoRnd was the fastest in 32 bit mode it still fell short compared with option 2, Multiply-with-carry, in 64 bit mode. CryptoRndII is now faster than option 2. Not to put to fine a point on it CryptoRndII is wiping the floor with everybody and it is generating cryptographic random numbers.

Here is a comparison between CryptoRnd and CryptoRndII in 64 bit mode using '-gen gcc -Wc -O1'

Code: Select all

CryptoRnd
Requested  10485760
BufferSize 131072 NumberOfBuffers 80
55.548ms Time to crunch
188 Million per second
Stutter 0.4376962067809286

CryptoRndII
Requested  10485760
BufferSize 131072 NumberOfBuffers 80
21.218ms Time to crunch
494 Million per second
Stutter 0.003139245721358301
The stutter has almost vanished.

However, look at that throughput. That is pushing 1/2 billion per second.

Intel RdRand

If we use '#define ALGO 2' CryptoRnd and CryptoRndII uses Intel's RdRand. From a speed perspective this does not compete with any of the algorithms above. However, it has benefited from Thread Pooling. In 32 bit mode I am getting 62 million per second and in 64 bit mode, using '-gen gcc -Wc -O1', I am getting 112 million per second.

Here is CryptoRndII.bas ( 2 July 2017 )

Code: Select all

#include once "windows.bi"
#include once "win/bcrypt.bi"
#inclib "bcrypt"
 
#ifndef ALGO
  #define ALGO 1
#endif
 
#if (ALGO = 2)
  Declare Function RtlGenRandom Lib "Advapi32.dll" Alias "SystemFunction036" _
  ( RandomBuffer As Any Ptr, RandomBufferLength As Ulong ) As Byte
#endif

Dim Shared As Byte Buffer0(), Buffer1()
Dim Shared As Long BufferSize
Dim Shared As Any Ptr ptrBuffer, ptrBaseBuffer0, ptrBaseBuffer1
Dim Shared As Any Ptr ptrBaseBuffer0plus, ptrBaseBuffer1plus
Dim Shared As Long SwitchBufferCriteria
Dim Shared Pool As PTP_POOL
Dim Shared As PTP_WORK Work0, Work0plus, Work1, Work1plus
Dim Shared As Byte Ptr hRand 

Declare Sub SwitchBuffer
Declare Sub FillBuffer( As PTP_CALLBACK_INSTANCE, As PVOID Ptr, As PTP_WORK )
Declare Sub ResetBufferPointer
Declare Sub InitializeCryptoBuffers( As Long )
 
#If (ALGO = 1)
  BufferSize = 128*1024
#Else
  BufferSize = 32*1024
#Endif
 
Pool = CreateThreadpool(Null)
InitializeCryptoBuffers( BufferSize )
 
Private Function CryptoDW As Ulong
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  Asm
    mov eax, dword Ptr [ptrBuffer]
    mov eax, [eax]
    mov [Function], eax
  End Asm
 
  ptrBuffer += 4
 
End Function
 
Private Function CryptoS As Single ' [0,1)
Dim As Ulong TempVar
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM by Wilbert @ PureBasic forums
    Asm
	  	mov eax, dword Ptr [ptrBuffer]
	  	mov eax, [eax]
	  	mov dword Ptr [TempVar], eax
	  End Asm
	  Return TempVar/4294967296.0

  ptrBuffer += 4
 
End Function
 
Private Function CryptoSX As Single ' [-1,1]
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM adapted from CryptoS by author
  Asm
    mov eax, dword Ptr [ptrBuffer]
    mov eax, [eax]
    movd xmm0, eax
    psrlq xmm0, 9
    mov edx, 2
    cvtsi2ss xmm1, edx
    por xmm0, xmm1
    subss xmm0, xmm1
    mov edx, 1
    cvtsi2ss xmm1, edx
    subss xmm0, xmm1
    movd [Function], xmm0
  End Asm
 
  ptrBuffer += 4
 
End Function
 
Private Function CryptoD As Double  ' [0,1)
Dim As Ulongint uld
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM by Wilbert at PureBasic forums
  Asm
    mov eax, dword Ptr [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 1
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
 
  ptrBuffer += 8
 
End Function
 
Private Function CryptoDX As Double  ' [-1,1]
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM adapted from CryptoD by author
  Asm
    mov eax, dword Ptr [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 2
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    mov eax, 1
    cvtsi2sd xmm1, eax
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
 
  ptrBuffer += 8
 
End Function
 
Private Function CryptoR( Byval One As Long, Byval Two As Long ) As Long
Dim As Ulong TempVar
 
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
' ASM by John Gleason @ PowerBASIC forums
	Asm
  mov edx, dword Ptr [ptrBuffer]
  mov edx, [edx]
  mov ecx, [One]
  mov eax, [Two]
  cmp ecx, eax
  jl 0f
  xchg eax, ecx
0:
  Sub eax, ecx
  inc eax
  jz 1f
  mul edx
  Add edx, ecx
1:
  mov [Function], edx
	End Asm

	
  ptrBuffer += 4
 
End Function
 
Private Function Gauss As Single
Static As Long u2_cached
Static As Single u1, u2, x1, x2, w
 
  If u2_cached = -1 Then
    u2_cached = 0
    Function = u2
  Else
    Do
      x1 = CryptoS
      x2 = CryptoS
      w = x1 * x1 + x2 * x2
    Loop While w >= 1
    w = Sqr( -2 * Log(w)/w )
    u1 = x1 * w
    u2 = x2 * w
    u2_cached = -1
    Function = u1
  End If
 
End Function
 
Private Sub InitializeCryptoBuffers( Byval Buffer As Long )
Dim As Byte Ptr hRand

  #If (ALGO = 1)
    BCryptOpenALGOrithmProvider( Varptr(hRand), BCRYPT_RNG_ALGORITHM, 0, 0)
  #endif
  If Buffer < 1024 Then
    BufferSize = 1024
  Else
    BufferSize = Buffer - Buffer Mod 8
  End If
  Redim Buffer0( 1 To BufferSize) As Byte
  ptrBaseBuffer0 = Varptr( Buffer0(1) )
  ptrBuffer = ptrBaseBuffer0
  #ifdef __FB_64BIT__
    SwitchBufferCriteria = Cast(Longint, ptrBuffer) + BufferSize
  #Else
    SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  #endif
  Work0 = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK, @FillBuffer), @ptrBaseBuffer0, Null)
  ptrBaseBuffer0plus = ptrBaseBuffer0 + BufferSize\2
  Work0plus = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK,@FillBuffer), @ptrBaseBuffer0plus, Null)
  SubmitThreadpoolWork(Work0)
  SubmitThreadpoolWork(Work0plus)
  Redim Buffer1( 1 To BufferSize) As Byte
  ptrBaseBuffer1 = Varptr( Buffer1(1) )
  Work1 = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK,@FillBuffer), @ptrBaseBuffer1, Null)
  ptrBaseBuffer1plus = ptrBaseBuffer1 + BufferSize\2
  Work1plus = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK,@FillBuffer), @ptrBaseBuffer1plus, Null)
  SubmitThreadpoolWork(Work1)
  SubmitThreadpoolWork(Work1plus)
  WaitForThreadpoolWorkCallbacks(Work0,FALSE)
  WaitForThreadpoolWorkCallbacks(Work0plus,FALSE)
  ' We don't need Work0 related objects again.
  CloseThreadpoolWork(Work0)
  CloseThreadpoolWork(Work0plus)
End Sub
 
#if (ALGO = 1)
  Private Sub CleanUpCryptoRndIIBuffer
    BCryptCloseALGOrithmProvider( hRand, 0  )
    CloseThreadpoolWork(Work1)
    CloseThreadpoolWork(Work1plus)
    CloseThreadpool(Pool)
  End Sub
#endif
 
#If (ALGO = 1)
Private Sub FillBuffer( Instance As PTP_CALLBACK_INSTANCE, Context As PVOID Ptr, Work As PTP_WORK)
Dim BaseBuffer As Any Ptr
  BaseBuffer = *Context
  BCryptGenRandom( hRand, BaseBuffer, BufferSize\2, 0)
End Sub
#Else
Private Sub FillBuffer(Byval Instance As PTP_CALLBACK_INSTANCE, Byval Context As PVOID Ptr, Byval Work As PTP_WORK)
Dim BaseBuffer As Any Ptr
Dim As Long HalfBuffer
Dim As Ulong RecoverBuffer
Dim As Any Ptr ptrRecoverBuffer
 
  BaseBuffer = *Context
  ptrRecoverBuffer = Varptr(RecoverBuffer)
 
  HalfBuffer = BufferSize\2
  Asm
    mov edi, dword Ptr [HalfBuffer]
    mov esi, 0
    mov ebx, dword Ptr [BaseBuffer]
  rptRdRand:
    mov ecx, 10 ' Max number Of tries before going into a recovery
  queryAgain:
  #ifdef __FB_64BIT__
    RdRand rax
  #Else
    RdRand eax
  #endif
    jc OK ' A Random value was available
    dec ecx
    jnz queryAgain
    Call Recover ' Use RtlGenRandom For This ULong
  OK:
    #ifdef __FB_64BIT__
      mov qword Ptr [ebx + esi], rax ' Store RdRand
      Add esi, 8
    #Else
      mov dword Ptr [ebx + esi], eax ' Store RdRand
      Add esi, 4
    #endif
      cmp edi, esi
      jne rptRdRand
      jmp Done
  Recover:
  #ifndef __FB_64BIT__
    pushad ' I am playing it safe here
  #endif
  End Asm
  #ifdef __FB_64BIT__
    RtlGenRandom(ptrRecoverBuffer, 8)  ' Populate buffer
  #Else
    RtlGenRandom(ptrRecoverBuffer, 4)
  #endif
  Asm
  #ifndef __FB_64BIT__
    popad
  #endif
  #ifdef __FB_64BIT__
    mov rax, qword Ptr [ptrRecoverBuffer]
  #Else
    mov eax, dword Ptr [ptrRecoverBuffer]
  #endif
    ret
  Done:
  End Asm
 
End Sub
#Endif
 
Private Sub SwitchBuffer
  WaitForThreadpoolWorkCallbacks(Work1,FALSE)
  WaitForThreadpoolWorkCallbacks(Work1plus,FALSE)
  Swap ptrBaseBuffer0, ptrBaseBuffer1
  Swap ptrBaseBuffer0plus, ptrBaseBuffer1plus
  ptrBuffer = ptrBaseBuffer0
  #ifdef __FB_64BIT__
    SwitchBufferCriteria = Cast(Longint, ptrBuffer) + BufferSize
  #Else
    SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  #endif
  SubmitThreadpoolWork(Work1)
  SubmitThreadpoolWork(Work1plus)
End Sub
 
Private Sub ResetBufferPointer
  ptrBuffer = ptrBaseBuffer0
End Sub
Last edited by deltarho[1859] on Jul 02, 2017 21:51, edited 1 time in total.
St_W
Posts: 1619
Joined: Feb 11, 2009 14:24
Location: Austria
Contact:

Re: A fast CPRNG

Post by St_W »

deltarho[1859] wrote:CryptoRnd took a performance hit in 64 bit mode and I did not know why. It improved measurably using the compiler options '-gen gcc -Wc -O1'. I may be wrong but ThreadCreate may not do as well in 64 bit mode as it does in 32 bit mode.
Do you use "-gen gcc" in 32-bit mode too? If you don't that's potentially a big speed improvement as GCC is way better in optimizing code than FBC is. By default FBC uses the GAS backend for 32-bit builds, so there are only the optimizations done by FBC. For 64-bit builds, however, the GCC backend is the default one (because a GAS backend for x86_64 hasn't been implemented up to now) - bringing all the optimizations for free.
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Blimey!

I have only used the gcc optimisation for 64 bit builds - I didn't know gcc could be used for 32 bit builds.

Using '-gen gcc -Wc -O1' in 32 bit mode the opening timings of my last post are now

Code: Select all

Time in seconds to generate 100 million random numbers
 
Default   0     1.155
CRT       1     1.399
Fast      2     0.942
Mersenne  3     1.166
Qbasic    4     1.019
CryptoRnd       0.614
CryptoRndII     0.212
Everyone has benefited to varying degrees but CryptoRndII has benefited the most by a long way.

With regard the second timings here is a comparison between CryptoRnd and CryptoRndII in 32bit mode using '-gen gcc -Wc -O1'

Code: Select all

CryptoRnd
Requested  10485760
BufferSize 131072 NumberOfBuffers 80
67.061ms Time to crunch
156 Million per second
Stutter 0.5834303418895867
 
CryptoRndII
Requested  10485760
BufferSize 131072 NumberOfBuffers 80
23.386ms Time to crunch
448 Million per second
Stutter 0.03058227442487887
CryptoRndII has gone through the roof and the stutter has collapsed.
GCC is way better in optimizing code than FBC is
You can say that again!

This goes to show that thread creation/destruction in this context is a lot more expensive then thread pooling than I thought it was.

So, with CryptoRndII we can get 448 million per second in 32 bit mode and 494 million per second in 64 bit mode.

In fact, in 32 bit mode CryptoRndII is now over four times faster than option 2, the previous record holder before CryptoRnd came along.

If anyone told me only a year ago that these sort of figures could be got from cryptographic random numbers I would have reported them to the authorities.

Thanks, St_W, I need to have a lie down in a dark room now.<smile>
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

In optimised 32 bit mode CryptoRndII is 5.5 times faster than FB's Mersenne Twister.

In real world situations the ratio will vary and is dependent upon the environment the random numbers find themselves in.

The following code uses two billion random numbers in a Monte Carlo estimate of PI.

Code: Select all

const _WIN32_WINNT = &h0602
#include Once "CryptoRndII.bas"

Dim As Ulong n, lCount
Dim As Long i
Dim As Single x, y
Dim t As Double

n = 1000000000

t = timer
For i = 1 to n
  x = CryptoS
  y = CryptoS
  If Sqr( x*x + y*y ) <= 1 Then lCount += 1
Next
Print timer - t;" seconds"
Print "Estimate of Pi:";4*lCount/n
 
CleanUpCryptoRndIIBuffer

Sleep

' Pi = 3.14159265358979323846
On my machine I get:

Code: Select all

 11.34849575335599 seconds
Estimate of Pi: 3.141576536
With Mersenne Twister I get:

Code: Select all

 35.70943039623867 seconds
Estimate of Pi: 3.141638968
This job usually takes some time - do we have time to go to the pub for a swift pint?
We could yesterday with Mersenne Twister but we are now using CryptoRndII - put the kettle on!
rpkelly
Posts: 52
Joined: Sep 03, 2016 22:36

Re: A fast CPRNG

Post by rpkelly »

You are da man Dave. Have a pint on me....

Rick
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Have a pint on me....
Very good of you to offer, Rick, but I gave up drinking sometime ago.

I also gave up smoking sometime ago.

Giving up women was the last thing to go.

I did all of that to live longer - I cannot afford the funeral costs at the moment.

Actually, I am joking. The day I give up on women will be the day of my funeral.

<smile>

Just upgraded PractRand 0.91 to 0.92 - it runs twice as fast. Just as well, one terabyte with 0.91 took over 6 hours with an i7-3770K. I don't expect anything untoward. Any negatives and I will report back.
rpkelly
Posts: 52
Joined: Sep 03, 2016 22:36

Re: A fast CPRNG

Post by rpkelly »

Well Dave, I'm just another older guy now and pursue trout, salmon, and, grayling with my beloved bamboo with flies I tie. In between fishing trips, I sling a bit of code around and hike and travel with my wife who has beat me to retirement by a few years. I do like a pint a few times per month and don't worry about much of anything any longer...
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

don't worry about much of anything any longer...
It is a pity that we have to go around the block so many times to get there but, I suppose, it is better late than never.

PractRand 0.92 just came in with one terabyte and only two anomalies this time, one at 1GB and the other at 512GB. I think that I have mentioned this before but a terabyte without a single anomaly would be, for me, grounds for concern. The normal distribution has long tails and to never enter them at some point would be a contradiction. It would contradict a normal distribution and if we don't have a normal distribution then we don't have random numbers. If that ever happened I would let PractRand run to two terabytes and sit there with my fingers crossed hoping for an anomaly. We strive for perfection but woe betide us if we ever achieve it. <Ha, ha>
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Just to get things into perspective here is a little snippet which simply takes a yellow square and fills it with blue dots at random - the aim being to end up with a blue square.

As is, we are using the Mersenne Twister.

The job gets done in 0.6 seconds.

Remember, CryptoS is about 5.5 times faster than the Mersenne Twister.

Comment the MT and uncomment the CryptoS line and run again.

This time I get 0.48 seconds.

So, what happened to the 5.5 times?

Firstly, we are only requesting 13,440,000 random numbers but more importantly, in CPU terms, PSet is very slow.

To bring the point really home if we replace the Rnd line with PSet (400, 300 ) this is tantamount to using a generator of infinite speed.

In this case I get 0.37 seconds with a yellow square and a single blue dot in the middle.

So, with this example the speed of the random number generator hardly makes a blind bit of difference.

You may feel that with such situations you may as well stay with the Mersenne Twister. On speed grounds I would agree but if you are in need of top drawer randomness then don't forget that the Mersenne Twister is a PRNG and CryptoRndII is a CPRNG.

Code: Select all

' Compile as GUI

Const _WIN32_WINNT = &h0602

#Include Once "CryptoRndII.bas"
 
Dim As Long i
Dim As Double t

Screen 19, 32
Color Rgb(0, 0, 255), Rgb(255, 255, 0)
Cls 

Randomize , 3

t = Timer
For i = 1 To 14*800*600
  Pset (Rnd*800, Rnd*600) ' Mersenne Twister
  'PSet (CryptoS*800, CryptoS*600)
Next
t = Timer - t

CleanUpCryptoRndIIBuffer
 
Locate 1,1
Print "Done";" ";Str(t)

Sleep
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Found a silly error in my timing code which resulted in an understatement of the times and the time allowed to determine a throughput was also increased to give a better value.

Using '-gen gcc -Wc -O1' in both 32 bit and 64 bit mode I get:

Code: Select all

Random number generation
Millions per second

                  32 bit    64 bit
Default    0      87.348   199.253
CRT        1      71.780    74.730
Fast       2     107.853   346.943
Mersenne   3      87.458   199.211
Qbasic     4      99.372   320.740
CryptoRndII      506.170   511.902
Funny thing with these figures is that CryptoRndII isn't bothered whether it is used in 32 bit or 64 bit mode. The reason is probably because 99% of the 'running' code is in 32 bit assembler with most of the FreeBASIC stuff found in the initialisation.

So, there we have it - a half billion per second cryptographic single precision pseudo random number generator written with FreeBASIC. Actually, it does more than single precision but I didn't want to 'rattle' on.<smile>
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

HEALTH WARNING

Don't use either CryptoRnd or CryptoRndII in more than one thread of execution until further notice. Windows gets very upset if another thread calls by whilst we are switching buffers, more than 128KB of data requests.

Image

Some UK members may remember that image from way back. It allowed us to tune our televisions whilst the broadcaster was frantically trying to resume normal service. That little girl is now pushing 60 - well, I hope she is. Wonder what she is doing tonight. <laugh>
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Update

During a buffer switch a mutex was employed and Windows no longer gets upset. If the routines are only used in a single thread the throughput is significantly reduced but, in this case, we don't need a mutex. When used in two threads the throughput is reduced further. This tells me that thread safety has been compromised. So, it looks like we had two issues: Buffer switching in a two thread environment; a memory variable conflict in a two thread environment.

Neither routine uses a state vector so we cannot have a memory variable conflict in that context. That leaves shared/global memory variables. Some of them are created during initialisation and do not change during an application session so whether we use one thread or more is irrelevant. That leaves shared/global memory variables which may change and they are the ones needing attention in the event of using more than one thread. If that can be resolved we still have to use a mutex. Having said that there is a way to avoid that. My CPG32 was recently made thread safe and St_W came up with an idea which improved my solution and that may be the way forward here.

Folks may disagree but I do not regard the ability to have CryptoRndII stable with multiple thread usage a burning issue so this work is going to go on my to do list.

So, "Don't use either CryptoRnd or CryptoRndII in more than one thread of execution until further notice."
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

CryptoRndII.bas has been replaced above at 'Here is CryptoRndII.bas ( 2 July 2017 )'

It has been given the same tweak as PCG32II and adjusted to ensure that the optimization level -O3 can be used.

I have just posted this table in the PCG32II thread.

Code: Select all

Random number generation
Millions per second

               32 bit   64 bit
Default   0    84.420   185.919
CRT       1    69.564    72.295
Fast      2    106.741  331.956
Mersenne  3    86.903   192.040
QBasic    4    99.874   323.384
CryptoRndII   588.233   972.532
PCG32II       493.647   974.215
CryptoRndII is now about 16% faster in 32 bit mode and has gone through the roof in 64 bit mode. Previously 64 bit mode was not much faster than 32 bit mode.

I don't think I will bother to try and make CryptoRndII thread safe. PCG32II is almost as fast, has top drawer randomness quality and is thread safe. Some folk may decide to make PCG32II their default random number generator - I am leaning toward that myself - it only uses 96 lines of code including line spaces and comments.
Post Reply