A fast CPRNG

Windows specific questions.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

A fast CPRNG

Post by deltarho[1859] »

We are spoilt for choice with regard random number generators.

Algorithm 5 provides cryptographic random numbers. For Windows, the CryptoAPI is used via CryptGenRandom. A better quality generator is with 'Cryptography API: Next Generation' introduced in Windows Vista via BCryptGenRandom. Both generators are designed for filling buffers. Algorithm 5 is OK for requesting the odd random number now and again but if we want access to a lot and fast then algorithm 5 is not the way to go.

On my machine Mersenne Twister knocked out a million random numbers in 13.004ms with one test. Algorithm 5 took 2361.078ms with one run. The method employed here uses BCryptGenRandom ( Vista and older remember ) and knocked out a million random numbers in 5.734ms with one test. That is more than twice as fast as Mersenne Twister and we are talking about a CPRNG as opposed to a PRNG.

Anyway, here is the method employed.

We have two equally sized buffers A and B.

'A' is split into two equally sized buffers A0 and A1 and they are filled by two secondary threads of execution. On completion, 'B' is treated the same way and 'A' becomes available for use.

When 'A' is exhausted we switch to 'B', start to fill 'A' and 'B' becomes available for use.

Code: Select all

|_________A________| |_________B_________|
|    0    |    1   | |    0    |    1    |
and so on.

If 'B', for example, is not filled when 'A' exhausts then we will have to wait and there will be a stutter. The stutter is directly proportional to the size of 'A' or 'B'. We call this stutter the exhaustion stutter. If 'B', for example, is filled when 'A' exhausts then there will be no exhaustion stutter. The original version of this method only used one secondary thread of execution. Using two saw a significant reduction in the exhaustion stutter.

When we switch from one buffer to the other there will be a stutter and this stutter is independent of the size of 'A' or 'B'. We call this stutter the switch stutter.

If we use a sufficiently large buffer and request less then there will no stutter at all. If we then use a smaller buffer and request the same number such that we switch often then subtracting the smaller time from the larger time and divide by the number of switches then we will determine the average combined stutter; that is the exhaustion stutter, if any, and the switch stutter.

That is easy. However, determining either the exhaustion or switch stutter is not. Including analysis code in a neighbourhood of the switch will affect the stutter and we have, effectively, Heisenberg's uncertainty principle applying. <laugh>

I think I will leave that as a reader exercise because I am getting a combined stutter, averaging several tests, of about 50ns. No, you did not misread that - 50 nanoseconds.

An analogy is the fly-back with a cathode ray tube - if it is fast enough we are not conscious of it.

It is worth noting that in the above we are exhausting as fast as possible. In practice, we will be using the random numbers whilst the 'filling' is doing nothing else but so the most likely scenario is the buffer to be used next will be sat there twiddling its thumbs whispering "I am ready when you are, neighbour.".

The following code section is the generator. It is in inc form. I am not up to full speed yet on creating libraries so, be my guest. <smile>

Functions:
CyptoDW - Generates Ulongs.
CryptoS - Single precision [0,1]
CryptoSX - Single precision [-1,1]
CryptoD - Double precision [0,1]
CryptoDX - Double precision [-1,1]
CryotoR - [ Long, Long ]

It is worth noting that Mersenne Twister "provides 32-bit granularity" so it is cast as a double. The 'double' in CryptoD and CryptoDX is a genuine double; it uses two 32 bit random numbers from the generator.

CryptoRndBufferCNG.inc

Code: Select all

#include once "windows.bi"
#include once "win/bcrypt.bi"
#inclib "bcrypt"

Dim Shared As Byte Ptr hRand
Dim Shared As Byte Buffer0(), Buffer1()
Dim Shared As Long BufferSize
Dim Shared As Any Ptr hThread()
Dim Shared As Any Ptr ptrBuffer, ptrBaseBuffer0, ptrBaseBuffer1 
Dim Shared As Long SwitchBufferCriteria

Declare Sub SwitchBuffer
Declare Sub FillBuffer( As Any Ptr )

Private Function CryptoDW As ULong

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
  
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    mov [Function], eax
  End Asm
  
  ptrBuffer += 4
  
End Function

Private Function CryptoS As Single ' [0,1)
  
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If

  ' ASM by Wilbert @ PureBasic forums
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    movd xmm0, eax
    psrlq xmm0, 9
    mov eax, 1
    cvtsi2ss xmm1, eax
    por xmm0, xmm1
    subss xmm0, xmm1
    movd [Function], xmm0
  End Asm
  
  ptrBuffer += 4
 
End Function

Private Function CryptoSX As Single ' [-1,1]

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If

  ' ASM adapted from CryptoS by author
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    movd xmm0, eax
    psrlq xmm0, 9
    mov edx, 2
    cvtsi2ss xmm1, edx
    por xmm0, xmm1
    subss xmm0, xmm1
    mov edx, 1
    cvtsi2ss xmm1, edx
    subss xmm0, xmm1
    movd [Function], xmm0
  End Asm
  
  ptrBuffer += 4
  
End Function

Private Function CryptoD As Double  ' [0,1)
  
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If  
  
  ' ASM by Wilbert at PureBasic forums
  Asm
    mov eax, [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 1
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
  
  ptrBuffer += 8

End Function

Private Function CryptoDX As Double  ' [-1,1]

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM adapted from CryptoD by author
  Asm
    mov eax, [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 2
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    mov eax, 1
    cvtsi2sd xmm1, eax
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
  
  ptrBuffer += 8

End Function

Private Function CryptoR( ByVal One As Long, ByVal Two As Long ) As Long

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
  
  ' ASM by John Gleason @ PowerBASIC forums 
  Asm
    mov edx, [ptrBuffer]
    mov edx, [edx]
    mov ecx, [One]
    mov eax, [Two]
    cmp ecx, eax
    jl Now1LT2
    xchg eax, ecx
Now1LT2:
    Sub eax, ecx
    inc eax
    jz doTheRnd
    mul edx
    Add edx, ecx
doTheRnd:
    mov [Function], edx
  End Asm
  
  ptrBuffer += 4
  
End Function

Private Sub InitializeCryptoBuffers( ByVal Buffer As Long )
  BCryptOpenAlgorithmProvider( varptr(hRand), BCRYPT_RNG_ALGORITHM, 0, 0)
  ReDim As Any Ptr hThread(0 To 1)
  If Buffer < 1024 Then
    BufferSize = 1024
  Else
    BufferSize = Buffer - Buffer Mod 8
  End If
  ReDim Buffer0( 1 To BufferSize) As Byte
  ptrBaseBuffer0 = VarPtr( Buffer0(0) )
  ptrBuffer = ptrBaseBuffer0
  SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer0 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer0 + BufferSize\2 )
  ThreadWait( hThread(0) )
  ThreadWait( hThread(1) )
  ReDim Buffer1( 1 To BufferSize) As Byte
  ptrBaseBuffer1 = VarPtr( Buffer1(0) )
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 + BufferSize\2 )
  
End Sub

Private Sub CleanUpCryptoRndBufferCNG
  BCryptCloseAlgorithmProvider( hRand, 0  )
End Sub

Private Sub FillBuffer( ByVal BaseBuffer As Any Ptr )
  BCryptGenRandom( hRand, BaseBuffer, BufferSize\2, 0)
End Sub

Private Sub SwitchBuffer
  ThreadWait( hThread(0) )
  ThreadWait( hThread(1) )
  Swap ptrBaseBuffer0, ptrBaseBuffer1
  ptrBuffer = ptrBaseBuffer0
  SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 + BufferSize\2 )
End Sub
A lot of cryptographic code fails not with the algorithm employed but with its implementation. I GPF'd a few times because the assembler syntax differs a little to what I have been used to. I intend putting the code 'through the wringer' via statistical suites, PractRand if I can, but in the meantime here are a couple of simple tests.

The first one simply takes a black screen and zaps it with random white pixels, pausing to show the progression to a white screen. If you can see any black pixels on completion then take you monitor back, you paid good money for it. <laugh>. Actually, there probably are a few black pixels - we are talking random numbers here so some pixels may escape attention.

Plot ( Compile as GUI )

Code: Select all

' Compile as GUI
#Include Once "CryptoRndBufferCNG.inc"

Dim as Long i, j

ScreenRes 640, 640, 8

InitializeCryptoBuffers( 128*1024 )

Randomize

Sleep (1000,1)
j = 1
For i = 1 to 10*640*640
  PSet (CryptoR(11,631), CryptoR(11,631)), 15
  'PSet (11 + Rnd*621, 11 + Rnd*621), 15 ' Mersenne Twister
  If i Mod 640*640 = 0 Then
    Locate 1,1
    Print j
    j += 1
    Sleep (750,1)
  End If
Next

Locate 1,1
Print "Done"
Sleep

The next test uses Fourmilab's ent.exe. The code dumps 25MB of CryptoDW giving a file of 100MB random Ulongs.

Dump ( Compile as Console )

Code: Select all

' Compile as Console
#Include Once "CryptoRndBufferCNG.inc"
#Include "file.bi"

Dim as long i
InitializeCryptoBuffers( 128*1024 )
If FileExists( "100MB.txt" ) Then Kill "100MB.txt"
Open "100MB.txt" For Binary as #1
  For i = 1 to 25*1024*1024
    Put #1, , CryptoDW
  Next
Close #1
Shell( "ent.exe 100MB.txt")
Sleep
Here is a typical output

------------------------------------------------------------------------
Entropy = 7.999998 bits per byte.

Optimum compression would reduce the size
of this 104857600 byte file by 0 percent.

Chi square distribution for 104857600 samples is 255.54, and randomly
would exceed this value 47.88 percent of the times.

Arithmetic mean value of data bytes is 127.5064 (127.5 = random).
Monte Carlo value for Pi is 3.141566969 (error 0.00 percent).
Serial correlation coefficient is 0.000028 (totally uncorrelated = 0.0).
------------------------------------------------------------------------

No grounds for concern there.

Finally, code to test how long it takes to fill a chosen buffer, how long it takes to 'crunch' a requested number and an estimate of the rate of generation.

TestCrypto ( Compile as Console )

Code: Select all

' Compile as Console
#Include Once "MacroTimersQPC.inc"
#Include Once "CryptoRndBufferCNG.inc"

Dim as Long lBufferSize, lNumBuffers, lRequestNumber, i
Dim As Double Tot
Dim As Single Dummy

lBufferSize = 512*1024
lNumBuffers =  20
lRequestNumber = lBufferSize*lNumBuffers\4 ' 4 For Single precision and Range, 8 for double precision
Print "Requested " + str(lRequestNumber)

If lBufferSize < 1024 Then
  lBufferSize = 1024
Else
  lBufferSize = lBufferSize - lBufferSize Mod 8
End If 
Print "BufferSize";lBufferSize

StartTimer(0)
InitializeCryptoBuffers( lBuffersize )
StopTimer(0)
Print sTimeTaken(0,3,0) + " Time to fill"

StartTimer(0)
For i = 1 to lRequestNumber
  CryptoS
Next
StopTimer(0)
Dummy = Val(sTimeTaken(0,3,0))
Print str(Dummy) + "ms Time to crunch"
Print str(Int(lRequestNumber/(1000*Dummy))) + " Million per second"

CleanUpCryptoRndBufferCNG

Sleep
Important: The code using the functions is bracketed by InitializeCryptoBuffers( ) and CleanUpCryptoRndBufferCNG.

Here is the output of using a buffer of 512KB and a request of 20 buffers giving 2,621,440 random single precision numbers.
-----------------------
Requested 2621440
BufferSize 524288
.684ms Time to fill
10.265ms Time to crunch
255 Million per second
-----------------------

What bothers me is the rate of generation at 255 million per second.

I did not expect that. That is three times faster than Mersenne Twister.

A challenge: Break it!

Have fun.

David Roberts

PS Just for the record I am using Windows 10 Pro, Intel i7-3770K CPU @ 3.50GHz, 3.90GHz with turbo.
greenink
Posts: 200
Joined: Jan 28, 2016 15:45

Re: A fast CPRNG

Post by greenink »

I think a better way or at least faster way to convent a 32/64 bit random integer is just to multiply by a floating point constant (here for 64 bit ULongInt):

Code: Select all

	' (0,1]
	function rndSingle() as single
		return rand()*5.4210107e-20!
	end function

	' [-1,1]
	function rndSingleSym() as single
		return cast(longint,rand())*1.0842021e-19!
	end function
In assembly the conversion (with GCC and -O 3) is just variations of:

Code: Select all

cvtsi2ss	xmm0, rdx	'convert the 64 bit random number to a 32 bit float
mulss	xmm0, DWORD PTR .LC0[rip]    ' multiply by the constant
Last edited by greenink on Feb 03, 2017 6:07, edited 1 time in total.
greenink
Posts: 200
Joined: Jan 28, 2016 15:45

Re: A fast CPRNG

Post by greenink »

Anyway the main use of pseudo-random number generators in most applications is correlation breaking, which fortunately isn't too difficult.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

or 2.328306436538696289063e-10 ( from 1/2^32 ) for 32 bit [0,1), but where is the fun in that?
Anyway the main use of pseudo-random number generators in most applications is correlation breaking, which fortunately isn't too difficult.
I was wondering what that has to do with the price of bacon but then I thought it might. <smile>

Anyway, I am more interested in ideas than what use they may have.
greenink
Posts: 200
Joined: Jan 28, 2016 15:45

Re: A fast CPRNG

Post by greenink »

I suppose you could try to control markets by correlation breaking, that's maybe one for the Simons institute and the extremely rich Mr. Simons.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

multiply by a floating point constant
Did some tests and averaged.

125MB of CryptoS took 493.309ms with the constant multiplier approach and 467.097ms with the code as published.

I thought your constant multiplier may have an edge but I was wrong, although we only have a difference of 5%.

Credit to Wilbert at the PureBasic forums; and credited in the published code.

Of course, the constant multiplier gives readable code, provided we comment like '( from 1/2^32 )'. I will give in to non-readable code if the blinkers are on pursuing faster code. I must confess that I have blinkers on most of the time. <smile>
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

I now have a library: http://deltarho.org.uk/Downloads/CryptoRnd.zip

The download is a zipped folder which unzips to CryptoRnd where you will find CryptoRnd.bi and libCryptoRnd.a.

The library is only 4612 bytes.

I thought that using a library would slow things down a bit, but it hasn't.

For the time being put the files into the folder where your exe resides. When I have finished testing you can then put them into the usual FreeBASIC locations.

Example usage:

Code: Select all

#include once "CryptoRnd.bi"
 
Dim As Long i, j
Dim As Double Tot
 
InitializeCryptoBuffers( 512*1024 )
 
j = 100000000 ' 100 million
 
For i = 1 to j
  Tot += CryptoS
Next
Print Tot/j
 
Tot = 0
For i = 1 to j
  Tot += CryptoD
Next
Print Tot/j
 
Tot = 0
For i = 1 to j
  Tot += CryptoSX
Next
Print Tot/j
 
Tot = 0
For i = 1 to j
  Tot += CryptoDX
Next
Print Tot/j
 
Tot = 0
For i = 1 to j
  Tot += CryptoR(0,255)
Next
Print Tot/j
 
CleanUpCryptoRnd
 
Sleep
Note that the clean up command is now called CleanUpCryptoRnd.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Here is a typical run of the above:

Code: Select all

 0.5000036318678915
 0.5000035281555593
-9.399479785203934e-005
 6.767945995862125e-006
 127.49835567
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

I am having problems piping to PractRand so I dumped 256MB of CryptoDW ( 1GB ) to disk. 32MB, 64MB, 128MB, 256MB, 512MB and 1GB were tested. Four of them had no anomalies and two had one unusual value. Nothing wrong here - I have seen worse with Intel RdRand and quantum random number dumps.

While I was at it I dumped 1GB of Mersenne Twister bytes ( 0 to 255 ) and got no anomalies at all. Not bad for a 20 year old generator. Perhaps I shouldn't mention this but PowerBASIC's RND was torn apart by PractRand and it did not get past 32MB - PractRand throws the towel in if too many failures; as opposed to anomalies. PowerBASIC's RND is proprietary so I am not sure what is used.

I will carry on as I'd like to test well past 1GB.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Tested to 64GB - only one anomaly at 512MB. I haven't seen that many tests do that well. If any random number generator never finds itself in the tail of a bell curve then we would have a contradiction.

128GB would take a further 140 minutes and doubling each time. A TB would take a further 35 hours. My CPU was getting a bit warm. <smile>

I reckon that the library is good to go now.

Estimating Pi using Monte Carlo

Code: Select all

#Include Once "MacroTimersQPC.inc"
#include once "CryptoRnd.bi"
 
Dim As Ulong n, lCount
Dim As Long i
Dim As Single x, y
 
InitializeCryptoBuffers( 512*1024 )
 
n = 1000000000 ' One billion
 
StartTimer(0)
For i = 1 to n
  x = CryptoD
  y = CryptoD
  If Sqr( x*x + y*y) <= 1 Then lCount += 1
Next
StopTimer(0)
Print Str( Val( sTimeTaken(0,3,0) ) ) + "ms"
Print "Estimate of Pi:";4*lCount/n
 
CleanUpCryptoRnd
 
Sleep
 
' Pi = 3.14159265358979323846

One run:
16773.199ms
Estimate of Pi: 3.141599716

Not that accurate but just shy of 6 significant figures which isn't bad for 17 seconds crunching.
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

I am having problems piping to PractRand ...
I was using MKI without realising that what I actually needed was a MKUI, unsigned integer, and that doesn't exist. <laugh>

So, what I am now doing is to populate a Ulong array with CryptoDW, copy the contents to a string and then stream the strings via a pipe to PractRand.

It may seem a bit of a 'dog's dinner' but it is better than dumping to a very large file and then reading it.

With PractRand we use, at the command prompt: My_RNG | RNG_test stdin

where My_RNG is the code below and RNG_test is supplied by the author of PractRand.

Code: Select all

#include once "CryptoRnd.bi"
 
Dim As Long i, j
Dim As Ulong UL(1 To 256) ' 1024 bytes
Dim As Ulong Ptr PtrUL
Dim As String S
Dim As ZString Ptr Ptrs
 
PtrUL = VarPtr( UL(1) )
S = Space(1024)
PtrS = StrPtr ( S )
 
InitializeCryptoBuffers( 512*1024 )
 
For i = 1 to 64*1024*1024 ' 64MB
  For j = 1 to 256 ' Populate UL, resulting in a test of 64GB
    UL(j) = CryptoDW
  Next
  Asm ' Copy array UL memory to string S memory
    cld
    mov ecx, 256
    mov esi, [PtrUL]
    mov edi, [PtrS]
    rep movsd
  End Asm
  Print S;
Next
 
CleanUpCryptoRnd
Sleep
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

My_RNG was written in 'PowerBASC mode'. PowerBASIC has a command where an array and a string can share the same memory; without which I resorted to copying.

The FreeBASIC docs make it clear that we should use StrPtr to point to a string's character data with VarPtr returning the address of the internal descriptor. However, ZString Ptr is a byte pointer and I wanted to step through a string four bytes at a time.

A fixed length string does not have a descriptor so I reckoned I could use a ULong Ptr and use VarPtr.

It works but I get a warning that 'SPtr = VarPtr( S )' is a 'Suspicious pointer assignment'.

Perhaps the docs should tell us that VarPtr points to a fixed string's character data and not treat it as suspicious.

I have been using PractRand V0.91 because V0.92 did not work for me on Windows 7. It is working on Windows 10 and V0.92 is faster than V0.91. I am streaming 1MB strings instead of 1KB strings and using an infinite loop rather than a finite one.

V0.92 plus a revised My_RNG sees PractRand crunching data at about 75MB/s.

As I write this I have seen a couple of unusual values during a 256MB run; and not that unusual. There has been no anomalies since and the test is currently looking at 1TB. As mentioned, I have seen worse with Intel RdRand and quantum random numbers.

Here is the latest My_RNG

Code: Select all

#include Once "CryptoRnd.bi"
 
Dim Shared S As String * 1048576 ' Too large for stack so used Shared
Dim As ULong Ptr SPtr, BasePtr
Dim As Long i, j
 
SPtr = VarPtr( S )
BasePtr = SPtr
 
InitializeCryptoBuffers( 512*1024 )
 
Do
  SPtr = BasePtr
  For j = 1 to 262144
    *SPtr = CryptoDW
    SPtr += 1
  Next
  Print S;
Loop
 
CleanUpCryptoRnd
Sleep
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Finally hit 1TB with a clean sweep except for the two unusual values during the 256MB run. I am glad about them - I think that I would find it a little disconcerting to get a clean sweep. The test went from 256MB, 512MB, ..., 1TB giving a total crunch of 2.38TB. I started another test and it wasn't until 1GB that an anomaly (unusual) occurred. I pulled out then.

That will do for testing - my CPU is threatening me with a restraining order. <smile>
St_W
Posts: 1626
Joined: Feb 11, 2009 14:24
Location: Austria
Contact:

Re: A fast CPRNG

Post by St_W »

You still can and should use strptr to get the pointer to the string data. Simply cast the pointer to the datatype you want like this:

Code: Select all

Dim Shared S As String * 1048576 ' Too large for stack so used Shared
Dim As ULong Ptr SPtr
 
SPtr = cptr(ulong ptr, strptr( S ))
And keep in mind that you may read memory beyond the string data if your string doesn't have a length that is a multiple of four. (And you never should do such things, so ensure that only appropriate strings are used or read the last characters byte by byte)

Unfortunately I couldn't test your program as the library isn't compatible with my system. Is the source code for the CryptoRnd library available so I can compile it myself?
deltarho[1859]
Posts: 4305
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: A fast CPRNG

Post by deltarho[1859] »

Thanks, St_W. I am not up to speed with all the commands yet. :(

CryptoRnd.bas is only a slight change to CryptoRndBufferCNG.inc but here it is.

CryptoRnd.bas

Code: Select all

#include once "windows.bi"
#include once "win/bcrypt.bi"
#inclib "bcrypt"

Dim Shared As Byte Ptr hRand
Dim Shared As Byte Buffer0(), Buffer1()
Dim Shared As Long BufferSize
Dim Shared As Any Ptr hThread()
Dim Shared As Any Ptr ptrBuffer, ptrBaseBuffer0, ptrBaseBuffer1 
Dim Shared As Long SwitchBufferCriteria

Declare Sub SwitchBuffer
Declare Sub FillBuffer( As Any Ptr )

Public Function CryptoDW As ULong

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
  
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    mov [Function], eax
  End Asm
  
  ptrBuffer += 4
  
End Function

Public Function CryptoS As Single ' [0,1)
  
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If

  ' ASM by Wilbert @ PureBasic forums
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    movd xmm0, eax
    psrlq xmm0, 9
    mov eax, 1
    cvtsi2ss xmm1, eax
    por xmm0, xmm1
    subss xmm0, xmm1
    movd [Function], xmm0
  End Asm
  
  ptrBuffer += 4
 
End Function

Public Function CryptoSX As Single ' [-1,1]

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If

  ' ASM adapted from CryptoS by author
  Asm
    mov eax, [ptrBuffer]
    mov eax, [eax]
    movd xmm0, eax
    psrlq xmm0, 9
    mov edx, 2
    cvtsi2ss xmm1, edx
    por xmm0, xmm1
    subss xmm0, xmm1
    mov edx, 1
    cvtsi2ss xmm1, edx
    subss xmm0, xmm1
    movd [Function], xmm0
  End Asm
  
  ptrBuffer += 4
  
End Function

Public Function CryptoD As Double  ' [0,1)
  
  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If  
  
  ' ASM by Wilbert at PureBasic forums
  Asm
    mov eax, [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 1
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
  
  ptrBuffer += 8

End Function

Public Function CryptoDX As Double  ' [-1,1]

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
 
  ' ASM adapted from CryptoD by author
  Asm
    mov eax, [ptrBuffer]
    movd xmm0, [eax]
    movd xmm1, [eax + 4]
    punpckldq xmm0, xmm1
    psrlq xmm0, 12
    mov eax, 2
    cvtsi2sd xmm1, eax
    por xmm0, xmm1
    subsd xmm0, xmm1
    mov eax, 1
    cvtsi2sd xmm1, eax
    subsd xmm0, xmm1
    movq [Function], xmm0
  End Asm
  
  ptrBuffer += 8

End Function

Public Function CryptoR( ByVal One As Long, ByVal Two As Long ) As Long

  If ptrBuffer >= SwitchBufferCriteria Then
    SwitchBuffer
  End If
  
  ' ASM by John Gleason @ PowerBASIC forums 
  Asm
    mov edx, [ptrBuffer]
    mov edx, [edx]
    mov ecx, [One]
    mov eax, [Two]
    cmp ecx, eax
    jl Now1LT2
    xchg eax, ecx
Now1LT2:
    Sub eax, ecx
    inc eax
    jz doTheRnd
    mul edx
    Add edx, ecx
doTheRnd:
    mov [Function], edx
  End Asm
  
  ptrBuffer += 4
  
End Function

Public Sub InitializeCryptoBuffers( ByVal Buffer As Long )
  BCryptOpenAlgorithmProvider( varptr(hRand), BCRYPT_RNG_ALGORITHM, 0, 0)
  ReDim As Any Ptr hThread(0 To 1)
  If Buffer < 1024 Then
    BufferSize = 1024
  Else
    BufferSize = Buffer - Buffer Mod 8
  End If
  ReDim Buffer0( 1 To BufferSize) As Byte
  ptrBaseBuffer0 = VarPtr( Buffer0(0) )
  ptrBuffer = ptrBaseBuffer0
  SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer0 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer0 + BufferSize\2 )
  ThreadWait( hThread(0) )
  ThreadWait( hThread(1) )
  ReDim Buffer1( 1 To BufferSize) As Byte
  ptrBaseBuffer1 = VarPtr( Buffer1(0) )
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 + BufferSize\2 )
  
End Sub

Public Sub CleanUpCryptoRnd
  BCryptCloseAlgorithmProvider( hRand, 0  )
End Sub

Private Sub FillBuffer( ByVal BaseBuffer As Any Ptr )
  BCryptGenRandom( hRand, BaseBuffer, BufferSize\2, 0)
End Sub

Private Sub SwitchBuffer
  ThreadWait( hThread(0) )
  ThreadWait( hThread(1) )
  Swap ptrBaseBuffer0, ptrBaseBuffer1
  ptrBuffer = ptrBaseBuffer0
  SwitchBufferCriteria = Cast(Long, ptrBuffer) + BufferSize
  hThread(0) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 )
  hThread(1) = ThreadCreate( @FillBuffer, ptrBaseBuffer1 + BufferSize\2 )
End Sub
and here is CryptoRnd.bi

CryptoRnd.bi

Code: Select all

#inclib "CryptoRnd"
Declare Function CryptoDW As ULong
Declare Function CryptoS As Single
Declare Function CryptoSX As Single
DecLare Function CryptoD As Double
Declare Function CryptoDX As Double
Declare Function CryptoR( As Long, As Long ) As Long
Declare Sub InitializeCryptoBuffers( As Long )
Declare Sub CleanUpCryptoRnd
Post Reply