A PRNG for graphics programs?

UEZ · Post by **UEZ** » Aug 02, 2020 10:49

deltarho[1859] wrote:I may as well continue.

UEZ uses
Code: Select all
Function RandomRange(fStart As Single, fEnd As Single) As Single
  Return Rnd() * (fEnd - fStart) + fStart
End Function
In keeping with Microsoft's appending with Ex for their 'posh' versions of APIs my normal distribution version is:
Code: Select all
Function RandomRangeEx(fStart As Double, fEnd As Double, delta As Double, flag As Double) As Double
  ' delta as in mean ± delta. delta/flag is the standard deviation
  ' flag is one of the SD definitions
  Return (fEnd + fStart)/2 + (2*(Rnd<0.5)+1) * NR.Gauss * delta/flag
End Function
Here are a couple of example usages: The first looks at a range -1 to 1 with a delta of 0.7 and an 80% 'capture'; the second looks at a range 20 to 40 with delta of 5 and a 90% 'capture'. We then work out the percentage of values, from 10^8 calls, without the 'capture', that is the less likely to be selected.
Code: Select all
#include "NR32.bas"
 
#define SD70 1.03643
#define SD80 1.28155
#define SD90 1.64485
#define SD95 1.96
#define SD99 2.57583
 
Function RandomRangeEx(fStart As Double, fEnd As Double, delta As Double, flag As Double) As Double
  ' delta as in mean ± delta. delta/flag is the standard deviation
  ' flag is one of the SD definitions
  Return (fEnd + fStart)/2 + (2*(Rnd<0.5)+1) * NR.Gauss * delta/flag
End Function
 
Dim As Ulong i, count, n = 10^8
For i = 1 to n
  x = RandomRangeEx( -1, 1, 0.7, SD80 )
  If x < -0.7 Orelse x > 0.7 Then count += 1
Next
Print 100*count/n;"%" ' => 19.999275%
 
count = 0
For i = 1 to n
  x = RandomRangeEx( 20, 40, 5, SD90 )
  If x < 25 Orelse x > 35 Then count += 1
Next
Print 100*count/n;"%" ' => 10.001282%
 
Sleep
In UEZ's Fireworks I replaced every RandomRange with RandomRangeEx, except the first one with regard rocketx. I then tried a variety of deltas and SD definitions and was amazed at the scenarios made available some of which 'ruined' the application. However, UEZ will be in a much better position to exploit RandomRangeEx. What I should have done was to replace them one at a time, which I will do. I wrote earlier "What I have described may be of no use to a graphics programmer but as intimated above what do I know?". Well, I now think that RandomRangeEx in the right hands may be a useful tool. Of course some ranges have to be uniformly distributed but for those that don't then a bit of tweaking may be worthwhile.

Your PRNG seems to be faster than the built-in version - well done. I will use it in my future projects. :-)

The purpose for the RandomRange function is when you need a range of random number, in the example of Fireworks it's used for the direction vector e.g. rocket. But sometimes I want to have a range [-2, 2] but I don't want to have a number around 0, otherwise the pixel will not really move when the number is near to zero. Thus, you can use twice the RandomRange function with its appropriate ranges and choose one of them randomly.

Btw, you use a constant 2^32, shouldn't it be 2^32-1?

deltarho[1859] · Post by **deltarho[1859]** » Aug 02, 2020 10:59

UEZ wrote:Btw, you use a constant 2^32, shouldn't it be 2^32-1?

Where in particular?

UEZ · Post by **UEZ** » Aug 02, 2020 11:48

deltarho[1859] wrote:
UEZ wrote:Btw, you use a constant 2^32, shouldn't it be 2^32-1?
Where in particular?

In functions:

Private Function NR32.RandD As Double ' Rnd replacement
Engine
Return this.state/2^32
End Function
Private Function NR32.Range Overload ( Byval One As Double, Byval Two As Double ) As Double
Engine
Return this.state/2^32*( Two - One ) + One
End Function
Private Sub NR32.MyRandomize( Byval seed As Ulong = 0 )
If seed = 0 Then
Randomize , 5 ' Crypto
this.state = Cast( Ulong, Rnd*(2^32) )
Else
this.state = seed
End If
' Warm up
For i As Ulong = 1 to 400
this.Rand
Next
End Sub

For my understanding 2^32 is a 64-bit number whereas 2^32-1 ist the biggest unsigned 32-bit number.

Code: Select all

? Hex(2^32)
? Hex(2^32 - 1)

Sleep

I would use #define _p 2^32 and replace those expression to avoid unnecessary calculations.

deltarho[1859] · Post by **deltarho[1859]** » Aug 02, 2020 12:39

UEZ wrote:For my understanding 2^32 is a 64-bit number whereas 2^32-1 ist the biggest unsigned 32-bit number.

Agreed, but I don't understand your point.

We can use '÷ 2^(35)', for example, on a scientific calculator so why should we have a problem with FB?

unnecessary calculations

Modern compilers will replace 2^32 with 4294967296. With PowerBASIC the programmer has to do the optimization.

I may be wrong but I think 1/2^32 is replaced with '*2.328306437e-10' to avoid the division.

UEZ · Post by **UEZ** » Aug 02, 2020 13:06

deltarho[1859] wrote:Agreed, but I don't understand your point.

That was only a brain fart... ^^

deltarho[1859] wrote:Modern compilers will replace 2^32 with 4294967296

I didn't disassemble the code to see if the FB compiler is smart to replace such expression.

deltarho[1859] · Post by **deltarho[1859]** » Aug 02, 2020 16:51

Looking at .c and .asm didn't tell me much.

So, I used:

Code: Select all

Const two32 = 2^32
Const divtwo32 = 1/2^32

The throughput for .RandD and .Range (continuous) was unchanged.

Obviously, with divtwo32 I use '*divtwo32' as in 'this.state/2^32' ==> 'this.state*divtwo32'

So, the compiler will replace 2^32 and 1/2^32 for us.

CAVEAT: This is only true for gcc - gas is not smart. I did not know this - I don't use gas. The gas throughput was like gcc wearing a ball and chain and why I never use it.

deltarho[1859] · Post by **deltarho[1859]** » Aug 03, 2020 21:23

Yours truly wrote:The gas throughput was like gcc wearing a ball and chain and why I never use it.

That is regarding what I call 'MHz' testing or laboratory testing where we consider the raw throughput. In real ilfe situations the relative differences are very much less because Rnd is part of a statement and not just the statement itself.

With this code we determine the average Rnd for the whole of the generator's sequence and show the average got and how long it took to calculate. The gcc figures used gcc 8.3 and optimization level -O2. The Const mentioned above were used to give gas 32 and gas 64 optimized values for 2^32 and 1/2^32.

Code: Select all

t = Timer
For i = 1 to two32
  tot += Rnd
Next
t = Timer - t
Print tot*divtwo32, t

I have tided up the output.

Code: Select all

gas 32: 0.4999999999998861          52.92s
gas 64: 0.4999999999998861          19.38s
gcc 32: 0.4999999998835847          31.96s
gcc 64: 0.4999999999998861           4.44s

Yes, gas 32 is the slowest but not by such a margin as a 'MHz' test would indicate.

Gas 64 is quite impressive in comparison.

Gcc 32 is showing the benefit of optimization. It is worth noting that gcc 32 without optimization comes in at 67.25s

Gcc 64 is, what can I say, remarkable.

Since we are looking at the whole of the generator's sequence the fact that a random seed is being used each time is academic as we are going 'full circle' so to speak. The average Rnd should be the same for all tests and that is true for gas 32, gas 64, and gcc 64. However, gcc 32 differs. I tried a few other 32-bit compilers, and they came up with the same average. I shouldn't think that this would matter in practice but it seems that the 32 bit gcc compiler's arithmetic ability is lacking somewhere.

I have replaced the opening post's code to include

Code: Select all

Const two32 = 2^32
Const divtwo32 = 1/2^32

That is for those of you who use gas but have not read the whole of this thread.

deltarho[1859] · Post by **deltarho[1859]** » Aug 03, 2020 22:49

Yours truly wrote:I shouldn't think that this would matter in practice but it seems that the 32 bit gcc compiler's arithmetic ability is lacking somewhere.

Forget that. I have just done a test using a FB generator with a fixed seed and got the same average on a loop of 10^8 for the four compilers so there is something else going on - don't know what yet.

deltarho[1859] · Post by **deltarho[1859]** » Aug 04, 2020 5:00

Yours truly wrote:don't know what yet.

A loop of 10^8 (2^26.57) is much less than 2^32, so I looked at loops of 2^28, 2^29, 2^30, and 2^31.

The first line is with gcc 32, the second line is with gcc 64 and the third line is the difference.

Added: Yes, I used a fixed seed for this test.

Code: Select all

2^28
0.4999865897698328
0.4999865898844297
1.145968875349013e-010
 
2^29
0.4999771722359583
0.4999771723514643
1.155060491697668e-010
 
2^30
0.4999893297208473
0.499989329836808
1.159606854983508e-010
 
2^31
0.499993363278918
0.4999933633951061
1.16188059173794e-010

The difference increases as the loop count increases.

This is classic truncation error accumulating.

It seems that gcc 32's internal precision is less than that of gcc 64 whereas the internal precision of the gas compilers is the same and equal to that of gcc 64.

If someone is looking for the best precision in their code the above suggests using either the gas compilers or gcc 64. However, it should be noted that the difference is small, and we are looking at 2^32, 4,294,967,296, calculations.

The question is have I stumbled onto something or is this well known to the community and debated before I joined the forum?

We are now off-topic so perhaps I should start a new thread. I will leave it here as a few are reading this thread.

PS I checked out gcc optimization levels -O1, -O3, and -Ofast, and we have a difference between gcc 32-bit and 64-bit. However, with no optimization there is no difference. OUCH!

deltarho[1859] · Post by **deltarho[1859]** » Aug 04, 2020 8:48

I also looked at the four loops using gas 32 and the average values were identical to gcc 64.

gcc 64 is being optimized so the 'glitch' is in gcc 32 optimization and has been there since before v5.2 - oh, dear!

I may be wrong but I do not think that this is a damaging revelation but it should not be happening.

SARG · Post by **SARG** » Aug 04, 2020 10:08

deltarho[1859] wrote:Gas 64 is quite impressive in comparison.

Great.
A tip for speeding up execution : if possible use integer (64bit) variable because all other datatypes (not floats) are converted in integer before calculation then converted again in the other direction. If division is involved it matters less.

Regarding 2^32 and 1/2^32 and gas32/64
2^32 is replaced by the compiler by 4294967296
And 1/2^32 is optimized removing only the multiplication by 1 so there is still the division by 4294967296....
However using '(1/2^32)' the full optimization is made : replaced by ' * 2.328306436538696e-010 '

I only looked at the code generated by gas64 but I guess gas32 should do the same.

deltarho[1859] · Post by **deltarho[1859]** » Aug 04, 2020 10:30

Yours truly wrote:but it should not be happening.

It doesn't if we use -fpu SSE.

Code: Select all

2^32
 gcc 32: 0.4999999999998862 5.588466799992602 (using -fpu SSE)
 gcc 64: 0.4999999999998862 4.422101800009841

Not only has the 'glitch' been removed from 32-bit but look at the time taken - down from 31.96s to 5.59s

This is crazy.

I need someone to join me on this because I am having a problem believing where I have ended up, which is:

"If you use gcc 32-bit then use SSE and not the FPU."

deltarho[1859] · Post by **deltarho[1859]** » Aug 04, 2020 10:41

@SARG

I am not getting any optimization for 2^32 or 1/2^32 with either gas 32 or gas 64.

However, if we include the two Const statements, which I now have done in the opening post, whether a particular compiler does or does not optimize is academic.

Code: Select all

Const two32 = 2^32
Const divtwo32 = 1/2^32
Print two32             ==> 4294967296
Print divtwo32          ==> 2.328306436538696e-010
Sleep

Examples:
Relace Rnd*2^32 with Rnd*two32
Replace x/2^32 with x*divtwo32

SARG · Post by **SARG** » Aug 04, 2020 11:43

deltarho[1859] wrote:I am not getting any optimization for 2^32 or 1/2^32 with either gas 32 or gas 64.

What I got with the code below. All calculations are done by default with SSE. I let some of the comments used for an easier debugging.
I cheat a bit with the division using parenthesis.

There are still some optimizations to be made as some moves to registers are useless. For now I check only 2 consecutives lines.

Code: Select all

dim as integer itg

itg=itg*2^32
itg=itg*(1/2^32)

Code: Select all

             # -> CONVERTING vr4 [double] := ITG.5+-128 [integer]
   pxor xmm0,xmm0
   cvtsi2sd xmm0, QWORD PTR -128[rbp]
   movq r11, xmm0
              # -> bop vr4 [double] * 4294967296 [double]
   mov rax, 0x41F0000000000000 # DBL=4294967296
   movq xmm1, rax
   mulsd xmm0, xmm1
   movq r10, xmm0
              # -> CONVERTING vr6 [integer] := vr5 [double]
   roundsd xmm0,xmm0,4
   cvttsd2si rax, xmm0
              # -> store ITG.5+-128 [integer] := vr6 [integer]
   mov -128[rbp], rax

   pxor xmm0,xmm0
   cvtsi2sd xmm0, QWORD PTR -128[rbp]
   movq r11, xmm0
              # -> bop vr7 [double] * 2.328306436538696e-010 [double]
   mov rax, 0x3DF0000000000000 # DBL=2.328306436538696e-010
   movq xmm1, rax
   mulsd xmm0, xmm1
   movq r10, xmm0
              # -> CONVERTING vr9 [integer] := vr8 [double]
   roundsd xmm0,xmm0,4
   cvttsd2si rax, xmm0
              # -> store ITG.5+-128 [integer] := vr9 [integer]
   mov -128[rbp], rax

deltarho[1859] · Post by **deltarho[1859]** » Aug 04, 2020 12:41

I am not sure what the point of your last post is trying to make.

Code: Select all

dim as integer itg
 
itg=itg*2^32
itg=itg*(1/2^32)

Makes no sense to me.

Perhaps I am using an old version of fbc64_gas64.exe because this code:

Code: Select all

Const divtwo32 = 1/2^32
 
#Include "NR32.bas"
Dim As Ulongint i
Dim As Double x, t, y = 2^28
 
NR.MyRandomize(894578)
t = timer
For i = 1 to 10^9
  x = y/2^32
Next
t = Timer -t
Print t            ' ==> 2.085969599982491
NR.MyRandomize(894578)
t = timer
For i = 1 to 10^9
  x = y*divtwo32
Next
t = Timer -t
Print t             ' ==> 1.810037200018996
Sleep

gives *divtwo32 running 13.2% faster than /2^32 using gas 64.

The point that I am making is that using the two Const forces optimization whether the compiler is optimizing or not.

I only looked at the code generated by gas64 but I guess gas32 should do the same.

I would not bank on that.

A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?

Re: A PRNG for graphics programs?