When timing any algorithm we should never rely on a single test - I normally use ten. We also need a 'breathing space' between tests, otherwise the runs effectively merge into one long test. The breathing space should be random so that we do not synchronise with anything else going on in the system. The code at the end of this post uses 'Sleep Rnd * 500 + 500', a pause of 500 to 1000 milliseconds. From the CPU's perspective that is a long interval.
This is a typical output, in millions of samples per second (compiled with -gen gcc -asm intel -Wc -O3):
512
471
433
472
473
470
472
452
453
472
Average: 467
I always use pcg.MyRandomize to ensure a 'warm up'.
However, if we comment out pcg.MyRandomize, this output is typical:
324
307
307
298
299
305
307
306
298
307
Average: 306
.MyRandomize simply populates the state vector and then executes a warm-up. I did not think that a warm-up would give us the better speed. However, it still needed to be eliminated, so I introduced a separate warm-up and saw a repeat of the second run (the slower one).
Populating the state vector should not give us the better speed either but, perhaps, the default values of zero might be the reason for a slower speed. I then hard-wired non-zero values into the pcg32 Type. That, as expected, had no effect.
I then left pcg.MyRandomize commented out and introduced a single pcg.rands call before the loop; it is shown commented out in the code at the end of this post.
This saw a repeat of the first run (the faster one).
So, just introducing a single sample request was enough to give us the better speed. This did not make sense as the following loop requests 100 million samples.
I am, of course, in run-time thinking. What about compile-time thinking?
Is the compiler learning something when it considers pcg.rands, and then using that in the following loop? Perhaps it cannot learn, whatever that is, in the loop itself because it is busy doing other things.
I do not get this behaviour with optimization levels -O1 and -O2, just -O3. I also do not get this behaviour in 64 bit mode. In a nutshell, then, I only get this behaviour with -O3 and 32 bit mode.
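For anyone wanting to repeat the comparison without editing the source each time, here is a minimal sketch using a compile-time switch. PRIME_CALL is a hypothetical symbol of my own choosing and is not part of PCG32II.bas; the sketch assumes pcg32 and .rands behave as in the test code at the end of this post. It only reproduces the two cases and says nothing about the cause.

```freebasic
' Sketch only: PRIME_CALL is a hypothetical symbol, not part of PCG32II.bas.
' Build twice: once as usual, and once adding "-d PRIME_CALL" to the fbc command line.
#Include "PCG32II.bas"

Dim pcg As pcg32
Dim As ULong i
Dim As Double t

#Ifdef PRIME_CALL
pcg.rands              ' single sample request ahead of the timed loop
#Endif

t = Timer
For i = 1 To 10^8      ' 100 million samples
  pcg.rands
Next
t = Timer - t

#Ifdef PRIME_CALL
Print "With single request:   ";
#Else
Print "Without single request:";
#Endif
Print CInt(100/t)      ' millions of samples per second
Sleep
```

Building both variants with the same options (-gen gcc -Wc -O3, 32 bit) keeps everything else fixed, so any difference in the figure comes down to the one extra call.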
I am not overly concerned about this because I would always use .MyRandomize either in random mode or fixed mode.
If I had been testing without using .MyRandomize and someone had suggested that making a single sample request before the loop would put a 50% increase in throughput on the cards, I would have responded with "Are you pulling my leg?"
Any thoughts?
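#Include "PCG32II.bas"

Dim As ULong i, j
Dim As Double t, tot
Dim As Double a(1 To 10)   ' elapsed time of each run, in seconds
Dim pcg As pcg32

Randomize                  ' seeds Rnd, used for the random pause below
pcg.MyRandomize            ' comment out to reproduce the second run
'pcg.rands                 ' uncomment, with MyRandomize commented out, for the single request test

For j = 1 To 10
  t = Timer
  For i = 1 To 10^8        ' 100 million samples per run
    pcg.rands
  Next
  t = Timer - t
  Print CInt(100/t)        ' millions of samples per second
  a(j) = t
  Sleep Rnd * 500 + 500    ' random breathing space of 500 to 1000 ms
Next

For j = 1 To 10
  tot += a(j)
Next
tot = tot/10               ' mean time per run
Print
Print "Average:";CInt(100/tot)
Sleep                      ' wait for a key press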
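
A note on the 'Average' line: the code computes 100 divided by the mean time, which is not quite the same as averaging the ten printed figures. The sketch below illustrates the difference using the figures from the first run above. The times are recovered from the rounded figures, so they are approximate, and the variable names are my own.

```freebasic
' Figures copied from the first run above (millions of samples per second).
Dim As Double r(1 To 10) = {512, 471, 433, 472, 473, 470, 472, 452, 453, 472}
Dim As Double sumRates, sumTimes
Dim As Long k

For k = 1 To 10
  sumRates += r(k)         ' for the arithmetic mean of the rates
  sumTimes += 100 / r(k)   ' approximate time in seconds behind each rate
Next

Print "Arithmetic mean of rates:"; CInt(sumRates / 10)
Print "100 / mean time:         "; CInt(100 / (sumTimes / 10))
```

Averaging the times and converting once, as the test code does, weights each run by how long it actually took, so a slow run pulls the figure down a little more than it would in a plain average of the rates.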