Code timing ( Part I )

General FreeBASIC programming questions.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Code timing

Post by dodicat »

Ignore my last post.
Here is an ordinary mean value. I don't think the distribution of timings is uniform; the observed deviations are much smaller than those expected from a uniform distribution.
Also, I have calculated a loop dead time close to each loop.
Same thing, array index versus pointer index on the stack.

Code: Select all




function deadtime(num as ulong=10000000) as double
    dim as double t=timer
 for n as long=1 to num
 next
 return timer-t
end function

function mean(a() as double,byref sd as double=0,byref min as double=0,byref max as double=0) as double
    min=1e6:max=-1e6
    dim as long lb=lbound(a),ub=ubound(a)
    dim as double acc,m
    for n as long=lb to ub
        acc+=a(n)
        if max<a(n) then max=a(n)
        if min>a(n) then min=a(n)
    next
    m=acc/(ub-lb+1)
    acc=0
    for n as long=lb to ub
        acc+=(a(n)-m)*(a(n)-m)
    next
    sd =sqr(acc/(ub-lb))
    return m
end function



#macro external(pd)
    'empty placeholder: hook for external processing of pd
#endmacro

'==================================================

'On the stack
Dim As Integer a(10000) 
Dim As Integer Ptr ap=@a(0)
 
a(2017)=131:a(96)=30

 
Dim As Double t
Dim As longint i

dim as double PDA,PDP   'accumulators for Paul doe's function (used via the external macro)
dim as long trials=20
dim as long limit=20000000
dim as double ma(1 to trials),mp(1 to trials)
sleep 100

For z As Long=1 To trials
 
  if rnd >.5 then 'which one first?
      
    i=0  
   t=timer   
  For n As Long=1 To limit
    
    i+=a(2017)-a(96)
  Next
  t=timer-t-deadtime(limit) 'subtract overhead measured over the same iteration count
  ma(z)=t 'store each time
  external(PDA)
  Print t,i,"array element"
  Sleep 50
 
  i=0
  t=Timer
  For n As Long=1 To limit
     
    i+=ap[2017] -ap[96]
  Next
  t=timer-t-deadtime(limit)
  mp(z)=t 'store each time
   external(PDP)
  Print t ,i,"pointer element"
  Print
  sleep 50
  
else
  
  i=0
  t=Timer
  For n As Long=1 To limit
     
    i+=ap[2017] -ap[96]
  Next
  t=timer-t-deadtime(limit)
  mp(z)=t 'store each time
   external(PDP)
  Print t ,i,"pointer element"
  sleep 50
  i=0
  t=timer
   For n As Long=1 To limit
      
    i+=a(2017)-a(96)
  Next
  t=timer-t-deadtime(limit)
  ma(z)=t 'store each time
   external(PDA)
  Print t,i,"array element"
  sleep 50
  Print
  end if
 
Next z
 dim as double sd,max,min
'Delete[] ap
print
print
print tab(45);"Deviations from mean        expected for uniform"
print "Mean array   time "; mean(ma(),sd,min,max),sd,(max-min)/sqr(12)
print "Mean pointer time "; mean(mp(),sd,min,max),sd,(max-min)/sqr(12)
print "deadtime approx ";deadtime(limit)
Sleep 
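The summary logic above can be cross-checked with a short sketch in Python (illustrative only, not FreeBASIC; the sample timings are invented): for values uniformly distributed on [min, max] the standard deviation would be (max-min)/sqr(12), so a sample sd well below that figure means the timings are clustered rather than uniform.

```python
import math

def mean_sd(xs):
    # Mirrors the FreeBASIC mean(): returns mean, sample sd, min, max
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    sd = math.sqrt(ss / (n - 1))        # same as sqr(acc/(ub-lb))
    return m, sd, min(xs), max(xs)

# Hypothetical timings: tightly clustered around 0.100 with two mild outliers
xs = [0.100] * 18 + [0.098, 0.103]
m, sd, lo, hi = mean_sd(xs)
uniform_sd = (hi - lo) / math.sqrt(12)  # expected sd if uniform on [lo, hi]
print(round(sd, 5), round(uniform_sd, 5), sd < uniform_sd)
```

That final inequality is the observation dodicat makes about his own timings.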
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

***************************************************************************
NOTE added 1 Dec 2017
The rest of this thread concerns the pros and cons of using a confidence interval based on the assumption that the frequency distribution conforms to a bell curve of sorts. This approach has been abandoned by deltarho[1859] and a new scheme proposed in 'Code timing ( Part II )'.

***************************************************************************

Here is a confidence interval calculator for 2 to 30 tests, with 10 million x Rnd as the code to time.

This is the result on one run: 95% CI: 94.40500119456026 plus/minus 35.25551486489453

Considering how tight the timings are the interval seems a bit large. I need to double check '(*ResultsPtr).delta'. Perhaps someone can have a look at that - it is simply (standard deviation) x t-value / sqr( tests ).

@dodicat: I would prefer an average 'deadtime' - half a dozen or so.

Code: Select all

' Analyse 2 to 30 results
Dim As Long i, j
Dim As Double t(1 To 30), y, tover, toverAve

Type Results
  As Double xbar, delta
End Type

Declare Sub ConfidenceInterval( x() As Double, As Results Ptr )

' Get an average of the For/Next overhead
For i = 1 To 6
  tover = Timer
  For n As Long=1 To 10000000
  Next
  tover = Timer - tover
  toverAve += tover
  Sleep 1
Next
toverAve = toverAve/6

For j = 1 To 30
  t(j) = Timer
  For i = 1 To 10000000
    y = Rnd
  Next
  t(j) = 1000*(Timer - t(j) - toverAve) ' in milliseconds
  Sleep 1
Next

Dim TimingResults As Results
ConfidenceInterval( t(), @TimingResults )

Print "95% CI: ";TimingResults.xbar;" plus/minus";TimingResults.delta

Sleep

Sub ConfidenceInterval( x() As Double, ResultsPtr As Results Ptr)
Dim As Double sigmaX, sigmaX2
Static As Double t(1 To 29)   ' Static: keep the t-values between calls
Dim As Long i, l
Static BeenHere As Boolean    ' Static: Read the Data only on the first call
  
  l = Ubound(x) - Lbound(x) + 1
  If l < 2 Or l > 30 Then
    Print "Invalid number of tests"
    Exit Sub
  End If
  
  For i = Lbound(x) To Ubound(x)
    sigmaX += x(i)
    sigmaX2 += x(i)*x(i)
  Next
  
  If BeenHere = 0 Then
    For i = 1 To 29
      Read t(i)
    Next
    BeenHere = -1
  End If
  
  (*ResultsPtr).xbar = sigmaX/l
  (*ResultsPtr).delta = Sqr( (sigmaX2 - sigmaX * sigmaX/l)/(l-1) )*t(l-1)/Sqr(l) ' t * sample sd / Sqr(tests), l-1 degrees of freedom
End Sub

Data 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228, 2.201, 2.179, 2.16, 2.145 
Data 2.131, 2.12, 2.11, 2.101, 2.093, 2.086, 2.08, 2.074, 2.069, 2.064, 2.06, 2.056, 2.052, 2.048, 2.045
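For reference, the textbook form of the interval the Sub computes is xbar plus/minus t*s/sqr(n), where s is the sample standard deviation and t the two-sided 95% critical value at n-1 degrees of freedom (the same values as in the Data statements). A quick sketch in Python, with made-up timings in place of real ones:

```python
import math

# Two-sided 95% t critical values for df = 1..29 (standard table)
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.16, 2.145, 2.131, 2.12, 2.11, 2.101, 2.093, 2.086,
       2.08, 2.074, 2.069, 2.064, 2.06, 2.056, 2.052, 2.048, 2.045]

def confidence_interval(xs):
    n = len(xs)
    assert 2 <= n <= 30, "invalid number of tests"
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)        # = sigmaX2 - sigmaX^2/n
    s = math.sqrt(ss / (n - 1))                  # sample standard deviation
    return xbar, T95[n - 2] * s / math.sqrt(n)   # df = n-1 -> list index n-2

xbar, delta = confidence_interval([94.7, 94.9, 94.8, 94.6, 95.0])  # hypothetical ms timings
print(round(xbar, 1), round(delta, 3))
```

Tight timings give a tight interval, which is the sanity check applied to the corrected run below.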
Last edited by deltarho[1859] on Dec 01, 2017 10:13, edited 2 times in total.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

I did get '(*ResultsPtr).delta' wrong. I had xbar * xbar and it should have been sigmaX * sigmaX. Flaming heck! Above code corrected.

The above CI is now:-
95% CI: 94.8232782838974 plus/minus 0.06130832105743243

That is much better. Dear, oh dear. <smile>
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Code timing

Post by dodicat »

It's been a while since I looked at statistics.
But I remember that these confidence intervals always assume an underlying normal distribution.
(They have to, what else is there?)
Anyway, comparing ordinary functions with byref functions, and getting an average dead loop time at the start:
Looks like a bug in -gen gas.
If I comment out the line
dim as double PDA,PDP
then I get different outcomes

Code: Select all




function deadtime(num as ulong=10000000) as double
    dim as double t=timer
 for n as long=1 to num
 next
 return timer-t
end function

function mean(a() as double,byref sd as double=0,byref min as double=0,byref max as double=0) as double
    min=1e6:max=-1e6
    dim as long lb=lbound(a),ub=ubound(a)
    dim as double acc,m
    for n as long=lb to ub
        acc+=a(n)
        if max<a(n) then max=a(n)
        if min>a(n) then min=a(n)
    next
    m=acc/(ub-lb+1)
    acc=0
    for n as long=lb to ub
        acc+=(a(n)-m)*(a(n)-m)
    next
    sd =sqr(acc/(ub-lb))
    return m
end function



#macro external(pd)
    'empty placeholder: hook for external processing of pd
#endmacro

'=================== TEST FUNCTIONS  =============================
function fbyref(x as double) byref as const double
    static as double d
    d=x
    return d
end function

function fbyval(x as double)as double
   ' static as double d
   ' d=x
    return x
end function


Dim As Double t
Dim As double i

dim as double PDA,PDP  

dim as long trials=20
dim as long limit=20000000
dim as double ma(1 to trials),mp(1 to trials)

dim as double dead
for n as long=1 to trials
    dead+=deadtime(limit)
next
dead=dead/trials


sleep 100

For z As Long=1 To trials
 
  if rnd >.5 then 'which one first?
      
    i=0  
   t=timer   
  For n As Long=1 To limit
    i+=fbyref(.1)
  Next
  t=timer-t-dead
  ma(z)=t 'store each time
  Print t,i,"byref"
  Sleep 50
 
  i=0
  t=Timer
  For n As Long=1 To limit
     i+=fbyval(.1)
  Next
  t=timer-t-dead
  mp(z)=t 'store each time
  Print t ,i,"byval"
  Print
  sleep 50
  
else
  
  i=0
  t=Timer
  For n As Long=1 To limit
     i+=fbyval(.1)
  Next
  t=timer-t-dead
  mp(z)=t 'store each time
  Print t ,i,"byval"
  sleep 50
  i=0
  t=timer
   For n As Long=1 To limit
      i+=fbyref(.1)
  Next
  t=timer-t-dead
  ma(z)=t 'store each time
  Print t,i,"byref"
  sleep 50
  Print
  end if
 
Next z
 dim as double sd,max,min

print
print
print tab(45);"Deviations from mean        expected for uniform distribution"
print "Mean byref time "; mean(ma(),sd,min,max),sd,(max-min)/sqr(12)
print "Mean byval time "; mean(mp(),sd,min,max),sd,(max-min)/sqr(12)
print "deadtime approx ";dead
Sleep 
Also, deadtime will be about zero with -O3 gcc.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

dodicat wrote:But I remember that these confidence intervals always assume an underlying normal distribution.
Pretty much everything has an underlying normal distribution - a bit like the natural constants - part of nature. I am not using the normal distribution because ConfidenceInterval() is limited to 30 tests, so I use the t-distribution.
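One way to see why the t-distribution matters for small test counts (a Python check, using the standard two-sided 95% table - the same values as the Data statements in the CI code above): the critical value is huge for very few tests and settles towards the normal-distribution value of about 1.96 as the degrees of freedom grow.

```python
# Two-sided 95% t critical values for df = 1..29 (standard table)
T95 = [12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228,
       2.201, 2.179, 2.16, 2.145, 2.131, 2.12, 2.11, 2.101, 2.093, 2.086,
       2.08, 2.074, 2.069, 2.064, 2.06, 2.056, 2.052, 2.048, 2.045]

Z95 = 1.960  # normal-distribution critical value the t-values converge towards

# Strictly decreasing, and already within 0.1 of 1.96 by df = 29
assert all(a > b for a, b in zip(T95, T95[1:]))
assert T95[-1] - Z95 < 0.1
print(T95[0], T95[-1])   # 2 tests vs 30 tests
```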
Also, deadtime will be about zero with -O3 gcc
Wow! My toverAve almost vanishes as well. From 0.015s to 9.4e-007. I normally have -O3 gcc hard wired but have been using gas because you were.
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Code timing

Post by dodicat »

In -O3, loops are short-circuited if they have a constant result (zero here).

I think there is a gcc flag (something like __asm__ volatile) to ensure these loops are not omitted.

But I am not a gcc fan, so I am not sure of this.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

@dodicat: I will post this first then have a look at your last post.

I have put ConfidenceInterval() into a .bi include file.

I should tell you that the original Sub was a disaster when called twice - I did not have t() or BeenHere as Static. Absolutely disgraceful!

"Sit in the corner and be static for 30 minutes."
"Can I take my tea with me?"
"Only if you use a straw - I said 'static' remember?"
"Oh, yeah, I keep forgetting."

CI.bi

Code: Select all

#include once "windows.bi"

#Define t90 1
#Define t95 2
#Define t99 3

Type Results
  As Double xbar, delta
End Type

function ConfidenceInterval( x() As Double, ResultsPtr As Results Ptr, t_value as long) as ulong
Dim As Double sigmaX, sigmaX2
Dim As Long i, l
dim as double t(1 To 29)
  
  l = Ubound(x) - Lbound(x) + 1
  If l < 2 Or l > 30 Then
    SetLastError ERROR_INVALID_PARAMETER
    Return ERROR_INVALID_PARAMETER
  End If
  
  If ResultsPtr = 0 Then
    SetLastError ERROR_INVALID_PARAMETER
    Return ERROR_INVALID_PARAMETER
  End If
  
  Select Case t_value
    Case 1
      Restore tvalue90
    case 2
      Restore tvalue95
    case 3
      restore tvalue99
    Case Else
      restore tvalue95
  End Select
  
  For i = Lbound(x) To Ubound(x)
    sigmaX += x(i)
    sigmaX2 += x(i)*x(i)
  Next
  
  For i = 1 To 29
    Read t(i)
  Next
  
  (*ResultsPtr).xbar = sigmaX/l
  (*ResultsPtr).delta = Sqr( (sigmaX2 - sigmaX * sigmaX/l)/(l-1) )*t(l-1)/Sqr(l) ' t * sample sd / Sqr(tests), l-1 degrees of freedom
  
  Return 0

End function

tvalue90:
Data 6.314, 2.92, 2.353, 2.132, 2.015, 1.943, 1.895, 1.86, 1.833, 1.812, 1.796, 1.782, 1.771, 1.761
Data 1.753, 1.746, 1.74, 1.734, 1.729, 1.725, 1.721, 1.717, 1.714, 1.711, 1.708, 1.706, 1.703, 1.701, 1.699
tvalue95:
Data 12.706, 4.303, 3.182, 2.776, 2.571, 2.447, 2.365, 2.306, 2.262, 2.228, 2.201, 2.179, 2.16, 2.145 
Data 2.131, 2.12, 2.11, 2.101, 2.093, 2.086, 2.08, 2.074, 2.069, 2.064, 2.06, 2.056, 2.052, 2.048, 2.045
tvalue99:
Data 63.657, 9.925, 5.841, 4.604, 4.032, 3.707, 3.499, 3.355, 3.25, 3.169, 3.106, 3.055, 3.012, 2.977
Data 2.947, 2.921, 2.898, 2.878, 2.861, 2.845, 2.831, 2.819, 2.807, 2.797, 2.787, 2.779, 2.771, 2.763, 2.756
In the following, ConfidenceInterval() has a third parameter - mentioned in a later post.

'Get t(n) timings for Code1 where n = number of tests, 1 < n < 31

Code: Select all

Dim Code1 As Results
ConfidenceInterval( t(), @Code1, t95 )
Print "Code1 95% CI: ";Code1.xbar;" plus/minus";Code1.delta
'Get another t(n) timings for Code2

Code: Select all

Dim Code2 As Results
ConfidenceInterval( t(), @Code2, t95 )
Print "Code2 95% CI: ";Code2.xbar;" plus/minus";Code2.delta
Last edited by deltarho[1859] on Nov 28, 2017 10:37, edited 5 times in total.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

dodicat wrote:I think there is a gcc flag (something like __asm__ volatile) to ensure these loops are not omitted.
News to me. I am a gcc fan but still a novice. I have to sit in the corner for another 10 minutes after admitting that. Oh dear, that will put me past my next tea break. PANIC!
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

If we violate the restrictions on the number of tests and allow ConfidenceInterval() to continue then a number of things could happen including a GPF so we need to cover that.

Printing a message is bad news. We would never do that in a dll and, of course, ConfidenceInterval() may be executing in a GUI. So, the Print has been removed. In the event of a violation <whatever>.xbar will return zero. The ball is in your court as to how to handle that - just like a dll failure, we get an error code and it is up to us how we handle it.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

Yours truly wrote:<whatever>.xbar will return zero.
That is a bit unconventional.

What I have now done is what I should have done in the first place but when I started this thread it was not my intention to write a 'posh' function.

ConfidenceInterval() is now a function which returns an error code. If the function succeeds, the return value is ERROR_SUCCESS ( = 0 ). If the function fails, the return value is a nonzero error code. To get extended error information, call GetLastError.

So, test as you would an API with 'If ConfidenceInterval( t(), @<whatever> ) <> ERROR_SUCCESS Then'.

At the moment GetLastError will give ERROR_INVALID_PARAMETER ( = 87 ) if we violate the restrictions on the number of tests or pass a null pointer for the second parameter.

It has just dawned on me that we have now traded platform-neutral code for Windows-specific code. It is so easy to do when your native OS is Windows and you are ignorant of anything else. Not to worry: you have the source code, you Linux folk, so redesign as you wish.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Code timing

Post by jj2007 »

dodicat wrote:But I remember that these confidence intervals always assume an underlying normal distribution.
(They have to, what else is there?)
The "else" is called "cpu" ;-)

There is a processor struggling hard to give you the fastest response possible. And it works in cycles. If an algo needs 512 cycles, for example, even a benevolent processor will NEVER give it to you in 511 cycles. It can't.

However, the 512 cycles may be INTERRUPTED by some other activity of the system. So don't be surprised if, sometimes, you measure 516 or 8000 cycles. That is normal, but the resulting distribution is very far from a normal one.

The strategy to follow here is: Make a hundred runs, sort them, then eliminate the slowest runs until you reach the zone where you get constant results.
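That strategy can be sketched in a few lines of Python (illustrative only; the cycle counts are invented to mimic the 512/516/8000 pattern described above):

```python
# Hypothetical cycle counts: a stable cluster near 512 plus interrupted outliers
runs = [513, 512, 8000, 512, 516, 512, 513, 2100, 512, 514]

def trimmed(runs, tol=0.02):
    """Sort ascending, then keep only runs within tol of the fastest run."""
    s = sorted(runs)
    fastest = s[0]
    return [r for r in s if r <= fastest * (1 + tol)]

kept = trimmed(runs)
print(kept, sum(kept) / len(kept))
```

The tolerance decides where the "zone of constant results" is cut off; with real data you would adjust it until the set of kept runs stops changing.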
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

jj2007 wrote:The strategy to follow here is: Make a hundred runs, sort them, then eliminate the slowest runs until you reach the zone where you get constant results.
I used to do that - "laboratory conditions" I called it. However, if we plan on using the 'conditioned' timings we must remember that the system will occasionally throw us the odd 'wobbly'. I returned to accepting all timings on the basis that we get what happens in real life as opposed to the laboratory. Of course, for a clinical comparison the 'lab' approach should be used. Some will argue that comparisons should always be clinical.

Both 'lab' and 'real' have their pros and cons. With ConfidenceInterval() it will crunch whatever it is given, 'lab' or 'real', so we can test with both views.

I am glad you mentioned the 'lab' strategy otherwise some folk would not think of 'conditioning'.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

Some folk may be annoyed with me for allowing ConfidenceInterval() to be infected with Windows. <smile>

One way to get back to OS non-specific is to test for '<> 0', remove the 'SetLastError' statements, 'Return 1' for a test-number violation and 'Return 2' for passing a null pointer as the second parameter. Oh, and remove #include once "windows.bi".

Of course, if we are confident that the parameters are valid then we can dispense with checking the return value.
deltarho[1859]
Posts: 4310
Joined: Jan 02, 2017 0:34
Location: UK
Contact:

Re: Code timing

Post by deltarho[1859] »

ConfidenceInterval() now has a third parameter - t_value as Long.

t_value takes t90 (1), t95 (2) and t99 (3) corresponding to 90%, 95% and 99% confidence intervals. If the third parameter is not recognised then the level defaults to 95%.

Here I am testing all three.

Code: Select all

Dim TimingResults As Results
 
ConfidenceInterval( t(), @TimingResults, t90 )
Print "90% CI: ";TimingResults.xbar;" plus/minus";TimingResults.delta
 
ConfidenceInterval( t(), @TimingResults, t95 )
Print "95% CI: ";TimingResults.xbar;" plus/minus";TimingResults.delta
 
ConfidenceInterval( t(), @TimingResults, t99 )
Print "99% CI: ";TimingResults.xbar;" plus/minus";TimingResults.delta
and got

Code: Select all

90% CI:  114.578947882887 plus/minus 1.04244296533137
95% CI:  114.578947882887 plus/minus 1.254735646911508
99% CI:  114.578947882887 plus/minus 1.690978700678785
Obviously, the average is the same with the same data - we have differing intervals.

Notice also, with the same data, the interval increases as the % level increases; it would be illogical otherwise.

So, which level should we use? With jj2007's strategy the timings will be 'tighter' so I would use the 99% level. Generally, I would use the 95% level. If the timings are a 'little wild' then the 90% level may be the better choice.
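Since the sample sd and the number of tests are fixed for a given data set, the three deltas above differ only by the ratio of the t critical values. A quick Python check against the printed figures (which match df = 29, i.e. 30 tests):

```python
# The three deltas printed above and the df = 29 critical values (90%, 95%, 99%)
d90, d95, d99 = 1.04244296533137, 1.254735646911508, 1.690978700678785
t90, t95, t99 = 1.699, 2.045, 2.756

# delta = t * s/sqrt(n): s and n are the same, so the deltas scale with t
assert abs(d95 / d90 - t95 / t90) < 1e-6
assert abs(d99 / d90 - t99 / t90) < 1e-6
print("intervals scale with the t critical values")
```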
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Code timing

Post by jj2007 »

deltarho[1859] wrote:
jj2007 wrote:The strategy to follow here is: Make a hundred runs, sort them, then eliminate the slowest runs until you reach the zone where you get constant results.
I used to do that - "laboratory conditions" I called it. However, if we plan using the 'conditioned' timings we must remember that the system will occasionally throw us the odd 'wobbly'.
It's not exactly the odd 'wobbly' - there is a pattern, as explained above. I've hacked together a real-life example; see the screenshot here. It displays the distribution of a simple Open, Read, Close sequence. Surprisingly, there are indeed a number of events below the flat line - and I admit I have no explanation for that.