## Code timing ( Part I )

General FreeBASIC programming questions.
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Code timing ( Part I )

When members time code we often find something like this:-

```freebasic
t1 = Timer
For i = 1 To 10000000
  ' Code1 to time
Next
t1 = Timer - t1
Print t1

t2 = Timer
For i = 1 To 10000000
  ' Code2 to time
Next
t2 = Timer - t2
Print t2
```

Let's look at t1: it includes the For/Next overhead.

So, t1 = Code1Time + overhead. By the same token, t2 = Code2Time + overhead.

Since our machines are of the asynchronous variety, there may be a difference between those two overheads, but it should not be large enough to affect the proceedings.

Now if t2 > t1 then we would conclude that Code1 is the faster of the two.

Replacing t1 and t2 with the expressions above we get Code2Time + overhead > Code1Time + overhead, i.e. Code2Time > Code1Time, since the overheads cancel. This is an absolute comparison.

Which is what we want to see, so why am I writing this post?

There is a temptation to consider t2/t1 and then declare that Code1 was x%, or whatever, faster than Code2.

This would be wrong, because t2/t1 = (Code2Time + overhead)/(Code1Time + overhead), whereas we should be comparing Code2Time/Code1Time (a relative comparison); that is what t2/t1 would give us if the overhead were zero.

As the overhead increases relative to the code being timed, t2/t1 tends to 1 and our conclusion will tend towards there being little difference between the codes being timed.
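The squeeze on the ratio is easy to see with made-up numbers. In this sketch (Python used for brevity; all figures are hypothetical) Code2 is truly ten times slower than Code1, yet the measured ratio collapses towards 1 as the shared overhead grows:

```python
# Hypothetical per-loop costs in seconds; Code2 is truly 10x slower.
code1_time = 0.002
code2_time = 0.020

def measured_ratio(overhead):
    """Ratio t2/t1 as reported when the loop overhead is included."""
    return (code2_time + overhead) / (code1_time + overhead)

print(measured_ratio(0.0))    # 10.0  -- the true relative comparison
print(measured_ratio(0.018))  # 1.9   -- overhead comparable to the code
print(measured_ratio(1.0))    # ~1.02 -- overhead dominates; ratio tends to 1
```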

For a relative comparison we need to remove the overhead from t1 and t2.

Here is some code by dodicat from here followed by typical output.

```freebasic
Dim Shared As Integer a(1000000) '0 to 1000000 = 1000001 elements
Dim As Integer Ptr ap=New Integer[1000001] '1000001 elements

a(2017)=131:a(96)=30
ap[2017]=131:ap[96]=30

Dim As Double t
Dim As Longint i

For z As Long=1 To 10
  i=0
  t=Timer
  For n As Long=1 To 10000000
    i+=a(2017)-a(96)
  Next
  t = Timer - t
  Print t,i,"array element"
  Sleep 50

  i=0
  t=Timer
  For n As Long=1 To 10000000
    i+=ap[2017]-ap[96]
  Next
  t = Timer - t
  Print t,i,"pointer element"
  Print
Next z

Delete[] ap
Sleep
```

```
 0.02051838990728871        1010000000    array element
 0.03817670961015596        1010000000    pointer element

 0.02051161530336687        1010000000    array element
 0.03819158580192017        1010000000    pointer element

 0.02050253593680296        1010000000    array element
 0.03634987128285294        1010000000    pointer element

 0.02366844110090627        1010000000    array element
 0.04738011395321218        1010000000    pointer element

 0.02050225657169635        1010000000    array element
 0.03817126199032428        1010000000    pointer element

 0.02050449149238665        1010000000    array element
 0.03817265881519782        1010000000    pointer element

 0.02050630736596792        1010000000    array element
 0.03828335724271481        1010000000    pointer element

 0.02050120895266039        1010000000    array element
 0.03824166199914814        1010000000    pointer element

 0.0205050502225852         1010000000    array element
 0.04750086952434662        1010000000    pointer element

 0.02051804070114427        1010000000    array element
 0.04738185998475641        1010000000    pointer element
```

Here is the same code taking into account the For/Next overhead and typical output. It would be unwise to use a single estimate of the overhead so we take an average of six.

```freebasic
Dim Shared As Integer a(1000000) '0 to 1000000 = 1000001 elements
Dim As Integer Ptr ap=New Integer[1000001] '1000001 elements

a(2017)=131:a(96)=30
ap[2017]=131:ap[96]=30

Dim As Double t, tover, toverAve
Dim As Ulongint i

For i = 1 To 6
  tover = Timer
  For n As Long=1 To 10000000
  Next
  tover = Timer - tover
  toverAve += tover
  Sleep 1
Next
toverAve = toverAve/6

For z As Long=1 To 10
  i=0
  t=Timer
  For n As Long=1 To 10000000
    i+=a(2017)-a(96)
  Next
  t = Timer - t - toverAve
  Print t,i,"array element"
  Sleep 50

  i=0
  t=Timer
  For n As Long=1 To 10000000
    i+=ap[2017]-ap[96]
  Next
  t = Timer - t - toverAve
  Print t,i,"pointer element"
  Print
Next z

Delete[] ap
Sleep
```

```
 0.008781551379824171       1010000000    array element
 0.02866473485653676        1010000000    pointer element

 0.001710750481704215       1010000000    array element
 0.02858197294160614        1010000000    pointer element

 0.001690426669996858       1010000000    array element
 0.01939477177482973        1010000000    pointer element

 0.001709563180239959       1010000000    array element
 0.01936620669187827        1010000000    pointer element

 0.001690217146319784       1010000000    array element
 0.01934846700763004        1010000000    pointer element

 0.001707747307155172       1010000000    array element
 0.02857917929063691        1010000000    pointer element

 0.001693220321212108       1010000000    array element
 0.02861382056503925        1010000000    pointer element

 0.001693150479386452       1010000000    array element
 0.02860062056277646        1010000000    pointer element

 0.001684769525771346       1010000000    array element
 0.01948514639013399        1010000000    pointer element

 0.001719829847752541       1010000000    array element
 0.01935908288192523        1010000000    pointer element
```

If we subtract array from pointer in the first run we get the absolute difference since the overheads 'cancel'. There are no overheads in the second run but we will get similar absolute differences on subtraction.

The difference between the runs appears when we consider relative differences. From the first run it looks like array is about twice as fast as pointer. However, that is not true. The truth is to be found in the second run: there we see that array is about ten times faster.
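Plugging rough medians read off the two tables above into the ratios makes the point concrete (a Python sketch; the figures are rounded from the output, so treat them as approximate):

```python
# Rounded medians from the tables above (seconds per 1e7-iteration loop).
array_raw, pointer_raw = 0.0205, 0.0382  # first run: overhead included
array_net, pointer_net = 0.0017, 0.0193  # second run: overhead removed

print(pointer_raw / array_raw)  # ~1.9: "array looks twice as fast"
print(pointer_net / array_net)  # ~11: roughly the tenfold true difference
print(pointer_raw - array_raw)  # ~0.018 s
print(pointer_net - array_net)  # ~0.018 s: the absolute differences agree
```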

If all we want to know is whether Code A is faster than Code B then the first method is fine. However, if we want to decide whether to use Code A or Code B in our code then the first method may lead us astray.

Remember my writing "As the overhead increases relative to the code being timed, t2/t1 tends to 1 and our conclusion will tend towards there being little difference between the codes being timed."? That is what is happening in the above test code. The overhead is closing the ratio.
Last edited by deltarho on Dec 01, 2017 10:03, edited 1 time in total.
Munair
Posts: 836
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

### Re: Code timing

When doing the speed test on the four LeapYear algorithms, it was obvious that the order of the algorithms had an influence: one would be faster when run second, another when run third, and yet another when run fourth. Therefore, the best result would come from running them in all possible orders, preferably two or three times.

deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

That is because our machines are, for the most part, asynchronous. Probably the best tack would be to treat the algorithms as separate instances, not together in one run, with each instance being tested several times and averaged. We could then look at the instances at different times. We will not get perfection with a PC - they are not designed for definitive timing.
Munair
Posts: 836
Joined: Oct 19, 2017 15:00
Location: 't Zand, NL
Contact:

### Re: Code timing

deltarho wrote:That is because our machines are, for the most part, asynchronous. Probably the best tack would be to treat the algorithms as separate instances, not together in one run, with each instance being tested several times and averaged. We could then look at the instances at different times. We will not get perfection with a PC - they are not designed for definitive timing.

They were run each with their own loop. Perhaps by instance you mean by executable.
paul doe
Posts: 960
Joined: Jul 25, 2017 17:22
Location: Argentina

### Re: Code timing

deltarho wrote:That is because our machines are, for the most part, asynchronous. Probably the best tack would be to treat the algorithms as separate instances, not together in one run, with each instance being tested several times and averaged. We could then look at the instances at different times. We will not get perfection with a PC - they are not designed for definitive timing.

Indeed. However, as you correctly state, algorithms should be tested with a real use case, not completely standalone. There are many things that can (unpredictably) change the throughput of the code.
The EMA function that I posted before can be slightly modified to give the average of the session (or rather, from the point where you initialize it):

```freebasic
function emaT( byval value as double = 0.0, byval zero as boolean = false ) as double
  /'
    EMA (Exponential Moving Average) implementation
  '/
  static as double avg
  static as uinteger times

  if( zero = true ) then
    avg = 0.0
    times = 0
  end if

  dim as double smoothFactor = 2.0 / ( 1.0 + times )

  avg = avg * ( 1.0 - smoothFactor ) + value * smoothFactor

  times += 1

  return( avg )
end function
```

As you can see, it doesn't take a 'period' parameter anymore; it simply accumulates the results and gives you the EMA of them. The longer you probe the code, the closer the measurement is to the 'real' average. No measurement is perfect of course, but you know that a little too well ;)
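For readers who want to play with the update rule outside FreeBASIC, here is a minimal transcription of emaT in Python (the class simply mirrors the STATIC variables; the sample timings are invented). Note that the very first call uses a smoothing factor of 2, which the second call (factor 1) then overwrites, so the stream effectively starts at the second sample:

```python
class Ema:
    """Same update rule as emaT: smoothing factor 2/(1 + times)."""
    def __init__(self):
        self.avg = 0.0
        self.times = 0

    def update(self, value):
        f = 2.0 / (1.0 + self.times)
        self.avg = self.avg * (1.0 - f) + value * f
        self.times += 1
        return self.avg

e = Ema()
for t in (0.021, 0.019, 0.020, 0.022, 0.020):  # invented timings (seconds)
    avg = e.update(t)
print(round(avg, 4))  # 0.0205
```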
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

Munair wrote:Perhaps by instance you mean by executable.

Yes.

Most of my code timing involves procedures, where I am in the millisecond or upper microsecond domain with loops in the thousands. With that type of timing the For/Next overhead has very little impact. The LeapYear tests, even though they employed large loops, had the code being tested very much in the millisecond domain, and there too the For/Next overhead had little impact. With the dodicat code above, where we are examining single statements which are themselves in the nanosecond domain, the For/Next overhead impacts heavily on relative comparisons.

I suppose there is an argument to use the second method above in all timing so that we do not have to question whether or not the overhead has any impact.
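That second method generalises to any harness: time an empty loop several times, average it, and subtract the average from every measurement. A sketch of the idea in Python (`perf_counter` standing in for Timer; the function names are my own):

```python
import time

def loop_overhead(n, reps=6):
    """Average the cost of an empty n-iteration loop over several reps,
    as the second method above does for the For/Next overhead."""
    total = 0.0
    for _ in range(reps):
        t0 = time.perf_counter()
        for _ in range(n):
            pass
        total += time.perf_counter() - t0
    return total / reps

def timed(fn, n):
    """Time n calls of fn; return (raw, corrected) seconds, where
    corrected has the average loop overhead subtracted."""
    overhead = loop_overhead(n)
    t0 = time.perf_counter()
    for _ in range(n):
        fn()
    raw = time.perf_counter() - t0
    return raw, raw - overhead

raw, net = timed(lambda: 131 - 30, 100_000)
print(raw, net)  # net is the raw time minus the loop overhead
```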
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

Of course, we have grounds for questioning why we are comparing code which resides in the nanosecond domain anyway.

The absolute difference in either of the tables above is about 2 nanoseconds. Suppose Code1 takes 4 ns and Code2 takes 2 ns; then we can say that Code2 is twice as fast as Code1. However, we would need to execute the faster of the two 100 million times in our application session to get a reward of 200 milliseconds.
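The arithmetic, spelled out (a trivial Python check):

```python
saving_per_call = 4e-9 - 2e-9  # Code1 at 4 ns versus Code2 at 2 ns
calls = 100_000_000            # calls per application session
print(saving_per_call * calls) # ~0.2 seconds saved per session
```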

If my application did not use anything like 100 million of the faster code and Code1 was more readable than Code2 then I would go for, yes, Code1. There is not much that I can do in 200, or less, milliseconds. It takes me longer than that to get out of my chair, so putting the kettle on is out. Quite frankly, if I have not got time to put the kettle on then any speed improvement would not qualify as significant. <laugh>
paul doe
Posts: 960
Joined: Jul 25, 2017 17:22
Location: Argentina

### Re: Code timing

deltarho wrote:If my application did not use anything like 100 million of the faster code and Code1 was more readable than Code2 then I would go for, yes, Code1. There is not much that I can do in 200, or less, milliseconds. It takes me longer than that to get out of my chair, so putting the kettle on is out. Quite frankly, if I have not got time to put the kettle on then any speed improvement would not qualify as significant. <laugh>

Hahaha indeed. There are some areas (games, for example) where such things are significant, but for the rest I agree, the difference is immaterial. I also share the preference for clean code over uber-optimized-but-ugly code.

It is worth noting, however, that these seemingly 'immaterial' things do impact one area where code should be as tight as is humanly possible, no matter the cost: data structure implementation. Data structures have a determining impact on the speed of your code, especially on the algorithms used. A cleverly implemented algorithm cannot truly shine if the data structures it uses are mediocre at best. It is a blessing that FreeBASIC actually allows you to implement these in an efficient way.
dodicat
Posts: 5991
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: Code timing

Using only stack variables.
Randomly choosing which to test first.
Comparing totals with Paul Doe's function.
And for the moment, not completely omitting the 'Heisenberg effect' ... deltarho, which looks like an indexed pointer anyway.
(the testing code bloats)
-gen gas no optimisations.

```freebasic
'Paul Doe
function emaT( byval value as double = 0.0, byval zero as boolean = false ) as double
  /'
    EMA (Exponential Moving Average) implementation
  '/
  static as double avg
  static as uinteger times

  if( zero = true ) then
    avg = 0.0
    times = 0
  end if

  dim as double smoothFactor = 2.0 / ( 1.0 + times )

  avg = avg * ( 1.0 - smoothFactor ) + value * smoothFactor

  times += 1

  return( avg )
end function
'==================================================
#macro external(pd)
  pd+=emat(t)
#endmacro

'On the stack
Dim As Integer a(10000)
Dim As Integer Ptr ap=@a(0)

a(2017)=131:a(96)=30
'ap[2017]=131:ap[96]=30

Dim As Double t
Dim As Longint i
dim as double accA,accP 'time accumulators
dim as double PDA,PDP   '      ""        for Paul Doe function

For z As Long=1 To 20
  if rnd >.5 then 'which one first?
    i=0
    t=Timer
    For n As Long=1 To 10000000
      i+=a(2017)-a(96)
    Next
    t=Timer-t
    accA+=t
    external(PDA)
    Print t,i,"array element"
    Sleep 50

    i=0
    t=Timer
    For n As Long=1 To 10000000
      i+=ap[2017]-ap[96]
    Next
    t=Timer-t
    accP+=t
    external(PDP)
    Print t,i,"pointer element"
    Print
    Sleep 50
  else
    i=0
    t=Timer
    For n As Long=1 To 10000000
      i+=ap[2017]-ap[96]
    Next
    t=Timer-t
    accP+=t
    external(PDP)
    Print t,i,"pointer element"
    Sleep 50

    i=0
    t=Timer
    For n As Long=1 To 10000000
      i+=a(2017)-a(96)
    Next
    t=Timer-t
    accA+=t
    external(PDA)
    Print t,i,"array element"
    Sleep 50
    Print
  end if
Next z

'Delete[] ap
Print
Print "Total array   time ";accA,"  PAUL DOE'S=";PDA
Print "Total pointer time ";accP,"  PAUL DOE'S=";PDP
Sleep
```
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

EMA is usually employed on two-dimensional functions where the x-axis is time; it should not be used where the x-axis is a run number.

Ideally, for 20 tests, we should use the t-distribution with confidence limits where we would conclude with something like:-

95% CL: t ± Δt
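For illustration, such a confidence interval could be computed as follows (a Python sketch; the 20 timings are invented, and 2.093 is the standard two-sided 95% critical value of Student's t with 19 degrees of freedom):

```python
import math
from statistics import mean, stdev

# Invented timings from 20 runs, in seconds.
times = [0.0205, 0.0206, 0.0204, 0.0205, 0.0237, 0.0205, 0.0205,
         0.0205, 0.0205, 0.0205, 0.0204, 0.0206, 0.0205, 0.0205,
         0.0204, 0.0206, 0.0205, 0.0205, 0.0206, 0.0204]

t_crit = 2.093  # two-sided 95% point of t with len(times)-1 = 19 df
m = mean(times)
dt = t_crit * stdev(times) / math.sqrt(len(times))
print(f"95% CL: {m:.5f} +/- {dt:.5f}")
```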

As above, we cannot divide one total by another for a relative comparison because the For/Next overhead is still significant.

@paul doe
data structure implementation - haven't been there so cannot comment.
dodicat
Posts: 5991
Joined: Jan 10, 2006 20:30
Location: Scotland

### Re: Code timing

Silly.
What was I thinking of ??
Last edited by dodicat on Nov 27, 2017 19:25, edited 1 time in total.
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

I have no idea what dodicat is up to now so either he has been at the sauce or I have, and I haven't. Something tells me that if I did know what he was doing then it would be of no use to me and since I am scratching my head big time on something else at the moment I will pass. If anyone would like to enlighten me then feel free to do so. <smile>
paul doe
Posts: 960
Joined: Jul 25, 2017 17:22
Location: Argentina

### Re: Code timing

deltarho wrote:EMA is usually employed on two-dimensional functions where the x-axis is time; it should not be used where the x-axis is a run number.

Ideally, for 20 tests, we should use the t-distribution with confidence limits where we would conclude with something like:-

95% CL: t ± Δt

As above, we cannot divide one total by another for a relative comparison because the For/Next overhead is still significant.

?

The EMA is simply another average (and isn't even the best of the bunch), although of a different kind than the simple average: the former is an Infinite Impulse Response type of filter, the latter isn't. It is usually used with time series, which doesn't mean it cannot be used otherwise. Doing 20 tests isn't nearly enough for me (there's too much 'randomness' between measurements), unless those 20 tests were each composed of, say, 10000 samples. Can't tell if you're assuming this or not, but I suppose you do.
deltarho
Posts: 2092
Joined: Jan 02, 2017 0:34
Location: UK

### Re: Code timing

Applying a weighting factor to our problem is nonsense. If we dropped our twenty tests onto the floor and then picked them up in any order, it should not make a blind bit of difference to our analysis. Our problem is not a function of time - it is time. We are actually taking 10^7 samples for each test.
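The order-independence claim is easy to demonstrate: a plain mean of the tests is the same however they are shuffled, while the EMA update rule from earlier in the thread is not (a Python sketch with invented timings):

```python
from statistics import mean

samples   = [0.021, 0.019, 0.024, 0.020, 0.022]  # invented timings
reordered = [0.019, 0.020, 0.021, 0.022, 0.024]  # same values, new order

def ema_of(seq):
    """The emaT update rule (smoothing factor 2/(1 + times)) over a sequence."""
    avg, times = 0.0, 0
    for v in seq:
        f = 2.0 / (1.0 + times)
        avg = avg * (1.0 - f) + v * f
        times += 1
    return avg

print(mean(samples), mean(reordered))      # equal: the mean ignores order
print(ema_of(samples), ema_of(reordered))  # differ: the EMA weights recency
```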

I think maybe the first post has not been fully read otherwise moving averages and the like would not have been mentioned. <smile>
fxm
Posts: 9260
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

### Re: Code timing

dodicat wrote:Maybe keep any looping dead time buried

```freebasic
dim as integer a=48
dim as integer ptr ip=@a
dim shared as string s
dim as double t
dim as double accI,accP

for z as long=1 to 20
  t=timer
  s=string(100000000,a)
  t=timer-t
  accI+=t
  print t,s,"integer"
  sleep 50

  t=timer
  s=string(100000000,ip[0])
  t=timer-t
  accP+=t
  print t,s,"pointer index"
  print
  sleep 50
next

print "Straight integer time ";accI
print "Integer pointer  time ";accP
sleep
```

But since it is buried, maybe the method is fake.

From your code, I think there is only one read of 'a' and 'ip', before the 100000000 characters of each of the two strings are filled with that read value.