## timing benchmark using reduced precision

srvaldez
Posts: 2252
Joined: Sep 25, 2005 21:54

### timing benchmark using reduced precision

Some benchmarks showing that code runs faster at reduced precision when the x87 FPU is used. FB win32 normally uses the FPU; on Windows you can also make FB x64 use the FPU by adding -Wc -mfpmath=387 to the compile command.
My times:


```
division =====================================
=== single ===
division using default single precision: time  2.368007199999738 seconds            sum =  37.98161
division using reduced single precision: time  1.382252299999891 seconds            sum =  37.98161
reduced single precision division is  1.713151209804408 times faster
=== double ===
division using default double precision: time  2.3717587000001 seconds              sum =  37.98231426240503
division using reduced double precision: time  2.177881900000102 seconds            sum =  37.98231426240503
reduced double precision division is  1.089020805030791 times faster
sqr =====================================
=== single ===
sqr using default single precision: time  3.750732999999855 seconds   sum =  2.108227e+007
sqr using reduced single precision: time  1.776602799999637 seconds   sum =  2.108227e+007
reduced single precision sqr is  2.111182645890585 times faster
=== double ===
sqr using default double precision: time  3.748969399999623 seconds   sum =  21082008.97391792
sqr using reduced double precision: time  3.161076800000046 seconds   sum =  21082008.97391793
reduced double precision sqr is  1.185978588055681 times faster
```


```freebasic
dim as ushort oldcw, cwdouble=&h27F, cwsingle=&h7F
dim as double t1, t2, t3, t4
print "division =================================="
print "=== single ==="
scope
   dim as single x=3.141592653589793, y, z, s
   t1=timer
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next
   t2=timer
   t3=t2-t1
   print "division using default single precision: time ";t3;" seconds","sum = ";s
   t1=timer
   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwsingle] 'set FPU precision to single
   end asm
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next
   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm
   t2=timer
   t4=t2-t1
   print "division using reduced single precision: time ";t4;" seconds","sum = ";s
   print "reduced single precision division is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
   dim as double x=3.141592653589793, y, z, s
   t1=timer
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next
   t2=timer
   t3=t2-t1
   print "division using default double precision: time ";t3;" seconds","sum = ";s
   t1=timer
   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwdouble] 'set FPU precision to double
   end asm
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next
   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm
   t2=timer
   t4=t2-t1
   print "division using reduced double precision: time ";t4;" seconds","sum = ";s
   print "reduced double precision division is ";t3/t4;" times faster"
end scope
print "sqr =================================="
print "=== single ==="
scope
   dim as single x=3.141592653589793, y, z, s
   t1=timer
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next
   t2=timer
   t3=t2-t1
   print "sqr using default single precision: time ";t3;" seconds","sum = ";s
   t1=timer
   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwsingle] 'set FPU precision to single
   end asm
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next
   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm
   t2=timer
   t4=t2-t1
   print "sqr using reduced single precision: time ";t4;" seconds","sum = ";s
   print "reduced single precision sqr is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
   dim as double x=3.141592653589793, y, z, s
   t1=timer
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next
   t2=timer
   t3=t2-t1
   print "sqr using default double precision: time ";t3;" seconds","sum = ";s
   t1=timer
   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwdouble] 'set FPU precision to double
   end asm
   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next
   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm
   t2=timer
   t4=t2-t1
   print "sqr using reduced double precision: time ";t4;" seconds","sum = ";s
   print "reduced double precision sqr is ";t3/t4;" times faster"
end scope
sleep
```
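For reference, the two control-word constants used above differ only in the precision-control (PC) field, bits 8 and 9 of the x87 control word. Here is a minimal sketch that decodes that field in plain FreeBASIC. The function name `PrecisionName` is made up for illustration and is not part of the benchmark.

```freebasic
' Decode the precision-control (PC) field, bits 8-9, of an x87 control word.
' PrecisionName is a hypothetical helper, not part of the benchmark above.
function PrecisionName(byval cw as ushort) as string
   select case (cw shr 8) and 3
   case 0
      return "single (24-bit significand)"
   case 2
      return "double (53-bit significand)"
   case 3
      return "extended (64-bit significand)"
   case else
      return "reserved"
   end select
end function

print hex(&h37F), PrecisionName(&h37F)   ' value loaded by FINIT
print hex(&h27F), PrecisionName(&h27F)   ' cwdouble in the benchmark
print hex(&h7F),  PrecisionName(&h7F)    ' cwsingle in the benchmark
```

The remaining low bits (the exception masks) are the same in all three values, so switching between them changes only the precision the FPU rounds to. Whether a program actually starts with `&h37F` depends on the runtime, so treat that line as an assumption about the "default" case.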
Last edited by srvaldez on Nov 02, 2018 8:22, edited 1 time in total.
dafhi
Posts: 1329
Joined: Jun 04, 2005 9:51

### Re: timing benchmark using reduced precision

always looking for ways to speed up my non-critical applications :-)
jj2007
Posts: 1336
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

### Re: timing benchmark using reduced precision

asm
fstcw word ptr [oldcw]
fldcw word ptr [cwsingle] 'set FPU precision to double
end asm
Typo: you obviously mean single here.
srvaldez

### Re: timing benchmark using reduced precision

@jj2007
yes, copy-paste mistake - corrected, thanks for pointing it out.
[note] Only division and sqr benefit from reduced precision. My hypothesis is that the FPU computes these two iteratively, so asking for fewer result bits means fewer iterations.
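To illustrate the hypothesis only: in a bit-by-bit long division, the number of loop steps equals the number of quotient bits requested. The sketch below is plain software and makes no claim about how any particular FPU implements division. `DivideBits` is a made-up name, and it assumes 0 < num < den.

```freebasic
' Long division of num/den (assumes 0 < num < den), producing `bits`
' fractional quotient bits. One loop step per result bit.
' DivideBits is a hypothetical illustration, not an FPU model.
function DivideBits(byval num as ulongint, byval den as ulongint, byval bits as integer) as double
   dim as double q = 0, weight = 0.5
   for k as integer = 1 to bits
      num shl= 1              ' bring down the next bit
      if num >= den then      ' quotient bit is 1
         num -= den
         q += weight
      end if
      weight /= 2
   next
   return q
end function

print "24 steps: "; DivideBits(1, 3, 24)   ' single-sized significand
print "53 steps: "; DivideBits(1, 3, 53)   ' double-sized significand
print "64 steps: "; DivideBits(1, 3, 64)   ' extended-sized significand
```

Addition and multiplication have no such per-bit loop in this picture, which would fit the observation that they do not speed up.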
Last edited by srvaldez on Nov 02, 2018 10:31, edited 2 times in total.
jj2007

### Re: timing benchmark using reduced precision

Your theory is probably correct. In practice, knowing that division is so terribly slow, you simply avoid it like the plague. After all, there is multiplication by the inverse value, which is fast.
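A minimal sketch of the multiply-by-inverse idea, assuming the divisor stays the same for the whole loop (the function names are made up for illustration):

```freebasic
' Sum j/d for j = 1..n, dividing on every iteration.
function SumByDivision(byval d as double, byval n as integer) as double
   dim as double s = 0
   for j as integer = 1 to n
      s += j / d
   next
   return s
end function

' Same sum, but with one division hoisted out of the loop and a
' multiplication by the precomputed reciprocal inside it.
function SumByReciprocal(byval d as double, byval n as integer) as double
   dim as double inv = 1.0 / d
   dim as double s = 0
   for j as integer = 1 to n
      s += j * inv
   next
   return s
end function

dim as double a = SumByDivision(7.0, 100000)
dim as double b = SumByReciprocal(7.0, 100000)
print a, b, abs(a - b)
```

This only helps when the divisor is loop-invariant. In the benchmark above the divisor is the loop counter `j`, so there is no single reciprocal to precompute. The two sums can also differ in the last few bits, because `j * (1/d)` is rounded twice.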

My library functions include a dedicated FpuSet macro, but the default is REAL10 aka "extended double", the maximum precision the hardware delivers; for example PI = 3.141592653589793238, 19 digits. To notice a slowdown from this "extreme" precision, you need a loop that a) features exotic stuff like sqrt or logarithms and b) runs a million iterations. In these rare cases, reducing precision to REAL4 aka single is a good idea; otherwise it makes no sense at all.