timing benchmark using reduced precision

srvaldez
Posts: 2025
Joined: Sep 25, 2005 21:54

Postby srvaldez » Oct 31, 2018 17:54

Some benchmarks showing the faster execution time you get with reduced precision when using the FPU. FB win32 normally uses the FPU; on Windows you can also use the FPU with FB x64 by adding -Wc -mfpmath=387 to the compile command.
My times:

Code:

division ==================================
=== single ===
division using default single precision: time  2.368007199999738 seconds            sum =  37.98161
division using reduced single precision: time  1.382252299999891 seconds            sum =  37.98161
reduced single precision division is  1.713151209804408 times faster
=== double ===
division using default double precision: time  2.3717587000001 seconds              sum =  37.98231426240503
division using reduced double precision: time  2.177881900000102 seconds            sum =  37.98231426240503
reduced double precision division is  1.089020805030791 times faster
sqr ==================================
=== single ===
sqr using default single precision: time  3.750732999999855 seconds   sum =  2.108227e+007
sqr using reduced single precision: time  1.776602799999637 seconds   sum =  2.108227e+007
reduced single precision sqr is  2.111182645890585 times faster
=== double ===
sqr using default double precision: time  3.748969399999623 seconds   sum =  21082008.97391792
sqr using reduced double precision: time  3.161076800000046 seconds   sum =  21082008.97391793
reduced double precision sqr is  1.185978588055681 times faster

Code:


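' x87 control words: bits 8-9 select the precision
' (00 = single, 24-bit; 10 = double, 53-bit; 11 = extended, 64-bit)
dim as ushort oldcw, cwdouble=&h27F, cwsingle=&h7F
dim as double t1, t2, t3, t4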
print "division =================================="
print "=== single ==="
scope
   dim as single x=3.141592653589793, y, z, s

   t1=timer

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next

   t2=timer
   t3=t2-t1
   print "division using default single precision: time ";t3;" seconds","sum = ";s

   t1=timer

   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwsingle] 'set FPU precision to single
   end asm

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next

   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm

   t2=timer
   t4=t2-t1
   print "division using reduced single precision: time ";t4;" seconds","sum = ";s
   print "reduced single precision division is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
   dim as double x=3.141592653589793, y, z, s

   t1=timer

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next

   t2=timer
   t3=t2-t1
   print "division using default double precision: time ";t3;" seconds","sum = ";s

   t1=timer

   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwdouble] 'set FPU precision to double
   end asm

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=x/j
      next
   next

   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm

   t2=timer
   t4=t2-t1
   print "division using reduced double precision: time ";t4;" seconds","sum = ";s
   print "reduced double precision division is ";t3/t4;" times faster"
end scope

print "sqr =================================="
print "=== single ==="
scope
   dim as single x=3.141592653589793, y, z, s

   t1=timer

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next

   t2=timer
   t3=t2-t1
   print "sqr using default single precision: time ";t3;" seconds","sum = ";s

   t1=timer

   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwsingle] 'set FPU precision to single
   end asm

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next

   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm

   t2=timer
   t4=t2-t1
   print "sqr using reduced single precision: time ";t4;" seconds","sum = ";s
   print "reduced single precision sqr is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
   dim as double x=3.141592653589793, y, z, s

   t1=timer

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next

   t2=timer
   t3=t2-t1
   print "sqr using default double precision: time ";t3;" seconds","sum = ";s

   t1=timer

   asm
      fstcw word ptr [oldcw]
      fldcw word ptr [cwdouble] 'set FPU precision to double
   end asm

   for i as integer=1 to 3000
      s=0
      for j as integer=1 to 100000
         s+=sqr(j)
      next
   next

   asm
      fldcw word ptr [oldcw] 'restore control word
   end asm

   t2=timer
   t4=t2-t1
   print "sqr using reduced double precision: time ";t4;" seconds","sum = ";s
   print "reduced double precision sqr is ";t3/t4;" times faster"
end scope
sleep
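
The two control-word constants above differ only in the precision-control field (bits 8 and 9 of the x87 control word). As a minimal sketch, the helper below decodes that field; the name pcBits is made up for illustration and is not part of FreeBASIC or of the benchmark.

```freebasic
' Hypothetical helper: return the mantissa width (in bits) selected by the
' precision-control field (bits 8-9) of an x87 control word.
function pcBits(byval cw as ushort) as integer
   select case (cw shr 8) and 3
   case 0
      return 24   ' single precision
   case 2
      return 53   ' double precision
   case 3
      return 64   ' extended precision
   case else
      return 0    ' encoding 01 is reserved
   end select
end function

print "&h7F  -> "; pcBits(&h7F);  " bits"   ' cwsingle from the benchmark: 24
print "&h27F -> "; pcBits(&h27F); " bits"   ' cwdouble from the benchmark: 53
print "&h37F -> "; pcBits(&h37F); " bits"   ' extended precision: 64
```

The low byte (&h7F) is the same in all three values; it keeps the floating-point exceptions masked.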
Last edited by srvaldez on Nov 02, 2018 8:22, edited 1 time in total.
dafhi
Posts: 1241
Joined: Jun 04, 2005 9:51

Re: timing benchmark using reduced precision

Postby dafhi » Nov 02, 2018 3:33

always looking for ways to speed up my non-critical applications :-)
jj2007
Posts: 1210
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: timing benchmark using reduced precision

Postby jj2007 » Nov 02, 2018 3:52

asm
fstcw word ptr [oldcw]
fldcw word ptr [cwsingle] 'set FPU precision to double
end asm
Typo: you obviously mean single here.
srvaldez
Posts: 2025
Joined: Sep 25, 2005 21:54

Re: timing benchmark using reduced precision

Postby srvaldez » Nov 02, 2018 8:24

@jj2007
yes, copy-paste mistake - corrected, thanks for pointing it out.
[note] Only division and sqr benefit from reduced precision; my hypothesis is that this is because both are computed iteratively.
Last edited by srvaldez on Nov 02, 2018 10:31, edited 2 times in total.
jj2007
Posts: 1210
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: timing benchmark using reduced precision

Postby jj2007 » Nov 02, 2018 9:37

Your theory is probably correct. In practice, knowing that division is so terribly slow, you simply avoid it like the plague. After all, there is multiplication by the inverse value, which is fast.
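
A minimal sketch of the reciprocal trick, assuming the divisor does not change inside the loop (in the benchmark above the divisor j changes on every pass, so it would not apply there as written). The variable names are made up for illustration, and the two sums may differ in the last digits because x * inv is rounded differently from x / d.

```freebasic
dim as double x = 3.141592653589793
dim as double d = 7.0
dim as double inv = 1.0 / d     ' one division, done once outside the loop
dim as double s1 = 0, s2 = 0

for i as integer = 1 to 1000
   s1 += x / d                  ' a division on every pass
   s2 += x * inv                ' a multiplication on every pass
next

print "with division:       "; s1
print "with multiplication: "; s2
print "difference:          "; abs(s1 - s2)
```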

My library functions include a dedicated FpuSet macro, but the default is REAL10 aka "extended double", the maximum precision that the hardware delivers; example PI = 3.141592653589793238, 19 digits. To notice a slowdown due to this "extreme" precision, you need a loop that a) features exotic stuff like sqrt or logarithms and b) runs a million iterations. In these rare cases, reducing precision to REAL4 aka single is a good idea; otherwise it makes no sense at all.
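
For FreeBASIC, the save/set/restore pattern from the first post could be wrapped in a small helper along these lines. This is only a sketch: swapFpuControlWord is a made-up name, it is not the FpuSet macro mentioned above, and it only has an effect when the code really runs on the x87 FPU (FB win32, or FB x64 compiled with -Wc -mfpmath=387).

```freebasic
' Hypothetical helper: load a new x87 control word and return the previous one.
function swapFpuControlWord(byval newcw as ushort) as ushort
   dim as ushort oldcw
   asm
      fstcw word ptr [oldcw]   'save the current control word
      fldcw word ptr [newcw]   'load the requested one
   end asm
   return oldcw
end function

dim as ushort saved = swapFpuControlWord(&h7F)   'single precision, value from the first post
dim as single s = 0
for j as integer = 1 to 100000
   s += sqr(j)
next
swapFpuControlWord(saved)                        'restore the original control word
print "sum = "; s
```

Keeping the reduced-precision region this small means the rest of the program still runs with the default control word.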
