timing benchmark using reduced precision

srvaldez

Some benchmarks showing that division and sqr execute faster at reduced precision when using the x87 FPU. FB win32 normally uses the FPU, but on Windows you can also use the FPU with FB x64 by adding -Wc -mfpmath=387 to the compile command.
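For reference, the code below switches precision by loading one of two control words, &h7F and &h27F. In the x87 control word, bits 8 and 9 form the precision-control field. The following is a minimal sketch that decodes that field; the helper name pcName is made up for illustration, and &h37F is included only because it is the value the FPU holds after FINIT.

```freebasic
' Decode the precision-control field (bits 8-9) of an x87 control word.
' pcName is a hypothetical helper, not part of the benchmark.
function pcName(byval cw as ushort) as string
	select case (cw shr 8) and 3
	case 0
		return "single (24-bit significand)"
	case 2
		return "double (53-bit significand)"
	case 3
		return "extended (64-bit significand)"
	case else
		return "reserved"
	end select
end function

print "&h7F  -> "; pcName(&h7F)
print "&h27F -> "; pcName(&h27F)
print "&h37F -> "; pcName(&h37F)
```

The low byte (&h7F) is identical in all three values: it masks the floating-point exceptions, so only the precision field changes between them.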
My times:

Code:

division ==================================
=== single ===
division using default single precision: time  2.368007199999738 seconds            sum =  37.98161
division using reduced single precision: time  1.382252299999891 seconds            sum =  37.98161
reduced single precision division is  1.713151209804408 times faster
=== double ===
division using default double precision: time  2.3717587000001 seconds              sum =  37.98231426240503
division using reduced double precision: time  2.177881900000102 seconds            sum =  37.98231426240503
reduced double precision division is  1.089020805030791 times faster
sqr ==================================
=== single ===
sqr using default single precision: time  3.750732999999855 seconds   sum =  2.108227e+007
sqr using reduced single precision: time  1.776602799999637 seconds   sum =  2.108227e+007
reduced single precision sqr is  2.111182645890585 times faster
=== double ===
sqr using default double precision: time  3.748969399999623 seconds   sum =  21082008.97391792
sqr using reduced double precision: time  3.161076800000046 seconds   sum =  21082008.97391793
reduced double precision sqr is  1.185978588055681 times faster

Code:


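' x87 control words: bits 8-9 select the precision (&h7F = single, &h27F = double)
dim as ushort oldcw, cwdouble=&h27F, cwsingle=&h7F
' t3 = time at default precision, t4 = time at reduced precision
dim as double t1, t2, t3, t4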
print "division =================================="
print "=== single ==="
scope
	dim as single x=3.141592653589793, s

	t1=timer

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=x/j
		next
	next

	t2=timer
	t3=t2-t1
	print "division using default single precision: time ";t3;" seconds","sum = ";s

	t1=timer

	asm
		fstcw word ptr [oldcw]
		fldcw word ptr [cwsingle] 'set FPU precision to single
	end asm

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=x/j
		next
	next

	asm
		fldcw word ptr [oldcw] 'restore control word
	end asm

	t2=timer
	t4=t2-t1
	print "division using reduced single precision: time ";t4;" seconds","sum = ";s
	print "reduced single precision division is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
	dim as double x=3.141592653589793, s

	t1=timer

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=x/j
		next
	next

	t2=timer
	t3=t2-t1
	print "division using default double precision: time ";t3;" seconds","sum = ";s

	t1=timer

	asm
		fstcw word ptr [oldcw]
		fldcw word ptr [cwdouble] 'set FPU precision to double
	end asm

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=x/j
		next
	next

	asm
		fldcw word ptr [oldcw] 'restore control word
	end asm

	t2=timer
	t4=t2-t1
	print "division using reduced double precision: time ";t4;" seconds","sum = ";s
	print "reduced double precision division is ";t3/t4;" times faster"
end scope

print "sqr =================================="
print "=== single ==="
scope
	dim as single s

	t1=timer

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=sqr(j)
		next
	next

	t2=timer
	t3=t2-t1
	print "sqr using default single precision: time ";t3;" seconds","sum = ";s

	t1=timer

	asm
		fstcw word ptr [oldcw]
		fldcw word ptr [cwsingle] 'set FPU precision to single
	end asm

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=sqr(j)
		next
	next

	asm
		fldcw word ptr [oldcw] 'restore control word
	end asm

	t2=timer
	t4=t2-t1
	print "sqr using reduced single precision: time ";t4;" seconds","sum = ";s
	print "reduced single precision sqr is ";t3/t4;" times faster"
end scope
print "=== double ==="
scope
	dim as double s

	t1=timer

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=sqr(j)
		next
	next

	t2=timer
	t3=t2-t1
	print "sqr using default double precision: time ";t3;" seconds","sum = ";s

	t1=timer

	asm
		fstcw word ptr [oldcw]
		fldcw word ptr [cwdouble] 'set FPU precision to double
	end asm

	for i as integer=1 to 3000
		s=0
		for j as integer=1 to 100000
			s+=sqr(j)
		next
	next

	asm
		fldcw word ptr [oldcw] 'restore control word
	end asm

	t2=timer
	t4=t2-t1
	print "sqr using reduced double precision: time ";t4;" seconds","sum = ";s
	print "reduced double precision sqr is ";t3/t4;" times faster"
end scope
sleep
Last edited by srvaldez on Nov 02, 2018 8:22, edited 1 time in total.
dafhi

always looking for ways to speed up my non-critical applications :-)
jj2007

asm
fstcw word ptr [oldcw]
fldcw word ptr [cwsingle] 'set FPU precision to double
end asm
Typo: you obviously mean single here.
srvaldez

@jj2007
yes, copy-paste mistake - corrected, thanks for pointing it out.
[note] Only division and sqr benefit from reduced precision; my hypothesis is that this is because the FPU computes them iteratively.
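To illustrate the hypothesis (as an analogy only, not a description of the actual hardware), here is a sketch of a one-bit-per-step restoring division. The function name divBits is made up, and it assumes 0 < a < b. The loop runs once per requested quotient bit, so asking for 24 bits takes fewer steps than asking for 53. Real FPUs retire several bits per step, but the cost still grows with the number of bits requested.

```freebasic
' Restoring division sketch: returns a/b truncated to `bits` fractional bits.
' Assumes 0 < a < b. divBits is a hypothetical name used for illustration.
function divBits(byval a as double, byval b as double, byval bits as integer) as double
	dim as double q = 0      ' quotient built so far
	dim as double w = 0.5    ' weight of the next quotient bit
	dim as double r = a      ' running remainder
	for k as integer = 1 to bits
		r *= 2
		if r >= b then
			r -= b
			q += w           ' this quotient bit is 1
		end if
		w /= 2
	next
	return q
end function

print "24 steps: "; divBits(1, 3, 24)
print "53 steps: "; divBits(1, 3, 53)
print "1/3     : "; 1 / 3
```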
Last edited by srvaldez on Nov 02, 2018 10:31, edited 2 times in total.
jj2007

Your theory is probably correct. In practice, knowing that division is so terribly slow, you simply avoid it like the plague. After all, there is multiplication by the inverse value, which is fast.
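A minimal sketch of the multiply-by-inverse idea, assuming the divisor stays the same across the whole loop (in the benchmark above the divisor j changes every iteration, so the trick does not apply there directly). The variable names are made up for illustration. The two sums can differ in their last digits, because j * inv rounds differently from j / d.

```freebasic
' Replace a division by a loop-invariant divisor with one division up front
' and a multiplication inside the loop.
const N = 100000
dim as double d = 3.141592653589793
dim as double inv = 1 / d            ' the only division
dim as double s1 = 0, s2 = 0

for j as integer = 1 to N
	s1 += j / d                      ' one division per iteration
next

for j as integer = 1 to N
	s2 += j * inv                    ' one multiplication per iteration
next

print "sum using division:       "; s1
print "sum using multiplication: "; s2
print "difference:               "; abs(s1 - s2)
```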

My library functions include a dedicated FpuSet macro, but the default is REAL10 aka "extended double", the maximum precision that the hardware delivers; example: PI = 3.141592653589793238, 19 digits. To notice a slowdown due to this "extreme" precision, you need a loop that a) features exotic stuff like sqrt or logarithms and b) runs a million iterations. In these rare cases, reducing precision to REAL4 aka single is a good idea; otherwise it makes no sense at all.