[Freebasic 32 vs 64]

marcov
Posts: 3454
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: [Freebasic 32 vs 64]

Post by marcov »

Coolman wrote:
marcov wrote:
Coolman wrote:Windows
Make sure you recompiled everything with -dTEST_WIN32_SEH and appropriate -Cf parameters.

By default win32 has a slower exception system than win64, but that is historic and will hopefully be fixed soon.
I did not know this setting; I use the classic optimizations. Do you mean that with this setting, 32- and 64-bit programs have equal execution speed?
No, just that comparing is complex, and defaults are not always optimal. IOW, it helps to compare apples to apples, not apples to oranges.

Proper benchmarking is an art.
Coolman
Posts: 294
Joined: Nov 05, 2010 15:09

Re: [Freebasic 32 vs 64]

Post by Coolman »

I will see.

To return to FreeBASIC: does anyone have benchmark source code to concretely evaluate the speed of 32- and 64-bit programs?
- loops
- file input/output
- arithmetic calculation
- sorting
...

No graphics tests, since apparently the 32-bit version is more optimized there ...

I would write it myself, but I do not have time at the moment. If I find some code, I will test it ...

I made gcc version 8.2.0 work with FreeBASIC ...
St_W
Posts: 1618
Joined: Feb 11, 2009 14:24
Location: Austria

Re: [Freebasic 32 vs 64]

Post by St_W »

Coolman wrote:why not harmonize freebasic 32 and 64 so that it generates only C code with the default compilation enabled with the option -O2. it would be more logical. and it will optimize the c-generated code in both versions.
You can use the gcc backend also for 32-bit freebasic by passing "-gen gcc" on the command line (note that you (obviously) need gcc in that case; you can download a prepared addon package from freebasic's sourceforge page). The C backend doesn't only have upsides, it also has a few downsides (e.g. some things aren't directly possible in C code, but are in asm; optimization sometimes causes trouble; and compatibility issues might arise with existing code) that's probably why the asm backend is still the default one for FBC 32.
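When comparing builds made with different backends, it can help to have the program itself report how it was compiled. The following is only a minimal sketch: it relies on FreeBASIC's intrinsic defines `__FB_64BIT__`, `__FB_BACKEND__` and `__FB_VERSION__`, and the output wording is purely illustrative.

```freebasic
' Minimal sketch: print the target and backend this executable was built with,
' so that benchmark output can be labelled unambiguously.
' __FB_64BIT__ is only defined when compiling for a 64-bit target.
#Ifdef __FB_64BIT__
	Print "target: 64-bit";
#Else
	Print "target: 32-bit";
#EndIf

' __FB_BACKEND__ holds the code generator name (for example "gas" or "gcc").
Print ", backend: "; __FB_BACKEND__; ", fbc "; __FB_VERSION__
```

Putting these lines at the top of a benchmark source makes it harder to mix up results from `-gen gas` and `-gen gcc` builds.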
Coolman
Posts: 294
Joined: Nov 05, 2010 15:09

Re: [Freebasic 32 vs 64]

Post by Coolman »

St_W wrote:
Coolman wrote:why not harmonize freebasic 32 and 64 so that it generates only C code with the default compilation enabled with the option -O2. it would be more logical. and it will optimize the c-generated code in both versions.
You can use the gcc backend also for 32-bit freebasic by passing "-gen gcc" on the command line (note that you (obviously) need gcc in that case; you can download a prepared addon package from freebasic's sourceforge page). The C backend doesn't only have upsides, it also has a few downsides (e.g. some things aren't directly possible in C code, but are in asm; optimization sometimes causes trouble; and compatibility issues might arise with existing code) that's probably why the asm backend is still the default one for FBC 32.
I know about gcc, thanks anyway...

I did not know that there was a compatibility problem with gcc. All the code I tested works ...

I found this code:

viewtopic.php?f=7&t=17702

I tested with the same compilation parameters:

Freebasic 32 bit gcc 5.2.0 : fbc -gen gcc -Wc -O2

Bubble 2
Exchange 0.25
Shell 0.25
Insertion 0.25
Quick 0.25

Freebasic 64 bit gcc 5.2.0 : fbc -gen gcc -Wc -O2

Bubble 3.75
Exchange 0.25
Shell 0.75
Insertion 0.25
Quick 0.25

no comment. the result speaks for itself
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: [Freebasic 32 vs 64]

Post by jj2007 »

There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.

In the Masm32 forum, we made lots of benchmarks. Most of the time, the differences are small. Libraries written in optimised assembly are generally faster than C code, often by a factor 2-3. There are cases, however, where C code is not slower - simply because the C compiler generates the same code that a human assembly programmer would choose.
marcov
Posts: 3454
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: [Freebasic 32 vs 64]

Post by marcov »

Some additions. Note that I'm talking from a compiler standpoint, with at least partial assembler runtime helpers, and for x86/x86_64 only, not necessarily for 64-bit in general (since I have a Raspberry Pi 3, I have ARM64 too, and there is a G5 lurking around somewhere).
jj2007 wrote:There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.
- 64-bit SIMD has ABI support (volatile registers, aligned stack) and twice the number of registers, with floating point passed in SIMD registers. This (aside from the number of registers) can be emulated by compilers, but only for the current program. For x86_64 it is system-wide, so it also applies to calls into the system, 3rd-party DLLs, etc.
- SIMD floating point is generally faster for simple operations, but slower (than x87) for complex operations (not counting vectorization, since that is relatively rare).
- (Unix) position-independent code is cheaper on x86_64.
- Since structures with pointers become larger, there are data cache effects in 64-bit too, though usually only noticeable in special cases (microbenchmarks).
- A 64-bit runtime can assume SSE2/3 as a minimum, so usually more routines are optimized that way. In general, the minimal CPU level is higher.
Coolman
Posts: 294
Joined: Nov 05, 2010 15:09

Re: [Freebasic 32 vs 64]

Post by Coolman »

In the end, I will use the 32-bit version of FreeBASIC. It is compatible with 64-bit Windows, and most programs are faster. That said, I will soon leave Windows for Linux; I will see the difference there ...
srvaldez
Posts: 3373
Joined: Sep 25, 2005 21:54

Re: [Freebasic 32 vs 64]

Post by srvaldez »

@Coolman
it is your choice which compiler version you want to use, in my experience, FBx64 executables are usually faster than FBx86, except for graphics
I don't have a real world benchmark, so here's the nbody benchmark
my times on Windows 10 x64

Code: Select all

FBwin32, fbc -w all -asm intel -gen gas nbody.bas
-0.169075164
-0.169059907
elapsed time  30.43833460000681 seconds

FBwin32 fbc -w all -asm intel -gen gcc -Wc -O2 nbody.bas
-0.169075164
-0.169059907
elapsed time  12.52443109998623 seconds

FBwin64 fbc -w all -asm intel -gen gcc -Wc -O2 nbody.bas
-0.169075164
-0.169059907
elapsed time  11.45735710003646 seconds
nbody.bas

Code: Select all

'https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-gcc-1.html
' The Computer Language Benchmarks Game
' https://benchmarksgame-team.pages.debian.net/benchmarksgame/

' contributed by Christoph Bauer
'
' https://benchmarksgame-team.pages.debian.net/benchmarksgame/license.html

' translated to FreeBasic by srvaldez

Declare Function main(Byval argc As Long) As Long

	Dim As Double t=Timer
	main(50000000)
	Print "elapsed time ";timer-t;" seconds"
End


Const pi = 3.141592653589793
Const solar_mass = (4 * pi) * pi
Const days_per_year = 365.24

Type planet
	x As Double
	y As Double
	z As Double
	vx As Double
	vy As Double
	vz As Double
	mass As Double
End Type

Private Sub advance(Byval nbodies As Long, bodies() As planet, Byval dt As Double)
	Dim i As Long
	Dim j As Long
	For i = 0 To nbodies-1
		Dim b As planet Ptr = @bodies(i)
		For j = i + 1 To nbodies-1
			Dim b2 As planet Ptr = @bodies(j)
			Dim dx As Double = b->x - b2->x
			Dim dy As Double = b->y - b2->y
			Dim dz As Double = b->z - b2->z
			Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
			Dim mag As Double = dt / ((distance * distance) * distance)
			b->vx -= (dx * b2->mass) * mag
			b->vy -= (dy * b2->mass) * mag
			b->vz -= (dz * b2->mass) * mag
			b2->vx += (dx * b->mass) * mag
			b2->vy += (dy * b->mass) * mag
			b2->vz += (dz * b->mass) * mag
		Next
	Next
	For i = 0 To nbodies-1
		Dim b As planet Ptr = @bodies(i)
		b->x += dt * b->vx
		b->y += dt * b->vy
		b->z += dt * b->vz
	Next
End Sub

Private Function energy(Byval nbodies As Long, bodies() As planet) As Double
	Dim e As Double
	Dim i As Long
	Dim j As Long
	e = 0.0
	For i = 0 To nbodies-1
		Dim b As planet Ptr = @bodies(i)
		e += (0.5 * b->mass) * (((b->vx * b->vx) + (b->vy * b->vy)) + (b->vz * b->vz))
		For j = i + 1 To nbodies-1
			Dim b2 As planet Ptr = @bodies(j)
			Dim dx As Double = b->x - b2->x
			Dim dy As Double = b->y - b2->y
			Dim dz As Double = b->z - b2->z
			Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
			e -= (b->mass * b2->mass) / distance
		Next
	Next
	Return e
End Function

Private Sub offset_momentum(Byval nbodies As Long, bodies() As planet)
	Dim px As Double = 0.0
	Dim py As Double = 0.0
	Dim pz As Double = 0.0
	Dim i As Long
	For i = 0 To nbodies-1
		px += bodies(i).vx * bodies(i).mass
		py += bodies(i).vy * bodies(i).mass
		pz += bodies(i).vz * bodies(i).mass
	Next
	bodies(0).vx = (-px) / ((4 * 3.141592653589793) * 3.141592653589793)
	bodies(0).vy = (-py) / ((4 * 3.141592653589793) * 3.141592653589793)
	bodies(0).vz = (-pz) / ((4 * 3.141592653589793) * 3.141592653589793)
End Sub

Const NBODIES = 5
Extern     bodies(0 To 4) As planet
Dim Shared bodies(0 To 4) As planet = {(0, 0, 0, 0, 0, 0,_
                                      (4 * 3.141592653589793) * 3.141592653589793),_
                                      (4.84143144246472090e+00, -1.16032004402742839e+00,_
                                      -1.03622044471123109e-01, 1.66007664274403694e-03 * _
                                      365.24, 7.69901118419740425e-03 * 365.24, _
                                      (-6.90460016972063023e-05) * 365.24, _
                                      9.54791938424326609e-04 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (8.34336671824457987e+00, _
                                      4.12479856412430479e+00, -4.03523417114321381e-01, _
                                      (-2.76742510726862411e-03) * 365.24, _
                                      4.99852801234917238e-03 * 365.24, _
                                      2.30417297573763929e-05 * 365.24, _
                                      2.85885980666130812e-04 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (1.28943695621391310e+01, _
                                      -1.51111514016986312e+01, -2.23307578892655734e-01, _
                                      2.96460137564761618e-03 * 365.24, _
                                      2.37847173959480950e-03 * 365.24, _
                                      (-2.96589568540237556e-05) * 365.24, _
                                      4.36624404335156298e-05 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (1.53796971148509165e+01, _
                                      -2.59193146099879641e+01, 1.79258772950371181e-01, _
                                      2.68067772490389322e-03 * 365.24, _
                                      1.62824170038242295e-03 * 365.24, _
                                      (-9.51592254519715870e-05) * 365.24, _
                                      5.15138902046611451e-05 * ((4 * 3.141592653589793) * _
                                      3.141592653589793))}

Private Function main(Byval argc As Long) As Long
	Dim n As Long = argc
	Dim i As Long
	offset_momentum(5, bodies())
	Print Using "##.#########"; energy(NBODIES, bodies())
	For i = 1 To n
		advance(NBODIES, bodies(), 0.01)
	Next
	Print Using "##.#########"; energy(NBODIES, bodies())
	Return 0
End Function

main(50000000)
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: [Freebasic 32 vs 64]

Post by jj2007 »

marcov wrote:- 64-bit SIMD has
There is no limitation to the use of SIMD in 32-bit land. In my main library, there are over 500 lines containing the string "xmm". But of course, there are some old compilers around that have no SIMD support.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: [Freebasic 32 vs 64]

Post by jj2007 »

srvaldez wrote:nbody.bas
On my Win7-64 machine, the 64-bit version is almost 10% faster. Except if you cheat a little bit:

Code: Select all

Dim distance As Double = 123.456 'Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
Now the 32-bit version is 26% faster. Which means the whole test is dominated by one single function, SQR().

In 32-bit code, the line above is implemented as follows:

Code: Select all

fld st(2)
fmul st, st(3)
fld st(2)
fmul st, st(3)
faddp st(1), st
fld st(1)
fmul st, st(2)
faddp st(1), st
fld st
fsqrt
Same source but 64-bit code:

Code: Select all

movapd xmm4,xmm2 
movapd xmm3,xmm1 
mulsd xmm4,xmm2  
mulsd xmm3,xmm1  
addsd xmm3,xmm4  
movapd xmm4,xmm0 
mulsd xmm4,xmm0  
addsd xmm3,xmm4  
sqrtsd xmm3,xmm3
So that is FPU for 32-bit code, SIMD for 64-bit code. The latter is faster but also much less precise. And no CPU would complain if it was fed the SIMD code in 32-bit mode. So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.
Coolman
Posts: 294
Joined: Nov 05, 2010 15:09

Re: [Freebasic 32 vs 64]

Post by Coolman »

srvaldez wrote:@Coolman
it is your choice which compiler version you want to use, in my experience, FBx64 executables are usually faster than FBx86, except for graphics
I don't have a real world benchmark, so here's the nbody benchmark
my times on Windows 10 x64

FBwin32 fbc -w all -asm intel -gen gas
Launched four times; the result is not constant:

-0.169075164
-0.169059907
elapsed time 23.55617942135427 seconds

-0.169075164
-0.169059907
elapsed time 27.06140585355911 seconds

-0.169075164
-0.169059907
elapsed time 27.09792673439802 seconds

-0.169075164
-0.169059907
elapsed time 26.94121167771459 seconds

FBwin32 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times; the result is not constant:

-0.169075164
-0.169059907
elapsed time 10.57177684700469 seconds

-0.169075164
-0.169059907
elapsed time 12.42956154142667 seconds

-0.169075164
-0.169059907
elapsed time 13.75664050981891 seconds

-0.169075164
-0.169059907
elapsed time 13.70927023998706 seconds

FBwin64 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times; the result is not constant:

-0.169075164
-0.169059907
elapsed time 10.85466894134879 seconds

-0.169075164
-0.169059907
elapsed time 12.73996500298381 seconds

-0.169075164
-0.169059907
elapsed time 12.79951698612422 seconds

-0.169075164
-0.169059907
elapsed time 12.76374577032402 seconds

Thank you for the example.
The results are quite similar, with a small advantage for the 64-bit version. I expected better...
Very interesting.
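Since the timings change from launch to launch, one way to make runs easier to compare is to repeat the workload inside a single process and report both the best and the mean time. The sketch below is only an illustration: `workload`, `RUNS` and the loop size are made-up names and values, not part of nbody.bas.

```freebasic
' Minimal sketch: repeat a workload and report best and mean wall-clock time.
' "workload", RUNS and the iteration count are illustrative assumptions.
Const RUNS = 5

Function workload() As Double
	' Stand-in for the code under test: a simple floating-point loop.
	Dim s As Double = 0
	For i As Long = 1 To 10000000
		s += Sqr(i)
	Next
	Return s
End Function

Dim best As Double = 1e300   ' larger than any realistic elapsed time
Dim total As Double = 0
Dim sink As Double = 0       ' accumulates results so the work is actually used

For r As Long = 1 To RUNS
	Dim t As Double = Timer
	sink += workload()
	Dim elapsed As Double = Timer - t
	If elapsed < best Then best = elapsed
	total += elapsed
Next

Print "best "; best; " s, mean "; total / RUNS; " s (checksum "; sink; ")"
```

The best time is usually the least disturbed by other processes, while a large gap between best and mean hints that something else on the machine interfered with the measurement.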
srvaldez
Posts: 3373
Joined: Sep 25, 2005 21:54

Re: [Freebasic 32 vs 64]

Post by srvaldez »

jj2007 wrote: So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.
I don't get your point, the 32-bit version compiled with -gen gcc is almost as fast as that of the 64-bit version, the -gen gas version is the one that is slow
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Re: [Freebasic 32 vs 64]

Post by jj2007 »

srvaldez wrote:
jj2007 wrote: So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.
I don't get your point, the 32-bit version compiled with -gen gcc is almost as fast as that of the 64-bit version, the -gen gas version is the one that is slow
It was actually your point:
srvaldez wrote:in my experience, FBx64 executables are usually faster than FBx86
So I took your benchmark and had a look under the hood: Voilà, mystery solved, x64 is 10% faster because of one single faster instruction, sqrtsd vs fsqrt. As soon as you comment out that single instruction, the 32-bit version becomes 26% faster than the 64-bit version.

Benchmarks should be much more balanced, representing a variety of typical tasks like loops, string processing, conversions, sorting, searching, integer and float math, graphics, etc.
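As a rough illustration of what such a balanced benchmark could look like, here is a small sketch that times a few different kinds of work separately. The task names (`IntLoop`, `FloatMath`, `SortWork`, `StringWork`) and all sizes are assumptions chosen for the example; a serious suite would need more tasks, realistic data, and repeated runs.

```freebasic
' Hypothetical mini-suite. Task names and sizes are illustrative assumptions,
' not taken from any existing FreeBASIC benchmark.

Function IntLoop() As Double
	' Integer arithmetic in a tight loop.
	Dim acc As LongInt = 0
	For i As Long = 1 To 50000000
		acc += i Xor (i Shr 3)
	Next
	Return acc
End Function

Function FloatMath() As Double
	' Floating-point work: square root plus a division per iteration.
	Dim s As Double = 0
	For i As Long = 1 To 20000000
		s += Sqr(i) / (i + 1.0)
	Next
	Return s
End Function

Function SortWork() As Double
	' Insertion sort on a reversed array (worst case for this algorithm).
	Const COUNT = 5000
	Dim a(0 To COUNT - 1) As Long
	For i As Long = 0 To COUNT - 1
		a(i) = COUNT - i
	Next
	For i As Long = 1 To COUNT - 1
		Dim v As Long = a(i)
		Dim j As Long = i - 1
		Do While j >= 0 AndAlso a(j) > v
			a(j + 1) = a(j)
			j -= 1
		Loop
		a(j + 1) = v
	Next
	Return a(0) + a(COUNT - 1)
End Function

Function StringWork() As Double
	' Number-to-string and string-to-number conversions plus concatenation.
	Dim s As String
	Dim total As Double = 0
	For i As Long = 1 To 200000
		s = Str(i) & "x"
		total += Len(s) + Val(s)
	Next
	Return total
End Function

Dim t As Double
Dim r As Double

' Each task is timed on its own; the checksum keeps the result in use.
t = Timer : r = IntLoop()
Print "integer loop : "; Timer - t; " s (checksum "; r; ")"
t = Timer : r = FloatMath()
Print "float math   : "; Timer - t; " s (checksum "; r; ")"
t = Timer : r = SortWork()
Print "sorting      : "; Timer - t; " s (checksum "; r; ")"
t = Timer : r = StringWork()
Print "strings      : "; Timer - t; " s (checksum "; r; ")"
```

Reporting the tasks separately rather than as one total makes it visible when a single operation, like the square root in nbody, dominates the result.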
srvaldez
Posts: 3373
Joined: Sep 25, 2005 21:54

Re: [Freebasic 32 vs 64]

Post by srvaldez »

here's the binary-tree benchmark, time differences between FB versions are small, my times were
FBwin32 gas, 67.26 seconds
FBwin32 gcc, 61.59 seconds
FBwin64 gcc, 52.00 seconds

binary-trees.bas

Code: Select all

/' The Computer Language Benchmarks Game
 * https://benchmarksgame-team.pages.debian.net/benchmarksgame/

   contributed by Kevin Carson
   
   https://benchmarksgame-team.pages.debian.net/benchmarksgame/license.html
   
   FreeBASIC translation by srvaldez with the help of fbfrog https://github.com/dkl/fbfrog
'/

#Define NULL 0

Type tn
	left_ As tn Ptr
	right_ As tn Ptr
End Type

Type treeNode As tn

Function NewTreeNode(Byval left_ As treeNode Ptr, Byval right_ As treeNode Ptr) As treeNode Ptr
	Dim new_ As treeNode Ptr
	new_ = Cptr(treeNode Ptr, Allocate(Sizeof(treeNode)))
	new_->left_ = left_
	new_->right_ = right_
	Return new_
End Function

Function ItemCheck(Byval tree As treeNode Ptr) As Integer
	If tree->left_ = NULL Then
		Return 1
	Else
		Return (1 + ItemCheck(tree->left_)) + ItemCheck(tree->right_)
	End If
End Function

Function BottomUpTree(Byval depth As Ulong) As treeNode Ptr
	If depth > 0 Then
		Return NewTreeNode(BottomUpTree(depth - 1), BottomUpTree(depth - 1))
	Else
		Return NewTreeNode(NULL, NULL)
	End If
End Function

Sub DeleteTree(Byval tree As treeNode Ptr)
	If tree->left_ <> NULL Then
		DeleteTree(tree->left_)
		DeleteTree(tree->right_)
	End If
	Deallocate(tree)
End Sub

Sub main(Byval N As Ulong)
	Dim As Double t=Timer
	Dim depth As Ulong
	Dim minDepth As Ulong
	Dim maxDepth As Ulong
	Dim stretchDepth As Ulong
	Dim stretchTree As treeNode Ptr
	Dim longLivedTree As treeNode Ptr
	Dim tempTree As treeNode Ptr
	minDepth = 4
	If (minDepth + 2) > N Then
		maxDepth = minDepth + 2
	Else
		maxDepth = N
	End If
	stretchDepth = maxDepth + 1
	stretchTree = BottomUpTree(stretchDepth)
	Print "        stretch tree of depth ";stretchDepth, "check: "; ItemCheck(stretchTree)
	DeleteTree(stretchTree)
	longLivedTree = BottomUpTree(maxDepth)
	For depth = minDepth To maxDepth Step 2
		Dim i As Integer
		Dim iterations As Integer
		Dim check As Integer
		iterations = 2^((maxDepth - depth) + minDepth)
		check = 0
		For i = 1 To iterations
			tempTree = BottomUpTree(depth)
			check += ItemCheck(tempTree)
			DeleteTree(tempTree)
		Next
		Print iterations," trees of depth ";depth, "check: "; check
	Next
	Print "     long lived tree of depth ";maxDepth, "check: "; ItemCheck(longLivedTree)
	Print
	Print "elapsed time is "; timer-t;" seconds"
End Sub

main(21)
Last edited by srvaldez on Jan 30, 2019 15:18, edited 2 times in total.
marcov
Posts: 3454
Joined: Jun 16, 2005 9:45
Location: Netherlands

Re: [Freebasic 32 vs 64]

Post by marcov »

srvaldez wrote:here's the binary-tree benchmark, time differences between FB versions are small, my times were
FBwin32 gas, 67.26 seconds
FBwin32 gcc, 61.59 seconds
FBwin64 gcc, 52.00 seconds
Note that binary trees might be a case where data cache effects affect win64 performance (for high numbers of nodes).

This is because sizeof(treeNode) is larger in 64-bit, and thus fewer entries fit in the cache.
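The size difference is easy to check. The sketch below reuses the two-pointer node layout from binary-trees.bas; the 32 KiB cache size is only an illustrative assumption, not a measurement of any particular CPU.

```freebasic
' Same node layout as in binary-trees.bas: two pointers per node.
Type tn
	left_ As tn Ptr
	right_ As tn Ptr
End Type

Print "SizeOf(Any Ptr) = "; SizeOf(Any Ptr)
Print "SizeOf(tn)      = "; SizeOf(tn)

' Illustrative assumption: a 32 KiB data cache.
Const CACHE_BYTES = 32 * 1024
Print "nodes per 32 KiB: "; CACHE_BYTES \ SizeOf(tn)
```

Compiled for a 32-bit target this should report 8 bytes per node, and 16 bytes for a 64-bit target, so half as many nodes fit in the same amount of cache. Any per-block overhead added by the memory allocator comes on top of that and is not shown here.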