[Freebasic 32 vs 64]

marcov
Posts: 2757
Joined: Jun 16, 2005 9:45
Location: Eindhoven, NL

Re: [Freebasic 32 vs 64]

Postby marcov » Jan 28, 2019 20:49

Coolman wrote:
marcov wrote:
Coolman wrote:Windows


Make sure you recompiled everything with -dTEST_WIN32_SEH and appropriate -Cf parameters.

By default win32 has a slower exception system than win64, but that is historic and will hopefully be fixed soon.


I did not know this setting; I use the classic optimizations. Do you mean that with this setting, 32-bit and 64-bit programs have equal execution speed?


No, just that comparing is complex, and the defaults are not always optimal. In other words, it helps to compare apples to apples, not apples to oranges.

Proper benchmarking is an art.
Coolman
Posts: 208
Joined: Nov 05, 2010 15:09

Postby Coolman » Jan 28, 2019 22:09

I will see.

To return to FreeBASIC: does anyone have benchmark source code to concretely evaluate the speed of 32-bit and 64-bit programs? For example:
- loops
- file input/output
- arithmetic calculation
- sorting
...

No graphics test, since apparently the 32-bit version is more optimized there ...

I would write it myself, but I do not have time at the moment. If I find some code, I will test it ...

I got gcc version 8.2.0 working with FreeBASIC ...
St_W
Posts: 1468
Joined: Feb 11, 2009 14:24
Location: Austria

Postby St_W » Jan 28, 2019 22:17

Coolman wrote:why not harmonize freebasic 32 and 64 so that it generates only C code with the default compilation enabled with the option -O2. it would be more logical. and it will optimize the c-generated code in both versions.
You can also use the gcc backend for 32-bit FreeBASIC by passing "-gen gcc" on the command line (note that you obviously need gcc in that case; you can download a prepared add-on package from FreeBASIC's SourceForge page). The C backend doesn't only have upsides; it also has a few downsides (e.g. some things aren't directly possible in C code but are in asm, optimization sometimes causes trouble, and compatibility issues might arise with existing code). That's probably why the asm backend is still the default one for 32-bit FBC.
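
When timings from several builds are compared, it helps if every executable reports how it was built. A minimal sketch, assuming the intrinsic defines `__FB_64BIT__` and `__FB_BACKEND__` provided by recent fbc versions; the labels printed here are only an illustration:

```freebasic
' Sketch: print the configuration that produced this executable,
' so that posted timings can be labelled with target and backend.
#ifdef __FB_64BIT__
   Print "target : 64-bit"
#else
   Print "target : 32-bit"
#endif

' __FB_BACKEND__ is a string set by the compiler according to -gen
Print "backend: "; __FB_BACKEND__

' Pointer and Integer sizes follow the target (4 or 8 bytes)
Print "pointer:"; SizeOf(Any Ptr); " bytes"
Print "integer:"; SizeOf(Integer); " bytes"
```

Compiling the same file once with "-gen gas" and once with "-gen gcc" should show the difference in the "backend" line.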

Postby Coolman » Jan 28, 2019 22:59

St_W wrote:
Coolman wrote:why not harmonize freebasic 32 and 64 so that it generates only C code with the default compilation enabled with the option -O2. it would be more logical. and it will optimize the c-generated code in both versions.
You can use the gcc backend also for 32-bit freebasic by passing "-gen gcc" on the command line (note that you (obviously) need gcc in that case; you can download a prepared addon package from freebasic's sourceforge page). The C backend doesn't only have upsides, it also has a few downsides (e.g. some things aren't directly possible in C code, but are in asm; optimization sometimes causes trouble; and compatibility issues might arise with existing code) that's probably why the asm backend is still the default one for FBC 32.


I know about gcc, thanks anyway ...

I did not know that there were compatibility problems with gcc. All the code I have tested works ...

I found this code:

viewtopic.php?f=7&t=17702

I tested both with the same compilation parameters:

Freebasic 32 bit gcc 5.2.0 : fbc -gen gcc -Wc -O2

Bubble 2
Exchange 0.25
Shell 0.25
Insertion 0.25
Quick 0.25

Freebasic 64 bit gcc 5.2.0 : fbc -gen gcc -Wc -O2

Bubble 3.75
Exchange 0.25
Shell 0.75
Insertion 0.25
Quick 0.25

No comment, the result speaks for itself: the 32-bit build is faster on Bubble and Shell, and ties on the other three.
jj2007
Posts: 1210
Joined: Oct 23, 2016 15:28
Location: Roma, Italia

Postby jj2007 » Jan 28, 2019 23:21

There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.

In the Masm32 forum, we made lots of benchmarks. Most of the time, the differences are small. Libraries written in optimised assembly are generally faster than C code, often by a factor of 2-3. There are cases, however, where C code is not slower, simply because the C compiler generates the same code that a human assembly programmer would choose.

Postby marcov » Jan 29, 2019 9:54

Some additions. Note that I'm talking from a compiler standpoint, with at least partial assembler runtime helpers, and for x86/x86_64 only, not necessarily for 64-bit in general (since I have a Raspberry Pi 3, I have ARM64 too, and there is a G5 lurking around somewhere).

jj2007 wrote:There is no general rule for the speed difference between 32- and 64-bit code.
- 64-bit code can be faster if the extra registers play a role
- 64-bit code can be slower because the code cache can be exhausted due to longer instructions and addresses
- both 32- and 64-bit code can use SIMD instructions.

- 64-bit SIMD has ABI support (volatile registers, aligned stack), twice the number of registers, and floating point values passed in SIMD registers. This (aside from the number of registers) can be emulated by compilers, but only within the current program. For x86_64 it is systemwide, so it also applies to calls into the system, 3rd party DLLs etc.
- SIMD floating point is generally faster for simple operations, but slower (than x87) for complex operations (not counting vectorization, since that is relatively rare).
- (Unix) position independent code is cheaper on x86_64.
- Since structures with pointers become larger, there are data cache effects to 64-bit too, though usually only noticeable in special cases (microbenchmarks).
- A 64-bit runtime can assume SSE2 as a minimum, so usually more routines are optimized that way. In general, the minimal CPU level is higher.

Postby Coolman » Jan 29, 2019 13:22

In the end, I will use the 32-bit version of FreeBASIC. It is compatible with 64-bit Windows and most programs are faster. That said, I will soon leave Windows for Linux, and I will see the difference there ...
srvaldez
Posts: 2023
Joined: Sep 25, 2005 21:54

Postby srvaldez » Jan 29, 2019 14:54

@Coolman
it is your choice which compiler version you want to use, in my experience, FBx64 executables are usually faster than FBx86, except for graphics
I don't have a real world benchmark, so here's the nbody benchmark
my times on Windows 10 x64

Code:

FBwin32, fbc -w all -asm intel -gen gas nbody.bas
-0.169075164
-0.169059907
elapsed time  30.43833460000681 seconds

FBwin32 fbc -w all -asm intel -gen gcc -Wc -O2 nbody.bas
-0.169075164
-0.169059907
elapsed time  12.52443109998623 seconds

FBwin64 fbc -w all -asm intel -gen gcc -Wc -O2 nbody.bas
-0.169075164
-0.169059907
elapsed time  11.45735710003646 seconds

nbody.bas

Code:

'https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/nbody-gcc-1.html
' The Computer Language Benchmarks Game
' https://benchmarksgame-team.pages.debian.net/benchmarksgame/

' contributed by Christoph Bauer
'
' https://benchmarksgame-team.pages.debian.net/benchmarksgame/license.html

' translated to FreeBasic by srvaldez

Declare Function main(Byval argc As Long) As Long

   Dim As Double t=Timer
   main(50000000)
   Print "elapsed time ";timer-t;" seconds"
End


Const pi = 3.141592653589793
Const solar_mass = (4 * pi) * pi
Const days_per_year = 365.24

Type planet
   x As Double
   y As Double
   z As Double
   vx As Double
   vy As Double
   vz As Double
   mass As Double
End Type

Private Sub advance(Byval nbodies As Long, bodies() As planet, Byval dt As Double)
   Dim i As Long
   Dim j As Long
   For i = 0 To nbodies-1
      Dim b As planet Ptr = @bodies(i)
      For j = i + 1 To nbodies-1
         Dim b2 As planet Ptr = @bodies(j)
         Dim dx As Double = b->x - b2->x
         Dim dy As Double = b->y - b2->y
         Dim dz As Double = b->z - b2->z
         Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
         Dim mag As Double = dt / ((distance * distance) * distance)
         b->vx -= (dx * b2->mass) * mag
         b->vy -= (dy * b2->mass) * mag
         b->vz -= (dz * b2->mass) * mag
         b2->vx += (dx * b->mass) * mag
         b2->vy += (dy * b->mass) * mag
         b2->vz += (dz * b->mass) * mag
      Next
   Next
   For i = 0 To nbodies-1
      Dim b As planet Ptr = @bodies(i)
      b->x += dt * b->vx
      b->y += dt * b->vy
      b->z += dt * b->vz
   Next
End Sub

Private Function energy(Byval nbodies As Long, bodies() As planet) As Double
   Dim e As Double
   Dim i As Long
   Dim j As Long
   e = 0.0
   For i = 0 To nbodies-1
      Dim b As planet Ptr = @bodies(i)
      e += (0.5 * b->mass) * (((b->vx * b->vx) + (b->vy * b->vy)) + (b->vz * b->vz))
      For j = i + 1 To nbodies-1
         Dim b2 As planet Ptr = @bodies(j)
         Dim dx As Double = b->x - b2->x
         Dim dy As Double = b->y - b2->y
         Dim dz As Double = b->z - b2->z
         Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
         e -= (b->mass * b2->mass) / distance
      Next
   Next
   Return e
End Function

Private Sub offset_momentum(Byval nbodies As Long, bodies() As planet)
   Dim px As Double = 0.0
   Dim py As Double = 0.0
   Dim pz As Double = 0.0
   Dim i As Long
   For i = 0 To nbodies-1
      px += bodies(i).vx * bodies(i).mass
      py += bodies(i).vy * bodies(i).mass
      pz += bodies(i).vz * bodies(i).mass
   Next
   bodies(0).vx = (-px) / ((4 * 3.141592653589793) * 3.141592653589793)
   bodies(0).vy = (-py) / ((4 * 3.141592653589793) * 3.141592653589793)
   bodies(0).vz = (-pz) / ((4 * 3.141592653589793) * 3.141592653589793)
End Sub

Const NBODIES = 5
Extern     bodies(0 To 4) As planet
Dim Shared bodies(0 To 4) As planet = {(0, 0, 0, 0, 0, 0,_
                                      (4 * 3.141592653589793) * 3.141592653589793),_
                                      (4.84143144246472090e+00, -1.16032004402742839e+00,_
                                      -1.03622044471123109e-01, 1.66007664274403694e-03 * _
                                      365.24, 7.69901118419740425e-03 * 365.24, _
                                      (-6.90460016972063023e-05) * 365.24, _
                                      9.54791938424326609e-04 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (8.34336671824457987e+00, _
                                      4.12479856412430479e+00, -4.03523417114321381e-01, _
                                      (-2.76742510726862411e-03) * 365.24, _
                                      4.99852801234917238e-03 * 365.24, _
                                      2.30417297573763929e-05 * 365.24, _
                                      2.85885980666130812e-04 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (1.28943695621391310e+01, _
                                      -1.51111514016986312e+01, -2.23307578892655734e-01, _
                                      2.96460137564761618e-03 * 365.24, _
                                      2.37847173959480950e-03 * 365.24, _
                                      (-2.96589568540237556e-05) * 365.24, _
                                      4.36624404335156298e-05 * ((4 * 3.141592653589793) * _
                                      3.141592653589793)), (1.53796971148509165e+01, _
                                      -2.59193146099879641e+01, 1.79258772950371181e-01, _
                                      2.68067772490389322e-03 * 365.24, _
                                      1.62824170038242295e-03 * 365.24, _
                                      (-9.51592254519715870e-05) * 365.24, _
                                      5.15138902046611451e-05 * ((4 * 3.141592653589793) * _
                                      3.141592653589793))}

Private Function main(Byval argc As Long) As Long
   Dim n As Long = argc
   Dim i As Long
   offset_momentum(5, bodies())
   Print Using "##.#########"; energy(NBODIES, bodies())
   For i = 1 To n
      advance(NBODIES, bodies(), 0.01)
   Next
   Print Using "##.#########"; energy(NBODIES, bodies())
   Return 0
End Function


Postby jj2007 » Jan 29, 2019 16:02

marcov wrote:- 64-bit SIMD has
There is no limitation to the use of SIMD in 32-bit land. In my main library, there are over 500 lines containing the string "xmm". But of course, there are some old compilers around that have no SIMD support.

Postby jj2007 » Jan 29, 2019 16:19

srvaldez wrote:nbody.bas

On my Win7-64 machine, the 64-bit version is almost 10% faster. Except if you cheat a little bit:

Code:

Dim distance As Double = 123.456 'Sqr(((dx * dx) + (dy * dy)) + (dz * dz))

Now the 32-bit version is 26% faster. Which means the whole test is dominated by one single function, SQR().

In 32-bit code, the line above is implemented as follows:

Code:

fld st(2)
fmul st, st(3)
fld st(2)
fmul st, st(3)
faddp st(1), st
fld st(1)
fmul st, st(2)
faddp st(1), st
fld st
fsqrt

Same source but 64-bit code:

Code:

movapd xmm4,xmm2
movapd xmm3,xmm1
mulsd xmm4,xmm2 
mulsd xmm3,xmm1 
addsd xmm3,xmm4 
movapd xmm4,xmm0
mulsd xmm4,xmm0 
addsd xmm3,xmm4 
sqrtsd xmm3,xmm3

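So that is FPU for 32-bit code, SIMD for 64-bit code. The latter is faster but also less precise: the x87 FPU can keep intermediate results in 80-bit extended precision, while the SSE2 instructions round every step to a 64-bit double. And no CPU would complain if it was fed the SIMD code in 32-bit mode. So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.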

Postby Coolman » Jan 29, 2019 16:49

srvaldez wrote:@Coolman
it is your choice which compiler version you want to use, in my experience, FBx64 executables are usually faster than FBx86, except for graphics
I don't have a real world benchmark, so here's the nbody benchmark
my times on Windows 10 x64



FBwin32 fbc -w all -asm intel -gen gas
Launched four times; the results are not constant.

-0.169075164
-0.169059907
elapsed time 23.55617942135427 seconds

-0.169075164
-0.169059907
elapsed time 27.06140585355911 seconds

-0.169075164
-0.169059907
elapsed time 27.09792673439802 seconds

-0.169075164
-0.169059907
elapsed time 26.94121167771459 seconds

FBwin32 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times; the results are not constant.

-0.169075164
-0.169059907
elapsed time 10.57177684700469 seconds

-0.169075164
-0.169059907
elapsed time 12.42956154142667 seconds

-0.169075164
-0.169059907
elapsed time 13.75664050981891 seconds

-0.169075164
-0.169059907
elapsed time 13.70927023998706 seconds

FBwin64 fbc -w all -asm intel -gen gcc -Wc -O2
Launched four times; the results are not constant.

-0.169075164
-0.169059907
elapsed time 10.85466894134879 seconds

-0.169075164
-0.169059907
elapsed time 12.73996500298381 seconds

-0.169075164
-0.169059907
elapsed time 12.79951698612422 seconds

-0.169075164
-0.169059907
elapsed time 12.76374577032402 seconds

Thank you for the example.
The results are quite similar, with a small advantage for the 64-bit version. I expected better ...
Very interesting.
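
Since the timings above drift from launch to launch, one common way to tame that is to repeat the measurement inside a single process and report the best and the mean run. A minimal sketch, assuming a hypothetical workload() function that stands in for the code under test (it is not part of nbody.bas) and an arbitrary run count:

```freebasic
' Hypothetical workload: replace the body with the code under test.
' It returns a value so that the compiler cannot discard the work.
Function workload() As Double
   Dim s As Double = 0
   For i As Long = 1 To 20000000
      s += Sqr(i)
   Next
   Return s
End Function

Const RUNS = 5                 ' assumption: five repetitions
Dim best As Double = 1e300     ' smallest elapsed time seen so far
Dim total As Double = 0        ' sum of all elapsed times
Dim sink As Double = 0         ' accumulates the workload results

For r As Long = 1 To RUNS
   Dim t As Double = Timer
   sink += workload()
   Dim elapsed As Double = Timer - t
   Print "run"; r; ":"; elapsed; " s"
   If elapsed < best Then best = elapsed
   total += elapsed
Next

Print "best:"; best; " s   mean:"; total / RUNS; " s"
Print "checksum:"; sink        ' printed so the result is actually used
```

The best run is usually the least disturbed by other processes, while the gap between best and mean gives a rough idea of the noise.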

Postby srvaldez » Jan 29, 2019 16:54

jj2007 wrote: So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.

I don't get your point, the 32-bit version compiled with -gen gcc is almost as fast as that of the 64-bit version, the -gen gas version is the one that is slow

Postby jj2007 » Jan 29, 2019 18:42

srvaldez wrote:
jj2007 wrote: So the reason for the slowness of 32-bit code is just a dumb GCC version, nothing else.

I don't get your point, the 32-bit version compiled with -gen gcc is almost as fast as that of the 64-bit version, the -gen gas version is the one that is slow

It was actually your point:
srvaldez wrote:in my experience, FBx64 executables are usually faster than FBx86

So I took your benchmark and had a look under the hood: Voilà, mystery solved, x64 is 10% faster because of one single faster instruction, sqrtsd vs fsqrt. As soon as you comment out that single instruction, the 32-bit version becomes 26% faster than the 64-bit version.

Benchmarks should be much more balanced, representing a variety of typical tasks like loops, string processing, conversions, sorting, searching, integer and float math, graphics, etc.
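
A more balanced test could time several small tasks separately instead of one loop dominated by a single instruction. A rough sketch only: the three workloads, their iteration counts and the report() helper are hypothetical illustrations, not an established suite:

```freebasic
' Each hypothetical workload returns a checksum so the work is not optimised away.
Function intLoop() As Double
   Dim acc As ULong = 0
   For i As ULong = 1 To 100000000
      acc = (acc * 31 + i) Xor (acc Shr 7)   ' integer multiply, add, shift
   Next
   Return acc
End Function

Function floatLoop() As Double
   Dim acc As Double = 0
   For i As Long = 1 To 50000000
      acc += Sqr(i) / (i + 1)                ' square root and division
   Next
   Return acc
End Function

Function stringLoop() As Double
   Dim s As String
   Dim total As Double = 0
   For i As Long = 1 To 2000000
      s = Str(i) & "-" & Hex(i)              ' conversions and concatenation
      total += Len(s)
   Next
   Return total
End Function

' Times one workload and prints its elapsed time and checksum.
Sub report(ByRef caption As Const String, ByVal fn As Function() As Double)
   Dim t As Double = Timer
   Dim check As Double = fn()
   Print caption; ": "; Timer - t; " s (checksum "; check; ")"
End Sub

report("integer loop", @intLoop)
report("float loop  ", @floatLoop)
report("string loop ", @stringLoop)
```

Reporting each task on its own line makes it visible when one kind of operation, such as the square root above, is responsible for most of the difference between two builds.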

Postby srvaldez » Jan 30, 2019 15:16

here's the binary-tree benchmark, time differences between FB versions are small, my times were
FBwin32 gas, 67.26 seconds
FBwin32 gcc, 61.59 seconds
FBwin64 gcc, 52.00 seconds

binary-trees.bas

Code:

/' The Computer Language Benchmarks Game
 * https://benchmarksgame-team.pages.debian.net/benchmarksgame/

   contributed by Kevin Carson
   
   https://benchmarksgame-team.pages.debian.net/benchmarksgame/license.html
   
   FreeBASIC translation by srvaldez with the help of fbfrog https://github.com/dkl/fbfrog
'/

#Define NULL 0

Type tn
   left_ As tn Ptr
   right_ As tn Ptr
End Type

Type treeNode As tn

Function NewTreeNode(Byval left_ As treeNode Ptr, Byval right_ As treeNode Ptr) As treeNode Ptr
   Dim new_ As treeNode Ptr
   new_ = Cptr(treeNode Ptr, Allocate(Sizeof(treeNode)))
   new_->left_ = left_
   new_->right_ = right_
   Return new_
End Function

Function ItemCheck(Byval tree As treeNode Ptr) As Integer
   If tree->left_ = NULL Then
      Return 1
   Else
      Return (1 + ItemCheck(tree->left_)) + ItemCheck(tree->right_)
   End If
End Function

Function BottomUpTree(Byval depth As Ulong) As treeNode Ptr
   If depth > 0 Then
      Return NewTreeNode(BottomUpTree(depth - 1), BottomUpTree(depth - 1))
   Else
      Return NewTreeNode(NULL, NULL)
   End If
End Function

Sub DeleteTree(Byval tree As treeNode Ptr)
   If tree->left_ <> NULL Then
      DeleteTree(tree->left_)
      DeleteTree(tree->right_)
   End If
   Deallocate(tree)
End Sub

Sub main(Byval N As Ulong)
   Dim As Double t=Timer
   Dim depth As Ulong
   Dim minDepth As Ulong
   Dim maxDepth As Ulong
   Dim stretchDepth As Ulong
   Dim stretchTree As treeNode Ptr
   Dim longLivedTree As treeNode Ptr
   Dim tempTree As treeNode Ptr
   minDepth = 4
   If (minDepth + 2) > N Then
      maxDepth = minDepth + 2
   Else
      maxDepth = N
   End If
   stretchDepth = maxDepth + 1
   stretchTree = BottomUpTree(stretchDepth)
   Print "        stretch tree of depth ";stretchDepth, "check: "; ItemCheck(stretchTree)
   DeleteTree(stretchTree)
   longLivedTree = BottomUpTree(maxDepth)
   For depth = minDepth To maxDepth Step 2
      Dim i As Integer
      Dim iterations As Integer
      Dim check As Integer
      iterations = 2^((maxDepth - depth) + minDepth)
      check = 0
      For i = 1 To iterations
         tempTree = BottomUpTree(depth)
         check += ItemCheck(tempTree)
         DeleteTree(tempTree)
      Next
      Print iterations," trees of depth ";depth, "check: "; check
   Next
   Print "     long lived tree of depth ";maxDepth, "check: "; ItemCheck(longLivedTree)
   Print
   Print "elapsed time is "; timer-t;" seconds"
End Sub

main(21)
Last edited by srvaldez on Jan 30, 2019 15:18, edited 2 times in total.

Postby marcov » Jan 30, 2019 15:18

srvaldez wrote:here's the binary-tree benchmark, time differences between FB versions are small, my times were
FBwin32 gas, 67.26 seconds
FBwin32 gcc, 61.59 seconds
FBwin64 gcc, 52.00 seconds


Note that binary tree might be a case where data cache effects affect win64 performance (for high numbers of nodes).

This is because sizeof(treeNode) is larger in 64-bit, and thus fewer entries fit in the cache.
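
The size difference is easy to check. A minimal sketch that reuses the tn type from binary-trees.bas and assumes the depth of 21 that main(21) ends up using for the long lived tree; allocator overhead per node is not counted:

```freebasic
Type tn
   left_ As tn Ptr
   right_ As tn Ptr
End Type

' Two pointers per node: 8 bytes on a 32-bit target, 16 bytes on a 64-bit one
Print "SizeOf(tn) ="; SizeOf(tn); " bytes"

' A complete tree of depth d has 2^(d+1) - 1 nodes
Dim depth As Long = 21
Dim nodes As LongInt = (CLngInt(1) Shl (depth + 1)) - 1
Print "nodes      ="; nodes
Print "payload    ="; nodes * CLngInt(SizeOf(tn)); " bytes"
```

For depth 21 that is 4194303 nodes, so roughly 32 MiB of node data in the 32-bit build against roughly 64 MiB in the 64-bit build, both far larger than typical CPU caches.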
