Speed of FreeBasic
-
- Posts: 22
- Joined: Apr 23, 2013 19:12
- Contact:
Speed of FreeBasic
Hello everyone,
I would like to ask: there are always discussions about which language is faster, which is slower, and so on.
It seems to me that FB is very, very fast.
Is there a recent test?
For example, at least a relevant comparison of FB vs Python vs C++ vs Java.
Thank you in advance! ;-)
P. S.
I found a somewhat trivial comparison here, but it only covers FreePascal.
https://geekregator.com/2015-01-15-benc ... ascal.html
Re: Speed of FreeBasic
zxretrosoft wrote: "It seems to me that the FB is very very fast. Is there a recent test?"
Yes, many think it is almost as fast as C, if not just as fast. Here you have a comparison:
https://benchmarksgame-team.pages.debia ... stest.html
Edit: inserted new address
Last edited by lizard on Jun 15, 2018 23:36, edited 1 time in total.
Re: Speed of FreeBasic
Here is the raw C code translated without any external libraries.
With 32-bit -O3 optimisation the loop is short-circuited away, so no result is available.
With 64-bit -O3 optimisation I get about 2.56 seconds:
Win 10
fb 1.05
32 bit gcc-5.2.0
64 bit gcc-5.2.0
Code: Select all
Sub Sieve(ByVal maxNum As Long)
    Dim As Long i, j
    Dim As Byte Ptr _data
    _data = Allocate(maxNum + 1)
    Clear *_data, 1, maxNum + 1            ' start with every entry marked prime
    For i = 2 To maxNum
        If _data[i] Then
            For j = i + i To maxNum Step i ' Step i: strike out the multiples of i
                _data[j] = 0
            Next j
        End If
    Next i
    Deallocate(_data)
End Sub

Dim As Double t, acc, diff
For z As Long = 1 To 7
    t = Timer
    For n As Long = 1 To 10000
        Sieve(100000)
    Next
    diff = Timer - t
    acc += diff
    Print diff, z; " of "; 7
Next z
Print "Mean "; acc / 7
Sleep
With 64-bit -O3 optimisation I get about 2.56 seconds:
Code: Select all
2.627888816758059 1 of 7
2.551011909265071 2 of 7
2.55284057641984 3 of 7
2.55291243645479 4 of 7
2.553794946317794 5 of 7
2.551769177312963 6 of 7
2.551165552897146 7 of 7
Mean 2.563054773632238
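As a cross-language data point of the kind the opening post asks for, here is a direct transcription of the same sieve in Python (an illustrative sketch; `count_primes` is a name chosen here, not from the thread):

```python
def count_primes(max_num):
    """Sieve of Eratosthenes, mirroring the FB version above."""
    data = bytearray(b"\x01") * (max_num + 1)    # start with all entries marked prime
    for i in range(2, max_num + 1):
        if data[i]:
            for j in range(i + i, max_num + 1, i):  # strike out multiples of i
                data[j] = 0
    return sum(data[2:])                         # entries still set are the primes >= 2
```

Timing 10000 calls of `count_primes(100000)`, as the FB loop above does, gives a rough feel for the interpreted-vs-compiled gap the thread is asking about.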
Re: Speed of FreeBasic
I translated the C n-body program to FB http://benchmarksgame.alioth.debian.org/u64q/nbody.html
compile command: fbc -w all -gen gcc -fpu sse -Wc -O3 n-body.bas
on my Mac I get the following output
Strangely, if I remove the Private before the functions and subs, the time is about 12 seconds.
Code: Select all
-0.169075164
-0.169059907
elapsed time 11.43654584884644 seconds
Code: Select all
'http://benchmarksgame.alioth.debian.org/u64q/program.php?test=nbody&lang=gcc&id=1
' The Computer Language Benchmarks Game
' http://benchmarksgame.alioth.debian.org/
' contributed by Christoph Bauer
'
' http://benchmarksgame.alioth.debian.org/license.html
' translated to FreeBasic by srvaldez
Declare Function main(Byval argc As Long) As Long
Dim As Double t=Timer
main(50000000)
Print "elapsed time ";timer-t;" seconds"
End
Const pi = 3.141592653589793
Const solar_mass = (4 * pi) * pi
Const days_per_year = 365.24
Type planet
x As Double
y As Double
z As Double
vx As Double
vy As Double
vz As Double
mass As Double
End Type
Private Sub advance(Byval nbodies As Long, bodies() As planet, Byval dt As Double)
Dim i As Long
Dim j As Long
For i = 0 To nbodies-1
Dim b As planet Ptr = @bodies(i)
For j = i + 1 To nbodies-1
Dim b2 As planet Ptr = @bodies(j)
Dim dx As Double = b->x - b2->x
Dim dy As Double = b->y - b2->y
Dim dz As Double = b->z - b2->z
Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
Dim mag As Double = dt / ((distance * distance) * distance)
b->vx -= (dx * b2->mass) * mag
b->vy -= (dy * b2->mass) * mag
b->vz -= (dz * b2->mass) * mag
b2->vx += (dx * b->mass) * mag
b2->vy += (dy * b->mass) * mag
b2->vz += (dz * b->mass) * mag
Next
Next
For i = 0 To nbodies-1
Dim b As planet Ptr = @bodies(i)
b->x += dt * b->vx
b->y += dt * b->vy
b->z += dt * b->vz
Next
End Sub
Private Function energy(Byval nbodies As Long, bodies() As planet) As Double
Dim e As Double
Dim i As Long
Dim j As Long
e = 0.0
For i = 0 To nbodies-1
Dim b As planet Ptr = @bodies(i)
e += (0.5 * b->mass) * (((b->vx * b->vx) + (b->vy * b->vy)) + (b->vz * b->vz))
For j = i + 1 To nbodies-1
Dim b2 As planet Ptr = @bodies(j)
Dim dx As Double = b->x - b2->x
Dim dy As Double = b->y - b2->y
Dim dz As Double = b->z - b2->z
Dim distance As Double = Sqr(((dx * dx) + (dy * dy)) + (dz * dz))
e -= (b->mass * b2->mass) / distance
Next
Next
Return e
End Function
Private Sub offset_momentum(Byval nbodies As Long, bodies() As planet)
Dim px As Double = 0.0
Dim py As Double = 0.0
Dim pz As Double = 0.0
Dim i As Long
For i = 0 To nbodies-1
px += bodies(i).vx * bodies(i).mass
py += bodies(i).vy * bodies(i).mass
pz += bodies(i).vz * bodies(i).mass
Next
bodies(0).vx = (-px) / ((4 * 3.141592653589793) * 3.141592653589793)
bodies(0).vy = (-py) / ((4 * 3.141592653589793) * 3.141592653589793)
bodies(0).vz = (-pz) / ((4 * 3.141592653589793) * 3.141592653589793)
End Sub
Const NBODIES = 5
Extern bodies(0 To 4) As planet
Dim Shared bodies(0 To 4) As planet = {(0, 0, 0, 0, 0, 0,_
(4 * 3.141592653589793) * 3.141592653589793),_
(4.84143144246472090e+00, -1.16032004402742839e+00,_
-1.03622044471123109e-01, 1.66007664274403694e-03 * _
365.24, 7.69901118419740425e-03 * 365.24, _
(-6.90460016972063023e-05) * 365.24, _
9.54791938424326609e-04 * ((4 * 3.141592653589793) * _
3.141592653589793)), (8.34336671824457987e+00, _
4.12479856412430479e+00, -4.03523417114321381e-01, _
(-2.76742510726862411e-03) * 365.24, _
4.99852801234917238e-03 * 365.24, _
2.30417297573763929e-05 * 365.24, _
2.85885980666130812e-04 * ((4 * 3.141592653589793) * _
3.141592653589793)), (1.28943695621391310e+01, _
-1.51111514016986312e+01, -2.23307578892655734e-01, _
2.96460137564761618e-03 * 365.24, _
2.37847173959480950e-03 * 365.24, _
(-2.96589568540237556e-05) * 365.24, _
4.36624404335156298e-05 * ((4 * 3.141592653589793) * _
3.141592653589793)), (1.53796971148509165e+01, _
-2.59193146099879641e+01, 1.79258772950371181e-01, _
2.68067772490389322e-03 * 365.24, _
1.62824170038242295e-03 * 365.24, _
(-9.51592254519715870e-05) * 365.24, _
5.15138902046611451e-05 * ((4 * 3.141592653589793) * _
3.141592653589793))}
Private Function main(Byval argc As Long) As Long
Dim n As Long = argc
Dim i As Long
offset_momentum(5, bodies())
Print Using "##.#########"; energy(NBODIES, bodies())
For i = 1 To n
advance(NBODIES, bodies(), 0.01)
Next
Print Using "##.#########"; energy(NBODIES, bodies())
Return 0
End Function
Re: Speed of FreeBasic
Using -gen gcc -O3 is not really fair in this context, because that just hands compilation off to GCC. If you want a good idea of how fast the executables produced by FBC itself are, then you need to let FBC do the compile.
Re: Speed of FreeBasic
@Imortis
I don't agree; fbc x64 always uses -gen gcc, even if you leave that option out of the command line.
Re: Speed of FreeBasic
Thank you very much, friends!
@lizard
But isn't FB included in this test? Is that the 11.43 seconds that srvaldez writes?
Re: Speed of FreeBasic
zxretrosoft wrote: "@lizard
But isn't FB included in this test? Is that the 11.43 seconds that srvaldez writes?"
If you compile with -gen gcc it produces C code that is compiled with gcc, AFAIK. Then it is actually gcc, and near #1 at the benchmarksgame. Naturally it must be run on their hardware and OS for a perfect comparison. But we can say FB is close to the top. :-)
Last edited by lizard on Mar 20, 2018 22:04, edited 1 time in total.
Re: Speed of FreeBasic
The comparison should really be -gen gas.
It's a bit slower.
Code: Select all
6.381060839107448 1 of 7
6.304388219459455 2 of 7
6.309766430873751 3 of 7
6.308480820393896 4 of 7
6.308385006978185 5 of 7
6.311566354033488 6 of 7
6.305176284668931 7 of 7
Mean 6.31840342221645
Re: Speed of FreeBasic
I really don't understand your objections to -gen gcc. GCC is used as a backend, just as -gen gas uses the GNU assembler as a backend. If you really don't want to use -gen gcc, that's your choice, but then you can only compile 32-bit applications.
see https://superuser.com/a/1198792
-
- Posts: 4310
- Joined: Jan 02, 2017 0:34
- Location: UK
- Contact:
Re: Speed of FreeBasic
In dodicat's code there is a high level of precision, but with some code timings we can have 'rogue' values. Very often the first timing may be the fastest. I have never been able to fathom out why that happens. On other occasions some timings may be a lot slower than the average.
We could have a sophisticated algorithm to examine the results and remove those which would have a profound effect on the average.
On the other hand, a decidedly unsophisticated method is to take the median and not the mean.
To exaggerate this approach I forced dodicat's code to give the first 'diff' as zero. This is what I got.
Code: Select all
0 1 of 7
1.579375311659533 2 of 7
1.597423764760606 3 of 7
1.578401235354249 4 of 7
1.579799527593423 5 of 7
1.598054361660616 6 of 7
1.578834251282387 7 of 7
Mean 1.358841207472973
The initial 'rogue' value has been very influential.
Equally, we can have rogue values much slower than the rest.
By choosing the median, any rogue values have zero effect.
The author of the opening post's link is using medians.
I am inclined to agree with that.
Added: It is a pointless exercise to compare an FB test with the opening post's link. For a true comparison either the opening post's link should include FB in the tests, or we should include C, Java and so on in an FB setup. In other words, they should all use the same CPU.
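The effect of such a rogue value can be reproduced in a few lines of Python (the timings below are made-up numbers shaped like the run above, purely for illustration):

```python
from statistics import mean, median

# Seven 'timings' with a rogue first value of zero.
timings = [0.0, 1.579, 1.597, 1.578, 1.580, 1.598, 1.579]

# The mean is dragged well below every genuine timing by the rogue zero,
# while the median (the middle of the sorted values) ignores it entirely.
print(mean(timings))
print(median(timings))
```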
Re: Speed of FreeBasic
deltarho[1859] wrote: "We could have a sophisticated algorithm to examine the results and remove those which would have a profound effect on the average."
Yes, that is a good approach (see Best strategy for timings):
- discard the first n timings (the first 10%, ...?) to load the cache
- sort the results
- eliminate the highest 20%, i.e. the spikes
- calculate the average of the remaining values.
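The steps above can be sketched in a few lines of Python (the 10% and 20% fractions are the arbitrary knobs of the strategy, not fixed values):

```python
def trimmed_mean(timings, warmup=0.10, spikes=0.20):
    """Drop the first `warmup` fraction (cache loading), sort,
    drop the highest `spikes` fraction, and average the rest."""
    kept = timings[int(len(timings) * warmup):]   # discard warm-up runs
    kept = sorted(kept)
    cut = int(len(kept) * spikes)
    if cut:
        kept = kept[:-cut]                        # discard the slowest spikes
    return sum(kept) / len(kept)
```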
But before doing all that, it would be a good idea to agree on a testbed that simulates some relevant common tasks, like:
- calculations of all sorts, integers vs floats etc
- string generation, concatenation
- string parsing
- file loading and storing
- sorting
- filtering
...
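One way such a testbed could be prototyped is with Python's timeit; the task list and sizes here are purely illustrative choices, not a proposed standard:

```python
import timeit

# Illustrative micro-tasks; names, statements and sizes are arbitrary choices.
TASKS = {
    "integer math":  "sum(i * i for i in range(1000))",
    "string concat": "''.join(str(i) for i in range(1000))",
    "sorting":       "sorted(range(1000, 0, -1))",
}

def run_testbed(repeat=5, number=100):
    # Best-of-N per task mitigates scheduler and frequency-scaling noise.
    return {name: min(timeit.repeat(stmt, number=number, repeat=repeat))
            for name, stmt in TASKS.items()}
```

The same task statements would then be ported to each language under test so that all candidates run identical workloads.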
Re: Speed of FreeBasic
In Statistics for the Terrified 'Mean versus Median' is considered.
In a nutshell it seems that we should use mean "with symmetrically distributed data" and median otherwise.
When you and I were discussing timings a little while ago you showed me that my assumption that timings were normally distributed was wrong. It took a while, <smile>, but I eventually conceded that you were correct.
It follows then, from the above link, that we should be using median and not mean.
I wrote "We could have a sophisticated algorithm to examine the results and remove those which would have a profound effect on the average." to which you wrote "Yes, that is a good approach".
I disagree. What we are doing is conditioning the data by using arbitrary values such as the first 10% and the highest 20%. I doubt that we would get a consensus on these values and there would be a temptation to tweak them for some reason.
With the median approach any 'rogue' values are automatically filtered out without using any arbitrary conditioning values.
If the data is symmetrical then the mean and median will be pretty much the same. With dodicat's timings, which are precise, the conclusion would be the same whether we used the mean or the median. In your 'Best strategy for timings' graph, using the mean would not be a good approach, which is why you considered a different approach.
I have a suspicion that if we have 31, say, timings where there is a small number of low values and a number of spikes then your approach and the median approach may very well result in a similar conclusion. However, in my case I would simply sort the results and choose the 16th value; job done.<smile>
Having a maths background I would always be attracted to an elegant solution but, in this case, the pragmatism of a median is the better solution.
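The "sort the results and choose the 16th value" procedure for 31 timings is simply the odd-count median, which in Python is:

```python
def median_of_odd(timings):
    """Median for an odd number of runs: sort and take the middle value."""
    s = sorted(timings)
    return s[len(s) // 2]   # for 31 timings, index 15 is the 16th value
```

Rogue values at either extreme only move the tails of the sorted list, so the picked value is unchanged.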
Re: Speed of FreeBasic
You can't compare the speed of C and FB unless you run both on the same hardware! Quoting timings of just one is useless.
I compared srvaldez's n-body code against C++:
cpu: AMD FX-6100 (Bulldozer) @ 3.3GHz (with boosting above 3.3GHz disabled in BIOS)
fbc: 1.06 built from git
gcc: 7.3.0
linux 4.14.19, echo performance > /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
I used the fastest n-body solution in C++ which didn't use SSE2 (here), and the fastest which did (here) -- which is tied for first place with Fortran -- and srvaldez's port of the C code to FB.
I compiled each program both with no special compiler flags and with the compiler flags used by the benchmarksgame.
Summary of results (minimum times out of 7 runs):
32 bit builds:
Code: Select all
Program      Compiler args                                                              Time/sec
FB           -fpu sse -O 3                                                              40.13
FB           -fpu x87 -O 3                                                              43.45
FB           -gen gcc -O 3                                                              12.63
FB           -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3   9.21
C++          -O3                                                                        14.87
C++          -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3                 9.73
C++ & SSE    -O3                                                                        8.03
C++ & SSE    -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3                 5.73
As you can see, FB actually outperformed C++ here - the reason is probably that srvaldez translated a different implementation (the C program) than the C++ one I used. This is the fault of the people who submitted C++ n-body implementations, for not submitting an optimal one that didn't use SSE. I definitely think you should NOT compare against the C++ & SSE2 implementation, because it's not pure C++; it's like using inline assembler!
fbc's gas backend does rather few optimisations - it doesn't even try to keep variables in registers between different lines of code! - so being only 3x slower than GCC is rather remarkable. However, it's OK at compiling expressions that fit onto a single line of code, which is why it produces faster assembly for math-heavy code than it does in general.
64 bit builds:
Code: Select all
Program      Compiler args                                                              Time/sec
FB           -gen gcc -O 3                                                              9.51
FB           -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3   8.71
C++          -O3                                                                        8.07
C++          -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3                 7.27
C++ & SSE    -O3                                                                        5.48
C++ & SSE    -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3                 5.95
Interestingly, this time srvaldez's FB implementation is about 18-20% slower than the C++ one. Odd!
Here, attempting to micromanage g++ seems to muck up the use of SSE2 intrinsics.
Here are the complete results, with medians and all times:
Code: Select all
---32 bit---
fbc nbody.bas -arch 32 -fpu sse -O 3
Min: 40.13 Median: 40.29 All: [40.13, 40.21, 40.22, 40.29, 40.52, 40.69, 40.98]
fbc nbody.bas -arch 32 -fpu x87 -O 3
Min: 43.45 Median: 44.25 All: [43.45, 44.15, 44.19, 44.25, 44.29, 44.36, 44.45]
fbc nbody.bas -arch 32 -gen gcc -O 3
Min: 12.63 Median: 12.63 All: [12.63, 12.63, 12.63, 12.63, 12.63, 12.64, 12.71]
fbc nbody.bas -arch 32 -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3
Min: 9.21 Median: 9.21 All: [9.21, 9.21, 9.21, 9.21, 9.21, 9.21, 9.21]
g++ nbody.cpp -o nbody_cpp -m32 -O3
Min: 14.87 Median: 14.88 All: [14.87, 14.88, 14.88, 14.88, 14.89, 14.9, 14.9]
g++ nbody.cpp -o nbody_cpp -m32 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
Min: 9.73 Median: 9.77 All: [9.73, 9.73, 9.74, 9.77, 9.86, 9.89, 9.96]
g++ nbody_sse.cpp -o nbody_sse_cpp -m32 -O3
Min: 8.03 Median: 8.14 All: [8.03, 8.09, 8.09, 8.14, 8.15, 8.18, 8.2]
g++ nbody_sse.cpp -o nbody_sse_cpp -m32 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
Min: 5.73 Median: 6.07 All: [5.73, 5.84, 5.91, 6.07, 6.19, 6.42, 7.13]
---64 bit---
fbc nbody.bas -arch 64 -gen gcc -O 3
Min: 9.51 Median: 9.51 All: [9.51, 9.51, 9.51, 9.51, 9.52, 9.52, 9.6]
fbc nbody.bas -arch 64 -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3
Min: 8.71 Median: 8.71 All: [8.71, 8.71, 8.71, 8.71, 8.72, 8.72, 8.73]
g++ nbody.cpp -o nbody_cpp -m64 -O3
Min: 8.07 Median: 8.07 All: [8.07, 8.07, 8.07, 8.07, 8.08, 8.11, 8.15]
g++ nbody.cpp -o nbody_cpp -m64 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
Min: 7.27 Median: 7.28 All: [7.27, 7.27, 7.28, 7.28, 7.28, 7.28, 7.39]
g++ nbody_sse.cpp -o nbody_sse_cpp -m64 -O3
Min: 5.48 Median: 5.57 All: [5.48, 5.48, 5.56, 5.57, 5.58, 5.64, 5.66]
g++ nbody_sse.cpp -o nbody_sse_cpp -m64 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
Min: 5.95 Median: 5.98 All: [5.95, 5.98, 5.98, 5.98, 5.99, 6.02, 6.27]
I think there are different ways to go about timing a language: either you can ask what is the fastest possible way to implement some algorithm in a language, or you can ask how fast a more idiomatic implementation is, i.e. using code that is the most natural for that language. Both performance comparisons are interesting for different use cases. These language benchmark games always go for maximum performance, often using pretty ugly code. Sometimes (IIRC 'fasta' in Lua) they cheat outright by just calling some external numeric library to do the computations. For the n-body problem, unlike the fastest C++ solution, the fastest program, in Fortran, doesn't use CPU intrinsics!
Here is the script I used to produce these results:
Code: Select all
#!/bin/sh
print() {
echo
echo $*
$*
}
time_runs() {
prog=$1
outfile=times_$2.txt
rm -f $outfile
for i in {1..7}; do
/usr/bin/time -f '%e' $prog 2>&1 >$outfile.out | tee -a $outfile
done
# This uses pythonpy and numpy to compute medians. https://github.com/Russell91/pythonpy
cat $outfile | py --ji -l '"Min: %.2f\tMedian: %.2f\tAll: %s" % (min(l), numpy.median(l), sorted(l))'
}
print fbc nbody.bas -arch 32 -fpu sse -O 3
time_runs ./nbody 32_bas_gas_sse
print fbc nbody.bas -arch 32 -fpu x87 -O 3
time_runs ./nbody 32_bas_gas_fpu
print fbc nbody.bas -arch 32 -gen gcc -O 3
time_runs ./nbody 32_bas_gas_gcc
print fbc nbody.bas -arch 32 -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3
time_runs ./nbody 32_bas_gas_gcc_native
print fbc nbody.bas -arch 64 -gen gcc -O 3
time_runs ./nbody 64_bas_gas_gcc
print fbc nbody.bas -arch 64 -gen gcc -O 3 -Wc -march=native,-fomit-frame-pointer,-mfpmath=sse,-msse3
time_runs ./nbody 64_bas_gas_gcc_native
print g++ nbody.cpp -o nbody_cpp -m32 -O3
time_runs './nbody_cpp 50000000' 32_cpp
print g++ nbody.cpp -o nbody_cpp -m32 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
time_runs './nbody_cpp 50000000' 32_cpp_native
print g++ nbody.cpp -o nbody_cpp -m64 -O3
time_runs './nbody_cpp 50000000' 64_cpp
print g++ nbody.cpp -o nbody_cpp -m64 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
time_runs './nbody_cpp 50000000' 64_cpp_native
print g++ nbody_sse.cpp -o nbody_sse_cpp -m32 -O3
time_runs './nbody_sse_cpp 50000000' 32_cpp_sse
print g++ nbody_sse.cpp -o nbody_sse_cpp -m32 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
time_runs './nbody_sse_cpp 50000000' 32_cpp_sse_native
print g++ nbody_sse.cpp -o nbody_sse_cpp -m64 -O3
time_runs './nbody_sse_cpp 50000000' 64_cpp_sse
print g++ nbody_sse.cpp -o nbody_sse_cpp -m64 -O3 -fomit-frame-pointer -march=native -mfpmath=sse -msse3
time_runs './nbody_sse_cpp 50000000' 64_cpp_sse_native
deltarho[1859] wrote: "In dodicat's code there is a high level of precision but with some code timings we can have 'rogue' values. Very often the first timing may be the fastest. I have never been able to fathom out why that happens. On other occasions some timings may be a lot slower than the average."
I am speculating, but perhaps this is caused by the CPU throttling up to an unsustainably high frequency for a short time, then reducing to a more sustainable (but still above baseline) frequency thereafter, to meet power draw and temperature requirements which are averages over time rather than instantaneous.
Whenever I do any timing, I first switch the kernel's frequency governor to a constant frequency. This greatly reduces timing jitter, but on my machine does not actually disable the CPU's millisecond-scale frequency adjustment (which you can see by running cpufreq-aperf).
Re: Speed of FreeBasic
deltarho[1859] wrote: "I disagree. What we are doing is conditioning the data by using arbitrary values such as the first 10% and the highest 20%. I doubt that we would get a consensus on these values and there would be a temptation to tweak them for some reason."
Engineers and mathematicians often disagree ;-)
This is probably the innermost loop:
Code: Select all
for (unsigned i = 0, k = 0; i < bodies.size() - 1; ++i) {
    Body& iBody = bodies[i];
    for (unsigned j = i + 1; j < bodies.size(); ++j, ++k) {
        iBody.vx -= r[k].dx * bodies[j].mass * mag[k];
        iBody.vy -= r[k].dy * bodies[j].mass * mag[k];
        iBody.vz -= r[k].dz * bodies[j].mass * mag[k];
        bodies[j].vx += r[k].dx * iBody.mass * mag[k];
        bodies[j].vy += r[k].dy * iBody.mass * mag[k];
        bodies[j].vz += r[k].dz * iBody.mass * mag[k];
    }
}