which is faster, GCC or freebasic?
-
- Posts: 5494
- Joined: Sep 12, 2005 20:06
- Location: California
In the code you posted there is no significant speed difference because it implements an algorithm where most of the execution time is spent in a slow mod operation that the compiler cannot optimize.TJF wrote:No, I made several speed tests with FB-, C- and C++-code. There was no significant speed difference.
Code: Select all
'====================================================================
#include "windows.bi"
#include "counter.bas"
'====================================================================
'' Counter.bas is available here:
''
'' http://www.freebasic.net/forum/viewtopic.php?t=4221
'====================================================================
dim shared as integer loop_count1, loop_count2
'====================================================================
FUNCTION istPrimzahl(Zahl AS INTEGER) AS INTEGER
IF Zahl = 2 THEN RETURN 1
FOR i AS INTEGER = 2 TO Zahl/2+1
IF (Zahl MOD i)=0 THEN RETURN 0
loop_count1 += 1
NEXT : RETURN 1
END FUNCTION
FUNCTION _istPrimzahl(Zahl AS INTEGER) AS INTEGER
IF Zahl = 2 THEN RETURN 1
FOR i AS INTEGER = 2 TO Zahl/2+1
'IF (Zahl MOD i)=0 THEN RETURN 0
loop_count2 += 1
NEXT : RETURN 1
END FUNCTION
'====================================================================
istPrimzahl(41)
_istPrimzahl(41)
print loop_count1, loop_count2
print
sleep 3000
counter_begin( 1000, HIGH_PRIORITY_CLASS )
counter_end
print counter_cycles;" cycles, empty"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
istPrimzahl(41)
counter_end
print counter_cycles;" cycles, with mod "
counter_begin( 1000, HIGH_PRIORITY_CLASS )
_istPrimzahl(41)
counter_end
print counter_cycles;" cycles, without mod "
sleep
Code: Select all
21 21
0 cycles, empty
799 cycles, with mod
158 cycles, without mod
Code: Select all
int IsPrime( int n )
{
int i = 2, div = 5, max;
if(n == 2 | n == 3 | n == 5) return 1;
if(n % 2 == 0) return 0;
if(n % 3 == 0) return 0;
max = sqrt(n+1);
do
{
if(n % div == 0) return 0;
div += i;
i = 6 - i;
} while( div <= max );
return 1;
}
Code: Select all
'==============================================================================
#include "counter.bas"
'==============================================================================
declare function IsPrime_c cdecl alias "IsPrime"(as integer) as integer
'==============================================================================
dim shared as integer cnt
function IsPrime( byval n as integer ) as integer
dim as integer i = 2, div = 5, mx
if n = 2 or n = 3 or n = 5 then return 1
if n mod 2 = 0 then return 0
if n mod 3 = 0 then return 0
mx = cint(sqr(n+1))
do
if n mod div = 0 then return 0
div += i
i = 6 - i
cnt+=1
loop while div <= mx
return 1
end function
'==============================================================================
for i as integer = 2 to 1000
if IsPrime_c(i) then print i;
next
print
for i as integer = 2 to 1000
if IsPrime(i) then print i;
next
print
print
'==============================================================================
dim as integer r
sleep 3000
counter_begin( 1000, HIGH_PRIORITY_CLASS )
counter_end
print counter_cycles;" cycles, empty"
print
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime(2)
counter_end
print counter_cycles;" cycles, IsPrime(2)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime(100)
counter_end
print counter_cycles;" cycles, IsPrime(100)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime(101)
counter_end
print counter_cycles;" cycles, IsPrime(101)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime(500)
counter_end
print counter_cycles;" cycles, IsPrime(500)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime(503)
counter_end
print counter_cycles;" cycles, IsPrime(503)"
print
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime_c(2)
counter_end
print counter_cycles;" cycles, IsPrime_c(2)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime_c(100)
counter_end
print counter_cycles;" cycles, IsPrime_c(100)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime_c(101)
counter_end
print counter_cycles;" cycles, IsPrime_c(101)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime_c(500)
counter_end
print counter_cycles;" cycles, IsPrime_c(500)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
r = IsPrime_c(503)
counter_end
print counter_cycles;" cycles, IsPrime_c(503)"
sleep
Code: Select all
55 cycles, IsPrime(2)
98 cycles, IsPrime(100)
300 cycles, IsPrime(101)
98 cycles, IsPrime(500)
452 cycles, IsPrime(503)
26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)
The code you used is not typical.Neither C nor C++ can compile a faster binary in this case and most of the other cases. This is what I need to know. This is what I want to show.
-
- Posts: 5494
- Joined: Sep 12, 2005 20:06
- Location: California
If you wanted, couldn't
if( intPotentialPrime mod intPotentialFactor1 = 0 ) then ...
be replaced with
intPotentialFactor2 = intPotentialPrime \ intPotentialFactor1
intProduct = intPotentialFactor1 * intPotentialFactor2
if( intProduct = intPotentialPrime ) then ...
?
A floor function may have to be used if FB attempts any rounding.
if( intPotentialPrime mod intPotentialFactor1 = 0 ) then ...
be replaced with
intPotentialFactor2 = intPotentialPrime \ intPotentialFactor1
intProduct = intPotentialFactor1 * intPotentialFactor2
if( intProduct = intPotentialPrime ) then ...
?
A floor function may have to be used if FB attempts any rounding.
@anonymous1337:
Thanks for the link. My surface impression: all examples are compiled with -lang deprecated. I have no experience with this.
I made a short test with the first example pidigit and I managed to speed up the FB source by 1.5 % (on my box):
So the FB code should be as fast as the FreePascal code.
@MichaelW:
Thanks for the benchmarks! Unfortunately you made some mistakes. IMO you should
Thanks for the link. My surface impression: all examples are compiled with -lang deprecated. I have no experience with this.
I made a short test with the first example pidigit and I managed to speed up the FB source by 1.5 % (on my box):
Code: Select all
' The Computer Language Shootout
' http://shootout.alioth.debian.org/
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF
#INCLUDE "gmp.bi"
TYPE ctx_t
AS mpz_t q, r, s, t ' Transformation matrix components.
AS mpz_t u, v, w ' Temporary numbers.
AS INTEGER d, i, n ' Counters.
AS ZSTRING * 11 digits ' Accumulated digits for one line.
END TYPE
'
' Compose matrix with numbers on the right.
SUB compose_r(BYVAL C AS ctx_t PTR, BYVAL Q AS INTEGER, BYVAL R AS INTEGER, BYVAL S AS INTEGER, BYVAL T AS INTEGER)
WITH *C
mpz_mul_si(@.u, @.r, S)
mpz_mul_si(@.r, @.r, Q)
mpz_mul_si(@.v, @.t, R)
mpz_add (@.r, @.r, @.v)
mpz_mul_si(@.t, @.t, T)
mpz_add (@.t, @.t, @.u)
mpz_mul_si(@.s, @.s, T)
mpz_mul_si(@.u, @.q, S)
mpz_add (@.s, @.s, @.u)
mpz_mul_si(@.q, @.q, Q)
END WITH
END SUB
' Compose matrix with numbers on the left.
SUB compose_l(BYVAL C AS ctx_t PTR, BYVAL Q AS INTEGER, BYVAL R AS INTEGER, BYVAL S AS INTEGER, BYVAL T AS INTEGER)
WITH *C
mpz_mul_si(@.r, @.r, T)
mpz_mul_si(@.u, @.q, R)
mpz_add (@.r, @.r, @.u)
mpz_mul_si(@.u, @.t, S)
mpz_mul_si(@.t, @.t, T)
mpz_mul_si(@.v, @.s, R)
mpz_add (@.t, @.t, @.v)
mpz_mul_si(@.s, @.s, Q)
mpz_add (@.s, @.s, @.u)
mpz_mul_si(@.q, @.q, Q)
END WITH
END SUB
' Extract one digit.
FUNCTION extract(BYVAL C AS ctx_t PTR, J AS UINTEGER) AS INTEGER
WITH *C
mpz_mul_ui(@.u, @.q, j)
mpz_add (@.u, @.u, @.r)
mpz_mul_ui(@.v, @.s, j)
mpz_add (@.v, @.v, @.t)
mpz_tdiv_q(@.w, @.u, @.v)
RETURN mpz_get_ui(@.w)
END WITH
END FUNCTION
' Print one digit. Returns 1 for the last digit.
FUNCTION prdigit(BYVAL C AS ctx_t PTR, y AS INTEGER) AS INTEGER
C->digits[C->d] = ASC("0") + y
C->d += 1
C->i += 1
IF C->i < C->n THEN IF C->i MOD 10 THEN RETURN 0
C->digits[C->d] = 0
?C->digits, C->i
C->d = 0 : RETURN C->i = C->n
END FUNCTION
' Generate successive digits of PI.
SUB pidigits(BYVAL C AS ctx_t PTR)
C->d = 0
C->i = 0
WITH *C
mpz_init_set_ui(@.q, 1)
mpz_init_set_ui(@.r, 0)
mpz_init_set_ui(@.s, 0)
mpz_init_set_ui(@.t, 1)
mpz_init(@.u)
mpz_init(@.v)
mpz_init(@.w)
VAR k = 1
DO
VAR y = extract(C, 3)
IF y = extract(C, 4) THEN
IF prdigit(C, y) THEN EXIT SUB
compose_r(C, 10, -10*y, 0, 1)
ELSE
compose_l(C, k, 4*k+2, 0, 2*k+1)
k += 1
END IF
LOOP
END WITH
END SUB
DIM AS ctx_t c
c.n = CINT(Command$(1))
IF c.n <= 0 THEN c.n = 27
VAR t = TIMER
pidigits(@c)
?:?TIMER - t
@MichaelW:
Thanks for the benchmarks! Unfortunately you made some mistakes. IMO you should
- 1) eliminate the line cnt+=1 in function IsPrime.
2) find a similar evaluation for max/mx like mx = 0.5+sqr(n+1)
The cnt+=1 was a test that I forgot to remove. It is in the loop, but because integer addition is very fast removing it saved only a few cycles, changing the relevant counts from 300 and 452 to 299 and 449. The cint in mx=cint(sqr(n+1)) was to correct a problem that turned out to be elsewhere, that I also forgot to remove. Replacing it with mx=0.5+sqr(n+1) added extra floating-point operations and increased the counts to 333 and 488. Replacing it with just mx=sqr(n+1) reduced the counts to 296 and 450.TJF wrote: Unfortunately you made some mistakes. IMO you shouldI would appreciate seeing updated results.
- 1) eliminate the line cnt+=1 in function IsPrime.
2) find a similar evaluation for max/mx like mx = 0.5+sqr(n+1)
For composite numbers the compiler-optimized code is ~3x faster. For small prime numbers the compiler-optimized code has a small advantage, but as the size of the numbers increase the number of mod operations increase and eventually the compiler-optimized code has no advantage. I added a test to show this effect.
So the full results, running on a P3, are:
Code: Select all
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
0 cycles, empty
55 cycles, IsPrime(2)
98 cycles, IsPrime(100)
296 cycles, IsPrime(101)
98 cycles, IsPrime(500)
450 cycles, IsPrime(503)
1530 cycles, IsPrime(12347)
26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)
1530 cycles, IsPrime_c(12347)
@MichaelW:
Thanks for the additional tests! So I have to learn that optimized binaries can be up to 3 times faster than native gas code.
I didn't notice that up to now. One reason may be that I use a lot of libraries written in C. It makes no difference if you call a library function from C or from FB. In this case both are equal in speed (exeption: inline functions).
Anyway, for daily work as a developer rather the speed of the binary than the speed of the compiling process is my main interest. And fbc compiles much faster than gcc with option -O 2!
Thanks for the additional tests! So I have to learn that optimized binaries can be up to 3 times faster than native gas code.
I didn't notice that up to now. One reason may be that I use a lot of libraries written in C. It makes no difference if you call a library function from C or from FB. In this case both are equal in speed (exeption: inline functions).
Anyway, for daily work as a developer rather the speed of the binary than the speed of the compiling process is my main interest. And fbc compiles much faster than gcc with option -O 2!
@TJF
Saw your pidigit program, tried it and ask myself can this made faster, did not see any thing that could be don to speed it up.
Had a second look today and took a closer look at the calling of the functions and subs and noticed
that the sub compose_r() was called with 3 constants and the sub compose_l()
was called with 1 constant.
Adding 0 to a variable or multiply it with 1 doesn't alter the variable so some instructions can be removed.
The first listing is the your original listing with the unnecessary code
commented out.
The second listing has the redundant code remove and some small changes because
some variables aren't passed over to the subs.
Speed:
Your original listing takes 3.52 sec. on my computer, first and second listing take 3.42 sec. About 2.8 %, not much but I don't think that this program can be made any faster.
My computer is a Intel dual core , windows xp , and the GMP library is a DLL version (5.0.1) not optimized (Can't get tuneup compiled).
Also tried it with MPIR but MPIR is slower then GMP.
EDIT
After submitting this post i found a litte speedup in the Sub compose_l()
the instructions
Mpz_mul_si(@.u, @.q, R)
Mpz_add (@.r, @.r, @.u)
can be replaced with
Mpz_addmul_ui(@.r, @.q, R)
I changed the listings in this post
The speed is now 3.34 seconds about 4.8 % faster then the original listing
OOPS forgot to mention that all timings are for N = 10000
Saw your pidigit program, tried it and ask myself can this made faster, did not see any thing that could be don to speed it up.
Had a second look today and took a closer look at the calling of the functions and subs and noticed
that the sub compose_r() was called with 3 constants and the sub compose_l()
was called with 1 constant.
Adding 0 to a variable or multiply it with 1 doesn't alter the variable so some instructions can be removed.
The first listing is the your original listing with the unnecessary code
commented out.
Code: Select all
' The Computer Language Shootout
' [url]http://shootout.alioth.debian.org/[/url]
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF
' commented some code that was unnecessary , frisian
#Include "gmp.bi"
Type ctx_t
As Mpz_t q, r, s, t ' Transformation matrix components.
As Mpz_t u, v, w ' Temporary numbers.
As Integer d, i, n ' Counters.
As ZString * 11 digits ' Accumulated digits for one line.
End Type
'
' Compose matrix with numbers on the right.
Sub compose_r(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal S As Integer, ByVal T As Integer)
With *C
'==============================
' Q = 10
' R = variable
' S = 0
' T = 1
'==============================
' mpz_mul_si(@.u, @.r, S) u = 0 (r*0)
' Mpz_mul_si(@.r, @.r, Q) r = r * 10
Mpz_mul_si(@.r, @.r, 10)
Mpz_mul_si(@.v, @.t, R)
Mpz_add (@.r, @.r, @.v)
' mpz_mul_si(@.t, @.t, T) t = t (*1)
' Mpz_add (@.t, @.t, @.u) t = t (+0)
' mpz_mul_si(@.s, @.s, T) s = s (*1)
' mpz_mul_si(@.u, @.q, S) u = 0 again hasn't been used so is still 0
' mpz_add (@.s, @.s, @.u) s = s (+0)
' Mpz_mul_si(@.q, @.q, Q) q = q * 10
Mpz_mul_si(@.q, @.q, 10)
End With
End Sub
' Compose matrix with numbers on the left.
Sub compose_l(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal S As Integer, ByVal T As Integer)
With *C
'==============================
' Q = variable
' R = variable
' S = 0
' T = variable
'==============================
Mpz_mul_si(@.r, @.r, T)
' Mpz_mul_si(@.u, @.q, R)
' Mpz_add (@.r, @.r, @.u)
Mpz_addmul_ui(@.r, @.q, R)
' Mpz_mul_si(@.u, @.t, S) make u = 0, since @.u is not used again and
' in function extract() filled with value there no need to do something with @.u
Mpz_mul_si(@.t, @.t, T)
Mpz_mul_si(@.v, @.s, R)
Mpz_add (@.t, @.t, @.v)
Mpz_mul_si(@.s, @.s, Q)
' Mpz_add (@.s, @.s, @.u) s = s + 0
Mpz_mul_si(@.q, @.q, Q)
End With
End Sub
' Extract one digit.
Function extract(ByVal C As ctx_t Ptr, J As UInteger) As Integer
With *C
Mpz_mul_ui(@.u, @.q, j)
Mpz_add (@.u, @.u, @.r)
Mpz_mul_ui(@.v, @.s, j)
Mpz_add (@.v, @.v, @.t)
Mpz_tdiv_q(@.w, @.u, @.v)
Return Mpz_get_ui(@.w)
End With
End Function
' Print one digit. Returns 1 for the last digit.
Function prdigit(ByVal C As ctx_t Ptr, y As Integer) As Integer
C->digits[C->d] = Asc("0") + y
C->d += 1
C->i += 1
If C->i < C->n Then If C->i Mod 10 Then Return 0
C->digits[C->d] = 0
?C->digits, C->i
C->d = 0 : Return C->i = C->n
End Function
' Generate successive digits of PI.
Sub pidigits(ByVal C As ctx_t Ptr)
C->d = 0
C->i = 0
With *C
Mpz_init_set_ui(@.q, 1)
Mpz_init_set_ui(@.r, 0) ' mpz_init(@.r) mpz_init set variable to 0
Mpz_init_set_ui(@.s, 0) ' mpz_init(@.s) only used once so very little gain
Mpz_init_set_ui(@.t, 1)
Mpz_init(@.u)
Mpz_init(@.v)
Mpz_init(@.w)
Var k = 1
Do
Var y = extract(C, 3)
If y = extract(C, 4) Then
If prdigit(C, y) Then Exit Sub
compose_r(C, 10, -10*y, 0, 1)
Else
compose_l(C, k, 4*k+2, 0, 2*k+1)
k += 1
EndIf
Loop
End With
End Sub
Dim As ctx_t c
c.n = CInt(Command$(1))
If c.n <= 0 Then c.n = 27
Var t = Timer
pidigits(@c)
?:?Timer – t
some variables aren't passed over to the subs.
Code: Select all
' The Computer Language Shootout
' [url]http://shootout.alioth.debian.org/[/url]
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF
' removed obsolete code by frisian
#Include "gmp.bi"
Type ctx_t
As Mpz_t q, r, s, t ' Transformation matrix components.
As Mpz_t u, v, w ' Temporary numbers.
As Integer d, i, n ' Counters.
As ZString * 11 digits ' Accumulated digits for one line.
End Type
' Compose matrix with numbers on the right.
Sub compose_r(ByVal C As ctx_t Ptr, ByVal R As Integer)
With *C
Mpz_mul_si(@.r, @.r, 10)
Mpz_mul_si(@.v, @.t, R)
Mpz_add (@.r, @.r, @.v)
Mpz_mul_si(@.q, @.q, 10)
End With
End Sub
' Compose matrix with numbers on the left.
Sub compose_l(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal T As Integer)
With *C
Mpz_mul_si(@.r, @.r, T)
Mpz_addmul_ui(@.r, @.q, R)
Mpz_mul_si(@.t, @.t, T)
Mpz_mul_si(@.v, @.s, R)
Mpz_add (@.t, @.t, @.v)
Mpz_mul_si(@.s, @.s, Q)
Mpz_mul_si(@.q, @.q, Q)
End With
End Sub
' Extract one digit.
Function extract(ByVal C As ctx_t Ptr, J As UInteger) As Integer
With *C
Mpz_mul_ui(@.u, @.q, j)
Mpz_add (@.u, @.u, @.r)
Mpz_mul_ui(@.v, @.s, j)
Mpz_add (@.v, @.v, @.t)
Mpz_tdiv_q(@.w, @.u, @.v)
Return Mpz_get_ui(@.w)
End With
End Function
' Print one digit. Returns 1 for the last digit.
Function prdigit(ByVal C As ctx_t Ptr, y As Integer) As Integer
C->digits[C->d] = Asc("0") + y
C->d += 1
C->i += 1
If C->i < C->n Then If C->i Mod 10 Then Return 0
C->digits[C->d] = 0
?C->digits, C->i
C->d = 0 : Return C->i = C->n
End Function
' Generate successive digits of PI.
Sub pidigits(ByVal C As ctx_t Ptr)
C->d = 0
C->i = 0
With *C
Mpz_init_set_ui(@.q, 1)
Mpz_init_set_ui(@.r, 0)
Mpz_init_set_ui(@.s, 0)
Mpz_init_set_ui(@.t, 1)
Mpz_init(@.u)
Mpz_init(@.v)
Mpz_init(@.w)
Var k = 1
Do
Var y = extract(C, 3)
If y = extract(C, 4) Then
If prdigit(C, y) Then Exit Sub
compose_r(C, -10*y)
Else
compose_l(C, k, 4*k+2, 2*k+1)
k += 1
EndIf
Loop
End With
End Sub
Dim As ctx_t c
c.n = CInt(Command$(1))
If c.n <= 0 Then c.n = 27
Var t = Timer
pidigits(@c)
?:?Timer - t
Sleep
Your original listing takes 3.52 sec. on my computer, first and second listing take 3.42 sec. About 2.8 %, not much but I don't think that this program can be made any faster.
My computer is a Intel dual core , windows xp , and the GMP library is a DLL version (5.0.1) not optimized (Can't get tuneup compiled).
Also tried it with MPIR but MPIR is slower then GMP.
EDIT
After submitting this post i found a litte speedup in the Sub compose_l()
the instructions
Mpz_mul_si(@.u, @.q, R)
Mpz_add (@.r, @.r, @.u)
can be replaced with
Mpz_addmul_ui(@.r, @.q, R)
I changed the listings in this post
The speed is now 3.34 seconds about 4.8 % faster then the original listing
OOPS forgot to mention that all timings are for N = 10000
Why do the FB benchmarks use so much memory (compare with e.g. FPC, and see that it uses several times the memory FPC uses. And FPC is an ordinary compiler, not something specifically embedded)
http://shootout.alioth.debian.org/gp4/b ... g2=fpascal
http://shootout.alioth.debian.org/gp4/b ... g2=fpascal
@MichaelW:
Are you interested in testing this function against IsPrime_c?
Are you interested in testing this function against IsPrime_c?
Code: Select all
FUNCTION IsPrime( BYVAL n AS INTEGER ) AS INTEGER
SELECT CASE AS CONST n
CASE 2, 3, 5 : RETURN 1
CASE ELSE
IF n MOD 2 = 0 THEN RETURN 0
IF n MOD 3 = 0 THEN RETURN 0
VAR i = 2, div = 5, mx = CINT(SQR(n + 1))
DO
IF n MOD div = 0 THEN RETURN 0
div += i
i = 6 - i
LOOP WHILE div <= mx
END SELECT : RETURN 1
END FUNCTION
Running on a P3, for composite numbers your changes close the gap to 2x.
FWIW, I also tested with the Microsoft Visual C++ Toolkit 2003 compiler using this command line:
cl /QIfist /FA /G6 /O2 /Zl /c isprime_c.c
Once I found the /QIfist compiler option that allowed me to avoid a _ftol2 call (apparently otherwise used to convert the floating-point square root to an integer), the code was a little faster than the GCC-optimized (3.4.5) code.
Code: Select all
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
0 cycles, empty
25 cycles, IsPrime(2)
56 cycles, IsPrime(100)
255 cycles, IsPrime(101)
56 cycles, IsPrime(500)
409 cycles, IsPrime(503)
1489 cycles, IsPrime(12347)
26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)
1530 cycles, IsPrime_c(12347)
cl /QIfist /FA /G6 /O2 /Zl /c isprime_c.c
Once I found the /QIfist compiler option that allowed me to avoid a _ftol2 call (apparently otherwise used to convert the floating-point square root to an integer), the code was a little faster than the GCC-optimized (3.4.5) code.
Code: Select all
19 cycles, IsPrime_c(2)
22 cycles, IsPrime_c(100)
210 cycles, IsPrime_c(101)
24 cycles, IsPrime_c(500)
368 cycles, IsPrime_c(503)
1502 cycles, IsPrime_c(12347)
I looked into this, and suspect shootout only registers committed memory. The FPC heapmanager allocates a somewhat smaller initial block for heap than libc does, and for these small benchmark programs, often no extra memory is needed.marcov wrote:Why do the FB benchmarks use so much memory (compare with e.g. FPC, and see that it uses several times the memory FPC uses. And FPC is an ordinary compiler, not something specifically embedded)
http://shootout.alioth.debian.org/gp4/b ... g2=fpascal
So for languages that don't differ too much (statically compiled, manual allocation) the benchmarks seems to mostly measure initial size and with what increases suballocator increases mem.
Still, the FPC rtl does not scan e.g. zoneinfo on startup, so that might matter too.
just done some more tests
head 2 head tests carried out just the other day
various C compilations vs FB compiler
see the last 2 posts. MichaelW wrote & compiled the C tests
http://www.freebasic.net/forum/viewtopic.php?t=17561
Ive run my code on a bunch of machine including a dual core. Times, issues ,and spread pointed out.There is no hard and fast answer with these things. But a series of tests can nail the issue down to a ballpark of say 10%
what is needed is a more sophisticated test then my simple benchmark. Some trusted exe benchmarking code we can pass round. C FB etc I have a windows network & im happy to run a bunch of tests on that.
handy site
http://www.cpubenchmark.net/
Ill post the results and conclusions here once we have some good figures
http://freebasicportal.wikispaces.com/FB+Benchmarks
various C compilations vs FB compiler
see the last 2 posts. MichaelW wrote & compiled the C tests
http://www.freebasic.net/forum/viewtopic.php?t=17561
Ive run my code on a bunch of machine including a dual core. Times, issues ,and spread pointed out.There is no hard and fast answer with these things. But a series of tests can nail the issue down to a ballpark of say 10%
what is needed is a more sophisticated test then my simple benchmark. Some trusted exe benchmarking code we can pass round. C FB etc I have a windows network & im happy to run a bunch of tests on that.
handy site
http://www.cpubenchmark.net/
Ill post the results and conclusions here once we have some good figures
http://freebasicportal.wikispaces.com/FB+Benchmarks