which is faster, GCC or freebasic?

anonymous1337 · Post by **anonymous1337** » Mar 17, 2011 12:45

@TJF:

The tested source is published: http://shootout.alioth.debian.org/gp4/c ... ang=FreeBASIC

Click any of those test URLs.

MichaelW · Post by **MichaelW** » Mar 17, 2011 14:24

TJF wrote:No, I made several speed tests with FB-, C- and C++-code. There was no significant speed difference.

In the code you posted there is no significant speed difference because it implements an algorithm where most of the execution time is spent in a slow mod operation that the compiler cannot optimize.

Code: Select all

'====================================================================
#include "windows.bi"
#include "counter.bas"
'====================================================================
'' Counter.bas is available here:
''
'' http://www.freebasic.net/forum/viewtopic.php?t=4221
'====================================================================

dim shared as integer loop_count1, loop_count2

'====================================================================

FUNCTION istPrimzahl(Zahl AS INTEGER) AS INTEGER
  IF Zahl = 2 THEN RETURN 1
  FOR i AS INTEGER = 2 TO Zahl/2+1
    IF (Zahl MOD i)=0 THEN RETURN 0
    loop_count1 += 1
  NEXT : RETURN 1
END FUNCTION

FUNCTION _istPrimzahl(Zahl AS INTEGER) AS INTEGER
  IF Zahl = 2 THEN RETURN 1
  FOR i AS INTEGER = 2 TO Zahl/2+1
    'IF (Zahl MOD i)=0 THEN RETURN 0
    loop_count2 += 1
  NEXT : RETURN 1
END FUNCTION

'====================================================================

istPrimzahl(41)
_istPrimzahl(41)

print loop_count1, loop_count2
print

sleep 3000

counter_begin( 1000, HIGH_PRIORITY_CLASS )
counter_end
print counter_cycles;" cycles, empty"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
    istPrimzahl(41)
counter_end
print counter_cycles;" cycles, with mod  "
counter_begin( 1000, HIGH_PRIORITY_CLASS )
    _istPrimzahl(41)
counter_end
print counter_cycles;" cycles, without mod "
sleep

Code: Select all

21            21

0 cycles, empty
799 cycles, with mod
158 cycles, without mod

While I have not been able to find any easy way to eliminate the MOD operations, I did find a way to reduce the number of them.

Code: Select all

int IsPrime( int n )
{
    int i = 2, div = 5,  max;
    if(n == 2 | n == 3 | n == 5)  return 1;
    if(n % 2 == 0) return 0;
    if(n % 3 == 0) return 0;
    max = sqrt(n+1);
    do
    {
        if(n % div == 0) return 0;
        div += i;
        i = 6 - i;
    } while( div <= max );
    return 1;
}

Code: Select all

'==============================================================================
#include "counter.bas"
'==============================================================================

declare function IsPrime_c cdecl alias "IsPrime"(as integer) as integer

'==============================================================================

dim shared as integer cnt

function IsPrime( byval n as integer ) as integer
    dim as integer i = 2, div = 5, mx
    if n = 2 or n = 3 or n = 5 then return 1
    if n mod 2 = 0 then return 0
    if n mod 3 = 0 then return 0
    mx = cint(sqr(n+1))
    do
        if n mod div = 0 then return 0
        div += i
        i = 6 - i
        cnt+=1
    loop while div <= mx
    return 1
end function

'==============================================================================

for i as integer = 2 to 1000
    if IsPrime_c(i) then print i;
next

print

for i as integer = 2 to 1000
    if IsPrime(i) then print i;
next

print
print

'==============================================================================

dim as integer r

sleep 3000

counter_begin( 1000, HIGH_PRIORITY_CLASS )
counter_end
print counter_cycles;" cycles, empty"

print

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime(2)
counter_end
print counter_cycles;" cycles, IsPrime(2)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime(100)
counter_end
print counter_cycles;" cycles, IsPrime(100)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime(101)
counter_end
print counter_cycles;" cycles, IsPrime(101)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime(500)
counter_end
print counter_cycles;" cycles, IsPrime(500)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime(503)
counter_end
print counter_cycles;" cycles, IsPrime(503)"

print

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime_c(2)
counter_end
print counter_cycles;" cycles, IsPrime_c(2)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime_c(100)
counter_end
print counter_cycles;" cycles, IsPrime_c(100)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime_c(101)
counter_end
print counter_cycles;" cycles, IsPrime_c(101)"

counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime_c(500)
counter_end
print counter_cycles;" cycles, IsPrime_c(500)"
counter_begin( 1000, HIGH_PRIORITY_CLASS )
    r = IsPrime_c(503)
counter_end
print counter_cycles;" cycles, IsPrime_c(503)"

sleep

Results with C code compiled with gcc -O2 -c isprime_c.c:

Code: Select all

55 cycles, IsPrime(2)
98 cycles, IsPrime(100)
300 cycles, IsPrime(101)
98 cycles, IsPrime(500)
452 cycles, IsPrime(503)

26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)

Neither C nor C++ can compile a faster binary in this case and most of the other cases. This is what I need to know. This is what I want to show.

The code you used is not typical.

anonymous1337 · Post by **anonymous1337** » Mar 17, 2011 15:25

If you wanted, couldn't

if( intPotentialPrime mod intPotentialFactor1 = 0 ) then ...

be replaced with

intPotentialFactor2 = intPotentialPrime \ intPotentialFactor1
intProduct = intPotentialFactor1 * intPotentialFactor2

if( intProduct = intPotentialPrime ) then ...

?

A floor function may have to be used if FB attempts any rounding.

TJF · Post by **TJF** » Mar 17, 2011 15:36

@anonymous1337:

Thanks for the link. My surface impression: all examples are compiled with -lang deprecated. I have no experience with this.

I made a short test with the first example pidigit and I managed to speed up the FB source by 1.5 % (on my box):

Code: Select all

' The Computer Language Shootout
' http://shootout.alioth.debian.org/
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF

#INCLUDE "gmp.bi"

TYPE ctx_t
  AS mpz_t q, r, s, t    ' Transformation matrix components.
  AS mpz_t u, v, w       ' Temporary numbers.
  AS INTEGER d, i, n     ' Counters.
  AS ZSTRING * 11 digits ' Accumulated digits for one line.
END TYPE
'
' Compose matrix with numbers on the right.
SUB compose_r(BYVAL C AS ctx_t PTR, BYVAL Q AS INTEGER, BYVAL R AS INTEGER, BYVAL S AS INTEGER, BYVAL T AS INTEGER)
  WITH *C
    mpz_mul_si(@.u, @.r, S)
    mpz_mul_si(@.r, @.r, Q)
    mpz_mul_si(@.v, @.t, R)
    mpz_add   (@.r, @.r, @.v)
    mpz_mul_si(@.t, @.t, T)
    mpz_add   (@.t, @.t, @.u)
    mpz_mul_si(@.s, @.s, T)
    mpz_mul_si(@.u, @.q, S)
    mpz_add   (@.s, @.s, @.u)
    mpz_mul_si(@.q, @.q, Q)
  END WITH
END SUB

' Compose matrix with numbers on the left.
SUB compose_l(BYVAL C AS ctx_t PTR, BYVAL Q AS INTEGER, BYVAL R AS INTEGER, BYVAL S AS INTEGER, BYVAL T AS INTEGER)
  WITH *C
    mpz_mul_si(@.r, @.r, T)
    mpz_mul_si(@.u, @.q, R)
    mpz_add   (@.r, @.r, @.u)
    mpz_mul_si(@.u, @.t, S)
    mpz_mul_si(@.t, @.t, T)
    mpz_mul_si(@.v, @.s, R)
    mpz_add   (@.t, @.t, @.v)
    mpz_mul_si(@.s, @.s, Q)
    mpz_add   (@.s, @.s, @.u)
    mpz_mul_si(@.q, @.q, Q)
  END WITH
END SUB

' Extract one digit.
FUNCTION extract(BYVAL C AS ctx_t PTR, J AS UINTEGER) AS INTEGER
  WITH *C
    mpz_mul_ui(@.u, @.q, j)
    mpz_add   (@.u, @.u, @.r)
    mpz_mul_ui(@.v, @.s, j)
    mpz_add   (@.v, @.v, @.t)
    mpz_tdiv_q(@.w, @.u, @.v)
    RETURN mpz_get_ui(@.w)
  END WITH
END FUNCTION

' Print one digit. Returns 1 for the last digit.
FUNCTION prdigit(BYVAL C AS ctx_t PTR, y AS INTEGER) AS INTEGER
  C->digits[C->d] = ASC("0") + y
  C->d += 1
  C->i += 1
  IF C->i < C->n THEN IF C->i MOD 10 THEN RETURN 0
  C->digits[C->d] = 0
  ?C->digits, C->i
  C->d = 0 : RETURN C->i = C->n
END FUNCTION

' Generate successive digits of PI.
SUB pidigits(BYVAL C AS ctx_t PTR)
  C->d = 0
  C->i = 0
  WITH *C
    mpz_init_set_ui(@.q, 1)
    mpz_init_set_ui(@.r, 0)
    mpz_init_set_ui(@.s, 0)
    mpz_init_set_ui(@.t, 1)
    mpz_init(@.u)
    mpz_init(@.v)
    mpz_init(@.w)
  VAR k = 1
  DO
    VAR y = extract(C, 3)
    IF y = extract(C, 4) THEN
      IF prdigit(C, y) THEN EXIT SUB
      compose_r(C, 10, -10*y, 0, 1)
    ELSE
      compose_l(C, k, 4*k+2, 0, 2*k+1)
      k += 1
    END IF
  LOOP
  END WITH
END SUB

DIM AS ctx_t c
  c.n = CINT(Command$(1))
  IF c.n <= 0 THEN c.n = 27

VAR t = TIMER
pidigits(@c)
?:?TIMER - t

So the FB code should be as fast as the FreePascal code.

@MichaelW:

Thanks for the benchmarks! Unfortunately you made some mistakes. IMO you should

1) eliminate the line cnt+=1 in function IsPrime.
2) find a similar evaluation for max/mx like mx = 0.5+sqr(n+1)

I would appreciate seeing updated results.

MichaelW · Post by **MichaelW** » Mar 17, 2011 21:40

TJF wrote: Unfortunately you made some mistakes. IMO you should
1) eliminate the line cnt+=1 in function IsPrime.
2) find a similar evaluation for max/mx like mx = 0.5+sqr(n+1)
I would appreciate seeing updated results.

The cnt+=1 was a test that I forgot to remove. It is in the loop, but because integer addition is very fast removing it saved only a few cycles, changing the relevant counts from 300 and 452 to 299 and 449. The cint in mx=cint(sqr(n+1)) was to correct a problem that turned out to be elsewhere, that I also forgot to remove. Replacing it with mx=0.5+sqr(n+1) added extra floating-point operations and increased the counts to 333 and 488. Replacing it with just mx=sqr(n+1) reduced the counts to 296 and 450.

For composite numbers the compiler-optimized code is ~3x faster. For small prime numbers the compiler-optimized code has a small advantage, but as the size of the numbers increase the number of mod operations increase and eventually the compiler-optimized code has no advantage. I added a test to show this effect.

So the full results, running on a P3, are:

Code: Select all

 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997

0 cycles, empty

55 cycles, IsPrime(2)
98 cycles, IsPrime(100)
296 cycles, IsPrime(101)
98 cycles, IsPrime(500)
450 cycles, IsPrime(503)
1530 cycles, IsPrime(12347)

26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)
1530 cycles, IsPrime_c(12347)

TJF · Post by **TJF** » Mar 18, 2011 11:38

@MichaelW:

Thanks for the additional tests! So I have to learn that optimized binaries can be up to 3 times faster than native gas code.

I didn't notice that up to now. One reason may be that I use a lot of libraries written in C. It makes no difference if you call a library function from C or from FB. In this case both are equal in speed (exeption: inline functions).

Anyway, for daily work as a developer rather the speed of the binary than the speed of the compiling process is my main interest. And fbc compiles much faster than gcc with option -O 2!

frisian · Post by **frisian** » Mar 18, 2011 22:30

@TJF

Saw your pidigit program, tried it and ask myself can this made faster, did not see any thing that could be don to speed it up.
Had a second look today and took a closer look at the calling of the functions and subs and noticed
that the sub compose_r() was called with 3 constants and the sub compose_l()
was called with 1 constant.
Adding 0 to a variable or multiply it with 1 doesn't alter the variable so some instructions can be removed.

The first listing is the your original listing with the unnecessary code
commented out.

Code: Select all

' The Computer Language Shootout
' [url]http://shootout.alioth.debian.org/[/url]
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF
' commented some code that was unnecessary , frisian

#Include "gmp.bi"

Type ctx_t
	As Mpz_t q, r, s, t    ' Transformation matrix components.
	As Mpz_t u, v, w       ' Temporary numbers.
	As Integer d, i, n     ' Counters.
	As ZString * 11 digits ' Accumulated digits for one line.
End Type
'
' Compose matrix with numbers on the right.
Sub compose_r(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal S As Integer, ByVal T As Integer)
	With *C
'==============================
' Q = 10 
' R = variable
' S = 0
' T = 1 
'==============================
		'    mpz_mul_si(@.u, @.r, S)        u = 0 (r*0)
		'    Mpz_mul_si(@.r, @.r, Q)	         r = r * 10
		Mpz_mul_si(@.r, @.r, 10)
		Mpz_mul_si(@.v, @.t, R)
		Mpz_add   (@.r, @.r, @.v)
		'    mpz_mul_si(@.t, @.t, T)		t = t  (*1)
		'    Mpz_add   (@.t, @.t, @.u)	t = t  (+0)
		'    mpz_mul_si(@.s, @.s, T)	s = s (*1)
		'    mpz_mul_si(@.u, @.q, S)	u = 0 again hasn't been used so is still 0
		'    mpz_add   (@.s, @.s, @.u)	s = s (+0)
		'    Mpz_mul_si(@.q, @.q, Q)	q = q * 10
		Mpz_mul_si(@.q, @.q, 10)
	End With
End Sub

' Compose matrix with numbers on the left.
Sub compose_l(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal S As Integer, ByVal T As Integer)
	With *C
'==============================
' Q = variable
' R = variable
' S = 0
' T = variable 
'==============================		
		Mpz_mul_si(@.r, @.r, T)
	'	Mpz_mul_si(@.u, @.q, R)
	'	Mpz_add   (@.r, @.r, @.u)
                Mpz_addmul_ui(@.r, @.q, R)
	'	Mpz_mul_si(@.u, @.t, S)		make u = 0, since @.u is not used again and
	'	in function extract() filled with value there no need to do something with @.u
		Mpz_mul_si(@.t, @.t, T)
		Mpz_mul_si(@.v, @.s, R)
		Mpz_add   (@.t, @.t, @.v)
		Mpz_mul_si(@.s, @.s, Q)
	'	Mpz_add   (@.s, @.s, @.u)	s = s + 0
		Mpz_mul_si(@.q, @.q, Q)
	End With
End Sub

' Extract one digit.
Function extract(ByVal C As ctx_t Ptr, J As UInteger) As Integer
	With *C
		Mpz_mul_ui(@.u, @.q, j)
		Mpz_add   (@.u, @.u, @.r)
		Mpz_mul_ui(@.v, @.s, j)
		Mpz_add   (@.v, @.v, @.t)
		Mpz_tdiv_q(@.w, @.u, @.v)
		Return Mpz_get_ui(@.w)
	End With
End Function

' Print one digit. Returns 1 for the last digit.
Function prdigit(ByVal C As ctx_t Ptr, y As Integer) As Integer
	C->digits[C->d] = Asc("0") + y
	C->d += 1
	C->i += 1
	If C->i < C->n Then If C->i Mod 10 Then Return 0
	C->digits[C->d] = 0
	?C->digits, C->i
	C->d = 0 : Return C->i = C->n
End Function

' Generate successive digits of PI.
Sub pidigits(ByVal C As ctx_t Ptr)
	C->d = 0
	C->i = 0
	With *C
		Mpz_init_set_ui(@.q, 1)
		Mpz_init_set_ui(@.r, 0)		' mpz_init(@.r)	mpz_init set variable to 0 
		Mpz_init_set_ui(@.s, 0)	' mpz_init(@.s) only used once so very little gain
		Mpz_init_set_ui(@.t, 1)
		Mpz_init(@.u)
		Mpz_init(@.v)
		Mpz_init(@.w)
		Var k = 1
		Do
			Var y = extract(C, 3)
			If y = extract(C, 4) Then
				If prdigit(C, y) Then Exit Sub
				compose_r(C, 10, -10*y, 0, 1)
			Else
				compose_l(C, k, 4*k+2, 0, 2*k+1)
				k += 1
			EndIf
		Loop
	End With
End Sub

Dim As ctx_t c
c.n = CInt(Command$(1))
If c.n <= 0 Then c.n = 27

Var t = Timer
pidigits(@c)
?:?Timer – t

The second listing has the redundant code remove and some small changes because
some variables aren't passed over to the subs.

Code: Select all

' The Computer Language Shootout
' [url]http://shootout.alioth.debian.org/[/url]
' contributed by Simon Nash (yetifoot)
' converted to FreeBASIC from the gcc c version by Mike Pall
' modified by TJF
' removed obsolete code by frisian

#Include "gmp.bi"

Type ctx_t
	As Mpz_t q, r, s, t    ' Transformation matrix components.
	As Mpz_t u, v, w       ' Temporary numbers.
	As Integer d, i, n     ' Counters.
	As ZString * 11 digits ' Accumulated digits for one line.
End Type

' Compose matrix with numbers on the right.
Sub compose_r(ByVal C As ctx_t Ptr, ByVal R As Integer)
	With *C
		Mpz_mul_si(@.r, @.r, 10)
		Mpz_mul_si(@.v, @.t, R)
		Mpz_add   (@.r, @.r, @.v)
		Mpz_mul_si(@.q, @.q, 10)
	End With
End Sub

' Compose matrix with numbers on the left.
Sub compose_l(ByVal C As ctx_t Ptr, ByVal Q As Integer, ByVal R As Integer, ByVal T As Integer)
	With *C
		Mpz_mul_si(@.r, @.r, T)
		Mpz_addmul_ui(@.r, @.q, R)
		Mpz_mul_si(@.t, @.t, T)
		Mpz_mul_si(@.v, @.s, R)
		Mpz_add   (@.t, @.t, @.v)
		Mpz_mul_si(@.s, @.s, Q)
		Mpz_mul_si(@.q, @.q, Q)
	End With
End Sub

' Extract one digit.
Function extract(ByVal C As ctx_t Ptr, J As UInteger) As Integer
	With *C
		Mpz_mul_ui(@.u, @.q, j)
		Mpz_add   (@.u, @.u, @.r)
		Mpz_mul_ui(@.v, @.s, j)
		Mpz_add   (@.v, @.v, @.t)
		Mpz_tdiv_q(@.w, @.u, @.v)
		Return Mpz_get_ui(@.w)
	End With
End Function

' Print one digit. Returns 1 for the last digit.
Function prdigit(ByVal C As ctx_t Ptr, y As Integer) As Integer
	C->digits[C->d] = Asc("0") + y
	C->d += 1
	C->i += 1
	If C->i < C->n Then If C->i Mod 10 Then Return 0
	C->digits[C->d] = 0
	?C->digits, C->i
	C->d = 0 : Return C->i = C->n
End Function

' Generate successive digits of PI.
Sub pidigits(ByVal C As ctx_t Ptr)
	C->d = 0
	C->i = 0
	With *C
		Mpz_init_set_ui(@.q, 1)
		Mpz_init_set_ui(@.r, 0)
		Mpz_init_set_ui(@.s, 0)
		Mpz_init_set_ui(@.t, 1)
		Mpz_init(@.u)
		Mpz_init(@.v)
		Mpz_init(@.w)
		Var k = 1
		Do
			Var y = extract(C, 3)
			If y = extract(C, 4) Then
				If prdigit(C, y) Then Exit Sub
				compose_r(C, -10*y)
			Else
				compose_l(C, k, 4*k+2, 2*k+1)
				k += 1
			EndIf
		Loop
	End With
End Sub

Dim As ctx_t c
c.n = CInt(Command$(1))
If c.n <= 0 Then c.n = 27

Var t = Timer
pidigits(@c)
?:?Timer - t

Sleep

Speed:
Your original listing takes 3.52 sec. on my computer, first and second listing take 3.42 sec. About 2.8 %, not much but I don't think that this program can be made any faster.

My computer is a Intel dual core , windows xp , and the GMP library is a DLL version (5.0.1) not optimized (Can't get tuneup compiled).
Also tried it with MPIR but MPIR is slower then GMP.

EDIT

After submitting this post i found a litte speedup in the Sub compose_l()
the instructions
Mpz_mul_si(@.u, @.q, R)
Mpz_add (@.r, @.r, @.u)
can be replaced with
Mpz_addmul_ui(@.r, @.q, R)

I changed the listings in this post

The speed is now 3.34 seconds about 4.8 % faster then the original listing

OOPS forgot to mention that all timings are for N = 10000

TJF · Post by **TJF** » Mar 19, 2011 9:14

@frisian:

Good job!

marcov · Post by **marcov** » Mar 19, 2011 13:01

Why do the FB benchmarks use so much memory (compare with e.g. FPC, and see that it uses several times the memory FPC uses. And FPC is an ordinary compiler, not something specifically embedded)

http://shootout.alioth.debian.org/gp4/b ... g2=fpascal

TJF · Post by **TJF** » Mar 19, 2011 15:50

@MichaelW:

Are you interested in testing this function against IsPrime_c?

Code: Select all

FUNCTION IsPrime( BYVAL n AS INTEGER ) AS INTEGER
  SELECT CASE AS CONST n
  CASE 2, 3, 5 : RETURN 1
  CASE ELSE
    IF n MOD 2 = 0 THEN RETURN 0
    IF n MOD 3 = 0 THEN RETURN 0
    VAR i = 2, div = 5, mx = CINT(SQR(n + 1))
    DO
      IF n MOD div = 0 THEN RETURN 0
      div += i
      i = 6 - i
    LOOP WHILE div <= mx
  END SELECT : RETURN 1
END FUNCTION

MichaelW · Post by **MichaelW** » Mar 19, 2011 22:55

Running on a P3, for composite numbers your changes close the gap to 2x.

Code: Select all

 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997
 2 3 5 7 11 13 17 19 23 29 31 37 41 43 47 53 59 61 67 71 73 79 83 89 97 101 103
107 109 113 127 131 137 139 149 151 157 163 167 173 179 181 191 193 197 199 211
223 227 229 233 239 241 251 257 263 269 271 277 281 283 293 307 311 313 317 331
337 347 349 353 359 367 373 379 383 389 397 401 409 419 421 431 433 439 443 449
457 461 463 467 479 487 491 499 503 509 521 523 541 547 557 563 569 571 577 587
593 599 601 607 613 617 619 631 641 643 647 653 659 661 673 677 683 691 701 709
719 727 733 739 743 751 757 761 769 773 787 797 809 811 821 823 827 829 839 853
857 859 863 877 881 883 887 907 911 919 929 937 941 947 953 967 971 977 983 991
997

0 cycles, empty

25 cycles, IsPrime(2)
56 cycles, IsPrime(100)
255 cycles, IsPrime(101)
56 cycles, IsPrime(500)
409 cycles, IsPrime(503)
1489 cycles, IsPrime(12347)

26 cycles, IsPrime_c(2)
28 cycles, IsPrime_c(100)
233 cycles, IsPrime_c(101)
28 cycles, IsPrime_c(500)
396 cycles, IsPrime_c(503)
1530 cycles, IsPrime_c(12347)

FWIW, I also tested with the Microsoft Visual C++ Toolkit 2003 compiler using this command line:

cl /QIfist /FA /G6 /O2 /Zl /c isprime_c.c

Once I found the /QIfist compiler option that allowed me to avoid a _ftol2 call (apparently otherwise used to convert the floating-point square root to an integer), the code was a little faster than the GCC-optimized (3.4.5) code.

Code: Select all

19 cycles, IsPrime_c(2)
22 cycles, IsPrime_c(100)
210 cycles, IsPrime_c(101)
24 cycles, IsPrime_c(500)
368 cycles, IsPrime_c(503)
1502 cycles, IsPrime_c(12347)

TJF · Post by **TJF** » Mar 20, 2011 12:02

@MichaelW:

Thanks for the report!

I learned that FB is a bit but not much slower than C.

marcov · Post by **marcov** » Mar 21, 2011 15:36

marcov wrote:Why do the FB benchmarks use so much memory (compare with e.g. FPC, and see that it uses several times the memory FPC uses. And FPC is an ordinary compiler, not something specifically embedded)

http://shootout.alioth.debian.org/gp4/b ... g2=fpascal

I looked into this, and suspect shootout only registers committed memory. The FPC heapmanager allocates a somewhat smaller initial block for heap than libc does, and for these small benchmark programs, often no extra memory is needed.

So for languages that don't differ too much (statically compiled, manual allocation) the benchmarks seems to mostly measure initial size and with what increases suballocator increases mem.

Still, the FPC rtl does not scan e.g. zoneinfo on startup, so that might matter too.

TESLACOIL · Post by **TESLACOIL** » Mar 29, 2011 20:47

head 2 head tests carried out just the other day

various C compilations vs FB compiler
see the last 2 posts. MichaelW wrote & compiled the C tests

http://www.freebasic.net/forum/viewtopic.php?t=17561

Ive run my code on a bunch of machine including a dual core. Times, issues ,and spread pointed out.There is no hard and fast answer with these things. But a series of tests can nail the issue down to a ballpark of say 10%

what is needed is a more sophisticated test then my simple benchmark. Some trusted exe benchmarking code we can pass round. C FB etc I have a windows network & im happy to run a bunch of tests on that.

handy site
http://www.cpubenchmark.net/

Ill post the results and conclusions here once we have some good figures
http://freebasicportal.wikispaces.com/FB+Benchmarks

TheMG · Post by **TheMG** » Apr 02, 2011 17:43

TJF: Those FreeBASIC benchmarks were left that way for a reason: they are comparing like for like. The C, Pascal and FreeBASIC versions of a test should all be as close to each other as possible so that you are not measuring the skill of the programmer, only the optimization of the compiler!

which is faster, GCC or freebasic?

just done some more tests