integer operations benchmark (or interpreter killer)

For other topics related to the FreeBASIC project or its community.
Ed Davis
Posts: 10
Joined: Jul 28, 2008 23:24

integer operations benchmark (or interpreter killer)

Here is the code in FreeBasic for an ASCII Mandelbrot, using integers only.

Code: Select all

dim as integer left_edge   = -420
dim as integer right_edge  =  300
dim as integer top_edge    =  300
dim as integer bottom_edge = -300
dim as integer x_step      =  7
dim as integer y_step      =  15
dim as integer max_iter    =  200

dim as integer y0 = top_edge
while y0 > bottom_edge
dim as integer x0 = left_edge
while x0 < right_edge
dim as integer y = 0
dim as integer x = 0
dim as integer x_x = 0
dim as integer y_y = 0
dim as integer i = 0
dim as string the_char = " "
while i < max_iter and x_x + y_y <= 800
x_x =    (x * x) \ 200
y_y =    (y * y) \ 200
if x_x + y_y > 800 then
the_char = chr(48 + i)
if i > 9 then the_char = "@"
exit while
end if
dim as integer temp = x_x - y_y + x0
if (x < 0 and y > 0) or (x > 0 and y < 0) then
y = (-1 * ((-1 * x * y) \ 100)) + y0
else
y = (x * y \ 100) + y0
end if
x = temp
i = i + 1
wend
x0 = x0 + x_step
print the_char ;
wend
y0 = y0 - y_step
print
wend

This is based on code I found at http://rosettacode.org/wiki/Mandelbrot_set

I got to thinking, this might make a nice integer benchmark, if instead of printing the Mandelbrot, we just accumulate a value (the sum of the generated characters) and looped for a while.

So, I modified the program, and after testing, found that a C version of the above, compiled with gcc -O3, took 1 second to perform 1545 iterations on my machine. The code is very similar to the above ascii integer Mandelbrot, except that it performs multiple iterations, accumulates the character to be printed, and the 'exit loop' is removed - not all simple languages support early-loop-exits, and I wanted the code to be as common between languages as possible.

I figured I could use the C version as a standard, and finding a count that took 1 second would give me an easy estimate as to how much slower other language processors are, compared to C, and of course, only in regards to this benchmark.

And (ta dah), the FreeBasic version compares very favorably with the C version, speed-wise. In fact, it was the second fastest processor that I tested.

Here is the FreeBasic version:

Code: Select all

dim as integer accum = 0
dim as integer count = 0
while count < 1545
dim as integer left_edge   = -420
dim as integer right_edge  =  300
dim as integer top_edge    =  300
dim as integer bottom_edge = -300
dim as integer x_step      =  7
dim as integer y_step      =  15
dim as integer max_iter    =  200

dim as integer y0 = top_edge
while y0 > bottom_edge
dim as integer x0 = left_edge
while x0 < right_edge
dim as integer y = 0
dim as integer x = 0
dim as integer x_x = 0
dim as integer y_y = 0
dim as integer i = 0
dim as integer the_char = 32
while i < max_iter and x_x + y_y <= 800
x_x =    ((x * x) \ 200)
y_y =    ((y * y) \ 200)
if x_x + y_y > 800 then
the_char = 48 + i
if i > 9 then the_char = 64
else
dim as integer temp = x_x - y_y + x0
if (x < 0 and y > 0) or (x > 0 and y < 0) then
y = (-1 * ((-1 * x * y) \ 100)) + y0
else
y = (x * y \ 100) + y0
end if
x = temp
end if

i = i + 1
wend
accum = accum + the_char

x0 = x0 + x_step
wend

y0 = y0 - y_step
wend

if count mod 300 = 0 then print accum

count = count + 1
wend
print accum

This is the output:

200574 60372774 120544974 180717174 240889374 301061574 309886830

This is a pretty intense integer benchmark!

Note this funky code:

Code: Select all

if (x < 0 and y > 0) or (x > 0 and y < 0) then
y = (-1 * ((-1 * x * y) \ 100)) + y0
else
y = (x * y \ 100) + y0
end if

It turns out that there is no consensus as to whether integer division, when one of the operands is negative, should round towards zero, negative infinity, or positive infinity.

Python and Ruby both round towards negative infinity. C rounds towards zero.

In order to get the same output from each language (to verify that each language was essentially computing the same thing and doing similar work), I had to figure out how to preclude one of the operands from being negative.

The code checks, and if either x or y is < 0 (but not both), then it multiplies by minus one to force positive division, and then by minus one again at the end to restore the sign.

Whats the purpose of this?

I enjoy fiddling with compilers/interpreters, especially those that are simple enough that I can understand. I also enjoy writing my own. I wanted to find out what makes some interpreters so much faster than others, and why are others so slow. Examining the source to some of these interpreters has helped me learn some of the reasons, and has helped to improve the speed of those I'm working on.

Here are the specs for my machine:

Windows 7, Service Pack 1, 64-bit
Intel Core i7-3720QM CPU @2.60GHz
16.0 GB (15.9 usable)

Below are the tests that I have run.

Integer operations benchmark (based on integer ascii Mandelbrot): (1545 iterations)

Must generate the following output using 1545 iterations of the code above/below:

200574 60372774 120544974 180717174 240889374 301061574 309886830

All times are in seconds.

Compilers or JIT

Code: Select all

gcc: C -O3 or -O2    1.00
FreeBasic v1.01x64   1.09  freebasic.net
BCX                  1.12  bcxbasic.com
Nim                  1.12  nim-lang.org
Java                 1.23
C#                   1.34
VisualBasic.Net      1.52
FreePascal -O3       1.92
gcc: C, no options   1.96
gcc: C -Os           2.38
Borland C            2.80
TinyC                3.23  bellard.org/tcc
euc -gcc -con        3.58  openeuphoria.org
Oxygen basic         3.77  oxygenbasic.org
PowerBasic           4.03
NodeJS              10.23  nodejs.org
QB64                32.43  qb64.net

Interpreters that compile to byte code or an AST

Code: Select all

pe             14.26  github.com/peberlein/interpreter
toy.bas        33.35  updated version of: QDepartment/Files/Ed-Davis-stuff/toy.bas
Euphoria       39.90  openeuphoria.org
Pike           44.39  pike.lysator.liu.se
toy4.c         57.24  updated version of: QDepartment/Files/Ed-Davis-stuff/toy.c
Tinypas.c      58.97  Pascal-S converted to C
Ruby           60.25  ruby-lang.org
tiny-c.c       65.96  updated version of: iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c
calc3a         70.01  updated version of: epaperpress.com/lexandyacc/
vspl           68.76  cl.cam.ac.uk/~mr10/VSPL.html
Lua            80.14  lua.org
php            81.23  php.net
Wren           84.57  github.com/munificent/wren
C4             97.80  github.com/rswier/c4
UnderC        103     home.mweb.co.za/sd/sdonovan/underc.html
jwillia basic 109     github.com/jwillia3/BASIC
hoc           140     en.wikipedia.org/wiki/Hoc_%28programming_language%29
NaaLaa        168     naalaa.com
CInt          201     root.cern.ch/drupal/content/cint
Python        274     python.org
Yabasic       278     yabasic.de
hanson-calc   321     github.com/drh/drhanson/tree/master/calc

Interpreters (not using a VM or AST)

Code: Select all

ChipMunkBasic  442  nicholson.com/rhn/basic
SmallBASIC     514  smallbasic.sourceforge.net
BBCBasic       531  bbcbasic.co.uk
ThinBasic      543  thinbasic.com
FBSL           551  fbsl.net
CH             582  softintegration.com
Scriba         618  scriptbasic.org
SI            1020  drdobbs.com/cpp/si-a-c-like-script-interpreter/184408141
LBB           1502  bbcbasic.co.uk/lbb
PicoC         2478  github.com/zsaleeba/picoc
DDS           3033  modified version of: grail.cba.csuohio.edu/~somos/ddsbasic.c
MiniBASIC     5471  sourceforge.net/p/minibasic/wiki/Home
nuBASIC       8132  nubasic.eu

Selected notes

Code: Select all

FreeBasic     Screamingly fast!  Impressive!
TinyC         Very fast compiling compiler
euc -gcc -con Euphoria rocks!
Oxygen basic  Oxyben Basic rocks!
NodeJs        Server/desktop Javascript
pe            Micro Euphoria.  Fastest interpreter tested.
toy.bas       Updated version of a simple interpreter I wrote several years ago.
Pike          Neat C-like scripting language
toy4.c        Updated version of a simple interpreter I wrote several years ago.
Tinypas.c     I converted Pascal-S (famous pascal compiler/interpreter) to C.
Ruby          Ruby rocks!
tiny-c.c      I updated this, adding what was needed to run the benchmark.
calc3a        I updated this, adding what was needed to run the benchmark.
vspl          B-like programming language
Lua           Lua rocks!
php           php rocks!
c4            Neat C subset interpreter
UnderC        Almost full C and lots of C++ interpreter
jwillia basic Neat tiny Basic interpreter.
NaaLaa        NaaLaa rocks!
CInt          Full C interpreter
Python        Python rocks!
CH            Full C interpreter
SI            I updated this, adding what was needed to run the benchmark.
LBB           Liberty Basic Booster
LittleC       I updated this, adding what was needed to run the benchmark.
PicoC         Almost full C interpreter
DDS           I updated this, adding what was needed to run the benchmark.

Some take-aways:

• FreeBasic is really fast.
• If you are going to write an interpreter, and speed matters, compile to byte code or an AST.
• Several well-known interpreters are surprisingly slow, in this case at least. In there defense, there may be other features that they offer that make up for the slowness in this instance.

Source code for selected languages

The C version:

Code: Select all

#include <stdio.h>

int main() {
int left_edge, right_edge, top_edge, bottom_edge, max_iter,
x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp,
the_char, accum, count;

accum = 0;
count = 0;
while (count < 1545) {
left_edge = -420;
right_edge  =  300;
top_edge  =  300;
bottom_edge = -300;
x_step    =  7;
y_step    =  15;

max_iter  =  200;

y0 = top_edge;
while (y0 > bottom_edge) {
x0 = left_edge;
while (x0 < right_edge) {
y = 0;
x = 0;
the_char = 32;
x_x = 0;
y_y = 0;
i = 0;
while (i < max_iter && x_x + y_y <= 800) {
x_x = (x * x) / 200;
y_y = (y * y) / 200;
if (x_x + y_y > 800 ) {
the_char = 48 + i;
if (i > 9) {
the_char = 64;
}
} else {
temp = x_x - y_y + x0;
if ((x < 0 && y > 0) || (x > 0 && y < 0)) {
y = (-1 * ((-1 * (x * y)) / 100)) + y0;
} else {
y = x * y / 100 + y0;
}
x = temp;
}
i = i + 1;
}
accum = accum + the_char;
x0 = x0 + x_step;
}
y0 = y0 - y_step;
}
if (count % 300 == 0) {
printf("%d\n", accum);
}
count = count + 1;
}
printf("%d\n", accum);
return 0;
}

The Java version:

Code: Select all

class mandelbench {
public static void main(String[] args) {

int left_edge, right_edge, top_edge, bottom_edge, max_iter,
x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char,
accum, count;

accum = 0;
count = 0;
while (count < 1545) {
left_edge = -420;
right_edge  =  300;
top_edge  =  300;
bottom_edge = -300;
x_step    =  7;
y_step    =  15;

max_iter  =  200;

y0 = top_edge;
while (y0 > bottom_edge) {
x0 = left_edge;
while (x0 < right_edge) {
y = 0;
x = 0;
the_char = ' ';
x_x = 0;
y_y = 0;
i = 0;
while (i < max_iter && x_x + y_y <= 800) {
x_x = (x * x) / 200;
y_y = (y * y) / 200;
if (x_x + y_y > 800) {
the_char = '0' + i;
if (i > 9) {
the_char = '@';
}
} else {
temp = x_x - y_y + x0;
if ((x < 0 && y > 0) || (x > 0 && y < 0))
y = (-1 * (-1 * x * y) / 100) + y0;
else
y = x * y / 100 + y0;
x = temp;
}

i = i + 1;
}
accum = accum + the_char;

x0 = x0 + x_step;
}
y0 = y0 - y_step;
}
if (count % 300 == 0)
System.out.printf("%d ", accum);

count = count + 1;
}

System.out.printf("%d\n", accum);
}
}

The Euphoria version:

Code: Select all

without type_check
integer left_edge, right_edge, top_edge, bottom_edge, max_iter,
x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum,
count

accum = 0
count = 0
while count < 1545 do
left_edge = -420
right_edge  =  300
top_edge  =  300
bottom_edge = -300
x_step    =  7
y_step    =  15

max_iter  =  200

y0 = top_edge
while y0 > bottom_edge do
x0 = left_edge
while x0 < right_edge do
y = 0
x = 0
the_char = 32
x_x = 0
y_y = 0
i = 0
while i < max_iter and x_x + y_y <= 800 do
x_x = floor((x * x) / 200)
y_y = floor((y * y) / 200)
if x_x + y_y > 800 then
the_char = 48 + i
if i > 9 then
the_char = 64
end if
else
temp = x_x - y_y + x0
if ((x < 0 and y > 0) or (x > 0 and y < 0)) then
y = (-1 * floor((-1 * x * y) / 100)) + y0
else
y = floor(x * y / 100) + y0
end if
x = temp
end if

i = i + 1
end while
accum = accum + the_char

x0 = x0 + x_step
end while

y0 = y0 - y_step
end while

if remainder(count, 300) = 0 then
printf(1, "%d ", accum)
end if

count = count + 1
end while

printf(1, "%d\n", accum)

The Ruby version:

Code: Select all

accum = 0
count = 0
while count < 1545 do
left_edge = -420
right_edge  =  300
top_edge  =  300
bottom_edge = -300
x_step    =  7
y_step    =  15

max_iter  =  200

y0 = top_edge
while y0 > bottom_edge do
x0 = left_edge
while x0 < right_edge do
y = 0
x = 0
the_char = 32
x_x = 0
y_y = 0
i = 0
while i < max_iter and x_x + y_y <= 800 do
x_x = (x * x) / 200
y_y = (y * y) / 200
if x_x + y_y > 800 then
the_char = 48 + i
if i > 9 then
the_char = 64
end
else
temp = x_x - y_y + x0
if ((x < 0 and y > 0) or (x > 0 and y < 0)) then
y = (-1 * ((-1 * x * y) / 100)) + y0
else
y = (x * y / 100) + y0
end
x = temp
end

i = i + 1
end
accum = accum + the_char

x0 = x0 + x_step
end

y0 = y0 - y_step
end

if count % 300 == 0 then
printf "%d ", accum
end

count = count + 1
end

puts accum

The Pascal version:

Code: Select all

program mandel;

var left_edge, right_edge, top_edge, bottom_edge, max_iter,
x_step, y_step, y0, x0, x, y, i, x_x, y_y, temp, the_char, accum,
count:integer;

begin;
accum := 0;

count := 0;
while count < 1545 do begin

left_edge := -420;        { left edge = -2.1 }
right_edge  :=  300;        { right edge = 1.5 }
top_edge  :=  300;        { top edge = 1.5 }
bottom_edge := -300;        { bottom edge = -1.5 }
x_step    :=  7;          { x step size }
y_step    :=  15;         { y step size }

max_iter  := 200;         { max iteration depth }

y0 := top_edge;
while y0 > bottom_edge do begin     { y0 }
x0 := left_edge;
while x0 < right_edge do begin      { x0 }
y := 0;               { y }
x := 0;               { x }
the_char := 32;           { char to be displayed }
x_x := 0;
y_y := 0;
i := 0;
while (i < max_iter) and (x_x + y_y <= 800) do begin  { iteration }
x_x := (x * x) div 200;               { x*x }
y_y := (y * y) div 200;               { y*y }
if x_x + y_y >   800 then begin
the_char := 48 + i;               { print digit 0...9 }
if i > 9 then begin               { if iteration count > 9, }
the_char := 64;               {  print '@' }
end;
end else begin
temp := x_x - y_y + x0;             { temp = x*x - y*y + x0 }
if (((x < 0) and (y > 0)) or ((x > 0) and (y < 0))) then
y := (-1 * ((-1 * x * y) div 100)) + y0
else
y := (x * y div 100) + y0;          { y = 2*x*y + y0 }
x := temp;                    { x = temp }
end;

i := i + 1;
end;
accum := accum + the_char;

x0 := x0 + x_step;
end;

y0 := y0 - y_step;
end;

if count mod 300 = 0 then
write(accum, ' ');

count := count + 1;
end;

writeln(accum);
end.

The Lua version.

Code: Select all

local left_edge, right_edge, top_edge, bottom_edge, max_iter, x_step, y_step, y0, x0, x,
y, i, x_x, y_y, temp, the_char, accum, count

accum = 0
count = 0
while count < 1545 do
left_edge = -420
right_edge  =  300
top_edge  =  300
bottom_edge = -300
x_step    =  7
y_step    =  15

max_iter  =  200

y0 = top_edge
while y0 > bottom_edge do
x0 = left_edge
while x0 < right_edge do
y = 0
x = 0
the_char = 32
x_x = 0
y_y = 0
i = 0
while i < max_iter and x_x + y_y <= 800 do
x_x = math.floor((x * x) / 200)
y_y = math.floor((y * y) / 200)
if x_x + y_y > 800 then
the_char = 48 + i
if i > 9 then
the_char = 64
end
else
temp = x_x - y_y + x0
if ((x < 0 and y > 0) or (x > 0 and y < 0)) then
y = (-1 * math.floor((-1 * x * y) / 100)) + y0
else
y = math.floor(x * y / 100) + y0
end
x = temp
end

i = i + 1
end
accum = accum + the_char

x0 = x0 + x_step
end

y0 = y0 - y_step
end

if count % 300 == 0 then
print(accum)
end

count = count + 1
end

print (pc)

The Python version.

Code: Select all

accum = 0
count = 0
while count < 1545:
left_edge = -420
right_edge  =  300
top_edge  =  300
bottom_edge = -300
x_step    =  7
y_step    =  15

max_iter  =  200

y0 = top_edge
while y0 > bottom_edge:
x0 = left_edge
while x0 < right_edge:
y = 0
x = 0
the_char = 32
x_x = 0
y_y = 0
i = 0
while i < max_iter and x_x + y_y <= 800:
x_x = int((x * x) / 200)
y_y = int((y * y) / 200)
if x_x + y_y > 800:
the_char = 48 + i
if i > 9:
the_char = 64

else:
temp = x_x - y_y + x0
if ((x < 0 and y > 0) or (x > 0 and y < 0)):
y = (-1 * int((-1 * x * y) / 100)) + y0
else:
y = int(x * y / 100) + y0
x = temp

i = i + 1

accum = accum + the_char

x0 = x0 + x_step

y0 = y0 - y_step

if count % 300 == 0:
print(accum)

count = count + 1

print(accum)
fxm
Posts: 9182
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: integer operations benchmark (or interpreter killer)

- As you are using Windows 64-bit, I think that INTEGER defines a 64 bits integer number.
- Can you test it by using everywhere LONG (32 bits) instead of INTEGER?
Ed Davis
Posts: 10
Joined: Jul 28, 2008 23:24

Re: integer operations benchmark (or interpreter killer)

- As you are using Windows 64-bit, I think that INTEGER defines a 64 bits integer number.
- Can you test it by using everywhere LONG (32 bits) instead of INTEGER?

1.37 seconds.

Interesting. I thought it would have been much closer to the 1.09 seconds of the INTEGER version.
D.J.Peters
Posts: 7825
Joined: May 28, 2005 3:28

Re: integer operations benchmark (or interpreter killer)

For array indices i use in my current FreeBASIC project unsiged integers.

The native CPU data type is always faster than any other data types.

A 64 bit data type on 64 bit CPU can be transferred on the data bus without masking.

Vice versa a 64 bit data type on 32 bit CPU must be masked and/or shifted and transferred in two pieces.

So i would accept ~20% performance differences in both direction

64bit data on 32bit CPU/BUS or 32bit on 64bit CPU/BUS.

But this all, is only true if the compiler, virtual machine or interpreter knows what it does. :-)

bla bla bla.

However nice collection of benchmarks.

Joshy
dkl
Posts: 3209
Joined: Jul 28, 2005 14:45
Location: Germany

Re: integer operations benchmark (or interpreter killer)

If using the 64bit version of FB, that means it's using the -gen gcc backend, and you can do fbc -O 2 to tell it to use gcc -O2, which should make it similar to the gcc benchmark with the same options.
fxm
Posts: 9182
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: integer operations benchmark (or interpreter killer)

Because I was surprised by the score of 1.09 for FreeBASIC compared to 1 for C/gcc.
I think that for C/gcc, INT is a 32 bits long.
Thus, 1.37 (for FreeBASIC with LONG) must rather be compared to 1 (for C/gcc with INT).

And what about all the other results?
caseih
Posts: 1379
Joined: Feb 26, 2007 5:32

Re: integer operations benchmark (or interpreter killer)

dkl wrote:If using the 64bit version of FB, that means it's using the -gen gcc backend, and you can do fbc -O 2 to tell it to use gcc -O2, which should make it similar to the gcc benchmark with the same options.

Yes I would think that the C code would be a nearly 1:1 translation by fbc, so the result should be statistically insignificant from the C-only version.

I'm pretty impressed by Java's results. I bet if you ran the test in a big loop, say 1000 times, the Java speed would come pretty near the C code speed, after the JIT has had a chance to analyze the execution paths.

EDIT:
On my 64-bit Linux machine here, with the -O2 optimization for both FB and C versions, the results are so close as to be identical, 1.3 s each, ran several times.

I also for kicks ran the Python version through PyPy JIT and was pretty impressed. The execution time was 14 seconds vs the interpreter which took at least 60 seconds. Not too shabby for a dynamic language.
Last edited by caseih on Apr 18, 2015 14:45, edited 2 times in total.
marcov
Posts: 2777
Joined: Jun 16, 2005 9:45
Location: Eindhoven, NL
Contact:

Re: integer operations benchmark (or interpreter killer)

caseih wrote:I'm pretty impressed by Java's results. I bet if you ran the test in a big loop, say 1000 times, the Java speed would come pretty near the C code speed, after the JIT has had a chance to analyze the execution paths.

I wonder why that is surprising? It's all value types, and then the basic problem to the (JIT-) compiler is no different, except that the JIT compiler knows the exact CPU it runs on. Java could be FASTER than gcc. Though of course it is, as a JIT compiler, probably more reluctant to spend time in compiling/optimizing.
caseih
Posts: 1379
Joined: Feb 26, 2007 5:32

Re: integer operations benchmark (or interpreter killer)

Well I never said I was surprised by Java's results (and in fact I was not at all surprised). Rather I was impressed. On my machine, when I turn the benchmark into a method and run it many times, the execution time comes within .1 s of the C version. Anyway, sort of interesting.
dodicat
Posts: 5942
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: integer operations benchmark (or interpreter killer)

If you use andalso instead of and
Then
instead of or use orelse

Then the Freebasic code is a bit faster.

Or is that perhaps cheating?
caseih
Posts: 1379
Joined: Feb 26, 2007 5:32

Re: integer operations benchmark (or interpreter killer)

Wait. Why would andalso be cheating, and how would it be faster than the C code? I realize that in normal FB code, the AND is bitwise, not logical, so it's not short-circuited. hence it will be slower. But in C, a logical and is short-circuited, so in the C version it's already as fast as it can be. ANDALSO would simply bring it to C equivalence, no?

I learn something new about FB a lot on this forum. I hadn't known about these operators and I always had found BASIC's lack of true logical operators to be a weak spot, which ANDALSO and ORELSE handily address. This brings up a question. Why would I not want to use ANDALSO or ORELSE in a logical expression? I can't think of any situation, unless I specifically need bitwise operations. Thoughts?
dkl