Code:
function add_asm naked (a as single, b as single) as single
  ' on x86-64, a arrives in xmm0 and b in xmm1; the result is returned in xmm0
  asm
    addss xmm0, xmm1
    ret
  end asm
end function
function add_basic(a as single, b as single) as single
return a+b
end function
In add_basic, a is already in xmm0 and b is in xmm1, so a single "addss xmm0,xmm1" is all that is needed.
Now take a look at what the gcc assembler code emitter writes out (compiler on dope):
Code:
ADD_BASIC:
push rbp
mov rbp, rsp
sub rsp, 20
movss DWORD PTR 16[rbp], xmm0 ' totally stupid: a and b are stored to memory here, why?
movss DWORD PTR 24[rbp], xmm1
lea rax, -4[rbp]
mov DWORD PTR [rax], 0 ' a temporary return value that is not needed at all is set to 0 (function = 0)
movss xmm0, DWORD PTR 16[rbp] ' now a is reloaded from memory back into xmm0
addss xmm0, DWORD PTR 24[rbp] ' add b from memory to xmm0 (but b was already in xmm1)
movss DWORD PTR -4[rbp], xmm0 ' function = a+b
nop
mov eax, DWORD PTR -4[rbp] ' copy [function] back into xmm0, although the sum was already in xmm0 right after the addss
mov DWORD PTR -20[rbp], eax
movss xmm0, DWORD PTR -20[rbp]
leave
ret
I won't even count how many clock cycles this stupid code needs to return the result of a+b!
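For anyone who wants to reproduce the comparison outside FreeBASIC, here is a minimal C sketch of the same function. The name add_basic is reused only for illustration, and this is an assumption about what an equivalent C function looks like, not the C source that fbc actually hands to gcc. The stack stores and reloads in the listing above are what gcc typically emits when no optimization level is requested, so compiling this file with something like "gcc -O2 -S -masm=intel" is one way to check whether only the single addss remains. Which flags the original build used is not stated here.

```c
/* Hypothetical C counterpart of the FreeBASIC add_basic function.
   Illustration only: not the code emitted by the FreeBASIC gcc backend. */
float add_basic(float a, float b)
{
    /* On x86-64 both arguments arrive in xmm0 and xmm1,
       and the float result is returned in xmm0. */
    return a + b;
}
```

Comparing the output of -O0 and -O2 for this one function shows how much of the extra code depends on the optimization level rather than on the source.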
Now I know why JavaScript keeps getting faster compared to gcc's code generator (it reaches roughly 70-80% of the speed of gcc-compiled code).
For example, Firefox comes with on-the-fly assembler emitters (I counted more than 10 supported CPUs).
JavaScript does not have any fast integer type; all numbers are 64-bit doubles.
b = array1[x]+array2[y]
Here the array indices x and y are doubles too,
but Firefox and Chrome emit smarter assembler code on the fly than gcc :-)
Joshy