Sometimes I think the -gen gcc asm code emitter is totally stupid :-)

D.J.Peters · Post by **D.J.Peters** » Aug 10, 2019 2:44

function add_asm naked (a as single, b as single) as single
  asm addss xmm0,xmm1
  asm ret
end function
function  add_basic(a as single, b as single) as single
  return a+b
end function

Note: x86_64
In add_basic, A is in xmm0 and B is in xmm1, so a single "addss xmm0,xmm1" is all that is needed.
Now take a look at what the gcc assembler code emitter writes out (compiler on dope):

Code: Select all

ADD_BASIC:
  push  rbp
  mov  rbp, rsp
  sub  rsp, 20
  movss  DWORD PTR 16[rbp], xmm0 ' this is totally stupid: why are A and B stored in memory?
  movss  DWORD PTR 24[rbp], xmm1
  lea  rax, -4[rbp]
  mov  DWORD PTR [rax], 0  ' a temporary return value is set to 0 (function = 0), which is not needed at all
  movss  xmm0, DWORD PTR 16[rbp] ' now A is reloaded from memory back into xmm0
  addss  xmm0, DWORD PTR 24[rbp] ' add B from memory to xmm0 (but B was already in xmm1)
  movss  DWORD PTR -4[rbp], xmm0 ' function = a+b
  nop
  mov  eax, DWORD PTR -4[rbp]         ' copy [function] back into xmm0, although the result was already in xmm0 after the addss
  mov  DWORD PTR -20[rbp], eax
  movss  xmm0, DWORD PTR -20[rbp] 
  leave
  ret

It's cool that today's modern CPUs have SSE and 2 floats can be added in 2 clock cycles,
but I won't count how many clock cycles this stupid code needs to return the result of a+b!

Now I know why JavaScript becomes faster and faster compared to gcc code generators (reaching 70-80% of the speed of gcc-compiled code).
For example, Firefox comes with on-the-fly assembler emitters (I counted more than 10 supported CPUs).
JavaScript does not have any fast integer type; all numbers are 64-bit doubles.
b = array1[x]+array2[y]
Here the array indices x,y are doubles as well,
but Firefox and Chrome write out more clever assembler code on the fly than gcc :-)

Joshy

IchMagBier · Post by **IchMagBier** » Aug 10, 2019 4:14

With fbc -O 3:

Code: Select all

ADD_ASM:
	addss xmm0,xmm1
	ret

ADD_BASIC:
	addss xmm0, xmm1
	ret

I don't know why it still creates the ADD_BASIC function, when it's inlined anyway. But besides that, this seems to be the best solution I can think of.

D.J.Peters · Post by **D.J.Peters** » Aug 10, 2019 11:25

original BASIC:
function add_basic(a as single, b as single) as single
return a+b
end function

If some of you can't read ASM code, here is the BASIC version of what the code emitter created!

Joshy

Code: Select all

dim shared as single gxmm0,gxmm1
dim shared as long   geax
function ADD_BASIC(a as single, b as single) as single
  dim as single tmpA=any,tmpB=any,retValue=any 
  dim as single ptr PretValue=any    ' sub  rsp, 20
  tmpA = a                           ' movss  DWORD PTR 16[rbp], xmm0
  tmpB = b                           ' movss  DWORD PTR 24[rbp], xmm1
  pretValue = @retValue              ' lea    rax              , -4[rbp]
  *pretValue=0.0                     ' mov    DWORD PTR [rax]  , 0
  gxmm0     = tmpA                   ' movss  xmm0             , DWORD PTR 16[rbp]
  gxmm0    += tmpB                   ' addss  xmm0             , DWORD PTR 24[rbp]
  *PretValue=gxmm0                   ' movss  DWORD PTR -4[rbp], xmm0
  asm nop                            ' no operation 
  geax = *cptr(long ptr,PretValue)   ' mov   eax               , DWORD PTR -4[rbp]
  retValue = *cptr(single ptr,@geax) ' mov   DWORD PTR -20[rbp], eax
  function = retValue                ' movss xmm0              , DWORD PTR -20[rbp]
end function
print ADD_BASIC(1,2)
sleep

TeeEmCee · Post by **TeeEmCee** » Aug 28, 2019 10:36

Like IchMagBier wrote, you need to pass -O 1/2/3 to fbc, otherwise fbc invokes gcc with no optimisations. "gcc -O 0" produces very bad asm, yes, even significantly worse than fbc. Of course, because absolutely no optimisations are performed and every line is compiled independently. But even with just -O 1, it compiles to just "addss %xmm1,%xmm0; retq".
If you don't want to compile with optimisations for debugging reasons, try "fbc -Wc -Og" (use only optimisations which do not hinder debuggability), though that's probably very similar to -O 1.

IchMagBier wrote:I don't know why it still creates the ADD_BASIC function, when it's inlined anyway.

Because the function isn't declared private (static), so a non-inlined version needs to be put in the object file in case it's called from another module.
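A minimal sketch of the difference, assuming the file is built with fbc -gen gcc -O 3 as above. add_private is a made-up name for illustration only. Declaring a function Private gives it internal linkage, so once every call in the module has been inlined gcc should be free to drop the standalone copy, whereas the public add_basic has to stay in the object file.

```freebasic
' Public: other modules may call it, so a standalone copy must be emitted
' even if every call in this module gets inlined.
function add_basic(a as single, b as single) as single
  return a + b
end function

' Private (hypothetical name): visible only inside this module, so the
' optimiser may inline it and omit the standalone copy.
private function add_private(a as single, b as single) as single
  return a + b
end function

print add_basic(1, 2)
print add_private(1, 2)
sleep
```

Comparing the generated asm for the two functions is the easiest way to check what your gcc version actually does.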

D.J.Peters wrote:JavaScript does not have any fast integer type; all numbers are 64-bit doubles.

Excepting asm.js. Which, yeah, doesn't really count.

Firefox's JS supports 10 different ISAs? Wow. Is that the JIT compiler, or just the interpreter?

D.J.Peters · Post by **D.J.Peters** » Aug 28, 2019 14:32

TeeEmCee wrote:Is that the JIT compiler, or just the interpreter?

Both just in time compiler and interpreter :-)

Here are some articles with nice pictures that are easy to read :-)
How JavaScript is run in the browser
A crash course in assembly
Creating and working with WebAssembly modules
What makes WebAssembly fast?
...

Joshy

TeeEmCee · Post by **TeeEmCee** » Aug 29, 2019 5:32

Thanks. I'm looking forward to porting a FB program (the OHRRPGCE) to wasm! Has anyone tried yet?
