-gen clang

srvaldez · Post by **srvaldez** » Mar 11, 2024 13:28

Out of curiosity I tried compiling with -gen clang. For 64-bit no options other than -gen clang are needed, but for 32-bit you must also use -asm att.
I wanted my FB package to contain only the extra files that clang needs, so I set Geany to compile for 64-bit with -gen clang. After adding clang.exe to the bin\win64 folder I tried to compile and, as expected, a dialog popped up telling me about missing DLLs. I added those DLLs and tried again, more popups showed up, and eventually, after adding about 112 DLLs, it compiled.
That process didn't work for 32-bit. It worked initially, but after adding a few DLLs I was only greeted with "The application was unable to start correctly (0xc000007b)". So I copied the 32-bit versions of all the DLLs that I had needed for 64-bit, but still no go.
I noticed that the 32-bit version of clang required libstdc++-6.dll among others, but after adding that I still only got 0xc000007b. So I added one DLL at a time, and if it didn't compile I removed it and tried again. After about 100 trials I found that the culprit was libgmp-10.dll.
But now I got tons of asm errors. After a good night's sleep it occurred to me to try adding -asm att to the compile command, and it WORKED.
I have not tested extensively, but in one of my tests the performance was about 25% slower and the exe about 22% larger than when compiled with gcc.

srvaldez · Post by **srvaldez** » Mar 11, 2024 14:27

About the -asm att option: I tried other small programs and they compile OK without it. It's a mystery why the option is required for this particular program, since it uses no inline asm statements.

srvaldez · Post by **srvaldez** » Mar 11, 2024 14:47

in 32-bit with -gen clang the following fails with: error: unsupported relocation type push offset L_.str

Code:

print "Hello World"

compiles ok if you use -asm att
but in my larger test program there were many asm errors not related to strings
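
A minimal sketch of the workaround described above, with the options placed in the source file rather than on the command line. It assumes an fbc version recent enough to accept the #cmdline directive together with -gen clang, and a clang.exe that fbc can find; the option string is only an illustration for the 32-bit case.

```freebasic
'' Sketch only: ask fbc for the clang backend and AT&T assembler output.
'' Assumes fbc supports #cmdline and that clang.exe is in fbc's bin folder.
#cmdline "-gen clang -asm att"

print "Hello World"
```

The same two options can be passed on the fbc command line instead, which keeps the source file independent of the backend.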

srvaldez · Post by **srvaldez** » Mar 11, 2024 14:59

the following program compiles ok in 32-bit without -asm att
the matrix multiplication is from Rosetta Code, but the demo is mine; it shows how drastically precision is lost

Code:

type Matrix
    dim as double m( any , any )
    declare constructor ( )
    declare constructor ( byval x as uinteger , byval y as uinteger )
end type

constructor Matrix ( )
end constructor

constructor Matrix ( byval x as uinteger , byval y as uinteger )
    redim this.m( x - 1 , y - 1 )
end constructor

operator * ( byref a as Matrix , byref b as Matrix ) as Matrix
    dim as Matrix ret
    dim as uinteger i, j, k
    if ubound( a.m , 2 ) = ubound( b.m , 1 ) and ubound( a.m , 1 ) = ubound( b.m , 2 ) then
        redim ret.m( ubound( a.m , 1 ) , ubound( b.m , 2 ) )
        for i = 0 to ubound( a.m , 1 )
            for j = 0 to ubound( b.m , 2 )
                for k = 0 to ubound( b.m , 1 )
                    ret.m( i , j ) += a.m( i , k ) * b.m( k , j )
                next k
            next j
        next i
    end if
    return ret
end operator

'some garbage matrices for demonstration
dim as Matrix a = Matrix(4 , 4)
a.m(0 , 0) = 1 : a.m(0 , 1) = 1 : a.m(0 , 2) = 1 : a.m(0 , 3) = 1
a.m(1 , 0) = 2 : a.m(1 , 1) = 4 : a.m(1 , 2) = 8 : a.m(1 , 3) = 16
a.m(2 , 0) = 3 : a.m(2 , 1) = 9 : a.m(2 , 2) = 27 : a.m(2 , 3) = 81
a.m(3 , 0) = 4 : a.m(3 , 1) = 16 : a.m(3 , 2) = 64 : a.m(3 , 3) = 256
dim as Matrix b = Matrix( 4 , 4 )
b.m(0 , 0) = 4 : b.m(0 , 1) = -3 : b.m(0 , 2) = 4/3 : b.m (0, 3) = -1/4
b.m(1 , 0) = -13/3 : b.m(1 , 1) = 19/4 : b.m(1 , 2) = -7/3 : b.m (1, 3) = 11/24
b.m(2 , 0) = 3/2 : b.m(2 , 1) = -2 : b.m(2 , 2) = 7/6 : b.m (2, 3) = -1/4
b.m(3 , 0) = -1/6 : b.m(3 , 1) = 1/4 : b.m(3 , 2) = -1/6 : b.m (3, 3) = 1/24
dim as Matrix c = a * a * b
print c.m(0, 0), c.m(0, 1), c.m(0, 2), c.m(0, 3)
print c.m(1, 0), c.m(1, 1), c.m(1, 2), c.m(1, 3)
print c.m(2, 0), c.m(2, 1), c.m(2, 2), c.m(2, 3)
print c.m(3, 0), c.m(3, 1), c.m(3, 2), c.m(3, 3)
?"=============================================="
for i as long=1 to 4
	c = c * c * b
	print c.m(0, 0), c.m(0, 1), c.m(0, 2), c.m(0, 3)
	print c.m(1, 0), c.m(1, 1), c.m(1, 2), c.m(1, 3)
	print c.m(2, 0), c.m(2, 1), c.m(2, 2), c.m(2, 3)
	print c.m(3, 0), c.m(3, 1), c.m(3, 2), c.m(3, 3)
	?"=============================================="
next

Xusinboy Bekchanov · Post by **Xusinboy Bekchanov** » Mar 11, 2024 15:35

Please share your clang binaries.

srvaldez · Post by **srvaldez** » Mar 11, 2024 15:46

you can get it from https://u.pcloud.link/publink/show?code ... h6yXwC0uu7
the toolchain is from https://winlibs.com/ GCC 13.2.0 (with POSIX threads) + LLVM/Clang/LLD/LLDB 18.1.1 + MinGW-w64 11.0.1 (MSVCRT) - release 6 (LATEST)

srvaldez · Post by **srvaldez** » Mar 11, 2024 15:49

the following gives asm errors if compiled without -asm att

Code:

dim as double pi=3.1415926535897932
dim as long i

i=int(pi)
? i

the problem is with i=int(pi)

Xusinboy Bekchanov · Post by **Xusinboy Bekchanov** » Mar 11, 2024 16:25

srvaldez wrote: ↑Mar 11, 2024 15:46 you can get it from https://u.pcloud.link/publink/show?code ... h6yXwC0uu7
the toolchain is from https://winlibs.com/ GCC 13.2.0 (with POSIX threads) + LLVM/Clang/LLD/LLDB 18.1.1 + MinGW-w64 11.0.1 (MSVCRT) - release 6 (LATEST)

Thanks, I tried it. It compiles about twice as fast as gcc, but in 32-bit about twice as slow as gas:
32-bit: gas - 2,47 Seconds - 387 KB
32-bit: clang - 5,23 Seconds - 391 KB
32-bit: gcc - 9,18 Seconds - 495 KB
32-bit: llvm - 236,99 Seconds
64-bit: gas64 - 3,41 Seconds - 387 KB
64-bit: clang - 3,68 Seconds - 305 KB
64-bit: gcc - 6,05 Seconds - 402 KB

deltarho[1859] · Post by **deltarho[1859]** » Mar 12, 2024 6:53

Using 13.2.0 is asking for trouble.

Using 'GCC 11.2.0 + LLVM/Clang/LLD/LLDB 14.0.0 + MinGW-w64 9.0.0 (UCRT) - release 7' would have been a better bet as it is closer to 9.3.0.

I have tried clang on my PRNG plot program in 64-bit mode. It was luck of the draw whether they compiled or not, and some saw WinFBE not responding. A lot of the 64-bit instructions were a mess. As for -asm att for asm blocks in 32-bit mode, I reckon many members will say: “Well that isn't going to happen!”. As it stands, gcc has nothing to worry about.

On a separate issue, we could be getting 11.2.0 in the near future.

I banged on for ages to get us away from 5.2. I found 8.3 to be the best for fbc, both for performance and for binary size. According to internet benchmarks, each new version of gcc has seen a marginal performance improvement on balance. That may be true for C and C++ coders, but no version since 8 has done FreeBASIC any favours. All they have done is increase the size of the resulting binaries. It would seem that the emitted C is not taking advantage of the 'improvements' of newer versions of gcc.

If we do get 11.2.0 I have a simple question: Why?

9, 10, 11, 12, 13, and 14 do nothing for FreeBASIC.

Regrettably, I didn't keep my 8.3. 8.1 performs better than 9.3, but that is very marginal.

I wonder how fbc 1.10.1/gcc 8.5 performs. I must ask my toolchain guru to knock one out for me. Why can't I do that? What I know about toolchain building can be put on the back of a postage stamp.

deltarho[1859] · Post by **deltarho[1859]** » Mar 13, 2024 5:25

I'm finding a lot of source code will not compile.

I checked out a 64-bit asm file, which failed.

Here is a typical example.

mov Dword Ptr [[rsp + 4]], eax

That has too many brackets and will not assemble.

There were 14 such instances.

Some source code compiles successfully without errors or warnings and appears to execute correctly.

So the C emitter is not at fault — that is clang 'screwing up'.

I assume that the UCRT runtime library is being used. That is Microsoft.

It may be worthwhile building using the MSVCRT runtime library.

There is one at WinLibs with clang using GCC 11.2.0.

Good luck on that because I cannot help.

Added: I reread the earlier posts and see that MSVCRT was used. In which case, try UCRT. Unfortunately, that means Windows 10 or later. No choice really because the MSVCRT version is unreliable and clang may not work on a lot of our code. If the UCRT is just as bad, then WinLibs is a no-go area for clang.

srvaldez · Post by **srvaldez** » Mar 13, 2024 8:58

deltarho[1859]
from what I have seen, using clang with FB doesn't work with intel asm, don't know the details as to why.
I will try some inline att asm and see what happens.
<edit>
all seems to work ok with att inline asm, I tested with one file containing hundreds of lines of inline asm in att syntax

srvaldez · Post by **srvaldez** » Mar 13, 2024 10:25

just a couple of samples
sample1
'I forgot when I wrote this and how it works

Code:

#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"

#ifdef __FB_WIN32__
   #ifdef __FB_64BIT__
   
    type bar
       as double d
       as long l
       as longint ld
       as zstring*19 sz
    end type

    function  foo naked () as bar
       asm
          "fldpi"
          "fstl   (%rcx)"
          "movl   $123,%eax"
          "movl   %eax,8(%rcx)"
          "movq   $123456789,%rax"
          "movq   %rax,16(%rcx)"
          "movq   .L0(%rip),%rax"
          "movq   %rax,24(%rcx)"
          "movq   .L0+8(%rip),%rax"
          "movq   %rax,32(%rcx)"
          "ret"
          ".L0: .byte 'h','e','l','l','o',' ','w','o','r','l','d',0"
       end asm
    end function

    dim as bar y

    y=foo()
    ? y.d, y.l, y.ld, y.sz
    Sleep
   #endif
#endif

sample2

Code:

#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"

''(-b + sqrt(b * b - 4 * a * c)) / (2 * a);
#ifdef __FB_WIN32__
   #ifdef __FB_64BIT__

    function quadraticRoot naked cdecl(byval a as double, byval b as double, byval c as double) as double
        asm
            "mulsd 0f(%rip), %xmm2"
            "movapd %xmm1, %xmm3"
            "mulsd %xmm1, %xmm3"
            "mulsd %xmm0, %xmm2"
            "subsd %xmm2, %xmm3"
            "sqrtsd %xmm3, %xmm3"
            "subsd %xmm1, %xmm3"
            "mulsd 1f(%rip), %xmm3"
            "divsd %xmm0, %xmm3"
            "movapd %xmm3, %xmm0"
            "ret"
            "0:"
            ".double 4"
            "1:"
            ".double 0.5"
        end asm
    end function

    function quadraticRootV2 naked cdecl(byval a as double, byval b as double, byval c as double) as double
        asm
            "mulsd 0f(%rip), %xmm2"
            "movapd %xmm1, %xmm3"
            "mulsd %xmm1, %xmm3"
            "mulsd %xmm0, %xmm2"
            "subsd %xmm2, %xmm3"
            "sqrtsd %xmm3, %xmm3"
            "subsd %xmm1, %xmm3"
            "mulsd 1f(%rip), %xmm3"
            "divsd %xmm0, %xmm3"
            "movapd %xmm3, %xmm0"
            "ret"
            "0:"
            ".long 0"
            ".long 1074790400"
            "1:"
            ".long 0"
            ".long 1071644672"
        end asm
    end function

    Print "quadraticRoot(4,2,-4) = ";quadraticRoot(4,2,-4)
    Print "quadraticRootV2(4,2,-4) = ";quadraticRootV2(4,2,-4)
    Sleep
   #endif
#endif

srvaldez · Post by **srvaldez** » Mar 13, 2024 10:39

here's an example that uses ".intel_syntax noprefix"
you can use Intel syntax asm when compiling with -asm att
but accessing variables by name is problematic

Code:

#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"

#ifdef __FB_WIN32__
   #ifdef __FB_64BIT__
       
    function iPower naked ( Byval x As double, Byval e As Integer) as double
        Asm
        ".intel_syntax noprefix"
        "    push    rbx"         '' preserve non-volatile rbx
            '''mov rax,[e]
        "    mov rax, rdx"
        "    mov rbx, rax"
        "ipower_absrax:"
        "    neg rax"
        "    js ipower_absrax"
        "    fld1" '  z=1.0
        "    fld1"
            '''mov rdx,[x]
        "    movq rdx, xmm0"
        "    push rdx"
        "    fld qword ptr [rsp]" 'load st0 with x
        "    pop rdx"
        "    cmp rax,0"           'while e>0
        "ipower_while1:"
            "jle ipower_wend1"
        "ipower_while2:"
            "bt rax,0"            'test for odd/even
            "jc ipower_wend2"     'jump if odd
                                'while e is even
            "sar rax,1"           'rax=rax/2
            "fmul st(0),st(0)"    'x=x*x
            "jmp ipower_while2"
        "ipower_wend2:"
            "sub rax,1"
            "fmul st(1),st(0)"    'z=z*x 'st1=st1*st0
            "jmp ipower_while1" 
        "ipower_wend1:"
            "fstp st(0)"          'cleanup fpu stack
            "fstp st(1)"          '"       "   "
            "cmp rbx,0"           'test to see if e<0
            "jge ipower_noinv"    'skip reciprocal if not less than 0
                                'if e<0 take reciprocal
            "fld1"
            "fdivrp st(1),st(0)"
        "ipower_noinv:"
            '''mov rax,[result]
            ''sub     rsp, 16      '' allocate buffer from stack
            ''                     '' maintaining 16-byte alignment   
            "sub     rsp, 8"      '' allocate buffer from stack
            '''fstp qword ptr [rax]" 'store z (st0)
            "fstp qword ptr [rsp]" '' store z to buffer
            "movq    xmm0, [rsp]"  '' store buffer in return register
            "add     rsp, 8"      '' free buffer
            "fstp st(0)"          'clear fpu stack
            "fstp st(0)"          'clear fpu stack
            "pop     rbx"         '' recover non-volatile rbx 
            "ret"
        ".att_syntax prefix"
        End Asm
    End function

    dim as double x, y
    x=2
    y = iPower(x,3)
    print y
    y = iPower(x,-3)
    print y
    print "press return to end"
    sleep
    
   #endif
#endif

srvaldez · Post by **srvaldez** » Mar 13, 2024 18:06

you can access named variables in asm att but not in asm att with intel syntax
here's the quadraticRoot function using named variables; the variable names get upper-cased and $1 is appended to them. If you are new to this and want to tinker with inline att asm, the following example may serve as a template

Code:

''(-b + sqrt(b * b - 4 * a * c)) / (2 * a);
    function quadraticRoot cdecl(byval a as double, byval b as double, byval c as double) as double
        dim as double four=4, result
        asm
            "movsd %[A$1], %%xmm1 \n" _
            "movsd %[C$1], %%xmm2 \n" _
            "mulsd %[FOUR$1], %%xmm2 \n" _
            "mulsd %[A$1], %%xmm2 \n" _
            "movsd %[B$1], %%xmm0 \n" _ 
            "mulsd %[B$1], %%xmm0 \n" _
            "subsd %%xmm2, %%xmm0 \n" _
            "sqrtsd %%xmm0, %%xmm0 \n" _
            "subsd %[B$1], %%xmm0 \n" _
            "addsd %%xmm1, %%xmm1 \n" _
            "divsd %%xmm1, %%xmm0 \n" _
            "movsd %%xmm0, %[RESULT$1] \n" _
            : _
            :[a]"m"(a),[b]"m"(b),[c]"m"(c),[four]"m"(four),[result]"m"(result) _ 'you must declare the variables here
            :"xmm1","xmm2" 'it's good to list the registers used so that the compiler can avoid conflicts
        end asm
        return result
    end function

deltarho[1859] · Post by **deltarho[1859]** » Mar 14, 2024 23:52

Here is the latest.

I now have clang with gcc 13.2, gcc 11.2, gcc 9.3, and gcc 8.3.

Where did I get 9.3 and 8.3 from? I cannot tell you — don't go there.

What I have learnt is:

The moral here is to use -asm att when using clang, in 32-bit mode or 64-bit mode. Any in-line assembly must be written in AT&T syntax or replaced with BASIC code.

Intel asm syntax should be nowhere in sight in your source code.
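
A small sketch of that protocol, assuming the intrinsic define __FB_ASM__ (which reports "intel" or "att" according to the -asm setting). The variable and the one-instruction asm block are hypothetical, chosen only to show the shape: AT&T inline asm when -asm att is active, and a BASIC replacement otherwise.

```freebasic
'' Sketch only: one AT&T asm path and one BASIC replacement, no Intel syntax.
dim as long a = 41

#if __FB_ASM__ = "att"
    '' built with -asm att: gcc-style inline asm with an explicit operand list
    asm
        "incl %0\n" : "+m" (a) : :
    end asm
#else
    '' any other build: plain BASIC replacement
    a += 1
#endif

print a
```

Either path should leave the same value in a, so the source can stay the same across backends.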

Programs that previously failed to compile now compile and execute when I follow the above protocol.

I had PCG32II working but MsWsII failing. MsWsII is now working and others which failed.

Fortunately, inline assembly does not benefit FreeBASIC as it does with PowerBASIC. Why? With gcc or gcc/clang FreeBASIC is faster than PowerBASIC.

In 32-bit mode PCG32II and MsWsII are faster with gcc. In 64-bit mode PCG32II and MsWsII are faster with clang.

I replaced an asm procedure in MsWsII with a BASIC equivalent. It was for seeding and didn't impact performance. The BASIC equivalent works, but it is not my favoured way of seeding. Someone very kindly rewrote the asm procedure in att syntax and that worked. I am now back to my favoured way of seeding MsWsII.

So, in-line assembly must be in AT&T syntax (-asm att) or replaced with BASIC; no Intel asm syntax in sight.

I am now thinking of writing a clang version of my SetCompilerSwitchesII. I think I shall call it SetCompilerSwitchesIII.

WinFBE will then have II or III. I will then be able to compile in gcc or gcc/clang. I will go with gcc 9.3.0 as that is the official gcc at the moment.

I wrote earlier: If we do get 11.2.0 I have a simple question: Why?

That may be because at WinLibs we have gcc/clang toolchains.

Since gcc 9.3.0/clang works we don't have to use gcc 11.2 which is slower than gcc 9.3.0 - it just produces larger binaries.

A lot more testing with clang is required. If we stick to the above protocol, we should no longer have any issues with compiling.

Whether gcc or gcc/clang is the faster will require testing both. I very much doubt that second guessing will help at all.
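
One way to run that comparison is a small timing harness built once per backend. The sketch below is hypothetical: the loop body and iteration count are arbitrary stand-ins for real work, and it assumes the intrinsic defines __FB_BACKEND__ and __FB_ASM__ are available to label the output.

```freebasic
'' Hypothetical micro-benchmark: build once with -gen gcc and once with
'' -gen clang -asm att, then compare the printed times.
const ITERATIONS as long = 100000000

dim as ulongint checksum = 0
dim as double t0 = timer

for i as long = 1 to ITERATIONS
    '' arbitrary integer work; the checksum is printed so the loop is not dead code
    checksum += (culngint(i) * 2654435761ull) and &hFFFFFFFFull
next

dim as double elapsed = timer - t0

print "backend: "; __FB_BACKEND__; "  asm: "; __FB_ASM__
print "checksum: "; checksum
print "elapsed (s): "; elapsed
```

A loop this simple may be optimized very differently by each compiler, so timings from a real program such as the PRNG tests mentioned above are a better guide.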
