-gen clang
Out of curiosity I tried compiling with -gen clang. For 64-bit no other options are needed, but for 32-bit you must also use -asm att.
I wanted my FB package to contain only the extra files needed for clang, so I set Geany to compile for 64-bit with -gen clang and added clang.exe to the bin\win64 folder. As expected, each compile attempt popped up a dialog about missing DLLs. I added the DLLs and tried again, more popups showed up, and eventually, after adding about 112 DLLs, it compiled.
That process didn't work for 32-bit. It worked initially, but after adding a few DLLs I was only greeted with "The application was unable to start correctly (0xc000007b)". So I copied the 32-bit versions of all the DLLs I had needed for 64-bit, but still no go.
I noticed that the 32-bit version of clang required libstdc++-6.dll among others, but after adding that I still only got (0xc000007b). So I added one DLL at a time, and if it didn't compile I removed it and tried again. After about 100 trials I found that it was libgmp-10.dll.
But now I got tons of asm errors. After a good night's sleep it occurred to me to try adding -asm att to the compile command, and it WORKED.
I have not tested extensively, but in one of my tests the performance was about 25% slower and the exe about 22% larger than when compiled using gcc.
Re: -gen clang
About the -asm att option: I tried other small programs and they compile OK without it. It's a mystery why the option is required for this particular program, since it contains no inline asm statements.
Re: -gen clang
In 32-bit with -gen clang, the following gives the error "error: unsupported relocation type push offset L_.str".
It compiles OK if you use -asm att, but in my larger test program there were many asm errors not related to strings.
Code: Select all
print "Hello World"
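A minimal sketch for repeating this test without touching the IDE's compile command: the options can be put in the source with #cmdline, the way the later samples in this thread do. It assumes an fbc version that supports #cmdline and a clang.exe that fbc can find; the option set shown is only the one this post reports as working, not a tested recipe.

```freebasic
' Assumption: fbc supports #cmdline and can locate clang.exe.
' -asm att is the option reported above to make the 32-bit build assemble.
#cmdline "-gen clang -asm att"

print "Hello World"
```

Removing "-asm att" from the #cmdline string should bring back the relocation error described above on a 32-bit build, if the setup matches the poster's.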
Re: -gen clang
The following program compiles OK in 32-bit without -asm att.
The matrix multiplication is from Rosetta Code, but the demo is mine. It shows how drastically precision is lost.
Code: Select all
type Matrix
    dim as double m( any , any )
    declare constructor ( )
    declare constructor ( byval x as uinteger , byval y as uinteger )
end type

constructor Matrix ( )
end constructor

constructor Matrix ( byval x as uinteger , byval y as uinteger )
    redim this.m( x - 1 , y - 1 )
end constructor

operator * ( byref a as Matrix , byref b as Matrix ) as Matrix
    dim as Matrix ret
    dim as uinteger i, j, k
    if ubound( a.m , 2 ) = ubound( b.m , 1 ) and ubound( a.m , 1 ) = ubound( b.m , 2 ) then
        redim ret.m( ubound( a.m , 1 ) , ubound( b.m , 2 ) )
        for i = 0 to ubound( a.m , 1 )
            for j = 0 to ubound( b.m , 2 )
                for k = 0 to ubound( b.m , 1 )
                    ret.m( i , j ) += a.m( i , k ) * b.m( k , j )
                next k
            next j
        next i
    end if
    return ret
end operator

'some garbage matrices for demonstration
dim as Matrix a = Matrix(4 , 4)
a.m(0 , 0) = 1 : a.m(0 , 1) = 1 : a.m(0 , 2) = 1 : a.m(0 , 3) = 1
a.m(1 , 0) = 2 : a.m(1 , 1) = 4 : a.m(1 , 2) = 8 : a.m(1 , 3) = 16
a.m(2 , 0) = 3 : a.m(2 , 1) = 9 : a.m(2 , 2) = 27 : a.m(2 , 3) = 81
a.m(3 , 0) = 4 : a.m(3 , 1) = 16 : a.m(3 , 2) = 64 : a.m(3 , 3) = 256

dim as Matrix b = Matrix( 4 , 4 )
b.m(0 , 0) = 4 : b.m(0 , 1) = -3 : b.m(0 , 2) = 4/3 : b.m (0, 3) = -1/4
b.m(1 , 0) = -13/3 : b.m(1 , 1) = 19/4 : b.m(1 , 2) = -7/3 : b.m (1, 3) = 11/24
b.m(2 , 0) = 3/2 : b.m(2 , 1) = -2 : b.m(2 , 2) = 7/6 : b.m (2, 3) = -1/4
b.m(3 , 0) = -1/6 : b.m(3 , 1) = 1/4 : b.m(3 , 2) = -1/6 : b.m (3, 3) = 1/24

dim as Matrix c = a * a * b
print c.m(0, 0), c.m(0, 1), c.m(0, 2), c.m(0, 3)
print c.m(1, 0), c.m(1, 1), c.m(1, 2), c.m(1, 3)
print c.m(2, 0), c.m(2, 1), c.m(2, 2), c.m(2, 3)
print c.m(3, 0), c.m(3, 1), c.m(3, 2), c.m(3, 3)
?"=============================================="

for i as long=1 to 4
    c = c * c * b
    print c.m(0, 0), c.m(0, 1), c.m(0, 2), c.m(0, 3)
    print c.m(1, 0), c.m(1, 1), c.m(1, 2), c.m(1, 3)
    print c.m(2, 0), c.m(2, 1), c.m(2, 2), c.m(2, 3)
    print c.m(3, 0), c.m(3, 1), c.m(3, 2), c.m(3, 3)
    ?"=============================================="
next
Re: -gen clang
Please share your clang binaries.
Re: -gen clang
You can get it from https://u.pcloud.link/publink/show?code ... h6yXwC0uu7
The toolchain is from https://winlibs.com/ GCC 13.2.0 (with POSIX threads) + LLVM/Clang/LLD/LLDB 18.1.1 + MinGW-w64 11.0.1 (MSVCRT) - release 6 (LATEST)
Re: -gen clang
The following gives asm errors if compiled without -asm att.
The problem is with i=int(pi).
Code: Select all
dim as double pi=3.1415926535897932
dim as long i
i=int(pi)
? i
Re: -gen clang
srvaldez wrote: ↑Mar 11, 2024 15:46
you can get it from https://u.pcloud.link/publink/show?code ... h6yXwC0uu7
the toolchain is from https://winlibs.com/ GCC 13.2.0 (with POSIX threads) + LLVM/Clang/LLD/LLDB 18.1.1 + MinGW-w64 11.0.1 (MSVCRT) - release 6 (LATEST)
Thanks, I tried it. It compiles twice as fast as gcc, but half as fast as gas:
32-bit: gas - 2,47 Seconds - 387 KB
32-bit: clang - 5,23 Seconds - 391 KB
32-bit: gcc - 9,18 Seconds - 495 KB
32-bit: llvm - 236,99 Seconds
64-bit: gas64 - 3,41 Seconds - 387 KB
64-bit: clang - 3,68 Seconds - 305 KB
64-bit: gcc - 6,05 Seconds - 402 KB
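To put numbers on "twice as fast", a small sketch that only divides the timings listed above (decimal commas rewritten as points). The variable names are made up for the illustration, and the llvm entry is left out because it has no 64-bit counterpart in the list.

```freebasic
' Timings in seconds, copied from the list above.
dim as double gas32 = 2.47, clang32 = 5.23, gcc32 = 9.18
dim as double gas64 = 3.41, clang64 = 3.68, gcc64 = 6.05

' Ratio above 1 means the first backend in the pair took longer.
print using "32-bit: gcc/clang = #.##   clang/gas   = #.##"; gcc32 / clang32; clang32 / gas32
print using "64-bit: gcc/clang = #.##   clang/gas64 = #.##"; gcc64 / clang64; clang64 / gas64
```

The 32-bit and 64-bit ratios are worth reading separately, since the gap between clang and the gas backends is not the same in both modes.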
Re: -gen clang
Using 13.2.0 is asking for trouble.
Using 'GCC 11.2.0 + LLVM/Clang/LLD/LLDB 14.0.0 + MinGW-w64 9.0.0 (UCRT) - release 7' would have been a better bet as it is closer to 9.3.0.
I have tried clang on my PRNG plot program in 64-bit mode. It was luck of the draw whether they compiled or not, and some saw WinFBE not responding. A lot of the 64-bit instructions were a mess. As for -asm att for asm blocks in 32-bit mode, I reckon many members will say: “Well that isn't going to happen!”. As it stands, gcc has nothing to worry about.
On a separate issue, we could be getting 11.2.0 in the near future.
I banged on for ages to get us away from 5.2. I found 8.3 to be the best for fbc in both performance and binary size. According to internet benchmarks, each new version of gcc has seen a marginal performance improvement on balance. That may be true for C and C++ coders, but no version since 8 has done any favours for FreeBASIC. All they have done is increase the size of the resulting binaries. It would seem that the emitted C is not taking advantage of the 'improvements' of newer versions of gcc.
If we do get 11.2.0 I have a simple question: Why?
9, 10, 11, 12, 13, and 14 do nothing for FreeBASIC.
Regrettably, I didn't keep my 8.3. 8.1 performs better than 9.3, but that is very marginal.
I wonder how fbc 1.10.1/gcc 8.5 performs. I must ask my toolchain guru to knock one out for me. Why can't I do that? What I know about toolchain building can be put on the back of a postage stamp.
Re: -gen clang
I'm finding a lot of source code will not compile.
I checked out a 64-bit asm file, which failed.
Here is a typical example.
mov Dword Ptr [[rsp + 4]], eax
That has too many brackets and will not assemble.
There were 14 such instances.
Some source code compiles successfully without errors or warnings and appears to execute correctly.
So the C emitter is not at fault; that is clang 'screwing up'.
I assume that the UCRT runtime library is being used. That is Microsoft.
It may be worthwhile building using the MSVCRT runtime library.
There is one at WinLibs with clang using GCC 11.2.0.
Good luck on that because I cannot help.
Added: I reread the earlier posts and see that MSVCRT was used. In which case, try UCRT. Unfortunately, that means Windows 10 or later. No choice really, because the MSVCRT version is unreliable and clang may not work on a lot of our code. If the UCRT is just as bad, then WinLibs is a no-go area for clang.
Re: -gen clang
deltarho[1859]
From what I have seen, using clang with FB doesn't work with Intel asm; I don't know the details as to why.
I will try some inline att asm and see what happens.
<edit>
All seems to work OK with att inline asm. I tested with one file containing hundreds of lines of inline asm in att syntax.
Re: -gen clang
just a couple of samples
sample1
'I forgot when I wrote this and how it works
sample2
Code: Select all
#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"

#ifdef __FB_WIN32__
#ifdef __FB_64BIT__

type bar
    as double d
    as long l
    as longint ld
    as zstring*19 sz
end type

function foo naked () as bar
    asm
        "fldpi"
        "fstl (%rcx)"
        "movl $123,%eax"
        "movl %eax,8(%rcx)"
        "movq $123456789,%rax"
        "movq %rax,16(%rcx)"
        "movq .L0(%rip),%rax"
        "movq %rax,24(%rcx)"
        "movq .L0+8(%rip),%rax"
        "movq %rax,32(%rcx)"
        "ret"
        ".L0: .byte 'h','e','l','l','o',' ','w','o','r','l','d',0"
    end asm
end function

dim as bar y
y=foo()
? y.d, y.l, y.ld, y.sz
Sleep

#endif
#endif
Code: Select all
#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"
''(-b + sqrt(b * b - 4 * a * c)) / (2 * a);

#ifdef __FB_WIN32__
#ifdef __FB_64BIT__

function quadraticRoot naked cdecl(byval a as double, byval b as double, byval c as double) as double
    asm
        "mulsd 0f(%rip), %xmm2"
        "movapd %xmm1, %xmm3"
        "mulsd %xmm1, %xmm3"
        "mulsd %xmm0, %xmm2"
        "subsd %xmm2, %xmm3"
        "sqrtsd %xmm3, %xmm3"
        "subsd %xmm1, %xmm3"
        "mulsd 1f(%rip), %xmm3"
        "divsd %xmm0, %xmm3"
        "movapd %xmm3, %xmm0"
        "ret"
        "0:"
        ".double 4"
        "1:"
        ".double 0.5"
    end asm
end function

'same code as above, with the two constants written as pairs of .long values
function quadraticRootV2 naked cdecl(byval a as double, byval b as double, byval c as double) as double
    asm
        "mulsd 0f(%rip), %xmm2"
        "movapd %xmm1, %xmm3"
        "mulsd %xmm1, %xmm3"
        "mulsd %xmm0, %xmm2"
        "subsd %xmm2, %xmm3"
        "sqrtsd %xmm3, %xmm3"
        "subsd %xmm1, %xmm3"
        "mulsd 1f(%rip), %xmm3"
        "divsd %xmm0, %xmm3"
        "movapd %xmm3, %xmm0"
        "ret"
        "0:"
        ".long 0"
        ".long 1074790400"
        "1:"
        ".long 0"
        ".long 1071644672"
    end asm
end function

Print "quadraticRoot(4,2,-4) = ";quadraticRoot(4,2,-4)
Print "quadraticRootV2(4,2,-4) = ";quadraticRootV2(4,2,-4)
Sleep

#endif
#endif
Re: -gen clang
Here's an example that uses ".intel_syntax noprefix".
You can use Intel syntax asm when compiling with -asm att, but accessing variables by name is problematic.
Code: Select all
#cmdline "-w all -arch native -asm att -gen clang -Wc -O2"

#ifdef __FB_WIN32__
#ifdef __FB_64BIT__

function iPower naked ( Byval x As double, Byval e As Integer) as double
    Asm
        ".intel_syntax noprefix"
        " push rbx" '' preserve non-volatile rbx
        '''mov rax,[e]
        " mov rax, rdx"
        " mov rbx, rax"
        "ipower_absrax:"
        " neg rax"
        " js ipower_absrax"
        " fld1" ' z=1.0
        " fld1"
        '''mov rdx,[x]
        " movq rdx, xmm0"
        " push rdx"
        " fld qword ptr [rsp]" 'load st0 with x
        " pop rdx"
        " cmp rax,0" 'while e>0
        "ipower_while1:"
        "jle ipower_wend1"
        "ipower_while2:"
        "bt rax,0" 'test for odd/even
        "jc ipower_wend2" 'jump if odd
        'while e is even
        "sar rax,1" 'rax=rax/2
        "fmul st(0),st(0)" 'x=x*x
        "jmp ipower_while2"
        "ipower_wend2:"
        "sub rax,1"
        "fmul st(1),st(0)" 'z=z*x 'st1=st1*st0
        "jmp ipower_while1"
        "ipower_wend1:"
        "fstp st(0)" 'cleanup fpu stack
        "fstp st(1)" '" " "
        "cmp rbx,0" 'test to see if e<0
        "jge ipower_noinv" 'skip reciprocal if not less than 0
        'if e<0 take reciprocal
        "fld1"
        "fdivrp st(1),st(0)"
        "ipower_noinv:"
        '''mov rax,[result]
        ''sub rsp, 16 '' allocate buffer from stack
        '' '' maintaining 16-byte alignment
        "sub rsp, 8" '' allocate buffer from stack
        '''fstp qword ptr [rax]" 'store z (st0)
        "fstp qword ptr [rsp]" '' store z to buffer
        "movq xmm0, [rsp]" '' store buffer in return register
        "add rsp, 8" '' free buffer
        "fstp st(0)" 'clear fpu stack
        "fstp st(0)" 'clear fpu stack
        "pop rbx" '' recover non-volatile rbx
        "ret"
        ".att_syntax prefix"
    End Asm
End function

dim as double x, y
x=2
y = iPower(x,3)
print y
y = iPower(x,-3)
print y
print "press return to end"
sleep

#endif
#endif
Re: -gen clang
You can access named variables in asm att, but not in asm att with Intel syntax.
Here's the quadraticRoot function using named variables. The variable names get upper-cased and $1 is appended on the right. If you are new to this and want to tinker with inline att asm, the following example may serve as a template.
Code: Select all
''(-b + sqrt(b * b - 4 * a * c)) / (2 * a);
function quadraticRoot cdecl(byval a as double, byval b as double, byval c as double) as double
    dim as double four=4, result
    asm
        "movsd %[A$1], %%xmm1 \n" _
        "movsd %[C$1], %%xmm2 \n" _
        "mulsd %[FOUR$1], %%xmm2 \n" _
        "mulsd %[A$1], %%xmm2 \n" _
        "movsd %[B$1], %%xmm0 \n" _
        "mulsd %[B$1], %%xmm0 \n" _
        "subsd %%xmm2, %%xmm0 \n" _
        "sqrtsd %%xmm0, %%xmm0 \n" _
        "subsd %[B$1], %%xmm0 \n" _
        "addsd %%xmm1, %%xmm1 \n" _
        "divsd %%xmm1, %%xmm0 \n" _
        "movsd %%xmm0, %[RESULT$1] \n" _
        : _
        :[a]"m"(a),[b]"m"(b),[c]"m"(c),[four]"m"(four),[result]"m"(result) _ 'you must declare the variables here
        :"xmm1","xmm2" 'it's good to list the registers used so that the compiler can avoid conflicts
    end asm
    return result
end function
Re: -gen clang
Here is the latest.
I now have clang with gcc 13.2, gcc 11.2, gcc 9.3, and gcc 8.3.
Where did I get 9.3 and 8.3 from? I cannot tell you — don't go there.
What I have learnt is:
The moral here is to use -asm att when using clang in 32-bit mode or 64-bit mode. Any in-line assembly must be in -asm att or a BASIC replacement.
Intel asm syntax should be nowhere in sight in your source code.
Programs that previously failed to compile now compile and execute with the above protocol.
I had PCG32II working but MsWsII failing. MsWsII is now working, as are others which failed.
Fortunately, inline assembly does not benefit FreeBASIC as it does with PowerBASIC. Why? With gcc or gcc/clang FreeBASIC is faster than PowerBASIC.
In 32-bit mode PCG32II and MsWsII are faster with gcc. In 64-bit mode PCG32II and MsWsII are faster with clang.
I replaced an asm procedure in MsWsII with a BASIC equivalent. It was for seeding and didn't impact performance. The BASIC equivalent works, but it is not my favoured way of seeding. Someone very kindly rewrote the asm procedure in att syntax and that worked. I am now back to my favoured way of seeding MsWsII.
So, in-line assembly must be in -asm att or a BASIC replacement — no intel asm syntax in sight.
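A minimal sketch of the "BASIC replacement" route, for anyone weighing it against rewriting asm blocks. RotL32 is a hypothetical helper invented for this illustration (it is not taken from MsWsII or PCG32II). The __FB_ASM__ check uses the intrinsic define that fbc documents as reporting "intel" or "att", and is shown only as a way to see which syntax asm blocks are expected in.

```freebasic
' Hypothetical helper: rotate a 32-bit value left by n bits.
' Written in plain BASIC, so it does not depend on the backend or on -asm.
function RotL32( byval x as ulong, byval n as ulong ) as ulong
    n and= 31                      ' keep the shift count in 0..31
    if n = 0 then return x         ' avoid a shift by 32 bits
    return ( x shl n ) or ( x shr ( 32 - n ) )
end function

' Report which inline asm syntax this compilation expects.
#if __FB_ASM__ = "att"
    print "asm blocks are expected in att syntax"
#else
    print "asm blocks are expected in Intel syntax"
#endif

print hex( RotL32( &h80000001, 1 ) )
```

The same #if could wrap an att version and an Intel version of an asm block, so one source file builds with both gcc and gcc/clang setups.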
I am now thinking of writing a clang version of my SetCompilerSwitchesII. I think I shall call it SetCompilerSwitchesIII.
WinFBE will then have II or III. I will then be able to compile in gcc or gcc/clang. I will go with gcc 9.3.0 as that is the official gcc at the moment.
I wrote earlier: If we do get 11.2.0 I have a simple question: Why?
That may be because at WinLibs we have gcc/clang toolchains.
Since gcc 9.3.0/clang works we don't have to use gcc 11.2 which is slower than gcc 9.3.0 - it just produces larger binaries.
A lot more testing with clang is required. If we stick to the above protocol, we should no longer have any issues with compiling.
Whether gcc or gcc/clang is the faster will require testing both. I very much doubt that second guessing will help at all.
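Testing both can be as simple as timing the same source built twice. The following is only a sketch: Workload is a made-up stand-in for whatever routine is being compared, and the iteration count is arbitrary.

```freebasic
' Hypothetical workload: a sum of square roots, only here to give the timer
' something to measure. Replace it with the routine under test.
function Workload( byval n as long ) as double
    dim as double s = 0
    for i as long = 1 to n
        s += sqr( i )
    next
    return s
end function

dim as double t0 = timer                 ' seconds, as a double
dim as double r = Workload( 50000000 )
dim as double elapsed = timer - t0

print "result  : "; r                    ' printed so the loop is not optimised away
print "elapsed : "; elapsed; " s"
```

Build it once with -gen gcc and once with -gen clang -asm att, keeping the same -Wc optimisation level, and run each exe several times before comparing.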