How to use SSE2 in freebasic ?
Re: How to use SSE2 in freebasic ?
It's an i9-9900K CPU @ 3.60GHz, 3600 MHz
Total Cores 8.
Total Threads 16.
Max Turbo Frequency 5.00 GHz.
Intel® Turbo Boost Technology 2.0 Frequency 5.00 GHz.
Processor Base Frequency 3.60 GHz.
Cache 16 MB Intel® Smart Cache.
Bus Speed 8 GT/s.
TDP 95 W.
I maxed out the RAM because I frequently run VMs.
-
- Posts: 8586
- Joined: May 28, 2005 3:28
- Contact:
Re: How to use SSE2 in freebasic ?
Cool, 8 cores with 16 threads! Two of these PCs (32 threads) would be ideal for the Mitsuba 3D renderer I use and love so much :-)
To get 24 hardware threads for the renderer, I run Mitsuba render nodes on my local network with 6 quad-core PCs :lol:
Joshy
Re: How to use SSE2 in freebasic ?
CPU: i7-3770 8MB
[08:43:13.11]time v1: 2.66459666513823
[08:43:13.11]time v2: 2.67936478230213
[08:43:13.11]time v3: 2.68704450507995
[08:43:13.11]time v4: 2.680799485646958
I cannot compile v5, so there is no data for v5.
Last edited by quickbbbb on Dec 09, 2021 1:01, edited 1 time in total.
Re: How to use SSE2 in freebasic ?
srvaldez wrote: It's an i9-9900K CPU @ 3.60GHz, 3600 MHz
Total Cores 8.
Total Threads 16.
Max Turbo Frequency 5.00 GHz.
Intel® Turbo Boost Technology 2.0 Frequency 5.00 GHz.
Processor Base Frequency 3.60 GHz.
Cache 16 MB Intel® Smart Cache.
Bus Speed 8 GT/s.
TDP 95 W.
I maxed out the RAM because I frequently run VMs.
When you run D.J.Peters' example, does your CPU run at 5.00 GHz?
2.68 seconds (my test) / 0.812 seconds (your test) = 3.3 times
My CPU runs at 3.7 GHz.
If your CPU runs at 5.00 GHz: 5.0 GHz / 3.7 GHz = 1.35135 times
3.3 / 1.35135 = 2.44 times (treating the i7-3770 and the i9-9900K as if they ran at the same frequency)
So at the same frequency, the i9-9900K would be about 2.44 times faster than the i7-3770.
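The arithmetic above can be reproduced with a few lines of FreeBASIC. This is only a sketch: the variable names are made up for illustration, and the 3.7 GHz and 5.0 GHz clock speeds are the assumed values from the post, not measurements. It also assumes both timings come from binaries built with the same compiler options, which has not been confirmed at this point in the thread.

```freebasic
' Clock-normalized speed comparison using the numbers quoted in the post.
' All names are illustrative; the clock speeds are assumptions, not measurements.
dim as double t_i7 = 2.68     ' seconds, i7-3770 run
dim as double t_i9 = 0.812    ' seconds, i9-9900K run
dim as double f_i7 = 3.7      ' GHz, assumed i7-3770 clock during the run
dim as double f_i9 = 5.0      ' GHz, assumed i9-9900K clock during the run

dim as double raw_speedup = t_i7 / t_i9               ' about 3.3
dim as double clock_ratio = f_i9 / f_i7               ' about 1.35
dim as double per_clock   = raw_speedup / clock_ratio ' about 2.44

print "raw speedup:       " & raw_speedup
print "clock ratio:       " & clock_ratio
print "per-clock speedup: " & per_clock
```

If the two binaries were built with different options, the per-clock figure says more about the compiler settings than about the CPUs.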
Re: How to use SSE2 in freebasic ?
quickbbbb wrote: When you run D.J.Peters' example, does your CPU run at 5.00 GHz?
Yes, the CPU throttles up and down all the time, usually between 800 and 5000 MHz,
but I am suspicious about some of the reported speeds of the tests. What command-line options are you using?
They can have a huge impact on performance. My compile command is fbc64 -t 4096 -w all -arch native -gen gcc -Wc -O2,-fno-builtin -v "%f"
where "%f" is the filename.
Re: How to use SSE2 in freebasic ?
quickbbbb wrote: I cannot compile v5, so there is no data for v5.
Do you need SSE on 32-bit Windows?
Joshy
Re: How to use SSE2 in freebasic ?
@quickbbbb try again, I have added 32-bit SSE as well.
Joshy
Re: How to use SSE2 in freebasic ?
D.J.Peters wrote: @quickbbbb try again, I have added 32-bit SSE as well.
Joshy
Sorry!
I was using the VisualFreeBasic IDE, and it could not compile v5 (an error occurred),
so I downloaded FBIde and used it to compile instead.
Now it compiles successfully! (FBIde)
The test results are as follows:
i7-3770 + win7_64bit
================================= first run
time v1: 1.920214737634524
time v2: 2.544125353175332
time v3: 2.45014096495288
time v4: 2.449346490073367
time v5: 0.8082955011923332
================================= second run
time v1: 1.967784827691503
time v2: 2.472062978427857
time v3: 2.495674760328257
time v4: 2.464804641276714
time v5: 0.8106885851593688
Last edited by quickbbbb on Dec 09, 2021 4:03, edited 3 times in total.
Re: How to use SSE2 in freebasic ?
srvaldez wrote: they can have a huge impact on performance, my compile command is fbc64 -t 4096 -w all -arch native -gen gcc -Wc -O2,-fno-builtin -v "%f"
where "%f" is the filename
I still do not know how to use that command in FBIde; I will study it.
D.J.Peters wrote: @quickbbbb try again, I have added 32-bit SSE as well.
Joshy
Thank you very much.
I will study how to compile from the command line.
=============
movsd xmm1, QWORD PTR [rbx+rax*8]
mulsd xmm1, QWORD PTR [rdx+rax*8]
addsd xmm0, xmm1
WOW!
So now I can build three functions: (1) ar1 * ar2, (2) ar1 + ar2, (3) ar1 - ar2
mulsd xmm1, QWORD PTR [rdx+rax*8] -> multiply (*)
addsd xmm1, QWORD PTR [rdx+rax*8] -> add (+)
subsd xmm1, QWORD PTR [rdx+rax*8] -> subtract (-)
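A sketch of what those three functions could look like in plain FreeBASIC rather than hand-written assembler. The names vec_mul, vec_add and vec_sub are hypothetical, and the sketch assumes that dst, a and b each point to at least n doubles. Whether the compiler really emits mulsd/addsd/subsd for these loops depends on the backend and switches used (for example -gen gcc, as in the compile command quoted earlier in the thread), so treat that part as an assumption to check in the generated assembler.

```freebasic
' Element-wise array operations on doubles (hypothetical helper names).
' dst, a and b must each point to at least n elements.
sub vec_mul(dst as double ptr, a as double ptr, b as double ptr, n as uinteger)
  dim as uinteger i = 0
  while i < n
    dst[i] = a[i] * b[i]
    i += 1
  wend
end sub

sub vec_add(dst as double ptr, a as double ptr, b as double ptr, n as uinteger)
  dim as uinteger i = 0
  while i < n
    dst[i] = a[i] + b[i]
    i += 1
  wend
end sub

sub vec_sub(dst as double ptr, a as double ptr, b as double ptr, n as uinteger)
  dim as uinteger i = 0
  while i < n
    dst[i] = a[i] - b[i]
    i += 1
  wend
end sub

' Small usage example with 4 elements
dim as double x(0 to 3) = {1, 2, 3, 4}
dim as double y(0 to 3) = {10, 20, 30, 40}
dim as double prod(0 to 3), sum(0 to 3), diff(0 to 3)

vec_mul(@prod(0), @x(0), @y(0), 4)   ' 10, 40, 90, 160
vec_add(@sum(0),  @x(0), @y(0), 4)   ' 11, 22, 33, 44
vec_sub(@diff(0), @x(0), @y(0), 4)   ' -9, -18, -27, -36

for i as integer = 0 to 3
  print prod(i), sum(i), diff(i)
next
```

The while loop with an explicit index avoids the wrap-around that "for i = 0 to n - 1" would hit with an unsigned n of 0.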
Re: How to use SSE2 in freebasic ?
@quickbbbb there is no need to use another compiler for 32-bit. If you use the right compiler switches, the optimized SSE code is really fast.
Here you can see that the v4() BASIC function is faster than the hand-written naked SSE v5() assembler code :-)
on 32-bit use this:
fbc -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -asm intel ssetest.bas
on 64-bit I use:
fbc -arch x86-64 -Wc -O3 -fpmode fast -fpu sse -O 3 -asm intel ssetest.bas
Joshy
file: "ssetest.bas"
Code:
' v1: dot product over elements s..e, indexed from the start of the arrays
function v1(l as double ptr,r as double ptr,s as uinteger,e as uinteger) as double
  dim as double result
  for i as uinteger = s to e
    result += l[i] * r[i]
  next
  return result
end function

' v2: the caller passes pointers to the first item; n is the last index (n+1 items)
function v2(l as double ptr,r as double ptr,n as uinteger) as double
  dim as double result
  for i as uinteger = 0 to n
    result += l[i] * r[i]
  next
  return result
end function

' v3: same as v2, but advances both pointers instead of indexing
function v3(l as double ptr,r as double ptr,n as uinteger) as double
  dim as double result
  for i as uinteger = 0 to n
    result += *l * *r : l+=1 : r+=1
  next
  return result
end function

' v4: pointer walk with an end pointer instead of a loop counter
function v4(l as double ptr,r as double ptr,n as uinteger) as double
  dim as double result
  dim as double ptr e=l+n+1
  while l<e : result += *l * *r : l+=1 : r+=1 : wend
  return result
end function
#ifndef __FB_64BIT__
' 32-bit: parameters are passed on the stack
' [ebp+8]=@a, [ebp+12]=@b, [ebp+16]=n, [ebp+20]=@result
sub v5 naked (byval a as double ptr, _
              byval b as double ptr, _
              byval n as uinteger, _
              byval r as double ptr)
  #define BASIS 8
  asm
    push ebp
    mov ebp,esp
    push ebx
    push edx
    ' ebx = a, edx = b, ecx = n+1 (number of items)
    mov ebx,[ebp+BASIS]
    lea ebx,[ebx]
    mov edx,[ebp+BASIS+4]
    lea edx,[edx]
    mov ecx,[ebp+BASIS+8]
    inc ecx
    ' xmm0 = running sum, eax = index
    xorpd xmm0, xmm0
    xor eax,eax
    loop_x86_v5:
    movsd xmm1, QWORD PTR [ebx+eax*8]
    mulsd xmm1, QWORD PTR [edx+eax*8]
    addsd xmm0, xmm1
    inc eax
    dec ecx
    jnz loop_x86_v5
    ' store the sum through the result pointer
    mov edx,[ebp+BASIS+12]
    lea edx,[edx]
    movsd QWORD PTR [edx],xmm0
    pop edx
    pop ebx
    pop ebp
    ' pop the 4 arguments (16 bytes) off the stack on return
    ret 16
  end asm
end sub
#else
' 64-bit params: rcx=@a, rdx=@b, r8=n (Windows x64 calling convention)
' The asm block assumes the arguments are still in those registers when it starts.
function v5 (byval a as double ptr, _
             byval b as double ptr, _
             byval n as uinteger) as double
  asm
    push rbx
    push rdx
    lea rbx,[rcx]
    lea rdx,[rdx]
    mov rcx,r8
    inc rcx
    ' xmm0 = running sum, rax = index
    xorpd xmm0, xmm0
    xor rax,rax
    loop_x86_64_v5:
    movsd xmm1, QWORD PTR [rbx+rax*8]
    mulsd xmm1, QWORD PTR [rdx+rax*8]
    addsd xmm0, xmm1
    inc rax
    dec rcx
    jnz loop_x86_64_v5
    pop rdx
    pop rbx
    movsd [function],xmm0
  end asm
end function
#endif
const as uinteger N = 100000
const as uinteger S = 1000  ' first item
const as uinteger E = 99000 ' last item

dim shared as double a(N-1),b(N-1)
for i as uinteger = 0 to N-1
  a(i)=i:b(i)=i
next

const as uinteger NLOOPS = 5000
print "please wait while run 5 tests ..."
dim as double result

var t1 = timer()
for i as uinteger = 1 to NLOOPS
  result = v1(@a(0),@b(0),S,E)
next
t1=timer()-t1
print "result v1: " & result

var t2 = timer()
for i as uinteger = 1 to NLOOPS
  result = v2(@a(s),@b(s),E-S)
next
t2=timer()-t2
print "result v2: " & result

var t3 = timer()
for i as uinteger = 1 to NLOOPS
  result = v3(@a(s),@b(s),E-S)
next
t3=timer()-t3
print "result v3: " & result

var t4 = timer()
for i as uinteger = 1 to NLOOPS
  result = v4(@a(s),@b(s),E-S)
next
t4=timer()-t4
print "result v4: " & result

result=0
var t5=timer()
for i as uinteger = 1 to NLOOPS
#ifndef __FB_64BIT__
  ' on 32-bit, v5 is implemented as a sub that writes through a result pointer
  v5(@a(s),@b(s),E-S,@result)
#else
  result = v5(@a(s),@b(s),E-S)
#endif
next
t5=timer()-t5
print "result v5: " & result

print
print "time v1: " & t1
print "time v2: " & t2
print "time v3: " & t3
print "time v4: " & t4
print "time v5: " & t5
sleep
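Since ssetest.bas fills both arrays with a(i) = b(i) = i, every version should print the same "result" value, namely the sum of i*i for i from S to E. A minimal sketch of an independent check using the closed form 1^2 + ... + n^2 = n(n+1)(2n+1)/6 follows. The names sum_squares, FIRST_ITEM and LAST_ITEM are made up for this example and are not part of the posted program. Because every partial sum here is an integer well below 2^53, the double-precision loop should reproduce the closed form exactly.

```freebasic
' Closed-form check for the benchmark result (illustrative names only).
function sum_squares(n as ulongint) as ulongint
  ' 1^2 + 2^2 + ... + n^2 = n(n+1)(2n+1)/6
  return (n * (n + 1) * (2 * n + 1)) \ 6
end function

const as ulongint FIRST_ITEM = 1000   ' S in ssetest.bas
const as ulongint LAST_ITEM  = 99000  ' E in ssetest.bas

dim as ulongint expected = sum_squares(LAST_ITEM) - sum_squares(FIRST_ITEM - 1)

' Reference loop in double precision, same arithmetic as v1()
dim as double looped = 0
for i as ulongint = FIRST_ITEM to LAST_ITEM
  looped += cdbl(i) * cdbl(i)
next

print "closed form: " & expected
print "loop:        " & looped
if looped = cdbl(expected) then print "match" else print "MISMATCH"
```

Comparing each "result vN" line against the closed-form value is a quick way to confirm that an optimized build still computes the same sum.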
Re: How to use SSE2 in freebasic ?
D.J.Peters wrote: @quickbbbb there is no need to use another compiler for 32-bit. If you use the right compiler switches, the optimized SSE code is really fast.
Here you can see that the v4() BASIC function is faster than the hand-written naked SSE v5() assembler code :-)
on 32-bit use this:
fbc -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -asm intel ssetest.bas
on 64-bit I use:
fbc -arch x86-64 -Wc -O3 -fpmode fast -fpu sse -O 3 -asm intel ssetest.bas
Joshy
OK! Thank you very much!
Re: How to use SSE2 in freebasic ?
D.J.Peters wrote: @quickbbbb there is no need to use another compiler for 32-bit. If you use the right compiler switches, the optimized SSE code is really fast.
Here you can see that the v4() BASIC function is faster than the hand-written naked SSE v5() assembler code :-)
on 32-bit use this:
fbc -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -asm intel ssetest.bas
on 64-bit I use:
fbc -arch x86-64 -Wc -O3 -fpmode fast -fpu sse -O 3 -asm intel ssetest.bas
Joshy
WOW, my God!
Old test results:
====================================
time v1: 1.920214737634524
time v2: 2.544125353175332
time v3: 2.45014096495288
time v4: 2.449346490073367
New test results, compiled with the options -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast
====================================
time v1: 0.8113254932250129
time v2: 0.8090549612388713
time v3: 0.8096571563073667
time v4: 0.8080199101677863
Thanks again, everyone.
Re: How to use SSE2 in freebasic ?
Why didn't you post the result of v5()?
Joshy
Re: How to use SSE2 in freebasic ?
D.J.Peters wrote: Why didn't you post the result of v5()?
Joshy
(1)
When function v5 is included and I use the options -s console -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast ---> compile error
Compiled as 32-bit, WinPE options = -s console -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -----> compile error
Compiled as 64-bit, WinPE options = -s console -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -----> compile error
WinPE shows the command as follows:
G:\FreeBasic\Compile\fbc32.exe -m "D:\QQQ.bas" -v -s console -gen gcc -arch pentium4-sse3 -Wc -O3 -fpu sse -O 3 -fpmode fast -x "D:\QQQ.exe"
(2)
When function v5 is included and I use only the option -s console ---> compiles successfully
WinPE shows the command as follows:
G:\FreeBasic\Compile\fbc64.exe -m "D:\QQQ.bas" -v -s console -x "D:\QQQ.exe"
Re: How to use SSE2 in freebasic ?
quickbbbb wrote: WOW, my God!
I'm very impressed myself :-)
I compiled an old program I wrote in 2008, in which over 4 GB of 3D vectors are calculated and a Jacobi matrix is solved (for radiosity).
With the SSE command-line switches, both the 32-bit and the 64-bit binaries are 2 times faster than without :-)
Joshy