fast sin compare


Postby srvaldez » Jan 03, 2018 12:44

While trying to decipher the _Sin6th function from the post at viewtopic.php?f=7&t=26248,
I searched the net for alternatives and found fast_sin at https://www.gamedev.net/forums/topic/62 ... oximation/
It turns out to be much faster and more accurate (about 3 decimal digits). Here are the times on my PC:

Code: Select all

32-bit, -gen gcc -Wc -O2
time for FB sin =  18.01501279999502
sum = -6.369349492274523e-013
time for fast_sin =  4.04145359992981
sum =  2.114974861910923e-013
time for _Sin6th =  11.92606279999018
sum =  67.97632024315362

64-bit, -gen gcc -Wc -O2
time for FB sin =  42.23232729989104
sum =  8.659739592076221e-015
time for fast_sin =  3.331463000038639
sum =  2.114974861910923e-013
time for _Sin6th =  6.449131899978966
sum =  67.97632024315362

One surprise was the much longer execution time of FB sin on x64, although its result is more accurate: the sum should be 0.

Code: Select all

'https://www.freebasic.net/forum/viewtopic.php?f=7&t=26248
Function _Sin6th(fX As Double) As Double
   asm
      jmp _Sin6th_Start
         _Sin6th_Mul: .double 683565275.57643158
         _Sin6th_Div: .double -0.0000000061763971109087229
         _Sin6th_Rnd: .double 6755399441055744.0
       
      _Sin6th_Start:
         movq xmm0, [fX]
         mulsd xmm0, [_Sin6th_Mul]
         addsd xmm0, [_Sin6th_Rnd]
         movd ebx, xmm0
   
         lea  eax, [ebx*2+0x80000000]
         sar  eax, 2
         imul eax
         sar  ebx, 31
         lea  eax, [edx*2-0x70000000]
         lea  ecx, [edx*8+edx-0x24000000]
         imul edx
         xor  ecx, ebx
         lea  eax, [edx*8+edx+0x44A00000]
         imul ecx
         
         cvtsi2sd xmm0, edx
         mulsd xmm0, [_Sin6th_Div]
         movq [Function], xmm0
   End Asm
End Function

'https://www.gamedev.net/forums/topic/621589-extremely-fast-sin-approximation/
function fast_round(byval x as double) as long
   dim MAGIC_ROUND as const double = 6755399441055744.0
   union fast_trunc
      d as double
      type
         lw as long
         hw as long
      end type
   end union
   dim fast_trunc as fast_trunc
   fast_trunc.d = x
   fast_trunc.d += MAGIC_ROUND
   return fast_trunc.lw
end function

'https://www.gamedev.net/forums/topic/621589-extremely-fast-sin-approximation/
function fast_sin(byval x as double) as double
   dim PI as const double = 3.14159265358979323846264338327950288
   dim INVPI as const double = 0.31830988618379067153776752674502872
   dim A as const double = 0.00735246819687011731341356165096815
   dim B as const double = -0.16528911397014738207016302002888890
   dim C as const double = 0.99969198629596757779830113868360584
   dim k as long
   dim x2 as double
   k = fast_round(INVPI * x)
   x -= k * PI
   x2 = x * x
   x = x * (C + (x2 * (B + (A * x2))))
   if k mod 2 then
      x = -x
   end if
   return x
end function

dim as double t, y
t=timer
y=0
for x as double=-214748364 to 214748364
   y += sin(x)
next
print "time for FB sin = ";timer-t
print "sum = ";y

t=timer
y=0
for x as double=-214748364 to 214748364
   y += fast_sin(x)
next
print "time for fast_sin = ";timer-t
print "sum = ";y

t=timer
y=0
for x as double=-214748364 to 214748364
   y += _Sin6th(x)
next
print "time for _Sin6th = ";timer-t
print "sum = ";y
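
To put a number on the accuracy claim, here is a minimal sketch that compares fast_sin with the built-in Sin over one period and reports the largest absolute difference. fast_round and fast_sin are repeated from the listing above so the snippet runs on its own; the union is moved to module level under the name DoubleBits, which is my own naming, and the 100000-sample grid is an arbitrary assumption. Reading the low word assumes a little-endian target.

```freebasic
' Sketch: worst-case absolute error of fast_sin versus the built-in Sin.
' DoubleBits and the 100000-sample grid are illustrative assumptions.
union DoubleBits
   d as double
   type
      lw as long   ' low 32 bits on a little-endian target
      hw as long
   end type
end union

function fast_round(byval x as double) as long
   dim MAGIC_ROUND as const double = 6755399441055744.0   ' 1.5 * 2^52
   dim u as DoubleBits
   u.d = x + MAGIC_ROUND   ' the rounded integer lands in the low mantissa bits
   return u.lw
end function

function fast_sin(byval x as double) as double
   dim PI as const double = 3.14159265358979323846264338327950288
   dim INVPI as const double = 0.31830988618379067153776752674502872
   dim A as const double = 0.00735246819687011731341356165096815
   dim B as const double = -0.16528911397014738207016302002888890
   dim C as const double = 0.99969198629596757779830113868360584
   dim k as long = fast_round(INVPI * x)   ' nearest multiple of PI
   x -= k * PI                             ' reduce to [-PI/2, PI/2]
   dim x2 as double = x * x
   x = x * (C + (x2 * (B + (A * x2))))     ' odd polynomial in x
   if k mod 2 then
      x = -x                               ' odd half-periods flip the sign
   end if
   return x
end function

dim as double max_err = 0, worst_x = 0
for i as long = 0 to 100000
   dim as double x = -3.141592653589793 + i * (6.283185307179586 / 100000)
   dim as double e = abs(fast_sin(x) - sin(x))
   if e > max_err then
      max_err = e
      worst_x = x
   end if
next
print "max |fast_sin - sin| on [-pi, pi] = "; max_err; " at x = "; worst_x
```

Note that the sum printed by the timing loops is a weaker check than this, because positive and negative errors cancel when summing over a symmetric range.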

Postby jj2007 » Jan 03, 2018 19:10

FB32
Gas:

Code: Select all

time for FB sin =  17.03938541668578
sum =  6.441038566222523e-013
time for fast_sin =  28.36938175893437
sum = -3.419486915845482e-014
time for _Sin6th =  9.583104912452711
sum =  67.97632024315428


-gen gcc -Wc -O2 -s console:

Code: Select all

time for FB sin =  18.63207066554548
sum = -6.378231276471524e-013
time for fast_sin =  4.682086121603732
sum = -6.447741104052829e-013
time for _Sin6th =  10.41132472365041
sum =  67.97632024315428

No assembler code? I'd like to test it against Sinus(), which is about 3.5 times as fast as _Sin6th.

Postby srvaldez » Jan 03, 2018 19:15

hi jj2007
As you noticed, gcc does a good job at optimizing. That's one reason I frown on non-portable asm code: asm is fun, but inline asm in FB x64 is problematic. Try compiling with -O3 or -Ofast.

Postby jj2007 » Jan 03, 2018 22:53

srvaldez wrote:try compiling with -O3 or -Ofast.

I tried -gen gcc -Wc -O3, but it fails miserably with TmpFile.asm:340: Error: symbol `_Sin6th_Mul' is already defined
The same code compiles just fine with -gen gcc -Wc -O2.
Probably a bug...

Postby UEZ » Jan 04, 2018 0:01

Well, when I replace only the calls to _Sin6th with fast_sin in the 3D Sine Wave demo, the frame rate drops from 64 fps to 50 fps!

Compiled with standard -s gui (Win32).

Postby srvaldez » Jan 04, 2018 0:34

@UEZ
you need to compile using gcc, for example: fbc 3D_Sine_Wave.bas -gen gcc -Wc -O2
fast sine is only faster when optimized using gcc, however FB graphics are quite a bit slower when compiled with gcc

Postby UEZ » Jan 04, 2018 1:09

srvaldez wrote:@UEZ
you need to compile using gcc, for example: fbc 3D_Sine_Wave.bas -gen gcc -Wc -O2
fast sine is only faster when optimized using gcc, however FB graphics are quite a bit slower when compiled with gcc


Indeed, now the frame rate goes up to 75 fps, and that is with only the calls within the _3Dto2D2 sub changed to fast_sin.

Is there also a fast_cos counterpart function available?

Postby srvaldez » Jan 04, 2018 1:38

UEZ wrote:Is there also a fast_cos counterpart function available?

You can make a copy of fast_sin and replace x with PI/2 - x at the start, or you could change the calls from cos(x) to fast_sin(PI/2 - x). Here's a modified fast_sin that gives cos:

Code: Select all

'https://www.gamedev.net/forums/topic/621589-extremely-fast-sin-approximation/
function fast_cos(byval x as double) as double
   dim PI as const double = 3.14159265358979323846264338327950288
   dim PI2 as const double = 1.5707963267948966192313216916397514421
   dim INVPI as const double = 0.31830988618379067153776752674502872
   dim A as const double = 0.00735246819687011731341356165096815
   dim B as const double = -0.16528911397014738207016302002888890
   dim C as const double = 0.99969198629596757779830113868360584
   dim k as long
   dim x2 as double
   x = PI2 - x
   k = fast_round(INVPI * x)
   x -= k * PI
   x2 = x * x
   x = x * (C + (x2 * (B + (A * x2))))
   if k mod 2 then
      x = -x
   end if
   return x
end function
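
The second option, calling the sine routine with PI/2 - x, needs no copy of the function at all. The sketch below wraps that identity in a macro; cos_via and HALF_PI are hypothetical names, and the built-in Sin stands in for fast_sin only so that the snippet runs on its own.

```freebasic
' Sketch: cos(x) = sin(PI/2 - x), applied to whatever sine routine is passed in.
' cos_via and HALF_PI are illustrative names, not from the thread.
#define HALF_PI 1.5707963267948966
#define cos_via(sinfn, x) sinfn(HALF_PI - (x))

dim as double angle = 1.0
print "cos_via(sin, 1.0) = "; cos_via(sin, angle)
print "cos(1.0)          = "; cos(angle)
```

With fast_sin in scope, the call would be cos_via(fast_sin, angle); the extra subtraction costs far less than a second polynomial routine.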

Postby deltarho[1859] » Jan 04, 2018 2:49

The following compiles with -O3 and -Ofast. Strap yourselves into your chair. <smile>

Code: Select all

Function _Sin6th(fX As Double) As Double
  Asm
    jmp 0f
  1: .double 683565275.57643158
  2: .double -0.0000000061763971109087229
  3: .double 6755399441055744.0
       
  0:
    movq xmm0, [fX]
    mulsd xmm0, [1b]
    addsd xmm0, [3b]
    movd ebx, xmm0

    lea  eax, [ebx*2+0x80000000]
    sar  eax, 2
    imul eax
    sar  ebx, 31
    lea  eax, [edx*2-0x70000000]
    lea  ecx, [edx*8+edx-0x24000000]
    imul edx
    xor  ecx, ebx
    lea  eax, [edx*8+edx+0x44A00000]
    imul ecx

    cvtsi2sd xmm0, edx
    mulsd xmm0, [2b]
    movq [Function], xmm0
  End Asm
End Function

Postby UEZ » Jan 04, 2018 11:30

@srvaldez: thanks. :-)


deltarho[1859] wrote:The following compiles with -O3 and -Ofast. Strap yourselves into your chair. <smile>


What is the meaning of the suffixes f or b (jmp 0f, mulsd xmm0, [1b])?

Postby deltarho[1859] » Jan 04, 2018 11:48

@UEZ

Put srvaldez's and mine side by side and ponder. You will only get halfway through a cup of tea and the penny will drop - trust me. <smile>

Postby UEZ » Jan 04, 2018 12:31

deltarho[1859] wrote:@UEZ

Put srvaldez's and mine side by side and ponder. You will only get halfway through a cup of tea and the penny will drop - trust me. <smile>


Well, I googled a little bit: the f suffix means forward and the b suffix means backward, so 0f refers to the next label 0: and 1b to the previous label 1:. Is this faster than the original code?

Btw, the asm functions _Sin6th / _Cos6th were written by Eukalyptus. ;-)

Postby deltarho[1859] » Jan 04, 2018 12:45

UEZ wrote:Is this faster than the original code?

It is infinitely faster because the original code will not compile with optimisation levels -O3 and -Ofast.
Under certain circumstances, GCC may duplicate (or remove duplicates of) your assembly code when optimizing. This can lead to unexpected duplicate symbol errors during compilation if your asm code defines symbols or labels.

When we use local/temporary labels the compiler generates unique labels and will not shoot itself in the foot.
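
For anyone else wondering about the suffixes: a numeric label such as 1: may be defined any number of times, and 1b / 1f pick the nearest definition looking backward / forward, so a duplicated asm block no longer produces a symbol clash. Here is a minimal sketch with a small loop, assuming a 32-bit x86 target like the code above; sum_down is a hypothetical name and not part of the thread's code.

```freebasic
' Sketch only: sums n + (n-1) + ... + 1 using GAS numeric local labels.
' Assumes 32-bit x86; touches only eax and ecx.
Function sum_down(ByVal n As Long) As Long
  Asm
    mov  ecx, [n]
    xor  eax, eax
    test ecx, ecx
    jle  2f              ' forward: jump to the next "2:" below
  1:
    add  eax, ecx
    dec  ecx
    jnz  1b              ' backward: jump to the most recent "1:" above
  2:
    mov  [Function], eax
  End Asm
End Function

Print "sum_down(4) = "; sum_down(4)
```

The named labels in the original (_Sin6th_Mul and friends) are global symbols, which is what the assembler complained about once the block was emitted twice.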

Postby jj2007 » Jan 04, 2018 13:03

Here is deltarho's last code with slight modifications to allow more realistic testing: I split the loop in two, an outer one with an integer counter, and an inner one with a double counter that covers the 0...359.99 degrees range. The step is tweaked to arrive at a zero sum:

Code: Select all

  for x as integer=0 to outerloops
   for s as double=0 to 6.2831853 Step 0.01750190891   ' make sure it's 360 inner iterations
      innercount+=1
      y += sin(s)
   next
  next
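
A quick way to confirm the iteration count that the comment asks for is to run the inner loop on its own and count. This is a minimal sketch: innercount mirrors the name in the snippet above, while the second loop with an integer index is a suggested alternative and an assumption of mine, not jj2007's code.

```freebasic
' Sketch: count how many times the fractional-step loop body runs.
dim as long innercount = 0
for s as double = 0 to 6.2831853 step 0.01750190891
   innercount += 1
next
print "float-step iterations: "; innercount

' Alternative (an assumption, not from the thread): drive the loop with an
' integer index so the count cannot drift with rounding in the step.
dim as double total = 0
for i as long = 0 to 359
   total += sin(i * 0.01750190891)
next
print "sum over 360 samples: "; total
```

With a floating-point loop variable the step is added repeatedly, so the last value depends on accumulated rounding; the integer index avoids relying on that margin.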

Results:

Code: Select all

time for FB sin =  0.1608874157742548     for  3600000 iterations, average:  44 ms/call
sum = -4.244895989445813e-005
time for MB sin =  0.03104239161274336    for  3600000 iterations, average:  8 ms/call
sum =  1.076452668051075e-014
time for fast_sin =  0.0363557692970744   for  3600000 iterations, average:  10 ms/call
sum = -4.243645435448729e-005
time for _Sin6th =  0.0875843159346914    for  3600000 iterations, average:  24 ms/call
sum =  0.01235279427191839

Both FB sin and fast_sin arrive at the same (small) sum; _Sin6th is a bit off, but nothing to worry about. Timings are as expected, but the unit is probably nanoseconds, not milliseconds; I'll have to check that one once more ;-)
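
The unit can be checked directly from the figures in the table. A minimal sketch, taking the elapsed time and iteration count from the FB sin row; the variable names are my own.

```freebasic
' Sketch: convert a measured loop time into a per-call cost.
' Values are taken from the FB sin row of the results above.
dim as double elapsed = 0.1608874157742548   ' seconds, from Timer differences
dim as double calls = 3600000
dim as double ns_per_call = elapsed / calls * 1e9
print "ns/call = "; ns_per_call
```

This gives roughly 44.7, which matches the "44" in the table, so the averages are nanoseconds per call.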

The full source is further down.
Last edited by jj2007 on Jan 04, 2018 15:51, edited 1 time in total.

Postby dodicat » Jan 04, 2018 14:34

Sinus.dll is not very well regarded.
http://www.completelyuninstallprogram.com/sinus-dll/
But I took a chance with your dll.
It scanned OK with Avira.

I see it only works with -gen gcc 32 bits.

But what is it, does it return the sin of an angle?
Or is it really discounts and best prices?

Code: Select all

#inclib "mbsinusdll"
declare function mbsinus cdecl alias "MbSinP"( as double) as double
dim as function  cdecl(as double)as double psinus=@mbsinus

for n as long=0 to 10
print psinus(n),mbsinus(n)
next

sleep