Fast Inverse Square Root

Post your FreeBASIC tips and tricks here. Please don’t post your code without including an explanation.
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

Fast Inverse Square Root

Postby sir_mud » Dec 02, 2006 8:27

Translated this simple but extremely fast function from Quake3's source code. Full story here: http://www.beyond3d.com/articles/fastinvsqrt/

Code: Select all

function InvSqrt ( x as double ) as double
dim as double xhalf = .5 * x
dim as integer i = cast(integer, x)
i = &h5F3759DF - (i shr 1)
x = cast(double, i)
x = x * (1.5 - xhalf * x * x)
return x
end function


With this quickie sample code:

Code: Select all

? InvSqrt(23.456)


Profiling says it takes 0.00001 seconds to compute the Inverse Square root, it takes 0.00003 just to print the resulting double xD
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Postby MichaelW » Dec 02, 2006 9:53

Interesting, but I think the function is not implemented correctly (tested 0.17-July30 only).

Code: Select all

#include "windows.bi"
#include "counter.bas"

Function InvSqrt ( x As Double ) As Double
  Dim As Double xhalf = .5 * x
  Dim As Integer i = cast(Integer, x)
  i = &h5F3759DF - (i Shr 1)
  x = cast(Double, i)
  x = x * (1.5 - xhalf * x * x)
  Return x
End Function

Function FpuInvSqrt ( x As Double ) As Double
  return 1.0 / sqr(x)
End Function

dim as double d, r

print InvSqrt( 1 )
print FpuInvSqrt( 1 )

print InvSqrt( .0000123 )
print FpuInvSqrt( .0000123 )

print InvSqrt( 123 )
print FpuInvSqrt( 123 )

d = 123.456

sleep 3000      ' Give lazy writes time to complete

counter_begin
  r = InvSqrt( d )
counter_end
print "InvSqrt ";counter_cycles;" cycles"

counter_begin
  r = FpuInvSqrt( d )
counter_end
print "FpuInvSqrt ";counter_cycles;" cycles"

sleep

Code: Select all

-2.038273385915891e+027
 1
-2.507076264676306e+022
 285.1329742561005
-2.507075977474315e+029
 9.016696346674323e-002
InvSqrt 217 cycles
FpuInvSqrt 271 cycles

It’s faster on my P3, but fixing it could speed it up or slow it down.

Counter.bas is here.

Edit: My timings are meaningless. After the first call to InvSqrt the cycle count goes up for both procedures, about 70 cycles for FpuInvSqrt, and about 535 cycles for InvSqrt! I at first thought the increase might be due to an exception, but a delay between the calls and/or an finit does not change the timings. I can't see how the problem could be my timing macros, but that doesn't mean it isn't. There is a potential problem with the FPU executing in parallel with the timing code, but inserting FWAITs does not correct the problem, and I can't see how this could extend the cycle count.
Last edited by MichaelW on Dec 02, 2006 11:21, edited 1 time in total.
v1ctor
Site Admin
Posts: 3804
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:

Postby v1ctor » Dec 02, 2006 10:41

Yeah, the pointer magic was missed:

Code: Select all

function InvSqrt (x as single) as single
   dim as single xhalf = 0.5f * x
   dim as integer i = *cast(integer ptr, @x)
    i = &h5f3759df - (i shr 1)
    x = *cast(single ptr, @i)
    x = x * (1.5f - xhalf*x*x)
    return x
end function
cha0s
Site Admin
Posts: 5317
Joined: May 27, 2005 6:42
Location: Illinois
Contact:

Postby cha0s » Dec 02, 2006 10:48

yeah, try this:

Code: Select all

Function InvSqrt ( x As single ) As single
  Dim As single xhalf = .5 * x
  Dim As Integer i = *cast(Integer ptr, @x)
  i = &h5F3759DF - (i Shr 1)
  x = *cast(single ptr, @i)
  x = x * (1.5 - xhalf * x * x)
  Return x
End Function


EDIT: you beat me!!!
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

Postby sir_mud » Dec 02, 2006 11:03

Dang, teach me to translate when i'm tired, didn't even think about the pointer magic xD. Still it's a neat little function with an interesting history.
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Postby MichaelW » Dec 02, 2006 11:40

Yes, that fixed everything. Now I get consistent cycle counts.

Code: Select all

 0.9983072
 1
 285.0786
 285.133
 9.011263e-002
 9.016696e-002
FpuInvSqrt 132 cycles
FpuInvSqrt 132 cycles
FpuInvSqrt 133 cycles
InvSqrt 80 cycles
InvSqrt 80 cycles
InvSqrt 80 cycles
FpuInvSqrt 132 cycles
InvSqrt 80 cycles
FpuInvSqrt 132 cycles
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Postby MichaelW » Dec 03, 2006 5:38

I played with this some more, linking in an object module created from this source:

Code: Select all

float GccInvSqrt(float x)
{
  float xhalf = 0.5f*x;
  int i = *(int*)&x;
  i = 0x5f3759df - (i>>1);
  x = *(float*)&i;
  x = x*(1.5f-xhalf*x*x);
  return x;
}

With this command line (note the level 3 optimizations):

gcc -O3 -c gccinvsqrt.c

And the results were somewhat surprising.

Code: Select all

#include "windows.bi"
#include "counter.bas"

declare function GccInvSqrt cdecl alias "GccInvSqrt"(byval x As Single) As Single

Function InvSqrt (x As Single) As Single
  Dim As Single xhalf = 0.5f * x
  Dim As Integer i = *cast(Integer Ptr, @x)
  i = &h5f3759df - (i Shr 1)
  x = *cast(Single Ptr, @i)
  x = x * (1.5f - xhalf*x*x)
  Return x
End Function

Function FpuInvSqrt ( x As Single ) As Single
  return 1.0 /sqr(x)
End Function

dim as integer i
dim as single s,r

print InvSqrt( 1 )
print FpuInvSqrt( 1 )
print GccInvSqrt( 1 )

print InvSqrt( 0.0000123 )
print FpuInvSqrt( 0.0000123 )
print GccInvSqrt( 0.0000123 )

print InvSqrt( 123 )
print FpuInvSqrt( 123 )
print GccInvSqrt( 123 )

s = 1.23

sleep 3000      ' Give lazy writes time to complete

for i = 1 to 3
  counter_begin
    r = InvSqrt( s )
  counter_end
  print "InvSqrt ";counter_cycles;" cycles"
  counter_begin
    r = FpuInvSqrt( s )
  counter_end
  print "FpuInvSqrt ";counter_cycles;" cycles"
    counter_begin
    r = GccInvSqrt( s )
  counter_end
  print "GccInvSqrt ";counter_cycles;" cycles"
next

sleep

Running on a P3:

Code: Select all

 0.9983072
 1
 0.9983072
 285.0786
 285.133
 285.0786
 9.011263e-002
 9.016696e-002
 9.011263e-002
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

Postby sir_mud » Dec 03, 2006 5:47

Yea, FB isn't really optimized like GCC though
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:

Postby duke4e » Mar 09, 2008 18:07

Sorry to bring this old thread up, but I've just discovered how to get square root from this neat little "hack". So, if anyone is interested, here is the code.

Code: Select all

Function Sqrt(x As Single) As Single
    Dim As Single xhalf = 0.5f * x
    Dim As Integer i = *Cast(Integer Ptr, @x)
    i = &h5f3759df - (i Shr 1)
    x = *Cast(Single Ptr, @i)
    x = x * (1.5f - xhalf * x * x)
    Return x * xhalf * 2
End Function
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark

Postby h4tt3n » Mar 09, 2008 19:09

I'm not quite sure why anyone would use this since it is clearly both much slower and less accurate that FB's own sqr function:

Code: Select all

Function Sqrt(x As Single) As Single
    Dim As Single xhalf = 0.5f * x
    Dim As Integer i = *Cast(Integer Ptr, @x)
    i = &h5f3759df - (i Shr 1)
    x = *Cast(Single Ptr, @i)
    x = x * (1.5f - xhalf * x * x)
    Return x * xhalf * 2
End Function

Const num_loops = 100000000
dim as single t, s
Dim as integer i

t = timer
for i = 0 to num_loops
  s = Sqr(num_loops)
next
print Using "FB's Sqr() function took: #.### seconds"; timer-t

t = timer
for i = 0 to num_loops
  s = Sqrt(num_loops)
next
print Using "Sqrt() function took: #.### seconds"; timer-t

sleep

end
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:

Postby duke4e » Mar 09, 2008 20:23

If you initiate it properly, it is faster than FB sqr.

Code: Select all

Function Sqrt(x As Single) As Single
    Dim As Single xhalf = 0.5f * x
    Dim As Integer i = *Cast(Integer Ptr, @x)
    i = &h5f3759df - (i Shr 1)
    x = *Cast(Single Ptr, @i)
    x = x * (1.5f - xhalf * x * x)
    Return x * xhalf * 2
End Function

Const num_loops = 10000000
Dim As Single t, s, i

t = Timer
For i = 0 To num_loops
  s = Sqr(i)
Next
Print Using "FB's Sqr() function took: #.### seconds"; timer-t

t = Timer
For i = 0 To num_loops
  s = Sqrt(i)
Next
Print Using "Sqrt() function took: #.### seconds"; timer-t

Sleep

End
HD_
Posts: 215
Joined: Jun 10, 2006 12:15
Contact:

Postby HD_ » Mar 09, 2008 20:32

On my machine it's still slower, or equally fast at best :(
anonymous1337
Posts: 5494
Joined: Sep 12, 2005 20:06
Location: California

Postby anonymous1337 » Mar 09, 2008 20:34

Inverse Square Root and Square Root. Aren't they different? Wtf are they, exactly? A google search on inverse square root just gave me algorithms and stuff.
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:

Postby duke4e » Mar 09, 2008 20:46

Inverse square root is "1 / sqr()".

The trick is that "Return x * xhalf * 2" is faster than "Return 1 / x"
Actually "Return 1 / x" is slower than FB sqr.
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark

Postby h4tt3n » Mar 09, 2008 21:37

duke4e wrote:If you initiate it properly, it is faster than FB sqr.


Sorry duke, in the example you posted the Sqrt() function was still more than twice as slow than the ambient FB Sqr() one.

Which is odd, by the way. I've heard that this actually should be very fast and that they use it in the game industry?!?

Return to “Tips and Tricks”

Who is online

Users browsing this forum: No registered users and 3 guests