## Fast Inverse Square Root

sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

### Fast Inverse Square Root

Translated this simple but extremely fast function from Quake3's source code. Full story here: http://www.beyond3d.com/articles/fastinvsqrt/

Code: Select all

function InvSqrt ( x as double ) as double
dim as double xhalf = .5 * x
dim as integer i = cast(integer, x)
i = &h5F3759DF - (i shr 1)
x = cast(double, i)
x = x * (1.5 - xhalf * x * x)
return x
end function

With this quickie sample code:

Code: Select all

? InvSqrt(23.456)

Profiling says it takes 0.00001 seconds to compute the Inverse Square root, it takes 0.00003 just to print the resulting double xD
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
Interesting, but I think the function is not implemented correctly (tested 0.17-July30 only).

Code: Select all

#include "windows.bi"
#include "counter.bas"

Function InvSqrt ( x As Double ) As Double
Dim As Double xhalf = .5 * x
Dim As Integer i = cast(Integer, x)
i = &h5F3759DF - (i Shr 1)
x = cast(Double, i)
x = x * (1.5 - xhalf * x * x)
Return x
End Function

Function FpuInvSqrt ( x As Double ) As Double
return 1.0 / sqr(x)
End Function

dim as double d, r

print InvSqrt( 1 )
print FpuInvSqrt( 1 )

print InvSqrt( .0000123 )
print FpuInvSqrt( .0000123 )

print InvSqrt( 123 )
print FpuInvSqrt( 123 )

d = 123.456

sleep 3000      ' Give lazy writes time to complete

counter_begin
r = InvSqrt( d )
counter_end
print "InvSqrt ";counter_cycles;" cycles"

counter_begin
r = FpuInvSqrt( d )
counter_end
print "FpuInvSqrt ";counter_cycles;" cycles"

sleep

Code: Select all

-2.038273385915891e+027
1
-2.507076264676306e+022
285.1329742561005
-2.507075977474315e+029
9.016696346674323e-002
InvSqrt 217 cycles
FpuInvSqrt 271 cycles

It’s faster on my P3, but fixing it could speed it up or slow it down.

Counter.bas is here.

Edit: My timings are meaningless. After the first call to InvSqrt the cycle count goes up for both procedures, about 70 cycles for FpuInvSqrt, and about 535 cycles for InvSqrt! I at first thought the increase might be due to an exception, but a delay between the calls and/or an finit does not change the timings. I can't see how the problem could be my timing macros, but that doesn't mean it isn't. There is a potential problem with the FPU executing in parallel with the timing code, but inserting FWAITs does not correct the problem, and I can't see how this could extend the cycle count.
Last edited by MichaelW on Dec 02, 2006 11:21, edited 1 time in total.
v1ctor
Posts: 3804
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:
Yeah, the pointer magic was missed:

Code: Select all

function InvSqrt (x as single) as single
dim as single xhalf = 0.5f * x
dim as integer i = *cast(integer ptr, @x)
i = &h5f3759df - (i shr 1)
x = *cast(single ptr, @i)
x = x * (1.5f - xhalf*x*x)
return x
end function
cha0s
Posts: 5317
Joined: May 27, 2005 6:42
Location: Illinois
Contact:
yeah, try this:

Code: Select all

Function InvSqrt ( x As single ) As single
Dim As single xhalf = .5 * x
Dim As Integer i = *cast(Integer ptr, @x)
i = &h5F3759DF - (i Shr 1)
x = *cast(single ptr, @i)
x = x * (1.5 - xhalf * x * x)
Return x
End Function

EDIT: you beat me!!!
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:
Dang, teach me to translate when i'm tired, didn't even think about the pointer magic xD. Still it's a neat little function with an interesting history.
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
Yes, that fixed everything. Now I get consistent cycle counts.

Code: Select all

0.9983072
1
285.0786
285.133
9.011263e-002
9.016696e-002
FpuInvSqrt 132 cycles
FpuInvSqrt 132 cycles
FpuInvSqrt 133 cycles
InvSqrt 80 cycles
InvSqrt 80 cycles
InvSqrt 80 cycles
FpuInvSqrt 132 cycles
InvSqrt 80 cycles
FpuInvSqrt 132 cycles
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
I played with this some more, linking in an object module created from this source:

Code: Select all

float GccInvSqrt(float x)
{
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f-xhalf*x*x);
return x;
}

With this command line (note the level 3 optimizations):

gcc -O3 -c gccinvsqrt.c

And the results were somewhat surprising.

Code: Select all

#include "windows.bi"
#include "counter.bas"

declare function GccInvSqrt cdecl alias "GccInvSqrt"(byval x As Single) As Single

Function InvSqrt (x As Single) As Single
Dim As Single xhalf = 0.5f * x
Dim As Integer i = *cast(Integer Ptr, @x)
i = &h5f3759df - (i Shr 1)
x = *cast(Single Ptr, @i)
x = x * (1.5f - xhalf*x*x)
Return x
End Function

Function FpuInvSqrt ( x As Single ) As Single
return 1.0 /sqr(x)
End Function

dim as integer i
dim as single s,r

print InvSqrt( 1 )
print FpuInvSqrt( 1 )
print GccInvSqrt( 1 )

print InvSqrt( 0.0000123 )
print FpuInvSqrt( 0.0000123 )
print GccInvSqrt( 0.0000123 )

print InvSqrt( 123 )
print FpuInvSqrt( 123 )
print GccInvSqrt( 123 )

s = 1.23

sleep 3000      ' Give lazy writes time to complete

for i = 1 to 3
counter_begin
r = InvSqrt( s )
counter_end
print "InvSqrt ";counter_cycles;" cycles"
counter_begin
r = FpuInvSqrt( s )
counter_end
print "FpuInvSqrt ";counter_cycles;" cycles"
counter_begin
r = GccInvSqrt( s )
counter_end
print "GccInvSqrt ";counter_cycles;" cycles"
next

sleep

Running on a P3:

Code: Select all

0.9983072
1
0.9983072
285.0786
285.133
285.0786
9.011263e-002
9.016696e-002
9.011263e-002
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
InvSqrt 77 cycles
FpuInvSqrt 131 cycles
GccInvSqrt 35 cycles
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:
Yea, FB isn't really optimized like GCC though
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Sorry to bring this old thread up, but I've just discovered how to get square root from this neat little "hack". So, if anyone is interested, here is the code.

Code: Select all

Function Sqrt(x As Single) As Single
Dim As Single xhalf = 0.5f * x
Dim As Integer i = *Cast(Integer Ptr, @x)
i = &h5f3759df - (i Shr 1)
x = *Cast(Single Ptr, @i)
x = x * (1.5f - xhalf * x * x)
Return x * xhalf * 2
End Function
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark
I'm not quite sure why anyone would use this since it is clearly both much slower and less accurate that FB's own sqr function:

Code: Select all

Function Sqrt(x As Single) As Single
Dim As Single xhalf = 0.5f * x
Dim As Integer i = *Cast(Integer Ptr, @x)
i = &h5f3759df - (i Shr 1)
x = *Cast(Single Ptr, @i)
x = x * (1.5f - xhalf * x * x)
Return x * xhalf * 2
End Function

Const num_loops = 100000000
dim as single t, s
Dim as integer i

t = timer
for i = 0 to num_loops
s = Sqr(num_loops)
next
print Using "FB's Sqr() function took: #.### seconds"; timer-t

t = timer
for i = 0 to num_loops
s = Sqrt(num_loops)
next
print Using "Sqrt() function took: #.### seconds"; timer-t

sleep

end
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
If you initiate it properly, it is faster than FB sqr.

Code: Select all

Function Sqrt(x As Single) As Single
Dim As Single xhalf = 0.5f * x
Dim As Integer i = *Cast(Integer Ptr, @x)
i = &h5f3759df - (i Shr 1)
x = *Cast(Single Ptr, @i)
x = x * (1.5f - xhalf * x * x)
Return x * xhalf * 2
End Function

Const num_loops = 10000000
Dim As Single t, s, i

t = Timer
For i = 0 To num_loops
s = Sqr(i)
Next
Print Using "FB's Sqr() function took: #.### seconds"; timer-t

t = Timer
For i = 0 To num_loops
s = Sqrt(i)
Next
Print Using "Sqrt() function took: #.### seconds"; timer-t

Sleep

End
HD_
Posts: 215
Joined: Jun 10, 2006 12:15
Contact:
On my machine it's still slower, or equally fast at best :(
anonymous1337
Posts: 5494
Joined: Sep 12, 2005 20:06
Location: California
Inverse Square Root and Square Root. Aren't they different? Wtf are they, exactly? A google search on inverse square root just gave me algorithms and stuff.
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Inverse square root is "1 / sqr()".

The trick is that "Return x * xhalf * 2" is faster than "Return 1 / x"
Actually "Return 1 / x" is slower than FB sqr.
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark
duke4e wrote:If you initiate it properly, it is faster than FB sqr.

Sorry duke, in the example you posted the Sqrt() function was still more than twice as slow than the ambient FB Sqr() one.

Which is odd, by the way. I've heard that this actually should be very fast and that they use it in the game industry?!?