## Fast Inverse Square Root

sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

### Fast Inverse Square Root

Translated this simple but extremely fast function from Quake3's source code. Full story here: http://www.beyond3d.com/articles/fastinvsqrt/

Code: Select all

`function InvSqrt ( x as double ) as doubledim as double xhalf = .5 * xdim as integer i = cast(integer, x)i = &h5F3759DF - (i shr 1)x = cast(double, i)x = x * (1.5 - xhalf * x * x)return xend function`

With this quickie sample code:

Code: Select all

`? InvSqrt(23.456)`

Profiling says it takes 0.00001 seconds to compute the Inverse Square root, it takes 0.00003 just to print the resulting double xD
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
Interesting, but I think the function is not implemented correctly (tested 0.17-July30 only).

Code: Select all

`#include "windows.bi"#include "counter.bas"Function InvSqrt ( x As Double ) As Double  Dim As Double xhalf = .5 * x  Dim As Integer i = cast(Integer, x)  i = &h5F3759DF - (i Shr 1)  x = cast(Double, i)  x = x * (1.5 - xhalf * x * x)  Return xEnd FunctionFunction FpuInvSqrt ( x As Double ) As Double  return 1.0 / sqr(x)End Functiondim as double d, rprint InvSqrt( 1 )print FpuInvSqrt( 1 )print InvSqrt( .0000123 )print FpuInvSqrt( .0000123 )print InvSqrt( 123 )print FpuInvSqrt( 123 )d = 123.456sleep 3000      ' Give lazy writes time to completecounter_begin  r = InvSqrt( d )counter_endprint "InvSqrt ";counter_cycles;" cycles"counter_begin  r = FpuInvSqrt( d )counter_endprint "FpuInvSqrt ";counter_cycles;" cycles"sleep`

Code: Select all

`-2.038273385915891e+027 1-2.507076264676306e+022 285.1329742561005-2.507075977474315e+029 9.016696346674323e-002InvSqrt 217 cyclesFpuInvSqrt 271 cycles`

It’s faster on my P3, but fixing it could speed it up or slow it down.

Counter.bas is here.

Edit: My timings are meaningless. After the first call to InvSqrt the cycle count goes up for both procedures, about 70 cycles for FpuInvSqrt, and about 535 cycles for InvSqrt! I at first thought the increase might be due to an exception, but a delay between the calls and/or an finit does not change the timings. I can't see how the problem could be my timing macros, but that doesn't mean it isn't. There is a potential problem with the FPU executing in parallel with the timing code, but inserting FWAITs does not correct the problem, and I can't see how this could extend the cycle count.
Last edited by MichaelW on Dec 02, 2006 11:21, edited 1 time in total.
v1ctor
Posts: 3804
Joined: May 27, 2005 8:08
Location: SP / Bra[s]il
Contact:
Yeah, the pointer magic was missed:

Code: Select all

`function InvSqrt (x as single) as single   dim as single xhalf = 0.5f * x   dim as integer i = *cast(integer ptr, @x)    i = &h5f3759df - (i shr 1)    x = *cast(single ptr, @i)    x = x * (1.5f - xhalf*x*x)    return xend function`
cha0s
Posts: 5317
Joined: May 27, 2005 6:42
Location: Illinois
Contact:
yeah, try this:

Code: Select all

`Function InvSqrt ( x As single ) As single  Dim As single xhalf = .5 * x  Dim As Integer i = *cast(Integer ptr, @x)  i = &h5F3759DF - (i Shr 1)  x = *cast(single ptr, @i)  x = x * (1.5 - xhalf * x * x)  Return xEnd Function`

EDIT: you beat me!!!
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:
Dang, teach me to translate when i'm tired, didn't even think about the pointer magic xD. Still it's a neat little function with an interesting history.
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
Yes, that fixed everything. Now I get consistent cycle counts.

Code: Select all

` 0.9983072 1 285.0786 285.133 9.011263e-002 9.016696e-002FpuInvSqrt 132 cyclesFpuInvSqrt 132 cyclesFpuInvSqrt 133 cyclesInvSqrt 80 cyclesInvSqrt 80 cyclesInvSqrt 80 cyclesFpuInvSqrt 132 cyclesInvSqrt 80 cyclesFpuInvSqrt 132 cycles`
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
I played with this some more, linking in an object module created from this source:

Code: Select all

`float GccInvSqrt(float x){  float xhalf = 0.5f*x;  int i = *(int*)&x;  i = 0x5f3759df - (i>>1);  x = *(float*)&i;  x = x*(1.5f-xhalf*x*x);  return x;}`

With this command line (note the level 3 optimizations):

gcc -O3 -c gccinvsqrt.c

And the results were somewhat surprising.

Code: Select all

`#include "windows.bi"#include "counter.bas"declare function GccInvSqrt cdecl alias "GccInvSqrt"(byval x As Single) As SingleFunction InvSqrt (x As Single) As Single  Dim As Single xhalf = 0.5f * x  Dim As Integer i = *cast(Integer Ptr, @x)  i = &h5f3759df - (i Shr 1)  x = *cast(Single Ptr, @i)  x = x * (1.5f - xhalf*x*x)  Return xEnd FunctionFunction FpuInvSqrt ( x As Single ) As Single  return 1.0 /sqr(x)End Functiondim as integer idim as single s,rprint InvSqrt( 1 )print FpuInvSqrt( 1 )print GccInvSqrt( 1 )print InvSqrt( 0.0000123 )print FpuInvSqrt( 0.0000123 )print GccInvSqrt( 0.0000123 )print InvSqrt( 123 )print FpuInvSqrt( 123 )print GccInvSqrt( 123 )s = 1.23sleep 3000      ' Give lazy writes time to completefor i = 1 to 3  counter_begin    r = InvSqrt( s )  counter_end  print "InvSqrt ";counter_cycles;" cycles"  counter_begin    r = FpuInvSqrt( s )  counter_end  print "FpuInvSqrt ";counter_cycles;" cycles"    counter_begin    r = GccInvSqrt( s )  counter_end  print "GccInvSqrt ";counter_cycles;" cycles"nextsleep`

Running on a P3:

Code: Select all

` 0.9983072 1 0.9983072 285.0786 285.133 285.0786 9.011263e-002 9.016696e-002 9.011263e-002InvSqrt 77 cyclesFpuInvSqrt 131 cyclesGccInvSqrt 35 cyclesInvSqrt 77 cyclesFpuInvSqrt 131 cyclesGccInvSqrt 35 cyclesInvSqrt 77 cyclesFpuInvSqrt 131 cyclesGccInvSqrt 35 cycles`
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:
Yea, FB isn't really optimized like GCC though
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Sorry to bring this old thread up, but I've just discovered how to get square root from this neat little "hack". So, if anyone is interested, here is the code.

Code: Select all

`Function Sqrt(x As Single) As Single    Dim As Single xhalf = 0.5f * x    Dim As Integer i = *Cast(Integer Ptr, @x)    i = &h5f3759df - (i Shr 1)    x = *Cast(Single Ptr, @i)    x = x * (1.5f - xhalf * x * x)    Return x * xhalf * 2End Function`
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark
I'm not quite sure why anyone would use this since it is clearly both much slower and less accurate that FB's own sqr function:

Code: Select all

`Function Sqrt(x As Single) As Single    Dim As Single xhalf = 0.5f * x    Dim As Integer i = *Cast(Integer Ptr, @x)    i = &h5f3759df - (i Shr 1)    x = *Cast(Single Ptr, @i)    x = x * (1.5f - xhalf * x * x)    Return x * xhalf * 2End FunctionConst num_loops = 100000000dim as single t, sDim as integer it = timerfor i = 0 to num_loops  s = Sqr(num_loops)nextprint Using "FB's Sqr() function took: #.### seconds"; timer-tt = timerfor i = 0 to num_loops  s = Sqrt(num_loops)nextprint Using "Sqrt() function took: #.### seconds"; timer-tsleepend`
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
If you initiate it properly, it is faster than FB sqr.

Code: Select all

`Function Sqrt(x As Single) As Single    Dim As Single xhalf = 0.5f * x    Dim As Integer i = *Cast(Integer Ptr, @x)    i = &h5f3759df - (i Shr 1)    x = *Cast(Single Ptr, @i)    x = x * (1.5f - xhalf * x * x)    Return x * xhalf * 2End FunctionConst num_loops = 10000000Dim As Single t, s, it = TimerFor i = 0 To num_loops  s = Sqr(i)NextPrint Using "FB's Sqr() function took: #.### seconds"; timer-tt = TimerFor i = 0 To num_loops  s = Sqrt(i)NextPrint Using "Sqrt() function took: #.### seconds"; timer-tSleepEnd`
HD_
Posts: 215
Joined: Jun 10, 2006 12:15
Contact:
On my machine it's still slower, or equally fast at best :(
anonymous1337
Posts: 5494
Joined: Sep 12, 2005 20:06
Location: California
Inverse Square Root and Square Root. Aren't they different? Wtf are they, exactly? A google search on inverse square root just gave me algorithms and stuff.
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:
Inverse square root is "1 / sqr()".

The trick is that "Return x * xhalf * 2" is faster than "Return 1 / x"
Actually "Return 1 / x" is slower than FB sqr.
h4tt3n
Posts: 694
Joined: Oct 22, 2005 21:12
Location: Denmark
duke4e wrote:If you initiate it properly, it is faster than FB sqr.

Sorry duke, in the example you posted the Sqrt() function was still more than twice as slow than the ambient FB Sqr() one.

Which is odd, by the way. I've heard that this actually should be very fast and that they use it in the game industry?!?