Need advice: cast(integer, float)

Gonzo · Post by **Gonzo** » Dec 06, 2012 19:24

you can see my post for all the various examples
the 'classic' example is how it would be done on x86 pre-2004
and yes, it does the whole shebang, among other things setting the rounding mode to "truncate"
but the original thread is about the default rounding mode (and me asking whether or not it affects the speed of (int)float)
i think the conclusion will be that unless the default rounding mode is truncate, it won't affect int()
and that the default round-to-nearest mode should be banished to QB :)

here is what gcc emits when compiling with no flags:

Code:

#include <stdio.h>

int main(void)
{
	float x = 0.0;
	
	printf("Type in a number \n");
	scanf("%f", &x);
	
	int n = (int)x;
	printf("Number: %d \n", n);
	
	return 0;
}

result: http://fbcraft.fwsnet.net/test.asm

Code:

	flds	40(%esp)        # push the float x onto the x87 register stack, ST(0)
	fnstcw	30(%esp)        # store the current FPU control word (includes the RC bits)
	movw	30(%esp), %ax   # copy the control word into ax
	movb	$12, %ah        # high byte = 0x0C: RC (bits 10-11) = 11, round toward zero
	movw	%ax, 28(%esp)   # write the modified control word to memory
	fldcw	28(%esp)        # load it into the FPU
	fistpl	44(%esp)        # store ST(0) as a 32-bit integer (truncated) and pop
	fldcw	30(%esp)        # restore the original control word
	movl	44(%esp), %eax  # load the integer result into eax

i have written in commentary what i think happens

MichaelW · Post by **MichaelW** » Dec 06, 2012 20:55

counting_pine wrote: By the way, how does C tend to perform (int)f on x86? Does it have to adjust the rounding mode each time?

For a double the Microsoft VC Toolkit 2003 compiler loads the double onto the FPU stack and then calls a function named __ftol2 to do the work:

Code:

fld QWORD PTR _d$[ebp]
call  __ftol2
mov DWORD PTR _integral$[ebp], eax

The code runs in about 34 cycles on my P3, so I doubt that it adjusts the rounding mode. I would guess that it’s doing essentially the same as Agner Fog’s code.

Gonzo · Post by **Gonzo** » Dec 06, 2012 22:04

http://www.jbox.dk/sanos/source/lib/math/ftol.asm.html

MichaelW · Post by **MichaelW** » Dec 07, 2012 0:47

The linked code expects the floating-point value to be in ST(0), so it’s apparently intended to be called from FPU code. To keep it simple I tested a function version that takes the value as an argument, with the additional overhead of pushing it onto the stack (2 PUSHs), and loading it into the FPU (FLD).

Code:

''===================================================================================
#include "counter.bas"
#include "crt.bi"
''===================================================================================
''
'' The newer cycle count macros are available here:
''
''    http://www.freebasic.net/forum/viewtopic.php?f=7&t=20003
''
''===================================================================================

dim as double d = 12345.6789
dim as integer i

''===================================================================================

function ftol2 naked( byval x as double ) as integer
    asm
        fld QWORD PTR [esp+4]     '' load x
        fnstcw [esp-2]            '' get old cw
        mov ax, [esp-2]           '' copy to ax
        or ax, 0x0c00             '' set RC bits for truncate
        mov [esp-4], ax           '' copy new cw to memory
        fldcw [esp-4]             '' load new cw
        fistp QWORD PTR [esp-12]  '' store value rounded to 64-bit integer
        fldcw [esp-2]             '' load old cw
        mov eax, [esp-12]         '' return in EDX:EAX
        mov edx, [esp-8]
        ret 8
    end asm
end function

''===================================================================================

i = int(d)
print i
i = ftol2(d)
print i
print

SetProcessAffinityMask( GetCurrentProcess(), 1)

sleep 5000

for j as integer = 1 to 4

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
    counter_end()
    print counter_cycles;" cycles, empty"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        i = int(d)
    counter_end()
    print counter_cycles;" cycles, int()"

    counter_begin( 10000000, REALTIME_PRIORITY_CLASS, THREAD_PRIORITY_TIME_CRITICAL )
        i = ftol2(d)
    counter_end()
    print counter_cycles;" cycles, ftol2()"
    print

next

sleep

Running on a P3:

Code:

 12345
 12345

 0 cycles, empty
 65 cycles, int()
 47 cycles, ftol2()

 0 cycles, empty
 65 cycles, int()
 47 cycles, ftol2()

 0 cycles, empty
 65 cycles, int()
 47 cycles, ftol2()

 0 cycles, empty
 65 cycles, int()
 47 cycles, ftol2()

That’s 13 cycles more than I measured for the Microsoft version (47 vs. 34), and I think the additional overhead could plausibly account for that many cycles.

Edit:

What I timed for the Microsoft version included the FLD, and the pushes alone would not have taken 13 cycles, so I suspect the Microsoft version is different from what I tested here. My attempts to optimize the code by eliminating the partial register accesses did not make it faster.

Edit2:

The Microsoft code is nothing like what I tested here. It does not change the FPU rounding mode, and since it contains several conditional jumps the cycle count may vary with the input value.

MichaelW · Post by **MichaelW** » Dec 07, 2012 2:22

Stonemonkey wrote: Is there any reason not to just switch the fpu rounding mode?

Doing so doesn’t seem to break anything that I have tested, so far:

Code:

''=============================================================================
#include "crt.bi"
''=============================================================================

#define FRC_NEAREST  0     '' to nearest, ties to even (initialized state)
#define FRC_DOWN     &h400 '' toward -infinity
#define FRC_UP       &h800 '' toward +infinity
#define FRC_TRUNCATE &hc00 '' toward zero

''--------------------------------------------------
'' This macro sets the rounding control bits in the
'' FPU Control Word to one of the above values.
''--------------------------------------------------

#macro SETRC(rc)
  #ifndef __fpu__cw__
    dim as ushort __fpu__cw__
  #endif
  asm
    fstcw [__fpu__cw__]
    and WORD PTR [__fpu__cw__], NOT 0xc00
    or  WORD PTR [__fpu__cw__], rc
    fldcw [__fpu__cw__]
  end asm
#endmacro

sub ShowRC()
    dim as ushort cw
    asm fstcw [cw]
    cw and= &hc00
    select case cw
        case FRC_NEAREST
            print "nearest"
        case FRC_DOWN
            print "down"
        case FRC_UP
            print "up"
        case FRC_TRUNCATE
            print "truncate"
    end select
end sub

''=============================================================================

dim as double d = 12345.6789
dim as single s
dim as integer i
dim as uinteger u
dim as longint li
dim as ulongint ul

ShowRC
SETRC(FRC_DOWN)
ShowRC
SETRC(FRC_UP)
ShowRC
SETRC(FRC_TRUNCATE)
ShowRC
print

SETRC(FRC_NEAREST)
ShowRC

print "floor(d)   ",floor(d)
print "ceil(d)    ",ceil(d)
i = cint(d)
print "cint(d)    ",i
u = cuint(d)
print "cuint(d)   ",u
i = int(d)
print "int(d)     ",i
i = clng(d)
print "clng(d)    ",i
u = culng(d)
print "culng(d)   ",u
li = clngint(d)
print "clngint(d) ",li
ul = culngint(d)
print "culngint(d)",ul
s = csng(d)
print "csng(d) ",s
d = cdbl(s)
print "cdlb(s) ",d

print
SETRC(FRC_TRUNCATE)
ShowRC

print "floor(d)   ",floor(d)
print "ceil(d)    ",ceil(d)
i = cint(d)
print "cint(d)    ",i
u = cuint(d)
print "cuint(d)   ",u
i = int(d)
print "int(d)     ",i
i = clng(d)
print "clng(d)    ",i
u = culng(d)
print "culng(d)   ",u
li = clngint(d)
print "clngint(d) ",li
ul = culngint(d)
print "culngint(d)",ul
s = csng(d)
print "csng(d) ",s
d = cdbl(s)
print "cdlb(s) ",d

sleep

Code:

nearest
down
up
truncate

nearest
floor(d)       12345
ceil(d)        12346
cint(d)        12346
cuint(d)      12346
int(d)         12345
clng(d)        12346
culng(d)      12346
clngint(d)     12346
culngint(d)   12346
csng(d)        12345.68
cdlb(s)        12345.6787109375

truncate
floor(d)       12345
ceil(d)        12346
cint(d)        12345
cuint(d)      12345
int(d)         12345
clng(d)        12345
culng(d)      12345
clngint(d)     12345
culngint(d)   12345
csng(d)        12345.68
cdlb(s)        12345.6787109375

Stonemonkey · Post by **Stonemonkey** » Dec 07, 2012 2:43

MichaelW wrote:
Stonemonkey wrote: Is there any reason not to just switch the fpu rounding mode?
Doing so doesn’t seem to break anything that I have tested, so far:

So would just switching the mode at the start of the program answer this question from the OP?

is there any way to turn this "feature" off?

I guess that any generated code, or code from external sources, that depends on a particular mode will set that mode itself and then restore it?

MichaelW · Post by **MichaelW** » Dec 07, 2012 9:18

Stonemonkey wrote: I guess that any generated code, or code from external sources, that depends on a particular mode will set that mode itself and then restore it?

I think more testing needs to be done. I tested only conversions, but other constructs could be affected, conditionals for example.
