Please test new FBC features

brybry · Post by **brybry** » Dec 27, 2008 14:17

I've added some new features to the compiler, and I need people to test them: automatic vectorization, sin() and cos() approximations, and two reciprocal optimizations.

Automatic Vectorization

If the compiler can determine that the same operation occurs on two or more variables in contiguous memory locations, it can merge them into one operation. Currently, only floating-point operations are vectorized. Singles can be vectorized up to 4 wide, while doubles can be only 2 wide. If a double is found in the expression, the maximum width is 2, even for operations on singles. Only add, subtract, multiply, and divide can be vectorized. If the same value is used in each of the merged operations (x[2] in the example below), it will be swizzled:

Code:

Dim As Single Ptr x

x[0] = x[0] * x[2]
x[1] = x[1] * x[2]

Other thoughts:

The vectorization is still highly experimental, but it catches a lot of cases. It does not (yet) look at loops, as most other vectorizing compilers do, and it misses certain cases where arrays are involved. Vector widths of 2 and 4 are the fastest, so consider adding a 4th component to standard 3-component vectors.

There are two vectorization modes (so far): complete expression merging and intra-node merging. The example above is complete expression merging. Intra-node merging is for vectorization that is possible within a single expression:

Code:

Dim As Single Ptr x, y

x[2] = x[0] * y[0] + x[1] * y[1]

Intra-node merging still needs a lot of work, and in many cases will probably slow down your code.
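As a rough sketch of the advice about padding 3-component vectors to four components: the code below stores x, y, z plus one unused padding element and applies the same Single operation to four contiguous elements, indexed with constants only. The names AddVec4, a and b are made up for illustration, and whether the modified compiler actually merges these four statements into one operation is an assumption you would need to confirm by inspecting the generated assembly. The code itself behaves the same with or without vectorization.

```freebasic
'' Suggested build with the modified compiler (options as described in this post):
''   fbc -fpu SSE -vec 1 yourfile.bas

'' Adds two padded 4-component vectors in place (a = a + b).
'' AddVec4 is a hypothetical helper, not part of FBC.
Sub AddVec4( ByVal a As Single Ptr, ByVal b As Single Ptr )
    '' Four identical operations on contiguous Singles
    a[0] = a[0] + b[0]
    a[1] = a[1] + b[1]
    a[2] = a[2] + b[2]
    a[3] = a[3] + b[3]   '' padding component, value is ignored
End Sub

'' x, y, z plus one padding element set to zero
Dim As Single a(0 To 3) = { 1.0f, 2.0f, 3.0f, 0.0f }
Dim As Single b(0 To 3) = { 0.5f, 0.5f, 0.5f, 0.0f }

AddVec4( @a(0), @b(0) )
Print a(0), a(1), a(2), a(3)
```

The padding element costs four extra bytes per vector, but it lets all components be handled as one 4-wide group instead of a 2-wide group plus a leftover scalar.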

To enable vectorization, use the command-line option "-vec n", where n is the level of vectorization:

0 - no vectorization (the default)
1 - complete expression merging
2 - intra-node merging

Vectorization requires the SSE FPU mode ("-fpu SSE").

Reciprocal optimizations

If the FPU mode is set to SSE, then standard reciprocal and reciprocal square root (RSQRT) calculations can be optimized. This will only occur on single-precision numbers, not doubles. The emitter will use the SSE instructions RCPSS and RSQRTSS for high-speed approximations of reciprocals.

Code:

Dim As Single x

'' plain reciprocal
x = 1.0f / x

'' reciprocal square root
x = 1.0f / Sqr(x)

RSQRT is useful for vector normalization.
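A minimal sketch of that normalization use case, written so that it contains the 1.0f / Sqr(x) pattern shown above. Normalize3 is a made-up helper name, and it is only an assumption that the modified compiler recognizes the pattern when the operand is a local variable. Note that Intel documents RSQRTSS as having a relative error of up to 1.5 * 2^-12, so results in FAST mode can differ from PRECISE mode in the low digits.

```freebasic
'' Suggested build with the modified compiler (options as described in this post):
''   fbc -fpu SSE -fpmode FAST yourfile.bas

'' Scales the first three components of v to unit length.
'' Normalize3 is a hypothetical helper, not part of FBC.
Sub Normalize3( ByVal v As Single Ptr )
    Dim As Single lenSq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2]
    If lenSq = 0.0f Then Exit Sub        '' leave a zero vector untouched

    '' Reciprocal square root, same form as the example above
    Dim As Single invLen = 1.0f / Sqr( lenSq )

    v[0] = v[0] * invLen
    v[1] = v[1] * invLen
    v[2] = v[2] * invLen
End Sub

Dim As Single v(0 To 3) = { 3.0f, 0.0f, 4.0f, 0.0f }
Normalize3( @v(0) )
Print v(0), v(1), v(2)   '' roughly 0.6, 0, 0.8
```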

These optimizations are controlled with the command-line option "-fpmode FAST | PRECISE". FAST enables the optimizations; PRECISE (the default) disables them.

Sin()/Cos() approximations

When the option "-fpmode FAST" is used and the FPU mode is set to SSE, sin() and cos() are approximated using SSE and general integer instructions. As with the reciprocal optimizations, this applies only to single-precision numbers.
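For anyone who wants to measure how far the approximation drifts, here is a small sketch that compares Sin() on a Single against Sin() on the same value converted to Double. MaxSinError is a made-up helper name. It assumes that the Single call is the one that gets approximated and that the Double call stays exact, which follows from the description above but has not been checked against the modified compiler. With the standard compiler, or in PRECISE mode, the reported error should just be Single rounding noise.

```freebasic
'' Suggested build with the modified compiler (options as described in this post):
''   fbc -fpu SSE -fpmode FAST yourfile.bas

'' Returns the largest absolute difference between Single and Double Sin()
'' over steps + 1 evenly spaced samples in [lo, hi]. steps must be > 0.
'' MaxSinError is a hypothetical helper, not part of FBC.
Function MaxSinError( ByVal lo As Single, ByVal hi As Single, ByVal steps As Integer ) As Double
    Dim As Double worst = 0.0
    For i As Integer = 0 To steps
        Dim As Single x = lo + ( hi - lo ) * i / steps
        Dim As Single approx = Sin( x )          '' candidate for the fast path
        Dim As Double exact = Sin( CDbl( x ) )   '' double-precision reference
        Dim As Double diff = Abs( approx - exact )
        If diff > worst Then worst = diff
    Next
    Return worst
End Function

Print "max |error| of Sin on [-pi, pi]: "; MaxSinError( -3.14159265f, 3.14159265f, 10000 )
```

The same loop works for Cos() by swapping the two calls. Running it once with "-fpmode PRECISE" and once with "-fpmode FAST" gives a direct comparison.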

The reciprocal optimizations and sin()/cos() approximations should work as expected, but please test them anyway. I want to get them added to the compiler ASAP. The vectorization needs a lot more testing. Please use the modified compiler to compile your projects and see if:

1. your project still runs as expected.
2. all optimizable cases get optimized.

I only have the Windows version of the compiler so far. I will try to get a Linux version of the modified compiler available soon, if people want it. I haven't ever tried cross-compiling so I'll work on that. Please post problems to this topic, and I'll try to fix them.

This is a modified version of 0.21.0 from SVN.

There is a 4.2MB hourly transfer limit on this (not imposed by me), so if it doesn't work... try later.

http://www.geocities.com/bryan.js00/fbc_mod_win.zip

VonGodric · Post by **VonGodric** » Dec 27, 2008 17:19

Good work. I will do some tests. Good to see initiative!

Just in case I uploaded this to my server as well: http://fbdevzone.com/downloads/fbc_mod_win.zip

McLovin · Post by **McLovin** » Dec 27, 2008 17:37

I also applaud your initiative! Excellent!

nobozoz · Post by **nobozoz** » Dec 28, 2008 0:59

I'll be running some of the benchmarks from the "shootout" at shootout.alioth.debian.org using your fbc version. My system is WinXP Pro on a Dell Inspiron 2650 with a Celeron cpu.

Using these options for now...
fbc -s console -arch 686 -fpmode FAST -fpu SSE -w pedantic -version -v

Jim

Post by **counting_pine** » Dec 28, 2008 1:33

Nice work Bryan, it would be great to see the code for all this some time.
Note: at this point though, the options seem to crash the compiler if "-fpu sse" is not set...

KristopherWindsor · Post by **KristopherWindsor** » Dec 28, 2008 3:37

Fix the compiler!!! :D :D

brybry · Post by **brybry** » Dec 28, 2008 14:33

Updated compiler uploaded. Any mirrored links may or may not be updated...

I'm glad people like these new features.

Sorry about the crashing. I had added a check after all the command-line options get parsed, to throw an error if either vectorization or fast math (or both) were enabled without the SSE fpu mode. I'm sure I tested it, but who knows what happened. It should work now.

I've also discovered an issue: if the first expression can be intra-vectorized, it will get vectorized on its own (if intra-node vectorization is enabled) and can then no longer be merged with the next expression.

Arrays should work in more cases now.

Another thing is that, currently, if variables are used to index arrays and pointers, they will prevent vectorization. I should be able to fix this soon.

I hate to say this, but do not expect the vectorization to give a huge (or any) increase in speed. I have to always assume unaligned accesses (which are slow), swizzling is kinda slow, and writing 3-component vectors to memory requires 3 instructions (slow). Like I said, try to use 4-component vectors. When/If aligning variables gets added to FB, the vectorization should be able to utilize that for faster accesses.

Plus, remember this is very experimental and still very new. My main goal is to create a vectorizer; making it as fast as possible will be the next step.

@counting_pine: I want to have the fast math optimizations added to the compiler ASAP. I think they work fine. I want to work on the vectorization more before releasing the code.

VonGodric · Post by **VonGodric** » Dec 28, 2008 15:50

Updated the link

MichaelW · Post by **MichaelW** » Dec 29, 2008 2:14

brybry wrote:
When/If aligning variables gets added to FB, the vectorization should be able to utilize that for faster accesses.

Perhaps I don’t understand your statement, but by my interpretation it’s been possible at least since 0.16, and maybe much further back than that.

Code:

function Alignment( byval addr as any ptr ) as integer

  '' The reference to local label 1 has an extra f
  '' appended to the end to correct for a problem
  '' in FBC that causes it to ignore the first f.
  '' The Nb references work as they should.

  asm
    xor eax, eax
    mov ecx, [addr]
    bsf ecx, ecx
    jz 1ff
    mov eax, 1
    shl eax, cl
  1:
    mov [function], eax
  end asm

end function

extern as single s1, s2, s3, s4
asm
  .data
    .balign 16
    s1: .single 1.2345
    s2: .single 12.345
    s3: .single 123.45
    .balign 16
    s4: .single 1234.5
  .text
end asm

print s1,s2,s3,s4
print alignment( @s1 ), alignment( @s2 ),
print alignment( @s3 ), alignment( @s4 )
sleep

Output:

 1.2345        12.345        123.45        1234.5
 16            4             8             32

brybry · Post by **brybry** » Jan 01, 2009 19:41

You are correct. Alignment is possible, but it is being done by the programmer, not the compiler. The compiler has no knowledge of that type of alignment and therefore cannot exploit it.
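For illustration, programmer-side alignment can also be done without inline assembly. The sketch below rounds an address up to the next 16-byte boundary and checks the result. IsAligned16 and AlignUp16 are made-up helper names, and nothing here tells the compiler about the alignment, so the vectorizer would still have to assume unaligned accesses.

```freebasic
'' Returns non-zero if p is a multiple of 16.
'' Hypothetical helper, not part of FBC.
Function IsAligned16( ByVal p As Any Ptr ) As Integer
    Return ( Cast( UInteger, p ) And 15 ) = 0
End Function

'' Rounds p up to the next multiple of 16 (unchanged if already aligned).
'' Hypothetical helper, not part of FBC.
Function AlignUp16( ByVal p As Any Ptr ) As Any Ptr
    Return Cast( Any Ptr, ( Cast( UInteger, p ) + 15 ) And Not CUInt( 15 ) )
End Function

'' Over-allocate by 15 bytes so an aligned 4-Single block always fits.
Dim As Any Ptr raw = Allocate( 4 * SizeOf( Single ) + 15 )
Dim As Single Ptr v = AlignUp16( raw )

Print IsAligned16( v )   '' non-zero: v sits on a 16-byte boundary

Deallocate( raw )        '' free the original pointer, not the aligned one
```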

There is a feature request asking for keywords to select alignment of variables. The emitter can be given the alignment of the variables and use the faster instructions.

I'm still working on the vectorization. I recently improved support for pointer casting. It will catch more possible cases. I will upload a new version in a day or two.

brybry · Post by **brybry** » Jan 05, 2009 13:54

I just uploaded a new version of the compiler. It should work much better with pointer casting.

VonGodric · Post by **VonGodric** » Jan 05, 2009 14:07

brybry, do you need some hosting? I could give you some web space for FB-related stuff that doesn't impose ridiculous bandwidth limits.

brybry · Post by **brybry** » Jan 06, 2009 13:57

@VonGodric:

umm, maybe...

I would like something better than Yahoo! Geocities, but I'm between jobs right now, and I don't know if you were going to charge me, but I cannot afford to spend anything. Even then, I have very little out there right now. I think few people are downloading the modified FBC I have posted, and so the bandwidth limits aren't a problem.

I also have this site for my 3D graphics/game engine that I've been working on for the past years.

http://www.geocities.com/ragtagsoftware/

This is a FreeBASIC project. Again, I doubt the bandwidth limit is a problem.

I'll think about it. Thanks for the offer.

VonGodric · Post by **VonGodric** » Jan 06, 2009 14:15

Don't worry, I support FB-related projects when I can, and you seem to be doing a good job with it, so I'll be happy to provide some hosting.

contact me: albeva [at] me . com

Post by **counting_pine** » Jan 07, 2009 18:00

Just to let everyone know: I've recently added a patch to SVN with the recent -fpmode additions.
We're happy for anything brybry does to become part of the official FBC code when he's satisfied it's ready. We don't want him to have to fork/maintain his own separate version.