Please test new FBC features

brybry
Project Member
Posts: 69
Joined: Aug 27, 2005 14:43


I've added some new features to the compiler, and I need people to test them. I've added automatic vectorization, a sin() and a cos() approximation, and two reciprocal optimizations.

Automatic Vectorization

If the compiler can determine that the same operation is applied to two or more variables in contiguous memory locations, it can merge them into one vector operation. Currently, only floating-point operations are vectorized. Singles can be vectorized up to 4 wide, while doubles can be only 2 wide. If a double appears in the expression, the maximum width is 2 even for operations on singles. Only add, subtract, multiply, and divide can be vectorized. If the same value is used in every merged operation, it is swizzled (copied into each vector lane), as x[2] is here:

Code:

Dim As Single Ptr x

x[0] = x[0] * x[2]
x[1] = x[1] * x[2]
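
For anyone testing this case, here is a minimal sketch of a self-checking version of the snippet above. The helper name ScalePair and the test values are my own, not from the modified compiler, and whether the vectorizer actually merges these two statements can only be confirmed by inspecting the generated assembly.

```freebasic
'' Hypothetical helper (not from the post): the two statements that
'' complete expression merging is described as combining.
Sub ScalePair(ByVal x As Single Ptr)
    x[0] = x[0] * x[2]
    x[1] = x[1] * x[2]
End Sub

Dim As Single v(0 To 2) = {1.5, -2.0, 4.0}
ScalePair(@v(0))

'' 1.5 * 4 and -2 * 4 are exact in single precision,
'' so a vectorized build must produce the same values.
If v(0) = 6.0 AndAlso v(1) = -8.0 Then
    Print "OK"
Else
    Print "MISMATCH:"; v(0); v(1)
End If
```

Compiling it once with "-fpu SSE -vec 0" and once with "-fpu SSE -vec 1" and comparing the results is one way to check that vectorization does not change behavior.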

Other thoughts:

The vectorization is still highly experimental, but it catches a lot of cases. It does not (yet) look at loops, as most other vectorizing compilers do. The vectorizer misses certain cases where arrays are involved. Vector widths of 2 and 4 are the fastest, so consider adding a 4th component to standard 3-component vectors.

There are two vectorization modes (so far): complete expression merging and intra-node merging. The example above is complete expression merging. Intra-node merging is for vectorization that is possible within a single expression:

Code:

Dim As Single Ptr x, y

x[2] = x[0] * y[0] + x[1] * y[1]

Intra-node merging still needs a lot of work, and in many cases will probably slow down your code.

To enable vectorization, use the command-line option "-vec n", where n is the level of vectorization:

0 - no vectorization (the default)
1 - complete expression merging
2 - intra-node merging

Vectorization requires the SSE fpu mode.

Reciprocal optimizations

If the FPU mode is set to SSE, then standard reciprocal and reciprocal square root (RSQRT) calculations can be optimized. This will only occur on single-precision numbers, not doubles. The emitter will use the SSE instructions RCPSS and RSQRTSS for high-speed approximations of reciprocals.

Code:

Dim As Single x

'' plain reciprocal
x = 1.0f / x

'' reciprocal square root
x = 1.0f / Sqr(x)

RSQRT is useful for vector normalization.
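
To judge how much accuracy FAST mode gives up, a rough sketch like the following may help. It compares the single-precision 1/Sqr(x) against a double-precision reference. The function name RsqrtRelError and the sampled range are my own choices. Intel documents RCPSS and RSQRTSS as having a maximum relative error of 1.5 * 2^-12 (about 3.7e-4); whether the modified compiler refines the raw result further is not stated in this thread.

```freebasic
'' Hypothetical helper: relative error of the single-precision
'' reciprocal square root against a double-precision reference.
Function RsqrtRelError(ByVal x As Single) As Double
    Dim As Single approx = 1.0f / Sqr(x)
    Dim As Double exact = 1.0 / Sqr(CDbl(x))
    Return Abs(approx - exact) / exact
End Function

Dim As Double worst = 0
For i As Integer = 1 To 1000
    Dim As Double e = RsqrtRelError(i * 0.25f)
    If e > worst Then worst = e
Next
Print "max relative error:"; worst
```

Building it with "-fpu SSE -fpmode PRECISE" and then with "-fpu SSE -fpmode FAST" should show the difference between the two modes, assuming the pattern in RsqrtRelError is one the optimizer recognizes.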

These optimizations are controlled with the command-line option "-fpmode FAST | PRECISE". FAST enables the optimizations and PRECISE (the default) disables them.

Sin()/Cos() approximations

When the command-line option "-fpmode FAST" is used and the FPU mode is set to SSE, sin() and cos() are approximated using SSE and general integer instructions. These are also only for single-precision numbers.
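
A simple way to test the approximations is to measure their worst-case error against the double-precision functions, which are not approximated according to the description above. This is only a sketch: the function name MaxSinCosError, the range [-pi, pi], and the step count are assumptions of mine, and the thread states no error bound for the approximations.

```freebasic
Const PI As Double = 3.141592653589793

'' Hypothetical helper: largest absolute difference between the
'' single-precision Sin/Cos and a double-precision reference.
Function MaxSinCosError(ByVal steps As Integer) As Double
    Dim As Double worst = 0
    For i As Integer = 0 To steps
        Dim As Single a = -PI + (2 * PI * i) / steps
        Dim As Single s = Sin(a)
        Dim As Single c = Cos(a)
        Dim As Double es = Abs(s - Sin(CDbl(a)))
        Dim As Double ec = Abs(c - Cos(CDbl(a)))
        If es > worst Then worst = es
        If ec > worst Then worst = ec
    Next
    Return worst
End Function

Print "max abs error over [-pi, pi]:"; MaxSinCosError(10000)
```

It would also be worth trying angles well outside [-pi, pi], since approximations of this kind often depend on range reduction.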

The reciprocal optimizations and sin()/cos() approximations should work as expected, but please test them anyway. I want to get them added to the compiler ASAP. The vectorization needs a lot more testing. Please use the modified compiler to compile your projects and see if:

1. your project still runs as expected.
2. all optimizable cases get optimized.

I only have the Windows version of the compiler so far. I will try to get a Linux version of the modified compiler available soon, if people want it. I haven't ever tried cross-compiling so I'll work on that. Please post problems to this topic, and I'll try to fix them.

This is a modified version of 0.21.0 from SVN.

There is a 4.2MB hourly transfer limit on this (not imposed by me), so if it doesn't work... try later.

http://www.geocities.com/bryan.js00/fbc_mod_win.zip
VonGodric
Posts: 997
Joined: May 27, 2005 9:06
Location: London
Good work. I will do some tests. Good to see initiative!

Just in case I uploaded this to my server as well: http://fbdevzone.com/downloads/fbc_mod_win.zip
McLovin
Posts: 82
Joined: Oct 21, 2008 1:15
I also applaud your initiative! Excellent!
nobozoz
Posts: 238
Joined: Nov 17, 2005 6:24
Location: Chino Hills, CA, USA
I'll be running some of the benchmarks from the "shootout" at shootout.alioth.debian.org using your fbc version. My system is WinXP Pro on a Dell Inspiron 2650 with a Celeron cpu.

Using these options for now...
fbc -s console -arch 686 -fpmode FAST -fpu SSE -w pedantic -version -v

Jim
counting_pine
Posts: 6225
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs
Nice work Bryan, it would be great to see the code for all this some time.
Note: at this point though, the options seem to crash the compiler if "-fpu sse" is not set...
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Fix the compiler!!! :D :D
brybry
Project Member
Posts: 69
Joined: Aug 27, 2005 14:43
Updated compiler uploaded. Any mirrored links may or may not be updated...

I'm glad people like these new features.

Sorry about the crashing. I had added a check after all the command-line options get parsed, to throw an error if either vectorization or fast math (or both) were enabled without the SSE fpu mode. I'm sure I tested it, but who knows what happened. It should work now.

I've also discovered an issue: if the first expression can be vectorized on its own (intra-node) and intra-node vectorization is enabled, it will be, and it can then no longer be merged with the next expression.

Arrays should work in more cases now.

Another thing is that, currently, if variables are used to index arrays and pointers, they will prevent vectorization. I should be able to fix this soon.

I hate to say this, but do not expect the vectorization to give a huge (or any) increase in speed. I have to always assume unaligned accesses (which are slow), swizzling is kinda slow, and writing 3-component vectors to memory requires 3 instructions (slow). Like I said, try to use 4 component vectors. When/If aligning variables gets added to FB, the vectorization should be able to utilize that for faster accesses.
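
As a sketch of the 4-component advice: the routine below stores a 3D vector in four contiguous singles, with the 4th element as unused padding, so the add is four identical operations on adjacent memory. The name Vec4Add and the layout are illustrative only. It uses constant pointer indices because variable indices currently prevent vectorization, as noted above.

```freebasic
'' Hypothetical example: x, y, z plus one padding element, so each
'' vector is exactly 4 singles wide.
Sub Vec4Add(ByVal r As Single Ptr, ByVal a As Single Ptr, ByVal b As Single Ptr)
    r[0] = a[0] + b[0]
    r[1] = a[1] + b[1]
    r[2] = a[2] + b[2]
    r[3] = a[3] + b[3]   '' padding lane, result is not used
End Sub

Dim As Single a(0 To 3) = {1, 2, 3, 0}
Dim As Single b(0 To 3) = {10, 20, 30, 0}
Dim As Single r(0 To 3)

Vec4Add(@r(0), @a(0), @b(0))
Print r(0); r(1); r(2)
```

The cost is 4 extra bytes per vector; whether that pays off depends on how often the vectors are written back to memory.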

Plus, remember this is very experimental and still very new. My main goal is to create a vectorizer; making it as fast as possible will be the next step.

@counting_pine: I want to have the fast math optimizations added to the compiler ASAP. I think they work fine. I want to work on the vectorization more before releasing the code.
VonGodric
Posts: 997
Joined: May 27, 2005 9:06
Location: London
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA
brybry wrote:
When/If aligning variables gets added to FB, the vectorization should be able to utilize that for faster accesses.

Perhaps I don’t understand your statement, but by my interpretation it’s been possible at least since 0.16, and maybe much further back than that.

Code:

function Alignment( byval addr as any ptr ) as integer

    '' Returns the largest power of two that divides addr
    '' (0 if addr is 0).
    '' The reference to local label 1 has an extra f
    '' appended to the end to correct for a problem
    '' in FBC that causes it to ignore the first f.
    '' The Nb references work as they should.

    asm
        xor eax, eax
        mov ecx, [addr]
        bsf ecx, ecx
        jz 1ff
        mov eax, 1
        shl eax, cl
      1:
        mov [function], eax
    end asm

end function

extern as single s1, s2, s3, s4
asm
.data
.balign 16
s1: .single 1.2345
s2: .single 12.345
s3: .single 123.45
.balign 16
s4: .single 1234.5
.text
end asm

print s1,s2,s3,s4
print alignment( @s1 ), alignment( @s2 ),
print alignment( @s3 ), alignment( @s4 )
sleep

Output:

1.2345        12.345        123.45        1234.5
16            4             8             32
brybry
Project Member
Posts: 69
Joined: Aug 27, 2005 14:43
You are correct. Alignment is possible, but it is being done by the programmer, not the compiler. The compiler has no knowledge of that type of alignment and therefore cannot exploit it.
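
For comparison, programmer-managed alignment can also be done without inline assembly. The sketch below is only an illustration: the names AlignmentOf and AlignUp16 are hypothetical, and it assumes UInteger is at least as wide as a pointer on the target. It over-allocates by 15 bytes and rounds the pointer up to the next 16-byte boundary. The compiler still has no knowledge of the alignment here, so it would have to keep using the unaligned instructions.

```freebasic
'' Hypothetical helper: largest power of two dividing the address
'' (0 for a null pointer). "a And -a" isolates the lowest set bit.
Function AlignmentOf(ByVal addr As Any Ptr) As UInteger
    Dim As UInteger a = Cast(UInteger, addr)
    Return a And ((Not a) + 1)
End Function

'' Hypothetical helper: round a pointer up to a 16-byte boundary.
Function AlignUp16(ByVal p As Any Ptr) As Any Ptr
    Dim As UInteger mask = 15
    Return Cast(Any Ptr, (Cast(UInteger, p) + mask) And (Not mask))
End Function

'' Room for 4 singles plus up to 15 bytes of slack
Dim As Any Ptr raw = Allocate(4 * SizeOf(Single) + 15)
Dim As Single Ptr v = AlignUp16(raw)

If AlignmentOf(v) >= 16 Then Print "v is 16-byte aligned"

'' Always free the original pointer, not the adjusted one
Deallocate(raw)
```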

There is a feature request asking for keywords to select alignment of variables. The emitter can be given the alignment of the variables and use the faster instructions.

I'm still working on the vectorization. I recently improved support for pointer casting. It will catch more possible cases. I will upload a new version in a day or two.
brybry
Project Member
Posts: 69
Joined: Aug 27, 2005 14:43
I just uploaded a new version of the compiler. It should work much better with pointer casting.
VonGodric
Posts: 997
Joined: May 27, 2005 9:06
Location: London
brybry, do you need some hosting? I could give you some web space for FB-related stuff that doesn't impose ridiculous bandwidth limits.
brybry
Project Member
Posts: 69
Joined: Aug 27, 2005 14:43
@VonGodric:

umm, maybe...

I would like something better than Yahoo! Geocities, but I'm between jobs right now, and I don't know if you were going to charge me, but I cannot afford to spend anything. Even then, I have very little out there right now. I think few people are downloading the modified FBC I have posted, and so the bandwidth limits aren't a problem.

I also have this site for my 3D graphics/game engine that I've been working on for the past years.

http://www.geocities.com/ragtagsoftware/

This is a FreeBASIC project. Again, I doubt the bandwidth limit is a problem.

I'll think about it. Thanks for the offer.
VonGodric
Posts: 997
Joined: May 27, 2005 9:06
Location: London
Don't worry, I support FB-related projects when I can, and you seem to be doing a good job with it, so I'll be happy to provide some hosting.

contact me: albeva [at] me . com
counting_pine