Automatic Vectorization
If the compiler can determine that the same operation is applied to two or more variables in contiguous memory locations, it can merge them into a single vector operation. Currently, only floating-point operations are vectorized, and only add, subtract, multiply, and divide. Singles can be vectorized up to 4 wide, doubles only 2 wide. If a double appears anywhere in the expression, the maximum width is 2, even for the operations on singles. If the same value is used in every one of the merged operations (x[2] below), it is swizzled:
Code:
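Dim As Single Ptr x   '' assumed to point at 3 or more valid Singles
'' both products use x[2], so x[2] is swizzled across the vector lanes
x[0] = x[0] * x[2]
x[1] = x[1] * x[2]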
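The same pattern extended to the full width for singles might look like the sketch below. It is only an illustration: the array `v` and its values are made up, and whether the vectorizer actually merges these four statements has to be checked in the generated assembly.

```freebasic
'' Sketch: four contiguous Singles scaled by a shared value (x[4]).
'' v and its contents are hypothetical test data.
Dim As Single v(0 To 4) = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f}
Dim As Single Ptr x = @v(0)

'' Same operation on x[0] .. x[3], which are contiguous in memory,
'' so this is a candidate for one 4-wide multiply with x[4] swizzled.
x[0] = x[0] * x[4]
x[1] = x[1] * x[4]
x[2] = x[2] * x[4]
x[3] = x[3] * x[4]

'' The values are 5, 10, 15, 20 with or without vectorization.
Print x[0]; x[1]; x[2]; x[3]
```

Because the arithmetic is exact here, comparing the printed values between a normal build and a vectorized build is a quick sanity check that the merge did not change the results.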
The vectorization is still highly experimental, but it catches a lot of cases. It does not (yet) look at loops the way most other vectorizing compilers do, and it misses certain cases where arrays are involved. Vector widths of 2 and 4 are the fastest, so consider adding a 4th component to standard 3-component vectors; the padded version will be faster.
There are two vectorization modes (so far): complete expression merging and intra-node merging. The example above is complete expression merging. Intra-node merging covers vectorization that is possible within a single expression:
Code:
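Dim As Single Ptr x, y   '' both assumed to point at valid Singles
'' the two products are independent, so they are candidates for one packed multiply
x[2] = x[0] * y[0] + x[1] * y[1]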
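On the advice about padding 3-component vectors to 4 components, a minimal sketch could look like this. The type name `Vec4` and its field names are hypothetical, and it is an assumption that the vectorizer treats fields of a Type the same way as the pointer-indexed Singles in the examples above.

```freebasic
'' Hypothetical 3D vector type padded to four Singles (16 bytes).
Type Vec4
    As Single x, y, z, w   '' w is padding only; keep it at 0
End Type

Dim As Vec4 a = (1.0f, 2.0f, 3.0f, 0.0f)
Dim As Vec4 b = (4.0f, 5.0f, 6.0f, 0.0f)
Dim As Vec4 r

'' Four identical operations on contiguous fields: a 4-wide candidate.
'' Without the w line only three operations would be available.
r.x = a.x + b.x
r.y = a.y + b.y
r.z = a.z + b.z
r.w = a.w + b.w

Print r.x; r.y; r.z; r.w
```

The extra addition on `w` costs nothing if all four are merged into one packed add, which is the point of the padding.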
To enable vectorization, use the command-line option "-vec n", where n is the level of vectorization:
0 - no vectorization (the default)
1 - complete expression merging
2 - intra-node merging
Vectorization requires the SSE FPU mode.
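Putting the option and the FPU requirement together, a small test program for intra-node merging might look like the following. The file name `demo.bas` and the test data are hypothetical, and the spelling "-fpu SSE" for selecting the SSE FPU mode is an assumption; only "-vec n" is taken from this post.

```freebasic
'' Build sketch (demo.bas is a hypothetical file name, and "-fpu SSE"
'' is an assumed spelling of the SSE FPU mode switch):
''   fbc -fpu SSE -vec 2 demo.bas

Dim As Single xs(0 To 2) = {3.0f, 4.0f, 0.0f}
Dim As Single ys(0 To 1) = {2.0f, 5.0f}
Dim As Single Ptr x = @xs(0), y = @ys(0)

'' Two independent multiplies inside one expression.
x[2] = x[0] * y[0] + x[1] * y[1]

Print x[2]   '' 3*2 + 4*5 = 26
```

Building the same file once with "-vec 0" and once with "-vec 2" and comparing the output is a simple way to confirm that the program still runs as expected.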
Reciprocal optimizations
If the FPU mode is set to SSE, then standard reciprocal and reciprocal square root (RSQRT) calculations can be optimized. This will only occur on single-precision numbers, not doubles. The emitter will use the SSE instructions RCPSS and RSQRTSS for high-speed approximations of reciprocals.
Code:
Dim As Single x
'' plain reciprocal
x = 1.0f / x
'' reciprocal square root
x = 1.0f / Sqr(x)
These optimizations are controlled with the command-line option "-fpmode FAST | PRECISE". FAST enables the optimizations; PRECISE, the default, disables them.
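Since FAST trades accuracy for speed, it can help to measure the error on your own data. The sketch below compares the Single results against a Double reference; the helper `RelErr` and the test value are hypothetical. For orientation, the RCPSS and RSQRTSS instructions themselves are documented with a maximum relative error of 1.5 * 2^-12, but whether the emitter refines the result afterwards is not stated here, so the sketch makes no claim about the exact numbers you will see.

```freebasic
'' Relative error of a Single result against a Double reference.
'' RelErr is a hypothetical helper, not part of the compiler.
Function RelErr(ByVal approx As Single, ByVal exact As Double) As Double
    Return Abs((approx - exact) / exact)
End Function

Dim As Single x = 3.0f
Dim As Single r, rs

r  = 1.0f / x          '' candidate for RCPSS under -fpmode FAST
rs = 1.0f / Sqr(x)     '' candidate for RSQRTSS under -fpmode FAST

'' The references are computed in Double, which is never optimized this way.
Print "1/x      relative error:"; RelErr(r,  1.0 / CDbl(x))
Print "1/Sqr(x) relative error:"; RelErr(rs, 1.0 / Sqr(CDbl(x)))
```

Running this once with PRECISE and once with FAST shows how much precision the approximation costs for the values you care about.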
Sin()/Cos() approximations
When the option "-fpmode FAST" is used and the FPU mode is set to SSE, sin() and cos() are approximated using SSE and general integer instructions. As with the reciprocal optimizations, this applies only to single-precision numbers.
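A quick way to see how close the approximation is would be to scan a range of angles and record the largest difference from a Double reference. This is only a sketch: the range, the step count, and the names `PI_D`, `N_STEPS` and `maxErr` are arbitrary choices, and it assumes that a Sin() call with a Single argument assigned to a Single is the form that gets approximated.

```freebasic
'' Largest absolute difference between single-precision Sin() and a
'' Double reference over [-Pi, Pi]. All names here are hypothetical.
Const PI_D As Double = 3.141592653589793
Const N_STEPS As Integer = 1000

Dim As Double maxErr = 0
For i As Integer = 0 To N_STEPS
    Dim As Single a = -PI_D + (2 * PI_D * i) / N_STEPS
    Dim As Single s = Sin(a)                   '' single-precision path
    Dim As Double e = Abs(s - Sin(CDbl(a)))    '' Double reference
    If e > maxErr Then maxErr = e
Next

Print "max |error| of Sin over [-Pi, Pi]:"; maxErr
```

The same loop with Cos() in place of Sin() covers the other function, and widening the range shows whether the error grows for large angles.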
The reciprocal optimizations and sin()/cos() approximations should work as expected, but please test them anyway. I want to get them added to the compiler ASAP. The vectorization needs a lot more testing. Please use the modified compiler to compile your projects and see if:
1. your project still runs as expected.
2. all optimizable cases get optimized.
I only have the Windows version of the compiler so far. I will try to get a Linux version of the modified compiler available soon, if people want it. I haven't ever tried cross-compiling so I'll work on that. Please post problems to this topic, and I'll try to fix them.
This is a modified version of 0.21.0 from SVN.
There is a 4.2MB hourly transfer limit on this (not imposed by me), so if it doesn't work... try later.
http://www.geocities.com/bryan.js00/fbc_mod_win.zip