Compiled code runs anomalously slowly on 5000 series Ryzen CPUs

Post by nddulac »

I like to use one of my programs to benchmark computers. Basically, the program uses Gauss-Jordan elimination to solve about 2300 simultaneous linear equations, i.e. a lot of floating-point arithmetic. I compare execution time to Geekbench 5 single-core benchmark ratings, and have discovered bizarre results for my 5000 series Ryzen chips: they are slow at running this particular program! Below is a graph of 1/(FTS execution time) as a function of Geekbench single-core scores:
Image
I expect to see a fairly linear relationship. However, the data points for the Ryzen 5000 chips (5300G, 5600G, and 5950X in various rigs) fall well below the line quite consistently. The trouble here is that there is nothing scientific about this graph. While it includes data from a number of different AMD and Intel CPU systems, they are from different rigs with different memory/motherboard configurations. In an attempt to tidy up the comparison, I picked five CPUs (Athlon 300GE, Ryzen 3 2200G, Ryzen 5 3400G, Ryzen 5 Pro 4650G, and Ryzen 3 5300G) to test in the same B450 Motherboard, with the same memory (clocked to the maximum the CPU would allow, so unfortunately, that was not the same for every CPU.) The results there are shown below.
Image
I compiled the code on my laptop (Intel Core i5-10210U, with 20GB of DDR4 SODIMM RAM) using FreeBASIC Compiler - Version 1.07.1 (2019-09-27), built for win32 (32bit). I use the same executable and data set for all of my benchmarking tests.

I find the results both interesting and disappointing. The Ryzen 5000 series chips are impressive by all measures I have seen, except in running my program! I don't know if this is because the chips don't work/play well with FreeBASIC compiled code or if they are just anomalously slow at floating point math. At any rate, these results have caused me to keep using my Ryzen 5 3600 rig as my daily driver and not move to my newer Ryzen 9 5950X build!
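
For readers who want a feel for the workload, here is a minimal sketch of Gauss-Jordan elimination with partial pivoting. It is written in Python rather than FreeBASIC and is not the benchmark program discussed in this thread; the function name `gauss_jordan_solve` is made up for illustration, and whether the real program pivots this way is an assumption. The work grows roughly as n^3, so about 2300 equations means on the order of 10^10 floating-point operations.

```python
def gauss_jordan_solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting.

    a is a list of n rows of n numbers, b is a list of n numbers.
    The inputs are copied, not modified.
    """
    n = len(a)
    # Build the augmented matrix [a | b] as a fresh copy.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]

    for col in range(n):
        # Partial pivoting: use the row with the largest |value| in this column.
        pivot_row = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot_row][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot_row] = m[pivot_row], m[col]

        # Scale the pivot row so that the pivot element becomes 1.
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]

        # Eliminate this column from every other row, above and below.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                if factor != 0.0:
                    m[r] = [rv - factor * cv for rv, cv in zip(m[r], m[col])]

    # The last column of the reduced matrix is the solution vector.
    return [row[n] for row in m]
```

Calling `gauss_jordan_solve([[2, 1], [1, 3]], [3, 5])` solves the system 2x + y = 3, x + 3y = 5. The inner loop is almost entirely multiply and subtract operations, plus one division per element of each pivot row.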

Post by MrSwiss »

If you run 32 bit FBC on a 64 bit platform (64 bit CPU/FPU and 64 bit OS) then you're likely
seeing more of the WoW (emulator for 32 bit programs) than the real performance.

For tests to be reliable, optimal HW / OS / Program compatibility is a must (same bitness preferably).
Since I'm assuming the HW to be 64 bit, I'd recommend using FBC 64 too.

Latest official FBC release is currently: 1.08.1 (GCC ver. 9.3 WinLibs, for WIN)

Post by nddulac »

MrSwiss wrote:For tests to be reliable, optimal HW / OS / Program compatibility is a must (same bitness preferably).
Since I'm assuming the HW to be 64 bit, I'd recommend using FBC 64 too.
I'm sure there are many differences from CPU to CPU (as well as other rig eccentricities) that will limit the value of comparing runtimes on a single program. However, my assumption (which is certainly open to criticism and debate) is that those differences will affect the Geekbench scores also.

That said, I have tried 64-bit FreeBASIC compilers over the years but rejected them as they produce very slow code in my applications. (Plus, 32-bit floating point numbers are already overkill for this application.) For example, the code I used in these tests takes about 60-75 seconds to run on most reasonable rigs (as much as 5-6 minutes on the slower Celeron rigs). However, compiling with 64-bit FreeBASIC increases the runtime on the same application/data to more like 11 minutes (on a rig for which the 32-bit version ran in under 70 seconds). So I have always stuck with the 32-bit versions.

Post by MrSwiss »

nddulac wrote:I have tried 64-bit FreeBASIC compilers over the years but rejected them as they produce very slow code in my applications. (Plus, 32-bit floating point numbers are already overkill for this application.)
There are no floating point types smaller than Single (binary32) in FBC 32/64.

The slow code might have been due to the 'stone-age' version of GCC that older FBC 64 releases were using: 5.2 ...
Also, replacing Integer (64 bit in FBC 64) with Long in the code might help
(it reduces the size of the program and increases cache throughput, aka: efficiency).
In FBC 64, make use of the GCC optimizer options (I'd recommend "-gen gcc -O 2").

Post by jj2007 »

MrSwiss wrote:If you run 32 bit FBC on a 64 bit platform (64 bit CPU/FPU and 64 bit OS) then you're likely seeing more of the WoW (emulator for 32 bit programs) than the real performance.
WoW is a thin layer that redirects Win32 API calls to the 64-bit API. No chance to see any performance hit due to WoW.

Post by coderJeff »

32-bit versus 64-bit is most likely a difference between fbc's assembler and gcc backends. Default on 32-bit is -gen gas backend and default on 64-bit is -gen gcc backend. For example, if the benchmark program has a 'gosub' statement, it is horribly slow with gcc compared to straight assembly.

For the CPU timings, I don't know. My guess would be to investigate the memory specs and timings. I feel like I had this problem once, where the memory was compatible with the CPU but not the best match.

Post by dodicat »

Gauss-Jordan Elimination can soon go out of double range with large matrices, especially if you pivot at each turn, because you are using the largest absolute values in the computations.

Is your answer vector intact, i.e. no -1.#IND results?
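
A quick way to answer that question is to scan the solution vector for values that are not finite. The sketch below is in Python rather than FreeBASIC, and the helper name `non_finite_indices` is made up for illustration. On Windows, `-1.#IND` is how the C runtime prints an indeterminate NaN, so a NaN check covers that case, and infinities are caught as well.

```python
import math


def non_finite_indices(values):
    """Return the indices of entries that are NaN or +/- infinity."""
    return [i for i, v in enumerate(values) if not math.isfinite(v)]
```

An empty list means every entry of the answer vector is an ordinary finite number.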

Post by nddulac »

dodicat wrote:... Is your answer vector intact, i.e. no -1.#IND results?
If this was the issue, it would be a product of the data and the implementation of the algorithm, and would not happen on some cpus and not others.

Both the data and the algorithm are fine. The only issue is the long run times on newer (5000 series) Ryzen processors.

Post by jj2007 »

nddulac wrote:Both the data and the algorithm are fine. The only issue is the long run times on newer (5000 series) Ryzen processors.
This remains a mystery, of course. What I could imagine is that a computation yields an error on the FPU (overflow, division by zero, etc). I have seen huge differences in the handling of FPU errors between different CPUs.

Post by marcov »

nddulac wrote:I like to use one of my programs to benchmark computers. Basically, the program uses Gauss-Jordan ... While it includes data from a number of different AMD and Intel CPU systems, they are from different rigs with different memory/motherboard configurations.
But did you run the tests with exactly the same binary?

Post by TeeEmCee »

The occurrence of FPU errors (divide by zero, etc.) is not the only factor in how long FPU instructions take to execute. Denormal numbers can be significantly slower to operate on, but even excluding them, the particular values you are operating on can make a big difference for some instructions in some microarchitectures (μarchs), while in other μarchs it doesn't matter (though I thought that modern μarchs tend to be more consistent compared to old ones). Division and transcendental functions seem to be the most likely to have variable latencies. I know nothing about the μarch of any of the Ryzens, though.
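
To see whether denormals are even in play, one could dump intermediate values and count how many fall in the denormal (subnormal) range. The sketch below is in Python, and the names `is_subnormal` and `count_subnormals` are made up for illustration. Whether the benchmark actually produces such values is not known. Python floats are doubles, so the Single threshold is passed in explicitly to model values that would be stored as binary32.

```python
import sys

# Smallest positive normal values for the two IEEE 754 formats.
SMALLEST_NORMAL_DOUBLE = sys.float_info.min  # 2**-1022, about 2.2e-308
SMALLEST_NORMAL_SINGLE = 2.0 ** -126         # about 1.18e-38


def is_subnormal(x, smallest_normal=SMALLEST_NORMAL_DOUBLE):
    """True if x is nonzero but smaller in magnitude than the smallest normal."""
    return 0.0 < abs(x) < smallest_normal


def count_subnormals(values, smallest_normal=SMALLEST_NORMAL_DOUBLE):
    """Count the entries of values that fall in the subnormal range."""
    return sum(1 for v in values if is_subnormal(v, smallest_normal))
```

If the count is zero for the whole run, denormal slowdowns can be ruled out. If it is large, that would be worth comparing against the CPUs that run the program slowly.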

Anyway that's all an aside and probably not relevant. You could try profiling your code (using random samples, e.g. gprof) and comparing the sample distribution on different CPUs to find unusually slow instructions.

Also, it's bizarre that 64-bit builds are so slow and that -gen gcc is slower than -gen gas. You should see the opposite, assuming you compiled with optimisation on, unless you're using gosub or 64-bit ptrs/ints lead to running out of cache. Again, you could try profiling and looking at the assembly.

Post by marcov »

Oh, and don't forget to set all machines' power settings to "high performance". That has bitten me sometimes too when benchmarking.

Post by nddulac »

marcov wrote:
nddulac wrote:I like to use one of my programs to benchmark computers. Basically, the program uses Gauss-Jordan ... While it includes data from a number of different AMD and Intel CPU systems, they are from different rigs with different memory/motherboard configurations.
But did you run the tests with exactly the same binary?
Yes. Same binary. I even ran on the same test bench where the only hardware that changed was the cpu. (I had to adjust the memory speed as well, but always used the maximum speed supported out of the box for the given cpu.) The data for these tests are shown in the second graph. The CPUs (APUs, I guess) were Athlon 300GE, Ryzen 3 2200G, Ryzen 5 3400G, Ryzen 5 Pro 4650G, and Ryzen 3 5300G.

One of the things about which I am curious is whether or not this has any relationship to the new Windows 11 scheduler issues with Ryzen CPUs. I have yet to test (and have not seen data, but haven't really looked) to see if the issue disproportionately affects 5000 series CPUs, but I would guess that if it did, I would have come across someone pointing it out!

Post by marcov »

nddulac wrote:One of the things about which I am curious is whether or not this has any relationship to the new Windows 11 scheduler issues with Ryzen CPUs. I have yet to test (and have not seen data, but haven't really looked) to see if the issue disproportionately affects 5000 series CPUs, but I would guess that if it did, I would have come across someone pointing it out!
Windows 11 has a known issue with Ryzens that increases L3 latency threefold. A fix was released last week, but it might not be in your release channel yet.

Post by nddulac »

I just built my first 12th generation Intel (i5-12400) box yesterday . . . and the same thing happened! My 9th and 10th gen i5 computers will run my compiled code in about 50 seconds, but the 12th gen takes 130+ seconds. Geekbench measures these 5000 series Ryzens and the i5-12400 as plenty fast, but they run my math-heavy compiled FreeBASIC code disappointingly slowly.

I am currently running more scientifically controlled tests with several generations of Ryzen (and Athlon) APUs, and will post when I get through my last few CPUs. But the effect is real and reproducible.
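
For anyone repeating this kind of comparison, one common approach is to run the workload several times per machine and report the spread rather than a single number. The sketch below is in Python, and the names `time_runs` and `summarize` are made up for illustration. It is not the procedure used for the tests in this thread.

```python
import statistics
import time


def time_runs(workload, repeats=5):
    """Call workload() `repeats` times; return wall-clock seconds for each run."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to its minimum, median, and maximum."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }
```

For example, `summarize(time_runs(lambda: sum(range(1_000_000))))` times a small stand-in workload. A large gap between the minimum and the maximum on one machine suggests interference from power management, background tasks, or the scheduler.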