Compiled code runs anomalously slowly on 5000 series Ryzen CPUs


Post by Provoni »

What are your compiler options? Are you using 64-bit Fb 1.09?

Post by nddulac »

Provoni wrote: Mar 13, 2023 6:39 What are your compiler options? Are you using 64-bit Fb 1.09?
This isn't an issue with compiler options. I am comparing the same compiled exe on different hardware platforms. The execution time should scale inversely with the single-core CPU rating. The issue is that this relationship holds for pre-5000 series Ryzen chips and for all of the Intel chips on which I have run this benchmark, but the code runs slowly on the Ryzen 5000 chips and now also on the i5-12400.
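
The inverse-scaling expectation described above can be written down as a small check. This is only a sketch: the function names are hypothetical, and the numbers in the demo are placeholders rather than measurements from this thread.

```freebasic
' Sketch only: function names and demo numbers are hypothetical placeholders.

' Runtime expected on a CPU if time scales inversely with its single-core score.
Function PredictedSeconds(ByVal refSeconds As Double, ByVal refScore As Double, ByVal score As Double) As Double
    Return refSeconds * refScore / score
End Function

' Ratio of measured to expected runtime; values well above 1 flag an anomaly.
Function SlowdownFactor(ByVal measured As Double, ByVal expected As Double) As Double
    Return measured / expected
End Function

' Placeholder inputs: reference CPU takes 60 s with score 1000, test CPU scores 1500.
Dim As Double expectedTime = PredictedSeconds(60.0, 1000.0, 1500.0)
Print "expected:"; expectedTime; " s"
' Placeholder measurement of 120 s on the test CPU.
Print "slowdown factor:"; SlowdownFactor(120.0, expectedTime)
```

A factor close to 1 means the CPU behaves as its single-core score predicts; the anomaly reported here would show up as a factor well above 1 on the affected chips.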

I am almost through testing ~20 Athlon and Ryzen APUs on a test bench where the only things I am changing are the CPUs (same motherboard, same memory always clocked to 3200 megatransfers/second, same NVMe drive, etc.). I have even run these tests on both Windows and Linux. (The compiled code runs much faster under Windows!) The software is all the same as well. As I said, the effect is real and reproducible. Interestingly, I don't see the anomalous slowdown when running other math-heavy applications. To test this, I am running a molecular modeling job using the Gaussian package of programs. (Gaussian runs about twice as fast on Linux as on Windows.)

All that said, I compiled my code using version 1.08.1 with the "-lang qb" flag. I always use the 32-bit compiler, since 64-bit compiled code runs about 10 times slower. But once again, since I am not recompiling for each platform, the effect is due to the change in hardware, not to how the code was compiled (unless, of course, there are some CPU-specific flags about which I do not know).

Post by shadow008 »

I've recently done a build with a Ryzen 9 7900. One of the biggest issues I've found is with Windows 11's scheduling on the multi-CCD CPUs (which would include your 5950X). Microsoft claims to have fixed this for the R9 5000 series chips, but I don't believe them for one second, as a friend of mine has had this issue since the CPUs were released. For some reason, the OS will schedule a process on a core on one CCD, then move it to a core on another CCD. This causes a -very- significant loss in performance. Consider reading up on "Ryzen 9 inter-ccd latency". My suggestion is to use Ryzen Master and run your computer in Game Mode (requires reboot). This will disable one CCD, leaving only a single chiplet active. Then re-run your test bench.

I have little hope that would actually work as you're seeing it on an i5 as well, but it's worth a shot?

If you'd like, you can send me your benchmark binary and I can run it on an r9 7900 and send you the specs + results.
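
A different way to rule out migration between cores, without rebooting into Game Mode, is to pin the benchmark process to a single logical CPU before the timed section starts. The sketch below is an assumption-laden illustration rather than anything used in this thread: PinToFirstCore is a hypothetical helper, it relies on the Win32 calls SetProcessAffinityMask and GetCurrentProcess from windows.bi, and on non-Windows targets it does nothing.

```freebasic
' Sketch only: PinToFirstCore is a hypothetical helper, not part of FreeBASIC.
#ifdef __FB_WIN32__
#include once "windows.bi"
#endif

' Restrict the current process to logical CPU 0 (affinity mask bit 0).
' Returns True on success; on non-Windows targets it returns False.
Function PinToFirstCore() As Boolean
#ifdef __FB_WIN32__
    Return CBool(SetProcessAffinityMask(GetCurrentProcess(), 1) <> 0)
#else
    Return False
#endif
End Function

If PinToFirstCore() Then
    Print "Pinned to logical CPU 0"
Else
    Print "Affinity not changed"
End If
' ... run the timed benchmark code here ...
```

Pinning is not the same thing as disabling a CCD, but if the slowdown disappears with the process pinned, cross-CCD scheduling becomes a more likely suspect. On Linux a comparable experiment is to launch the executable through taskset with a single CPU selected.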

Post by Provoni »

nddulac wrote: Mar 13, 2023 17:10 All that said, I compiled my code using version 1.08.1 with the "-lang qb" flag. I always use the 32-bit compiler, since 64-bit compiled code runs about 10 times slower. But once again, since I am not recompiling for each platform, the effect is due to the change in hardware, not to how the code was compiled (unless, of course, there are some CPU-specific flags about which I do not know).
64-bit code running 10 times slower? That doesn't seem right at all.

Try the following if you please:

Code: Select all

fbc64.exe -gen gcc -Wc -march=native,-Ofast

Post by nddulac »

Provoni wrote: Mar 13, 2023 21:55 Try the following if you please:

Code: Select all

fbc64.exe -gen gcc -Wc -march=native,-Ofast
I did this on my laptop, which sports an i5-11400H. Here are the results.

32-bit compiler using -lang qb flag: runtime = 54.646 seconds
64-bit compiler using -lang qb flag: runtime = 1473.408 seconds
64-bit compiler using the suggested flags (+ -lang qb): runtime = 61.958 seconds

An interesting result, and thank you for the suggestion. I was unable to get the code compiled on my laptop to run on the i5-12400 build. I'll see if I can get things compiled on that machine and report back. My prediction is that the code compiled with the suggested flags will still execute anomalously slowly on these more advanced chips. We'll see if I am right or wrong.

In the meantime, here is something interesting (and frustrating) that I encountered. I measure the runtime of the program using the "timer" function, which logs the value of the internal clock of the CPU core on which the code is running. However, execution may jump between cores, and if the program does not finish on the core on which it began, the "timer" function only keeps meaningful time if the core timers are synced, which on Ryzen processors they are not! Upon discovering this (easily observed by sometimes seeing wild execution times reported, including negative execution times), I moved to OS-level timers. I still see wide variances in execution times on older Ryzen variants (presumably due to a lower ability to maintain a turbo boost during execution), which results in a lot of scatter in my data.
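
One way to make such measurements less sensitive to clock glitches and boost-related scatter is to time the same work several times, discard negative deltas, and keep the shortest run. The following is a minimal sketch of that idea, not the harness actually used for these benchmarks; BestOf, Workload, and sink are hypothetical names, and the square-root loop merely stands in for the real program.

```freebasic
' Sketch only: BestOf, Workload and sink are hypothetical names.
Dim Shared As Double sink   ' stores the result so the work is not optimized away

' Stand-in floating-point workload; replace with the code being benchmarked.
Sub Workload()
    Dim As Double acc = 0
    For i As Long = 1 To 2000000
        acc += Sqr(i)
    Next
    sink = acc
End Sub

' Run work() several times and return the shortest non-negative elapsed time.
' Negative deltas (clock glitches) are ignored; returns -1 if no run was valid.
Function BestOf(ByVal work As Sub(), ByVal runs As Long) As Double
    Dim As Double best = -1
    For r As Long = 1 To runs
        Dim As Double t0 = Timer
        work()
        Dim As Double dt = Timer - t0
        If dt >= 0 AndAlso (best < 0 OrElse dt < best) Then best = dt
    Next
    Return best
End Function

Print "best of 5:"; BestOf(@Workload, 5); " s"
```

The minimum is used rather than the mean because interruptions and reduced boost clocks can only make a run slower, so the fastest run is usually the closest to what the hardware can do.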

Anyhow, this has been an interesting problem to look at.

Post by Provoni »

I'm interested to see how your problem develops.

My project is nearly 10 years old now and is fully multi-threaded. I'm not an expert on multi-threading and the way I set it up is very simple. It can sometimes lead to truly unexpected behaviour, and I've had my share of these where I went on a multi-day bug hunt, making stops at every step of the program just to figure out what was going wrong.

It's a 99% integer workload and it wants as many threads as it can get. Depending on the workload, fast memory or a large CPU cache is also preferred. It runs best on workstations that have more than one CPU.

I ran into one problem a long while ago that you may want to look into too: do not make calls to the FreeBASIC random number generators, since these have issues with multi-threading that cause a massive slowdown as the number of threads goes up.

You can use something like this instead:

Code: Select all

#macro rng(a,b,state) 'https://en.wikipedia.org/wiki/Lehmer_random_number_generator
   	state=48271*state and 2147483647 '32-bit
	a=b*state shr 31
#endmacro

screenres 640,480,32

dim as longint random_number,state=1 'longint: 100*state would overflow the 32-bit integer of the 32-bit compiler

'print 5 random numbers between 1 and 100

rng(random_number,1+100,state):print random_number
rng(random_number,1+100,state):print random_number
rng(random_number,1+100,state):print random_number
rng(random_number,1+100,state):print random_number
rng(random_number,1+100,state):print random_number

sleep
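
The macro above keeps the low 31 bits of the product, which amounts to a modulus of 2^31. The generator described on the linked Wikipedia page (MINSTD with multiplier 48271) uses the prime modulus 2^31 - 1 instead. The following sketch shows that variant as a function, with each thread holding its own state so that no locking is needed. All names here (LehmerNext, LehmerRange, WorkerData, Worker) are hypothetical and not taken from the project described in this post.

```freebasic
' Sketch only: LehmerNext, LehmerRange, WorkerData and Worker are hypothetical names.

' One step of the Lehmer (MINSTD) generator: state = 48271 * state mod (2^31 - 1).
' The state must start in the range 1 .. 2147483646.
Function LehmerNext(ByRef state As ULong) As ULong
    state = CULng((CULngInt(state) * 48271ULL) Mod 2147483647ULL)
    Return state
End Function

' Integer in minVal .. maxVal (the Mod introduces a tiny bias, ignored here).
Function LehmerRange(ByRef state As ULong, ByVal minVal As Long, ByVal maxVal As Long) As Long
    Return minVal + CLng(LehmerNext(state) Mod CULng(maxVal - minVal + 1))
End Function

Type WorkerData
    seed As ULong
    total As ULongInt
End Type

' Each thread keeps its generator state in a local variable, so nothing is shared.
Sub Worker(ByVal p As Any Ptr)
    Dim As WorkerData Ptr w = p
    Dim As ULong state = w->seed
    Dim As ULongInt acc = 0
    For i As Long = 1 To 1000000
        acc += LehmerRange(state, 1, 100)
    Next
    w->total = acc   ' single write at the end
End Sub

Dim As WorkerData jobs(0 To 1)
Dim As Any Ptr handles(0 To 1)
jobs(0).seed = 1
jobs(1).seed = 12345
For t As Long = 0 To 1
    handles(t) = ThreadCreate(@Worker, @jobs(t))
Next
For t As Long = 0 To 1
    ThreadWait(handles(t))
    Print "thread"; t; " sum:"; jobs(t).total
Next
```

The 64-bit intermediate product is what allows the true modulus to be used without overflow; giving every thread a different seed keeps the streams from being identical.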

Post by Provoni »

Can you tell me briefly about the flow of your program?

Is there one main thread that dispatches work to other worker threads?

If so, after the worker threads return their work to the main thread, do they run again with new information? Are any of the threads waiting for information at any point, if only for a brief time?

Are there mutexes?

Post by marcov »

Benchmarking relative to Geekbench also complicates matters. What if the anomaly is in Geekbench, not in your programs? E.g. if it is very integer bound and you are very FPU bound. Note also that the 5000 series of CPUs is a mix of Zen2(+?) and Zen3. E.g. the 5700G is Zen2 (related to the laptop 4xxx series) and the 5700X is Zen3.

The wisest approach is to make the smallest possible sample that you can distribute for testing, so that developers can figure out exactly where the problem is.

I have seen some performance problems in x86_64-win64 with compliant compilers that use SSE for floating point rather than x87. More complex math operations like sin/cos seem to be slower (but by about 40%).

But that goes for any native x64 platform, not just for last generation intel.

Post by dodicat »

For the original 2300 linear equation solver I would use the upper/lower triangular (LU) method; it is faster than row echelon (Gauss).
Here I get about 8 seconds on this 12-year-old machine.
64 bits or 32 bits
Processor: Intel(R) Core(TM) i5-2320 CPU @ 3.00GHz (4 CPUs), ~3.0GHz
using -gen gcc -O 2 (in the source)

Code: Select all


#cmdline "-gen gcc -O 2"
Sub solve(matrix() As Double, rhs() As Double,solution() As Double) 
      #Macro setup(All_the_variables)
      Dim mm As Double
      Dim m As Long=Ubound(matrix,1)
      Dim As Integer x1,k1,g,sign,dimcount
      sign=1
      Redim l(1 To m,1 To m) As Double  
      Redim b(1 To m,1 To m) As Double
      Redim iv(1 To m) As Double
      Redim crhs(1 To m) As Double
      Redim solution(1 To m) As Double
      #EndMacro 
      
      #macro make(copy_matrix_elements)
      For i As Long=1 To m
            For j As Long=1 To m
                  b(i,j)=matrix(i,j)    'take a copy of the matrix
                  If i=j Then 
                        l(i,j)=1.0            'make a unit matrix
                  Else
                        l(i,j)=0.0
                  End If
            Next j
            crhs(i)=rhs(i) 
      Next i
      #endmacro
      
      #Macro Pivot(If_required)
      For x1=1 To Ubound(matrix)
            For k1=1 To Ubound(matrix)-1
                  If k1+1 >Ubound(matrix) Then Exit For
                  If x1>Ubound(matrix) Then Exit For
                  If Abs(b(k1+1,1))>Abs(b(x1,1)) Then
                        sign=-1*sign   'sign changes with each swap
                        Swap crhs(k1+1),crhs(x1)
                        For g As Long=1 To Ubound(matrix)
                              Swap b(k1+1,g),b(x1,g)
                        Next g
                  End If
            Next k1
      Next x1
      #EndMacro
      
      #Macro lu(Make_triangular_matrices)
      Dim As Long x,y,z
      For y=1 To m-1
            If b(y,y)=0 Then       'keep ahead of problems by pivoting
                  pivot(avoid division by zero)
                  If b(y,y)=0 Then
                        Print "SINGULAR MATRIX"
                        Exit Sub
                  End If
            End If
            For x=y To m-1
                  l(x+1,y)=b(x+1,y)/b(y,y)         'l() is lower triangular matrix
                  mm=l(x+1,y)
                  b(x+1,y)=0
                  For z=y+1 To m
                        b(x+1,z)=b(x+1,z)-mm*b(y,z)  'b() is upper triangular matrix
                  Next z
            Next x
      Next y
      #EndMacro
      
      #Macro ivector(Make_intermediate_vector)
      For n As Long=1 To m
            iv(n)=crhs(n)
            For j As Long=1 To n-1        
                  iv(n)=iv(n)-l(n,j)*iv(j)        
            Next j
      Next n
      #EndMacro
      
      #Macro fvector(Solution_vector)
      For n As Long=m To 1 Step -1
            solution(n)=iv(n)/b(n,n)
            For j As Long = m To n+1 Step -1
                  solution(n)=solution(n)-(b(n,j)*solution(j)/b(n,n))
            Next j
      Next  n
      
      #EndMacro
      
      '                     MAIN
      setup(All variables)     'set up the working arrays
      make(copy matrix elements)
      lu(Find the triangular matrices)   
      ivector(Find the intermediate vector)   
      fvector(Find the solution)
      '                     END MAIN        
End Sub

sub mult(m1() As double,m2() As double,ans() as double)
Dim rows As Integer=Ubound(m1,1)
Dim columns As Integer=Ubound(m2,2)
If Ubound(m1,2)<>Ubound(m2,1) Then
    Print "Can't do"
    Exit sub
End If
Redim ans(1 to rows)
Dim rxc As Double
For r As Integer=1 To rows
        rxc=0
        For k As Integer = 1 To Ubound(m1,2)
            rxc=rxc+m1(r,k)*m2(k)
        Next k
        ans(r)=rxc
Next r
End sub


Randomize timer
print "please wait a few seconds"
Var size=2300
Dim Shared As Double mat1(1 To size,1 To size)
Dim Shared As  Double rhs1(1 To size)
For r As Long=1 To size
      For c As Long=1 To size
            mat1(r,c)=Rnd
      Next
      rhs1(r)=Rnd
Next
Dim As Double t1,t2
Redim As Double solution()
t1=Timer
solve(mat1(),rhs1(),solution())
t1=Timer-t1

Dim As String s

For n As Long=1 To size
      s+= Str(solution(n))+Chr(10)
Next
Print
Print t1; "  seconds, please press a key for the solution array"
sleep
print s
print "check the first few"
print "orig right hand side","matrix * solution","difference"
redim as double ans()
mult(mat1(),solution(),ans())

for n as long=1 to 20
print rhs1(n),ans(n),rhs1(n)-ans(n)
next

Sleep

 

Post by nddulac »

Just to be clear, I'm really not much of a programmer. I learned Basic in high school (late 1970s) and never really progressed much beyond. So I am sure that there are better algorithms, and I am also sure (I have data now to back up that certainty!) that a better understanding of compiler flags and options would improve execution speed. But none of that was the point of my observation. My point was that something in the processors changed.

In the case of 12th-generation Intel processors, at least one very relevant change is documented: the removal of the AVX-512 instruction set (see https://www.makeuseof.com/what-is-avx-5 ... illing-it/ and https://www.tomshardware.com/news/intel ... in-silicon). That has a huge impact on floating point math, and explains why a piece of code runs so much more slowly on a 12th-gen processor (in this case, an i5-12400) than on my laptop's 11th-gen processor (an i5-11400H). I assume something similar happened in Ryzen 5000 series processors, as they show the same anomalous behavior.

I have more to say on this topic, as it has been a fascinating road for me to explore. Among the things I have discovered is that using

Code: Select all

fbc64.exe -gen gcc -Wc -march=native,-Ofast -lang qb fname.bas
may not compile to the best CPU target. As best as I can tell, version 9.3 of gcc doesn't include some of the newer CPUs, such as alderlake and raptorlake (12th and 13th gen Intels, for people like me who have an easier time with numbers than names), as potential targets for the march= flag. Anyhow, more on that in a future post. I still have some testing to do!

Post by nddulac »

marcov wrote: Mar 15, 2023 10:41 The benchmarking respective to geekbench also complicates. What if the anomaly is in geekbench, not in your programs?
This is a good point to think about. As a control, I have also used a molecular modeling package (Gaussian 16), and have run all of the tests on both Windows and Linux. (As a chemist, my main interest in this project was to see if Gaussian was also slow on the newer chips, as that is a package I use all of the time in my research. The TLDR is that it is not, an observation I also find interesting.) The long and short of it is that Gaussian execution times track with Geekbench5 scores across all 20 APUs I tested on my testbench, but my FB compiled code reproducibly shows a huge slowdown on 5000 series chips.
E.g. if it is very integer bound and you are very FPU bound. Note also that the 5000 series of CPUs is a mix of Zen2(+?) and Zen3. E.g. the 5700g is Zen2 (related to the laptop 4xxx series) and the 5700x is Zen3.
Also an interesting point to consider. However, I see the same slow execution times on my 5950X, 5800X, and 5500 as I do on the 5300G, 5350G, 5600G, 5650G, and 5700G. I'll share these results soon (I have one more CPU to finish, but my office internet connection went out, so I went home!)

Post by nddulac »

Provoni wrote: Mar 15, 2023 9:59 Can you tell me briefly about the flow of your program?
I'm afraid I don't know much about multithread coding. (I wouldn't know what a "mutex" was if it hit me in the face!) I have not, however, attempted to parallelize anything in my code. I would be happy to share my code with anyone interested, however, including the scripts I have been using to compile it and run the executable for benchmarking purposes.

Post by dafhi »

I experienced similar slowness.

Post by neil »

@dafhi
I have an AMD Ryzen 7. It's still fast with compiled FreeBASIC code. If you have the gcc\g++ compiler, try dodicat's myrandoms.cpp and run the PractRand test. Mine only took 63 minutes for a terabyte. Let me know what your results are.

Post by marcov »

nddulac wrote: Mar 29, 2023 22:15 Also an interesting point to consider. However, I see the same slow execution times on my 5950X, 5800X, and 5500 as I do on the 5300G, 5350G, 5600G, 5650G, and 5700G. I'll share these results soon (I have one more CPU to finish, but my office internet connection went out, so I went home!)
It is weird. One can have various opinions on Intel vs AMD, but the differences should not be that pronounced.

I bought a 5800H last week (a laptop to complement my existing desktop 5700X). I only had a Celeron G6900 (which is budget 12th gen Intel) to benchmark against, but the AMDs ran circles around it (as they should, since the AMDs were not budget processors). It was not as bad as it used to be though; it seems the cache difference between budget and mid segment (highest 5, lowest 7) matters less for my purposes than it used to.

AMD is a bit weaker in single-core performance, and the energy-efficient cores of the 12th and 13th generations are surprisingly good (as in, overall performance is not just bound by the P cores). Anything before 12th generation is no match for Ryzens (with maybe the Ice Cove laptops as an exception, the ones with a letter in the type number). But your differences are too large and weird.

I have a slight preference for AMD, mostly because AMD is more liberal with rolling out features to the middle-priced segment, while Intel often reserves stuff for the high end for a long time. I do my own benchmarking (see below) because in the past multimedia instruction sets especially mattered heavily in the published benchmarks, and compilers hardly use them. I also like machines to be relatively silent, and the Intels are power hogs.

The published GCC benchmarks are hard to interpret because they are fully multi-core, while an FPC bootstrap rarely goes over 4 cores. That makes it hard to compare 4 vs 6 vs 8 vs 16 core systems.

I heard somewhere that AMD (but those were pre-Ryzen AMDs) handled denormal floating point relatively slowly (it is something that shouldn't happen, but...). Still, denormalized values would have to be a substantial part of your rate-determining calculation for you to notice this.

Make sure your program doesn't operate on uninitialized floating point variables, and avoid messing with stored floating point on the bit level.
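
To check whether denormal (subnormal) values actually occur in a calculation, the working arrays can be scanned for nonzero values smaller in magnitude than the smallest normal double. The sketch below assumes IEEE 754 doubles; IsSubnormal, CountSubnormals, and MIN_NORMAL are hypothetical names introduced only for this illustration.

```freebasic
' Sketch only: MIN_NORMAL, IsSubnormal and CountSubnormals are hypothetical names.

' Smallest positive normal IEEE 754 double (2^-1022).
Const MIN_NORMAL As Double = 2.2250738585072014e-308

' True if x is nonzero but too small to be represented as a normal double.
Function IsSubnormal(ByVal x As Double) As Boolean
    Return CBool(x <> 0 AndAlso Abs(x) < MIN_NORMAL)
End Function

' Count the subnormal entries of a one-dimensional array.
Function CountSubnormals(a() As Double) As Long
    Dim As Long n = 0
    For i As Long = LBound(a) To UBound(a)
        If IsSubnormal(a(i)) Then n += 1
    Next
    Return n
End Function

Dim As Double tiny = 1e-300
Dim As Double v(1 To 3) = {1.0, 0.0, 0.0}
v(3) = tiny / 1e10   ' about 1e-310, below the normal range
Print "subnormal entries:"; CountSubnormals(v())
```

If a scan like this finds many subnormal values in the inner loops of a solver, slow denormal handling becomes a plausible contributor; if it finds none, that explanation can be set aside.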

I myself use a Free Pascal compile cycle as a benchmark. The 5700X did a bootstrap of the project in about 1:10, and the 5800H laptop (same arch, roughly the same frequency) now takes about 1:40. That was after turning off Windows Defender and setting the power plan to always on. (When I received the laptop, the time was 2:40. Configuring power dropped it to 2:10; disabling Defender did the rest.)

I still feel the laptop has room for improvement, but possibly the bottleneck is the SSD (an OEM Samsung with little or no cache), which makes the INSTALL step with its massive writes relatively slow. Over the weekend I plan to time the stages (clean, build, install) separately to determine this.