chjmartin2 wrote:
> Please clarify: In the total execution time, do you also get the benefit of 10X more work, or is it the same work using 1 thread versus 10 threads? We have two different total execution times depending on the system:
> 1 thread is 0.354 or 0.267
> 10 threads is 1.85 or 1.10
> If the same amount of computation is done, then in both instances 10 threads are worse, but if 10X the work is done then you'd divide the execution time by 10 to get the equivalent.

10X the work would be done only if each thread were running on its own core. If, for example, you have 2 cores and 10 threads, only 2 threads run at a time, with the other 8 waiting to be scheduled. So if one thread on a single core completes the delay loop in 0.354 seconds, then running 10 threads 2 at a time, ignoring the overhead of switching between threads, will take (10/2)*0.354 = 1.77 seconds.

The total work per unit time is proportional to the number of cores, not the number of threads.