chjmartin2 wrote:
> Please clarify: In the total execution time, do you also get the benefit of 10X more work, or is it the same work using 1 thread versus 10 threads? We have two different total execution times depending on the system:
> 1 thread is 0.354 or 0.267
> 10 threads is 1.85 or 1.10
> If the same amount of computation is done, then in both instances 10 threads are worse, but if 10X the work is done then you'd divide the execution time by 10 to get the equivalent.

10X the work would be done only if each thread were running on its own core. If, for example, you have 2 cores and 10 threads, only 2 threads run at a time, with the other 8 waiting to be scheduled. So if one thread on a single core completes the delay loop in 0.354 seconds, then running 10 threads 2 at a time, ignoring the overhead of switching between threads, will take (10/2)*0.354 = 1.77 seconds.

The total work per unit time is proportional to the number of cores, not the number of threads.