With multi-threaded programs, heavy use of the rnd function (think stochastic hill-climber) will slow down the program greatly. This has been a long time problem with one of my programs and it must be some issue. Example code follows, change the thread variable from 1 to 4 and see what happens.
Is it perhaps just the overhead of the rnd function?
screenres 800,600
randomize timer,4
declare sub freebasic_rnd(byval nopointer as any ptr)
declare sub custom_rnd(byval nopointer as any ptr)
dim as integer i,j,k
dim shared as integer threads=1 'change to 4 and compare timings
dim as any ptr thread_ptr(threads)
dim as double t=timer
for i=1 to threads
thread_ptr(i)=threadcreate(@freebasic_rnd,0)
sleep 10
next i
for i=1 to threads
threadwait(thread_ptr(i))
next i
print "freebasic rnd timing: "+str(timer-t)
t=timer
for i=1 to threads
thread_ptr(i)=threadcreate(@custom_rnd,0)
sleep 10
next i
for i=1 to threads
threadwait(thread_ptr(i))
next i
print "custom rnd timing: "+str(timer-t)
sleep
sub freebasic_rnd(byval nopointer as any ptr)
dim as longint i,j
for i=1 to 50000000/threads
j+=rnd*123
next i
end sub
sub custom_rnd(byval nopointer as any ptr)
dim as longint i,j,m=123
for i=1 to 50000000/threads
m=(214013*m+2531011)mod 2147483648
j+=((m shr 16)/32768)*123 '32767
next i
end sub
Last edited by Provoni on Nov 03, 2019 9:24, edited 1 time in total.
Does not seem to contain any locks at first sight, but on the other hand uses a shared (static) state variable, so there should be some synchronisation necessary but I haven't taken a closer look into the code.
btw. there has been a really good discussion about RNGs recently, started by user "deltarho[1859]", who also proposed alternative (better, faster) implementations in the "Tips" section on this forums. Definitely worth a look.
Bump: this is still happening and makes the use of rnd while threading in a performance critical application unpractical. Why does rnd slow down so much when called from many threads?
Why does rnd slow down so much when called from many threads?
Because the many threads are calling the same function, ie RND, and there will be collisions causing a drop in throughput. The generator has to be thread safe to avoid collisions and very few generators are thread safe, none of the FreeBASIC generators are thread safe. My PCG32 was not thread safe. PCG32II, on the other hand, is thread safe provided each thread invokes its own generator.
Why does rnd slow down so much when called from many threads?
Because the many threads are calling the same function, ie RND, and there will be collisions causing a drop in throughput. The generator has to be thread safe to avoid collisions and very few generators are thread safe, none of the FreeBASIC generators are thread safe. My PCG32 was not thread safe. PCG32II, on the other hand, is thread safe provided each thread invokes its own generator.
Okay that explains it.
With own generator you mean something like this right?
Any generator can be made thread safe, but they will need a major rewrite.
However, why bother because none of the FB's generators are particularly fast compared with the generators developed in the last few years. I knocked out an assembler version of the Mersenne Twister which pushed the throughput quite a bit but even it is left standing nowadays. In addition, none of the FB's generators will pass a PractRand test to 16TB.
For about a year I was using RND in multi-threaded code and all I knew was that there were significant diminishing returns from using more and more threads. So I ran multiple instances of my program to overcome this. Per accident I found out it was RND causing a massive slowdown. I think there should be at least more awareness about this issue because it may affect some people. Perhaps it's just me. :)
Thread safety is a computer programming concept applicable in the context of multi-threaded programs. A piece of code is thread-safe if it functions correctly during simultaneous execution by multiple threads. In particular, it must satisfy the need for multiple threads to access the same shared data, and the need for a shared piece of data to be accessed by only one thread at any given time.
There are a few ways to achieve thread safety:
Re-entrancy:
Writing code in such a way that it can be partially executed by one task, reentered by another task, and then resumed from the original task. This requires the saving of state information in variables local to each task, usually on its stack, instead of in static or global variables.
Mutual exclusion:
Access to shared data is serialized using mechanisms that ensure only one thread reads or writes the shared data at any time. Great care is required if a piece of code accesses multiple shared pieces of data—problems include race conditions, deadlocks, livelocks, starvation, and various other ills enumerated in many operating systems textbooks.
Thread-local storage:
Variables are localized so that each thread has its own private copy. These variables retain their values across subroutine and other code boundaries, and are thread-safe since they are local to each thread, even though the code which accesses them might be reentrant.
Atomic operations:
Shared data are accessed by using atomic operations which cannot be interrupted by other threads. This usually requires using special machine language instructions, which might be available in a runtime library. Since the operations are atomic, the shared data are always kept in a valid state, no matter what other threads access it. Atomic operations form the basis of many thread locking mechanisms.
I think that the 'Rand()' built in function, with probably a mutex exclusion inside for calling the generator, has become thread-safe but is therefore no longer really parallelizable in the facts.
@coderJeff Could RND be made thread-safe? Is this on the to do list?
I'm a bit confused. Clearly Rnd()'s implementation is thread safe. It can be called from multiple threads and returns sane values as you yourself demonstrated. Being slow doesn't make it unsafe.
That's what I tried to explain in my previous post:
I think that with probably a mutex exclusion inside for calling the generator, the 'Rnd()' built in function has become thread-safe but the generator calls are therefore no longer really parallelizable in the facts (in order to improve the execution time when the multithreading is used).