FreeBASIC 1.08 Development

For other topics related to the FreeBASIC project or its community.
deltarho[1859]
Posts: 2689
Joined: Jan 02, 2017 0:34
Location: UK

Re: FreeBASIC 1.08 Development

Postby deltarho[1859] » Oct 04, 2020 21:36

I'm back.

I first noticed something 'strange' about PowerBASIC's RND in January 2016. If I used RND in a primary thread and a secondary thread the throughput collapsed. One of the members, Paul Dixon, is an expert on threads, and he wrote: "When both threads call the same code to generate the next number there's a conflict with 2 CPU cores trying to change the same memory locations and the CPU has to re-cache the new number across all the CPU cores before it can continue. In effect, you're repeatedly forcing a cache miss which slows everything down." Sharing the RND code is not an issue but RND uses memory to remember its state vector and that is used on the next call to RND. With RND used in two separate threads the same memory, holding the state vector, is being changed by the two threads and that is where the conflict is.

In the thread 'So, RND is not thread safe then.' Mersenne Twister (MT) comes in at 84MHz. However, if MT is used in a primary thread and a secondary thread the throughput for each drop to 42MHz. MT's state vector is also being changed by the two threads. When I looked at PCG32 I duplicated everything in sight and got no collisions. I wrote that thread in June 2017 only a few months as a member of the forum. St_W introduced me to using procedures in UDTs leading me to PCG32II and I thought it was that when no collisions occurred.

In fact, the reason no collisions occurred was because the state vector was being duplicated and not because the generator was in the UDT. Each time the UDT is used a new memory location is created for the state vector so it will only be changed by the new UDT instance and, therefore, no conflict.

It still makes sense to have the generator as an element of the UDT otherwise we would have to duplicate it but that has nothing to do with collisions.

I no longer disagree with fxm and stand corrected with everything he wrote.

All of FreeBASIC's generators can be made thread safe if each instance can be given a unique memory location for its state vector. That should not be difficult if all the state vectors were the same size, but they are not. However, that should not be unsurmountable. Can I do it? Maybe, but I can think of a few who would get there a lot quicker than me.

Thread local storage? I am not acquainted with that but is sounds interesting.

So, who needs a thread safe RND anyway? Not many but coderJeff is looking at it and, I cannot remember which, but two languages at least have thread safe generators, and they did not develop them for the fun of it. Image

Added:
Yours truly wrote:So, if pcg32A and pcg32B are seeded differently then not only are we using two separate sequences we are using different sequences.

I then wrote: "It would seem that is also false. It looks like we are using the same sequence but different entry points."

Oh dear, that is wrong. PCG32 uses a seed and a sequence number. If pcg32A and pcg32B use a different sequence number then we are using different sequences. The state vector is seed and sequence. If we use the same sequence number but a different seed then we are using the same sequence but different entry points. It is amazing what we forget as time marches on.

Corection: In the last paragraph I wrote "The state vector is seed and sequence." That should read "The state vector is state and sequence." The initial state is the seed and it gets updated on each random number request. Even that is not strictly true because the generator is warmed up by burning a bunch of outputs before releasing any random numbers to us.
Last edited by deltarho[1859] on Oct 05, 2020 4:28, edited 2 times in total.
deltarho[1859]
Posts: 2689
Joined: Jan 02, 2017 0:34
Location: UK

Re: FreeBASIC 1.08 Development

Postby deltarho[1859] » Oct 05, 2020 0:27

I have been looking at TLS and it looks a bit heavy to me.

It is worth remembering that each thread has its own stack. Perhaps we can make use of that. Just an idea - probably barmy.
fxm
Posts: 9984
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: FreeBASIC 1.08 Development

Postby fxm » Oct 05, 2020 20:53

fxm wrote:When a new UDT instance is created, the code of the member procedures is not duplicated (only non-static data members are duplicated).

Now this is pointed out in the Programmer's Guide:
ProPgUDTs → fxm [added small paragraph 'UDT Instantiation']
coderJeff
Site Admin
Posts: 3342
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: FreeBASIC 1.08 Development

Postby coderJeff » Oct 08, 2020 0:53

deltarho[1859] wrote:I have been looking at TLS and it looks a bit heavy to me.

It is worth remembering that each thread has its own stack. Perhaps we can make use of that. Just an idea - probably barmy.

Yeah, TLS is not too bad.

I tried a mutex implementation with some extra runtime API functions that bypass conditionals and conversions. But still uses one state for the RNG so lots of time is spent competing for a lock on the RNG regardless, and still uses some pointer indirection and function calls, which I think is unavoidable in fbc.

For maximum bandwidth, would be looking for something that uses thread local storage, is inline, no pointer indirection, no function calls, no unnecessary conversions, and no conditionals.

We can't do the ideal for a few reasons:
- freebasic has no inline functions (#macro's maybe can fake it out though).
- pointer look-ups and function calls needed due to RANDOMIZE and RND() mapping to multiple RNG's.
- conversions to double due to traditional RND() as double usage in freebasic
- RND(arg) preforms a conditional on arg for alternate behaviour - return new number or last number.

I think with TLS at least we will at least get rid of a mutex lock bottleneck. It will be at least thread-safe and best we can do given other limitations. With a few extra API functions, might be able to shave off a few cycles for unnecessary computations. For users that really want to optimize a RNG, custom implementations are probably still going to win the performance race. But, I have no idea what the comparison will be until I write it and try it out.

Return to “Community Discussion”

Who is online

Users browsing this forum: No registered users and 7 guests