So, RND is not thread safe then.

Post by **St_W** » Jun 22, 2017 10:02

deltarho[1859] wrote:Dim creates space on the stack? I didn't know that - Help just says "Declares a variable by name and reserves memory to accommodate it." This is also different to PowerBASIC. Ouch!

By default, variables are allocated on the stack. This is not the case for "dim shared" (and maybe there are some other exceptions that don't come to mind right now). When you allocate memory explicitly with new or (c)allocate, you get memory from the heap and need to free/delete it yourself. How does this work in PowerBASIC? It can't be that different, I guess.
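
A rough sketch of the three kinds of storage mentioned above, written in C because the -gen gcc backend goes through C anyway. The mapping to Dim, Dim Shared and Allocate is only an analogy, not a statement of what fbc actually emits, and every name here is made up for illustration.

```c
#include <stdlib.h>

/* File-scope variable: static storage, roughly the "Dim Shared" case. */
static long shared_counter = 0;

/* A local variable lives in the function's stack frame and is gone on return. */
long bump_local(void)
{
    long local_counter = 0;   /* fresh on every call */
    local_counter += 1;
    return local_counter;     /* always 1 */
}

/* The file-scope variable keeps its value between calls. */
long bump_shared(void)
{
    shared_counter += 1;
    return shared_counter;
}

/* Heap memory must be requested and released explicitly. */
long heap_roundtrip(long value)
{
    long *p = malloc(sizeof *p);
    if (p == NULL)
        return -1;            /* allocation failed (sketch-level error handling) */
    *p = value;
    long result = *p;
    free(p);                  /* forgetting this would leak the block */
    return result;
}
```

Calling bump_local() repeatedly keeps returning 1, while bump_shared() counts upwards, which is the practical difference between stack and static storage.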

deltarho[1859] wrote:I am getting '142 mill/sec 145 mill/sec' compared with my 100% code duplication of '195 mill/sec 193 mill/sec'.

Hm, it would be interesting to know why it's clearly slower then. On my PC the difference was just marginal, but I've tested a fbc64 debug build on a quite old machine (Q6600, 4 cores) and got 16 or 17 mill/sec there.
Anyway, some results seem very strange. For example, on my current machine a fbc32 (-gen gas) build running two threads has dramatically lower throughput than the single-threaded run. This does not happen with the fbc64 (-gen gcc) build. Maybe there's still some error in the code.

deltarho[1859] wrote:Now, call me old fashioned but we should not have to use a dummy variable to get something to work. For me that is a bug workaround. The compiler should allow an empty type structure and issue a warning along the lines of "Oy, pudding head - you've got an empty ENUM, TYPE or UNION".<smile>

I think the question is rather: why would we need an empty type? The limitation is probably a technical one. Remember that each function in a type actually has a hidden "this" pointer as an argument, but is otherwise identical to a "plain" function (the function exists only once and is technically not part of the type). So e.g.
declare function f()
would become something like
declare function f(this as myType ptr)
internally. That would make no sense if myType is empty, as the pointer would refer to zero-length memory. So, what would be the advantage of allowing empty types? You can get the nesting (namespace.function) by using namespaces directly instead of types. Are there any other uses?
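
The hidden "this" argument can be sketched in C, where the pointer has to be written out by hand. This is only an illustration of the idea, not the code fbc generates; my_type, my_type_f and the calls field are hypothetical names, and the parameter is called self rather than this.

```c
#include <stdint.h>

/* Hypothetical type with a single data member. */
typedef struct {
    uint32_t calls;
} my_type;

/* Roughly what a member "declare function f()" turns into:
   one ordinary function whose first argument is the instance pointer. */
uint32_t my_type_f(my_type *self)
{
    self->calls += 1;         /* all member access goes through the pointer */
    return self->calls;
}
```

Two instances share the single function but keep separate data: calling my_type_f(&a) twice and my_type_f(&b) once leaves a.calls at 2 and b.calls at 1.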

deltarho[1859] wrote:pcg32A did not have to be global [...]

Of course you're right. I've just been lazy :-) One should generally avoid global variables if not necessary, as you already mentioned.

deltarho[1859] wrote:
St_W wrote:I think you would be a perfect candidate for improving the RND implementation in FreeBasic's runtime library.
Et tu, Brute?

Thanks, but my knowledge of RNGs is quite limited.

Post by **fxm** » Jun 22, 2017 11:45

Multithreaded performance behaves as expected if we use the C function Rand() instead of the FreeBASIC function Rnd().
(I think it's a simple workaround for now)

Results (with FB function Rnd()):
- one thread : 44 mill/sec
- two threads and one core only : 22 mill/sec, 22 mill/sec => OK
- two threads and two cores : 12 mill/sec, 12 mill/sec => NOK

Results (with C function Rand()):
- one thread : 58 mill/sec
- two threads and one core only : 29 mill/sec, 29 mill/sec => OK
- two threads and two cores : 58 mill/sec, 58 mill/sec => OK

Post by **deltarho[1859]** » Jun 22, 2017 14:28

St_W wrote:Hm, would be interesting why it's clearly slower then.

There will be an extra overhead with calling pcg32A.rands() compared with calling pcg32AS()

Code: Select all

.

Added: If we can get the address of pcg32A.rands() then we can use a procedure pointer and eliminate the overhead but the compiler does not like procptr(pcg32A.rands()).
 
This is what I get with FreeBASIC's algorithms
[code]
1  66 30/30
2 102 19/19
3  82 19/19
4  95 20/20
5 Too slow to catch a cold!
[/code]

In each row, the leading number identifies the algorithm; the first column after it is one thread and the second column is two threads on two cores.

So, on my machine none of the algorithms are thread safe.
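
One common way around shared generator state is to give each thread its own state, in the spirit of the type-based approach discussed in this thread. The following C sketch uses the published PCG32 (XSH-RR) output step. It is not the pcg32A code from the thread: rng_state, rng_seed and rng_next are illustrative names, and the seeding is deliberately simplified.

```c
#include <stdint.h>

/* All generator state lives in a struct that the caller owns. */
typedef struct {
    uint64_t state;
    uint64_t inc;             /* stream selector, always odd */
} rng_state;

/* Simplified seeding (assumption: good enough for a sketch). */
void rng_seed(rng_state *r, uint64_t seed, uint64_t stream)
{
    r->state = seed;
    r->inc = (stream << 1u) | 1u;
}

/* One PCG32 step: advance the 64-bit LCG, then permute the old state. */
uint32_t rng_next(rng_state *r)
{
    uint64_t old = r->state;
    r->state = old * 6364136223846793005ULL + r->inc;
    uint32_t xorshifted = (uint32_t)(((old >> 18u) ^ old) >> 27u);
    uint32_t rot = (uint32_t)(old >> 59u);
    /* rotate right by rot bits */
    return (xorshifted >> rot) | (xorshifted << ((32u - rot) & 31u));
}
```

Because rng_next() touches nothing except the struct it is handed, two threads that each hold their own rng_state cannot disturb one another's sequence, and no lock is needed.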

Post by **St_W** » Jun 22, 2017 15:28

deltarho[1859] wrote:There will be an extra overhead with calling pcg32A.rands() compared with calling pcg32AS()

Code: Select all

.
Added: If we can get the address of pcg32A.rands() then we can use a procedure pointer and eliminate the overhead but the compiler does not like procptr(pcg32A.rands()).
Hm, that could be the problem, but in a slightly different way. The call could be more expensive because one parameter (the implicit "this" pointer = @pcg32A) is passed while your method has that value hardcoded. If the gcc doesn't optimize that call there's the overhead of pushing that pointer on the stack and popping it from the stack for each call.

I didn't even know that "procptr" existed up to now, I've always used "@" for that.
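
The two call shapes being compared can be sketched in C as follows: one function with the state address hardcoded, one that receives it as an argument, plus a plain function pointer to the latter. The names and the small LCG are placeholders for illustration, not code from this thread.

```c
#include <stdint.h>

typedef struct {
    uint32_t state;
} lcg;

/* One generator with a fixed, compile-time-known address. */
static lcg global_gen = { 1u };

/* Variant 1: no argument, the state location is hardcoded. */
uint32_t next_hardcoded(void)
{
    global_gen.state = global_gen.state * 1664525u + 1013904223u;
    return global_gen.state;
}

/* Variant 2: the state address arrives as an argument (the "this" pointer). */
uint32_t next_with_this(lcg *self)
{
    self->state = self->state * 1664525u + 1013904223u;
    return self->state;
}

/* Pointer-to-function type matching variant 2. */
typedef uint32_t (*next_fn)(lcg *);
```

Both variants produce the same sequence from the same starting state; the only difference is the extra argument. Whether that argument costs anything measurable depends on the calling convention and on whether the compiler inlines the call.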

Post by **deltarho[1859]** » Jun 22, 2017 16:33

Found an example 'Pointers to member procedures'. Pretty heady stuff but it slowed things down.

St_W wrote:If the gcc doesn't optimize that call there's the overhead of pushing that pointer on the stack and popping it from the stack for each call.

You hit the nail on the head, St_W.

Did a speed test as a benchmark for today and got: 144 mill/sec 145 mill/sec.

I then got gcc to get really nasty with -O3 as opposed to -O2. Normally a -O3 hasn't been worth it.

I am now getting: 213 mill/sec 227 mill/sec

A single thread run came in at 216 mill/sec, which is only marginally faster than -O2.

We now have your thread-safe code coming in a tad faster than my code duplication method and the code which did not concern itself with thread safety.

What a choice: We can use a non-thread-safe version or a thread-safe version which is a little faster. Ermm <laugh>

<decidedly big smile>
