Parallel for loop in freebasic idea

Post by Gunslinger »

I was watching some game development and came across a parallel for loop in Unity.
After some more research I am not clear which programming language uses that syntax.

It looks like this:

Code:

   Parallel.For (0, texture.height, y => {
      for (int x = 0; x < texture.width; x++) {
         int sourceIndex = (y * texture.width) + x;
         dest[sourceIndex].r = source[sourceIndex].r;
         dest[sourceIndex].g = source[sourceIndex].g;
         dest[sourceIndex].b = source[sourceIndex].b;
      }
   });
I find it very interesting: an easy way to split a task over more cores just by putting Parallel in front of the for.
Since I am not very good at multi-core programming or pointer stuff, I think you can do a better job than I did :)
Still, I wrote a little code to try it out myself.

Compiled with: fbc -w all "%f" -target win32

Code:

#include "fbthread.bi"
Const MAX_THREADS = 31

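' Note: CPUID leaf &h0A reports Intel performance-monitoring information;
' AL holds its version ID, not a core count, so this result is unreliable.
Function GetCPUCores_x2 As UByte
	Dim As UByte numcores
	Asm
		mov eax,&h0A
		cpuid
		mov [numcores],al
	End Asm
	If numcores = 0 Then numcores = 1
	Return numcores*2-1  ' multiply by 2; minus 1 gives the highest thread index
End Function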

type range_type
	as long low
	as long high
	as any ptr sub_ptr
end type

' Call the user sub with the low/high bounds stored in the range record.
sub Launch_sub(byval mysub as sub(byref a as long, byref b as long), myptr as range_type ptr )
	mysub( myptr->low, myptr->high )
end sub

Sub Parallel_thread(byval userdata as any ptr)
	' Work (other thread)
	dim as range_type ptr test3  ' pointer to org memory
	test3 = userdata
	Launch_sub(test3->sub_ptr, userdata)
End Sub

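' num_of_cores is the highest thread index, so num_of_cores + 1 threads are started.
sub thread_launch(num_of_cores as byte, byref a as any ptr, range_low as long, range_high as long)
	dim as long size = range_high - range_low
	' Bail out if the range is smaller than the thread index limit or too many threads are requested.
	if size < num_of_cores or num_of_cores > MAX_THREADS then exit sub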
	
	dim as any ptr thread(MAX_THREADS)
	dim as range_type range(MAX_THREADS)
	dim as byte i
	
	if num_of_cores > 0 then
		dim as range_type testers(MAX_THREADS)
		'range(0).low = range_low
		
		for i = 0 to num_of_cores
			var steps = (size \ (num_of_cores+1)) * i
			'print range_low + steps, range_low + steps + (size \ (num_of_cores + 1))-1
			range(i).low = range_low + steps
			range(i).high = range_low + steps + (size \ (num_of_cores +1))-1
		next
		range(num_of_cores).high = range_high
		
		for i = 0 to num_of_cores 
			testers(i).low  = range(i).low
			testers(i).high = range(i).high
			testers(i).sub_ptr = a
			thread(i) = ThreadCreate(@Parallel_thread, @testers(i))
			'ThreadWait(thread(i))
		next i
		
		for i = 0 to num_of_cores 
			ThreadWait(thread(i))
		next
	else
		dim as range_type test = ( range_low ,range_high, a)
		thread(0) = ThreadCreate(@Parallel_thread, @test)
		ThreadWait(thread(0))
	end if
end sub

' Dummy workload: the result is never used, so an optimizing backend may discard most of it.
sub tt(byref a as long, byref b as long)
	'print "hallo", a, b
	
	dim as double do_something = 0
	for i as longint = a to b
		do_something ^= i
		do_something += i
	next i
end sub


dim as double t = timer
print "number of threads: "; GetCPUCores_x2+1

do
	
	t = timer
	thread_launch(GetCPUCores_x2, @tt(), 0 , 100000000)
	print "multi core",timer - t
	'print
	
	t = timer
	thread_launch(0, @tt(), 0 , 100000000)
	print "single core",timer - t
	print
	
	sleep 1000
loop until inkey <> ""
print


sleep
end

Output (time in seconds):
number of threads: 8
multi core 2.703375900006378
single core 16.82610790000416


I wish this were possible in FreeBASIC in an easier way than this.
The loop is split up into pieces for each core in my code.
The problem is that it has to create the threads at that point.
It is probably better to pre-create the threads and call them when needed.
The test should also get a more realistic load to compute.

Detecting the number of cores is sometimes incorrect, I think (with 2 cores?).
Nowadays computers get more and more cores, and they are still not easy to work with.
Last edited by Gunslinger on Dec 05, 2023 12:03, edited 2 times in total.
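
The splitting described above (one contiguous piece of the loop per thread) can be wrapped in a small helper so the calling code reads a bit closer to Parallel.For. This is a rough sketch, not a tuned implementation: ParallelFor, ChunkJob, WorkerEntry and FillSquares are made-up names, and it still creates its threads on every call instead of reusing a pre-created pool.

```freebasic
' Sketch only: all names below are hypothetical, not part of FreeBASIC.
Type LoopBody As Sub(ByVal firstIdx As Long, ByVal lastIdx As Long)

Type ChunkJob
    As Long firstIdx, lastIdx   ' inclusive sub-range handled by one thread
    As LoopBody body            ' user routine to run on that sub-range
End Type

' Thread entry point: unpack the job and run the body on its sub-range.
Sub WorkerEntry(ByVal userdata As Any Ptr)
    Dim As ChunkJob Ptr job = userdata
    job->body(job->firstIdx, job->lastIdx)
End Sub

' Split firstIdx..lastIdx (inclusive) into at most nThreads contiguous
' chunks, run each chunk on its own thread, then wait for all of them.
Sub ParallelFor(ByVal firstIdx As Long, ByVal lastIdx As Long, _
                ByVal nThreads As Long, ByVal body As LoopBody)
    Dim As LongInt total = CLngInt(lastIdx) - firstIdx + 1
    If total <= 0 Then Exit Sub
    If nThreads < 1 Then nThreads = 1
    If nThreads > total Then nThreads = total

    ReDim As ChunkJob jobs(0 To nThreads - 1)
    ReDim As Any Ptr handles(0 To nThreads - 1)
    Dim As LongInt baseSize = total \ nThreads
    Dim As LongInt extra = total Mod nThreads
    Dim As LongInt nextFirst = firstIdx

    For i As Long = 0 To nThreads - 1
        ' The first "extra" chunks get one more iteration each,
        ' so the chunks cover the range without gaps or overlap.
        Dim As LongInt size = baseSize + IIf(i < extra, 1, 0)
        jobs(i).firstIdx = nextFirst
        jobs(i).lastIdx = nextFirst + size - 1
        jobs(i).body = body
        nextFirst += size
        handles(i) = ThreadCreate(@WorkerEntry, @jobs(i))
        ' If the thread could not be created, run the chunk right here.
        If handles(i) = 0 Then body(jobs(i).firstIdx, jobs(i).lastIdx)
    Next

    For i As Long = 0 To nThreads - 1
        If handles(i) <> 0 Then ThreadWait(handles(i))
    Next
End Sub

' Usage: fill a shared array with squares using 4 threads.
Dim Shared As Long squares(0 To 99)

Sub FillSquares(ByVal firstIdx As Long, ByVal lastIdx As Long)
    For i As Long = firstIdx To lastIdx
        squares(i) = i * i
    Next
End Sub

ParallelFor(0, 99, 4, @FillSquares)
Print squares(10); squares(99)
```

Each thread writes only to its own slice of the array, so no mutex is needed here. A body that updates shared state would need one.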

Post by angros47 »

Shouldn't such a feature be something that happens under the hood? The GCC compiler has something like that:

https://gcc.gnu.org/wiki/AutoParInGCC

https://hub.packtpub.com/gnu-community- ... compilers/

Since FreeBasic can use GCC as backend, it should be able to use it with no need to change your code.

Post by Provoni »

You may have a hybrid CPU where some cores don't have hyperthreading. I know it's beside the point of your thread, but with such a CPU make sure that in Windows, under Power settings, you have selected the best performance scheme (Beste prestaties), or Windows will offload your FreeBASIC program to the efficiency cores, causing a massive slowdown.

My program makes heavy use of FB's multithreading capabilities and I have to say that my experience has been okay. When you get everything set up correctly, FB can actually be quite fast. I'm not good at it either and rarely use pointers.
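
On the core-count question: CPUID leaf &h0A reports performance-monitoring data rather than a core count, so one alternative is simply to ask the operating system. A minimal sketch, assuming a Windows target (the first post compiles with -target win32) and the windows.bi header that ships with FreeBASIC; LogicalCpuCount is a made-up name.

```freebasic
' Sketch only: assumes Windows; LogicalCpuCount is a hypothetical helper.
#include "windows.bi"

Function LogicalCpuCount() As Long
    Dim As SYSTEM_INFO info
    GetSystemInfo(@info)
    ' dwNumberOfProcessors counts logical processors, so hyperthreads
    ' are already included and nothing needs to be multiplied by 2.
    If info.dwNumberOfProcessors < 1 Then Return 1
    Return info.dwNumberOfProcessors
End Function

Print "logical processors: "; LogicalCpuCount()
```

On machines with more than 64 logical processors this reports only the current processor group, and it does not distinguish performance cores from efficiency cores.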

Post by Gunslinger »

angros47 wrote: Dec 02, 2023 23:52 Shouldn't such a feature be something that happens under the hood? The GCC compiler has something like that:

https://gcc.gnu.org/wiki/AutoParInGCC

https://hub.packtpub.com/gnu-community- ... compilers/

Since FreeBasic can use GCC as backend, it should be able to use it with no need to change your code.
Yes, I think this has to be implemented under the hood of FreeBASIC.
I don't think I can do that, sorry.

What is interesting is that if I compile with: fbc -w all "%f" -gen gcc -Wc -O3 -target win64
then I get output like this:

number of threads: 8
multi core 0.001392899999927977
single core 0.001112699999794131

I think this happens because the simulated load is poor: its result is never used, so the optimizer can probably discard most of the loop.
Provoni wrote: Dec 04, 2023 22:06 You may have a hybrid CPU where some cores don't have hyperthreading. I know it's beside the point of your thread, but with such a CPU make sure that in Windows, under Power settings, you have selected the best performance scheme (Beste prestaties), or Windows will offload your FreeBASIC program to the efficiency cores, causing a massive slowdown.

My program makes heavy use of FB's multithreading capabilities and I have to say that my experience has been okay. When you get everything set up correctly, FB can actually be quite fast. I'm not good at it either and rarely use pointers.
Thanks for the tip, I will have to check the power settings on the other laptop.
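
On the near-zero timings with -O3: one way to keep such a benchmark honest is to have every thread store a result that the main program then prints, so the work cannot simply be thrown away. A small sketch under that assumption; WorkSlice, SumRoots, N_THREADS and N_ITEMS are made-up names and the thread count is fixed at 4 for simplicity.

```freebasic
' Sketch only: all names and sizes below are hypothetical.
Type WorkSlice
    As Long firstIdx, lastIdx   ' inclusive range for this thread
    As Double partial           ' result written by the thread
End Type

Sub SumRoots(ByVal userdata As Any Ptr)
    Dim As WorkSlice Ptr slice = userdata
    Dim As Double acc = 0
    For i As Long = slice->firstIdx To slice->lastIdx
        acc += Sqr(i)
    Next
    slice->partial = acc   ' storing the result keeps the loop observable
End Sub

Const N_THREADS = 4
Const N_ITEMS = 100000000

Dim As WorkSlice slices(0 To N_THREADS - 1)
Dim As Any Ptr handles(0 To N_THREADS - 1)
Dim As Double t = Timer

' Give every thread an equal, contiguous share of 0 .. N_ITEMS - 1.
For k As Long = 0 To N_THREADS - 1
    slices(k).firstIdx = k * (N_ITEMS \ N_THREADS)
    slices(k).lastIdx = (k + 1) * (N_ITEMS \ N_THREADS) - 1
    handles(k) = ThreadCreate(@SumRoots, @slices(k))
Next

' Wait for each thread, then combine the partial sums.
Dim As Double total = 0
For k As Long = 0 To N_THREADS - 1
    If handles(k) <> 0 Then ThreadWait(handles(k))
    total += slices(k).partial
Next

Print "sum of square roots: "; total
Print "elapsed: "; Timer - t
```

Each thread writes only to its own WorkSlice, so the partial sums need no locking. If a thread fails to start, its share is simply missing from the total.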

Post by adeyblue »

Gunslinger wrote: After some more research I am not clear which programming language uses that syntax.
It's C#: Parallel.For is part of .NET's Task Parallel Library.
angros47 wrote: Dec 02, 2023 23:52 Since FreeBasic can use GCC as backend, it should be able to use it with no need to change your code.
You can do it already. To get it to happen in GCC 'automatically' you need to compile your code with
-Wc -fprofile-generate
run the app how it is likely to be used a few times, then recompile it with
-Wc -fprofile-use=<pathToGeneratedData>
and then GCC has to decide that the profiling shows it's worth doing. You have to do that every time you change the code. Well, I don't suppose you /have/ to, but if the profiling data doesn't match the actual code it'll probably either not do anything or optimize incorrectly. (The parallelizing pass itself is controlled by GCC's -ftree-parallelize-loops=n option.)

GCC does support the OpenMP pragmas though (the auto feature is essentially the same as placing those intelligently).

Oh, it doesn't look like the Windows versions of the FB compilers come with libgomp or libgcov, so none of this is going to work on Windows right now.

Post by marcov »

It seems that your code is basically a memory copy, which will be memory bound, not CPU bound, with sufficient enough operations. So IMHO this is not really a case for a parallel for. Use memcpy or derivatives, or improve the compiler to unroll and optimize the three-component RGB24 assignments into, e.g., three 32-bit operations.
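
The memcpy suggestion could look roughly like this in FreeBASIC. A minimal sketch, assuming the whole pixel is meant to be copied and that source and destination have the same layout; Texel, TEX_W and TEX_H are made-up names, and memcpy comes from the C runtime header crt/string.bi.

```freebasic
' Sketch only: Texel and the texture size are hypothetical.
#include "crt/string.bi"   ' declares memcpy

Type Texel
    As UByte r, g, b
End Type

Const TEX_W = 256
Const TEX_H = 256

Dim As Texel Ptr source = Callocate(TEX_W * TEX_H, SizeOf(Texel))
Dim As Texel Ptr dest = Callocate(TEX_W * TEX_H, SizeOf(Texel))
If source = 0 OrElse dest = 0 Then End 1

source[1000].g = 200   ' mark one pixel so the copy can be checked

' One block copy replaces the per-pixel, per-component loop.
memcpy(dest, source, TEX_W * TEX_H * SizeOf(Texel))

Dim As UByte copied = dest[1000].g
Print "dest[1000].g = "; copied

Deallocate(source)
Deallocate(dest)
```

If only some components should be copied (for example leaving an alpha channel untouched), a single block copy no longer matches the original loop.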

Post by shadow008 »

marcov wrote: Dec 06, 2023 21:41 It seems that your code is basically a memory copy, which will be memory bound, not CPU bound, with sufficient enough operations. So IMHO this is not really a case for a parallel for. Use memcpy or derivatives, or improve the compiler to unroll and optimize the three-component RGB24 assignments into, e.g., three 32-bit operations.
I dunno if this would always be the case. Considering the original post appears to be referencing copying textures, it could be the case that these textures are small and could fit in L1/L2 cache. Many many modern architectures have per-core lower level cache.

That's all I got. It's an incomplete thought and I have no numbers to back it up. But in this narrow situation, one might just not hit that "sufficient enough operations" threshold.

Post by marcov »

shadow008 wrote: Dec 07, 2023 7:49 I dunno if this would always be the case. Considering the original post appears to be referencing copying textures, it could be the case that these textures are small and could fit in L1/L2 cache. Many many modern architectures have per-core lower level cache.
Usually 256 KB. Yes, you could be right, but IMHO only when the code is carefully crafted for a specific CPU generation. There is no "general" case.