Multithread Parameter Passing

General FreeBASIC programming questions.
chjmartin2
Posts: 11
Joined: Sep 15, 2013 18:57

Multithread Parameter Passing

Postby chjmartin2 » Sep 15, 2013 19:10

Hi,

I am sure this is straightforward to many of you, but I am at a loss. I have read the examples, and searched for responses to other inquiries on the topic, but I am stuck. I have a program that has to loop through 511*511 combinations of comparisons to determine two output values (in this case - two character codes from 0 to 255) from a given set of input coordinates in an image. Since the determination of the optimal characters are independent of each other and because the code executes far too slowly I am trying to use additional threads to do that calculation.

I want to be able to do this (not real code, conceptual), x and y are values to send to the thread and I want it to spit back mini and minj. x is from 1 to 320, y is from 1 to 200, mini and minj are from 0 to 255. mini and minj are found at the same time and both variables need to come back

Code: Select all


for y=1 to 200 step 8
     for x=1 to 320 step 8
         
          param = threadcreate (x,y)
          returni=param.mini
          returnj=param.minj

      .... do stuff with the return values ...

     next x
next y

threadwait (wait for them all to finish)

... move on ...



I have not used pointers or custom data types, so I would really appreciate some hand holding from a kind soul! In my head it is simple - I just create a thread for each set of characters. Also, does a thread have access to the data in shared arrays? If not, then I am really toast, because it needs to have access to a set of 20 arrays that contain the image, the character sets, etc.?

Help?

Thanks,

Chris
Pritchard
Posts: 5492
Joined: Sep 12, 2005 20:06
Location: Ohio, USA

Re: Multithread Parameter Passing

Postby Pritchard » Sep 16, 2013 1:27

I'm going to work on this. I'll post again when I have, with some additional feedback. The way your program is written currently, it will not multi-thread properly. There is no point in performing your operations in parallel. ThreadCreate and ThreadWait do not accept arguments or return values in the manner in which they are being used in the sample code you have provided.

ThreadCreate takes a parameter to be used in the thread and returns the thread handle, which must be passed to ThreadWait if you want to force the program to wait for a thread to complete. It does not return the result of the thread.
Pritchard
Posts: 5492
Joined: Sep 12, 2005 20:06
Location: Ohio, USA

Re: Multithread Parameter Passing

Postby Pritchard » Sep 16, 2013 1:53

This is the best thing I could come up with. IMO, it's a bit of a mess. Sorry about that.

Code: Select all

type Min_Max_Comparison
    as integer x, y
    as integer min, max
    as any ptr thread_handle
   
    declare static sub Get_Min_And_Max( param as any ptr )
   
end type

sub Min_Max_Comparison.Get_Min_And_Max( param as any ptr )
   
    ' Our param is our result in this case, because there's not a ton of other
    ' good ways to store the result of a thread's execution because the thread
    ' doesn't return a value... it just accesses "param".
    dim as Min_Max_Comparison ptr result = cast( Min_Max_Comparison ptr, param )
   
    dim as integer min, max
   
    ' Comparisons
    if( result->x <= result->y ) then
        result->min = result->x
        result->max = result->y
    else
        result->min = result->y
        result->max = result->x
    end if
   
    ' For testing purposes...
    print "Comparison between " & result->x & " and " & result->y & " complete!"
   
end sub

' Since we have to store the result of every thread's execution, cache it all.
dim as Min_Max_Comparison compare_result( 320 \ 8, 200 \ 8 )
dim as integer min, max

' Important to get this ordering right.  y, x, lbound(array, index)
for y as integer = lbound(compare_result, 2) to ubound(compare_result, 2)
     for x as integer = lbound(compare_result, 1) to ubound(compare_result, 1)
         
          ' Okay, convert x and y to what they were before.
          compare_result(x, y).x = x * 8
          compare_result(x, y).y = y * 8
         
          ' Have to pass in the address of the sub which will get a thread.
          ' Also need the address of the param item.
          compare_result(x, y).thread_handle = threadcreate ( _
                        procPtr(Min_Max_Comparison.get_min_and_max), _
                        cast( any ptr, @compare_result(x, y) ) )

     next
next

' Wait on all threads to complete.
for y as integer = lbound(compare_result, 2) to ubound(compare_result, 2)
     for x as integer = lbound(compare_result, 1) to ubound(compare_result, 1)
         threadwait( compare_result(x, y).thread_handle )
     next
next

sleep

' Display comparison results.
for y as integer = lbound(compare_result, 2) to ubound(compare_result, 2)
     for x as integer = lbound(compare_result, 1) to ubound(compare_result, 1)
         
        print "X: " & compare_result(x, y).x & " Y: " & compare_result(x, y).y
        print "Min: " & compare_result(x, y).min & " Max: " & compare_result(x, y).max
        print
       
     next
next

sleep


I know the code probably doesn't look how you were hoping. Understand that FreeBASIC is going to execute your code sequentially while the other threads are running. The way you had your threadcreate/threadwait setup just wasn't going to work.

You can't expect to display the results of a thread without having threadwait'd on it first. That is only going to cause you problems.

You can't return a value from a thread, only store the return value in the parameter you pass into a thread. That isn't hard to do. I've created and posted thread classes on the forum before. I'm not sure they're worth their weight in code any more, but essentially, do as I have done except with a simpler data type:

Code: Select all

type ThreadParameter
    inValue as any ptr
    outValue as any ptr
end type


Then wrap all of your thread operations using that.
Pritchard
Posts: 5492
Joined: Sep 12, 2005 20:06
Location: Ohio, USA

Re: Multithread Parameter Passing

Postby Pritchard » Sep 16, 2013 1:57

PS: I'm tired, so I hope everything made sense. I don't even have the energy to read over what I wrote right now.

Can we get maybe more context of the problem? More threads won't necessarily solve what may be an architectural problem. Maybe you need to perform caching, perform calculations in the background, or do some other stuff to improve program efficiency. Threading gets difficult quickly because there's so many considerations and tradeoffs when performing multithreading. Tbh, I almost never use threading directly, even at work.

If threading is the approach you're sure you want to take, I would once again start with very basic threading examples whose proper execution you can verify. Then, move step-by-step until you have solve your problem. That being said, wrapping the thread's input and output values in a ThreadParam type is actually very simple. Just be careful about things which will need mutex locks, which will require a different way of thinking about your code and can effect efficiency.
chjmartin2
Posts: 11
Joined: Sep 15, 2013 18:57

Re: Multithread Parameter Passing

Postby chjmartin2 » Sep 16, 2013 3:20

Ok... here is what I am doing, and the code that works but executes far, far too slow to be useful. I'm trying to encode an algorithm that does this:

Uses the 40x25 mode of the CGA adapter, and creates two screens for each frame on 30 fps video to create a 60 fps video to attempt to reproduce a bitmap image.

https://docs.google.com/file/d/0B4xJFZX ... sp=sharing

Take a look at the attached picture. I want to find the best match of 40x25 letters to produce a blended image that when played at 60 fps, will provide the illusion of great picture resolution. Take on the C64 flicker, applied to PC text mode. Anyway, I can do the math, I have the algorithm, but it requires a boatload of math.

Loop through all combinations of 511 letters (256 characters, both dark/light and light dark) compared to the downconverted source image into fixed interval luminance and dither.

Code: Select all

for x=1 to 160 step 4            ' 40 characters wide
     for y=1 to 100 step 4       ' 25 characters long
          for I=1 to 511             '0 to 255 light/black, 256 to 511 inverse - first character
              for j=1 to 511         '0 to 255 light/black, 256 to 511 inverse - second character
                 for p=0 to 3         'fix pixel to last pixel, 4x4 square
                      for q=0 to 3
                                ...  add up distances
                                                             
                      next q
                  next p
                 ... test if it is the lowest, if so, save it
            next j
         next i 

       .... identify as lowest
    next y
next x

for x=1 to 40                          '40 characters wide
      for y=1 to 25                   ' 25 characters long
           for a=0 to 15              'first foreground color
               for b=0 to 15          'first background color
                  for c=0 to 15            'second foreground color
                      for d=0 to 15        'second background color
                             ... test if it is the lowest, if so, save it
             
                       next d
                  next c
               next b
           next a
             .... indentify as lowest
      next y
next x


So, do that math...

First loop is 40 * 25 * 511 * 511 * 4 * 4 = 4.18 x 10^9
Second Loop is 50*25*15*15*15*15 *7*7 = 2.48 x 10^9

Ironically enough, it is 6.66 x 10^9 comparisons per image * 7,000 images = 4.66 x 10^10 difference calculations

Days of execution would not be enough to complete it - at least the way it is written...
Last edited by counting_pine on Sep 16, 2013 4:41, edited 1 time in total.
Reason: Put code inside [code] block
Pritchard
Posts: 5492
Joined: Sep 12, 2005 20:06
Location: Ohio, USA

Re: Multithread Parameter Passing

Postby Pritchard » Sep 16, 2013 3:50

Well, I'm not familiar with the problem space. The example code and images helped me understand more, but only a little more. The idea is to maybe understand this well enough to where some kind of optimizations can be made. Hopefully, the thing can be divided and conquered or approached entirely differently so that it doesn't require as much processing.

You know, if you can come up with a detailed enough problem description, there's a few people on the forums who are great with this kind of thing. Namely, Robert, fxm, and dodicat who I've seen posting frequently.

I'm not sure more threads will help. I mean, threading can improve things somewhat on multicore machines, but I think there's a limit to a processes' thread count. It's a real limit because threads take up memory. Hopefully, that won't be a huge issue if the calculations are fairly basic. As you've seen, however, we may need to allocate more RAM to accommodate a multithreaded approach.
Gonzo
Posts: 722
Joined: Dec 11, 2005 22:46

Re: Multithread Parameter Passing

Postby Gonzo » Sep 16, 2013 5:04

you can get theoretical speeds with lockless multithreading
i should know since i have that in my generator - generally 92-95% cpu usage during the entire operation, except compression and io stages (since im lazy)
but, you didnt define your problem well at all, so its not easy to help you

either way, i would first look to reducing your problems big-O by a few orders of magnitude
honestly, unless you have decisive proof that your algorithm requires that many iterations... :P

also, is your problem "embarassingly" parallell? (look it up on google)
if so, you have a finite size of work to do, lets say N jobs and T threads
on a ht dual core, you want 4 threads, and ht quad core 8 threads
create T queues, fill them evenly, start T threads, wait for all threads to end
they work from their own queues, on their own jobs and finish when they're done

edit: you might want to read this, and know it well
http://hbfs.wordpress.com/2009/04/14/ad ... thods-rle/
seeing as this is CGA screens there are limited colors, right? well build a palette and limit data = good cache usage = anywhere from 0x to 100x speedup depending on how lucky you are, more or less :)

also, this is interlacing/de-interlacing right? something people work on their whole lives and still find no good solution
and the size of the problem might mean you have to move into SSE territory (which is what it's made for anyways: media)
if you somehow manage to get your iterations down to a manageable size, you can probably get by using gcc's automatic sse code
build with gcc and remember that only -O3 enables vectorization by default (but use -Ofast -msse4.1 if you can)
good luck :)
MichaelW
Posts: 3500
Joined: May 16, 2006 22:34
Location: USA

Re: Multithread Parameter Passing

Postby MichaelW » Sep 16, 2013 8:37

Assuming that the computations can be done in parallel, unless you can run each thread on a separate core, the system overhead involved in managing the threads will likely eliminate any speed advantage.

The code below is an attempt to show the effect. I used the Windows API instead of the FreeBASIC threading functions because I needed more control over the threads. And I used the default -gen gas because I didn't want the compiler optimizing my delay loops by eliminating them :)

Code: Select all

''==============================================================================
#include "windows.bi"
''==============================================================================

#define LOOPS 100000000

''==============================================================================

function ThreadProc1( param as LPVOID ) as DWORD
    for i as longint = 1 to LOOPS
    next
    return 0
end function

''==============================================================================

function ThreadProc2( param as LPVOID ) as DWORD
    for i as longint = 1 to LOOPS
    next
    return 0
end function

''==============================================================================

dim as DWORD_PTR processAffinityMask, systemAffinityMask
dim as HANDLE hThread1, hThread2
dim as integer param
dim as double t

GetProcessAffinityMask( GetCurrentProcess(), _
                        @processAffinityMask, _
                        @systemAffinityMask )

''------------------------------------------------------------------
'' The system affinity mask is a bit vector which will have one set
'' bit for each of the processor cores configured into the system.
''------------------------------------------------------------------

print bin(systemAffinityMask)
print

''----------------------------------------------------------
'' Increase our process priority to minimize interruptions.
''----------------------------------------------------------

SetPriorityClass(GetCurrentProcess(), REALTIME_PRIORITY_CLASS )


hThread1 = CreateThread( NULL, 0, @ThreadProc1, @param, CREATE_SUSPENDED, NULL )

sleep 4000

t = timer
ResumeThread( hThread1 )
WaitForSingleObject( hThread1, INFINITE )
print using "###.####";timer-t
print

hThread1 = CreateThread( NULL, 0, @ThreadProc1, @param, CREATE_SUSPENDED, NULL )
hThread2 = CreateThread( NULL, 0, @ThreadProc2, @param, CREATE_SUSPENDED, NULL )

''--------------------------------------------------------------------------
'' If the system has more than one processor core, instead of assuming that
'' it will assign each of the test threads to a separate core, force it to.
''--------------------------------------------------------------------------

if systemAffinityMask > 1 then
    SetThreadAffinityMask( hThread1, 1 )
    SetThreadAffinityMask( hThread2, 2 )
end if

sleep 1000

t = timer
ResumeThread( hThread1 )
ResumeThread( hThread2 )
WaitForSingleObject( hThread1, INFINITE )
WaitForSingleObject( hThread2, INFINITE )
print using "###.####";timer-t

sleep

Running on my single-core P3, two threads took essentially twice as long to execute as one thread:

Code: Select all

1

  1.6003

  3.1989

Running on my P4 with HT, two threads took significantly less than twice as long to execute as one thread:

Code: Select all

11

  0.5171

  0.9154

I don’t have a processor with more than one physical core, but running on such a processor two threads should take approximately the same time to execute as one thread.
fxm
Posts: 8970
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Multithread Parameter Passing

Postby fxm » Sep 16, 2013 11:33

On my Portable Lenovo R400 with Intel Core 2 Duo P8400 / 2.26 GHz:

Code: Select all

11

  0.3508

  0.3632
fxm
Posts: 8970
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Multithread Parameter Passing

Postby fxm » Sep 16, 2013 13:59

A rustic program (from MichaelW's idea), but in pure FB, just to show the execution time depending on the number of threads:
(must be compiled without code optimization option)

Code: Select all

#define Loops 100000000
#define NbThreadMax 10

Type UDTthread
  Dim As Any Ptr pThread
  Declare Static Sub looping (Byval p As Any Ptr)
End Type
Static Sub UDTthread.looping (Byval p As Any Ptr)
  For I As Longint = 1 To Loops
  Next I
End Sub

Sub test (Byval n As Integer)
  Dim As Double t = Timer
  If n = 0 Then
    UDTthread.looping(0)
  Else
    Dim As UDTthread u(1 To n)
    For I As Integer = 1 To n
      u(I).pThread = Threadcreate(@UDTthread.looping, 0)
    Next I
    For I As Integer = 1 To n
      Threadwait(u(I).pThread)
    Next I
  End If
  t = Timer - t
'  Print n; Iif(n=0, " (1 main instead)", "                 "), Cast(Single, t),, Cast(Single, t / Iif(n=0, 1, n))
  Print n; *Iif(n=0, @" (1 main instead)", @"                 "), Cast(Single, t),, Cast(Single, t / Iif(n=0, 1, n))
End Sub

Print "Number of threads", "Total execution time", "Execution time per job"
For I As Integer = 0 To NbThreadMax
  Sleep 1000
  If Inkey = Chr(27) Then
    Exit For
  End If
  test(I)
Next I
Sleep

Code: Select all

Number of threads           Total execution time        Execution time per job
 0 (1 main instead)          0.349178                    0.349178
 1                           0.3547516                   0.3547516
 2                           0.3674124                   0.1837062
 3                           0.550747                    0.1835823
 4                           0.73337                     0.1833425
 5                           0.9144824                   0.1828965
 6                           1.109422                    0.1849036
 7                           1.285036                    0.1835766
 8                           1.480795                    0.1850993
 9                           1.66207                     0.1846744
 10                          1.825957                    0.1825957
So we can verify that my PC (Portable Lenovo R400 with Intel Core 2 Duo P8400 / 2.26 GHz) is effectively a dual core.
(Windows XP (build 2600, Service Pack 3))


[Edit]
- Program updated by adding another column corresponding to the execution time per job (one job being the execution of one UDTthread.looping()).
- One program line modified to be compatible with fbc versions previous the 0.90.0.
- Added a test on <Escape> to stop program.
Last edited by fxm on Aug 07, 2018 6:12, edited 17 times in total.
Drago
Posts: 115
Joined: Aug 10, 2005 13:15

Re: Multithread Parameter Passing

Postby Drago » Sep 16, 2013 15:02

Code: Select all

Number of threads           Total execution time
 0                           0.2670425144270396
 1                           0.2608125494236333
 2                           0.2730501391400182
 3                           0.3960836272065862
 4                           0.5325033760578322
 5                           0.5866755935998249
 6                           0.6975398327043791
 7                           0.8269798525575531
 8                           0.8904300958729934
 9                           0.9679041020676209
 10                          1.097555467048974


with edit-code

Code: Select all

Number of threads           Total execution time        Execution time per job
0 (1 main instead)           0.2607979                   0.2607979
1                            0.2608632                   0.2608632
2                            0.268719                    0.1343595
3                            0.3800643                   0.1266881
4                            0.5153718                   0.1288429
5                            0.5637876                   0.1127575
6                            0.6658098                   0.1109683
7                            0.765951                    0.1094216
8                            0.9278149                   0.1159769
9                            1.019476                    0.1132751
10                           1.091977                    0.1091977


iCore i3-2100 @3.10 GHz
Last edited by Drago on Sep 17, 2013 11:07, edited 1 time in total.
chjmartin2
Posts: 11
Joined: Sep 15, 2013 18:57

Re: Multithread Parameter Passing

Postby chjmartin2 » Sep 16, 2013 15:57

Please clarify: In the total execution time, you also get the benefit of 10X more work, or the same work using 0 threads versus 10 threads. We have two different total execution times depending on the system:

1 thread is 0.354 or 0.267
10 threads is 1.85 or 1.10

If the same amount of computations are done, then in both instances 10 threads are worse, but if 10X the work is done then you'd divide the execution time by 10 to get to the equivalent.

What is the valid comparison? .354/.267 versus 1.85/1.10 or .354/.267 versus .185/.110????
fxm
Posts: 8970
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Multithread Parameter Passing

Postby fxm » Sep 16, 2013 16:37

- Intel® Core™2 Duo Processor P8400
(3M Cache, 2.26 GHz, 1066 MHz FSB)
# of Cores: 2

- Intel® Core™ i3-2100 Processor
(3M Cache, 3.10 GHz)
# of Cores: 2
# of Threads: 4


Remark:
I updated my previous program by adding another column corresponding to the execution time per job (one job being the execution of one UDTthread.looping()).
chjmartin2
Posts: 11
Joined: Sep 15, 2013 18:57

Re: Multithread Parameter Passing

Postby chjmartin2 » Sep 17, 2013 1:14

I was so excited to test this code when I got home... I copied and pasted it and got this compilation error:

error 25: Invalid data types, before ',' in 'Print n & Iif(n=0, " (1 main instead)", " "), t,, Cast(Single, t / Iif(n=0, 1, n))'
chjmartin2
Posts: 11
Joined: Sep 15, 2013 18:57

Re: Multithread Parameter Passing

Postby chjmartin2 » Sep 17, 2013 1:22

On my Dual Processor Quad Core Xeon X5355 at 2.66 GHz running Windows 8 .... I get

Code: Select all

11111111

0.3108

0.3120

Return to “General”

Who is online

Users browsing this forum: David Watson and 3 guests