Windows Thread Pool

rpkelly
Posts: 52
Joined: Sep 03, 2016 22:36

Windows Thread Pool

Post by rpkelly »

Does anybody have any experience using the Windows thread pool APIs, and what are the best practices?

1. CreateThreadpool
2. SetThreadpoolThreadMaximum
3. SetThreadpoolThreadMinimum
4. CreateThreadpoolWork
5. SubmitThreadpoolWork
6. CloseThreadpoolWork
7. CloseThreadpool

Re: Windows Thread Pool

Post by rpkelly »

Not knowing much about things, I put this small snippet of code together and it seems to work with both 32-bit and 64-bit compiles.

Code: Select all

const _WIN32_WINNT = &h0602
#INCLUDE ONCE "windows.bi"

declare sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)
dim shared lpCriticalSection as CRITICAL_SECTION

dim iError            as long
dim ptpp              as PTP_POOL
dim reserved          as PVOID
dim ucbe              as TP_CALLBACK_ENVIRON
dim cbe               as PTP_CALLBACK_ENVIRON
dim Work              as PTP_WORK
dim iIndex            as integer

cbe = cast(PTP_CALLBACK_ENVIRON,varptr(ucbe))
InitializeCriticalSection(ByVal VarPtr(lpCriticalSection))

ptpp = CreateThreadpool(reserved)
iError = GetLastError()

if ptpp = 0 THEN
   print "CreateThreadPool failed,error=" + str(iError)
   Print "press q to quit"
   Do
      Sleep 1, 1
   Loop Until Inkey = "q"
   END
END IF

print "CreateThreadPool successful..."

TpInitializeCallbackEnviron(cbe)
TpSetCallbackLongFunction(cbe)

for iIndex = 1 to 3
   Work = CreateThreadpoolWork(cast(PTP_WORK_CALLBACK,@myThread),cast(PVOID,iIndex),cbe)
   SubmitThreadpoolWork(Work)
   CloseThreadpoolWork(Work)
NEXT

sleep 1000,1

TpDestroyCallbackEnviron(cbe)
CloseThreadpool(ptpp)
DeleteCriticalSection(ByVal VarPtr(lpCriticalSection))

Print "press q to quit"
Do
   Sleep 1, 1
Loop Until Inkey = "q"

end

sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)

EnterCriticalSection(ByVal VarPtr(lpCriticalSection))

print "Thread started,Instance=" + str(Instance) + ",Context=" + str(Context)

LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))

end sub
deltarho[1859]
Posts: 4292
Joined: Jan 02, 2017 0:34
Location: UK

Re: Windows Thread Pool

Post by deltarho[1859] »

rpkelly wrote: Does anybody have any experience using the Windows thread pool APIs, and what are the best practices?
Good question - never heard of them. I have used threads quite a lot and learnt that they are not cheap to create. With AES-HMAC I created a secondary thread of execution for the HMAC on decryption, but only if the file being processed exceeded 1MB; otherwise the object was defeated.

From that you will have gathered that what I know about thread pooling can be put on the back of a postage stamp. Anyway, it looked interesting enough to put the kettle on. Thread pooling in Windows XP was poor and saw a revamp in Windows Vista. The code above requires Windows Vista or later.

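First thing, CloseThreadpoolWork() is not a thread-handle call like ThreadDetach, which releases a thread handle without waiting for the thread to finish. CloseThreadpoolWork releases the specified work object. MSDN says the object is freed immediately if there are no outstanding callbacks and otherwise asynchronously after the outstanding callbacks complete, so it does not itself kill a running callback. Even so, when I added some work to myThread it was not being done.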

Secondly, if we comment out the Sleep that precedes the cleanup section we go flying into the cleanup prematurely and get a 'stopped working' message from the system. We don't have a WaitForMultipleObjects as with thread creation but we do have WaitForThreadpoolWorkCallbacks, and we should use that before the cleanup section and/or before any further calls to SubmitThreadpoolWork using the same TP_WORK structure.
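A minimal sketch of that ordering (submit, wait, and only then release and clean up), under a few assumptions: the names Worker, items and doneCount are made up for the illustration, the work runs on the default pool by passing Null as the environment, and every work object is kept in an array so that it can be waited on before it is closed.

```freebasic
Const _WIN32_WINNT = &h0602
#Include Once "windows.bi"

Dim Shared As CRITICAL_SECTION cs     ' guards doneCount
Dim Shared As Long doneCount          ' hypothetical counter, for illustration only

' Hypothetical callback: pretends to work, then records that it finished
Sub Worker(ByVal Instance As PTP_CALLBACK_INSTANCE, ByVal Context As PVOID, ByVal Work As PTP_WORK)
  Sleep 50, 1
  EnterCriticalSection(@cs)
  doneCount += 1
  LeaveCriticalSection(@cs)
End Sub

Dim As PTP_WORK items(1 To 3)
Dim As Integer i

InitializeCriticalSection(@cs)

' Create and submit, but keep every handle instead of closing it at once
For i = 1 To 3
  items(i) = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK, @Worker), Cast(PVOID, i), Null)
  If items(i) <> 0 Then SubmitThreadpoolWork(items(i))
Next

' Wait for the outstanding callbacks first, and only then release the objects
For i = 1 To 3
  If items(i) <> 0 Then
    WaitForThreadpoolWorkCallbacks(items(i), FALSE)
    CloseThreadpoolWork(items(i))
  End If
Next

DeleteCriticalSection(@cs)            ' no callback can still be using it now
Print "Callbacks completed:"; doneCount
```

The fixed Sleep before cleanup is replaced by explicit waits, so the critical section is only deleted once no callback can still be inside it.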

Reading a few blogs where folks mentioned how expensive creating threads can be, and that this can be overcome by thread pooling, resulted in the following code, adapted from rpkelly's code above. Are you Rick Kelly from PB?

From MSDN's definition of the first parameter of CreateThreadpoolWork it occurred to me that when a work object has 'done its bit', rather than create another thread we simply 'Submit' it again. With my AES-HMAC a 100MB file gets processed using 400 x 256KB buffers. That is 400 thread creations. The AES is quicker than HMAC-SHA256 so I had to wait for the HMAC before the next AES. It was worthwhile because the AES was effectively being done for 'free'. I am now wondering what it would be like with 400 x submissions instead of 400 thread creations. <big smile>

In the following only one CreateThreadpoolWork is employed but the work is submitted a couple of times. No doubt the code is an absolute disgrace but I am walking in front with a red flag.<laugh>

The primary thread has two 'Sleep 5000,1' and the secondary thread polls Timer for three seconds.

This is what I get:

Code: Select all

CreateThreadPool successful...
 
Do some work 22:29:08
 
Thread started,Instance=12189332,Context=1
22:29:08
3.00 Finished secondary work at 22:29:11
 
Finished primary work at 22:29:13
 
Do more work 22:29:13
 
Thread started,Instance=12189332,Context=1
22:29:13
3.01 Finished secondary work at 22:29:16
 
Finished primary work at 22:29:18
 
press q to quit
Effectively, we have a 'straight' 10 second run in the primary thread. The secondary threads are encapsulated, that is, each three second secondary run fits inside a five second primary run, giving us a free lunch - the full benefit of threading. Reverse the times and we get the primary work being encapsulated by the secondary threads.

I think that there is more to the cleanup section so we may have a leak - more reading required.

There are a handful of applications which can benefit from using a thread pool one of which is:
"An application that creates and destroys a large number of threads that each run for a short time. Using the thread pool can reduce the complexity of thread management and the overhead involved in thread creation and destruction."

All of my work using threads falls into that category. My 'A fast CPRNG' is one such. It is very fast as is - I will go back to that to see what 'pooling' will do.

With regard to best practices, there is a 'pile' of them.

This is a 'bruiser' of a subject.

Code: Select all

const _WIN32_WINNT = &h0602
#INCLUDE ONCE "windows.bi"
 
declare sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)
dim shared lpCriticalSection as CRITICAL_SECTION
dim iError            as long
dim ptpp              as PTP_POOL
dim reserved          as PVOID
dim ucbe              as TP_CALLBACK_ENVIRON
dim cbe               as PTP_CALLBACK_ENVIRON
dim Work              as PTP_WORK
dim iIndex            as integer
 
cbe = cast(PTP_CALLBACK_ENVIRON,varptr(ucbe))
InitializeCriticalSection(ByVal VarPtr(lpCriticalSection))
 
ptpp = CreateThreadpool(reserved)
iError = GetLastError()
 
if ptpp = 0 THEN
  ? "CreateThreadPool failed,error=" + str(iError)
  ? "press q to quit"
  Do
    Sleep 1, 1
  Loop Until Inkey = "q"
  END
END IF
 
? "CreateThreadPool successful..."
 
TpInitializeCallbackEnviron(cbe)
TpSetCallbackLongFunction(cbe)
 
Work = CreateThreadpoolWork(cast(PTP_WORK_CALLBACK,@myThread),cast(PVOID,1),cbe)
 
SubmitThreadpoolWork(Work) ' First outing
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Do some work ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
Sleep 5000, 1
? : ? "Finished primary work at ";Time
WaitForThreadpoolWorkCallbacks(Work,FALSE)
 
SubmitThreadpoolWork(Work) ' Second outing
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Do more work ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
Sleep 5000, 1
? : ? "Finished primary work at ";Time
WaitForThreadpoolWorkCallbacks(Work,FALSE)
 
TpDestroyCallbackEnviron(cbe)
CloseThreadpool(ptpp)
DeleteCriticalSection(ByVal VarPtr(lpCriticalSection))
 
? : ? "press q to quit"
Do
  Sleep 1, 1
Loop Until Inkey = "q"
 
sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)
Dim As Double t, done
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Thread started,Instance=" + str(Instance) + ",Context=" + str(Context)
? Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
 
t = timer
do
  sleep 1,1
  done = timer - t
loop Until done >= 3
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? Using "#.##"; done;
? " Finished secondary work at ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
 
end sub

Re: Windows Thread Pool

Post by deltarho[1859] »

With regard to CloseThreadpoolWork, MSDN says "If there is a cleanup group associated with the work object, it is not necessary to call this function; calling the CloseThreadpoolCleanupGroupMembers function releases the work, wait, and timer objects associated with the cleanup group." Since we are only using a work object, CloseThreadpoolCleanupGroupMembers seems to me to be overkill, so we should use CloseThreadpoolWork, I reckon just before TpDestroyCallbackEnviron(cbe).

Re: Windows Thread Pool

Post by deltarho[1859] »

deltarho[1859] wrote: and/or before any further calls to SubmitThreadpoolWork using the same TP_WORK structure.
This is not true. I don't see the point of 2 x SubmitThreadpoolWork(Work), but if we did that we would get two instance IDs with the two threads running in parallel. Of course, we can have umpteen work objects, each created by CreateThreadpoolWork with a differing second parameter: "Optional application-defined data to pass to the callback function." SubmitThreadpoolWork itself takes only the work object.
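As a rough sketch of the back-to-back case, assuming the default pool (Null environment) and made-up names Worker and runs: the same work object is submitted twice without an intervening wait, and a single WaitForThreadpoolWorkCallbacks then covers both outstanding callbacks. Both callbacks receive the same Context because that value is fixed when the work object is created.

```freebasic
Const _WIN32_WINNT = &h0602
#Include Once "windows.bi"

Dim Shared As CRITICAL_SECTION cs   ' guards runs and the console
Dim Shared As Long runs             ' hypothetical counter, for illustration only

Sub Worker(ByVal Instance As PTP_CALLBACK_INSTANCE, ByVal Context As PVOID, ByVal Work As PTP_WORK)
  Sleep 200, 1                      ' stand-in for real work
  EnterCriticalSection(@cs)
  runs += 1
  Print "Callback finished, Context="; Cast(Integer, Context)
  LeaveCriticalSection(@cs)
End Sub

Dim As PTP_WORK w

InitializeCriticalSection(@cs)

' Context (here 1) belongs to the work object, not to the submission
w = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK, @Worker), Cast(PVOID, 1), Null)

If w <> 0 Then
  SubmitThreadpoolWork(w)                    ' first submission
  SubmitThreadpoolWork(w)                    ' second submission, no wait in between
  WaitForThreadpoolWorkCallbacks(w, FALSE)   ' returns once both callbacks have completed
  CloseThreadpoolWork(w)
End If

DeleteCriticalSection(@cs)
Print "Callbacks run:"; runs
```

To hand different data to each callback, one work object per context value would be created instead, as in rpkelly's loop.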

With regard to my code above, it seems to me that both TpInitializeCallbackEnviron(cbe) and TpSetCallbackLongFunction(cbe) are redundant. There are reasons for getting involved in defining a callback environment, but if they do not apply then we can use Null for the third parameter of CreateThreadpoolWork. With TpSetCallbackLongFunction, "The thread pool may use this information to better determine when a new thread should be created." So far my imagination is being stifled by a lack of knowledge and I have not gone beyond being in control myself of when a new thread should be created.

Taking the above into account, and after a tidy up, my code above reduces to:

Code: Select all

const _WIN32_WINNT = &h0602
#INCLUDE ONCE "windows.bi"
 
declare sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)
 
dim shared lpCriticalSection as CRITICAL_SECTION
dim iError            as long
dim Pool              as PTP_POOL
dim Work              as PTP_WORK

InitializeCriticalSection(ByVal VarPtr(lpCriticalSection))
 
Pool = CreateThreadpool(Null)
iError = GetLastError()
 
if Pool = 0 THEN
  DeleteCriticalSection(ByVal VarPtr(lpCriticalSection))
  ? "CreateThreadPool failed,error=" + str(iError)
  ? "press q to quit"
  Do
    Sleep 1, 1
  Loop Until Inkey = "q"
  END
END IF
 
? "CreateThreadPool successful..."
 
Work = CreateThreadpoolWork(cast(PTP_WORK_CALLBACK,@myThread),cast(PVOID,1), Null)
 
SubmitThreadpoolWork(Work) ' First outing
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Do some work ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
Sleep 3000, 1
? : ? "Finished primary work at ";Time
WaitForThreadpoolWorkCallbacks(Work,FALSE)
 
SubmitThreadpoolWork(Work) ' Second outing
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Do more work ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
Sleep 3000, 1
? : ? "Finished primary work at ";Time
WaitForThreadpoolWorkCallbacks(Work,FALSE)
 
CloseThreadpoolWork(Work)
CloseThreadpool(Pool)
DeleteCriticalSection(ByVal VarPtr(lpCriticalSection))
 
? : ? "press q to quit"
Do
  Sleep 1, 1
Loop Until Inkey = "q"
 
sub myThread (byval Instance as PTP_CALLBACK_INSTANCE, byval Context as PVOID, byval Work as PTP_WORK)
Dim As Double t, done
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? : ? "Thread started,Instance=" + str(Instance) + ",Context=" + str(Context)
? Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
 
t = timer
do
  sleep 1,1
  done = timer - t
loop Until done >= 3
 
EnterCriticalSection(ByVal VarPtr(lpCriticalSection))
? Using "#.##"; done;
? " Finished secondary work at ";Time
LeaveCriticalSection(ByVal VarPtr(lpCriticalSection))
 
end sub
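One point about the listing above: with Null as the third parameter of CreateThreadpoolWork the callback runs in the default environment, on the process's default pool, so the private pool returned by CreateThreadpool is created and closed without being used. A hedged sketch of binding work to a private pool follows, which also touches SetThreadpoolThreadMaximum and SetThreadpoolThreadMinimum from the original question. The limits of 4 and 1 and the names Worker, env and ran are arbitrary choices for the illustration, and TpSetCallbackThreadpool is assumed to be exposed by windows.bi in the same way as the other Tp* helpers used in this thread.

```freebasic
Const _WIN32_WINNT = &h0602
#Include Once "windows.bi"

Dim Shared As Long ran   ' set by the callback; read only after the wait

Sub Worker(ByVal Instance As PTP_CALLBACK_INSTANCE, ByVal Context As PVOID, ByVal Work As PTP_WORK)
  ran = 1
End Sub

Dim As PTP_POOL pool
Dim As PTP_WORK w
Dim As TP_CALLBACK_ENVIRON env

pool = CreateThreadpool(Null)
If pool = 0 Then
  Print "CreateThreadpool failed, error="; GetLastError()
  End 1
End If

SetThreadpoolThreadMaximum(pool, 4)            ' arbitrary upper limit
If SetThreadpoolThreadMinimum(pool, 1) = 0 Then
  Print "SetThreadpoolThreadMinimum failed, error="; GetLastError()
End If

TpInitializeCallbackEnviron(@env)
TpSetCallbackThreadpool(@env, pool)            ' assumption: present in the FB headers

' Passing @env instead of Null ties the work object to the private pool
w = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK, @Worker), Null, @env)
If w <> 0 Then
  SubmitThreadpoolWork(w)
  WaitForThreadpoolWorkCallbacks(w, FALSE)
  CloseThreadpoolWork(w)
End If

TpDestroyCallbackEnviron(@env)
CloseThreadpool(pool)
Print "Callback ran:"; ran
```

If the default pool is good enough, the simpler route is to drop CreateThreadpool and CloseThreadpool altogether and keep Null.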
Here are some links if rpkelly has whetted your appetite for thread pooling. If you are already au fait with thread pooling then please give my code a 'good kicking'.

Thread Pool API
Thread Pools
Understanding Thread Pool Enhancements
Developing with Thread Pool Enhancements
Using the Thread Pool Functions

Re: Windows Thread Pool

Post by rpkelly »

I'm the PB guy, having transitioned to FB. I was developing SQLite client/server classes when the whole thread pool API set came into view.

See my take at:

https://github.com/breacsealgaire/FreeB ... Lite-Class

What I found out is that I can have a thread cleanup group without a callback, and the CloseThreadpoolCleanupGroupMembers function then blocks until all currently executing callback functions finish. That was important to allow all outstanding SQLite threads to finish before shutting down the server.
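A small sketch of that shutdown pattern, not taken from the class itself: HandleRequest, grp and finished are made-up names, "without a callback" is read here as passing Null for the cleanup group's cancel callback, and TpSetCallbackCleanupGroup is assumed to be exposed by windows.bi like the other Tp* helpers in this thread.

```freebasic
Const _WIN32_WINNT = &h0602
#Include Once "windows.bi"

Dim Shared As CRITICAL_SECTION cs
Dim Shared As Long finished          ' hypothetical count of completed requests

' Hypothetical stand-in for one SQLite request
Sub HandleRequest(ByVal Instance As PTP_CALLBACK_INSTANCE, ByVal Context As PVOID, ByVal Work As PTP_WORK)
  Sleep 100, 1
  EnterCriticalSection(@cs)
  finished += 1
  LeaveCriticalSection(@cs)
End Sub

Dim As TP_CALLBACK_ENVIRON env
Dim As PTP_CLEANUP_GROUP grp
Dim As PTP_WORK w
Dim As Integer i

InitializeCriticalSection(@cs)

grp = CreateThreadpoolCleanupGroup()
If grp = 0 Then
  Print "CreateThreadpoolCleanupGroup failed, error="; GetLastError()
  End 1
End If

TpInitializeCallbackEnviron(@env)
TpSetCallbackCleanupGroup(@env, grp, Null)   ' Null: no cancel callback (assumed reading)

' Every work object created with @env becomes a member of the group
For i = 1 To 5
  w = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK, @HandleRequest), Cast(PVOID, i), @env)
  If w <> 0 Then SubmitThreadpoolWork(w)
Next

' Blocks until the callbacks have finished, then releases the members,
' so no CloseThreadpoolWork call is needed for them
CloseThreadpoolCleanupGroupMembers(grp, FALSE, Null)
CloseThreadpoolCleanupGroup(grp)
TpDestroyCallbackEnviron(@env)
DeleteCriticalSection(@cs)

Print "Requests finished before shutdown:"; finished
```

With FALSE as the second argument pending callbacks are not cancelled, so the server only proceeds to shut down after the submitted requests have run.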

My choices in implementing the thread pool APIs were, of course, oriented to SQLite and a multithreaded server.

As I learn more about thread pools, I'm made to think that I should do all my threading this way.

Re: Windows Thread Pool

Post by deltarho[1859] »

Hi Rick
rpkelly wrote: See my take at:
Wow, that is well outside of my comfort zone. <smile>
rpkelly wrote: As I learn more about thread pools, I'm made to think that I should do all my threading this way.
That is the impression that I am getting from all the blogs that I have read.

I should think that I will keep life simple by concentrating on work objects.

David Roberts

Re: Windows Thread Pool

Post by rpkelly »

Grab my class and sample script and try them with your workflow. I'd be interested in seeing if it holds up.

Welcome to the 64 bit world...

Re: Windows Thread Pool

Post by deltarho[1859] »

As the cCTServerThreadPool.bi stands I reckon that you don't need

Code: Select all

TpInitializeCallbackEnviron(This.cbe)
TpSetCallbackLongFunction(This.cbe)
and, therefore, cbe.
rpkelly wrote: Welcome to the 64 bit world...
I will be 70 bits in December but I stopped counting at 32 bits.

Re: Windows Thread Pool

Post by rpkelly »

You may be correct. I only included the environment APIs since I thought threads working with SQLite would be thought of as "long". I have a lot of testing remaining and I'll find out when I throw a few hundred connections at my server class as fast as I can create them.

70 bits just means you have to keep smiling while you have most of your teeth....:-)

Re: Windows Thread Pool

Post by deltarho[1859] »

rpkelly wrote: You may be correct.
You may be correct as well. I have found some answers without being able to determine what the question was - a bit like being handed a pair of oars without knowing what a rowing boat is. <smile> The secret, of course, is to just keep reading until the penny drops. It helps if there are several sources of reading - it is less helpful when the source is dominated by MSDN.
rpkelly wrote: I have a lot of testing remaining and I'll find out when I throw a few hundred connections at my server class as fast as I can create them.
That is often the only way to go. Throw a brick at our code and see what happens.

Re: Windows Thread Pool

Post by deltarho[1859] »

I have just finished CryptoRndBufferII which uses thread pooling as opposed to thread creation. The coding was not as easy - I had to use four work objects.

I expected the exhaustion stutter to be less. The exhaustion stutter occurs when a buffer exhausts before the other buffer has filled. The worst case scenario is when we request random numbers and nothing else, i.e. flat out. I reckoned that the throughput might increase, but only marginally.

Here is a comparison.

CryptoRndBuffer
Requested 10485760
BufferSize 131072 NumberOfBuffers 80
67.358ms Time to crunch
155 Million per second
Stutter 0.5871898950504352

CryptoRndBufferII
Requested 10485760
BufferSize 131072 NumberOfBuffers 80
36.103ms Time to crunch
290 Million per second
Stutter 0.1915569701375841

I have been using a buffer size of 128KB to keep the worst case stutter at a manageable level - the larger the buffer the greater the exhaustion stutter. The new version has a stutter down to a third of the original version.

The shock result is the throughput - pushing twice as fast. CRB was already faster than FB's option 2 generator, CMC, but is now leaving that standing. FB's Mersenne Twister comes in at 85 Million per second. It is worth remembering that we are talking about a CPRNG here and not a PRNG.

Needless to say much testing is required and a 1TB PractRand run is a must to make sure nothing untoward is happening.

All my ThreadCreate applications are now shaking in their boots.<smile>

Thanks, Rick.

Re: Windows Thread Pool

Post by rpkelly »

Wow! Thanks for the feedback. The overhead for creating threads is higher than I had imagined and yours is the first proof that thread pools have their place. I'm looking forward to throwing that proverbial brick at my SQLite server class. I have a connection pool class that manages the SQLite connection handles on a check-out basis to keep opens to a minimum, for much the same reason as using a thread pool.

Keep us posted on your journey.

Rick

Re: Windows Thread Pool

Post by deltarho[1859] »

rpkelly wrote: Keep us posted on your journey.
OK, here is some breaking news. <smile>

This was begging to be done. The following compares 100,000 SubmitThreadpoolWork with 100,000 ThreadCreate both using threads which do absolutely nothing.

The test was done five times - relying on a single test is bad practice, though many folk do it.

In comparison with SubmitThreadpoolWork the overhead for ThreadCreate is enormous.

As Microsoft says "Using the thread pool can reduce the complexity of thread management and the overhead involved in thread creation and destruction."

Boy, did they get that right. We are talking roughly 10 microseconds per call compared with roughly 250 microseconds.

Given two heavy duty tasks which can be executed in parallel but only once then 250 microseconds is neither here nor there and ThreadCreate is easier to implement. However, if done many more times than once, and not necessarily heavy duty, then we have a very different ball game. All of my thread work falls into the latter case and it has been worthwhile. The mind boggles at what thread pooling can do for them.

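Results (in seconds; in each pair the first figure is for 100,000 SubmitThreadpoolWork calls and the second for 100,000 ThreadCreate calls):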

Code: Select all

 1.020808976953255
 27.28130401556713
 
 0.9288334450974496
 19.20958011653508
 
 1.27032038656305
 26.62135063937834
 
 1.228162935241453
 27.38241969856165
 
 0.864315230326298
 26.46788907886365
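The per-call figures quoted earlier can be checked from these timings with a few lines of arithmetic. The sketch below only averages the ten numbers listed above and divides by the 100,000 iterations; MeanOf and the variable names are made up for the illustration.

```freebasic
Const ITERATIONS = 100000

' Arithmetic mean of a one-dimensional Double array
Function MeanOf(values() As Double) As Double
  Dim As Double total
  For i As Integer = LBound(values) To UBound(values)
    total += values(i)
  Next
  Return total / (UBound(values) - LBound(values) + 1)
End Function

' Timings copied from the results above, in seconds per 100,000 iterations
Dim As Double poolRuns(1 To 5) = { 1.020808976953255, 0.9288334450974496, _
                                   1.27032038656305, 1.228162935241453, _
                                   0.864315230326298 }
Dim As Double threadRuns(1 To 5) = { 27.28130401556713, 19.20958011653508, _
                                     26.62135063937834, 27.38241969856165, _
                                     26.46788907886365 }

' Convert mean seconds per run into microseconds per iteration
Dim As Double poolMicros = MeanOf(poolRuns()) / ITERATIONS * 1000000
Dim As Double threadMicros = MeanOf(threadRuns()) / ITERATIONS * 1000000

Print "SubmitThreadpoolWork plus wait: ";
Print Using "####.##"; poolMicros;
Print " microseconds per iteration"
Print "ThreadCreate plus ThreadWait:   ";
Print Using "####.##"; threadMicros;
Print " microseconds per iteration"
Print "Ratio: ";
Print Using "###.#"; threadMicros / poolMicros
```

This comes out at roughly 10.6 microseconds against roughly 254 microseconds, a factor of about 24. Each figure covers a full round trip, since every iteration also includes the wait call.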
Code used:

Code: Select all

Const _WIN32_WINNT = &h0602
#INCLUDE ONCE "windows.bi"

Declare Sub myThread1(As PTP_CALLBACK_INSTANCE, As PVOID, As PTP_WORK)
Declare Sub myThread2( As Any Ptr )

Dim As Long i, j
Dim Pool As PTP_POOL
Dim Work As PTP_WORK
Dim As Any Ptr hThread, x
Dim t As Double

Pool = CreateThreadpool(Null)
Work = CreateThreadpoolWork(Cast(PTP_WORK_CALLBACK,@myThread1),Cast(PVOID,1), Null)

For j = 1 To 5
  
  t = Timer
  For i = 1 To 100000
    SubmitThreadpoolWork(Work)
    WaitForThreadpoolWorkCallbacks(Work,FALSE)
  Next
  Print Timer - t

  t = Timer
  For i = 1 To 100000
    hThread = Threadcreate( @myThread2, x  )
    Threadwait( hThread )
  Next
  Print Timer - t
  
  Print
Next

CloseThreadpoolWork(Work)
CloseThreadpool(Pool)

Sleep

Sub myThread1(Byval Instance As PTP_CALLBACK_INSTANCE, Byval Context As PVOID, Byval Work As PTP_WORK)
'Do nothing
End Sub

Sub myThread2( Byval x As Any Ptr )
' Do nothing  
End Sub

Re: Windows Thread Pool

Post by rpkelly »

Your comparison results are similar to mine although I only used 1000 threads.

Links I found useful for tuning thread pools are at:

https://blogs.msdn.microsoft.com/pedram ... ol-thread/

http://www.thejoyofcode.com/Tuning_the_ThreadPool.aspx