Original thread topic: Using Framebased movement with Delta-timing by relsoft
1000101 » Nov 16, 2012 8:04 wrote:Re: Using Framebased movement with Delta-timing
The method I use is aggressive multi-threading for each sub-system (renderer, logic (physics, input processing), input, network, sound, etc). Each sub-system can then run at its own resolution, and the sub-systems use a tagged FIFO queue to communicate with each other. This allows each sub-system not only to be more responsive but also to be minimally affected by latency in any other sub-system.
There are some drawbacks, however: this method requires greater knowledge and more careful design to implement successfully.
This is a preferable method, imo, over a single-threaded program which relies on the computing power of a single resource (i.e. one core's frequency and logic). Since each sub-system needs only a fraction of the total processing time, the operating system can load balance the threads across all available cores. Since the number of cores is now increasing faster than the speed and ability of any single core, the potential gain is roughly cores * frequency - overhead. Since the program overhead is minimal (every sub-system must read its queue each "loop cycle", and sub-systems must add to a queue to communicate) and the OS overhead is also small (thread switching), the benefits far outweigh the additional complexity of the system. Further, as the number of processor cores increases, the responsiveness of each sub-system potentially increases.
Several months ago I posted libraries which are designed for exactly this. Check this forum thread for my code.
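To make the sub-system/queue idea concrete, here is a minimal sketch of a tagged FIFO queue passing messages from a logic thread to a render loop. It is not the library mentioned above: every name in it (Message, MsgQueue, QueuePush, QueuePop, TAG_MOVE) is made up for illustration, and it guards the queue with a plain mutex rather than a lockless scheme to keep the example short.

```freebasic
' Hypothetical sketch of a tagged FIFO queue between two sub-systems.
' Assumption: a mutex guards the queue here; this is NOT a lockless design.
Const QUEUE_SIZE    = 16
Const MESSAGE_COUNT = 5
Const TAG_MOVE      = 1          ' made-up message tag

Type Message
    tag     As Integer           ' what kind of message this is
    payload As Integer           ' the value carried with it
End Type

Type MsgQueue
    items( 0 To QUEUE_SIZE - 1 ) As Message
    head  As Integer             ' next slot to read
    tail  As Integer             ' next slot to write
    used  As Integer             ' number of queued messages
    guard As Any Ptr             ' mutex protecting the fields above
End Type

Dim Shared As MsgQueue toRender

' Append a message; returns -1 on success, 0 if the queue is full.
Function QueuePush( ByRef q As MsgQueue, ByVal tag As Integer, ByVal payload As Integer ) As Integer
    Dim As Integer ok = 0
    MutexLock( q.guard )
    If q.used < QUEUE_SIZE Then
        q.items( q.tail ).tag = tag
        q.items( q.tail ).payload = payload
        q.tail = ( q.tail + 1 ) Mod QUEUE_SIZE
        q.used += 1
        ok = -1
    End If
    MutexUnlock( q.guard )
    Return ok
End Function

' Remove the oldest message into msg; returns 0 if the queue is empty.
Function QueuePop( ByRef q As MsgQueue, ByRef msg As Message ) As Integer
    Dim As Integer ok = 0
    MutexLock( q.guard )
    If q.used > 0 Then
        msg = q.items( q.head )
        q.head = ( q.head + 1 ) Mod QUEUE_SIZE
        q.used -= 1
        ok = -1
    End If
    MutexUnlock( q.guard )
    Return ok
End Function

' "Logic" sub-system: produces messages at its own pace.
Sub logic_thread( ByVal userdata As Any Ptr )
    For i As Integer = 1 To MESSAGE_COUNT
        QueuePush( toRender, TAG_MOVE, i * 10 )
        Sleep 5, 1
    Next
End Sub

toRender.guard = MutexCreate()
Dim As Any Ptr logic = ThreadCreate( @logic_thread )

' "Render" sub-system: polls its queue once per loop cycle.
Dim As Message msg
Dim As Integer received = 0
While received < MESSAGE_COUNT
    If QueuePop( toRender, msg ) Then
        Print "Render: tag"; msg.tag; " payload"; msg.payload
        received += 1
    Else
        Sleep 1, 1               ' nothing queued; a real renderer would draw a frame here
    End If
Wend

ThreadWait( logic )
MutexDestroy( toRender.guard )
```

The point of the tag is that one queue can carry several kinds of messages and the receiver decides what to do with each; how many queues there are and who owns them is a design choice this sketch does not try to settle.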
Gonzo » Nov 16, 2012 12:13 wrote:Re: Using Framebased movement with Delta-timing
personally i use the single responsibility principle
if you do that, you can use round-robins on the "other end" without any locks or queues
the drawback (there is always one), is that you have to iterate the entire array on the "other end"
if your queues are small this is a non-issue.. given that the queue has an .alive member its a fast loop
you'd have to have many checks in place to avoid unfavorable conditions, but if your objects have states that are well defined,
your engine can handle anything that comes its way accordingly
i imagine the last part goes for any type of multithreaded system, anyways
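One possible reading of the "iterate the whole array and check .alive" idea is sketched below. The names (Slot, Publish, ConsumeAll) are invented for illustration, and the driver at the bottom is single-threaded on purpose: it only shows the data layout and the scan. In a real two-thread setup the producer would be the only side that sets alive and the consumer the only side that clears it, and that relies on memory-ordering guarantees this sketch does not address.

```freebasic
' Hypothetical sketch: a fixed slot array scanned by the consumer.
' Assumption: exactly one producer and one consumer; shown single-threaded.
Const SLOT_COUNT = 8

Type Slot
    alive   As Integer   ' 0 = free (producer may fill), nonzero = ready to consume
    payload As Integer
End Type

Dim Shared As Slot slots( 0 To SLOT_COUNT - 1 )

' Producer side: fill the first free slot, marking it alive last.
Function Publish( ByVal payload As Integer ) As Integer
    For i As Integer = 0 To SLOT_COUNT - 1
        If slots( i ).alive = 0 Then
            slots( i ).payload = payload
            slots( i ).alive = -1
            Return -1
        End If
    Next
    Return 0             ' every slot is still waiting to be consumed
End Function

' Consumer side: walk the entire array and handle whatever is alive.
Function ConsumeAll() As Integer
    Dim As Integer handled = 0
    For i As Integer = 0 To SLOT_COUNT - 1
        If slots( i ).alive Then
            Print "Consumed payload"; slots( i ).payload
            slots( i ).alive = 0   ' hand the slot back to the producer
            handled += 1
        End If
    Next
    Return handled
End Function

Publish( 10 )
Publish( 20 )
Dim As Integer n = ConsumeAll()
Print "Slots handled:"; n
```

The cost mentioned above is visible in ConsumeAll: it always touches every slot, which is cheap only while SLOT_COUNT stays small.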
the main point is to always avoid locks, not because they give you less headache, but because they force the OS to not release your thread until the lock is freed
and yes, single-threaded is not the way to go :P unless you are making a demo...
i reckon most people have at least 4 cores now.. with intel that's 8 logical cores (because of "hyperthreading")
note: i haven't made a lockfree solution for work-threads yet.. that is going to be a major headache
i have only made it for transfer of data between physics and render thread in various ways
1000101 » Nov 16, 2012 19:36 wrote:Re: Using Framebased movement with Delta-timing
There are lockless solutions available to problems, but as gonzo stated, they are a major headache to design and implement. Anywhere you have a situation of write once, read many (for example) is a good place for lockless algorithms (just create the data before the readers start). Many times, however, data cannot be accessed in such a free manner and either cannot be lockless or requires some form of guard to validate the data. My tagged queue is lockless but uses the cmpxchg8b instruction (a hardware-level compare-and-exchange) when adding to and removing from the queue to validate data. This may result in a small "spin" to re-establish the data in the correct manner, but it is still faster than a lock, since the function only spins when multiple cores modify the exact same memory address (the head/tail of the queue) at the exact same time, which is highly unlikely.
No matter how you handle it, the real solution to modern computing problems lies in taking advantage of modern technology. Trying to "carry forward" older, inefficient paradigms is a lost cause. Just as none of us use q[uick]basic (note the two are different) anymore, as we move forward we need to stop using solutions which no longer fit the new paradigm. Some algorithms and techniques are timeless or are irrespective of the technology used, but most can be modernized or redesigned to face new challenges and take advantage of new methods and technologies.
I think gonzo won't argue with me that multi-threading can greatly help complex systems which are really sets of smaller systems which need to work together. As such, decoupling the systems and creating communication pathways (regardless of implementation) is a more efficient method overall for modern hardware than single-threaded brute force methods and workarounds.
I do have a nitpick though. Gonzo stated that the OS will not release a thread while there is an active lock. This is categorically not true. Locks are a function of program logic. The OS only knows you have a lock established, not its purpose and intent. The OS will task-switch no matter how many locks are active in a thread. The task scheduler will not activate other threads which are waiting to acquire an active lock, but that does not prevent it from switching out of a thread which holds active locks. This can be easily demonstrated by the following code:
Code: Select all
Dim Shared As Any Ptr mutex
Dim As Any Ptr threads( 0 To 2 )

' Threads 0 and 1 compete for the same mutex and sleep while holding it
Sub thread_0( ByVal foo As Any Ptr )
    Print "Thread 0: Acquiring lock"
    MutexLock( mutex )
    Print "Thread 0: Sleeping for a long time"
    Sleep 10000, 1
    Print "Thread 0: Releasing lock"
    MutexUnLock( mutex )
    Print "Thread 0: Terminating"
End Sub

Sub thread_1( ByVal foo As Any Ptr )
    Print "Thread 1: Acquiring lock"
    MutexLock( mutex )
    Print "Thread 1: Sleeping for a long time"
    Sleep 10000, 1
    Print "Thread 1: Releasing lock"
    MutexUnLock( mutex )
    Print "Thread 1: Terminating"
End Sub

' Thread 2 never touches the mutex; it keeps running while the lock is held
Sub thread_2( ByVal foo As Any Ptr )
    Print "Thread 2: No lock, now a silly loop"
    For i As Integer = 0 To 100000
        If( ( i Mod 123 ) = 0 )Then
            Print "Thread 2: I'm going hard!"
        EndIf
    Next
    Print "Thread 2: Terminating"
End Sub

Print "Main: Creating mutex"
mutex = MutexCreate()

Print "Main: Spawning child threads"
threads( 0 ) = ThreadCreate( @thread_0 )
threads( 1 ) = ThreadCreate( @thread_1 )
threads( 2 ) = ThreadCreate( @thread_2 )

' Wait for all three threads before cleaning up
ThreadWait( threads( 0 ) )
ThreadWait( threads( 1 ) )
ThreadWait( threads( 2 ) )

Print "Main: Destroying mutex"
MutexDestroy( mutex )

Print "Main: Terminating"
Sleep
End
Gonzo » Nov 16, 2012 20:03 wrote:Re: Using Framebased movement with Delta-timing
yes, you may be right.. it doesnt make sense for the OS not to give time to other processes
still, as a programmer your concern should be within your own program (and that was what i meant)
if it seemed i didnt use multithreading - i am.. way too many threads =)
1 for render, 1 for everything else (physics etc.), up to 8 work threads, threads for networking and sound.. and im sure the graphics driver are using quite a few as well
my main reason for using up to 8 work threads is because the work i'm doing takes an extremely long time
but not enough time to make the physics etc. thread slow, it's enough for let's say 1 round
there are however at least 4 cores that aren't doing anything, or enough time on average to do work at least 4-8 times in parallel
if i had any concerns about my setup - it's that i'm not allowed to say which threads use which core
i must ensure that the rendering thread is all alone on one core, but thats impossible, isnt it?
the problem really just boils down to that im rendering way too heavily (something ive tried for 2 years to rectify, but it doesnt get likelier for each day/month that passes)
and the other is the work threads doing too much and consuming too much time
this is something i can fix by splitting the work into passes, something im confident i can do in time
in any case, the most effective work thread scheduling i have used yet is simply using conditionals (signals)
the control part of my thread scheduler is lockless, which is a pain in the ass, because the control part is not the receiving end
it's a convoluted system because it's parallelized 3-ways :) physics -> work -> render
the render thread can't wait for anything - ever, so there's some hardcore stuff in there that no coder should have to put up with :P
it basically boils down to taking over memory..
once the controller has scheduled a thread, and that thread is finished, the rendering thread immediately takes over the pointers to the memory in question, and thats it.. the rendering thread may choose, or not choose to do anything with it, depending on how stressed it is
switching pointers is something ive had to come up with as a necessity for survival :P more or less
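For anyone unfamiliar with scheduling work threads through conditionals (signals), a minimal sketch using FreeBASIC's CondCreate / CondWait / CondSignal could look like the following. It is only an illustration of the general pattern, not the scheduler described above (which has a lockless control part); the names worker, pendingJobs and shuttingDown are hypothetical.

```freebasic
' Hypothetical sketch: one work thread woken by a condition variable.
' Assumption: a simple job counter stands in for real work items.
Dim Shared As Any Ptr jobMutex, jobReady
Dim Shared As Integer pendingJobs = 0, shuttingDown = 0

Sub worker( ByVal userdata As Any Ptr )
    Do
        MutexLock( jobMutex )
        ' Sleep until there is work or a shutdown request. CondWait releases
        ' the mutex while waiting and re-acquires it before returning.
        While pendingJobs = 0 And shuttingDown = 0
            CondWait( jobReady, jobMutex )
        Wend
        If pendingJobs = 0 Then      ' woken only to shut down
            MutexUnlock( jobMutex )
            Exit Do
        End If
        pendingJobs -= 1
        MutexUnlock( jobMutex )
        Print "Worker: running one job"   ' the actual work happens outside the lock
    Loop
    Print "Worker: terminating"
End Sub

jobMutex = MutexCreate()
jobReady = CondCreate()
Dim As Any Ptr t = ThreadCreate( @worker )

' Controller: queue three jobs, signalling the worker after each one.
For i As Integer = 1 To 3
    MutexLock( jobMutex )
    pendingJobs += 1
    CondSignal( jobReady )
    MutexUnlock( jobMutex )
Next

' Ask the worker to finish once the remaining jobs are drained.
MutexLock( jobMutex )
shuttingDown = -1
CondSignal( jobReady )
MutexUnlock( jobMutex )

ThreadWait( t )
CondDestroy( jobReady )
MutexDestroy( jobMutex )
```

The While loop around CondWait matters: the worker re-checks the counter after every wake-up, so a signal sent before it started waiting is not lost. Note that the controller does take a mutex here, so a render thread that "can't wait for anything" would need a different hand-off on its side.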
in closing.. i wonder if i create the threads in a very specific order, do you think the OS will round-robin on the available cores?
i hope so, it's something i would consider trying out
Edit: Fixed url link in quote