Optimising execution speed
Is there a resource that provides general tips and tricks for optimising the execution speed of FreeBasic code?
Re: Optimising execution speed
Not really, because optimizations are very dependent on the underlying program. Some obvious tips off the top of my head:
- Precompute as much as you can
- Avoid virtual calls as much as possible, especially in tight loops
- Pick good algorithms for the task at hand
- Complement of the above: pick the right data structures for the right task (ie don't use a linked list if you need to iterate over many elements often)
- Not so obvious, especially for beginners: try to split your code in a logical, organized way (ie favor subs and functions over shared variables). This is because a sub/function provides an abstraction layer, which you can then profile and optimize separately if needed
- Old optimization tricks (ie loop unrolling) don't work as well as they used to, because processors nowadays are quite different from those of the early days. Organize your data, especially data you need to iterate over often, sequentially in memory (ie use arrays). This has a huge impact on performance, especially if you have data that can be processed linearly
- Try to minimize allocations, especially if they're small and happen often (ie in a game loop). Instead, preallocate as much as possible and work within those limits (or only reallocate if you absolutely need to). See the sketch after this list.
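To illustrate that last tip, a minimal sketch (the particle pool and names are hypothetical, just for the example):

' Allocate the pool once, up front, instead of allocating per frame.
Const MAX_PARTICLES = 4096

Type Particle
    x As Single
    y As Single
End Type

Dim Shared particles(0 To MAX_PARTICLES - 1) As Particle
Dim Shared particleCount As Integer = 0

Sub spawnParticle(ByVal x As Single, ByVal y As Single)
    ' Work strictly within the preallocated limit: no allocation
    ' in the hot path, just an index bump.
    If particleCount < MAX_PARTICLES Then
        particles(particleCount).x = x
        particles(particleCount).y = y
        particleCount += 1
    End If
End Sub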
Re: Optimising execution speed
Some of the terms are unfamiliar to me (eg loop unrolling and virtual calls), but at least one of your tips has paid off so far. I think it was Wikipedia's article on loop unrolling that advised against relying too much on pointer arithmetic, and I've now cleaned up that mess in my code.
Thanks, and happy new year.
Re: Optimising execution speed
Virtual calls have some minor overhead; basically, it is an indirect call instead of a direct one.
Are they really the first thing to optimise these days?
Re: Optimising execution speed
Mostly in tight loops (ie when you need to iterate over lots of entities in, say, a game). I've measured the overhead of a virtual call in FreeBasic: about 22%. So depending on usage, yes, it can add up to a lot.
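If you want to measure it yourself, here's a rough sketch of the kind of micro-benchmark I mean (the loop count and the Entity type are illustrative, not the original test):

Type Entity Extends Object
    value As Integer
    Declare Virtual Sub update()
End Type

Sub Entity.update()
    This.value += 1
End Sub

Sub plainUpdate(ByRef e As Entity)
    e.value += 1
End Sub

Dim e As Entity
Dim ep As Entity Ptr = @e
Dim As Double t

t = Timer
For i As Integer = 1 To 100000000
    ep->update()      ' virtual: indirect call through the vtable
Next
Print "virtual:"; Timer - t

t = Timer
For i As Integer = 1 To 100000000
    plainUpdate(e)    ' direct call, no lookup
Next
Print "direct :"; Timer - t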
The first thing to optimize these days is data, not so much code. Compilers are pretty efficient already; unless you're really experienced and need to squeeze every last drop of performance, just arranging data in a cache-coherent fashion and processing it linearly can get you significant speed improvements. It does require practice, though (it's basically a different paradigm)
Re: Optimising execution speed
It is a waste of time to think about optimization before a working program comes out.
Re: Optimising execution speed
I recently wrote a program to scan each sector in a dd image file of an 80GB HDD, looking for any of 17 five-byte signatures. I noticed that the 5th byte of each signature was always 0 or 1, so I tested this byte separately, before the others. I consider this to be optimisation, and in this case it was best done at the outset.
On a related matter, I used a series of Select Case statements to perform the comparisons, but I could also have used If Then ElseIf statements. Assuming there are no constants to be tested, is there any reason, other than neatness, to use Select Case rather than If?
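Something like the following is what I mean; the helper name and signature bytes here are made up for illustration, not my actual code:

Function matchesSignature(ByVal p As UByte Ptr) As Boolean
    ' Hypothetical 5-byte signature; note the 5th byte is 0.
    Dim sig(0 To 4) As UByte = {&h52, &h45, &h43, &h4F, &h00}
    For i As Integer = 0 To 4
        If p[i] <> sig(i) Then Return False
    Next
    Return True
End Function

Dim sector(0 To 511) As UByte
' (in the real program, Get # fills sector() from the dd image)

For offset As Integer = 0 To UBound(sector) - 4
    ' Cheap test first: the 5th byte of every signature is 0 or 1,
    ' so most offsets are rejected with a single comparison.
    Select Case sector(offset + 4)
        Case 0, 1
            If matchesSignature(@sector(offset)) Then
                Print "possible hit at offset"; offset
            End If
    End Select
Next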
Re: Optimising execution speed
I assume the indirection frustrates branch prediction, though even then, the second time around it could do it speculatively. It is possible to make a micro benchmark out of just about everything: if all you do is put an indirection in a loop then yes, the indirection is measurable. But I wonder whether it is worthwhile to focus on it as part of the first round of optimisation.
Some compilers support devirtualisation as part of their global optimisations.
paul doe wrote: The first thing to optimize these days is data, not so much code. […]
Cache coherency is a strategy to keep multiple caches coherent (to avoid dirty reads etc)? Don't see the relevance for programming.
But optimising for cache can be important. I'm in image processing, so you don't have to tell me. Btw, I had quite some success with loop tiling as a strategy.
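A minimal sketch of the tiling idea, using a hypothetical image transpose (a transpose is the classic worst case for strided access):

Const W = 1024, H = 1024, TILE = 32

Dim Shared src(0 To W * H - 1) As UByte
Dim Shared dst(0 To W * H - 1) As UByte

' Process TILE x TILE blocks so both the rows read from src and the
' columns written to dst stay resident in cache.
For ty As Integer = 0 To H - 1 Step TILE
    For tx As Integer = 0 To W - 1 Step TILE
        For y As Integer = ty To ty + TILE - 1
            For x As Integer = tx To tx + TILE - 1
                dst(x * H + y) = src(y * W + x)
            Next
        Next
    Next
Next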
But usually in such instances you have already identified the rate-determining step, and identified that it is really calculation (rather than some busy waiting) that is the problem.
Re: Optimising execution speed
marcov wrote: ↑Jan 01, 2025 12:51 I assume the indirection frustrates branch prediction […] Some compilers support devirtualisation as part of their global optimisations.
Nono, we're talking about a real program here that had to manage several thousand entities at the same time. Now, for something simple and/or without many calls per loop, then yes, the performance loss is irrelevant. But yeah, it messes with branch prediction (not to mention having to perform a lookup vs. just making the call).
marcov wrote: Cache coherency is a strategy to keep multiple caches coherent (to avoid dirty reads etc)? Don't see the relevance for programming. […]
'Cache coherency' in the sense that everything that needs to be processed together is kept tightly packed in memory as much as possible. Think the way you arrange and pass data to GPUs, but for everything else as well (like in Entity-Component-System schemes). For images this is easy (all the pixels are packed together sequentially), but when you're dealing with a state that can change completely from one frame to the next, it becomes really challenging.
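As a sketch of what I mean by 'tightly packed' (hypothetical component arrays, structure-of-arrays style):

Const MAX_ENTITIES = 100000

' Each component lives in its own contiguous array, so an update
' pass touches memory strictly linearly.
Dim Shared posX(0 To MAX_ENTITIES - 1) As Single
Dim Shared posY(0 To MAX_ENTITIES - 1) As Single
Dim Shared velX(0 To MAX_ENTITIES - 1) As Single
Dim Shared velY(0 To MAX_ENTITIES - 1) As Single

Sub updatePositions(ByVal count As Integer, ByVal dt As Single)
    For i As Integer = 0 To count - 1
        posX(i) += velX(i) * dt
        posY(i) += velY(i) * dt
    Next
End Sub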
This is perhaps the most immediately observable optimization one can make. To convince myself back when I read about the concept, I coded a very simple simulation: one using virtual calls and the 'usual' patterns (ie an entity->update() virtual call) vs a simple procedural approach (with the proper data arrangements, of course). In this particular case, the 'simple' approach was 17 times faster than the virtual call approach. Of course, it was just for this test, not a real app, but it served to change the way I approach certain tasks (ie use virtual calls and OOP for coarse, architectural layers; and use simple arrays and linear iterations for everything else).
I've read the book about Data Oriented Design and I've found it really good, but I usually don't overdo it if I don't have to. I'm lazy, you know, and for me the best optimization approach will always be 'don't write more code than absolutely necessary to accomplish the task'

Re: Optimising execution speed
paul doe wrote: ↑Jan 01, 2025 14:01 Nono, we're talking about a real program here that had to manage several thousand entities at the same time. […]
So what kind of program? A program that builds a heterogeneous tree, like a compiler?
paul doe wrote: 'Cache coherency' in the sense that everything that needs to be processed together is kept tightly packed in memory as much as possible.
It is not. That is cache optimisation. Cache coherency is mostly a silicon thing: synchronising between multiple caches, plus a few rules for programmers so as not to violate the model. The model is to make multiple memory systems with multiple caches seem like a single logical memory system.
As both cache coherency and cache optimisation are, well, cache related, they work at cache-line granularity (as in: a write from multiple sources to the same cache line is a problem). But they are not the same thing. Coherency is the set of rules to follow so that memory doesn't get inadvertently corrupted; cache optimisation is about not unnecessarily hitting the (silicon) barriers that ensure cache coherency.
Also, for cache optimisation, extreme memory packing is not always a beneficial strategy. If you know multiple CPUs access two variables, you surely don't want them in the same cache line, as that will invalidate multiple caches and reduce speed.
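A sketch of the padding trick, assuming 64-byte cache lines (the counter type and thread count are made up for the example):

Type PaddedCounter
    value As LongInt
    pad(0 To 55) As UByte    ' 8 + 56 = 64 bytes: one counter per line
End Type

Dim Shared counters(0 To 3) As PaddedCounter

Sub worker(ByVal param As Any Ptr)
    Dim id As Integer = Cast(Integer, param)
    For i As Integer = 1 To 10000000
        ' Without the padding, all four counters could share one
        ' cache line and the writes would ping-pong between cores.
        counters(id).value += 1
    Next
End Sub

Dim threads(0 To 3) As Any Ptr
For t As Integer = 0 To 3
    threads(t) = ThreadCreate(@worker, Cast(Any Ptr, t))
Next
For t As Integer = 0 To 3
    ThreadWait(threads(t))
Next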
paul doe wrote: Think the way you arrange and pass data to GPUs, but for everything else as well (like in Entity-Component-System schemes).
I usually pass data as PBOs to GPUs. But I assume you mean exactly the opposite of ECS: not having references for everything, but folding the data into the game objects as much as possible. Those are tricks we did back in the day, when I still did LPC MUD work on P-I systems, yes.
But I still consider that an extreme case, to be pursued only after all other avenues have been exhausted.
paul doe wrote: For images this is easy (all the pixels are packed together sequentially), but when you're dealing with a state that can change completely from one frame to the next it becomes really challenging.
In general, the rule is that every operation needs to reduce the size of the (intermediate) result as much as possible. The newer processors got, the more processing was allowed to reach that goal. In the past that was mostly fixed-point stuff in SSE registers (as floating point (singles) would decrease SIMD granularity and throughput).
Actually, that might be one of the good things of the AI revolution: having cheap SIMD floating-point types.

paul doe wrote: This is perhaps the most immediately observable optimization one can make. […] In this particular case, the 'simple' approach was 17 times faster than the virtual call approach.
It might not have been just the virtual call, but also the references, and having multiple allocations vs. one. Make absolutely sure that the number of allocations in both scenarios is exactly the same. Talking about virtual call overhead mixed with allocations (dynamic memory allocation is a much 'heavier' mechanism) is futile.
paul doe wrote: I've read the book about Data Oriented Design and I've found it really good, but I usually don't overdo it if I don't have to. […]
For any good optimisation strategy, you need to revisit it after a time, to make sure that the improvement you felt when solving something was actually caused by the change you attribute it to. Eliminate all other factors to make absolutely sure.
So many rigidly held beliefs that I had in the nineties, when I first did non-trivial programming (I started with a C=64 in the eighties), have been toppled since.
Re: Optimising execution speed
Any idea how costly (slow) recursive functions are, compared to a loop with a custom stack implementation for the data?
E.g. for this crossword generator project using the 'recursiveMagic' function, where the depth is limited by the amount of words to fit (currently testing with 25 words).
Update: I got a working non-recursive version; it's 15% slower at the moment, and uglier.
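For reference, the general shape of the transformation I tried (not the actual 'recursiveMagic' routine, just a hypothetical tree-sum example):

Type TreeNode
    value As Integer
    childL As TreeNode Ptr
    childR As TreeNode Ptr
End Type

Function sumRecursive(ByVal n As TreeNode Ptr) As LongInt
    If n = 0 Then Return 0
    Return n->value + sumRecursive(n->childL) + sumRecursive(n->childR)
End Function

Function sumWithStack(ByVal root As TreeNode Ptr) As LongInt
    ' The call stack becomes a preallocated pointer array.
    Dim stack(0 To 1023) As TreeNode Ptr
    Dim As Integer top = 0
    Dim As LongInt total = 0
    If root = 0 Then Return 0
    stack(0) = root : top = 1
    While top > 0
        top -= 1
        Dim As TreeNode Ptr n = stack(top)
        total += n->value
        If n->childL Then stack(top) = n->childL : top += 1
        If n->childR Then stack(top) = n->childR : top += 1
    Wend
    Return total
End Function

' Tiny usage: a three-node tree; both calls print 6.
Dim As TreeNode a, b, c
a.value = 1 : b.value = 2 : c.value = 3
a.childL = @b : a.childR = @c
Print sumRecursive(@a), sumWithStack(@a)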
Re: Optimising execution speed
How I see the issue: there are several reasons that may explain the lack of tips and tricks regarding code optimization.
- Look on the forum; there are some particularly fast codes (Dodicat, Srvaldez, Juergen Kulhwein, etc.), and yet all these codes are very different. Coding style is a coherent way of thinking: it's absurd to change your coding style to gain a few milliseconds; the price to pay is a conceptualization guided by... nothing. Before thinking about optimization, think about building your own coding style.
- Avoid rigid or doctrinal positions as much as possible (i.e., arrays are better or lists are better, the use of pointers is irrelevant, recursive programming is better or worse, etc.). Keep in mind that the more options you manage to preserve, the more optimization paths you'll preserve. This is very difficult, because the easiest way is, of course, to lock yourself into a coding style. Before thinking about optimization, keep as many options as possible open to you.
- Optimize, yes, but against what? Take the example of recursive programming: often faster, but also a major consumer of a very specific resource: the stack. Pointers: sometimes slower than a dedicated function but less cache-intensive. Pure, stand-alone optimization in a "hard science" way doesn't always correspond to what we're looking for. Before thinking about optimization, think about the system as a whole and over time.
- Optimize, yes, but what? Identify the bottlenecks and focus on them. To be effective, before thinking about optimization, think about efficiency.
Think coding style, think about flexibility, think about the system (a little empathy for that unfortunate computer), think about efficiency, and then, only then, think about optimization; finally, with experience and technique, you will find your way.
That being said, nothing prevents you from researching technical tips: think about the process and minimize unnecessary loops as much as possible, reduce as much as possible everything that must be in a loop, and above all think through the algorithms correctly.
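For example, reducing what must be inside a loop, in a minimal sketch (hypothetical names; optimizing compilers often hoist this themselves, but it is a good habit in hot loops):

Dim As Double values(0 To 9999), scaled(0 To 9999)
Dim As Double angle = 0.75

' Before: the invariant Sqr(2 * angle) is recomputed on every pass:
'   For i As Integer = 0 To 9999
'       scaled(i) = values(i) * Sqr(2 * angle)
'   Next

' After: compute the invariant once, outside the loop.
Dim As Double factor = Sqr(2 * angle)
For i As Integer = 0 To 9999
    scaled(i) = values(i) * factor
Next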
One last piece of advice, perhaps the most important: place immense importance on conceptualization, trying to think through optimization up front as well. Optimizing code to death will never correct a conceptual error made at the outset.
Re: Optimising execution speed
edited : wrong topic
Re: Optimising execution speed
Hi there,
Here is my optimized 2-million-particle simulation with collisions, running on the CPU and written in FreeBasic. There are some optimization tips in the comments.
https://youtu.be/g3jBiWW4BzE?si=qoYAdzTZGETX7e4Z