Speed issue with string concatenation and a solution

General FreeBASIC programming questions.
Post Reply
SARG
Posts: 1763
Joined: May 27, 2005 7:15
Location: FRANCE

Speed issue with string concatenation and a solution

Post by SARG »

As I was facing a speed issue when working on the asm64 backend I did some tests and I mainly looked for a solution.

The purpose was just to add many strings to another. When it's "directly" no problem but when the strings to add are passed as parameter of a procedure the execution time is very different.
Run the example.
Part 1/part2 directly and very fast
Part3 simulates my old code, very long time
Part4, using "mid(main_string,start)=add_string", the solution with time factor of #100.

In part1 and part2 there are few space allocations and main string moves.
In part3 there are allocations and moves for every adding
In part4 only one allocation and no main string move

Note : The final size not known so it's necessary to check when adding and increase the initial size if needed (not coded in that example).

Code: Select all

type ttest
    strg as string 
end type

dim shared as ttest test
dim shared as string addstrg,sg
dim shared as long start
start=1
addstrg="kdjfzifi  zefoi   zoefh   zofhe   ozhefo  hefoh   zoefh   ohef    hefo    hefh    he"
dim as integer ptr pstrg=@test.strg
dim as double tim

sub adding(sg as string)
    test.strg+=sg
end sub

sub adding2(sg as string)
    mid(test.strg,start)=sg
    start+=len(sg)
end sub


print "string adress"," lenght"," space allocated"

''1/
test.strg=""
tim=timer
for i as long =1 to 10000
    test.strg+=str(i)+addstrg+space(50)
    if i>9990 then print *(pstrg),*(pstrg+1),*(pstrg+2)
next
print "time=";timer-tim," size=";len(test.strg)
print

''2/
test.strg=""
tim=timer
for i as long =1 to 10000
    sg=str(i)+addstrg+space(50)
    test.strg+=sg
    if i>9990 then print *(pstrg),*(pstrg+1),*(pstrg+2)
next
print "time=";timer-tim," size=";len(test.strg)
print

''3/
test.strg=""
tim=timer
for i as long =1 to 10000
    adding(str(i)+addstrg+space(50))
    if i>9990 then print *(pstrg),*(pstrg+1),*(pstrg+2)
next
print "time=";timer-tim," size=";len(test.strg)
print

''4/
test.strg=space(2000000)
tim=timer
for i as long =1 to 10000
    adding2(str(i)+addstrg+space(50))
    if i>9990 then print *(pstrg),*(pstrg+1),*(pstrg+2)
next
print "time=";timer-tim,"real string size=";start-1
sleep
dodicat
Posts: 7983
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Speed issue with string concatenation and a solution

Post by dodicat »

It is desperately slow I see.
But if you take the weight off the concatenations in the adding sub by

sub adding(sg as string)
var t=sg
test.strg+=t
end sub

It is back up to speed.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Speed issue with string concatenation and a solution

Post by marcov »

This is very normal and even so common it could be considered a cross language problem. It is simply string being allocated pretty close to its (current) size, which with the next concatenation is already to small, so a reallocation and copy follows. Typical cases is generating webpages where line after line is added.

The usual workaround is defining a class that uses one or multiple fairly large internal blocks of memory (often called "StringBuilder") to store the string, and have one operation at the end to get the result.

The simplest forms are simply a wrapper around an array of char and increase the size exponentially (*)

See e.g. .NET's stringbuilder class

(*) a good strategy is to double the allocation till a certain size is reached (usually several MBs or even tens of MBs) and then taper the rate of exponential growing to increasing by 10-25% or so
Last edited by marcov on Apr 12, 2019 13:45, edited 1 time in total.
fxm
Moderator
Posts: 12106
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Speed issue with string concatenation and a solution

Post by fxm »

Another variant:

Code: Select all

sub adding1(sg as string)
    if cptr(integer ptr, @test.strg)[2] > cptr(integer ptr, @test.strg)[1]+len(sg) then
        for i as integer = 0 to len(sg)-1
            cptr(ubyte ptr ptr, @test.strg)[0][cptr(integer ptr, @test.strg)[1]+i]=sg[i]
        next i
        cptr(integer ptr, @test.strg)[1]+=len(sg)
    else
        test.strg+=sg
    end if
end sub
SARG
Posts: 1763
Joined: May 27, 2005 7:15
Location: FRANCE

Re: Speed issue with string concatenation and a solution

Post by SARG »

Thank you all for your replies,

I'll use dodicat's code. Sometimes it's faster than mine sometimes not but It's a bit simpliest (no use of extra variable, no size check).

@marcov
I have used the workaround you talk however fbc is very efficient, the time difference is small between the 2 ways (dodicat's one and mine).
But imo fbc is very bad when using directly the string passed as parameter. Maybe something to dig.
angros47
Posts: 2323
Joined: Jun 21, 2005 19:04

Re: Speed issue with string concatenation and a solution

Post by angros47 »

String concatenation changes the amount of memory occupied by a string variable, and this may require a reallocation. Reallocations are managed by the operating system, so there is no way to make them more efficient. The only way to speed up things is to reduce the number of reallocations, by preallocating enough memory for the string. In C++, for example, when a string grows over the allocated memory, it is reallocated to twice its previous size, so a reallocation would be needed only when string size doubles again. This approach is pretty speed-effective, but can waste a lot of memory (if a string is 33 kb long, for example, it will allocate 64 kb, wasting 31 kb). Not a problem on modern systems, but it was a problem on the old, memory-limited computers BASIC was originally intended to run. For that reason, BASIC was in general designed to minimize the amount of memory wasted, and that can be achieved only with more frequent reallocations: a slower, but more memory efficient solution.

As usual, optimizations of something comes at expenses of something else: memory size vs speed
SARG
Posts: 1763
Joined: May 27, 2005 7:15
Location: FRANCE

Re: Speed issue with string concatenation and a solution

Post by SARG »

@angros47
I know and I understand but the "problem", not a bug as the result is right, appears depending on the way you do the concatenation : with the parameter or with a local variable.
Obviously after a lot of calls causing a big string.

Very slow, my test : compiling 2000 lines of code (only in one procedure, fbc test suite) generating 96000 lines of asm takes several minutes. The size of the final string should be 2.7Mo.

Code: Select all

sub adding(sg as string)
   main_string+=sg
end sub
Much faster, less than 5 seconds..... (with nothing else different)

Code: Select all

sub adding(sg as string)
   dim as string local_sg=sg
   main_string+=local_sg
end sub
So there is a point to be improved.
speedfixer
Posts: 606
Joined: Nov 28, 2012 1:27
Location: CA, USA moving to WA, USA
Contact:

Re: Speed issue with string concatenation and a solution

Post by speedfixer »

A question to the group:

Would it be possible to add an alternate string library, called by a compile option or lib include, that would use a globally allocated string memory pool for all string functions? Could this could dramatically increase string operation speed in most cases?

Of course, it would need a garbage collector. Nothing is free.

For most casual programmers - especially in BASIC - playing with strings is how they learn. It is easy to see (and show) the result of your efforts.
And for a large project, if one particular module is string-heavy, just that module could use the fast-string lib.

david
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Speed issue with string concatenation and a solution

Post by jj2007 »

The problem boils down to either
- marcov's StringBuilder class (several reallocations, risk to fail a HeapRealloc for large sizes)
- or finding a way to know beforehand which size you need, at least approximately (requires coder to think)

There is no perfect solution. Even in assembly string concatenation is a difficult optimisation problem.
Last edited by jj2007 on Apr 12, 2019 18:06, edited 1 time in total.
fxm
Moderator
Posts: 12106
Joined: Apr 22, 2009 12:46
Location: Paris suburbs, FRANCE

Re: Speed issue with string concatenation and a solution

Post by fxm »

7 years ago, I had already written a feature request about this problem:
#245 Execution speed optimization of var-len string instruction

You observation corresponds to my paragraph 2.3 in this analyse.
We can verify that the problem comes back if we dereference a string pointer (even pointing to a local string), like it works under the hood when a string is passed by reference:

Code: Select all

sub adding(sg as string)
   dim as string local_sg=sg
   dim as string ptr p_local_sg= @sg
   main_string+=*p_local_sg
end sub
If you wish, you can add your personal obdervations in my feature request.
MrSwiss
Posts: 3910
Joined: Jun 02, 2013 9:27
Location: Switzerland

Re: Speed issue with string concatenation and a solution

Post by MrSwiss »

speedfixer wrote:A question to the group: ... Could this could dramatically increase string operation speed in most cases?
No, because inside the pool, (a single) string's memory would (at times, at least) require
a re-allocation, when growing the string. (Remember: allocation is per string instance)
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Speed issue with string concatenation and a solution

Post by marcov »

jj2007 wrote:The problem boils down to either
- marcov's StringBuilder class (several reallocations, risk to fail a HeapRealloc for large sizes)
- or finding a way to know beforehand which size you need, at least approximately (requires coder to think)
Stringbuilders typically do the first default, and the second if you set a capacity. Or they avoid both to simply not wrap a single block allocation, but a linked list of say <n>MB blocks that you can then write to a socket one by one. That way there is never a need to allocate large blocks, avoiding fragmentation in long running apps.

Basically Stringbuilder is that global string type, but then with just the minimal support for concatting. Any other string operations you do on a "normal" string before concatenating it to a stringbuilder.

Of course the default stringbuilder is just a tool in the box. For very specific cases, you can roll your own. I had one once that was mapped to memory mapped (shared) memory for debugging cases. If the frontend died, the process that spawned it would log the shared memory as a dump, and you could see (among others ) the request and roughly where during the construction of the html page it crashed. We were transitioning webframeworks at the time, and the new version still had many bugs, so that was an helpful tool.
coderJeff
Site Admin
Posts: 4323
Joined: Nov 04, 2005 14:23
Location: Ontario, Canada
Contact:

Re: Speed issue with string concatenation and a solution

Post by coderJeff »

fxm wrote:7 years ago, I had already written a feature request about this problem:
#245 Execution speed optimization of var-len string instruction
oops, I added the comments to the topic that the feature request references: viewtopic.php?f=3&t=19781
Using examples from this topic for context.
jj2007
Posts: 2326
Joined: Oct 23, 2016 15:28
Location: Roma, Italia
Contact:

Re: Speed issue with string concatenation and a solution

Post by jj2007 »

marcov wrote:Stringbuilders typically do the first default, and the second if you set a capacity. Or they avoid both to simply not wrap a single block allocation, but a linked list of say <n>MB blocks that you can then write to a socket one by one.
That is basically what an array does. In fact, there is rarely a need to build one contiguous big string. I only use it with Store "some.txt", some$() because writing many little strings to file can be a bit slow; in that case, however, there is only one allocation, and if that one is full, it gets written to disk, and the next round of string building starts until all strings are written to disk.
marcov
Posts: 3462
Joined: Jun 16, 2005 9:45
Location: Netherlands
Contact:

Re: Speed issue with string concatenation and a solution

Post by marcov »

jj2007 wrote:
marcov wrote:Stringbuilders typically do the first default, and the second if you set a capacity. Or they avoid both to simply not wrap a single block allocation, but a linked list of say <n>MB blocks that you can then write to a socket one by one.
That is basically what an array does. In fact, there is rarely a need to build one contiguous big string. I only use it with Store "some.txt", some$() because writing many little strings to file can be a bit slow; in that case, however, there is only one allocation, and if that one is full, it gets written to disk, and the next round of string building starts until all strings are written to disk.
... and a stringbuilder of <n> MB blocks is just a reusable optimization of that. It avoids too large blocks like can happen in the single store case, and also the too many per line allocations and write syscalls of the per line case, taking a middle approach of blocks of several MBs.
Post Reply