Fast string concatenation

New to FreeBASIC? Post your questions here.
Post Reply
gbos
Posts: 35
Joined: Feb 09, 2006 12:58

Fast string concatenation

Post by gbos »

Suppose I have to do the following loop for many thousand times

read x …. if something is true ….. string = string & x

is there an efficient way to improve the string concatenation? So far the only way to improve significantly the speed was to use many temporary strings for storage and do something like this

Dim string(1 to 100)
for j = 1 to 1000000
read x …. if something is true …. string(int(j/1000)+1) = string(int(j/1000)+1 )&x
next j
string = string(1) + string(2) + …..
bradlis7
Posts: 12
Joined: May 29, 2008 19:20
Location: Texas
Contact:

Post by bradlis7 »

Just an idea, I didn't try it.

Code: Select all

Dim s As String = space(1000000), tmpstr As String
Dim offset As Integer = 1
For j = 1 to 1000000
  tmpstr = "bla" '' Whatever needs to go here.
  Mid(s,offset,len(tmpstr)) = tmpstr
  offset += len(tmpstr)
Next j
s = Left(s,offset)

I may not have my indexes right with offset. I think it should be faster to modify in place than to concatenate, so that's why I use Mid() = tmpstr.
Zippy
Posts: 1295
Joined: Feb 10, 2006 18:05

Post by Zippy »

The fb generic len() function and mid() function/statements are relatively slow because they have to work with UTDs, 8-bit strings, wstrings, and perform considerable error checks. If you need speed then you need to write code that is specific to your application, excluding the generic functions.

A solution is to use an allocated buffer, memcpy the string to be concatenated into that buffer, while using a shortcut to determine the length of the string copied. The process isn't difficult if you are comfortable with pointers. Try this:

Code: Select all

#include once "crt.bi"
'
dim as integer baselimit,j,slen
dim as string x
'
'the maximum number of bytes to allocate for storage/buffer
dim as integer basemax=5*1024*1024
'
'allocate the buffer, basestr = address of first byte of buffer
dim as byte ptr basestr=allocate(basemax)
if basestr=0 then print "buffer allocation failed":end
'
'this is our index into the buffer
dim as byte ptr baseptr
'
'if you want to treat the buffer as a string, later
dim as zstring ptr basezptr
'
'
'set our index to the beginning of the buffer
baseptr=basestr
'
'define the address of the last byte in our buffer to write to
baselimit=cast(integer,baseptr)+basemax-1
'
'testdata in lieu of "read x"
x="string "
'
for j=1 to 1000000
    'read x
    if j/2=j\2 then 'if x=whatever..
        '
        'get length of string x, much quicker then len(x)
        slen=*(cptr(integer ptr,@x)+1)
        '
        'test for index beyond end of buffer
        if baseptr+slen>baselimit then
            print using "exceeded memalloc at iteration: #,###,###";j
            exit for
        end if
        '
        'copy to buffer using index baseptr, copy from x, copy slen number of chars
        memcpy(baseptr,strptr(x),slen)
        '
        'increment baseptr
        baseptr+=slen
    end if
next j
'
'terminate buffer data for zstring usage.
*(baseptr+1)=0
'
'baseptr - basestr = number of active bytes in our buffer
print using "Done, size: ###,###,### bytes";baseptr-basestr
print
'
'get a zstring ptr to our buffer
basezptr=cast(zstring ptr,basestr)
'ztring access using zstring ptr: *basezptr
print left(*basezptr,100)
'
'cleanup if you want
deallocate(basestr)
sleep
gbos
Posts: 35
Joined: Feb 09, 2006 12:58

Post by gbos »

Thank you both for the suggestions.
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:

Post by KristopherWindsor »

The built-in String type automatically allocates extra space for things like this. Also Len() is not slow, because it doesn't process the whole String; the length is saved in a variable.

Code: Select all

var s = "12345678"
s[2] = 0 'if Len() looked for Chr(0), this would change the length
print len(s)
sleep()
Zippy
Posts: 1295
Joined: Feb 10, 2006 18:05

Post by Zippy »

KristopherWindsor wrote:The built-in String type automatically allocates extra space for things like this. Also Len() is not slow, because it doesn't process the whole String; the length is saved in a variable.

Code: Select all

var s = "12345678"
s[2] = 0 'if Len() looked for Chr(0), this would change the length
print len(s)
sleep()
@KW

I didn't make up the alternative method to [avoid using] len(), it came from a dev - I think either cha0s or Jeff. Regardless, I didn't make it up. I have tested it (have you?), it can shave tenths of seconds off a million or several millions iterations. Anything much less than a millon or millions of iterations moots this entire conversation. Moot moot.

A null, "0", terminates zstrings. If you noted my "0" in the buffer it is so that zstring functions don't examine garbage beyond the end of valid data. I said nothing about len() and "0", so..???

You're confusing what you've read in another thread about automatic string allocation after concatenation. fb does allocate some extra bytes after concatenation but certainly not millions. Each and every time fb has to change the size of the string another area of memory must be allocated, the string data must be copied to the new area, the old memory must be released. fb performs approximately 92 reallocations when concatenating small strings for the first 10mb of base string. ==> Obviously: as the size of the base string increases the time to reallocate the new string increases.

I'll post some test code. Again (others have proved this before):

Code: Select all

'test automatic string allocation
'
#include once "crt.bi"
'
dim as integer c,iter,l,na,os,s
dim as double bt,et
dim as string k,x,uf
'
os=0
iter=1000000
'
print "string+=string Working.."
bt=timer
for c=1 to iter
    k=str(c*c)
    x+=k
    'x+=str(c*c)
next
et=timer
'
uf="iterations: ##,###,### time: ##.######## len: ##,###,###"
print using uf;iter,et-bt,len(x)
print left(x,20);" ";
print right(x,20)
'
'
dim as ubyte ptr basestr=allocate(50*1024*1024)
dim as ubyte ptr baseptr=basestr
'
print
print "memcpy+len() Working.."
'
bt=timer
for c=1 to iter
    x=str(c*c)
    '
    'l=*(cptr(integer ptr,@x)+1)
    'memcpy(baseptr,strptr(x),l)
    memcpy(baseptr,strptr(x),len(x))
    '
    'baseptr+=l
    baseptr+=len(x)
    '
next
et=timer
'
uf="iterations: ##,###,### time: ##.######## len: ##,###,###"
print using uf;iter,et-bt,cast(integer,baseptr-basestr)
print left(*cast(zstring ptr,basestr),20);" ";
print right(*cast(zstring ptr,basestr),20)
'
print
print "memcpy+cptr Working.."
'
deallocate(basestr)
basestr=callocate(50*1024*1024)
baseptr=basestr
'
bt=timer
for c=1 to iter
    x=str(c*c)
    '
    l=*(cptr(integer ptr,@x)+1)
    memcpy(baseptr,strptr(x),l)
    'memcpy(baseptr,strptr(x),len(x))
    '
    baseptr+=l
    'baseptr+=len(x)
    '
next
et=timer
'
uf="iterations: ##,###,### time: ##.######## len: ##,###,###"
print using uf;iter,et-bt,cast(integer,baseptr-basestr)
print left(*cast(zstring ptr,basestr),20);" ";
print right(*cast(zstring ptr,basestr),20)
'
print
print "asm Working.."
'
deallocate(basestr)
basestr=callocate(50*1024*1024)
baseptr=basestr
'
bt=timer
for c=1 to iter
    x=str(c*c)
    '
    asm
        mov esi,[baseptr]
        mov edi,[x]
        mov ecx,[x+4]
        add [baseptr],ecx
    doloop:
        mov al,[edi]
        mov [esi],al
        inc edi
        inc esi
        dec ecx
        jnz doloop
    end asm
'
next
et=timer
'
uf="iterations: ##,###,### time: ##.######## len: ##,###,###"
print using uf;iter,et-bt,cast(integer,baseptr-basestr)
print left(*cast(zstring ptr,basestr),20);" ";
print right(*cast(zstring ptr,basestr),20)

'
print
print "Sleeping to Exit.."
deallocate(basestr)
sleep
end
Thanks for the drive-by. Again.

ETA: fixed asm
Last edited by Zippy on Apr 14, 2009 4:18, edited 1 time in total.
Zippy
Posts: 1295
Joined: Feb 10, 2006 18:05

Post by Zippy »

If we test just l=len(x) against l=*(cptr(integer ptr,@x)+1) using MichaelW's counter.bas code we find that len(x) requires SIX TIMES as many clock cycles:

Code: Select all

#include "windows.bi"
print "Working.."
print
#include "counter.bi" 'counter.bas renamed to counter.bi..
'====================================================================
'' Counter.bas is available here:
''
'' http://www.freebasic.net/forum/viewtopic.php?t=4221
'====================================================================
'
sleep 4000
'
dim as integer l
dim as string x="Zippy"
'
counter_begin(10000,HIGH_PRIORITY_CLASS)
    l=len(x)
counter_end
print counter_cycles; " cycles using len()"
'
counter_begin(10000,HIGH_PRIORITY_CLASS)
    l=*(cptr(integer ptr,@x)+1)
counter_end
print counter_cycles; " cycles using cptr"
'
print
print "Sleeping to Exit.."
sleep
'
'RESULTS:
'
'Working..
'
'60 cycles using len()
'10 cycles using cptr
'
'Sleeping to Exit..
This cptr method is specific to 8-bit strings. It does not reflect any failing of/in the fb methods.
KristopherWindsor
Posts: 2428
Joined: Jul 19, 2006 19:17
Location: Sunnyvale, CA
Contact:

Post by KristopherWindsor »

I meant that Len() is O(1), not O(n) as it seemed you might have been implying.
Post Reply