is there any string replacement function in FB

General FreeBASIC programming questions.
counting_pine
Site Admin
Posts: 6190
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Oct 16, 2008 13:58

Eclipzer wrote:I only make a single call to INSTR. Most approaches require an initial call outside the loop, with subsequent calls within the loop.
Though it hasn't been used in this thread, have you considered that an initial call outside the loop could be done on the original Text string? If it returned zero, you could simply return Text, rather than doing an unnecessary copy to Result.

Here's a more efficient implementation. While I've incorporated the above trick, the main difference is that Result gradually builds itself up from newString and chunks of Text, rather than rebuilding itself each time an insertion is made.

Code: Select all

function str_replace( _
   byref text as const string, _
   byref subString as const string, _
   byref newString as const string) as string

   if len(text) = 0 then return ""
   if len(subString) = 0 then return text

   dim as integer match = instr(text, subString), match0 = any
   if match = 0 then return text

   dim as string  result
   dim as integer sublen = len(subString), newlen = len(newString)

   result = left(text, match - 1) & newstring

   do
      match0 = match + sublen
      match  = instr(match0, text, subString)

      if match = 0 then return result & mid(text, match0)

      assert( match >= match0 )
      result &= mid(text, match0, match - match0)
      result &= newString
   loop

end function
I think the best way to gain efficiency after this is to go low-level, and incorporate it into the actual searching code.
There's some setting up that Instr has to do, which can technically be eliminated when the only change on each call is that the start value increases monotonically.
Eclipzer
Posts: 432
Joined: Oct 01, 2005 10:50
Location: Maryland
Contact:

Postby Eclipzer » Oct 17, 2008 4:43

@PaulSquires, my speed tests show that your optimization is indeed faster. Thanks for pointing this out. I will use this from now on.

@counting_pine, your powers of optimization never cease to impress. I would probably clean the code up a bit for readability, but I'm pretty sure I follow your logic/process. Thanks for you input.
stylin
Posts: 1253
Joined: Nov 06, 2005 5:19

Postby stylin » Oct 17, 2008 11:43

Sisophon2001, for what it's worth, fbext has a bunch of string-handling APIs (including a string class, no wide-chars yet). It also has most of the PHP string-handling APIs as well. Check it out at:
http://code.google.com/p/fb-extended-lib/
Sisophon2001
Posts: 1704
Joined: May 27, 2005 6:34
Location: Cambodia, Thailand, Lao, Ireland etc.
Contact:

Postby Sisophon2001 » Oct 17, 2008 15:22

stylin wrote:Sisophon2001, for what it's worth, fbext has a bunch of string-handling APIs (including a string class, no wide-chars yet). It also has most of the PHP string-handling APIs as well. Check it out at:
http://code.google.com/p/fb-extended-lib/

I know about fbext, but never used it. I dislike having to clutter my program with different licences, but actually, now that I re-look at it, fbext has made a programmer friendly choice in its licensing conditions.

Perhaps the major issue is simply that the functions do not appear when I enter search items in the help file, so I go "back to home" which for me is the BCX runtime library written in C (which I can easily translate into BASIC).

Why not make a chm file of fbext functions? Would it be possible to add them to the main help file? And a FAQ on the licence could be included so people could decide it it is what they want.

Garvan

PS. Actually I should just install it, and spend some time familiarising myself with the functions. I don't really have an excuse for not doing so before.
stylin
Posts: 1253
Joined: Nov 06, 2005 5:19

Postby stylin » Oct 18, 2008 0:56

Sisophon2001, fbext is not officially affiliated with FreeBASIC. The online documentation is not available (some server issues or something, I think, sorry).

It uses NaturalDocs for documentation, which only supports HTML output. We have not really thought about different formats, but if there's a demand, then we could certainly try and provide a .CHM version. I don't know what you mean by "main help file", so I can't comment on that.
counting_pine
Site Admin
Posts: 6190
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Oct 18, 2008 2:34

@stylin, I kind of like fbext's method of string replacing. It probably has more efficient memory management, due to the way it finds out the number of replaces needed beforehand, so it can allocate exactly the right string size.

It looks like it might run int o troubles with the array size if there are too many matches found though. Not sure what the best workaround for that would be...
Perhaps every hundred matches or so, it should clear out the array by doing all the necessary replacements it's found at that point.
Though it would mean that, in general, you wouldn't know the final size of the string until you'd found the last match...


EDIT: There's a bug in it: when searching for matches, you should do:

Code: Select all

strpos = instr(strpos + len(oldtext), ...

rather than:

Code: Select all

strpos = instr(strpos + 1, ...

Otherwise, it would be possible to find matches that overlap, e.g. when replacing matches of "abab" in "ababab". In the case of this routine, the result seems to be a segfault.
Sisophon2001
Posts: 1704
Joined: May 27, 2005 6:34
Location: Cambodia, Thailand, Lao, Ireland etc.
Contact:

Postby Sisophon2001 » Oct 18, 2008 5:26

I was not able to build fbext on Ubuntu 8.04? FB 0.20b.

fbc -c -w all -g -i inc src/core/libext-file-file.bas -o src/core/libext-file-file.o
src/core/libext-file-file.bas(360) error 225: Elements must be empty for strings and arrays, found 'amount' in 'FBEXT_DEFINE(FILE_PUT,string)'
src/core/libext-file-file.bas(360) error 225: Elements must be empty for strings and arrays, found 'amount' in 'FBEXT_DEFINE(FILE_PUT,string)'
make: *** [src/core/libext-file-file.o] Segmentation fault
garvan@ubuntu:~/Desktop/ext_0_2_2$


Any ideas from the results of running make?

I think my version of replace is significantly faster than the others posted. I made one test in a 10000 loop, so there may be cases where the others perform better. I was not able to test fbext.

Garvan
stylin
Posts: 1253
Joined: Nov 06, 2005 5:19

Postby stylin » Oct 18, 2008 15:04

Sisophon2001, some recent changes to Get# caused the ext.File class to stop compiling. This has been fixed in the recent revision r771. For a quick fix, replace src/core/libext-file-file.bas with the following and rebuild. Sorry about that, I'm working to get another release out soon(tm).

Code: Select all

''Copyright (c) 2007, FreeBASIC Extended Library Development Group
''
''All rights reserved.
''
''Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
''
''    * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
''    * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
''    * Neither the name of the FreeBASIC Extended Library Development Group nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
''
''THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
''"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
''LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
''A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
''CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
''EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
''PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
''PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
''LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
''NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
''SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# include once "ext/detail/common.bi"
# include once "ext/file/file.bi"


#macro FBEXT_DEFINE_FILE_PRINT( _T_ )
:
   sub File.print( byref data_ as _T_ )

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

      if m_filehandle <> null then

         ..print #m_filehandle, data_

      end if

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   end sub

   sub File.print( _data() as _T_ , byval amount as integer = 0 )

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   if m_filehandle <> null then

   var amm = iif(amount = 0, ubound(_data), amount)

   for n as integer = lbound(_data) to amm

      ..print #m_filehandle, _data(n)

   next n

   end if

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   end sub
:
#endmacro

#macro FBEXT_DEFINE_FILE_GET( _T_ )
:
   sub File.get( byval filepos as longint = -1, byref data_ as _T_, byval amount as integer = 1 )

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

    #if _T_ = string
      if filepos = -1 then

         ..get #m_filehandle, , data_

      else

         ..get #m_filehandle, filepos, data_

      end if
    #else
      if filepos = -1 then

         ..get #m_filehandle, , data_, amount

      else

         ..get #m_filehandle, filepos, data_, amount

      end if
   #endif

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   end sub
:
#endmacro

#macro FBEXT_DEFINE_FILE_PUT( _T_ )
:
   sub File.put( byval filepos as longint = -1, byref data_ as _T_, byval amount as integer = 1 )

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

    #if _T_ = string
         if filepos = -1 then
         ..put #m_filehandle, , data_

      else
         ..put #m_filehandle, filepos, data_

      end if
    #else
         if filepos = -1 then
         ..put #m_filehandle, , data_, amount

      else
         ..put #m_filehandle, filepos, data_, amount

      end if
   #endif

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   end sub
:
#endmacro

namespace ext


constructor File ( byref filename as string, byval acc as ACCESS_TYPE = R )

   m_filename = filename
   m_filehandle = 0
   m_access = acc
   m_lasterror = 0

   #ifdef FBEXT_MULTITHREADED

   m_mutex = mutexcreate()

   #endif

end constructor

constructor File ( )

   m_filename = ""
   m_filehandle = 0
   m_access = -1
   m_lasterror = 0

   #ifdef FBEXT_MULTITHREADED

   m_mutex = mutexcreate()

   #endif


end constructor

function File.eof( ) as ext.bool

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   var x = iif(..eof(m_filehandle), ext.true, ext.false)

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   return x

end function

property File.handle() as integer

   return m_filehandle

end property

function File.lof() as longint

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   var x = ..lof(m_filehandle)

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   return x

end function

function File.loc() as longint

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   var x = ..loc(m_filehandle)

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   return x

end function

property File.seek( byval poz as longint )

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   ..seek #m_filehandle, poz

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)

   #endif

end property

property File.seek( ) as longint

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   var x = ..seek(m_filehandle)

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   return x

end property

function File.open( byref filename as string, byval acc as ACCESS_TYPE = R ) as ext.bool

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   m_filename = filename
   m_access = acc

   #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
   #endif

   return open()

end function

property File.LastError() as integer

   return m_lasterror

end property

function File.open() as ext.bool

   #ifdef FBEXT_MULTITHREADED
      mutexlock(m_mutex)
   #endif

   m_filehandle = freefile

   if m_filename <> "" then

      var ret = 0

      select case m_access
      case 0
         ret = ..open(m_filename, for binary, ACCESS READ, as m_filehandle)

      case 1
         ret = ..open(m_filename, for binary, ACCESS WRITE, as m_filehandle)

      case 2
         ret = ..open(m_filename, for binary, ACCESS READ WRITE, as m_filehandle)

      end select

      if ret <> 0 then

         m_lasterror = ret

         #ifdef FBEXT_MULTITHREADED
         mutexunlock(m_mutex)
         #endif

         return ext.true

      else

         #ifdef FBEXT_MULTITHREADED
         mutexunlock(m_mutex)
         #endif

         return ext.false

      end if

   else

      #ifdef FBEXT_MULTITHREADED
      mutexunlock(m_mutex)
      #endif

      return ext.true

   end if



end function

function File.linput() as string

   #ifdef FBEXT_MULTITHREADED
   mutexlock(m_mutex)
   #endif

   var x = ""
   line input #m_filehandle, x

   #ifdef FBEXT_MULTITHREADED
   mutexunlock(m_mutex)
   #endif

   return x

end function

sub File.close()

   #ifdef FBEXT_MULTITHREADED
   mutexlock(m_mutex)
   #endif

   if m_filehandle <> 0 then ..close #m_filehandle

   #ifdef FBEXT_MULTITHREADED
   mutexunlock(m_mutex)
   #endif

end sub

   FBEXT_PP_SEQ_FOREACH(FBEXT_DEFINE, FILE_PRINT, FBEXT_NUMERIC_TYPES())
   FBEXT_DEFINE(FILE_PRINT,string)

   FBEXT_PP_SEQ_FOREACH(FBEXT_DEFINE, FILE_PUT, FBEXT_NUMERIC_TYPES())
   FBEXT_DEFINE(FILE_PUT,string)

   FBEXT_PP_SEQ_FOREACH(FBEXT_DEFINE, FILE_GET, FBEXT_NUMERIC_TYPES())
   FBEXT_DEFINE(FILE_GET,string)

destructor File ( )

   #ifdef FBEXT_MULTITHREADED
   mutexdestroy(m_mutex)
   #endif

   if m_filehandle <> 0 then ..close #m_filehandle

end destructor

end namespace


counting_pine, good catch with Instr, it is now fixed in the repo. (yes, we do need lots more unit tests, something that is always on the TODO :).

Not sure how to handle more matches than the implementation-defined maximum.. I've temporarily changed it to make the maximum replacements the ceiling, which seems to have the effect of simply ignoring any additional matches. For ext.strings.Replace, I guess our options are that we could either do this, exit early or throw a run-time error (or a combination of the three). Or implement it in such a way as to get around this. As you suggest, there seems to always be a chance at failure -- at the very least, you have the relatively high limit of the maximum String size (maximum Integer value).. Any ideas ?

By the way, the goal of fbext is not to be bleeding edge fast, so it's not really implemented with that in mind; surely you will find faster routines out there. That said, the libraries depend on user input, so any ideas about making something faster or more efficient (time, space, whatever) are always appreciated. Thanks guys! :)
counting_pine
Site Admin
Posts: 6190
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Oct 18, 2008 18:54

@stylin, how about using a method like this?
- Fill the array with matches
- If the array is big enough to hold them all, then great. Continue as normal
- If the array isn't big enough, then scan to the end anyway, to find the return length needed. Once you have the string length, then do the replacemtnts from the array, and search for the rest of the matches, replacing them as they occur.

That way, the only penalty for too many replacements is a time penalty, because it has to do up to twice as many instr scans.

A slight optimisation would be in the case where oldtext and newtext are the same length. In this case you don't need to scan ahead at all, since the length of the string won't change.

Sisophon2001 wrote:I think my version of replace is significantly faster than the others posted. I made one test in a 10000 loop, so there may be cases where the others perform better.

Does your extra speed come from the algorithm, or the fact that you use CRT functions? It looks like you use a similar method to fbext, except that you scan the string twice instead of saving the results.
It doesn't look like it would cope well with strings containing chr(0) though. You might want to show this by making the parameters zstring ptrs instead.


If your code is significantly faster than mine, I suspect the likely causes are the FB rtlib string functions in my code, or the reallocations as the replacement string is built.
Another possible cause might be instr using a different algorithm from strstr. I think FB's is better designed for cases with larger strings.
Sisophon2001
Posts: 1704
Joined: May 27, 2005 6:34
Location: Cambodia, Thailand, Lao, Ireland etc.
Contact:

Postby Sisophon2001 » Oct 19, 2008 2:43

@stylin, Thanks for the help, it compiled fine. The install went awry though. I have a default install (usr/local), I used sudo make install, and the files still went into the standalone directory structure, so I needed to move them.

Perhaps the makefile has not been updated? I glanced through it, and did not see the standalone code referred to in the README.

@counting_pine
I suspect when I wrote this function, zstring pts were not introduced, or I wouldn't have needed to use strings and then convert to ubyte ptrs. I changed it now as you suggested. Its 3 years since I wrote this, but I think the main reason for using CRT functions was because MID, LEFT etc. were slow. LEN is fast. If embedded NULL's are important to support, then I believe it will still be faster to avoid MID. I also wrote a C version in the style of the FB rtl that respected embedded NULL's, but I don't maintain it, and have gone back to using the version I posted.

Perhaps I will find it more convenient to use fbext, now that it is installed. I must spend time studying it.

Garvan
stylin
Posts: 1253
Joined: Nov 06, 2005 5:19

Postby stylin » Oct 19, 2008 5:51

counting_pine, that is an excellent idea. I have actually been trying to implement your original suggestion (reusing the replacement index buffer if needed). At the moment the naive tests are passing. Once I write and pass more tests for the edge cases I'll post the code here.

Sisophon2001, I will have a look at the Makefile in more depth ASAP. It has the "STANDALONE" option, but it must not be correct as is. Thanks for pointing that out. Glad you're interested in the project; if you need quick help, usually I or someone is lurking on #freebasic-ext on chat.freenode.net if you do IRC at all. The mailing list is also a good place to discuss stuff (read and/or subscribe to it from the google code page).
stylin
Posts: 1253
Joined: Nov 06, 2005 5:19

Postby stylin » Oct 19, 2008 10:37

Ok, I took a stab at modifying ext.strings.Replace as per counting-pine's suggestion to reuse the internal buffer that stores the positions of the replacements substrings. I've also added an optional parameter to specify buffer size, so performance characteristics of the procedure can be fine-tuned for specialized input. Here's the code, ripped from the freshly-committed repository:

Code: Select all

''Copyright (c) 2007, FreeBASIC Extended Library Development Group
''
''All rights reserved.
''
''Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
''
''    * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
''    * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
''    * Neither the name of the FreeBASIC Extended Library Development Group nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.
''
''THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
''"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
''LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
''A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
''CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
''EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
''PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
''PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
''LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
''NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
''SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

# include once "ext/strings.bi"
# include once "ext/misc.bi"
# include once "crt/string.bi"

namespace ext.strings

   '' :::::
   sub Replace (byref subject as string, byref oldtext as const string, byref newtext as const string, byval bufsize as ext.SizeType)
   
      if 0 = len(subject) then exit sub
      if 0 = len(oldtext) then exit sub
      if 0 = bufsize then exit sub
      
      static as integer      s_buf()
      static as integer ptr   buf = ext.null
      
      ' first use or previous buffer not big enough ?
      if (ext.null = buf) orelse (bufsize > ubound(s_buf) + 1) then
         redim s_buf(0 to bufsize - 1) as integer
      end if
      
      buf = @s_buf(0)
      
      var replacements = 0
      var result = ""
      var subpos = 1
      var sublen = 0
      var strpos = instr(subpos, subject, oldtext)
      
      do
         ' populate buffer.
         if (0 < strpos) andalso (replacements < bufsize) then
            buf[replacements] = strpos
            sublen = strpos - subpos + len(oldtext)
            strpos = instr(strpos + len(oldtext), subject, oldtext)
            replacements += 1
            
            continue do
         end if
         
         ' found all occurrences or filled entire buffer ?
         if ((0 = strpos) andalso (0 < replacements)) or (replacements = bufsize) then
            var newsize = sublen + replacements * (len(newtext) - len(oldtext))
            result += space(newsize)
            
            var src = strptr(subject) + subpos - 1
            var dst = strptr(result) + len(result) - newsize
            
            for r as integer = 0 to replacements - 1
               ' copy up to replacement.
               var length = @subject[buf[r] - 1] - src
               memcpy(dst, src, length)
               dst += length
               src += length
      
               ' copy the replacement.
               memcpy(dst, strptr(newtext), len(newtext))
               dst += len(newtext)
               src += len(oldtext)
            next
            
            ' no more replacements ?
            if 0 = strpos then
               ' copy remaining text, if any.
               var length = @subject[len(subject)] - src
               if 0 < length then
                  ' result may be reallocated, so dst is unreliable.
                  var offset = dst - strptr(result)
                  result += space(length)
                  memcpy(strptr(result) + offset, src, length)
               end if
               
               subject = result
               exit do
            end if
            
            ' repopulate buffer.
            replacements = 0
            subpos += sublen
            sublen = 0
         end if
         
         ' no occurrences found (0 = replacements) ?
         if 0 = strpos then
            exit do
         end if
      loop
      
      ' we always get here, in all cases.
   
   end sub

   '' :::::
   sub Replace (byref subject as string, oldtext() as const string, byref newtext as const string, byval bufsize as ext.SizeType)

      for i as integer = lbound(oldtext) to ubound(oldtext)
         ext.strings.Replace(subject, oldtext(i), newtext, bufsize)
      next

   end sub


   '' :::::
   sub Replace (byref subject as string, oldtext() as const string, newtext() as const string, byval bufsize as ext.SizeType )

      var result = subject

      var search_size = ubound(oldtext) - lbound(oldtext) + 1
      var replace_size = ubound(newtext) - lbound(newtext) + 1

      var n = FBEXT_MIN(search_size, replace_size)

      for i as integer = lbound(oldtext) to lbound(oldtext) + n - 1
         ext.strings.Replace(subject, oldtext(i), newtext(i), bufsize)
      next

      if search_size > replace_size then
         for i as integer = n - 1 to ubound(oldtext)
            ext.strings.Replace(subject, oldtext(i), "", bufsize)
         next
      end if

   end sub

   '' :::::
   sub replace (subject() as string, byref oldtext as const string, byref newtext as const string, byval bufsize as ext.SizeType )

      for i as integer = lbound(subject) to ubound(subject)
         ext.strings.Replace(subject(i), oldtext, newtext, bufsize)
      next

   end sub

   '' :::::
   sub replace (subject() as string, oldtext() as const string, byref newtext as const string, byval bufsize as ext.SizeType )

      for i as integer = lbound(subject) to ubound(subject)
         ext.strings.Replace(subject(i), oldtext(), newtext, bufsize)
      next

   end sub

   '' :::::
   sub replace (subject() as string, oldtext() as const string, newtext() as const string, byval bufsize as ext.SizeType )

      for i as integer = lbound(subject) to ubound(subject)
         replace(subject(i), oldtext(), newtext(), bufsize)
      next

   end sub

end namespace


The subpos/sublen vars are used to keep track of the current portion of subject that has been searched, I couldn't think of a better way..
counting_pine
Site Admin
Posts: 6190
Joined: Jul 05, 2005 17:32
Location: Manchester, Lancs

Postby counting_pine » Oct 19, 2008 22:05

I can't see any major problems with that code. You might want to force a lower bound on the size of the array though, I don't think anything less than one would work.
You seem to use a different convention from the one I'm used to when writing comparison exprs like "If (0 < strpos)". Looks a bit Yoda-ish to me :)

I've looked a bit into string searching functions today, and I get the impression that for any reasonably smallish cases, using the built-in functions will get more speed than any algorithm run through GCC.

Therefore I think the best approach would be, as Garvan does, to use the CRT as much as possible.
Assuming the strings aren't NUL-heavy, I guess functions like strstr should be a good "heuristic" for finding matches. memcmp could be used to confirm them, and memcpy for the concatenation.
tempeleng
Posts: 7
Joined: Sep 19, 2008 9:46

Postby tempeleng » Oct 20, 2008 5:16

This is one I made last week. It uses Mid to move chunks of text into an output string with the right size. It's a bit faster than an earlier version of mine that builds the string each loop. Since I'm new to FB, I'm sure there's a way to improve its speed so comments are welcome.

Code: Select all

Declare Sub testIt(ByVal sText As String, ByVal sOld As String, ByVal sNew As String, ByVal expected as string)
Declare Function Replace(sText As String, sOld As String, sNew As String) as String

testIt("aaa", "a", "baa", "baabaabaa")
testIt("", "b", "XX", "")
testIt("abc", "", "XX", "abc")
testIt("abc", "b", "", "ac")
testIt("abc", "c", "", "ab")
testIt("abc", "b", "X", "aXc")
testIt("abc", "b", "XX", "aXXc")
testIt("blah", "blah", "ha", "ha")
testIt("[[{{", "{", "x", "[[xx")
testIt("[[{{", "[", "x", "xx{{")
testIt("x" & Space$(10000) & "x", "x", Space$(10000), Space$(30000))

Sub testIt(ByVal sText As String, ByVal sOld As String, ByVal sNew As String, ByVal expected as string)
   Dim sAns as string
   sAns = Replace(sText, sOld, sNew)
   If sAns = expected Then
      Print "Passed:"
   Else
      Print "Failed: *" & expected & "*, *" & sAns & "*"
   End If
End Sub

Function Replace(sText As String, sOld As String, sNew As String) as String

   'If input is empty then exit
   If Len(sText) = 0 Then Return sText
   If Len(sOld) = 0 Then Return sText

   Dim as Integer iX, iLenOld, iLenNew, iCount, iPosOld, iPosNew, iPosBuf
   Dim as String sTemp
   iLenOld = Len(sOld)
   iLenNew = Len(sNew)

   'Count the number of changes required
   iX = InStr(sText, sOld)
   Do While iX > 0
      iCount = iCount + 1
      iX = InStr(iX + iLenOld, sText, sOld)
   Loop

   'If no changes required, just exit
   If iCount = 0 Then Return sText

   'Make room in output
   sTemp = Space$(Len(sText) + (iLenNew - iLenOld) * iCount)

   iPosBuf = 1
   iPosOld = 1
   iPosNew = InStr(sText, sOld)

   Do While iPosNew > 0

      'If there's at least a char before the new string
      If iPosNew > iPosOld Then
         Mid(sTemp, iPosBuf) = Mid$(sText, iPosOld, iPosNew - iPosOld)
         iPosBuf = iPosBuf + iPosNew - iPosOld
      End If
      iPosOld = iPosNew + iLenOld

      'Now move in the sNew
      Mid(sTemp, iPosBuf) = sNew
      iPosBuf = iPosBuf + iLenNew

      'Search for the next sOld
      iPosNew = InStr(iPosNew + iLenOld, sText, sOld)

   Loop

   'If there's at least a trailing char
   If iPosOld <= Len(sText) Then Mid(sTemp, iPosBuf) = Mid$(sText, iPosOld)

   Return sTemp
End Function
Zippy
Posts: 1295
Joined: Feb 10, 2006 18:05

Postby Zippy » Oct 21, 2008 3:14

This is a short and simple version:

Code: Select all

'substring replacement
'
declare function _
    str_replace(_
        byref text as string,_
        byref substring as string,_
        byref newstring as string) as string
'
dim as string text,newtext,ztext
text="Now is the time for all good men to come to the aid of the party.."
'
newtext=str_replace(text,"t","*Ralph*")
'
print newtext
print
'
ztext=str_replace(newtext,"*Ralph*","t")
'
print ztext
print
'
sleep
end
'
function _
    str_replace(_
        byref text as string,_
        byref substring as string,_
        byref newstring as string) as string
'
    dim as integer slt=len(text)     
        if slt=0 then return ""
    dim as integer sls=len(substring)
        if sls=0 then return text
    dim as integer sln=len(newstring)
        if sln=0 then return text
    dim as integer p=instr(text,substring)
        if p=0 then return text   
    dim as integer c,i,ni,rpc
    '
    while p>0
        text[p-1]=0
        rpc+=1
        p+=1
        p=instr(p,text,substring)
    wend
    '
    dim as string rstr=space(slt-(rpc*sls)+(rpc*sln))
    '
    while i<slt
        '
        if text[i]>0 then
            rstr[ni]=text[i]
            ni+=1:i+=1
        else
            for c=0 to sln-1
                rstr[ni+c]=newstring[c]
            next
            text[i]=substring[0]
            ni+=c:i+=sls
        end if
        '
    wend
    '
    return rstr
'
end function

Sisophon's CRT version is 2-3 times faster. Using mid() will be slower.

Return to “General”

Who is online

Users browsing this forum: No registered users and 5 guests