Bug in InStrRev function?

Post by **St_W** » Feb 18, 2009 23:21

The following code shows the strange behavior of the InStrRev function:

DIM StrData AS STRING
DIM SearchPat AS STRING
DECLARE FUNCTION instrrev2(BYREF searchstr AS STRING, BYREF searchpat AS STRING) AS INTEGER

SearchPat = "TEST"

StrData = SearchPat+SearchPat+SearchPat+CHR(0)

PRINT InstrRev(StrData, SearchPat, -1)
PRINT InstrRev2(StrData, SearchPat)
PRINT MID(StrData, InstrRev2(StrData, SearchPat), LEN(SearchPat))

SLEEP


FUNCTION instrrev2(BYREF searchstr AS STRING, BYREF searchpat AS STRING) AS INTEGER
    FOR q AS INTEGER = (LEN(searchstr) - LEN(searchpat) + 1) TO 1 STEP -1
        IF MID(searchstr,q,LEN(searchpat)) = searchpat THEN RETURN q
    NEXT
    RETURN 0
END FUNCTION

Shouldn't InStrRev return the same result as the other function? The string "TEST" appears in "TESTTESTTEST"+chr(0) at positions 1, 5 and 9. Shouldn't InStrRev return 9?

Post by **Zippy** » Feb 18, 2009 23:44

Don't append chr(0) to a "normal" FreeBASIC string. It is unnecessary (there is an internal null terminating the string) and confuses all other string functions. With two nulls, the end of the string and the string length don't match.

Post by **MichaelW** » Feb 18, 2009 23:51

Yes, without the trailing chr(0) InStrRev returns 9. But all is not right, because InStr will search over a leading chr(0) and return the correct position, where InStrRev appears to search over a trailing chr(0) but does not return the correct position. I think both functions should return 0 if the first character they encounter in the string is a chr(0).

Post by **St_W** » Feb 19, 2009 0:49

The source code above is only an example. Usually I don't append a chr(0) to strings, of course. I encountered the problem while reading data from various files that contain binary data.
It's also possible (but uncommon) for a file to end with a chr(0) as its last character.

I think a null character at the end of a STRING shouldn't change the behaviour of the string functions, because STRINGs can contain null characters at any position, in contrast to ZSTRINGs, which cannot contain any chr(0) characters.
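The distinction drawn here can be tried out with a few lines. This is only a sketch: it assumes that LEN on a STRING reports the stored length, while LEN on a ZSTRING counts up to the first null byte. The variable names are made up for the illustration, and the null is built from a variable so that CHR() is evaluated at run time rather than folded into a literal.

```freebasic
DIM zero AS INTEGER = 0              ' a variable, so CHR() is evaluated at run time
DIM s AS STRING = "AB" + CHR(zero) + "CD"
DIM z AS ZSTRING * 16
z = s                                ' a ZSTRING ends at its first null byte

PRINT "LEN(s) ="; LEN(s)             ' length kept in the STRING descriptor
PRINT "LEN(z) ="; LEN(z)             ' length found by scanning for a null
PRINT "byte 3 of s ="; ASC(s, 3)     ' the embedded null itself
```

If the two lengths differ on your compiler version, the STRING is carrying the null as ordinary data.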

Post by **Zippy** » Feb 19, 2009 0:56

I may be wrong. InStrRev may be the only string function/statement affected by appending a null to an ordinary string. It's the only function that "reads" right to left, all others "read" left to right to the first null.

But I consider this issue trivial. The point, one impressed upon me at times, is that ordinary strings should not contain nulls. If I'm going to use a string to store arbitrary binary data (which may include nulls) then I'm "hacking" and I'm on my own (I shouldn't expect any string functions to succeed on said hacked string, and I don't).

If you mix nulls in your strings you may very well have unexpected behavior if you then want to treat them as ordinary (untainted) strings. Don't mix nulls. Don't drive on the left side of the road in Kansas.

Post by **MichaelW** » Feb 19, 2009 1:20

I agree with the no nulls in STRINGs. If you place nulls in a STRING then it will not work correctly if passed to a function that expects a null-terminated string, probably the most common string format.

Post by **Zippy** » Feb 19, 2009 1:31

St_W wrote: The source code above is only an example. Usually I don't append a chr(0) to strings, of course. I encountered the problem while reading data from various files that contain binary data.
It's also possible (but uncommon) for a file to end with a chr(0) as its last character.

I think a null character at the end of a STRING shouldn't change the behaviour of the string functions, because STRINGs can contain null characters at any position, in contrast to ZSTRINGs, which cannot contain any chr(0) characters.

Ah, no - or yes depending on your perspective. A null is considered to be a terminator in both ordinary strings and zstrings, in this realm. You are making what I consider to be an arbitrary distinction between ordinary.. strings.. and zstrings. Both can "contain" nulls in that the memory allocated to either can "contain" anything, but your ability to manipulate either as string data with inherent string functions is affected by any nulls present.

You can file a bug report if you like. See this thread:

http://www.freebasic.net/forum/viewtopic.php?t=9280

I'd rather see you filter your data.

Post by **counting_pine** » Feb 19, 2009 9:46

Yes, it should return 9 in this instance. So it is a bug - do please file a report.
But the null-char has nothing to do with it. Try appending a different character - like chr(1), or "x".
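That suggestion can be checked with a short test. The sketch below is only an illustration: `NaiveInStrRev` is a hypothetical reference helper (not part of FreeBASIC) that scans with MID, and the three trailing characters are chr(0), chr(1) and "x". What the built-in prints depends on the compiler version, so no output is shown here.

```freebasic
' Hypothetical reference helper: last 1-based position of pat in s, 0 if absent.
FUNCTION NaiveInStrRev(BYREF s AS STRING, BYREF pat AS STRING) AS INTEGER
    IF LEN(pat) = 0 OR LEN(pat) > LEN(s) THEN RETURN 0
    FOR q AS INTEGER = LEN(s) - LEN(pat) + 1 TO 1 STEP -1
        IF MID(s, q, LEN(pat)) = pat THEN RETURN q
    NEXT
    RETURN 0
END FUNCTION

DIM pat AS STRING = "TEST"
DIM tails(0 TO 2) AS INTEGER = {0, 1, 120}   ' chr(0), chr(1), "x"

FOR i AS INTEGER = 0 TO 2
    ' CHR with a variable argument is evaluated at run time
    DIM s AS STRING = pat + pat + pat + CHR(tails(i))
    PRINT "tail ="; tails(i); "  InStrRev ="; InStrRev(s, pat); "  naive ="; NaiveInStrRev(s, pat)
NEXT
```

If the two columns disagree for every trailing character, the null is not the cause.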

There shouldn't be a problem with nulls in normal FB string functions. Although currently this isn't generally true for literals like !"foo\0bar", because string literals are treated more like zstrings.

Post by **St_W** » Feb 19, 2009 9:47

I'd like to read some data from a file and search for a pattern. The files contain null characters. I store the whole content of a file in one big string, so those null characters end up in the string too, and some of the string functions don't work correctly. What else should I do?


OPEN "TEST.XYZ" FOR BINARY AS #1
DIM FileDat AS STRING

FileDat = SPACE(LOF(1))
GET #1, 1, FileDat

PRINT "Last Occurrence at: "; InStrRev(FileDat, "TESTSTRING")

CLOSE

/EDIT: I've just read your answer, counting_pine. I'll submit a bug report at SourceForge soon; first I have to register there. Until the problem is fixed I'll use my own InStrRev function, written in assembly.

Post by **St_W** » Feb 19, 2009 18:03

As mentioned above, I've written an InStrRev function in FreeBASIC with inline assembler that should work correctly, can handle larger strings and is several times faster than the original.

If somebody needs such a function, here's the source:


function InStrRev2(byref searchstr as string, byref searchpat as string, byval startpos as uinteger = 0) as uinteger
    dim SStrPtr as integer = cint(strptr(searchstr))
    dim SPatPtr as integer = cint(strptr(searchpat))
    dim SStrLen as integer = len(searchstr)
    dim SPatLen as integer = len(searchpat)
    if startpos > 0 then SStrLen = startpos + SPatLen - 1
    if SStrLen > len(searchstr) then SStrLen = len(searchstr)
    if SPatLen > SStrLen then return 0
    asm
        std
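        mov ecx, [SStrLen]
        mov ebx, [SPatLen]
        sub ecx, ebx        ' only SStrLen - SPatLen + 1 start positions exist;
        inc ecx             ' scanning SStrLen bytes would read before the string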
        mov edi, [SStrPtr]
        mov esi, [SPatPtr]
        add edi, ecx
        sub edi, ebx
        mov al,  [esi]
      instrrev_continue:  
        repne scasb
        jz instrrev_foundfirst
        jcxz instrrev_notfound
      instrrev_foundfirst:
        push ecx
        push edi
        mov ecx, ebx
        add edi, ebx
        mov esi, [SPatPtr]
        add esi, ebx
        dec esi
        repe cmpsb
        jz instrrev_found
        pop edi
        pop ecx
        jmp instrrev_continue
      instrrev_found:
        pop edi
        pop ecx
        mov edx, edi
        sub edx, [SStrPtr]
        add edx, 2
        mov [Function], edx
      instrrev_notfound:
        cld
    end asm
end function

I'm not very good at assembly programming, and I'm sure the function could be optimized...
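For comparison, the same backward scan can be sketched in plain FreeBASIC with byte pointers. This is not the routine above, and no speed claim is made for it. `InStrRevPtr` is a hypothetical name, and it assumes the same convention as the assembler version: a positive `startpos` is the last position at which a match may begin.

```freebasic
' Hypothetical portable counterpart of the assembler routine above.
FUNCTION InStrRevPtr(BYREF s AS STRING, BYREF pat AS STRING, BYVAL startpos AS INTEGER = 0) AS INTEGER
    DIM sLen AS INTEGER = LEN(s)
    DIM pLen AS INTEGER = LEN(pat)

    ' A positive startpos limits the region that is searched
    IF startpos > 0 AND startpos + pLen - 1 < sLen THEN sLen = startpos + pLen - 1
    IF pLen = 0 OR pLen > sLen THEN RETURN 0

    DIM sp AS UBYTE PTR = CAST(UBYTE PTR, STRPTR(s))
    DIM pp AS UBYTE PTR = CAST(UBYTE PTR, STRPTR(pat))

    ' q is the 0-based candidate start; never goes below the first byte
    FOR q AS INTEGER = sLen - pLen TO 0 STEP -1
        IF sp[q] = pp[0] THEN
            DIM k AS INTEGER = 1
            WHILE k < pLen ANDALSO sp[q + k] = pp[k]
                k += 1
            WEND
            IF k = pLen THEN RETURN q + 1    ' convert to a 1-based position
        END IF
    NEXT
    RETURN 0
END FUNCTION

PRINT InStrRevPtr("TESTTESTTESTx", "TEST")
PRINT InStrRevPtr("TESTTESTTESTx", "TEST", 5)
```

Because the comparison works on bytes and uses the stored lengths, null characters in either string are treated like any other byte.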