Comparing strings

General FreeBASIC programming questions.
Post Reply
squall4226
Posts: 284
Joined: Dec 21, 2008 15:08
Contact:

Comparing strings

Post by squall4226 »

I know I asked this before, but the truth is I am just not understanding. I understand what I am trying to do, but not how I am trying to do it. I am comparing two strings, I want to return the percentage match, and also add 100 to a score variable for each letter matched. Here is my best shot:

Code: Select all

Sub compare
      'user is the users input
      'break here if a string has already been compared for this lyric
       'No retrying!
	If scorer.score(lcount) <> 0 Then
		user = ""
		return
	EndIf
	Dim As Integer g
	'perfect
	If UCase(user) = UCase(cursong.lyricarray(lcount)) Then
		scorer.score(lcount) = (100 * Len(cursong.lyricarray(lcount))) + 500 ' perfect bonus
		scorer.grade(lcount) = "1"
		score+= scorer.score(lcount)
	Else
        'compare per letter
		For p As Integer = 1 To Len(cursong.lyricarray(lcount))
			If Right(Left(user,p),1) = Right(Left(cursong.lyricarray(lcount),p),1) Then
				scorer.score(lcount)+ = 100
			EndIf
		Next
		score+=scorer.score(lcount)
		scorer.grade(lcount) = Str( (scorer.score(lcount)/100)/Len(cursong.lyricarray(lcount)))
	EndIf


	user = ""
End Sub
Now this will obviously match perfect matches. But here are some cases and what I'd like to happen:

cursong.lyricarray(lcount) = "concession"
_______________________________
user = "concession"
100% match: This works in my code

user = "condfsdfds"
40% match: this works in my code

user = "conchghession"
100% match, but I would add a penalty for the extra letters: This does not work in my code

user = "codfdsncesfdsfdikosn"
100% match again, - big penalty for extra letters: this does not work in my code

user = "confgfession"
90% match, this one I have no idea how to catch, explained below

So I know how to get the first two cases, for case three and four, I could just check for each letter in order, discarding incorrect ones. But for case 5, there is no second C, so that would not produce the output I want. Sorry to ask this again, but I am just not following this one hehe.


~Blyss
j_milton
Posts: 458
Joined: Feb 11, 2010 17:35

Post by j_milton »

So as I understand it the user can make the following types of errors:

1) they can omit one or more char's in 1 or more positions

2) they can insert 1 or more chars that were not in the target in 1 or more positions

3) they can substitute a single incorrect char for a correct char in 1 or more positions
and
they can make any number and mixture of type 1,2, & 3 errors in a given run.

If so then
- detecting a perfect score is trivial
- detecting type 3 errors only is trivial

Beyond that it gets tricky, type 2 and 3 errors are easy if the char they typed does not appear at all in the correct lyric, but other than that case it seems to me that you have a complex problem.

One method that you might want to consider for these cases is to calculate the "Levenshtein distance" , or maybe the "Damerau–Levenshtein distance" between the correct lyric and the user typed lyric, with a shorter distance being a better score. Explanation of these (including pseudo-code) is here:
http://en.wikipedia.org/wiki/Levenshtein_distance
http://en.wikipedia.org/wiki/Damerau%E2 ... n_distance
squall4226
Posts: 284
Joined: Dec 21, 2008 15:08
Contact:

Post by squall4226 »

From wikipedia:
The Levenshtein distance between two strings is defined as the minimum number of edits needed to transform one string into the other
Excellent!
I am going to study them now until I get it. Thank you very much!
duke4e
Posts: 717
Joined: Dec 04, 2005 0:16
Location: Varazdin, Croatia, Europe
Contact:

Post by duke4e »

squall4226
Posts: 284
Joined: Dec 21, 2008 15:08
Contact:

Post by squall4226 »

squall4226
Posts: 284
Joined: Dec 21, 2008 15:08
Contact:

Post by squall4226 »

Thanks to j_milton, duke4e and sir_mud! This is what I am doing for now:

Code: Select all

If UCase(user) = UCase(cursong.lyricarray(lcount)) Then
scorer.score(lcount) = (100 * Len(cursong.lyricarray(lcount))) *2
scorer.grade(lcount) = 1
score+= scorer.score(lcount)
Else
'score = length of (lyric * 100) - (100 * levenshtein distance)
'but not < 0
distance = levenshtein_distance(user, cursong.lyricarray(lcount))

scorer.score(lcount) = ((Len(cursong.lyricarray(lcount))) - (distance)) *100

If scorer.score(lcount) < 0 Then scorer.score(lcount) = 0
If distance >= Len(cursong.lyricarray(lcount)) Then 
scorer.grade(lcount) = 0
Else

scorer.grade(lcount) =  (Len(cursong.lyricarray(lcount))-distance) / (Len(cursong.lyricarray(lcount)))

EndIf
EndIf
I am interested in the transposition "detection" in the Damerau–Levenshtein distance algorithm, but it is pretty complicated, so this will do well for now!

Thanks again!
~Blyss

P.S.

I still only have a rudimentary understanding of how this works, I am going to "take it apart" at one point so I can learn, not just use.
dodicat
Posts: 7979
Joined: Jan 10, 2006 20:30
Location: Scotland

Re: Comparing strings

Post by dodicat »

squall4226 wrote:I know I asked this before, but the truth is I am just not understanding. I understand what I am trying to do, but not how I am trying to do it. I am comparing two strings, I want to return the percentage match, and also add 100 to a score variable for each letter matched. Here is my best shot:
Here is a simple way, using my own variables:
EDITED AFTER FIRST POST

Code: Select all


'STRING COMPARER
     Type score
        As Double percentage
        As Integer value
    End Type
    Dim Shared info As score
    
    Sub compare(Byref _correct As String,Byref _guess As String)
        dim as string correct=lcase(_correct)
        dim as string guess=lcase(_guess)
       Dim As String var1
      Dim As String var2 
      #macro split(stri,char)
      Scope
      Dim As Long pst
      
    pst=Instr(stri,char)
    var1="":var2=""
    If pst<>0 Then
    var1=Mid(stri,1,pst-1)
    var2=Mid(stri,pst+1)
Else
    var1=stri
    Endif
End Scope

    #endmacro 
    
    Dim ch As String
    Dim As String copycorrect,copyguess,t1,t2
    Dim z As Integer
    For z =1 To Len (correct)
        t1=(Left(correct,z))
        t2=(Left(guess,z))
        copycorrect=Mid(correct,z)
        copyguess=Mid(guess,z)
        ch= Mid(correct,z,1)
        split(copycorrect,ch)
        
        split(copyguess,ch)
        
        If var2="" Then
           If (correct)<>(guess) Then z=z-1
        Exit For
        End If
        If t1<>t2 Then 
        z=z-1
        Exit For
        End If
        Next z
    info.value =z*100
    info.percentage=z/Len(guess)*100
End Sub

Dim As String s1,s2

    s1="concession" 'correct string
    s2="Codfdsncesfdsfdikosn" 'trial string
   ' s2="coNcession"
   's2="condfsdfds"
    compare(s1,s2)
    
    Print s1,s2
    Print "score ";info.value;"   percentage correct";info.percentage
    Sleep
     

 
sir_mud
Posts: 1401
Joined: Jul 29, 2006 3:00
Location: US
Contact:

Post by sir_mud »

Here's my version that while not as "cool" as levenstien distance, I believe it is what you were looking for and a bit easier
to understand while introducing little new stuff (maybe the string indexing).

Code: Select all


function FuzzyMatch( byref correct as const string, byref match as const string ) as uinteger

	var slen = len(correct)
	if len(match) < slen then slen = len(match)
	var cp = 0
	var ret = 0u

	while cp < slen

		if match[cp] = correct[cp] then ret += 1
		cp += 1

	wend

	return (ret/slen)*100

end function


'Test how this works:
if __FB_ARGC__ < 3 then 
	print "usage: fm 'correct' 'match'"
	end 42
endif

var correct = command(1)
var match =  command(2)

var fm = FuzzyMatch( correct, match )

print using "'&' matches '&' ###%"; match, correct, fm

dodicat
Posts: 7979
Joined: Jan 10, 2006 20:30
Location: Scotland

Post by dodicat »

sir_mud wrote:Here's my version that while not as "cool" as levenstien distance, I believe it is what you were looking for and a bit easier
to understand while introducing little new stuff (maybe the string indexing).
Hi SIR
Nice neat easy to understand method.
I see that you have a different interpretation of Squall's requirements to my own.
I just used comparison from the beginning of the guess, if you got the first letter wrong, then my interpretation is that you are a bit of a twit, and get no reward.
My percentage reflects the number you get right to the length of your guess, so if you bung in millions of letters, my interpretation is you are a bit of a chancer.
It looks as though, since the advent of fb 21, that neat easy code like your own is no longer viable to solve easy questions.
Now you must have recourse to a double barrelled algorith, preferably translated from C++ pseudo code.
J_milton picked a cracker in the famous Damerau–Levenshtein .
So, in light of this new found fad, I'm going to do the same thing, and if I can't come across a fitting name, I'll just make my own what d'ya think?

Here's the Leiper and Poynter method incorporated with the thing that J_milton suggested.

Code: Select all


declare Sub compare(Byref _correct As String,Byref _guess As String)
declare function bigformat(num as string) as string

'YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY
  ' LEIPER AND POYNTER ALGORITHM FOR EXPANSIVE FORMATTING OF NUMERICAL STRINGS
  ' THE NEGOTIATED NEGATED Damerau–Levenshtein ALGORITHM FOR DEPTH SOUNDING
  ' TRANSLTED FROM THE C++ plus AND OBJECTIVE CAML PSEUDO CODE.
  ' THIS CODE IS OPEN SOURCE TO THE BEST OF MY RECKONING
  ' IN OTHER WORDS, IT CAN BE MESSED WITH, AT YOUR OWN PERIL
  ' DODICAT 27th. August 2010 @c

     Type score
        As Double percentage
        As Integer value
    End Type
    Dim Shared info As score
   ' *********************** **************************************
Dim As String s1,s2

    
   s1=bigformat("99887504534e-867")'the proper thing
   s2=bigformat("99887504533e-867")'the guess
    compare(s1,s2)
    
    Print s1
    print s2
    print
    Print "score ";info.value;"   percentage correct";info.percentage
    Sleep
     
' ************************ **************************************** 
    Sub compare(Byref _correct As String,Byref _guess As String)
        Dim As String correct=Lcase(_correct)
        Dim As String guess=Lcase(_guess)
       Dim As String var1
      Dim As String var2 
      #macro split(stri,char)
      Scope
      Dim As Long pst=Instr(stri,char)
    var1="":var2=""
    If pst<>0 Then
    var1=Mid(stri,1,pst-1)
    var2=Mid(stri,pst+1)
Else
    var1=stri
    Endif
End Scope
    #endmacro 
    
    Dim ch As String
    Dim As String copyguess,t1,t2
    Dim z As Integer
    For z =1 To Len (correct)
        t1=(Left(correct,z))
        t2=(Left(guess,z))
        copyguess=Mid(guess,z)
        ch= Mid(correct,z,1)
        
        split(copyguess,ch)
        
        If var2="" Then
           If (correct)<>(guess) Then z=z-1
        Exit For
        End If
        If t1<>t2 Then 
        z=z-1
        Exit For
        End If
        Next z
    info.value =z*100
    info.percentage=z/Len(guess)*100
End Sub

 
function bigformat(num as string) as string
    dim n as string=lcase(num)
    for z as integer=1 to len(n)   'if d is used, make it e
        if mid(n,z,1)="d" then mid(n,z,1)="e"
        next z
    if instr(n,"e") = 0 then return num 'no formatting needed
dim asci as integer
dim temp as string
' macro Remove all char from s
 #macro remove(s,char)  
    asci=asc(char)
    For i as long =0 To Len(s)-1 
    If s[i]<>asci Then temp=temp+Chr$(s[i])
    Next i
s= temp:temp=""
#endmacro
dim as string part1,part2
'macro insert a character into a string
#macro insert(s,char,position)
 part1=mid$(s,1,position-1)
 part2=mid$(s,position)
 s=part1+char+part2
 #endmacro
remove(n,"+")                     'get rid of + signs
'       _________________________________________________
    dim as string var1,var2,p1,p2
    dim c as integer
    dim tr as integer
    ' macro split on either side of the char
    #macro two_parts(stri,char,var1,var2)
    for c=1 to len(stri)
        if tr=0 and mid(stri,c,1)<>char then 
        var1=var1+mid(stri,c,1)
        endif
        if mid(stri,c,1)=char then tr=1
        if tr=1 then var2=var2+mid(stri,c+1,1)
    next c
    tr=0
    #endmacro
    
    two_parts(n,"e",p1,p2)    'e.g. 45.7e9 becomes p1=45.7 and p2= 9
    p1=rtrim(p1,".")          'get rid of unneeded decimal
    
    select case instr(p2,"-") 'e.g. 8e-9 where p2=-9
    case 0
    '________________________________________
    select case instr(p1,".") 'e.g. 5.8e3 where p1=5.8
    case 0  'no decimals
       return p1+string(val(P2),"0")
    case else
     dim as integer lf1
     dim as string f1,f2
     two_parts(p1,".",f1,f2)  'e.g. 67.78e-7 f2=78
     lf1=len(f2)
     return f1+f2+string(val(p2)-lf1,"0")
        end select
    '_end case 0________________________________________
    case else 'instr(p2,-) e.g. 78e-12 where p2=-12
        dim zeros as integer
        dim as integer lf1
        dim as string f1,f2,sign
        if instr(p1,"-") then sign="-"
        two_parts(p2,"-",f1,f2)
        zeros=val(f2)
        two_parts(p1,".",f1,f2)
        remove(f1,"-")
        
        if len(f1) > val(f2)then  'e.g. 12345e-4
        insert(f1,".",len(f1)-val(f2)+1)
        return sign+f1
        end if
        
        lf1=len(f1)
        remove(p1,"-")
        remove(p1,".")
        return ltrim(sign+"."+string(zeros-lf1,"0")+p1,"0")
        end select
    end function

    
    
    
squall4226
Posts: 284
Joined: Dec 21, 2008 15:08
Contact:

Post by squall4226 »

sir_mud wrote:Here's my version that while not as "cool" as levenstien distance, I believe it is what you were looking for and a bit easier
to understand while introducing little new stuff (maybe the string indexing).
You are right, that is pretty much close to what I asked for and I do understand it. On the other hand, I kind of like the Levenshtein Distance formula and the way I have been using it. I can use it to penalize for extra nonsense letters, it roughly gives an idea how many letters correct there are, in fact it does do that if the strings are of equal length or len(user) < len(correct). Given my ideas for the game I am using this in, I like it this way. I have learned the levenshtein distance theory now to, I can compute it in my head(not quickly though) but how the code is working is still very much eluding me, ha!

~Blyss

Edit I made this bit to tell the number of characters correct, in order, out of position. That way I can know the number correct if len(tocompare)> len(correct), since I can't compute that with the edit distance:

Code: Select all

'compares strings and returns number of letters correct
'They have to be in order, but not neccesarrily in the correct position.
Function strNumCorrect(toCompare As String, correct As String) As Integer
Dim As Integer numCorrect = 0
Dim As Integer bb = 1
For b As Integer = 1 To Len(toCompare)
	If Right(Left(toCompare,b),1) = Right(Left(correct,bb),1) Then
		numCorrect+=1
		bb+=1
	EndIf
	If numCorrect = Len(correct) Then Exit for
Next
Return numCorrect
End Function

'example stuff

Print strNumCorrect("Does this phrase contain test?", "test")
Print strNumCorrect("Does this phrase contain the same thing?", "test")
''''''''''''''''''''''''''t'''''''''e'''''''''''''s'''''t
Print strNumCorrect("Only one match", "test")
''''''''''''''''''''''''''''''''t''
Print strNumcorrect("Two matched", "test")
'etc

Sleep
Post Reply