Levenshtein Distance Algorithm

Post by sir_mud » Oct 09, 2007 4:42

Here's an implementation of the Levenshtein distance algorithm, which is explained here: http://www.merriampark.com/ld.htm

Code:

'Levenshtein Distance Algorithm for FreeBASIC
'Based on the C implementation of Lorenzo Seidenari here: http://www.merriampark.com/ldc.htm
'This code is assumed to be available under the Public Domain.

declare function levenshtein_distance( s as string, t as string ) as integer
declare function lev_minimum( a as integer, b as integer, c as integer ) as integer

'Just a simple test of the algorithm
? levenshtein_distance( command(1), command(2) )

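function levenshtein_distance( s as string, t as string ) as integer

dim as integer k, i, j, n, m, cost, distance
dim as integer ptr d

n = len(s)
m = len(t)

'If either string is empty, the distance is the length of the other one
if n = 0 then return m
if m = 0 then return n

'd is an (m+1) x (n+1) matrix stored row by row in one block of memory
d = allocate( sizeof(integer) * (m+1) * (n+1) )
if d = 0 then return -1   'allocation failed

'From here on, n and m are the matrix dimensions, not the string lengths
m += 1
n += 1

'First row: turning a prefix of s into the empty string takes k deletions
for k = 0 to n-1
   d[k] = k
next

'First column: building a prefix of t from the empty string takes k insertions
for k = 0 to m-1
   d[k*n] = k
next

for i = 1 to n-1
   for j = 1 to m-1
      'Matching characters cost nothing, a substitution costs 1
      if (s[i-1] = t[j-1]) then
         cost = 0
      else
         cost = 1
      end if

      'Cheapest of: cell above + 1, cell to the left + 1, diagonal cell + cost
      d[j*n+i] = lev_minimum(d[(j-1)*n+i]+1, d[j*n+i-1]+1, d[(j-1)*n+i-1]+cost)
   next
next

'The bottom-right cell holds the distance between the full strings
distance = d[n*m-1]
deallocate d

return distance

end function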

function lev_minimum( a as integer, b as integer, c as integer ) as integer

var min = a

if (b<min) then min = b
if (c<min) then min = c

return min

end function

Post by Pritchard » Oct 09, 2007 13:00

This is awesome.

Post by maddogg6 » Oct 09, 2007 15:34

In case someone needs to look it up (like I did)...
(The link in sir_mud's code comment points only to the C source code, not to the explanation, so here it is.)

from: http://www.merriampark.com/ld.htm
Levenshtein distance (LD) is a measure of the similarity between two strings, which we will refer to as the source string (s) and the target string (t). The distance is the number of deletions, insertions, or substitutions required to transform s into t. For example,

* If s is "test" and t is "test", then LD(s,t) = 0, because no transformations are needed. The strings are already identical.
* If s is "test" and t is "tent", then LD(s,t) = 1, because one substitution (change "s" to "n") is sufficient to transform s into t.

The greater the Levenshtein distance, the more different the strings are.
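
The two examples above can be checked with a few lines of FreeBASIC. This is only a minimal sketch, not the code from the first post: it uses a regular two-dimensional array instead of allocated memory, and the name edit_distance is made up for this illustration.

```freebasic
'Minimal sketch of the full-matrix Levenshtein distance.
'edit_distance is a hypothetical name used only for this example.
function edit_distance( s as string, t as string ) as integer
   dim as integer i, j, cost, best
   dim as integer n = len(s), m = len(t)

   'd(i, j) = distance between the first i chars of s and the first j chars of t
   dim as integer d(0 to n, 0 to m)

   'Reaching the empty string takes i deletions, leaving it takes j insertions
   for i = 0 to n
      d(i, 0) = i
   next
   for j = 0 to m
      d(0, j) = j
   next

   for i = 1 to n
      for j = 1 to m
         'Matching characters cost nothing, a substitution costs 1
         cost = iif( s[i-1] = t[j-1], 0, 1 )

         best = d(i-1, j) + 1                                          'deletion
         if d(i, j-1) + 1 < best then best = d(i, j-1) + 1             'insertion
         if d(i-1, j-1) + cost < best then best = d(i-1, j-1) + cost   'match or substitution
         d(i, j) = best
      next
   next

   return d(n, m)
end function

print edit_distance( "test", "test" )   'identical strings, no edits needed
print edit_distance( "test", "tent" )   'one substitution ("s" to "n")
```

If the sketch is right, it should print 0 and then 1, matching the two examples in the quoted text.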

Levenshtein distance is named after the Russian scientist Vladimir Levenshtein, who devised the algorithm in 1965. If you can't spell or pronounce Levenshtein, the metric is also sometimes called edit distance.

The Levenshtein distance algorithm has been used in:

* Spell checking
* Speech recognition
* DNA analysis
* Plagiarism detection


oh - and.... very cool sir_mud, I never knew such an algo existed. Thanks for sharing.
