levenshtein 


string apg

CALCULATE the levenshtein distance between two STRINGS.





This function returns the Levenshtein distance defined as the minimal number of characters you have to replace, insert or delete to transform $str1 into $str2.

If one of $str1 or $str2 is longer than the limit of 255 characters, this function returns -1.

The complexity of the algorithm is O(m*n), where n and m are the length of $str1 and $str2; rather good when compared to the function similar_text, which is O(max(n,m)**3), but still expensive.

In its simplest form the function will take only the two STRINGS as parameter and will calculate just the number of insert, replace and delete operations needed to transform $str1 into $str2.

A second variant will take three additional parameters that define the cost of insert, replace and delete operations.

This is more general and adaptive than variant one, but not as efficient.

Vladimir Levenshtein  



<?php

int levenshtein 
str $str1 str $str2 )


where,

$str1 The first STRING being evaluated 
                             
for Levenshtein distance

$str2 
The second STRING being evaluated 
                             
for Levenshtein distance

?>
 

<?php

int levenshtein 
str $str1 str $str2 
                                          
int $cost_ins 
                                          
int $cost_rep 
                                          
int $cost_del )


where,

$str1 The first STRING being evaluated 
                                 
for Levenshtein distance

$str2 
The second STRING being evaluated 
                                 
for Levenshtein distance

$cost_ins 
To define the cost of insertion

$cost_rep 
To define the cost of replacement

$cost_del 
To define the cost of deletion

?>
 

$str1


The first STRING to get the Levenshtein distance.



$str2


The second STRING to get the Levenshtein distance.



$cost_ins


The cost of insertion.



$cost_rep


The cost of replacement.



$cost_del


The cost of deletion.



  1 EXERCISE   

<?php

$str01a 
'widow';

$str01b 'window';

$lev01 levenshtein($str01a$str01b);

echo 
'The Levenshtein distance between between "' 
              
$str01a '" and "' $str01b '" is ' $lev01 '.';

?> 

 RESULT   

The Levenshtein distance between between "widow" and "window" is 1.

To transform "widow" into "window" you need only to insert 1 character.


  2 EXERCISE   

<?php

$str02a 
'primary color';

$arrstr02a = ['red''green''blue'];

foreach(
$arrstr02a as $str02 => $st02)
{
echo 
'The Levenshtein distance between between "'
 
$str02a '" and "' $arrstr02a[$str02] . '" is '
 
levenshtein($str02a$st02) . '.<br><br>';
}
echo 
'<br><br>';

foreach(
$arrstr02a as $str02 => $st02)
{
echo 
'The Levenshtein distance between between "'
 
$arrstr02a[$str02] . '" and "' $str02a '" is '
 
levenshtein($st02$str02a) . '.<br><br>';
}

?> 

 RESULT   

 The Levenshtein distance between between "primary color" and "red" is 12.

The Levenshtein distance between between "primary color" and "green" is 12.

The Levenshtein distance between between "primary color" and "blue" is 12.


The Levenshtein distance between between "red" and "primary color" is 12.

The Levenshtein distance between between "green" and "primary color" is 12.

The Levenshtein distance between between "blue" and "primary color" is 12.


  3 EXERCISE   

<?php

$str03a 
'PRIMARY COLOR';

$arrstr03a = ['red''green''blue'];

foreach(
$arrstr03a as $str03 => $st03)
{
echo 
'The Levenshtein distance between between "'
 
$str03a '" and "' $arrstr03a[$str03] . '" is '
 
levenshtein($str03a$st03) . '.<br><br>';
}
echo 
'<br><br>';

foreach(
$arrstr03a as $str03 => $st03)
{
echo 
'The Levenshtein distance between between "'
 
$arrstr03a[$str03] . '" and "' $str03a '" is '
 
levenshtein($st03$str03a) . '.<br><br>';
}

?> 

 RESULT   

 The Levenshtein distance between between "PRIMARY COLOR and "red" is 13.

The Levenshtein distance between between "PRIMARY COLOR and "green" is 13.

The Levenshtein distance between between "PRIMARY COLOR and "blue" is 13.


The Levenshtein distance between between "red" and "PRIMARY COLOR is 13.

The Levenshtein distance between between "green" and "PRIMARY COLOR is 13.

The Levenshtein distance between between "blue" and "PRIMARY COLOR is 13.



The results of this exercise show that in order to transform each of the color names into 'PRIMARY COLOR', it takes 13 characters, that is, the number of characters in the string 'PRIMARY COLOR'.

It is worth noting that 'PRIMARY COLOR' is written in capital letters and each color name is in lowercase.


  4 EXERCISE   

<?php

$str04a 
'widow';

$str04b 'window';

$cins04 1;

$crep04 0;

$cdel04 2;

echo 
'The Levenshtein distance between between "'
 
$str04a '" and "' $str04b '" is '
 
levenshtein($str04a$str04b$cins04
                                   
$crep04$cdel04 ) . '.<br><br>';
 
echo 
'The Levenshtein distance between between "'
 
$str04b '" and "' $str04a '" is '
 
levenshtein($str04b$str04a$cins04
                                   
$crep04$cdel04) . '.<br>';

?> 

 RESULT   

The Levenshtein distance between between "widow" and "window" is 1.

The Levenshtein distance between between "window" and "widow" is 2.


  5 EXERCISE   

<?php

$str05a 
"arbitro de futebol";

$str05b "árbitro de futebol";
// soccer referee in portuguese

$cins05 0;

$crep05 1;

$cdel05 1;


echo 
'The Levenshtein distance between between "'
 
$str05a '" and "' $str05b '" is '
 
levenshtein($str05a$str05b$cins05
                                    
$crep05$cdel05 ) . '.<br><br>';
 
echo 
'The Levenshtein distance between between "'
 
$str05b '" and "' $str05a '" is '
 
levenshtein($str05b$str05a$cins05
                                    
$crep05$cdel05) . '.<br>';

?> 

 RESULT   

The Levenshtein distance between between "arbitro de futebol" and "árbitro de futebol" is 1.

The Levenshtein distance between between "árbitro de futebol" and "arbitro de futebol" is 2.


  6 EXERCISE   

<?php

echo '<br><br>Equal<br>' \PHP_EOL;
var_dump(levenshtein('12345''12345'));

echo 
'<br><br>First string empty<br>' \PHP_EOL;
var_dump(levenshtein('''xyz'));
echo 
'<br><br>Second string empty<br>' \PHP_EOL;
var_dump(levenshtein('xyz'''));
echo 
'<br><br>Both empty<br>' \PHP_EOL;
var_dump(levenshtein(''''));
var_dump(levenshtein(''''101010));

echo 
'<br><br>1 character<br>' \PHP_EOL;
var_dump(levenshtein('1''2'));
echo 
'<br><br>2 character swapped<br>' \PHP_EOL;
var_dump(levenshtein('12''21'));

echo 
'<br><br>Inexpensive deletion<br>' \PHP_EOL;
var_dump(levenshtein('2121''11'2));
echo 
'<br><br>Expensive deletion<br>' \PHP_EOL;
var_dump(levenshtein('2121''11'215));

echo 
'<br><br>Inexpensive insertion<br>' \PHP_EOL;
var_dump(levenshtein('11''2121'));
echo 
'<br><br>Expensive insertion<br>' \PHP_EOL;
var_dump(levenshtein('11''2121'5));

echo 
'<br><br>Expensive replacement<br>' \PHP_EOL;
var_dump(levenshtein('111''121'232));
echo 
'<br><br>Very expensive replacement<br>' \PHP_EOL;
var_dump(levenshtein('111''121'292));

?>

  7 EXERCISE   

<?php

// had as a bug

function levshtein$parA$parB)
{
    echo 
"The levenshtein between '$parA' and '$parB' is " 
               
levenshtein($parA$parB) . "<br><br>";
}

levshtein('a''bc');

levshtein('xa''xbc');

levshtein('xax''xbcx');

levshtein('ax''bcx');

levshtein('árbitro''arbitro');

echo 
'<br>';

levshtein('debugg''debug');

levshtein('ddebug''debug');

levshtein('debbbug''debug');

levshtein('debugging''debuging');

echo 
'<br>';

levshtein('13458''12345');

levshtein('1345''1234');

?>

  8 EXERCISE   

<?php

// levenshtein no longer has a maximum string length limit.

echo '<br><br>String 1<br>' \PHP_EOL;
var_dump(levenshtein('Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abc'
'A'));
try {
    
var_dump(levenshtein('Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwx'
'A'));
} catch (
\ValueError $e) {
    echo 
$e->getMessage() . \PHP_EOL;
}
echo 
'<br><br>String 2<br>' \PHP_EOL;
var_dump(levenshtein('A''Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abc'
));
try {
    
var_dump(levenshtein('A''Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrstuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwxyz
    Abcdefghijklmnopqrtsuvwx'
));
} catch (
\ValueError $e) {
    echo 
$e->getMessage() . \PHP_EOL;
}

// TODO ValueError for negative costs?
// var_dump(levenshtein("", "", -1, -1, -1));

?>