levenshtein
CALCULATE the levenshtein distance between two STRINGS.
This function returns the Levenshtein distance defined as the minimal number of characters you have to replace, insert or delete to transform
$str1 into
$str2.
If one of
$str1 or
$str2 is longer than the limit of 255 characters, this function returns -1.
The complexity of the algorithm is O(m*n), where n and m are the length of
$str1 and
$str2; rather good when compared to the function
similar_text, which is O(max(n,m)**3), but still expensive.
In its simplest form the function will take only the two STRINGS as parameter and will calculate just the number of insert, replace and delete operations needed to transform
$str1 into
$str2.
A second variant will take three additional parameters that define the cost of insert, replace and delete operations.
This is more general and adaptive than variant one, but not as efficient.
Vladimir Levenshtein
<?php
int levenshtein ( str $str1 , str $str2 )
where,
$str1 = The first STRING being evaluated
for Levenshtein distance
$str2 = The second STRING being evaluated
for Levenshtein distance
?>
<?php
int levenshtein ( str $str1 , str $str2 ,
int $cost_ins ,
int $cost_rep ,
int $cost_del )
where,
$str1 = The first STRING being evaluated
for Levenshtein distance
$str2 = The second STRING being evaluated
for Levenshtein distance
$cost_ins = To define the cost of insertion
$cost_rep = To define the cost of replacement
$cost_del = To define the cost of deletion
?>
$str1
The first STRING to get the Levenshtein distance.
$str2
The second STRING to get the Levenshtein distance.
$cost_ins
The cost of insertion.
$cost_rep
The cost of replacement.
$cost_del
The cost of deletion.
EXERCISE
<?php
$str01a = 'widow';
$str01b = 'window';
$lev01 = levenshtein($str01a, $str01b);
echo 'The Levenshtein distance between between "' .
$str01a . '" and "' . $str01b . '" is ' . $lev01 . '.';
?>
RESULT
The Levenshtein distance between between "widow" and "window" is 1.
To transform "widow" into "window" you need only to insert 1 character.
EXERCISE
<?php
$str02a = 'primary color';
$arrstr02a = ['red', 'green', 'blue'];
foreach($arrstr02a as $str02 => $st02)
{
echo 'The Levenshtein distance between between "'
. $str02a . '" and "' . $arrstr02a[$str02] . '" is '
. levenshtein($str02a, $st02) . '.<br><br>';
}
echo '<br><br>';
foreach($arrstr02a as $str02 => $st02)
{
echo 'The Levenshtein distance between between "'
. $arrstr02a[$str02] . '" and "' . $str02a . '" is '
. levenshtein($st02, $str02a) . '.<br><br>';
}
?>
RESULT
The Levenshtein distance between between "primary color" and "red" is 12.
The Levenshtein distance between between "primary color" and "green" is 12.
The Levenshtein distance between between "primary color" and "blue" is 12.
The Levenshtein distance between between "red" and "primary color" is 12.
The Levenshtein distance between between "green" and "primary color" is 12.
The Levenshtein distance between between "blue" and "primary color" is 12.
EXERCISE
<?php
$str03a = 'PRIMARY COLOR';
$arrstr03a = ['red', 'green', 'blue'];
foreach($arrstr03a as $str03 => $st03)
{
echo 'The Levenshtein distance between between "'
. $str03a . '" and "' . $arrstr03a[$str03] . '" is '
. levenshtein($str03a, $st03) . '.<br><br>';
}
echo '<br><br>';
foreach($arrstr03a as $str03 => $st03)
{
echo 'The Levenshtein distance between between "'
. $arrstr03a[$str03] . '" and "' . $str03a . '" is '
. levenshtein($st03, $str03a) . '.<br><br>';
}
?>
RESULT
The Levenshtein distance between between "PRIMARY COLOR and "red" is 13.
The Levenshtein distance between between "PRIMARY COLOR and "green" is 13.
The Levenshtein distance between between "PRIMARY COLOR and "blue" is 13.
The Levenshtein distance between between "red" and "PRIMARY COLOR is 13.
The Levenshtein distance between between "green" and "PRIMARY COLOR is 13.
The Levenshtein distance between between "blue" and "PRIMARY COLOR is 13.
The results of this exercise show that in order to transform each of the color names into 'PRIMARY COLOR', it takes 13 characters, that is, the number of characters in the string 'PRIMARY COLOR'.
It is worth noting that 'PRIMARY COLOR' is written in capital letters and each color name is in lowercase.
EXERCISE
<?php
$str04a = 'widow';
$str04b = 'window';
$cins04 = 1;
$crep04 = 0;
$cdel04 = 2;
echo 'The Levenshtein distance between between "'
. $str04a . '" and "' . $str04b . '" is '
. levenshtein($str04a, $str04b, $cins04,
$crep04, $cdel04 ) . '.<br><br>';
echo 'The Levenshtein distance between between "'
. $str04b . '" and "' . $str04a . '" is '
. levenshtein($str04b, $str04a, $cins04,
$crep04, $cdel04) . '.<br>';
?>
RESULT
The Levenshtein distance between between "widow" and "window" is 1.
The Levenshtein distance between between "window" and "widow" is 2.
EXERCISE
<?php
$str05a = "arbitro de futebol";
$str05b = "árbitro de futebol";
// soccer referee in portuguese
$cins05 = 0;
$crep05 = 1;
$cdel05 = 1;
echo 'The Levenshtein distance between between "'
. $str05a . '" and "' . $str05b . '" is '
. levenshtein($str05a, $str05b, $cins05,
$crep05, $cdel05 ) . '.<br><br>';
echo 'The Levenshtein distance between between "'
. $str05b . '" and "' . $str05a . '" is '
. levenshtein($str05b, $str05a, $cins05,
$crep05, $cdel05) . '.<br>';
?>
RESULT
The Levenshtein distance between between "arbitro de futebol" and "árbitro de futebol" is 1.
The Levenshtein distance between between "árbitro de futebol" and "arbitro de futebol" is 2.
EXERCISE
<?php
echo '<br><br>Equal<br>' . \PHP_EOL;
var_dump(levenshtein('12345', '12345'));
echo '<br><br>First string empty<br>' . \PHP_EOL;
var_dump(levenshtein('', 'xyz'));
echo '<br><br>Second string empty<br>' . \PHP_EOL;
var_dump(levenshtein('xyz', ''));
echo '<br><br>Both empty<br>' . \PHP_EOL;
var_dump(levenshtein('', ''));
var_dump(levenshtein('', '', 10, 10, 10));
echo '<br><br>1 character<br>' . \PHP_EOL;
var_dump(levenshtein('1', '2'));
echo '<br><br>2 character swapped<br>' . \PHP_EOL;
var_dump(levenshtein('12', '21'));
echo '<br><br>Inexpensive deletion<br>' . \PHP_EOL;
var_dump(levenshtein('2121', '11', 2));
echo '<br><br>Expensive deletion<br>' . \PHP_EOL;
var_dump(levenshtein('2121', '11', 2, 1, 5));
echo '<br><br>Inexpensive insertion<br>' . \PHP_EOL;
var_dump(levenshtein('11', '2121'));
echo '<br><br>Expensive insertion<br>' . \PHP_EOL;
var_dump(levenshtein('11', '2121', 5));
echo '<br><br>Expensive replacement<br>' . \PHP_EOL;
var_dump(levenshtein('111', '121', 2, 3, 2));
echo '<br><br>Very expensive replacement<br>' . \PHP_EOL;
var_dump(levenshtein('111', '121', 2, 9, 2));
?>
EXERCISE
<?php
// had as a bug
function levshtein( $parA, $parB)
{
echo "The levenshtein between '$parA' and '$parB' is " .
levenshtein($parA, $parB) . "<br><br>";
}
levshtein('a', 'bc');
levshtein('xa', 'xbc');
levshtein('xax', 'xbcx');
levshtein('ax', 'bcx');
levshtein('árbitro', 'arbitro');
echo '<br>';
levshtein('debugg', 'debug');
levshtein('ddebug', 'debug');
levshtein('debbbug', 'debug');
levshtein('debugging', 'debuging');
echo '<br>';
levshtein('13458', '12345');
levshtein('1345', '1234');
?>
EXERCISE
<?php
// levenshtein no longer has a maximum string length limit.
echo '<br><br>String 1<br>' . \PHP_EOL;
var_dump(levenshtein('Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abc', 'A'));
try {
var_dump(levenshtein('Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwx', 'A'));
} catch (\ValueError $e) {
echo $e->getMessage() . \PHP_EOL;
}
echo '<br><br>String 2<br>' . \PHP_EOL;
var_dump(levenshtein('A', 'Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abc'));
try {
var_dump(levenshtein('A', 'Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrstuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwxyz
Abcdefghijklmnopqrtsuvwx'));
} catch (\ValueError $e) {
echo $e->getMessage() . \PHP_EOL;
}
// TODO ValueError for negative costs?
// var_dump(levenshtein("", "", -1, -1, -1));
?>