Recipe 6.13 Approximate Matching

6.13.1 Problem

You want to match fuzzily, that is, to allow for a margin of error when the string doesn't quite match the pattern. Whenever you want to be forgiving of misspellings in user input, you want fuzzy matching.

6.13.2 Solution

Use the String::Approx module, available from CPAN:

use String::Approx qw(amatch);

if (amatch("PATTERN", @list)) {
    # matched
}

@matches = amatch("PATTERN", @list);

6.13.3 Discussion

String::Approx calculates the difference between the pattern and each string in the list. If the string can be made to fit the pattern with no more than a certain number of one-character insertions, deletions, or substitutions (by default, 10 percent of the pattern length), it still matches. In scalar context, amatch returns the number of successful matches. In list context, it returns the strings matched.

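use String::Approx qw(amatch);
# older Unix layout; many systems keep the list in /usr/share/dict/words
open(DICT, "/usr/dict/words")               or die "Can't open dict: $!";
while(<DICT>) {
    print if amatch("balast");    # with no list given, amatch tests $_
}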

ballast
balustrade
blast
blastula
sandblast
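The output suggests that the pattern only has to fit somewhere inside each word: sandblast and balustrade match because balast is one edit away from a substring of them ("blast" and "balust"). A rough model of that computation is approximate substring matching with a dynamic-programming table (Sellers' algorithm), sketched below in pure Perl. This is not String::Approx's implementation; approx_distance and fuzzy_grep are hypothetical helpers, and the edit budget of 10 percent of the pattern length, rounded up, is an assumption chosen to mirror the default described above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper: the smallest number of one-character insertions,
# deletions, or substitutions that turn $pattern into some substring of $text.
sub approx_distance {
    my ($pattern, $text) = @_;
    my @p = split //, $pattern;
    my @t = split //, $text;

    # Row for the empty pattern prefix: a match may start anywhere, so cost 0.
    my @prev = (0) x (@t + 1);
    for my $i (1 .. @p) {
        my @cur = ($i);    # $i pattern chars against empty text: $i edits
        for my $j (1 .. @t) {
            my $cost = $p[$i-1] eq $t[$j-1] ? 0 : 1;
            $cur[$j] = min($prev[$j-1] + $cost,   # match or substitute
                           $prev[$j]   + 1,       # pattern char left unmatched
                           $cur[$j-1]  + 1);      # extra char in the text
        }
        @prev = @cur;
    }
    # A match may also end anywhere, so take the best cell in the last row.
    return min(@prev);
}

# Hypothetical helper: keep the strings within the edit budget.
# Assumption: budget = 10% of the pattern length, rounded up.
sub fuzzy_grep {
    my ($pattern, @list) = @_;
    my $k = int((length($pattern) * 10 + 99) / 100);
    return grep { approx_distance($pattern, $_) <= $k } @list;
}

my @words = qw(ballast balustrade blast blastula sandblast ballet apple);
print "$_\n" for fuzzy_grep("balast", @words);
```

With a six-character pattern the assumed budget is one edit, so this prints ballast, balustrade, blast, blastula, and sandblast, while ballet (two edits away) and apple are rejected. Filling the table takes time proportional to the pattern length times the string length for every candidate, which hints at why approximate matching costs more than an ordinary regex match.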

Options passed to amatch control case-sensitivity and the permitted number of insertions, deletions, or substitutions. These are fully described in the String::Approx documentation.

The module's matching function seems to run between 10 and 40 times slower than Perl's built-in pattern matching. So use String::Approx only when you need a kind of fuzziness that Perl's patterns can't provide.
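For a small, fixed amount of fuzziness, built-in patterns can sometimes do the job. The sketch below builds an alternation that lets any one character of the word be replaced by any other; one_sub_regex is a hypothetical helper, not part of String::Approx. It covers substitutions only, so unlike amatch it rejects ballast (needs an insertion) and blast (needs a deletion).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: a regex that matches $word with at most one
# character substituted, anywhere inside the target string (unanchored).
sub one_sub_regex {
    my ($word) = @_;
    my @alts;
    for my $i (0 .. length($word) - 1) {
        # Literal prefix, any single character, literal suffix.
        push @alts, quotemeta(substr($word, 0, $i)) . '.'
                  . quotemeta(substr($word, $i + 1));
    }
    my $re = join '|', @alts;
    return qr/$re/;
}

my $re = one_sub_regex("balast");
for my $word (qw(ballast balustrade blast)) {
    printf "%-10s %s\n", $word, ($word =~ $re ? "matches" : "no match");
}
```

Only balustrade matches here. The alternation grows with the word length, and extending the trick to insertions and deletions, or to budgets above one, quickly becomes unwieldy; that is the kind of fuzziness where String::Approx is worth its cost.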

6.13.4 See Also

The documentation for the CPAN module String::Approx; Recipe 1.22