Recipe 6.11 Testing for a Valid Pattern

6.11.1 Problem

You want to let users enter their own patterns, but an invalid one would abort your program the first time you tried to use it.

6.11.2 Solution

Test the pattern in an eval { } construct first, matching against some dummy string. If $@ is not set, no exception occurred, so you know the pattern successfully compiled as a valid regular expression. Here is a loop that continues prompting until the user supplies a valid pattern:

do {
    print "Pattern? ";
    chomp($pat = <>);
    eval { "" =~ /$pat/ };
    warn "INVALID PATTERN $@" if $@;
} while $@;

Here's a standalone subroutine that verifies whether a pattern is valid:

sub is_valid_pattern {
    my $pat = shift;
    eval { "" =~ /$pat/ };
    return $@ ? 0 : 1;
}

Another way to write that is like this:

sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}

This version doesn't need to use $@. If the pattern match executes without raising an exception, the next statement, a bare 1, is reached and becomes the value of the eval. If the match does raise an exception, that statement is skipped and eval returns undef, so the || 0 supplies a 0 instead.

6.11.3 Discussion

There's no limit to the number of invalid, uncompilable patterns. The user could mistakenly enter "<I\s*[^>", "*** GET RICH ***", or "+5-i". If you blindly use the proffered pattern in your program, it raises an exception, normally a fatal event.
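To see this concretely, the following sketch feeds those three strings to the is_valid_pattern subroutine from the Solution, repeated here so the snippet runs on its own. The fourth candidate, ^\d+$, is not from the text; it is a well-formed pattern added only for contrast.

```perl
use strict;
use warnings;

# Same test as in the Solution: try the pattern inside eval { }.
sub is_valid_pattern {
    my $pat = shift;
    return eval { "" =~ /$pat/; 1 } || 0;
}

# The three bad patterns from the text; '^\d+$' is an extra,
# well-formed pattern included only for comparison.
my @candidates = ('<I\s*[^>', '*** GET RICH ***', '+5-i', '^\d+$');

for my $pat (@candidates) {
    printf "%-20s %s\n", $pat, is_valid_pattern($pat) ? "valid" : "INVALID";
}
```

Only the last candidate should be reported as valid. The first has an unterminated character class, and the other two begin with a quantifier that has nothing to quantify.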

The tiny program in Example 6-6 demonstrates this.

Example 6-6. paragrep
  # paragrep - trivial paragraph grepper
  die "usage: $0 pat [files]\n" unless @ARGV;
  $/ = '';
  $pat = shift;
  eval { "" =~ /$pat/; 1 }      or die "$0: Bad pattern $pat: $@\n";
  while (<>) {
      print "$ARGV $.: $_" if /$pat/o;
  }

The /o modifier tells Perl to interpolate the variables in the pattern only once, the first time the match is executed, even if their contents later change.
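A small sketch of that behavior, using made-up one-letter patterns: the same match is run in a loop while $pat changes, once with /o and once without.

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $pat ("a", "b") {
    # /o: the pattern is built on the first pass (from "a")
    # and is not rebuilt when $pat becomes "b".
    push @with_o, ("b" =~ /$pat/o ? 1 : 0);
}

for my $pat ("a", "b") {
    # No /o: the current value of $pat is used on every pass.
    push @without_o, ("b" =~ /$pat/ ? 1 : 0);
}

print "with /o:    @with_o\n";      # 0 0
print "without /o: @without_o\n";   # 0 1
```

In paragrep the pattern never changes after startup, so interpolating it once loses nothing.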

You could encapsulate this in a function call that returns 1 if the block completes and 0 if not, as shown in the Solution. The simpler eval "/$pat/" would also work to trap the exception, but has two other problems. One is that any slashes (or whatever your chosen pattern delimiter is) in the string the user entered would raise an exception. More importantly, it would open a drastic security hole that you almost certainly want to avoid. Strings like this could ruin your day:

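# The string as the user typed it. Single quotes matter here: in
# double quotes, Perl would run the command at this very assignment.
$pat = 'You lose @{[ system(\'rm -rf *\')]} big here';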
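The difference between the two forms of eval can be shown safely with a harmless stand-in for hostile input. In this sketch the payload merely sets a flag; the variable $main::ran and the payload text are illustrative assumptions, not part of the recipe.

```perl
use strict;
use warnings;

our $ran = 0;    # flag the injected code will set (illustrative only)

# Harmless stand-in for hostile input: the first slash ends the
# pattern early, and what follows is ordinary Perl code.
my $pat = 'a/; $main::ran = 1; "" =~ /b';

# Block eval: $pat is interpolated as pattern text, nothing more.
eval { "" =~ /$pat/ };
my $after_block = $ran;

# String eval: the text is compiled as Perl source, so the
# assignment hidden in $pat actually runs.
eval qq{"" =~ /$pat/};
my $after_string = $ran;

print "after block eval:  $after_block\n";     # 0
print "after string eval: $after_string\n";    # 1
```

With the block form, the slashes and the dollar sign are just characters in a pattern. With the string form, the user is in effect writing part of your program.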

If you don't want to let the user provide a real pattern, you can always metaquote the string first:

$safe_pat = quotemeta($pat);
something( ) if /$safe_pat/;

Or, even easier, use:

something( ) if /\Q$pat/;

But if you're going to do that, why are you using pattern matching at all? In that case, a simple use of index would be enough. But sometimes you want a literal part and a regex part, such as:

something( ) if /^\s*\Q$pat\E\s*$/;
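The following sketch shows what the quoting changes, using a made-up input string that contains a metacharacter, and shows index as the plain-substring alternative.

```perl
use strict;
use warnings;

my $pat = "1+1=2";    # sample input containing a regex metacharacter

# Unquoted, + is a quantifier: "one or more 1s, then 1=2".
my $raw_literal = ("1+1=2" =~ /^$pat$/) ? 1 : 0;    # 0
my $raw_other   = ("111=2" =~ /^$pat$/) ? 1 : 0;    # 1

# Quoted with \Q...\E, every character in $pat stands for itself,
# while the \s* on either side is still a real regex.
my $quoted_hit  = ("  1+1=2  " =~ /^\s*\Q$pat\E\s*$/) ? 1 : 0;    # 1
my $quoted_miss = ("  111=2  " =~ /^\s*\Q$pat\E\s*$/) ? 1 : 0;    # 0

# For a purely literal search, index needs no pattern at all.
my $found_at = index("sum: 1+1=2", $pat);    # 5

print "raw:    $raw_literal $raw_other\n";
print "quoted: $quoted_hit $quoted_miss\n";
print "index:  $found_at\n";
```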

Letting the user supply a real pattern gives them power enough for many interesting and useful operations. This is a good thing. You just have to be slightly careful. Suppose they wanted to enter a case-insensitive pattern, but you didn't provide the program with an option like grep's -i option. By permitting full patterns, the user can enter an embedded /i modifier as (?i), as in /(?i)stuff/.
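As a brief illustration with made-up sample text, the same line is tested against a plain pattern and against one carrying the embedded modifier:

```perl
use strict;
use warnings;

my $line = "Plenty of STUFF here";    # made-up sample text

my $plain = 'stuff';                  # case-sensitive as typed
my $nocase = '(?i)stuff';             # user asks for case-insensitivity

my $plain_hit  = ($line =~ /$plain/)  ? 1 : 0;    # 0
my $nocase_hit = ($line =~ /$nocase/) ? 1 : 0;    # 1

print "plain:  $plain_hit\n";
print "nocase: $nocase_hit\n";
```

The program itself needed no option parsing for this; the modifier travels inside the pattern.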

What happens if the interpolated pattern expands to nothing? If $pat is the empty string, what does /$pat/ match? That is, what does a blank // match? It doesn't match the start of all possible strings. Surprisingly, matching the null pattern exhibits the dubiously useful semantics of reusing the previous successfully matched pattern. In practice, this is hard to make good use of in Perl.
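A minimal sketch of the surprise, with made-up strings. The (?:...) wrapper in the second test is not part of this recipe; it is one common way to keep an empty user string from collapsing into a blank //, shown here as an assumption about what you might want.

```perl
use strict;
use warnings;

# After this, the last successfully matched pattern is /b/.
"abc" =~ /b/ or die "setup match failed";

my $pat = "";    # e.g. the user just pressed Enter

# Empty after interpolation, so Perl reuses /b/; "xyz" has no b.
my $reused = ("xyz" =~ /$pat/) ? 1 : 0;

# Wrapped in (?:...), the pattern is no longer empty text,
# so it is a genuine empty match and succeeds anywhere.
my $grouped = ("xyz" =~ /(?:$pat)/) ? 1 : 0;

print "bare:    $reused\n";     # 0
print "grouped: $grouped\n";    # 1
```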

6.11.4 See Also

The eval function in perlfunc(1) and in Chapter 29 of Programming Perl; Recipe 10.12