Recipe 6.17 Matching Nested Patterns

6.17.1 Problem

You want to match a nested set of enclosing delimiters, such as the arguments to a function call.

6.17.2 Solution

Use match-time pattern interpolation, recursively:

```perl
my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens, without backtracking
      |
        (??{ $np })     # Group with matching parens
    )*
    \)
}x;
```

Or use the Text::Balanced module's extract_bracketed function.

6.17.3 Discussion

The (??{ CODE }) construct runs the code and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:

```perl
if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) {
    print "$word is a palindrome.\n";
}
```

Consider a word like "reviver", which this pattern correctly reports as a palindrome. The $1 variable contains "rev" partway through the match. The optional word character following catches the "i". Then the code reverse $1 runs and produces "ver", and that result is interpolated into the pattern.
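
To try the palindrome pattern on several words, you might wrap it in a small subroutine. This is only a sketch: the is_palindrome name is made up for illustration, and scalar is added in front of reverse to make the string-reversing context explicit.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the recipe): returns 1 if $word reads
# the same forward and backward, 0 otherwise.
sub is_palindrome {
    my ($word) = @_;
    # (\w+) grabs a candidate first half, \w? allows an odd middle
    # character, and (??{ ... }) demands the reversed first half next.
    return $word =~ /^(\w+)\w?(??{ scalar reverse $1 })$/ ? 1 : 0;
}

for my $word (qw(reviver noon perl)) {
    my $verdict = is_palindrome($word) ? "is" : "is not";
    print "$word $verdict a palindrome.\n";
}
```

The regex engine backtracks through shorter and shorter values of $1 until the reversed copy fits, so "reviver" and "noon" succeed and "perl" fails. A one-letter word does not match, because \w+ and its reversed copy each need at least one character.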

For matching something balanced, you need to recurse, which is a bit trickier. A compiled pattern that uses (??{ CODE }) can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of $np in that pattern, you could use it like this to match a function call:

```perl
$text = "myfunfun(1,(2*(3+4)),5)";
$funpat = qr/\w+$np/;   # $np as above
$text =~ /^$funpat$/;   # Matches!
```
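
If you also want the pieces of the call, you can put capture groups around the recursive pattern. The following is a minimal sketch rather than a full parser: split_call is a hypothetical helper name, and the character class is written as [^()] so that spaces inside the parentheses are accepted.

```perl
use strict;
use warnings;

# Balanced-parentheses pattern, in the style of the Solution.
my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # run of non-parenthesis characters
      |
        (??{ $np })     # nested, balanced group
    )*
    \)
}x;

# Hypothetical helper: split "name(args)" into the name and the text
# between the outermost parentheses. Returns an empty list when the
# string is not a call with balanced parentheses.
sub split_call {
    my ($text) = @_;
    my ($name, $parens) = $text =~ /^(\w+)($np)$/ or return;
    return ($name, substr($parens, 1, -1));   # drop the outer ( and )
}

my ($name, $args) = split_call("myfunfun(1,(2*(3+4)),5)");
print "name: $name\n";   # name: myfunfun
print "args: $args\n";   # args: 1,(2*(3+4)),5
```

Because $np is a qr// object, interpolating it into another pattern does not require the use re 'eval' pragma. An unbalanced string such as "f(1,(2)" makes split_call return an empty list.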

You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the trickier strings. For example:

```perl
use Regexp::Common;
$text = "myfunfun(1,(2*(3+4)),5)";
if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'()'})/o) {
    print "Got function call: $1\n";
}
```

Other patterns provided by that module match numbers in various notations and quote-delimited strings:

```perl
$RE{num}{int}
$RE{num}{real}
$RE{num}{real}{'-base=2'}{'-sep=,'}{'-group=3'}
$RE{quoted}
$RE{delimited}{-delim=>'/'}
```

The standard (as of v5.8) Text::Balanced module provides a general solution to this problem.

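```perl
use Text::Balanced qw/extract_bracketed/;
$text = "myfunfun(1,(2*(3+4)),5)";
# In list context the return order is (extracted, remainder, skipped prefix).
# The third argument is a prefix pattern to skip before the opening bracket.
($found, $after, $before) = extract_bracketed($text, "(", qr/\w+/);
if (defined $found) {
    print "Extracted: $found\n";
} else {
    print "FAILED\n";
}
```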
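
To pull every top-level parenthesized group out of a longer string, you can call extract_bracketed repeatedly on whatever it leaves behind. This is a sketch under a few assumptions: bracketed_groups is a hypothetical helper name, and the prefix pattern [^(]* is chosen here to skip any text before the next opening parenthesis. In list context, extract_bracketed returns the extracted text, the remainder, and the skipped prefix, in that order, and the extracted text is undef on failure.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Hypothetical helper: return each top-level (...) group found in $text,
# in order of appearance. Stops at the first failure to extract.
sub bracketed_groups {
    my ($text) = @_;
    my @groups;
    while (1) {
        # Skip everything up to the next "(", then extract a balanced group.
        my ($found, $rest) = extract_bracketed($text, '()', qr/[^(]*/);
        last unless defined $found;
        push @groups, $found;
        $text = $rest;            # continue with what is left over
    }
    return @groups;
}

print "$_\n" for bracketed_groups("f(1,(2)) + g((3),4)");
```

For the sample string, the helper returns the two groups (1,(2)) and ((3),4). Note that an unbalanced opening parenthesis ends the scan, so any groups after it are not reported.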

6.17.4 See Also

The section on "Match-time pattern interpolation" in Chapter 5 of Programming Perl; the documentation for the Regexp::Common CPAN module and the standard Text::Balanced module.

