Recipe 6.17 Matching Nested Patterns

6.17.1 Problem

You want to match a nested set of enclosing delimiters, such as the arguments to a function call.

6.17.2 Solution

Use match-time pattern interpolation, recursively:

```
my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # Non-parens, without backtracking
      |
        (??{ $np })     # Group with matching parens
    )*
    \)
}x;
```

Or use the Text::Balanced module's extract_bracketed function.

6.17.3 Discussion

The (??{ CODE }) construct runs the code at match time and interpolates the string that the code returns right back into the pattern. A simple, non-recursive example that matches palindromes demonstrates this:

```
if ($word =~ /^(\w+)\w?(??{reverse $1})$/ ) {
    print "$word is a palindrome.\n";
}
```

Consider a word like "reviver", which this pattern correctly reports as a palindrome. The $1 variable contains "rev" partway through the match. The optional word character following catches the "i". Then the code reverse $1 runs and produces "ver", and that result is interpolated into the pattern.
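To see the palindrome pattern at work on more than one word, here is a minimal sketch. The is_palindrome helper and the word list are illustrative assumptions, not part of the recipe, and scalar is written out explicitly so that reverse reverses the string whatever the calling context.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $word matches the palindrome pattern.
sub is_palindrome {
    my ($word) = @_;
    # (??{ ... }) runs at match time; its return value is matched as a
    # pattern. "scalar" forces string reversal rather than list reversal.
    return $word =~ /^(\w+)\w?(??{ scalar reverse $1 })$/ ? 1 : 0;
}

# Illustrative inputs only.
for my $word (qw(reviver noon level perl)) {
    printf "%-8s %s\n", $word,
        is_palindrome($word) ? "is a palindrome" : "is not a palindrome";
}
```

Because (\w+) must match at least one character and the reversed copy must follow it, this pattern needs at least two characters and does not treat a single letter as a palindrome.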

For matching something balanced, you need to recurse, which is a bit trickier. A compiled pattern that uses (??{ CODE }) can refer to itself. The pattern given in the Solution matches a set of nested parentheses, however deep they may go. Given the value of $np in that pattern, you could use it like this to match a function call:

```
$text = "myfunfun(1,(2*(3+4)),5)";
$funpat = qr/\w+$np/;   # $np as above
$text =~ /^$funpat$/;   # Matches!
```
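The following sketch exercises the recursive pattern against balanced and unbalanced input. The test strings and the %matched hash are assumptions made for illustration. The character class is written as [^()] with no space inside, because /x does not ignore whitespace within a bracketed class and a space there would reject arguments that contain spaces.

```perl
use strict;
use warnings;

my $np;
$np = qr{
    \(
    (?:
        (?> [^()]+ )    # run of non-paren characters, no backtracking
      |
        (??{ $np })     # a nested, balanced group (the pattern itself)
    )*
    \)
}x;

my $funpat = qr/\w+$np/;

# Illustrative inputs: the first three are balanced, the last two are not.
my @inputs = (
    "myfunfun(1,(2*(3+4)),5)",
    "f(a, (b c))",
    "f()",
    "f(1,(2)",
    "f(1))",
);

my %matched;
for my $text (@inputs) {
    $matched{$text} = $text =~ /^$funpat$/ ? 1 : 0;
    printf "%-26s %s\n", $text, $matched{$text} ? "matches" : "does not match";
}
```

The ^ and $ anchors matter here: without them, the pattern would happily match a balanced substring inside an otherwise unbalanced string.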

You'll find many CPAN modules that help with matching (parsing) nested strings. The Regexp::Common module supplies canned patterns that match many of the trickier strings. For example:

```
use Regexp::Common;
$text = "myfunfun(1,(2*(3+4)),5)";
if ($text =~ /(\w+\s*$RE{balanced}{-parens=>'( )'})/o) {
    print "Got function call: $1\n";
}
```

Other patterns provided by that module match numbers in various notations and quote-delimited strings:

```
$RE{num}{int}
$RE{num}{real}
$RE{num}{real}{'-base=2'}{'-sep=,'}{'-group=3'}
$RE{quoted}
$RE{delimited}{-delim=>'/'}
```

The standard (as of v5.8) Text::Balanced module provides a general solution to this problem.

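```
use Text::Balanced qw/extract_bracketed/;
$text = "myfunfun(1,(2*(3+4)),5)";
# In list context the function returns (extracted, remainder, skipped prefix).
# The third argument is a prefix pattern to skip before the opening bracket.
($found, $after, $before) = extract_bracketed($text, "(", '\w+');
if (defined $found) {
    print "Got arguments: $found\n";
} else {
    print "FAILED\n";
}
```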
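As a sketch of what extract_bracketed hands back in list context, the example below runs it on one balanced and one unbalanced string. The variable names and input strings are illustrative assumptions. The behavior relied on is that documented for Text::Balanced: the extracted text, the remainder, and the skipped prefix are returned in that order, with undef as the first element on failure.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

# Balanced input: skip the function name via the prefix pattern.
my $text = "myfunfun(1,(2*(3+4)),5) + 1";
my ($args, $rest, $skipped) = extract_bracketed($text, "(", '\w+');

print "prefix:    $skipped\n";   # text matched by the prefix pattern
print "extracted: $args\n";      # bracketed text, delimiters included
print "remainder: $rest\n";      # whatever follows the closing bracket

# Unbalanced input: the first returned element is undef.
my $broken = "(1,(2";
my ($bad) = extract_bracketed($broken, "(");
print "unbalanced input: ", (defined $bad ? "extracted $bad" : "FAILED"), "\n";
```

Testing the first element with defined is more reliable than testing the list assignment itself, because a list assignment in Boolean context counts the elements returned, and the function returns a three-element list even on failure.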

6.17.4 See Also

The section on "Match-time pattern interpolation" in Chapter 5 of Programming Perl; the documentation for the Regexp::Common CPAN module and the standard Text::Balanced module
