Recipe 6.5 Finding the Nth Occurrence of a Match

6.5.1 Problem

You want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish":

One fish two fish red fish blue fish

6.5.2 Solution

Use the /g modifier in a while loop, keeping count of matches:

$WANT = 3;
$count = 0;
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}
The third fish is a red one.

Or use a repetition count and repeated pattern like this:
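A minimal sketch of the repeated-pattern form, assuming the pond string from the Problem section; the names $pond and $color are illustrative only. The non-capturing group with a {2} count skips the first two "WORD fish" pairs, so $1 holds the word before the third. It assumes the fish follow one another with a single word in between, as they do in this string.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish';
my $color;

# Skip two "WORD fish" pairs, then capture the word before the third fish.
if ($pond =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i) {
    $color = $1;
    print "The third fish is a $color one.\n";
}
```

To target a different N with this form, the repetition count would be N-1.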


6.5.3 Discussion

As explained in this chapter's Introduction, using the /g modifier in scalar context creates something of a progressive match, useful in while loops. This is commonly used to count the number of times a pattern matches in a string:

# simple way with while loop
$count = 0;
while ($string =~ /PAT/g) {
    $count++;               # or whatever you'd like to do here
}

# same thing with trailing while
$count = 0;
$count++ while $string =~ /PAT/g;

# or with for loop
for ($count = 0; $string =~ /PAT/g; $count++) { }

# Similar, but this time count overlapping matches
$count++ while $string =~ /(?=PAT)/g;
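The difference between the plain and the overlapping count is easiest to see on a small input. The sketch below uses a made-up string 'aaaa' and the pattern aa; both are assumptions chosen for illustration.

```perl
use strict;
use warnings;

my $string = 'aaaa';
my ($plain, $overlap) = (0, 0);

# Each successful match consumes "aa", so the next search starts after it.
$plain++   while $string =~ /aa/g;

# The lookahead consumes nothing, so every starting position is tried.
$overlap++ while $string =~ /(?=aa)/g;

print "non-overlapping: $plain, overlapping: $overlap\n";
```

Here the plain match finds "aa" at offsets 0 and 2, while the lookahead version also succeeds at offset 1.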

To find the Nth match, it's easiest to keep your own counter. When you reach the appropriate N, do whatever you care to. A similar technique could be used to find every Nth match by checking for multiples of N using the modulus operator. For example, (++$count % 3) == 0 would be used to find every third match.
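As a rough sketch of the every-Nth-match idea, the loop below collects every third fish. The longer pond string and the @every_third array are assumptions added so that more than one multiple of 3 occurs.

```perl
use strict;
use warnings;

my $pond  = 'One fish two fish red fish blue fish old fish new fish';
my $count = 0;
my @every_third;

while ($pond =~ /(\w+)\s+fish\b/gi) {
    # ++$count is 1 on the first match, so this fires on matches 3, 6, ...
    push @every_third, $1 if ++$count % 3 == 0;
}

print "Every third fish: @every_third\n";
```

Pre-incrementing keeps the counter 1-based, which matches the way people usually count occurrences.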

If this is too much bother, you can always extract all matches and then hunt for the ones you'd like.

$pond  = 'One fish two fish red fish blue fish';

# using a temporary
@colors = ($pond =~ /(\w+)\s+fish\b/gi);      # get all matches
$color  = $colors[2];                         # then the one we want

# or without a temporary array
$color = ( $pond =~ /(\w+)\s+fish\b/gi )[2];  # just grab the third element

print "The third fish in the pond is $color.\n";
The third fish in the pond is red.

To find all even-numbered fish:

$count = 0;
$_ = 'One fish two fish red fish blue fish';
@evens = grep { ++$count % 2 == 0 } /(\w+)\s+fish\b/gi;
print "Even numbered fish are @evens.\n";
Even numbered fish are two blue.

For substitution, the replacement value should be a code expression that returns the proper string. Make sure to return the original as a replacement string for cases you aren't interested in changing. Here we fish out the fourth specimen and turn it into a snack:

$count = 0;
s{
   \b               # makes next \w more efficient
   ( \w+ )          # this is what we'll be changing
   (
     \s+ fish \b
   )
}{
    if (++$count == 4) {
        "sushi" . $2;
    } else {
         $1   . $2;
    }
}gex;
One fish two fish red fish sushi fish

Picking out the last match instead of the first one is a fairly common task. The easiest way is to skip the beginning part greedily. After /.*\b(\w+)\s+fish\b/s, for example, the $1 variable has the last fish.
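A short sketch of the greedy-skip approach, assuming the pond string used in the later examples of this recipe; the variable $last is illustrative.

```perl
use strict;
use warnings;

my $pond = 'One fish two fish red fish blue fish swim here.';
my $last;

# .* grabs as much as it can, then backtracks only until the rest can match,
# so the capture lands on the final "WORD fish" pair.
if ($pond =~ /.*\b(\w+)\s+fish\b/s) {
    $last = $1;
    print "Last fish is $last.\n";
}
```

The \b before the capture matters here: without it, backtracking could stop partway through a word and capture only its tail.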

Another way to get arbitrary counts is to make a global match in list context to produce all hits, then extract the desired element of that list:

$pond = 'One fish two fish red fish blue fish swim here.';
$color = ( $pond =~ /\b(\w+)\s+fish\b/gi )[-1];
print "Last fish is $color.\n";
Last fish is blue.

To express this same notion of finding the last match in a single pattern without /g, use the negative lookahead assertion (?!THING). When you want the last match of arbitrary pattern P, you find P that is not followed by any further P anywhere through the end of the string. The general construct is P(?!.*P), which can be broken up for legibility:

m{
    P               # find some pattern P
    (?!             # mustn't be able to find
        .*          # something
        P           # and P
    )
}xs

That leaves us with this approach for selecting the last fish:

$pond = 'One fish two fish red fish blue fish swim here.';
if ($pond =~ m{
                \b  (  \w+) \s+ fish \b
                (?! .* \b fish \b )
            }six )
{
    print "Last fish is $1.\n";
} else {
    print "Failed!\n";
}
Last fish is blue.

This approach has the advantage that it can fit in just one pattern, which makes it suitable for similar situations as shown in Recipe 6.18. It has its disadvantages, though. It's obviously much harder to read and understand, although once you learn the formula, it's not too bad. It also runs more slowly, around half as fast on the data set tested here.

6.5.4 See Also

The behavior of m//g in scalar context is given in the "Regexp Quote-like Operators" section of perlop(1), and in the "Pattern Matching Operators" section of Chapter 5 of Programming Perl; zero-width positive lookahead assertions are shown in the "Regular Expressions" section of perlre(1), and in the "Fancy Patterns" section of Chapter 5 of Programming Perl.