Recipe 4.9 Computing Union, Intersection, or Difference of Unique Lists

4.9.1 Problem

You have a pair of lists, each holding unduplicated items. You'd like to find out which items are in both lists (intersection), one but not the other (difference), or either (union).

4.9.2 Solution

The following solutions need the listed initializations:

```@a = (1, 3, 5, 6, 7, 8);
@b = (2, 3, 5, 7, 9);

@union = @isect = @diff = ( );
%union = %isect = ( );
%count = ( );```
4.9.2.1 Simple solution for union and intersection
```foreach \$e (@a) { \$union{\$e} = 1 }

foreach \$e (@b) {
if ( \$union{\$e} ) { \$isect{\$e} = 1 }
\$union{\$e} = 1;
}
@union = keys %union;
@isect = keys %isect;```
4.9.2.2 More idiomatic version
```foreach \$e (@a, @b) { \$union{\$e}++ && \$isect{\$e}++ }

@union = keys %union;
@isect = keys %isect;```
4.9.2.3 Union, intersection, and symmetric difference
```foreach \$e (@a, @b) { \$count{\$e}++ }

@union = keys %count;
foreach \$e (keys %count) {
if (\$count{\$e} =  = 2) {
push @isect, \$e;
} else {
push @diff, \$e;
}
}```
4.9.2.4 Indirect solution
```@isect = @diff = @union = ( );

foreach \$e (@a, @b) { \$count{\$e}++ }

@union = keys %count;
foreach \$e (keys %count) {
push @{ \$count{\$e} =  = 2 ? \@isect : \@diff }, \$e;
}```

4.9.3 Discussion

The first solution most directly computes the union and intersection of two lists, neither containing duplicates. Two hashes are used to record whether a particular item goes in the union or the intersection. We put every element of the first array in the union hash, giving it a true value. Then, processing each element of the second array, we check whether that element is already present in the union. If it is, we put it in the intersection as well. In any event, it goes into the union. When we're done, we extract the keys of both the union and intersection hashes. The values aren't needed.

The second solution (Recipe 4.8.2.2) is essentially the same but relies on familiarity with the Perl (and awk, C, C++, and Java) ++ and && operators. By placing the ++ after the variable, we first look at its old value before incrementing it. The first time through it won't be in the union, which makes the first part of the && false, so the second part is consequently ignored. The second time that we encounter the same element, it's already present in the union, so we put it in the intersection.

The third solution uses just one hash to track how many times each element is seen. Once both arrays have their elements recorded in the hash, we grab those keys and put them in the union. Then we process those hash keys one at a time. Keys whose values are 2 were in both arrays, so they go in the intersection array. Keys whose values are 1 were in just one of the two arrays, so they go in the difference array. Elements of the output arrays are not in the same order as those in the input arrays.

The last solution, like the previous one, uses just one hash to count how many times each element is encountered. Here, though, we dynamically select one of two possible arrays by placing within the @{...} array-dereferencing block an expression whose evaluation yields a reference to whichever array is demanded by the situation.

In this recipe we compute the symmetric difference, not the simple difference. These are terms from set theory. A symmetric difference is the set of all elements that are members of either @A or @B, but not both. A simple difference is the set of members of @A but not @B, which we calculated in Recipe 4.8.

The "Hashes" section of Chapter 2 of Programming Perl; Chapter 5; we use hashes in a similar fashion in Recipe 4.7 and Recipe 4.8  Chapter 1. Strings  Chapter 2. Numbers  Chapter 3. Dates and Times  Chapter 4. Arrays  Introduction  Recipe 4.1 Specifying a List in Your Program  Recipe 4.2 Printing a List with Commas  Recipe 4.3 Changing Array Size  Recipe 4.4 Implementing a Sparse Array  Recipe 4.5 Iterating Over an Array  Recipe 4.6 Iterating Over an Array by Reference  Recipe 4.7 Extracting Unique Elements from a List  Recipe 4.8 Finding Elements in One Array but Not Another  Recipe 4.9 Computing Union, Intersection, or Difference of Unique Lists  Recipe 4.10 Appending One Array to Another  Recipe 4.11 Reversing an Array  Recipe 4.12 Processing Multiple Elements of an Array  Recipe 4.13 Finding the First List Element That Passes a Test  Recipe 4.14 Finding All Elements in an Array Matching Certain Criteria  Recipe 4.15 Sorting an Array Numerically  Recipe 4.16 Sorting a List by Computable Field  Recipe 4.17 Implementing a Circular List  Recipe 4.18 Randomizing an Array  Recipe 4.19 Program: words  Recipe 4.20 Program: permute  Chapter 5. Hashes  Chapter 6. Pattern Matching  Chapter 7. File Access  Chapter 8. File Contents  Chapter 9. Directories  Chapter 10. Subroutines  Chapter 11. References and Records  Chapter 12. Packages, Libraries, and Modules  Chapter 13. Classes, Objects, and Ties  Chapter 14. Database Access  Chapter 15. Interactivity  Chapter 16. Process Management and Communication  Chapter 17. Sockets  Chapter 18. Internet Services  Chapter 19. CGI Programming  Chapter 20. Web Automation  Chapter 21. mod_perl  Chapter 22. XML