Hack 33 Sort Your Recommendations by Average Customer Rating


Find the highest rated items among your Amazon product recommendations.

If you've taken the time to fine-tune your recommendations [Hack #14], you know how precise they can be. If you've also looked at the star rating for some of your favorite products, then you know that the rating can be a good indication of quality. Both the Amazon recommendation and the customer rating add important information to a product, and can help you make a decision about whether to buy one item over another.

To get a feel for the products Amazon recommends for you, you can visit your book recommendations at any time at the following URL:

http://www.amazon.com/o/tg/stores/recs/instant-recs/-/books/0/

In addition to books, you can also find recommendations in other product categories. You can replace books in the URL with any of Amazon's catalogs, including music, electronics, dvd, and photo.

When you browse to your recommendations, you'll likely find several pages of items. Wouldn't it be great if you could add the customer review dimension by sorting the entire list by its average star rating? This hack does exactly that with a bit of screen scraping.

33.1 The Code

Because Amazon doesn't offer sorting by customer rating, this script first gathers all of your Amazon book recommendations into one list. Using the Amazon account email address and password you provide, the script logs in as you and then requests the book recommendations page. It continues to request pages in a loop, picking out the details of each recommended product with regular expressions. Once all the products and their details are stored in arrays and hashes, they can be sorted and printed in any order you want; in this case, by average star rating, highest first.
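The sorting step described above can be tried in isolation. The following is a minimal sketch, assuming the ASINs and ratings have already been collected into two parallel arrays; the values shown are placeholders, not real Amazon results.

```perl
use strict;
use warnings;

# Placeholder data for illustration only (not real Amazon results).
my @asins   = ('ASIN-A', 'ASIN-B', 'ASIN-C');
my @ratings = (3.5, 5, 4);

# Sort the index positions rather than the values, so that each
# rating stays paired with the ASIN stored at the same position.
my @order = sort { $ratings[$b] <=> $ratings[$a] } 0 .. $#ratings;

# Print the items, highest rating first.
print "$asins[$_]: $ratings[$_] stars\n" for @order;
```

Sorting the indexes instead of the ratings themselves is what lets one array drive the output order of the other.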

Be sure to insert your Amazon account email address and password in the proper places below. You'll also need write permission in the script's directory so that the script can store Amazon cookies in a text file, cookies.lwp.

#!/usr/bin/perl
# get_recommendations.pl
#
# A script to log on to Amazon, retrieve
# recommendations, and sort by highest rating.
# Usage: perl get_recommendations.pl

use warnings;
use strict;
use HTTP::Cookies;
use LWP::UserAgent;

# Amazon email and password.
my $email = 'insert Amazon account email';
my $password = 'insert Amazon account password';

# Amazon login URL for normal users.
my $logurl = "http://www.amazon.com/exec/obidos/flex-sign-in-done/";

# Now log in to Amazon.
my $ua = LWP::UserAgent->new;
$ua->agent("(compatible; MSIE 4.01; MSN 2.5; AOL 4.0; Windows 98)");
$ua->cookie_jar( HTTP::Cookies->new('file'     => 'cookies.lwp',
                                    'autosave' => 1));
my %headers = ( 'content-type' => "application/x-www-form-urlencoded" );
$ua->post($logurl, [ email       => $email,
          password    => $password,
          method      => 'get', 
          opt         => 'oa',
          page        => 'recs/instant-recs-sign-in-standard.html',
          response  => 'tg/recs/recs-post-login-dispatch/-/recs/pd_rw_gw_r',
          'next-page' => 'recs/instant-recs-register-standard.html',
          action      => 'sign-in' ], %headers);

# Set some variables to hold
# our sorted recommendations.
my (%title_list, %author_list);
my (@asins, @ratings, $done);

# We're logged in, so request the recommendations.
my $recurl = "http://www.amazon.com/exec/obidos/tg/". 
             "stores/recs/instant-recs/-/books/0/t";

# Collect the ASINs and ratings in arrays, and the
# titles and authors in hashes keyed by ASIN.
until ($done) {

     # send the request for the recommendations
     my $content = $ua->get($recurl)->content;
     #print $content;

     # loop through the HTML looking for matches.
     # captures, in order: ASIN, title, author, rating.
     while ($content =~ m!<td colspan=2 width=100%>.*?detail/-/(.*?)/ref.*?<b>(.*?)</b>.*?by (.*?)\n.*?Average Customer Review&#58;.*?(.*?)out of 5 stars.*?<td colspan=3><hr noshade size=1></td>!mgis) {
         my ($asin,$title,$author,$rating) = ($1||'',$2||'',$3||'',$4||'');
         $title  =~ s!<.+?>!!g;          # drop HTML tags.
         $rating =~ s!\n!!g;             # remove newlines.
         $rating =~ s! !!g;              # remove spaces.
         $title_list{$asin} = $title;    # store the title.
         $author_list{$asin} = $author;  # and the author.
         push (@asins, $asin);           # and the ASINs.
         push (@ratings, $rating);       # and the rating.
     }

     # see if there are more results... if so continue the loop
     if ($content =~ m!<a href=(.*?instant-recs.*?)>more results.*?</a>!i) {
        $recurl = "http://www.amazon.com$1";# reassign the URL.
     } else { $done = 1; } # nope, we're done.
}

# sort the results by highest star rating and print!
for (reverse sort { $ratings[$a] <=> $ratings[$b] } 0..$#ratings) {
    next unless $asins[$_]; # skip blanks.
    print "$title_list{$asins[$_]}  ($asins[$_])\n" . 
          "by $author_list{$asins[$_]} \n" .
          "$ratings[$_] stars.\n\n";
}
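The final sort compares ratings numerically, so a captured rating that contains anything other than a number would trigger warnings under `use warnings` and sort unpredictably. One way to guard against that is sketched below. The helper name `clean_rating` is hypothetical and is not part of the original script; it simply pulls the first number out of whatever text was captured.

```perl
use strict;
use warnings;

# Hypothetical helper (not in the original script): return the first
# number found in a captured rating string, or undef if there is none.
sub clean_rating {
    my ($raw) = @_;
    return unless defined($raw) && $raw =~ /(\d+(?:\.\d+)?)/;
    return $1 + 0;    # force numeric context
}

# Try it on a few sample strings.
for my $raw ("4.5", " 4\n", "rated 3.5", "") {
    my $rating = clean_rating($raw);
    print defined $rating ? "$rating\n" : "no rating\n";
}
```

In the script, a call such as `$rating = clean_rating($rating)` could stand in for the two substitutions that strip newlines and spaces, with items lacking a rating skipped or treated as zero before sorting.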

33.2 Running the Hack

Run the hack from the command line and send the results to another file, like this:

perl get_recommendations.pl > top_rated_recommendations.txt

The text file top_rated_recommendations.txt should be filled with product recommendations, with the highest rated items on top. You can tweak the URL in $recurl to look for DVDs, CDs, or other product types by changing books to the product line you're interested in.
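Rather than editing the script each time, the product line could be taken from the command line. This is a minimal sketch under a few assumptions: the helper name `recs_url_for` is hypothetical, the list of catalogs is limited to the names mentioned in this hack, and the URL pattern is assumed to work for those catalogs the same way it does for books.

```perl
use strict;
use warnings;

# Catalog names mentioned in this hack; Amazon may offer others.
my %known = map { $_ => 1 } qw(books music electronics dvd photo);

# Hypothetical helper: build the starting recommendations URL
# for a given catalog, following the pattern used in $recurl.
sub recs_url_for {
    my ($catalog) = @_;
    die "Unknown catalog: $catalog\n" unless $known{$catalog};
    return "http://www.amazon.com/exec/obidos/tg/"
         . "stores/recs/instant-recs/-/$catalog/0/t";
}

# Use the first command-line argument, defaulting to books.
my $catalog = @ARGV ? shift @ARGV : 'books';
my $recurl  = recs_url_for($catalog);
print "$recurl\n";
```

With a change like this in place, the script could be run as `perl get_recommendations.pl dvd > top_rated_dvds.txt`.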