Hack 44 Get Purchase Circle Products with Screen Scraping

Purchase Circles provide a unique look at sales patterns. You can access them programmatically only with screen scraping.

Amazon's purchase circles are specialized bestseller lists broken down by geography or organization. If you visit the Friends & Favorites page, choose "Purchase Circles" from the drop-down list, and type the name of your city, chances are you'll find what's uniquely popular among your fellow residents. Amazon also lists what's popular at universities and large corporations. If everyone at Microsoft is reading about a certain technology, you may find it in the next version of Windows!

44.1 Finding Purchase Circle IDs

In fact, you can link directly to the Microsoft Corporation purchase circle:

http://www.amazon.com/exec/obidos/tg/cm/browse-communities/-/211569/

The six-digit code at the end of the URL is the Purchase Circle ID for Microsoft. Every purchase circle has a unique ID. You can find IDs by noting them from URLs as you browse circles. The purchase circles home page (http://www.amazon.com/exec/obidos/subst/community/community.html) is a good place to start.

Once you know an ID, you can link to that circle directly by substituting the ID into the URL format shown above. You can also write scripts to access the page and retrieve a list of items.
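As a small illustration of noting IDs from URLs, the following sketch pulls the ID out of a purchase circle URL. The helper name circle_id_from_url is hypothetical, and the pattern assumes the URL follows the browse-communities format of the Microsoft link above.

```perl
#!/usr/bin/perl
# Sketch: extract a Purchase Circle ID from a purchase circle URL.
# circle_id_from_url is a hypothetical helper, not part of the hack's script.
use strict;
use warnings;

sub circle_id_from_url {
    my ($url) = @_;
    # Assumes the ID is the run of digits right after "browse-communities/-/"
    return $url =~ m!browse-communities/-/(\d+)! ? $1 : undef;
}

my $url = "http://www.amazon.com/exec/obidos/tg/cm/browse-communities/-/211569/";
my $id  = circle_id_from_url($url);
print defined $id ? "Purchase Circle ID: $id\n" : "No ID found\n";
```

The function returns undef when the URL does not look like a purchase circle link, so a caller can skip unrelated URLs while browsing.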

44.2 The Code

This script takes a Purchase Circle ID and returns the books listed. Create a file called get_circle.pl and add the following code:

#!/usr/bin/perl
# get_circle.pl
# A script to scrape Amazon to retrieve purchase circle products
# Usage: perl get_circle.pl <circleID>

use strict;
use LWP::Simple;

# Take the Purchase Circle ID from the command line
my $circleID = shift @ARGV or die "Usage: perl get_circle.pl <circleID>\n";

# Assemble the URL
my $url = "http://amazon.com/o/tg/cm/browse-communities/-/" .
          $circleID . "/t/";

#Request the URL
my $content = get($url);
die "Could not retrieve $url" unless $content;

my $circle = (join '', $content);

while ($circle =~ m!<title>(.*?)</title>!mgis) {
    print $1 . "\n\n";
}

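# Match each product cell: ASIN from the link, then title and author.
# The pattern must stay on a single line.
while ($circle =~ m!<td.*?<b><a.*?-/(.*?)[?/].*?>(.*?)</a></b>.*?by\s*(.*?)<br>.*?</td>!mgis) {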
    my($asin,$title,$author) = ($1||'',$2||'',$3||'');
    #Print the results
    print $title . "\n" .
          "by " . $author . "\n" .  
          "ASIN: " . $asin .
          "\n\n";
}

One thing to note about this code is that it passes the /t/ URL argument to return a text-only version of the purchase circle page. Text-only pages have less HTML, which means that fewer bytes are flying around and it's generally easier to scrape for information.
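To see what the product pattern captures without contacting Amazon, here is a minimal sketch that runs a pattern modeled on the script's against a made-up HTML fragment. The markup, the placeholder ASIN, and the parse_products helper are all assumptions for illustration, and the \s* before the author capture is added here only to trim whitespace. Amazon's actual text-only pages may be structured differently.

```perl
#!/usr/bin/perl
# Sketch: run the product pattern against a made-up HTML fragment.
# The markup below is hypothetical; Amazon's real pages may differ.
use strict;
use warnings;

# parse_products is a hypothetical helper that returns one hash per match
sub parse_products {
    my ($html) = @_;
    my @products;
    while ($html =~ m!<td.*?<b><a.*?-/(.*?)[?/].*?>(.*?)</a></b>.*?by\s*(.*?)<br>.*?</td>!gis) {
        push @products, { asin => $1, title => $2, author => $3 };
    }
    return @products;
}

# Placeholder ASIN, title, and author; not real catalog data
my $sample = <<'HTML';
<table><tr>
<td><b><a href="/exec/obidos/tg/detail/-/0000000000/t/">Example Title</a></b><br>
by Example Author<br></td>
</tr></table>
HTML

for my $p (parse_products($sample)) {
    print "$p->{title}\nby $p->{author}\nASIN: $p->{asin}\n\n";
}
```

Testing against a saved fragment like this makes it easier to adjust the pattern when the page layout changes, which is the usual failure mode for screen scraping.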

44.3 Running the Hack

You can run this hack from the command line, passing a Purchase Circle ID as the only argument. For example, using the Microsoft ID from the URL earlier in this hack:

perl get_circle.pl 211569

44.4 Hacking the Hack

This script returns popular books for a given circle, but there's no reason you can't also get lists of the most popular music or movies for a circle. Add a catalog after the Purchase Circle ID to find what you're looking for. Here are the possible catalogs:

music
dvd
video
toy
ce (electronics)

So, for example, to link directly to DVDs that are popular in Sebastopol, CA, find the Purchase Circle ID, and add /dvd/ to the URL:

http://amazon.com/exec/obidos/tg/cm/browse-communities/-/216435/dvd/

If you'd like to keep it text-only as in the script, the /t/ follows the catalog:

http://amazon.com/exec/obidos/tg/cm/browse-communities/-/216435/dvd/t/
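The URL rules above can be sketched as a small helper that assembles the address from its parts. The name circle_url and its argument order are hypothetical; the catalog names come from the list above.

```perl
#!/usr/bin/perl
# Sketch: build a purchase circle URL with an optional catalog and text-only flag.
# circle_url is a hypothetical helper name.
use strict;
use warnings;

# Catalog names taken from the list in this hack
my %CATALOGS = map { $_ => 1 } qw(music dvd video toy ce);

sub circle_url {
    my ($circle_id, $catalog, $text_only) = @_;
    my $url = "http://amazon.com/exec/obidos/tg/cm/browse-communities/-/$circle_id/";
    if (defined $catalog && length $catalog) {
        die "Unknown catalog: $catalog\n" unless $CATALOGS{$catalog};
        $url .= "$catalog/";
    }
    $url .= "t/" if $text_only;   # /t/ always follows the catalog
    return $url;
}

print circle_url(216435, "dvd", 1), "\n";   # text-only DVD list
print circle_url(216435), "\n";             # default book list
```

Omitting the catalog gives the default book list, so the same helper could feed the script's URL for any catalog.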
