Hack 81 Loop Around the 10-Result Limit


When 10 products just aren't enough, it's time to get loopy.

Amazon returns 10 results per request, but there are times you want much more data than that. Along with each response, you'll find the TotalResults and TotalPages values. By using this information and the page variable with subsequent requests, you can make several queries and combine the results.

81.1 The Code

This code builds on the previous XML/HTTP Perl example [Hack #80] but wraps the URL assembly, request, and response handling in a loop. The page value is incremented each time through the loop until it reaches the value in TotalPages. Create the file amazon_http_loopy.pl with this code:

# amazon_http_loopy.pl
# A simple XML request/parse script that loops through all pages of
# a keyword search.
# Usage: perl amazon_http_loopy.pl <keyword> 

#Your Amazon developer's token
my $dev_key='insert developer token ';

#Your Amazon affiliate code
my $af_tag='insert associate tag ';

#Take the query from the command-line
my $keyword =shift @ARGV or die "Usage:perl amazon_http_loopy.pl <keyword>\n";

use strict;

#Use the XML::Simple and LWP::Simple Perl modules
use XML::Simple;
use LWP::Simple;

my $totalPages = 1; 

#The loop starts here
for (my $thisPage = 1; $thisPage <= $totalPages; $thisPage++) { 

    #Assemble the URL
    my $url = "http://xml.amazon.com/onca/xml3?t=" . $af_tag . 
        "&dev-t=" . $dev_key .
        "&type=lite&f=xml&mode=books&" .
        "KeywordSearch=" . $keyword . 
        "&page=" . $thisPage ;
    my $content = get($url);
    die "Could not retrieve $url" unless $content;

    my $xmlsimple = XML::Simple->new('forcearray' => 1);
    my $response = $xmlsimple->XMLin($content);

    $totalPages = $response->{TotalPages}[0]; 

    foreach my $result (@{$response->{Details}}) {
      #Print out the main bits of each result
      print join "\n",
        $result->{ProductName}[0]||"no title",
      "ASIN: " . $result->{Asin}[0] . ", " .
      $result->{OurPrice}[0] . "\n\n";
    }

    #Wait 1 second before making another request
    sleep 1; 
}

The sleep function at the end of the loop keeps this code compliant with the AWS terms of service. Amazon asks that you make only one request per second.
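A fixed one-second sleep is the simplest way to stay within the limit, but it also adds a full second after requests that were already slow. As a minimal sketch, assuming you only need requests to start at least one second apart, you could time each pass and sleep for the remainder. The remaining_wait helper is hypothetical and not part of the original script; Time::HiRes ships with Perl.

```perl
use strict;
use warnings;
use Time::HiRes qw(time sleep);

# Hypothetical helper: how many seconds are still owed so that two
# requests start at least $minGap seconds apart.
sub remaining_wait {
    my ($lastStart, $now, $minGap) = @_;
    my $wait = $minGap - ($now - $lastStart);
    return $wait > 0 ? $wait : 0;
}

my $lastStart = time;

# ... the request and XML parsing for one page would go here ...

# Sleep only for whatever is left of the one-second window
my $wait = remaining_wait($lastStart, time, 1);
sleep $wait if $wait > 0;
```

Inside the page loop, you would record $lastStart just before calling get and run the last two lines where sleep 1 sits now.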

81.2 Running the Hack

Just run the script on the command line like this:

perl amazon_http_loopy.pl hacks

For this particular query, you should get around 22 pages of results returned in a continuous loop.

81.3 Hacking the Hack

You can do more than just loop through all the pages of results. If you'd rather return an arbitrary maximum number of results, set a variable somewhere outside the loop:

my $maxNumber = 30;

Then, at the top of the loop, before the URL is assembled, stop the query once that maximum number has been reached. Perl's last command exits the loop:

last unless (($thisPage * 10) <= $maxNumber);

We know that there are 10 results per page, so we just multiply the value of $thisPage by 10 to see how many items have been looped through. This works only for multiples of 10, but you could set a counter inside the XML loop for a more fine-grained approach.
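The counter approach might look something like the following sketch. It runs against mock data rather than live responses, so the @pages list and the take_results helper are stand-ins invented for illustration; each mock page is shaped like the $response->{Details} list in the main script.

```perl
use strict;
use warnings;

# Hypothetical helper: walk pages of results and stop at exactly $max items.
# Each page is an array ref of result hashes.
sub take_results {
    my ($pages, $max) = @_;
    my $count = 0;
    my @kept;
    PAGE: foreach my $details (@$pages) {
        foreach my $result (@$details) {
            # Leave both loops as soon as the limit is reached
            last PAGE if $count >= $max;
            push @kept, $result->{ProductName}[0] || "no title";
            $count++;
        }
    }
    return @kept;
}

# Mock data standing in for four pages of 10 results each
my @pages;
foreach my $page (1 .. 4) {
    push @pages, [ map { +{ ProductName => ["Book $page-$_"] } } 1 .. 10 ];
}

my @kept = take_results(\@pages, 25);
print scalar(@kept), " results kept\n";
```

In the real script, the PAGE label would go on the outer for loop, so that last PAGE also prevents any further requests.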

By adding a sort variable and a line of code to exit the loop when your criteria are met, you can get just about any results you need, from a maximum sales rank to a minimum average user rating.
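Here is a rough sketch of the sales rank case, again using mock data. It assumes the results arrive sorted by sales rank, best first, and that each result carries a SalesRank element; the element name, the comma-formatted values, and the within_rank helper are assumptions made for illustration, so check them against the responses you actually receive.

```perl
use strict;
use warnings;

# Hypothetical helper: keep results until one exceeds $maxRank.
# Assumes the list is already sorted by sales rank, best first.
sub within_rank {
    my ($details, $maxRank) = @_;
    my @kept;
    foreach my $result (@$details) {
        my $rank = $result->{SalesRank}[0];
        next unless defined $rank;   # skip items with no rank
        $rank =~ s/,//g;             # defensively drop thousands separators
        last if $rank > $maxRank;    # sorted input: nothing later qualifies
        push @kept, $result->{ProductName}[0];
    }
    return @kept;
}

# Mock results standing in for one parsed page
my @details = (
    { ProductName => ["First"],  SalesRank => ["120"] },
    { ProductName => ["Second"], SalesRank => ["4,310"] },
    { ProductName => ["Third"] },
    { ProductName => ["Fourth"], SalesRank => ["9,800"] },
    { ProductName => ["Fifth"],  SalesRank => ["15,002"] },
);

print join(", ", within_rank(\@details, 10000)), "\n";
```

Because the input is sorted, the first result past the threshold means every later one is past it too, which is what makes exiting early safe. In the full script you would also want a labeled last on the page loop so that no further pages are requested.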