Hack 17 Create a Search Robot


Use the WWW::Search::eBay Perl module to perform your searches for you.

A collector in search of a particular item or type of item may repeat the same search, often several times a week. A serious collector, knowing that items sometimes sell within hours of being listed (see [Hack #26]), may repeat a search for an item several times a day. But who has the time?

The Favorites tab of the My eBay page, which allows you to keep track of up to 100 favorite searches (see [Hack #16]), also has a feature to email you when new items matching your search criteria appear on the site. Just click the Preferences link next to the search caption, and then turn on the "Email me daily whenever there are new items" option.

Unfortunately, eBay's new-item notification feature will send you notifications no more than once a day, and in that time, any number of juicy auctions could've started and ended. So I created this hack to do my searches for me, and do them as often as I see fit.

2.10.1 Constructing the Robot

By "scraping" eBay search results with the WWW::Search::eBay Perl module (developed by Martin Thurn), any Perl program can retrieve search results from eBay and manipulate them any way you want. You can download the module for free from search.cpan.org/perldoc?WWW::Search::eBay and install it on any computer that has Perl. See Installing Perl Modules for installation details.

Installing Perl Modules

(Adapted from Google Hacks by Tara Calishain and Rael Dornfest)

A few hacks in this book make use of add-on Perl modules, useful for turning dozens of lines of messy code into a couple of concise commands. If your Perl script resides on a server maintained by someone else (typically an ISP administrator), you'll have to request that they install the module before you can reference it in your scripts. If you're the administrator, you'll have to install it yourself.

Installing on Unix and Mac OS X:

Assuming you have the CPAN module, have root access, and are connected to the Internet, installation should be no more complicated than:

% su
% perl -MCPAN -e shell
cpan> install WWW::Search::Ebay

Note that capitalization counts; copy-and-paste the module name for an exact match. If the install fails, you can try forcing an installation by typing:

cpan> force install WWW::Search::Ebay

Go grab yourself a cup of coffee, meander the garden, read the paper, and check back once in a while. Your terminal's sure to be riddled with incomprehensible gobbledegook that you can, for the most part, summarily ignore. You may be asked a question or three; in most cases, simply hitting Return to accept the default answer will do the trick.

Windows installation via PPM:

If you're running Perl under Windows, chances are it's ActiveState's ActivePerl (www.activestate.com/Products/ActivePerl/). Thankfully, ActivePerl is outfitted with a CPAN-like module installation utility. The Programmer's Package Manager (PPM, aspn.activestate.com/ASPN/Downloads/ActivePerl/PPM/) grabs nicely packaged module bundles from the ActiveState archive and drops them into place on your Windows system with little need of help from you. Simply launch PPM from inside a DOS terminal window and tell it to install the module:

C:\>ppm
PPM> install WWW-Search-eBay

The WWW::Search::eBay module retrieves search results by parsing eBay's search pages. Since it doesn't use an official programmer's interface (like the eBay API, discussed in Chapter 8), it's vulnerable to even minor changes in eBay's search pages. For this reason, you should routinely check for updated versions of the module, especially if it stops working as expected.
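One way to keep an eye on which version you have is to ask Perl directly. The following is a minimal sketch, not part of the original hack: the installed_version helper is a made-up name, and the module name is spelled as in the cpan install command shown earlier. Compare the number it prints against the current release listed on CPAN.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the installed version of a module,
# or undef if the module cannot be loaded (or declares no version).
sub installed_version {
    my ($module) = @_;
    # Turn Some::Module into Some/Module.pm so require can load it by path.
    (my $file = "$module.pm") =~ s{::}{/}g;
    return undef unless eval { require $file; 1 };
    return $module->VERSION;
}

my $name    = 'WWW::Search::Ebay';   # spelled as in the install command
my $version = installed_version($name);
print defined $version
    ? "$name version $version is installed\n"
    : "$name is not installed (or has no version number)\n";
```

If the script reports that the module is missing even though you installed it, double-check the capitalization of the name.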

It's easy enough to use the WWW::Search::eBay module to create nothing more than an alternative interface to eBay's own search tool, but the module's real value is how it can be used behind the scenes.

A robot is a program that does automatically what you'd otherwise have to do manually. In this case, we want a robot that automatically performs an eBay search at a regular interval, and then emails us any new listings.

Here's the script that does it all:

#!/usr/bin/perl
$searchstring = "railex";     [1]
$email = "dave\@ebayhacks.com";
$localfile = "/usr/localweb/ebayhacks/search.txt";

use WWW::Search;     [2]
$searchobject = new WWW::Search('Ebay');     [3]
$query = WWW::Search::escape_query($searchstring);
$searchobject->native_query($query);     [4]

# *** put results into two arrays ***
$a = 0;
while ($resultobject = $searchobject->next_result(  )) {     [5]
  $a++;
  ($itemnumber[$a]) = ($resultobject->url =~ m!item=(\d+)!);     [6]
  $title[$a] = $resultobject->title;     [7]
}

# *** eliminate entries already in file ***
open (INFILE,"$localfile");
  while ( $line = <INFILE> ) {
    for ($b = $a; $b >= 1; $b--) {
      if ($line =~ $itemnumber[$b]) {     [8]
        splice @itemnumber, $b, 1;
        splice @title, $b, 1;
      }
    }
  }
close (INFILE);
$a = @itemnumber - 1;
if ($a < 1) { exit; }   # quit if nothing is new or the search found nothing

# *** save any remaining new entries to file ***
open (OUTFILE,">>$localfile");
  for ($b = 1; $b <= $a; $b++) {
    print OUTFILE "$itemnumber[$b]\n";     [9]
  }
close (OUTFILE);

# *** send email with new entries found ***
open(MAIL,"|/usr/sbin/sendmail -t");     [10]

  print MAIL "To: $email\n";
  print MAIL "From: $email\n";
  print MAIL "Subject: New $searchstring items found\n\n";
  print MAIL "The following new items have been listed on eBay:\n";
  for ($b = 1; $b <= $a; $b++) {
    print MAIL "$title[$b]\n";
    print MAIL "http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=$itemnumber[$b]\n\n";
  }
close(MAIL);

2.10.2 How It Works

The text to search ("railex" in this case) and the email address of the recipient of the notification emails are specified at the beginning of the script [1]. Naturally, you'll want to modify these lines, as well as the $localfile variable, which points to the file in which previous search results are stored.

Next, the WWW::Search::eBay module is referenced [2] and the search is performed [4]. The $resultobject construct [5] is then used to enumerate the search results (if any) and retrieve such details as the item number [6] (taken from the URL) and title [7] for each auction returned.

All search results are then checked against a list of previous search results [8], which are stored in a text file ($localfile). Once duplicate auctions have been filtered out, the new auction numbers (if there are any left) are appended to the file [9].

Finally, a list of new auctions that meet the search criteria is emailed to the address specified in $email. You may have to adjust line [10] to suit your system, either to specify a different location for the sendmail executable or to use a different command-line-based email client.
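The duplicate check described above can also be written with a hash lookup, which compares whole item numbers rather than matching them as patterns. This is only a sketch of an alternative, not the script's own code: filter_new_items is a hypothetical name, the temporary file stands in for $localfile, and the third item number is a made-up placeholder.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return only the item numbers not yet recorded
# in $file, and append those new numbers to the file.
sub filter_new_items {
    my ($file, @found) = @_;

    # Load previously seen item numbers; a missing file means none yet.
    my %seen;
    if (open my $in, '<', $file) {
        while (my $line = <$in>) {
            chomp $line;
            $seen{$line} = 1;
        }
        close $in;
    }

    # Keep each unseen number once, preserving the search-result order.
    my @new = grep { !$seen{$_}++ } @found;

    if (@new) {
        open my $out, '>>', $file or die "Cannot append to $file: $!";
        print {$out} "$_\n" for @new;
        close $out;
    }
    return @new;
}

# Two runs against a temporary file standing in for $localfile.
my (undef, $file) = tempfile();
my @first  = filter_new_items($file, qw(3128955953 3128013702));
my @second = filter_new_items($file, qw(3128013702 1234567890));  # last one is a placeholder
print "First run:  @first\n";
print "Second run: @second\n";
unlink $file;
```

The second run reports only the item that was not already in the file, which is the behavior the robot relies on to avoid sending the same listing twice.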

2.10.3 Running the Hack

The search criteria you choose are entirely up to you, but narrow searches make more sense for this hack than broad searches. For instance, my example script targets Railex, a small German manufacturer of handmade brass model trains known for being very difficult to find. At any given time, there may be only a handful of these items for sale on eBay, which means that I may receive a single notification per month, if that. Conversely, a search yielding hundreds of results would quickly fill up your mailbox with dozens of emails full of irrelevant results. Use some of the other hacks in this chapter to narrow your searches, if necessary.

The best way to run this script is automatically at regular intervals, unless you enjoy waking up at 3 A.M. and typing commands into a terminal. How frequently you run the script is up to you, but it wouldn't make sense to run it more often than you check your email. In most cases, it's sufficient to activate the search robot 3-4 times a day, but given that new auctions can show up on eBay less than a minute after being listed, you can run it once an hour if you like.

Use this script responsibly. If eBay finds that their servers are over-burdened due to abuse by scrapers (which, strictly speaking, violate eBay's terms of service), they might take steps to disable them. See [Hack #83] for a version of this hack that uses the eBay API to perform searches.

If you're using Unix or Mac OS X, type crontab -u username -e to set up a cron job, where username is, not surprisingly, your username. In the editor that appears, add the following four lines:

0 0 * * * /home/mydirectory/scripts/search.pl
0 6 * * * /home/mydirectory/scripts/search.pl
0 12 * * * /home/mydirectory/scripts/search.pl
0 18 * * * /home/mydirectory/scripts/search.pl

where /home/mydirectory/scripts/search.pl is the full path and filename of the script. Save the file when you're done. This will instruct the server to run the script every six hours: at midnight, 6:00 A.M., noon, and 6:00 P.M. See www.superscripts.com/tutorial/crontab.html for more information on crontab.

If you're using Windows, open the Scheduled Tasks tool, right-click on an empty area of the window, and select New. (This bypasses the cumbersome wizard and goes directly to the so-called "advanced" properties sheet.) Type the full path and filename of the script in the Run field, and then choose the Schedule tab. Turn on the "Show multiple schedules" option, and click New three times. Set up each of the four schedules to run as follows: Daily at 12:00 A.M., Daily at 6:00 A.M., Daily at 12:00 P.M., and Daily at 6:00 P.M. Click OK when you're done.

Assuming all goes well, you should eventually get an email that looks something like this:

To: dave@ebayhacks.com
From: dave@ebayhacks.com
Subject: New railex items found

The following new items have been listed on eBay:
Railex Snowplow, RARE
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=3128955953

Railex Glaskasten, Green & Black, NEW NR
http://cgi.ebay.com/ws/eBayISAPI.dll?ViewItem&item=3128013702

You should continue getting emails as new auctions matching your criteria are listed on eBay; just click the links in the emails to view the auctions.

2.10.4 Hacking the Hack

By default, the WWW::Search::eBay module searches only titles. To search descriptions as well, change line [4] to the following:

$searchobject->native_query($query, {srchdesc => 'y'});

The search results are sorted by listing date, with newly listed items shown first. You can, of course, sort the results manually, or you can use the WWW::Search::eBay::ByEndDate module (part of the WWW::Search::eBay distribution) to sort by end date by replacing line [3] with the following:

$searchobject = new WWW::Search('Ebay::ByEndDate');

The WWW::Search::eBay module is only for searching the U.S. eBay site (www.ebay.com). To search non-U.S. eBay sites, use the WWW::Search::EBayGlobal or WWW::Search::EBayGlobal::ByEndDate modules.

One of the drawbacks to eBay's built-in email notification is that each search generates its own email; have 20 favorite searches, and you'll get up to 20 separate emails every day. In this hack, you can accommodate multiple searches by modifying lines [1] to [7] so that the script retrieves a list of individual keywords from a separate file and then compiles a single array from the results of all the searches. That way, you'll only get a single email, regardless of the number of different searches the robot performs.
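The multiple-search idea might be sketched as follows. None of this comes from the original script: read_keywords, combined_results, and ebay_search are hypothetical names, keywords.txt is an assumed file with one search phrase per line, and ebay_search simply wraps the same WWW::Search calls the main script makes. The search routine is passed in as a code reference so the merging logic can be tried without contacting eBay.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read one search phrase per line, skipping blank lines.
sub read_keywords {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot read $file: $!";
    my @keywords;
    while (my $line = <$in>) {
        $line =~ s/^\s+|\s+$//g;   # trim whitespace and the newline
        push @keywords, $line if length $line;
    }
    close $in;
    return @keywords;
}

# Run every search and merge the results into one list, keeping each
# item number once even if several searches return it.
# $search is a code reference that takes a phrase and returns a list
# of [item number, title] pairs.
sub combined_results {
    my ($search, @keywords) = @_;
    my (%seen, @all);
    for my $keyword (@keywords) {
        for my $result ($search->($keyword)) {
            my $number = $result->[0];
            next unless defined $number;
            push @all, $result unless $seen{$number}++;
        }
    }
    return @all;
}

# One possible search callback, using the calls from the main script.
sub ebay_search {
    my ($phrase) = @_;
    require WWW::Search;
    my $searchobject = WWW::Search->new('Ebay');
    $searchobject->native_query(WWW::Search::escape_query($phrase));
    my @results;
    while (my $r = $searchobject->next_result()) {
        my ($number) = ($r->url =~ m!item=(\d+)!);
        push @results, [$number, $r->title];
    }
    return @results;
}

# Usage (assumes keywords.txt exists and the module is installed):
# my @items = combined_results(\&ebay_search, read_keywords('keywords.txt'));
```

The merged list can then go through the same duplicate check and email steps as before, so that all searches end up in a single message.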

Once you've been notified of newly listed auctions, you'll most likely want to keep track of their progress, as described in [Hack #24]. If you want to be a little adventurous, you can modify the search robot script to automatically write new entries to the track.txt file used by the track.pl script in [Hack #24]. That way, new auctions will automatically show up in your watching list!