Hack 90 Spellcheck All Your Auctions

Ensure that your titles and descriptions are spelled correctly.

The success of any auction depends largely on how readily it can be found in eBay searches. As described in Chapter 2, eBay searches show only exact matches (with very few exceptions), which means, among other things, that spelling most definitely counts.

Neither eBay's Sell Your Item form nor Turbo Lister supports spellchecking of any kind. So it's left to sellers to scrutinize their titles and auction descriptions, and to obnoxious bidders to point out any mistakes. Once again, the API comes to the rescue.

8.10.1 The Script

This script requires the following modules and programs:

Module/program name                    Available at
HTML::FormatText (by Sean M. Burke)    search.cpan.org/perldoc?HTML::FormatText
HTML::TreeBuilder (by Sean M. Burke)   search.cpan.org/perldoc?HTML::TreeBuilder
HTML::Entities (by Gisle Aas)          search.cpan.org/perldoc?HTML::Entities
Lingua::Ispell (by John Porter)        search.cpan.org/perldoc?Lingua::Ispell
ispell program (by Geoff Kuenning)     fmg-www.cs.ucla.edu/geoff/ispell.html

#!/usr/bin/perl
require 'ebay.pl';

require HTML::TreeBuilder;
require HTML::FormatText;
use Lingua::Ispell qw( spellcheck );
Lingua::Ispell::allow_compounds(1);

$out1 = "";
$outall = "";
$numchecked = 0;
$numfound = 0;

$today = &formatdate(time);
$yesterday = &formatdate(time - 86400);

my $page_number = 1;
PAGE:
while (1) {
    my $rsp = call_api({ Verb => 'GetSellerList',     # [1]
                  DetailLevel => 0,
                       UserId => $user_id,
                StartTimeFrom => $yesterday,
                  StartTimeTo => $today,
                   PageNumber => $page_number
    });

    if ($rsp->{Errors}) {
      print_error($rsp);
      last PAGE;
    }
    foreach (@{$rsp->{SellerList}{Item}}) {
      my %i = %$_;
      $id = $i{Id};

      if (! -e "$localdir/$id") {
        my $rsp = call_api({ Verb => 'GetItem',
                      DetailLevel => 2,
                               Id => $id
        });
        
        if ($rsp->{Errors}) {
          print_error($rsp)
        } else {
          my %i = %{$rsp->{Item}[0]};
          my ($title, $description) = @i{qw/Title Description/};

          $spellthis = $title . " " . $description;     # [2]
          $tree = HTML::TreeBuilder->new_from_content($spellthis);     # [3]
          $formatter = HTML::FormatText->new();
          $spellthat = $formatter->format($tree);
          $tree = $tree->delete;     # [4]

          for my $r ( spellcheck( $spellthat ) ) {     # [5]
            if ( $r->{'type'} eq 'miss' ) {
              $out1 = $out1."'$r->{'term'}'";
              $out1 = $out1." - near misses: @{$r->{'misses'}}\n";
              $numfound++;
            }
            elsif ( $r->{'type'} eq 'guess' ) {
              $out1 = $out1."'$r->{'term'}'";
              $out1 = $out1." - guesses: @{$r->{'guesses'}}\n";
              $numfound++;
            }
            elsif ( $r->{'type'} eq 'none' ) {
              $out1 = $out1."'$r->{'term'}'";
              $out1 = $out1." - no match.\n";
              $numfound++;
            }
          }

          $numchecked++;
          if ($out1 ne "") {
            $outall = $outall."Errors in #$id '$title':\n";
            $outall = $outall."$out1\n\n";
            $out1 = "";
          }

        }
      }
    }
    last PAGE unless $rsp->{SellerList}{HasMoreItems};
    $page_number++;
}

print "$numfound spelling errors found in $numchecked auctions:\n\n";     # [6]
print "$outall\n";

This script is based on the one in [Hack #87], but has a few important additions and changes.

First, instead of listing recently completed auctions, the GetSellerList API call (line [1]) is used to retrieve auctions that have started in the last 24 hours. This will work perfectly if the script is run every 24 hours, say, at 3:00 P.M. every day, as described in [Hack #17].

Second, since we want the auction descriptions, we need to use the GetItem API call for each auction we spellcheck. This means that spellchecking a dozen auctions will require 13 API calls: one call to retrieve the list, and one for each auction.

The code actually responsible for performing the spellcheck starts on line [2], where the title and description are concatenated into a single variable, $spellthis, so that only one spellcheck is necessary for each auction. Next, the HTML::TreeBuilder and HTML::FormatText modules are used (lines [3] to [4]) to convert any HTML-formatted text to plain text.

Finally, the Lingua::Ispell module [5] uses the external ispell program to perform a spellcheck on $spellthat (the cleaned-up version of $spellthis). As errors are found, suggestions are recorded into the $out1 variable, which is merged with $outall and displayed when the spellcheck is complete.
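To make the shape of the spellcheck results easier to see, here is a minimal sketch that pulls the three branches of the loop into one helper. The helper name describe_result and the sample records are made up for illustration. The records only mimic the fields the script reads (type, term, misses, guesses), so the sketch runs without ispell installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): turn one
# spellcheck record into a report line. Returns an empty string for
# any record type the script does not treat as an error.
sub describe_result {
    my ($r) = @_;
    my $type = $r->{type} // '';
    if ($type eq 'miss') {
        return "'$r->{term}' - near misses: " . join(' ', @{ $r->{misses} || [] }) . "\n";
    }
    elsif ($type eq 'guess') {
        return "'$r->{term}' - guesses: " . join(' ', @{ $r->{guesses} || [] }) . "\n";
    }
    elsif ($type eq 'none') {
        return "'$r->{term}' - no match.\n";
    }
    return '';
}

# Hand-written sample records standing in for the list that
# spellcheck() returns; real results depend on the ispell dictionary.
my @sample = (
    { type => 'ok',   term => 'vintage' },
    { type => 'miss', term => 'camra', misses => [ 'camera', 'cara' ] },
    { type => 'none', term => 'xqzt' },
);

my ($report, $numfound) = ('', 0);
for my $r (@sample) {
    my $line = describe_result($r) or next;   # skip non-errors
    $report .= $line;
    $numfound++;
}
print "$numfound spelling errors found:\n$report";
```

In the script itself, the same helper could replace the if/elsif chain inside the loop on line [5], with $out1 and $numfound updated only when it returns a non-empty string.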

8.10.2 Hacking the Hack

Here are a few things you might want to do with this script:

  • Instead of simply printing out the results of the spellcheck, as the script does on line [6], you can quite easily have the results emailed to you. See [Hack #93] for an example.

  • Currently, the script performs a spellcheck on every running auction started in the last 24 hours. If you run the script every 24 hours, then this won't pose a problem. But if you choose to run the script manually and therefore specify a broader range of dates, you may wish to include error checking to prevent the script from needlessly checking the same auction twice.

  • If you're especially daring, you can have the spellchecker submit the revisions for you, although I would never trust a spellchecker to know how to spell all the weird names of my items.
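The second suggestion above, avoiding a repeat check of the same auction, might be sketched as follows. This is only one possible approach: the helper names load_checked and mark_checked, and the idea of keeping item numbers in a plain-text file (one per line), are assumptions for illustration and not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read previously checked item numbers (one per
# line) into a hash for fast lookup. A missing file simply means that
# nothing has been checked yet.
sub load_checked {
    my ($file) = @_;
    my %seen;
    if (open my $fh, '<', $file) {
        while (my $line = <$fh>) {
            chomp $line;
            $seen{$line} = 1 if length $line;
        }
        close $fh;
    }
    return \%seen;
}

# Hypothetical helper: record an item number once its spellcheck is
# done. Items already recorded are not written a second time.
sub mark_checked {
    my ($file, $seen, $id) = @_;
    return if $seen->{$id};
    open my $fh, '>>', $file or die "Can't append to $file: $!";
    print {$fh} "$id\n";
    close $fh;
    $seen->{$id} = 1;
}
```

To use it, call load_checked once before the PAGE loop, skip any auction whose $id is already in the returned hash, and call mark_checked right after $numchecked++. The file location is up to you; something under $localdir would be a natural choice.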