Hack 92 Encode Text for URLs

figs/moderate.giffigs/hack92.gif

Make sure the text in XML/HTTP queries is valid for URLs.

One thing to keep in mind when making XML/HTTP requests is that they behave exactly like URLs for web pages. This means that spaces and symbols need to be encoded. Spaces aren't allowed in URLs, so anything after a space could be disregarded by the server. Also, characters like ampersands (&), question marks (?), and number signs (#) give directions to the server about how the URL should be processed. So if you're doing an XML/HTTP Amazon ArtistSearch for a band like Kruder & Dorfmeister, you've got trouble?the spaces and ampersand will break the request. But you can translate the characters into a URL-friendly format.

Technically, you can encode these characters by using the percent sign (%) followed by their hexadecimal numeric values. The numeric value for a space is 20, so a space is represented as %20 in a URL. Spaces can also be escaped as plus signs (+) for many systems, including Amazon's. Here are some commonly escaped characters and their encoded values:

Ampersand (&)

%26

Question mark (?)

%3F

Number sign (#)

%23

Comma (,)

%2C

Colon (:)

%3A

The ArtistSearch mentioned will only work if the band name is encoded as Kruder%20%26%20Dorfmeister. Doing this by hand each time you make a request is out of the question. Luckily, this is such a common task that most programming environments have built-in functions to handle this for you.

92.1 The Code

Here are few common ways to escape text for URLs in various scripting languages.

JavaScript
var artist = "Kruder & Dorfmeister";
artist = escape(artist);
Perl
use URI::Escape;
my $artist = "Kruder & Dorfmeister";
$artist = uri_escape($artist);
VBScript
strArtist = "Kruder & Dorfmeister"
strArtist = Server.URLEncode(strArtist)
PHP
$artist = "Kruder & Dorfmeister";
$artist = urlencode(strArtist);
Python

Unlike the previous examples, Python's urlencode takes variable/value pairs and creates a properly escaped querystring:

import sys
from urllib import urlencode
artist = "Kruder & Dorfmeister"
artist = urlencode({'ArtistSearch':artist})

This sets the variable artist equal to:

ArtistSearch=Kruder+%26+Dorfmeister

Encoding strings for URLs is an easy problem to solve, and it's something to look at if your XML/HTTP requests aren't working quite right.