9.2 Server-Side Validation with PHP

In this section, we introduce validation on the server using PHP. We show you how to validate numbers including currencies and credit cards, strings including email addresses and Zip Codes, and dates and times. We also show you how to check for mandatory fields, field lengths, and data types. Many of the PHP functions we use, including the regular expression and string functions, are discussed in detail in Chapter 3.

We illustrate many of our examples in this section with a case study of validating customer details. The techniques described here are typical of those that validate a form after the user has submitted data to the server. We show how to extend and integrate this approach further in Chapter 10 so that the batch errors are reported as part of a customer form, and we show a completed customer entry form and validation in Chapter 17.

9.2.1 Mandatory Data

Testing whether mandatory fields have been entered is straightforward, and we have implemented this in our examples in Chapter 8. For example, to test if the user's surname has been entered, the following approach is used:

// Validate the Surname

if (empty($surname))

  formerror($template, "The surname field cannot be blank.", $errors);

The formerror( ) function outputs the error message as a batch error using a template and is discussed in detail in Chapter 8. For simplicity and compactness in the remainder of our examples in this chapter, we omit the formerror( ) function from code fragments and simply output the error messages using print.
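One caveat with empty( ) is that PHP treats the string "0" as empty, so a mandatory field that legitimately contains a single zero would be rejected. The following is a minimal sketch of a stricter blank test; the helper name isBlank( ) is our own invention for illustration and isn't part of the case study code.

```php
<?php
// Hypothetical helper: true only when the field is missing or contains
// nothing but white space. Unlike empty(), the string "0" is not blank.
function isBlank($value)
{
    return $value === null || trim((string) $value) === '';
}

if (isBlank("0"))
    print "This message is not printed, because \"0\" counts as data.";
```

Whether a lone zero should count as data depends on the field, so treat this as one option rather than a replacement for empty( ).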

9.2.2 Validating Strings

In this section, we discuss nonnumeric validation. We begin with the basics of validating strings, and then discuss the specifics of email addresses, URLs, and Zip or post codes.

9.2.2.1 Basic techniques

It's likely that most of the data entered by users will be strings and require validation. Indeed, checking that strings contain legal characters, are of the correct length, or have the correct format is the most common validation task. Strings are popular for two reasons: first, all data from a form that is stored in the superglobals $_GET and $_POST is of the type string; and, second, some nonstring data such as a date of birth or a phone number is likely to be stored as a string in a database table because it may contain brackets, dashes, and slashes. However, despite dates and phone numbers being sometimes stored as strings, we discuss the validation of phone numbers in Section 9.2.2.5 and of dates in Section 9.2.3.

The simplest test of a string is to check if it meets a minimum or maximum length requirement. For example:

if (strlen($password) < 4 || strlen($password) > 8)

  print "Password must contain between 4 and 8 characters";

Length validation can also be performed using a regular expression, as we show in later examples in this section. Our mysqlclean( ) and shellclean( ) functions also include an implicit maximum length validation. As discussed in Chapter 6, these functions should be used as a first step in validation that helps to secure an application.

Common tests for legal characters include checking if strings are uppercase, lowercase, alphabetic, or are drawn from a defined character set (such as, for example, alphabetic strings that may include hyphens or apostrophes). In PHP, the is_string( ) function can be used to check if a variable is a string type. However, this is of limited use in validation because a string can contain any character including (or even exclusively) digits or special characters. It's more useful to test what characters are in the string or detect characters that shouldn't be there.

Regular expressions offer three shortcuts for use in basic tests that are discussed in Chapter 3. To test if a string is alphabetic, use:

if (!ereg("^[[:alpha:]]+$", $string))

  print "String must contain only alphabetic characters.";

To test if a string is uppercase or lowercase, use:

if (ereg("^[[:upper:]]+$", $string))

  print "String contains only uppercase characters.";



if (ereg("^[[:lower:]]+$", $string))

  print "String contains only lowercase characters";

The expressions work for the English character set, and also work for French if you set your locale at the beginning of the script using, for example, setlocale(LC_ALL, 'fr'). In the future, they should work for all locales and, therefore, these techniques are useful for internationalizing your application.
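The ereg family of functions was deprecated in PHP 5.3 and removed in PHP 7. As a hedged sketch, assuming a newer PHP release, the same character class tests can be written with preg_match( ), which accepts the same bracketed class names. The three function names are hypothetical, and the + quantifier means that an empty string fails each test.

```php
<?php
// Hypothetical helpers for this sketch; not part of the case study code.
// The D modifier makes $ match only at the very end of the string.

function isAlphabetic($string)
{
    // One or more alphabetic characters and nothing else
    return preg_match('/^[[:alpha:]]+$/D', $string) === 1;
}

function isUppercase($string)
{
    return preg_match('/^[[:upper:]]+$/D', $string) === 1;
}

function isLowercase($string)
{
    return preg_match('/^[[:lower:]]+$/D', $string) === 1;
}

if (!isAlphabetic("O'Brien"))
    print "String must contain only alphabetic characters.";
```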

If you're working with only the English language, a simpler alphabetic test works:

if (!eregi("^[a-z]*$", $string))

  print "String must contain only alphabetic characters.";

For other character sets (or if you want detailed control over English validation), a handcrafted expression works well. For example, the following works as an alphabetic test for Spanish:

if (!eregi("^[a-zñ]*$", $string))

  print "La cadena debe contener solamente caracteres alfabeticos";

Sometimes it's easier to check only part of a string. For example, at our university, student email accounts must begin with an S:

if (!ereg("^S", $text))

  print "Student accounts must begin with S.";

However, for this simple example, a regular expression will run slower than using a string library function. Instead, a better approach is to use substr( ) :

if (substr($text, 0 , 1) != "S")

  print "Student accounts must begin with S.";

In general, you should use string functions for low complexity tasks.
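As a small sketch of the same idea, the prefix test can be wrapped in a function that uses strncmp( ), which compares only the first characters of two strings. The function name startsWithS( ) is hypothetical; on PHP 8 and later, the built-in str_starts_with( ) does the same job.

```php
<?php
// Hypothetical helper: true if the account name begins with an uppercase S.
function startsWithS($account)
{
    // Compare only the first character; strncmp() returns 0 when equal
    return strncmp($account, "S", 1) === 0;
}

if (!startsWithS("T1234567"))
    print "Student accounts must begin with S.";
```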

For our customer case study, we might allow the firstname and surname of the customer to contain only alphabetic characters, hyphens, and apostrophes; white space, numbers, and other special characters aren't allowed. For the firstname we use:

if (!eregi("^[a-z'-]*$", $firstName))

  print "The first name can contain only alphabetic " .

        "characters or - or '";

Length validation and character checks are often combined. For example, the customer's middle initial might be limited to exactly one alphabetic character:

if (!empty($initial) && !eregi("^[a-z]$", $initial))

  print "The initial field must be empty or one character in length.";

The if statement contains two clauses: a check as to whether the field contains data and, if that's true, a check of the contents of the field using eregi( ). As discussed in Chapter 2, the second clause is checked only if the first clause is true when an AND (&&) expression is evaluated. If the variable is empty, the eregi( ) expression isn't evaluated.

The expression ^[a-z]$ is the same as ^[a-z]{1}$. To check if a string is exactly four alphabetic characters in length use ^[a-z]{4}$. To check if it's between two and four characters use ^[a-z]{2,4}$.
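The repetition counts can be supplied as parameters. The following sketch assumes a PHP version where preg_match( ) is preferred over the ereg functions; the helper name isAlphaBetween( ) is our own and the i modifier plays the role of eregi( ).

```php
<?php
// Hypothetical helper: true if $string is alphabetic and between
// $min and $max characters in length (inclusive).
function isAlphaBetween($string, $min, $max)
{
    // Builds, for example, /^[a-z]{2,4}$/iD
    $pattern = sprintf('/^[a-z]{%d,%d}$/iD', $min, $max);
    return preg_match($pattern, $string) === 1;
}

// Exactly one character, as for the middle initial
if (!isAlphaBetween("AB", 1, 1))
    print "The initial field must be one character in length.";
```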

9.2.2.2 Validating Zip and postcodes

Zip or postcodes are numeric in most countries but are typically stored as strings because spaces, letters, and special characters are sometimes allowed. In our customer case study, we might validate Zip Codes using a simple regular expression:

// Validate Zipcode

if (!ereg("^([0-9]{4,5})$", $zipcode))

   print "The zipcode must be 4 or 5 digits in length.";

This permits a Zip Code of either four or five digits in length, which suits U.S. Zip Codes and the postcodes of Australia and several other countries, but it's unsuitable for many others. For example, postcodes from the United Kingdom include letters and a space and have a complex structure.

For complete validation, we could adapt our Zip or postcode validation to match the country that the user has entered. Example 9-1 shows a validation function that adapts for many Zip and postcodes. The final five case statements check postcodes that must include spaces, dashes, and letters.

Example 9-1. A code fragment to validate many popular Zip and postcodes

function checkcountry($country, $zipcode)

{

  switch ($country)

  {

    case "Austria":

    case "Australia":

    case "Belgium":

    case "Denmark":

    case "Norway":

    case "Portugal":

    case "Switzerland":

      if (!ereg("^[0-9]{4}$", $zipcode))

      {

         print "The postcode/zipcode must be 4 digits in length";

         return false;

      }

      break;

    case "Finland":

    case "France":

    case "Germany":

    case "Italy":

    case "Spain":

    case "USA":

      if (!ereg("^[0-9]{5}$", $zipcode))

      {

         print "The postcode/zipcode must be 5 digits in length";

         return false;

       }

       break;

    case "Greece":

      if (!ereg("^[0-9]{3}[ ][0-9]{2}$", $zipcode))

      {

         print "The postcode must have 3 digits, a space,

                and then 2 digits";

         return false;

      }

      break;

    case "Netherlands":

      if (!ereg("^[0-9]{4}[ ][A-Z]{2}$", $zipcode))

      {

         print "The postcode must have 4 digits, a space, and then 2

                letters";

         return false;

      }

      break;

    case "Poland":

      if (!ereg("^[0-9]{2}-[0-9]{3}$", $zipcode))

      {

         print "The postcode must have 2 digits, a dash,

                and then 3 digits";

         return false;

      }

      break;

    case "Sweden":

      if (!ereg("^[0-9]{3}[ ][0-9]{2}$", $zipcode))

      {

         print "The postcode must have 3 digits, a space,

                and then 2 digits";

         return false;

      }

      break;

    case "United Kingdom":

      if (!ereg("^(([A-Z][0-9]{1,2})|([A-Z]{2}[0-9]{1,2})|" .

                "([A-Z]{2}[0-9][A-Z])|([A-Z][0-9][A-Z])|" .

                "([A-Z]{3}))[ ][0-9][A-Z]{2}$", $zipcode))

      {

         print "The postcode must begin with a string of the format

                A9, A99, AA9, AA99, AA9A, A9A, or AAA,

                and then be followed by a space and a string

                of the form 9AA.

                A is any letter and 9 is any digit.";

        return false;

      }

      break;

    default:

      // No validation

  }

  return true;

}

Another common validation check with Zip Codes is to check that they match the city or state using a database table, but we don't consider this approach here.
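The switch statement in Example 9-1 could also be expressed as a lookup table that maps each country to its pattern, which makes adding a country a one-line change. The following is a sketch only: it uses preg_match( ) rather than ereg( ), lists just four of the countries, and the function names are hypothetical.

```php
<?php
// Hypothetical helper: returns the pattern for a country, or null if
// the country has no entry in the table.
function postcodePattern($country)
{
    $patterns = array(
        "Australia"   => '/^[0-9]{4}$/D',
        "USA"         => '/^[0-9]{5}$/D',
        "Netherlands" => '/^[0-9]{4}[ ][A-Z]{2}$/D',
        "Poland"      => '/^[0-9]{2}-[0-9]{3}$/D',
    );

    return isset($patterns[$country]) ? $patterns[$country] : null;
}

function isValidPostcode($country, $zipcode)
{
    $pattern = postcodePattern($country);

    // No validation for unlisted countries, as in the default case
    if ($pattern === null)
        return true;

    return preg_match($pattern, $zipcode) === 1;
}
```

Unlike checkcountry( ), this sketch doesn't print an error message; the caller decides what to report.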

9.2.2.3 Validating email addresses

Email addresses are another common string whose format requires checking. There is a standard maintained by the Internet Engineering Task Force (IETF) called RFC-2822 that defines what a valid email address can be, and it's much more complex than might be expected. For example, an address such as the following is valid:

" <test> "@webdatabasebook.com

In our customer case study, we might use a regular expression and network functions to validate an email address. A function for this purpose is shown in Example 9-2.

Example 9-2. A function to validate an email address

function checkemail($email)

{

  // Check syntax

  $validEmailExpr =  "^[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*" .

                     "@[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*$";



  // Validate the email

  if (empty($email))

  {

    print "The email field cannot be blank";

    return false;

  }    

  elseif (!eregi($validEmailExpr, $email))

  {  

    print "The email must be in the name@domain format.";

    return false;

  }

  elseif (strlen($email) > 30)

  { 

    print "The email address can be no longer than 30 characters.";

    return false;

  }

  elseif (function_exists("getmxrr") && function_exists("gethostbyname"))

  {

    // Extract the domain of the email address

    $maildomain = substr(strstr($email, '@'), 1);



    if (!(getmxrr($maildomain, $temp) || 

          gethostbyname($maildomain) != $maildomain))

    {

      print "The domain does not exist.";

      return false;

    }

  }

  return true;

}

If any email test fails, an error message is output, and no further checks of the email value are made. A valid email passes all tests.

The first check tests to make sure that an email address has been entered. If it's omitted, an error is generated. It then uses a regular expression to check if the email address matches a template. It isn't RFC-2822-compliant but works reasonably for most email addresses:

It uses eregi( ), so both upper- and lowercase letters are matched by the use of a-z.
It expects the string to begin with a character from the set 0-9, a-z, and ~!#$%&_-. There has to be at least one character from this set at the beginning of the email address for it to be valid.
After the first character matches, there is an optional bracketed expression:
```
([.]?[0-9a-z~!#$%&_-])*
```
This expression is optional because it's suffixed with the * operator. However, if it does match, it matches any number of the characters specified. Full-stops can't be consecutive: the expression [.]? allows at most one full-stop before each permitted character. The expression, for example, matches the string fred.williams but not fred..williams.
After the initial part of the email address, the character @ is expected. The @ has to occur after the first word for the string to be valid; our regular expression rejects an email address such as fred that has only the initial or local component.
Our validation expects there to be another word of at least one character after the @ symbol, and this can be followed by any combination of the permitted characters. Strings of permitted characters can be separated by a single full-stop.

The function is imperfect. It allows several illegal email addresses and doesn't allow many that are legal but unusual.
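To experiment with the syntax check on its own, the expression can be moved into a small function. This sketch assumes a PHP version that provides preg_match( ) (the ereg functions were removed in PHP 7); the helper name hasValidEmailSyntax( ) is hypothetical, and the pattern is the one from Example 9-2 with delimiters and modifiers added.

```php
<?php
// Hypothetical helper: applies only the syntax check from Example 9-2.
function hasValidEmailSyntax($email)
{
    // A "word": one permitted character, then more permitted characters,
    // each optionally preceded by a single full-stop
    $word = '[0-9a-z~!#$%&_-]([.]?[0-9a-z~!#$%&_-])*';

    // i = case-insensitive (as with eregi), D = $ matches only at the end
    return preg_match('/^' . $word . '@' . $word . '$/iD', $email) === 1;
}

if (!hasValidEmailSyntax("fred..williams@webdatabasebook.com"))
    print "The email must be in the name@domain format.";
```

Newer PHP versions also offer filter_var( ) with the FILTER_VALIDATE_EMAIL filter as a built-in alternative, although it too accepts only a subset of RFC-2822 addresses.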

The third step is to check the length of the email address. If it exceeds 30 characters, an error is generated.

The fourth and final step is to check whether the domain of the email address actually exists. The fragment only works on platforms that support the network library functions getmxrr( ) and gethostbyname( ) :

elseif (function_exists("getmxrr") && function_exists("gethostbyname"))

{

  // Extract the domain of the email address

  $maildomain = substr(strstr($email, '@'), 1);



  if (!(getmxrr($maildomain, $temp) || 

        gethostbyname($maildomain) != $maildomain))

  {

    print "The domain does not exist.";

    return false;

  }

}

The function getmxrr( ) queries an Internet domain name server (DNS) to check if there is a record of the email domain as a mail exchanger (MX). If the domain has no MX record, it is checked with gethostbyname( ) to see if it has an A record; the relevant standard, RFC-974, states that when a domain does not have an MX record, it should be interpreted as having one equal to the host name. If both tests fail, the domain of the email address isn't valid and we reject the email address.
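Because DNS lookups depend on the network, it can help to separate the MX-then-A decision from the lookup itself so that the logic can be exercised without a live name server. The following is a sketch under that assumption; both function names and the $hasRecord callback are hypothetical.

```php
<?php
// Extract the part after the @, as in Example 9-2.
function mailDomain($email)
{
    return substr(strstr($email, '@'), 1);
}

// $hasRecord is a hypothetical callback taking ($domain, $type) and
// returning true if a DNS record of that type exists.
function domainAcceptsMail($domain, $hasRecord)
{
    // Try the MX record first, then fall back to an A record
    return call_user_func($hasRecord, $domain, "MX") ||
           call_user_func($hasRecord, $domain, "A");
}

// A stand-in lookup that knows a single A record and nothing else
$fakeLookup = function ($domain, $type) {
    return $domain === "webdatabasebook.com" && $type === "A";
};

if (!domainAcceptsMail(mailDomain("fred@nowhere.invalid"), $fakeLookup))
    print "The domain does not exist.";
```

On platforms that provide it, the built-in checkdnsrr( ) function takes a host and a record type in the same order, so the string 'checkdnsrr' can be passed as the callback.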

For platforms (such as Microsoft Windows) that don't have the getmxrr( ) and gethostbyname( ) functions, the PEAR Net_DNS package can be used instead. It must be installed using the PEAR installer. The DNS lookup package must then be included into the source code using:

require_once "Net/DNS.php";

Installation of packages is discussed in Chapter 7.

The following fragment is a function checkMailDomain( ) that uses PEAR Net_DNS to check if the domain parameter $domain has a record of the type matching the parameter $type:

// Call with $type of MX, then A to check if an email address

// domain is valid

function checkMailDomain($domain, $type)

{

  // Create a DNS resolver, and look up an $type record for $domain

  $resolver = new Net_DNS_Resolver( );

  $answer = $resolver->search($domain, $type);



  // Is there an answer record?

  if (isset($answer->answer))

    // Iterate through the answers

    foreach($answer->answer as $ans)

      // If it's a $type answer, return true

      if ($ans->type == $type)

         return true;



  return false;

}

The function returns true if the DNS server responds with an answer that includes a record of the type that's been requested; it returns false otherwise.

The following code fragment can then be used to validate an email address:

// Extract the domain of the email address

$maildomain = substr(strstr($email, '@'), 1);



if (!(checkMailDomain($maildomain, "MX") || 

      checkMailDomain($maildomain, "A")))

  {

    print "The domain does not exist.";

    return false;

  }

As in the previous example that uses getmxrr( ) and gethostbyname( ), we check if there is a record of the email domain as a mail exchanger (MX). If the domain has no MX record, it is checked to see if it has an A record. If both tests fail, the domain of the email address isn't valid and we reject the email address.

9.2.2.4 Validating URLs

Home pages, links, and other URLs are sometimes entered by users. In PHP, validating these is straightforward because the library function parse_url( ) can do most of the work for you.

The parse_url( ) function takes one parameter, a URL string, and returns an associative array that contains the components of the URL. For example:

$bits =

  parse_url("http://www.webdatabasebook.com/test.php?status=F#message");

foreach($bits as $var => $val)

  echo "{$var} is {$val}\n";

produces the output:

scheme is http

host is www.webdatabasebook.com

path is /test.php

query is status=F

fragment is message

The parse_url( ) function can be used in validation as follows:

$bits = parse_url($url);



if ($bits["scheme"] != "http")

  print "URL must begin with http://.";

elseif (empty($bits["host"]))

  print "URL must include a host name.";

elseif (function_exists('checkdnsrr') && !checkdnsrr($bits["host"], 'A'))  

  print "Host does not exist.";

You might also add elseif clauses to check for specific path, query, or fragment components. In addition, you could modify the test of the scheme to check for other valid URL types, including ftp://, https://, or file://.
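As a sketch of the scheme variation, the test can compare the scheme against a list of permitted values. The function name checkUrlFormat( ) and its $allowedSchemes parameter are hypothetical; the sketch also guards against parse_url( ) returning false and against a missing scheme element, and it leaves out the DNS lookup.

```php
<?php
// Hypothetical helper: returns an error message, or null when the URL
// passes the scheme and host checks.
function checkUrlFormat($url, $allowedSchemes = array("http", "https", "ftp"))
{
    $bits = parse_url($url);

    // parse_url() returns false for seriously malformed URLs
    if ($bits === false)
        return "URL could not be parsed.";

    if (!isset($bits["scheme"]) ||
        !in_array(strtolower($bits["scheme"]), $allowedSchemes))
        return "URL must begin with one of: " .
               implode(", ", $allowedSchemes) . ".";

    if (empty($bits["host"]))
        return "URL must include a host name.";

    return null;
}

$error = checkUrlFormat("www.webdatabasebook.com");
if ($error !== null)
    print $error;
```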

Unfortunately, at the time of writing, parse_url( ) is slightly broken in PHP 4.3; it works fine in earlier and later versions of PHP. The bug is that if no path is present in the URL, all following components (such as a query or fragment) are incorrectly appended to the host element. To fix this, you can include the following fragment after the call to parse_url( ):

// Fix the hostname (if needed) in PHP 4.3

if (strpos($bits["host"], '?'))

  $bits["host"] = substr($bits["host"], 0, strpos($bits["host"], '?'));

if (strpos($bits["host"], '#'))

  $bits["host"] = substr($bits["host"], 0, strpos($bits["host"], '#'));

For non-Unix environments, you can check the host domain exists by using the PEAR-based approach described in the previous section.

9.2.2.5 Validating numbers

Checking that values are numeric, are within a range, or have the correct format is a common validation task. For our case study customer example, there might be several semi-numeric fields such as fax and telephone numbers, the customer's salary, or a credit card number. Zip and postcodes aren't always numeric, and are discussed in Section 9.2.2.2.

The two most common checks for numbers are whether they are in fact numeric and whether they're within a required range. In PHP, the is_numeric( ) function can be used to check if a variable contains only digits or if it matches one of the legal number formats. For example, to check if a salary is numeric, you can use:

if (!is_numeric($salary))

  print "Salary must be numeric";

The is_numeric( ) function doesn't always behave in the way you expect. Leading and trailing spaces, carriage returns, commas, and spaces after minus signs can result in a false return value. Leading and trailing spaces can be removed with the trim( ) function, while allowing specialized formats may instead require the use of a regular expression.

The legal number formats to is_numeric( ) include integers such as 87000, scientific notation such as 12e4, floating point numbers such as 3.14159 (or 3,14159 if your locale is set to France), hexadecimal notation such as 0xff, and negative numbers such as -1.

Before checking variables initialized from form data, they should be converted to a numeric type using the functions intval( ) or floatval( ) that convert a string to a number. A test such as if ($_GET["year"] < 1902) may not work as expected, because $_GET["year"] is a string and 1902 is an integer. The test if (intval($_GET["year"]) < 1902) works reliably. Both functions are discussed in Chapter 3.
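A short sketch of the conversions follows. Note that intval( ) quietly ignores any trailing nonnumeric characters, which is one reason to test the raw string with is_numeric( ) before converting it. The variable names are for illustration only.

```php
<?php
$input = "1969";                  // form data always arrives as a string

$year = intval($input);           // the integer 1969
$fraction = floatval("3.14159");  // the float 3.14159

// Conversion stops at the first character that isn't part of a number
$partial = intval("12abc");       // 12
$nothing = intval("abc");         // 0

if ($year < 1902)
    print "Year must be 1902 or later";
```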

Consider an example. Suppose that a whole-dollar salary is provided from a form through the POST method and is stored as $_POST["salary"]. To check if it's a valid number, use the following steps:

if (!is_numeric($_POST["salary"]))

  print "Salary must be numeric";

else 

  // remove spaces and convert to an integer

  $salary = intval($_POST["salary"]);

After type conversion to numbers, form data can be validated to check whether it meets range requirements using the basic comparison operators. For example, to check that an age is in a sensible range, you could use:

if ($age < 5 || $age > 105)

  print "Age must be in the range 5 to 105";
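In PHP 5.2 and later, the type check, conversion, and range test can be combined using filter_var( ) with the FILTER_VALIDATE_INT filter. The following sketch assumes that function is available; the wrapper name checkAge( ) is hypothetical.

```php
<?php
// Hypothetical helper: returns the age as an integer, or false if the
// input is not a whole number in the range 5 to 105.
function checkAge($input)
{
    $options = array(
        "options" => array("min_range" => 5, "max_range" => 105)
    );

    return filter_var($input, FILTER_VALIDATE_INT, $options);
}

if (checkAge("106") === false)
    print "Age must be in the range 5 to 105";
```

The comparison uses === so that a valid result is never confused with false.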

Another common type of numeric validation is checking currencies. Generally, these have one of two common formats: only a currency amount (for example, 10 dollars, 10 cents, or 25 Yen), or a currency amount and a unit amount (for example, $10.15). Currencies should be checked to see if they match the required format, and then (if needed) to see if they're within a range. For example, to check if a currency amount is in whole dollars and between four and six digits in length, you could use:

if (!ereg("^[0-9]{4,6}$", $salary))

  print "Salary must be in whole dollars";

To check if a value is in the currency and unit format, you could use:

if (!ereg("^[0-9]{1,3}[.][0-9]{2}$", $price))

  print "Item price must be between US$0.00 and US$999.99, " . 

        "and must include the cent amount.";

It's important for an internationalized web database application to inform the user what currencies are allowed.

Simple variations of the currency validation techniques can be used to check the format of floating point numbers. For example, if a maximum of five decimal places are allowed for a length value, use:

if (!ereg("^[0-9]*([.][0-9]{1,5})?$", strval($length)))

  print "Length can have a maximum of five decimal places";

The expression ^[0-9]* allows any number of digits at the beginning of the number and before the optional decimal point. The ? in the expression ([.][0-9]{1,5})?$ implements an optional fractional part by allowing either zero or one copies of a string that matches the bracketed expression that precedes the ?. The bracketed expression itself requires a decimal point (represented by [.]), and then between one and five digits (represented by [0-9]{1,5}). The end of the number is expected after the optional fractional part. To allow positive or negative values to be specified, you could add [+-]? immediately after the ^ at the beginning of the expression.
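The signed variation can be sketched as follows, assuming preg_match( ) is used in place of ereg( ). The function name hasAtMostFiveDecimals( ) is hypothetical.

```php
<?php
// Hypothetical helper: optional sign, any number of digits, and an
// optional decimal point followed by one to five digits.
function hasAtMostFiveDecimals($value)
{
    return preg_match('/^[+-]?[0-9]*([.][0-9]{1,5})?$/D',
                      strval($value)) === 1;
}

if (!hasAtMostFiveDecimals("3.141592"))
    print "Length can have a maximum of five decimal places";
```

Because every part of the expression is optional, an empty string also matches; a mandatory field check should be made first.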

It doesn't always make sense to range check numeric data. For example, phone and fax numbers aren't usually added, subtracted, or tested against ranges. In our customer example, we might validate a phone number using a regular expression that checks it has a reasonable structure:

// Phone is optional, but if it is entered it must have

// correct format

$validPhoneExpr = "^([0-9]{2,3}[ ]*)?[0-9]{4}[ ]*[0-9]{4}$";



if (!empty($phone) && !ereg($validPhoneExpr, $phone))

  print "The phone number must be 8 digits in length, " .

        "with an optional 2 or 3 digit area code";

This is an AND (&&) expression, so the ereg( ) function is only evaluated if the $phone variable is not empty.

The first expression ^([0-9]{2,3}[ ]*)? matches either zero or one occurrence of the bracketed expression at the beginning of the value. Inside the brackets, the expression that is matched is two or three digits and any number of optional space characters (represented as [ ]*). For example, a string 03 matches, as does 835. The second part of the expression [0-9]{4}[ ]*[0-9]{4}$ matches exactly four digits, followed by any number of optional spaces, followed by another four digits, and then the end of the string is expected. For example, the strings 1234 1234 and 12341234 both match the expression.
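To try the expression against sample values, it can be wrapped in a function. This is a sketch that uses preg_match( ) instead of ereg( ); the function name isValidPhone( ) is hypothetical.

```php
<?php
// Hypothetical helper: an optional 2 or 3 digit area code, then two
// groups of four digits, with optional spaces between the groups.
function isValidPhone($phone)
{
    $validPhoneExpr = '/^([0-9]{2,3}[ ]*)?[0-9]{4}[ ]*[0-9]{4}$/D';
    return preg_match($validPhoneExpr, $phone) === 1;
}

$phone = "1234 123";
if (!empty($phone) && !isValidPhone($phone))
    print "The phone number must be 8 digits in length, " .
          "with an optional 2 or 3 digit area code";
```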

9.2.2.6 Validating credit cards

The last numeric type we consider in this section is credit card numbers. There are two steps to validating a credit card that's entered for payment of goods or services: first, we need to check the credit card number and its expiration date are valid; and, second, we need to verify that the payment will be honored by the bank or other credit card provider. If the user's entering their credit card as part of the account creation process, the second step isn't usually needed until they make a payment.

In this section, we show you how to validate a credit card number. Expiration dates can be validated using the date checking functions discussed later in this section.

Checking that payment will be honored by the credit card provider is outside the scope of this book. However, many credit card payment validation network libraries are available for this purpose: PEAR contains a few, several are available as PHP libraries as listed in Appendix G, and open source solutions have been developed and are readily available on the Web. All credit checking facilities require a paid subscription to a validation service.

Example 9-3 shows a function checkcard( ) that validates credit card numbers. The function works as follows. First, it checks the card number contains only digits and spaces, and after the check it removes the spaces using ereg_replace( ) leaving only the card number. Second, it extracts the first four digits and checks which of the different credit cards it matches and uses this to determine the correct length of the number; we discuss this further next. Third, it rejects cards that aren't supported or where the length doesn't match the correct length for the card. Last, the credit card is validated using the Luhn algorithm, which we return to in a moment.

Example 9-3. A function to validate credit card numbers

function checkcard($cc, $ccType)

{

  if (!ereg("^[0-9 ]*$", $cc))

  {

    print "Card number must contain only digits and spaces.";

    return (false);

  }



  // Remove spaces

  $cc = ereg_replace('[ ]', '', $cc);



  // Check first four digits

  $firstFour = intval(substr($cc, 0, 4));

  $type = "";

  $length = 0;



  if ($firstFour >= 8000 && $firstFour <= 8999)

  {

    // Try: 8000 0000 0000 1001

    $type = "SurchargeCard";

    $length = 16;

  }

  elseif ($firstFour >= 9100 && $firstFour <= 9599)

  {

    // Try: 9100 0000 0001 7

    $type = "AustralianExpress";

    $length = 13;

  }



  if (empty($type) || strcmp($type, $ccType) != 0)

  {

    print "Please check your card details.";

    return (false);

  }



  if (strlen($cc) != $length)

  {

    print "Card number must contain {$length} digits.";

    return (false);

  }



  $check = 0;



 // Add up every 2nd digit, beginning at the right end

  for($x=$length-1;$x>=0;$x-=2)

    $check += intval(substr($cc, $x, 1));



  // Add up every 2nd digit doubled, beginning at the right end - 1.

  // Subtract 9 where the doubled value is 10 or greater

  for($x=$length-2;$x>=0;$x-=2)

  {

    $double = intval(substr($cc, $x, 1)) * 2;

    if ($double >= 10)

      $check += $double - 9;

    else

      $check += $double;

  }



  // Is $check not a multiple of 10?

  if ($check % 10 != 0)

  {

    print "Credit card invalid. Please check number.";

    return (false);

  }

  return (true);

}

Table 9-1 shows the prefixes of the four most popular credit cards and the card number length for those cards. For example, MasterCard cards always begin with four digits in the range 5100 to 5599, and are sixteen digits in length. The function in Example 9-3 supports two fictional cards: SurchargeCard that begins with numbers in the range 8000 to 8999 and has 16 digits, and AustralianExpress with prefixes from 9100 to 9599 and 13 digits in length. Example valid card numbers for these fictional cards are included as comments in the code. You can find sample numbers for all popular cards at http://www.verisign.com/support/payflow/link/pfltestprocess.html.

Table 9-1. Popular credit card prefixes and lengths
Card name	Four-digit prefix	Length
American Express	3400-3499, 3700-3799	15
Diners Club	3000-3059, 3600-3699, 3800-3889	14
MasterCard	5100-5599	16
Visa	4000-4999	13 or 16
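The prefix ranges in Table 9-1 can be stored in an array and searched, instead of being written as a chain of elseif tests. The following sketch assumes the card number has already been stripped of spaces and contains only digits; the function name cardTypeFromPrefix( ) is hypothetical, and the length column is left out for brevity.

```php
<?php
// Hypothetical helper: returns the card name for the first four digits
// of $cc using the ranges in Table 9-1, or null if no range matches.
function cardTypeFromPrefix($cc)
{
    $table = array(
        "American Express" => array(array(3400, 3499), array(3700, 3799)),
        "Diners Club"      => array(array(3000, 3059), array(3600, 3699),
                                    array(3800, 3889)),
        "MasterCard"       => array(array(5100, 5599)),
        "Visa"             => array(array(4000, 4999)),
    );

    $firstFour = intval(substr($cc, 0, 4));

    foreach ($table as $name => $ranges)
        foreach ($ranges as $range)
            if ($firstFour >= $range[0] && $firstFour <= $range[1])
                return $name;

    return null;
}
```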

Credit card validation is performed with the Luhn algorithm. This works as follows:

Sum up every second digit in the credit card number, beginning with the last digit and proceeding right-to-left.
Sum up the double of every second digit in the credit card number, beginning with the second to the last digit and proceeding right-to-left. If the double of the digit is 10 or greater, subtract 9 from the value before adding it to the sum.
Determine if the sum of the two steps is a multiple of 10. If it is, the credit card number is valid. If not, the number is rejected.

Consider an example credit card of ten digits in length: 1234000014. In the first step, we add every second digit from the right, beginning with the last. So, 4+0+0+4+2=10. Then, in the second step, we add the double of each digit beginning with the second last (subtracting 9 if any doubled value is 10 or more) and then add the sum to the total from the first step. So, 2+0+0+6+2=10, and adding to 10 from the first step gives 20. Since 20 is exactly divisible by 10, the card has a valid number.
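The worked example can be checked with a standalone version of the algorithm. This sketch folds both sums into a single right-to-left loop and assumes the input contains only digits; the function name passesLuhn( ) is hypothetical.

```php
<?php
// Hypothetical helper: true if the digit string passes the Luhn check.
function passesLuhn($digits)
{
    $check = 0;
    $length = strlen($digits);

    // Walk right to left; position 0 is the last digit
    for ($position = 0; $position < $length; $position++)
    {
        $digit = intval($digits[$length - 1 - $position]);

        // Every second digit, starting with the second last, is doubled
        if ($position % 2 == 1)
        {
            $digit *= 2;
            if ($digit >= 10)
                $digit -= 9;
        }
        $check += $digit;
    }

    // Valid when the total is a multiple of 10
    return $length > 0 && $check % 10 == 0;
}

if (!passesLuhn("1234000015"))
    print "Credit card invalid. Please check number.";
```

The ten-digit example 1234000014 passes, as do the two fictional card numbers given as comments in Example 9-3.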

9.2.3 Validating Dates and Times

Dates of birth, expiry dates, order dates, and other dates are commonly entered by users. Most dates require specialized checks to see if the date is valid and if it's in a required date range. Times are less complicated, but specialized checks are still useful.

9.2.3.1 Dates

Dates can be given in several different formats and using many different calendars. We only discuss the Gregorian calendar here.

In the U.S., months are listed before days, but the majority of the rest of the world uses the opposite approach. Years can be provided as two or four digits, although we recommend avoiding two-digit years because of the obvious confusion caused when 99 comes before 00. This leads to four formats: DDMMYY, DDMMYYYY, MMDDYY, and MMDDYYYY, where Y is a year digit, M is a month digit, and D is a day digit.

In all date formats, a forward slash, a hyphen, or (rarely) a colon can be used to separate the groups, leading to twelve formats in total. For sorting, a thirteenth (convenient) format is YYYYMMDD without the separators. Dates can also be specified using month names, leading to strings such as 11-Aug-1969 and 11 August 1969.

Date values have complex validation requirements, and are difficult to manipulate. Months have different numbers of days, some years are leap years, and some annual holidays fall on different days in different years. Adding and subtracting dates, working out the date of tomorrow or next week, and finding the first Sunday of the month aren't straightforward. A particularly difficult task is finding when the Christian religion's Easter holiday falls in a year, as explained at the Astronomical Society of South Australia web site, http://www.assa.org.au/edm.html.

Consider an example from our customer case study. Let's suppose the user is required to provide a date of birth in the format common to most of the world, DD/MM/YYYY. We then need to validate this date of birth to check that it has been entered and to check its format, its validity, and whether it's within a range. The range of valid dates in the example begins with the user being alive (for simplicity, we assume alive users were born in 1902 or later) and ends with the user being at least 18 years of age.

Date-of-birth checking is implemented with the code in Example 9-4.

Example 9-4. Date-of-birth validation

function checkdob($birth_date)

{

  if (empty($birth_date))

  {

    print "The date of birth field cannot be blank.";

    return false;

  }

  // Check the format and explode into $parts

  elseif (!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", 

          $birth_date, $parts))

  {

    print "The date of birth is not a valid date in the 

           format DD/MM/YYYY";

    return false;

  }

  elseif (!checkdate($parts[2],$parts[1],$parts[3]))

  {

    print "The date of birth is invalid. Please check that the month is

           between 1 and 12, and the day is valid for that month.";

    return false;

  }

  elseif (intval($parts[3]) < 1902 || 

          intval($parts[3]) > intval(date("Y")))

  {

    print "You must be alive to use this service.";

    return false;

  }

  else

  {

    $dob = mktime(0, 0, 0, $parts[2], $parts[1], $parts[3]);



    // Check whether the user is 18 years old.

    if ((float)$dob > (float)strtotime("-18years"))

    {

      print "You must be 18+ years of age to use this service";

      return false;

    }

  }

  return true;

}

If any date test fails, an error is reported, and no further checks of the date are made. A valid date passes all the tests.

The first check tests if a date has been entered. The second check uses a regular expression to check whether the date consists of digits and slashes and matches the template 99/99/9999 (where 9 means any digit):

elseif (!ereg("^([0-9]{2})/([0-9]{2})/([0-9]{4})$", $birth_date, $parts))

{

  print "The date of birth is not a valid date in the format DD/MM/YYYY";

  return false;

}

You can adapt this check to match any of the other twelve basic formats we outlined at the beginning of this section.
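As one illustration of adapting the check, the sketch below accepts a slash, a hyphen, or a colon as the separator. It is not part of our case study: the function name splitDate( ) is hypothetical, and it uses preg_match( ) rather than ereg( ), because the ereg family was removed in PHP 7. The backreference \2 requires the second separator to be the same character as the first.

```php
<?php
// Hypothetical helper: accepts DD/MM/YYYY, DD-MM-YYYY, or DD:MM:YYYY.
// Returns an array of day, month, and year, or false on a format error.
function splitDate($text)
{
    // ([/:-]) captures the first separator; \2 forces the second to match it.
    // The D modifier stops $ from matching before a trailing newline.
    $pattern = '#^([0-9]{2})([/:-])([0-9]{2})\2([0-9]{4})$#D';

    if (!preg_match($pattern, $text, $parts)) {
        return false;
    }

    return array(
        'day'   => (int)$parts[1],
        'month' => (int)$parts[3],
        'year'  => (int)$parts[4],
    );
}
```

Because the separator occupies its own bracketed group, the month moves to the third group and the year to the fourth. The result still needs a calendar check with checkdate( ) before it can be trusted.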

When the format check succeeds, the expression also explodes the date into the array $parts so that the component that matches the first bracketed expression ([0-9]{2}) is found in $parts[1], the second bracketed expression in $parts[2], and the third bracketed expression in $parts[3]. Using this approach, the day of the month is accessible as $parts[1], the month as $parts[2], and the year as $parts[3]. The ereg( ) function also stores the string matching the complete expression in $parts[0].

The third check uses the exploded data stored in the array $parts and the function checkdate( ) to test if the date is a valid calendar date. For example, the date 31/02/1970 would fail this test. The fourth check tests if the year is in the range 1902 to the current year. The function date("Y") returns the current year as a string.
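The argument order of checkdate( ) is easy to get wrong: it takes the month first, then the day, then the year. The short fragment below is only an illustration of that order and of the leap-year handling, using dates we chose for the purpose.

```php
<?php
// checkdate() expects month, day, year (in that order).
var_dump(checkdate(2, 31, 1970)); // bool(false): there is no 31 February
var_dump(checkdate(2, 29, 2000)); // bool(true):  2000 was a leap year
var_dump(checkdate(2, 29, 1900)); // bool(false): 1900 was not a leap year
```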

The fifth and final check tests if the user is 18 years of age or older, and uses the approach described in Chapter 3. It finds the difference between the date of birth and the current date using library functions, and checks that this difference is at least 18 years. We use the mktime( ) function to convert the date of birth to a large numeric Unix timestamp value, and the strtotime( ) function to discover the timestamp of exactly 18 years ago. Both are cast to floating-point numbers to ensure reliable comparison, and if the user was born within the past 18 years, an error is produced.

The mktime( ) function works for years between 1901 and 2038 on Unix systems, and only from 1970 to 2038 for variants of Microsoft Windows. The PEAR Date package doesn't suffer from year limitations, and we discuss how to use it later in this section.
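Another way to sidestep the timestamp limits is PHP's built-in DateTime class (available since PHP 5.2, with the diff( ) method added in PHP 5.3). The following is a minimal sketch rather than part of our case study: the function name ageInYears( ) and its optional $today parameter are our own assumptions, and the code expects a date that has already passed the checkdate( ) test and is not in the future.

```php
<?php
// Hypothetical helper: the user's age in whole years.
// $today is optional so that the calculation can be tested with a fixed date.
function ageInYears($day, $month, $year, $today = null)
{
    if ($today === null) {
        $today = new DateTime('today'); // midnight at the start of today
    }

    // Build the date of birth from its already-validated components.
    $born = new DateTime(sprintf('%04d-%02d-%02d', $year, $month, $day));

    // diff() returns a DateInterval; its y property is the whole years elapsed.
    return $born->diff($today)->y;
}
```

The fifth check could then be written as a test of whether ageInYears($parts[1], $parts[2], $parts[3]) is less than 18.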

9.2.3.2 Times

Times are easier to work with than dates, but they also come in several valid formats. These include the 24-hour clock format 9999, the 12-hour clock formats 99:99am or 99:99pm (or with a period instead of a colon), and formats that include seconds and hundredths of seconds. In each format, different ranges of values are allowed.

Consider an example where a user is required to enter a time in the 12-hour format using a colon as the separator. With this format, 12:42pm and 01:01am are valid times. You can validate this format using the following regular expression:

if (!eregi("^(1[0-2]|0[1-9]):([0-5][0-9])(am|pm)$", $time))

  print "Time must be a valid 12-hour clock time in the format 

         HH:MMam or HH:MMpm.";

The first part of the expression ^(1[0-2]|0[1-9]) requires that the time begins with a number in range 10 to 12, or 01 to 09. After the colon, the second part of the expression requires the minute value to be in the range 00 to 59 as specified by the expression ([0-5][0-9]). Either AM or PM (in either upper- or lowercase) must then follow to conclude the time string.

For 24-hour times, a simple variant works:

if (!eregi("^([0-1][0-9]|2[0-3])([0-5][0-9])$", $time))

  print "Time must be a valid 24-hour clock time in the format HHMM.";
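On PHP 7 and later, where eregi( ) is no longer available, the same two patterns can be used with preg_match( ) and the i (case-insensitive) modifier. This is a sketch under that assumption; the function names isValid12HourTime( ) and isValid24HourTime( ) are hypothetical.

```php
<?php
// Hypothetical helper: true for times such as 12:42pm or 01:01AM.
function isValid12HourTime($time)
{
    return preg_match('/^(1[0-2]|0[1-9]):([0-5][0-9])(am|pm)$/iD', $time) === 1;
}

// Hypothetical helper: true for times such as 0000 or 2359.
function isValid24HourTime($time)
{
    return preg_match('/^([0-1][0-9]|2[0-3])([0-5][0-9])$/D', $time) === 1;
}
```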

Working out differences between times is reasonably straightforward, after the time has been parsed into its components! For example, to check if a 12-hour clock arrival time is before a 12-hour clock departure time, use the following fragment:

// Explode departure time into the array $depBits

if (!eregi("^(1[0-2]|[1-9]):([0-5][0-9])(am|pm)$", $depTime, $depBits))

  print "Departure time must be a valid 12-hour clock time

         in the format HH:MMam or HH:MMpm.";



// Explode arrival time into the array $arrBits

if (!eregi("^(1[0-2]|[1-9]):([0-5][0-9])(am|pm)$", $arrTime, $arrBits))

  print "Arrival time must be a valid 12-hour clock time

         in the format HH:MMam or HH:MMpm.";



if (($depBits[3] == "pm" && $arrBits[3] == "am") ||

    ($depBits[1] > $arrBits[1] && $depBits[3] == $arrBits[3]) ||

    ($depBits[2] >= $arrBits[2] && $depBits[1] == $arrBits[1] 

     && $depBits[3] == $arrBits[3]))

  print "Arrival time must be after departure time.";

The two eregi( ) expressions validate the format of a time using the approach we described previously. Similarly to our date validation, both expressions also explode the times into the arrays $arrBits and $depBits. The arrays contain the hour as elements $arrBits[1] and $depBits[1], the minutes as $arrBits[2] and $depBits[2], and the AM or PM suffix as $arrBits[3] and $depBits[3].

To determine if the arrival time is earlier than the departure time, there are three tests: first, if the arrival time is AM, the departure time can't be PM; second, if both times are AM or both times are PM, the arrival hour can't be earlier than the departure hour; and, last, if both times are AM or both times are PM, and the departure hour is the arrival hour, the arrival minutes can't be less than or equal to the departure minutes. With 24-hour times, only one test is needed; this is perhaps a good reason to use them in preference to 12-hour times in your applications.

For this type of validation, you could also convert a time to an integer value and then compare values. For example, you could convert two times to Unix timestamps and then compare these to determine if the arrival time is earlier than the departure time. However, as discussed in the previous section, the PHP date and time functions don't behave the same on all platforms, and so this approach isn't always portable between operating systems. For this reason, using logic as in our previous example or using a reliable package, such as the PEAR Date package discussed in the next section, is preferable.
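A conversion to an integer doesn't have to involve timestamps. The sketch below converts a 12-hour time to the number of minutes since midnight, so that two times can be compared with a single test. The function name minutesSinceMidnight( ) is hypothetical, and preg_match( ) stands in for eregi( ). It also handles two cases that comparing the matched strings directly does not: an uppercase AM or PM suffix, and the hour 12, which comes before 1 on a 12-hour clock.

```php
<?php
// Hypothetical helper: minutes since midnight for a time such as "1:01am",
// or false if the string is not a valid 12-hour clock time.
function minutesSinceMidnight($time)
{
    $pattern = '/^(1[0-2]|0?[1-9]):([0-5][0-9])(am|pm)$/iD';

    if (!preg_match($pattern, $time, $bits)) {
        return false;
    }

    // On a 12-hour clock, 12 behaves like 0: 12:30am is 30 minutes past midnight.
    $hour = (int)$bits[1] % 12;

    // Afternoon and evening times are 12 hours later than their AM equivalents.
    if (strtolower($bits[3]) == 'pm') {
        $hour += 12;
    }

    return $hour * 60 + (int)$bits[2];
}
```

Because 12:00am converts to 0, test the return value against false with the === operator. If both conversions succeed, the arrival time is after the departure time when minutesSinceMidnight($arrTime) is greater than minutesSinceMidnight($depTime).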

9.2.3.3 Using the PEAR Date package

The PEAR Date package introduced in Chapter 7 is not limited in year ranges and provides a wide range of date validation and manipulation tools. It must be installed using the PEAR installer (as discussed in Chapter 7) and then the date calculation package must be included into the source code using:

require_once "Date/Calc.php";

An object can then be created using:

$date = new Date_Calc( );

Using the PEAR Date package, we can rewrite our date of birth checking in Example 9-4. Our third date of birth check can be rewritten to use the method isValidDate( ) as follows:

elseif (!$date->isValidDate($parts[1], $parts[2], $parts[3]))

{

  print "The date of birth is invalid. Please check that the month 

         is between 1 and 12, and the day is valid for that month.";

  return false;

}

The fourth check can be modified slightly to use the isFutureDate( ) method to check if the user has been born:

elseif (intval($parts[3]) < 1902 || 

        $date->isFutureDate($parts[1], $parts[2], $parts[3]))

{

  print "You must be alive to use this service.";

  return false;

}

The fifth check can make use of the compareDates( ) method to avoid the use of strtotime( ) and mktime( ) and solve the year limitation problem. The method compares two dates each specified as a day, month, and year. In our check, we test the difference between the date of birth and eighteen years earlier than today:

else

{

  // Check whether the user is 18 years old.

  if ($date->compareDates($parts[1], $parts[2], $parts[3],

      intval(date("d")), intval(date("m")), intval(date("Y"))-18) > 0)

  {

    print "You must be 18+ years of age to use this service.";

    return false;

  }

}

The compareDates( ) method returns 0 if the two dates are equal, -1 if the first date is less than the second, and 1 if the first date is greater than the second.
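To make the return convention concrete, here is a plain PHP stand-in that follows the same -1, 0, and 1 convention. It is an illustration only and is not the PEAR implementation; the function name compareDmy( ) is hypothetical, and the arguments are assumed to be integers from dates that have already been validated.

```php
<?php
// Illustrative stand-in (not PEAR code): compares two dates given as
// day, month, year and returns -1, 0, or 1.
function compareDmy($day1, $month1, $year1, $day2, $month2, $year2)
{
    // Pack each date into a YYYYMMDD integer so that numeric order
    // is the same as calendar order.
    $first  = $year1 * 10000 + $month1 * 100 + $day1;
    $second = $year2 * 10000 + $month2 * 100 + $day2;

    if ($first == $second) {
        return 0;
    }

    return ($first < $second) ? -1 : 1;
}
```

In the age check, a result greater than 0 means the date of birth falls after the date eighteen years ago today, so the user is not yet 18.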

We've used three of the methods from the PEAR Date package. The package also has useful methods for determining if a year is a leap year, discovering the date of the beginning or end of the previous or next month, finding the date of the beginning or end of the previous or next week, finding the previous or next day or weekday, returning the number of days or weeks in a month, finding out the day of the week, converting dates to days, and returning formatted date strings.

Like many other PEAR packages, this one contains almost no documentation or examples. However, the methods are readable code and easy to use, and most are simple and reliable applications of the date functions that are discussed in Chapter 3. If you followed our PHP installation instructions in Appendix A through Appendix C and our PEAR installation instructions in Chapter 7, you'll find Date.php in /usr/local/lib/php/. The Date package also includes code in the file TimeZone.php for working with and finding the date and time in different time zones. If you're working with dates, PEAR Date is worth investigation and avoids most of the limitations of the PHP library functions.

9.2.3.4 Logic, the date function, and MySQL

There are other approaches to working with dates that don't use PEAR Date or Unix timestamps. Logic and the date( ) function can be combined to check and compare days, months, and years, similarly to our approach to testing times. For example, to check if a user is at least 18, you can use this fragment after exploding the date into the array $parts:

// Were they born in a year earlier than the year 18 years ago?

if (!((intval($parts[3]) < (intval(date("Y")) - 18)) ||



// No, so were they born exactly 18 years ago, and

// has the month they were born in passed?

(intval($parts[3]) == (intval(date("Y")) - 18) &&

(intval($parts[2]) < intval(date("m")))) ||



// No, so were they born exactly 18 years ago in this

// month, and was the day today or earlier in the month?

(intval($parts[3]) == (intval(date("Y")) - 18) &&

(intval($parts[2]) ==  intval(date("m"))) &&

(intval($parts[1]) <= intval(date("d"))))))

  print "You must be 18+ years of age to use this service.";
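The same logic can be arranged as a small function that works out an age in whole years, which may be easier to read and to test than one long condition. This is a sketch and not part of our case study: the function name ageFromParts( ) is hypothetical, and today's date is passed in as arguments so that the calculation doesn't depend on when it is run.

```php
<?php
// Hypothetical helper: age in whole years on the given "today" date.
// All arguments are integers taken from dates that are already validated.
function ageFromParts($day, $month, $year, $todayDay, $todayMonth, $todayYear)
{
    $age = $todayYear - $year;

    // Subtract a year if this year's birthday has not happened yet.
    if ($todayMonth < $month ||
        ($todayMonth == $month && $todayDay < $day)) {
        $age--;
    }

    return $age;
}
```

To use it with the exploded date, pass intval($parts[1]), intval($parts[2]), and intval($parts[3]) followed by intval(date("d")), intval(date("m")), and intval(date("Y")), and report an error if the result is less than 18.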

You can also use the MySQL functions described in Chapter 15 through an SQL query as a simple calculator. However, the MySQL approach involves communication with the database, which adds overhead, and is therefore often less desirable than using PHP. That said, if one or more dates are extracted from a database, MySQL date and time functions are a useful alternative for pre-processing prior to working with dates in PHP.