Investigating Strings in PHP

You do not always know everything about the data that you are working with. Strings can arrive from many sources, including user input, databases, files, and Web pages. Before you begin to work with data from an external source, you often will need to find out more about it. PHP provides many functions that enable you to acquire information about strings.

A Note About Indexing Strings

We will frequently use the word index in relation to strings. You will have come across the word more frequently in the context of arrays. In fact, strings and arrays are not as different as you might imagine. You can think of a string as an array of characters. So you can access individual characters of a string as if they were elements of an array:

$test = "scallywag";
print $test[0]; // prints "s"
print $test[2]; // prints "a"

It is important to remember, therefore, that when we talk about the position or index of a character within a string, characters, like array elements, are indexed from 0.
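The sketch below is a small illustration, not from the original text. It walks through a string one character at a time using the same zero-based index notation, with strlen() supplying the loop bound. It assumes single-byte text, because the index actually addresses bytes.

```php
<?php
// Visit each character of a string by its zero-based index,
// exactly as we would visit the elements of an array.
$test = "scallywag";
$letters = [];
for ($i = 0; $i < strlen($test); $i++) {
    $letters[] = $test[$i];   // $test[0] is "s", $test[8] is "g"
}
print implode("-", $letters); // prints "s-c-a-l-l-y-w-a-g"
```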

Finding the Length of a String with strlen()

You can use strlen() to determine the length of a string. strlen() requires a string and returns an integer representing the number of characters in it (strictly speaking, the number of bytes). strlen() is typically used to check the length of user input. The following snippet tests a membership code to ensure that it is four characters long:

if (strlen($membership) == 4) {
    print "Thank you!";
} else {
    print "Your membership number must have 4 digits<P>";
}

The user is thanked only if the variable $membership contains exactly four characters; otherwise, an error message is generated.
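Here is a minimal sketch of the same check wrapped in a function. The name hasFourCharacters() is hypothetical and exists only for this illustration. The last line shows why "characters" is only approximately right: strlen() counts bytes, and in UTF-8 an accented letter such as "é" (written here with PHP 7's \u{e9} escape) takes two bytes.

```php
<?php
// Hypothetical helper: true only when $code is exactly four bytes long.
function hasFourCharacters(string $code): bool
{
    return strlen($code) === 4;
}

var_dump(hasFourCharacters("pAB7"));   // bool(true)
var_dump(hasFourCharacters("pAB"));    // bool(false)

// strlen() counts bytes, not visible characters: "é" is two bytes in UTF-8.
print strlen("caf\u{e9}");             // prints 5
```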

Finding a Substring Within a String with strstr()

You can use strstr() to test whether a string exists embedded within another string. strstr() requires two arguments: a source string and the substring you want to find within it. The function returns false if the substring is absent. Otherwise, it returns the portion of the source string beginning with the substring. For the following example, imagine that we want to treat membership codes that contain the string AB differently from those that do not:

$membership = "pAB7";
if (strstr($membership, "AB")) {
   print "Thank you. Don't forget that your membership expires soon!";
} else {
   print "Thank you!";
}

Because our test variable $membership does contain the string AB, strstr() returns the string AB7. This resolves to true when tested, so we print a special message. What happens if our user enters "pab7"? Because strstr() is case sensitive, AB will not be found. The if statement's test will fail, and the default message will be printed to the browser. If we want to search for either AB or ab within the string, we must use stristr(), which works in exactly the same way but is not case sensitive.
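To make the difference concrete, the following sketch runs both functions against the lowercase input "pab7". Note that stristr() returns the matched portion in the case of the source string, not the case of the search string.

```php
<?php
$membership = "pab7";

// strstr() is case sensitive, so "AB" is not found in "pab7".
var_dump(strstr($membership, "AB"));   // bool(false)

// stristr() ignores case and returns the rest of the source string
// from the match onward, keeping the source string's original case.
var_dump(stristr($membership, "AB"));  // string(3) "ab7"
```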

Finding the Position of a Substring with strpos()

The strpos() function tells you both whether a string exists within a larger string and where it is to be found. strpos() requires two arguments: the source string and the substring you are seeking. The function also accepts an optional third argument, an integer representing the index from which you want to start searching. If the substring does not exist, strpos() returns false; otherwise, it returns the index at which the substring begins. The following snippet uses strpos() to ensure that a string begins with the string mz:

$membership = "mz00xyz";
if (strpos($membership, "mz") === 0) {
   print "hello mz";
}

Notice the trick we had to play to get the expected results. strpos() finds mz in our string, but it finds it at the first element of the string, index 0. It therefore returns zero, which would resolve to false in an ordinary test. To work around this problem, we use PHP's identity operator ===, which returns true only if the left- and right-hand operands are equal and of the same type. The integer 0 is identical to 0, but the Boolean false is not.
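The sketch below expands the snippet above. It shows the two strict comparisons you are likely to need, === 0 for "starts with" and !== false for "occurs anywhere", and it also uses the optional third argument to begin the search partway through the string.

```php
<?php
$membership = "mz00xyz";

// Starts with "mz": the match is at index 0, so compare with ===.
var_dump(strpos($membership, "mz") === 0);     // bool(true)

// Occurs anywhere: compare with !== false rather than relying on truthiness.
var_dump(strpos($membership, "mz") !== false); // bool(true)

// The optional third argument sets the index at which the search begins.
var_dump(strpos($membership, "z"));            // int(1)
var_dump(strpos($membership, "z", 2));         // int(6)
```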

Extracting Part of a String with substr()

The substr() function returns a portion of a string based on the start index and length of the portion you are looking for. This function demands two arguments: a source string and the starting index. It returns all characters from the starting index to the end of the string you are searching. It optionally accepts a third argument, which should be an integer representing the length of the string you want returned. If this argument is present, substr() returns only the number of characters specified from the start index onward.

$test = "scallywag";
print substr($test,6);    // prints "wag"
print substr($test,6,2);  // prints "wa"

If you pass substr() a negative number as its second (starting index) argument, it will count from the end rather than the beginning of the string. The following snippet writes a specific message to people who have submitted an e-mail address ending in .uk:

$test = "";  // the e-mail address submitted by the user
if (substr($test, -3) == ".uk") {
   print "Don't forget our special offers for British customers";
} else {
   print "Welcome to our shop!";
}
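The following sketch collects the negative-argument forms of substr() in one place. A negative start counts back from the end. A negative length drops that many characters from the end of the result. The endsWithUk() helper and the sample addresses are hypothetical and serve only to illustrate the check described above.

```php
<?php
$test = "scallywag";
print substr($test, -3) . "\n";     // prints "wag": start 3 characters from the end
print substr($test, -3, 2) . "\n";  // prints "wa": take 2 characters from that point
print substr($test, 0, -3) . "\n";  // prints "scally": a negative length trims the end

// Hypothetical helper: does an e-mail address end in ".uk"?
function endsWithUk(string $email): bool
{
    return substr($email, -3) === ".uk";
}

var_dump(endsWithUk("someone@example.co.uk"));  // bool(true)
var_dump(endsWithUk("someone@example.com"));    // bool(false)
```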

Tokenizing a String with strtok()

You can parse a string word by word using strtok(). The strtok() function initially requires two arguments: the string to be tokenized and the delimiters by which to split the string. The delimiter string can include as many characters as you want, and each character is treated as a separate delimiter. The function returns the first token found. After strtok() has been called for the first time, the source string is cached. For subsequent calls, you should pass strtok() only the delimiter string. The function returns the next token every time it is called, and it returns false when the end of the string is reached. strtok() is usually called repeatedly within a loop.
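As a minimal sketch of that calling pattern, the loop below passes the source string only on the first call and then passes just the delimiters. It splits on any of three delimiter characters (space, tab, and newline) and stops when strtok() returns false.

```php
<?php
$string = "This is\tan example\nstring";
$tokens = [];
$tok = strtok($string, " \n\t");   // first call: source string and delimiters
while ($tok !== false) {
    $tokens[] = $tok;
    $tok = strtok(" \n\t");        // later calls: delimiters only
}
print implode("|", $tokens);       // prints "This|is|an|example|string"
```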

Listing 13.3 uses strtok() to tokenize a URL, splitting the host and path from the query string and then dividing the query string into its name/value pairs.

Listing 13.3 Dividing a String into Tokens with strtok()
  1: <html>
  2: <head>
  3: <title>Listing 13.3 Dividing a string into tokens with strtok()</title>
  4: </head>
  5: <body>
  6: <?php
  7: $test = "";
  8: $test .= "OP=dnquery.xp&ST=MS&DBS=2&QRY=developer+php";
  9: $delims = "?&";
 10: $word = strtok($test, $delims);
 11: while (is_string($word)) {
 12:    if ($word) {
 13:        print "$word<br>";
 14:    }
 15:    $word = strtok($delims);
 16: }
 17: ?>
 18: </body>
 19: </html>

Put these lines into a text file called listing13.3.php, and place this file in your Web server document root. When you access this script through your Web browser, the output should look like Figure 13.4.

Figure 13.4. Output of Listing 13.3, a tokenized string.


The strtok() function is something of a blunt instrument, and a few tricks are required to work with it. We first store the delimiters that we want to work with in a variable, $delims on line 9. We call strtok() on line 10, passing it the URL we want to tokenize and the $delims string. We store the first result in $word. Within the conditional expression of the while loop on line 11, we test whether $word is a string. If it isn't, we know that the end of the string has been reached and no further action is required.

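We are testing the return type because strtok() can return a token that is a string but still evaluates to false. In early versions of PHP, a string containing two delimiters in a row caused strtok() to return an empty string when it reached the first of these delimiters. Current versions skip such empty tokens, but a token consisting solely of "0" still evaluates to false. So a more conventional test such as

while ($word) {
      $word = strtok($delims);
}

would end the loop as soon as $word held an empty string or "0", even if the end of the source string had not yet been reached.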
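To see why the type test matters, the sketch below compares a loop that relies on truthiness with one that checks is_string(). It uses a token of "0", which PHP evaluates as false. Both function names are hypothetical and exist only for this comparison.

```php
<?php
// Hypothetical helper: stops at the first token that evaluates to false.
function tokensNaive(string $source, string $delims): array
{
    $found = [];
    $word = strtok($source, $delims);
    while ($word) {                  // "0" is falsy, so the loop ends early
        $found[] = $word;
        $word = strtok($delims);
    }
    return $found;
}

// Hypothetical helper: keeps going until strtok() stops returning strings.
function tokensStrict(string $source, string $delims): array
{
    $found = [];
    $word = strtok($source, $delims);
    while (is_string($word)) {
        $found[] = $word;
        $word = strtok($delims);
    }
    return $found;
}

print implode(",", tokensNaive("a&0&b", "&"));   // prints "a"
print "\n";
print implode(",", tokensStrict("a&0&b", "&"));  // prints "a,0,b"
```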

Having established that $word contains a string, we can go on to work with it. If $word does not contain an empty string, we print it to the browser on line 13. We must then call strtok() again on line 15 to repopulate the $word variable for the next test. Notice that we don't pass the source string to strtok() a second time. If we were to do this, the first word of the source string would be returned once again, and we would find ourselves in an infinite loop.

    Part III: Getting Involved with the Code