3.2 Strings

A string of characters is probably the most commonly used data type when developing scripts, and PHP provides a large library of string functions to help transform, manipulate, and otherwise manage strings. We introduced the basics of PHP strings in Chapter 2. In this section, we show you many of the useful PHP string functions.

3.2.1 Length of a String

The length of a string is determined with the strlen( ) function, which returns the number of eight-bit characters (bytes) in the subject string:

integer strlen(string subject)

We used strlen( ) earlier in the chapter to compare string lengths. Consider another simple example that prints the length of a 16-character string:

print strlen("This is a String");  // prints 16
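Because strlen( ) counts bytes, its result differs from the number of visible characters when a string uses a multi-byte encoding. The following sketch assumes UTF-8 text; mb_strlen( ) comes from the optional mbstring extension, which may not be installed, so the call is guarded:

```php
<?php
// "café" written with explicit UTF-8 bytes: the "é" takes two bytes (0xC3 0xA9).
$word = "caf\xC3\xA9";

// strlen( ) counts bytes, so it reports 5 for this four-character word.
$bytes = strlen($word);
print $bytes . "\n";

// mb_strlen( ) belongs to the optional mbstring extension (assumed, so guarded).
if (function_exists('mb_strlen')) {
    print mb_strlen($word, 'UTF-8') . "\n";
}
```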

3.2.2 Printing and Formatting Strings

In the previous chapter, we presented the basic method for outputting text with echo and print. Earlier in this chapter, we showed you the functions print_r( ) and var_dump( ), which can determine the contents of variables during debugging. PHP provides several other functions that allow more complex and controlled formatting of strings, and we discuss them in this section.

3.2.2.1 Creating formatted output with sprintf( ) and printf( )

Sometimes, more complex output is required than can be produced with echo or print. For example, a floating-point value such as 3.14159 might need to be rounded to 3.14 in the output. For complex formatting, the sprintf( ) or printf( ) functions are useful:

string sprintf (string format [, mixed args...])
integer printf (string format [, mixed args...])

The operation of these functions is modeled on the C programming language functions of the same names. Both expect a format string with optional conversion specifications, followed by variables or values as arguments to match any formatting conversions. The difference between sprintf( ) and printf( ) is that the output of printf( ) goes directly to the output buffer that PHP uses to build an HTTP response, whereas the output of sprintf( ) is returned as a string.

Consider an example printf( ) statement:

$variable = 3.14159;

// prints "Result: 3.14"
printf("Result: %.2f\n", $variable);

The format string Result: %.2f\n is the first parameter to the printf( ) statement. Strings such as Result: are output the same as with echo or print. The %.2f component is a conversion specification that describes how the value of $variable is to be formatted. Conversion specifications always start with the % character and end with a type specifier, and they can include width and precision components in between. The example above includes a precision specification .2 that prints two decimal places.

A specifier such as %5.3f means that the whole formatted number, including the decimal point and the digits after it, should be at least five characters wide (by default, shorter output is padded on the left with space characters and is therefore right-aligned), and that three digits should appear after the decimal point (the value is rounded, or padded on the right with zeros, to fit).
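As a minimal sketch of how width and precision interact, the following fragment formats the same value with several widths. The 0 and - flags used in the last two lines are standard printf flags that are not otherwise covered in this section:

```php
<?php
$pi = 3.14159;

$exact  = sprintf("%5.3f", $pi);   // "3.142" is already five characters wide
$padded = sprintf("%8.3f", $pi);   // padded on the left with spaces
$zeros  = sprintf("%08.3f", $pi);  // a 0 flag pads with zeros instead
$left   = sprintf("%-8.3f|", $pi); // a - flag left-aligns within the width

// prints [3.142], [   3.142], [0003.142] and [3.142   |] on separate lines
print "[$exact]\n[$padded]\n[$zeros]\n[$left]\n";
```

The width counts every character of the result, so the decimal point and the three fractional digits use four of the available columns.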

Table 3-1 lists the conversion types supported by sprintf( ) and printf( ). Width specifiers can be used with all types (Example 3-2 shows several). A precision sets the number of decimal places for floating-point numbers; when applied to the %s type, as in Example 3-2, it limits the number of characters printed.

Table 3-1. Conversion types used in sprintf( ) and printf( )

Type        Description
%%          A literal percent character
%b          An integer formatted as a binary number
%c          An integer formatted as an ASCII character
%d          An integer formatted as a signed decimal number
%u          An integer formatted as an unsigned decimal number
%o          An integer formatted as an octal number
%x or %X    An integer formatted as a hexadecimal number using lowercase letters or uppercase letters
%f          A float formatted with specified decimal places
%s          A string


Both sprintf( ) and printf( ) allow the formatting of multiple parameters: each conversion specification in the format string formats the corresponding parameter. Example 3-2 illustrates the use of printf( ) and sprintf( ), including how multiple parameters are formatted.

Example 3-2. Using printf to output formatted data
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                      "http://www.w3.org/TR/html401/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>Examples of using printf( )</title>
</head>
<body bgcolor="#ffffff">
<h1>Examples of using printf( )</h1>
<pre>
<?php
    // Outputs "pi equals 3.141590" (%f defaults to six decimal places)
    printf("pi equals %f\n", 3.14159);

    // Outputs "3.14"
    printf("%.2f\n", 3.14159);

    // Outputs "      3.14"
    printf("%10.2f\n", 3.14159);

    // Outputs "3.1415900000"
    printf("%.10f\n", 3.14159);

    // Outputs "halfofthe"
    printf("%.9s\n", "halfofthestring");

    // Outputs "1111011 123 123.000000 test"
    printf("%b %d %f %s\n", 123, 123, 123, "test");

    // Outputs "Over 55.72% of statistics are made up."
    // (the value is rounded, not truncated)
    printf("Over %.2f%% of statistics are made up.\n", 55.719);

    // sprintf( ) works just the same except the
    // output is returned as a string
    $c = 245;
    $message = sprintf("%c = %x (Hex) %o (Octal)", $c, $c, $c);

    // prints "õ = f5 (Hex) 365 (Octal)"
    print($message);
?>
</pre>
</body>
</html>
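Width specifiers are most useful when several values must line up in columns. The following sketch uses made-up order data to build fixed-width rows with sprintf( ); the column widths are arbitrary choices for illustration:

```php
<?php
// Hypothetical order lines: item name, quantity, unit price.
$rows = array(
    array("widget", 42, 3.5),
    array("sprocket", 7, 12.25),
);

$lines = array();
foreach ($rows as $row) {
    // %-10s left-aligns the name in 10 columns, %5d right-aligns the
    // quantity in 5 columns, and %8.2f prints the price with 2 decimals.
    $lines[] = sprintf("%-10s|%5d|%8.2f", $row[0], $row[1], $row[2]);
}

// prints:
// widget    |   42|    3.50
// sprocket  |    7|   12.25
print implode("\n", $lines) . "\n";
```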

3.2.2.2 Padding strings

A simple method to space strings is to use the str_pad( ) function:

string str_pad(string input, int length [, string padding [, int pad_type]])

Characters are added to the input string so that the resulting string has length characters. The following example shows the simplest form of str_pad( ) that adds spaces to the end of the input string:

// prints "PHP" followed by three spaces
print str_pad("PHP", 6);

An optional string argument padding can be supplied that is used instead of the space character. By default, padding is added to the end of the string. By setting the optional argument pad_type to STR_PAD_LEFT or to STR_PAD_BOTH, the padding is added to the beginning of the string or to both ends. The following example shows how str_pad( ) can create a justified index:

$players =
   array("DUNCAN, king of Scotland"=>"Larry",
         "MALCOLM, son of the king"=>"Curly",
         "MACBETH"=>"Moe",
         "MACDUFF"=>"Rafael");

print "<pre>";

// Print a heading
print str_pad("Dramatis Personae", 50, " ", STR_PAD_BOTH) . "\n";

// Print an index line for each entry
foreach($players as $role => $actor)
    print str_pad($role, 30, ".")
          . str_pad($actor, 20, ".", STR_PAD_LEFT)
          . "\n";

print "</pre>";

A foreach loop is used to create a line of the index: the loop assigns the key and value of the $players array to $role and $actor. The example prints:

                Dramatis Personae                 
DUNCAN, king of Scotland.....................Larry
MALCOLM, son of the king.....................Curly
MACBETH........................................Moe
MACDUFF.....................................Rafael

We have included the <pre> tags so that a web browser doesn't ignore the spaces used to pad out the heading, and so that a non-proportional font is used for the text; without the <pre> tags in this example, things don't line up.
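A few further cases of str_pad( ) are sketched below, using made-up values: padding a number on the left with zeros, centering with a visible padding character, and asking for a length shorter than the input:

```php
<?php
// Zero-pad an invoice number (a made-up value) on the left to six digits.
$invoice = str_pad("42", 6, "0", STR_PAD_LEFT);

// Center a label between asterisks: six padding characters, three per side.
$banner = str_pad("PHP", 9, "*", STR_PAD_BOTH);

// When length is not greater than the input length, the input is
// returned unchanged; str_pad( ) never truncates.
$short = str_pad("PHP", 2);

// prints 000042, ***PHP*** and PHP on separate lines
print "$invoice\n$banner\n$short\n";
```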

3.2.2.3 Changing case

The following PHP functions return a copy of the subject string with changes in the case of the characters:

string strtolower(string subject)
string strtoupper(string subject)
string ucfirst(string subject)
string ucwords(string subject)

The following fragment shows how each operates:

print strtolower("PHP and MySQL"); // php and mysql
print strtoupper("PHP and MySQL"); // PHP AND MYSQL
print ucfirst("now is the time");  // Now is the time
print ucwords("now is the time");  // Now Is The Time
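These functions are often combined. As a small sketch, assuming a name typed with inconsistent capitalization, lowercasing the string before calling ucwords( ) gives a tidy result, because ucwords( ) changes only the first letter of each word:

```php
<?php
// Hypothetical user input with inconsistent capitalization.
$raw = "jOHN sMITH";

// Lowercase everything first, then capitalize the first letter of each word.
$name = ucwords(strtolower($raw));
print $name . "\n";        // John Smith

// ucwords( ) on its own uppercases first letters and leaves the rest alone.
$untouched = ucwords($raw);
print $untouched . "\n";   // JOHN SMITH
```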

3.2.2.4 Trimming whitespace

PHP provides three functions that trim leading or trailing whitespace characters from strings:

string ltrim(string subject [, string character_list])
string rtrim(string subject [, string character_list])
string trim(string subject [, string character_list])

The three functions return a copy of the subject string: trim( ) removes both leading and trailing whitespace characters, ltrim( ) removes leading whitespace characters, and rtrim( ) removes trailing whitespace characters. The following example shows the effect of each:

$var = trim(" Tiger Land \n");   // "Tiger Land"
$var = ltrim(" Tiger Land \n");  // "Tiger Land \n"
$var = rtrim(" Tiger Land \n");  // " Tiger Land"

By default these functions trim space, tab (\t), newline (\n), carriage return (\r), NUL (\x00), and vertical tab (\x0b) characters. The optional character_list parameter allows you to specify the characters to trim instead. A range of characters can be specified using two periods (..) as shown in the following example:

$var = trim("16 MAY 2004", "0..9 ");  // Trims digits and spaces
print $var;                           // prints "MAY"
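The character_list argument replaces the default whitespace list rather than adding to it. The following sketch, with made-up input strings, shows some common uses and one consequence of that rule:

```php
<?php
// Remove any number of trailing slashes from a path.
$dir = rtrim("path/to/dir///", "/");      // "path/to/dir"

// Remove leading zeros from a numeric string.
$number = ltrim("000427", "0");           // "427"

// Remove decoration from both ends.
$note = trim("***note***", "*");          // "note"

// The list replaces the defaults: the trailing newline is not in the
// list, so trimming stops at once and nothing is removed.
$kept = rtrim("note*\n", "*");            // "note*\n"

print "$dir $number $note\n";
```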

3.2.3 Comparing Strings

PHP provides the string comparison functions strcmp( ) and strncmp( ), which compare two strings, str1 and str2, in alphabetical (byte value) order:

integer strcmp(string str1, string str2)
integer strncmp(string str1, string str2, integer length)

While the equality operator == can compare two strings, the result isn't always as expected, for example when both strings look like numbers and are therefore compared numerically. The functions strcmp( ) and strncmp( ) provide binary-safe string comparison. Both take two strings as parameters, str1 and str2, and return 0 if the strings are identical, a negative value (-1 in the examples here) if str1 is less than str2, and a positive value (1 in the examples here) if str1 is greater than str2. The function strncmp( ) takes a third argument length that restricts the comparison to the first length characters. String comparisons are often used as a conditional expression in an if statement like this:

$a = "aardvark";
$z = "zebra";

// Test if $a and $z are not different (i.e. the same)
if (!strcmp($a, $z))
    print "a and z are the same";

When strcmp( ) compares two different strings, the function returns a nonzero value (shown as -1 or 1 in the examples below), which is treated as true in a conditional expression. These examples show the results of various comparisons:

print strcmp("aardvark", "zebra");        // -1
print strcmp("zebra", "aardvark");        //  1
print strcmp("mouse", "mouse");           //  0
print strcmp("mouse", "Mouse");           //  1
print strncmp("aardvark", "aardwolf", 4); //  0
print strncmp("aardvark", "aardwolf", 5); // -1

The functions strcasecmp( ) and strncasecmp( ) are case-insensitive versions of strcmp( ) and strncmp( ). For example:

print strcasecmp("mouse", "Mouse");       //  0

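The functions strcmp( ) and strcasecmp( ) can be used directly as the callback function when sorting arrays with usort( ). The functions strncmp( ) and strncasecmp( ) need a third argument, so they must be wrapped in a user-defined comparison function before they can be used this way. See Section 3.1.4 earlier in this chapter for a discussion on usort( ).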
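The following sketch sorts a small made-up array with usort( ), first with strcmp( ) and then with strcasecmp( ), to show how the choice of comparison function changes the order. It also tests only the sign of a comparison result, which is the safest way to use these functions:

```php
<?php
$animals = array("zebra", "aardvark", "Mouse");

// strcmp( ) orders by byte value, so uppercase "M" sorts before lowercase "a".
$byteOrder = $animals;
usort($byteOrder, "strcmp");

// strcasecmp( ) ignores case, giving the order a reader would expect.
$caseless = $animals;
usort($caseless, "strcasecmp");

print implode(", ", $byteOrder) . "\n";  // Mouse, aardvark, zebra
print implode(", ", $caseless) . "\n";   // aardvark, Mouse, zebra

// Test the sign of the result rather than comparing against -1 or 1.
$before = strcmp("aardvark", "zebra") < 0;
```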

3.2.4 Finding and Extracting Substrings

PHP provides several simple and efficient functions that can identify and extract specific substrings of a string. As is common with string libraries in other languages, PHP string functions reference characters using an index that starts at zero for the first character, one for the next character and so on.

3.2.4.1 Extracting a substring from a string

The substr( ) function returns a substring from a source string:

string substr(string source, integer start [, integer length])

When called with two arguments, substr( ) returns the characters from the source string starting from position start (counting from zero) to the end of the string. With the optional length argument, a maximum of length characters are returned. The following examples show how substr( ) works:

$var = "abcdefgh";

print substr($var, 2);       //  "cdefgh"
print substr($var, 2, 3);    //  "cde"
print substr($var, 4, 10);   //  "efgh"

If a negative start position is passed as a parameter, the starting point of the returned string is counted from the end of the source string. If the length is negative, the returned string ends length characters from the end of the source string. The following examples show how negative indexes can be used:

$var = "abcdefgh";

print substr($var, -1);      //  "h"
print substr($var, -3);      //  "fgh"
print substr($var, -5, 2);   //  "de"
print substr($var, -5, -2);  //  "def"
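Positive and negative positions can be mixed freely when a string has a fixed layout. As a short sketch, assuming a date string that is always in YYYY-MM-DD form, substr( ) can pull out each field:

```php
<?php
$date = "2004-05-16";   // assumed to be in YYYY-MM-DD form

$year  = substr($date, 0, 4);   // four characters from the start
$month = substr($date, 5, 2);   // two characters from index 5
$day   = substr($date, -2);     // the last two characters

print "$day/$month/$year\n";    // prints 16/05/2004
```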

3.2.4.2 Finding the position of a substring

The strpos( ) function returns the index of the first occurrence of the substring needle in the string haystack:

integer strpos(string haystack, string needle [, integer offset])

When called with two arguments, the search for the substring needle is from the start of the string haystack at position zero. When called with three arguments, the search occurs from the index offset into the haystack. The following examples show how strpos( ) works:

$var = "To be or not to be";

print strpos($var, "T");     // 0
print strpos($var, "be");    // 3

// Start searching from the 5th character in $var
print strpos($var, "be", 4); // 16

The strrpos( ) function returns the index of the last occurrence of needle in the string haystack:

integer strrpos(string haystack, string needle)

Prior to PHP 5, strrpos( ) searches only for the first character of needle; from PHP 5 onward, the whole needle string is matched. The following example shows how strrpos( ) works:

$var = "and by a sleep to say we end the heart-ache";

// Prints 18 using PHP 4.3 matching the "s" in "say"
// Prints 9 using PHP 5 matching the whole string "sleep"
print strrpos($var, "sleep");

// Prints 22 using PHP 4.3 matching the "w" of "we"
// The function returns false using PHP 5 as "wally"
//   is not found
print strrpos($var, "wally");

If the substring needle isn't found by strpos( ) or strrpos( ), both functions return false. The is-identical operator ===, or the is-not-identical operator !== should be used when testing the returned value from these functions. This is because if the substring needle is found at the start of the string haystack, the index returned is zero and is interpreted as false if used as a Boolean value.
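The following sketch shows the pitfall in practice. The helper function containsSubstring( ) is a hypothetical name introduced here for illustration; it simply wraps the strict comparison described above:

```php
<?php
// Hypothetical helper: true when $needle occurs anywhere in $haystack.
function containsSubstring($haystack, $needle)
{
    return strpos($haystack, $needle) !== false;
}

$var = "To be or not to be";

// "To" is found at index 0, which a loose test mistakes for "not found".
$looseTest  = strpos($var, "To") ? "found" : "not found";
$strictTest = containsSubstring($var, "To") ? "found" : "not found";

// prints "loose: not found, strict: found"
print "loose: $looseTest, strict: $strictTest\n";
```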

Example 3-3 shows how strpos( ) can be repeatedly called to find parts of a structured sequence like an Internet domain name.

Example 3-3. Using strpos( ) and substr( )
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
                      "http://www.w3.org/TR/html401/loose.dtd">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
  <title>Hello, world</title>
</head>
<body bgcolor="#ffffff">
<?php

    $domain = "orbit.mds.rmit.edu.au";

    $a = 0;
    while (($b = strpos($domain, ".", $a)) !== false)
    {
        print substr($domain, $a, $b-$a) . "\n";
        $a = $b + 1;
    }

    // print the piece to the right of the last found "."
    print substr($domain, $a);

?>
</body>
</html>

A while loop is used to repeatedly find the period character (.) in the string $domain. The body of the loop is executed if the value returned by strpos( ) is not false; we also assign the return result to $b in the same call. This is possible because an assignment can be used as an expression. In Example 3-3, the value of the assignment

($b = strpos($domain, ".", $a))

is the same as the value returned from calling strpos( ) alone

strpos($domain, ".", $a)

Each time strpos( ) is called, we pass the variable $a as the starting point in $domain for the search. For the first call, $a is set to zero and the first period in the string is found. The body of the while loop uses substr( ) to print the characters from $a up to the period character that's been found: the first time through the loop substr( ) prints $b characters from the string $domain starting from position zero. The starting point for the next search is calculated by setting $a to the location of the next character after the period found at position $b. The loop is then repeated if another period is found. When no more period characters are found, the final print statement uses substr( ) to print the remaining characters from the string $domain:

// print the piece to the right of the last found "."
print substr($domain, $a);

The output of Example 3-3 is:

orbit
mds
rmit
edu
au
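The same loop can be packaged as a reusable function that collects the pieces in an array instead of printing them. The function name splitAt( ) is hypothetical, and the sketch assumes a non-empty separator; for everyday use, the built-in explode( ) function does the same job in a single call:

```php
<?php
// Hypothetical helper: split $text at each occurrence of $separator using
// the same strpos( ) and substr( ) loop as Example 3-3.
// Assumes $separator is not an empty string.
function splitAt($text, $separator)
{
    $pieces = array();
    $start = 0;
    while (($end = strpos($text, $separator, $start)) !== false) {
        // Copy the characters between the previous separator and this one
        $pieces[] = substr($text, $start, $end - $start);
        // Continue the search just past the separator that was found
        $start = $end + strlen($separator);
    }
    // The piece to the right of the last separator
    $pieces[] = substr($text, $start);
    return $pieces;
}

// prints "orbit | mds | rmit | edu | au"
print implode(" | ", splitAt("orbit.mds.rmit.edu.au", ".")) . "\n";
```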

3.2.4.3 Extracting a found portion of a string

The strstr( ) and stristr( ) functions search for the substring needle in the string haystack and return the portion of haystack from the first occurrence of needle to the end of haystack:

string strstr(string haystack, string needle)
string stristr(string haystack, string needle)

The strstr( ) search is case-sensitive, and the stristr( ) search isn't. If the needle isn't found in the haystack string, both strstr( ) and stristr( ) return false. The following examples show how the functions work:

$var = "To be or not to be";

print strstr($var, "to");    //  "to be"
print stristr($var, "to");   //  "To be or not to be"
print stristr($var, "oz");   // false

The strrchr( ) function also returns a portion of haystack, but it searches for the single character needle and returns the portion from the last occurrence of needle to the end of haystack:

string strrchr(string haystack, string needle)

Unlike strstr( ) and stristr( ), strrchr( ) searches for a single character, and only the first character of the needle string is used. The following examples show how strrchr( ) works:

$var = "To be or not to be";

// Prints: "not to be"
print strrchr($var, "n");

// Prints "o be": Only searches for "o" which
// is found at position 14
print strrchr($var, "or");
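Two everyday uses of these functions are sketched below with made-up values: strstr( ) to take the domain part of an email address, and strrchr( ) to take the final extension of a filename:

```php
<?php
$email = "user@example.com";   // made-up address
$file  = "archive.tar.gz";     // made-up filename

// strstr( ) returns everything from the first "@", including the "@" itself.
$fromAt = strstr($email, "@");

// Drop the leading "@" with substr( ) to keep just the domain.
$domain = substr($fromAt, 1);

// strrchr( ) returns everything from the last ".", giving the final extension.
$extension = strrchr($file, ".");

// prints @example.com, example.com and .gz on separate lines
print "$fromAt\n$domain\n$extension\n";
```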

3.2.5 Replacing Characters and Substrings

PHP provides several simple functions that can replace specific substrings or characters in a string with other strings or characters. These functions don't change the input string; instead, they return a copy of the input modified by the required changes. In the next section, we discuss regular expressions, which are powerful tools for finding and replacing complex patterns of characters. However, the functions described in this section are faster than regular expressions and usually a better choice for simple tasks.

3.2.5.1 Replacing substrings

The substr_replace( ) function returns a copy of the source string with the characters from the position start to the end of the string replaced with the replace string:

string substr_replace(string source, string replace, int start [, int length])

If the optional length is supplied, only length characters are replaced. The following examples show how substr_replace( ) works:

$var = "abcdefghij";

// prints "abcDEF"
print substr_replace($var, "DEF", 3);

// prints "abcDEFghij"
print substr_replace($var, "DEF", 3, 3);

// prints "abcDEFdefghij"
print substr_replace($var, "DEF", 3, 0);

The last example shows how a string can be inserted by setting the length to zero.
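Both techniques, replacing a fixed run of characters and inserting with a zero length, are sketched below with made-up values:

```php
<?php
$card = "1234567812345678";   // made-up 16-digit number

// Replace the first 12 characters with asterisks, keeping the last four.
$masked = substr_replace($card, str_repeat("*", 12), 0, 12);

// Insert separators by replacing zero characters at a position.
$dated = substr_replace("20040516", "-", 4, 0);   // "2004-0516"
$dated = substr_replace($dated, "-", 7, 0);       // "2004-05-16"

// prints ************5678 and 2004-05-16 on separate lines
print "$masked\n$dated\n";
```

Note that the second insertion position is 7 rather than 6, because the first insertion has already made the string one character longer.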

The str_replace( ) function returns a string created by replacing occurrences of the string search in subject with the string replace:

mixed str_replace(mixed search, mixed replace, mixed subject)

In the following example, the subject string, "old-age for the old.", is printed with both occurrences of old replaced with new:

$var = "old-age for the old.";

print str_replace("old", "new", $var);

The result is:

new-age for the new.

Since PHP 4.0.5, str_replace( ) allows an array of search strings and a corresponding array of replacement strings to be passed as parameters. The following example shows how the fields in a very short form letter can be populated:

// A short form-letter for an overdue account
$letter = "Dear #title #name, you owe us $#amount.";

// Set up an array of three search strings that will be
// replaced in the form-letter
$fields = array("#title", "#name", "#amount");

// Set up an array of debtors. Each element is an array that
// holds the replacement values for the form-letter
$debtors = array(
    array("Mr", "Cartwright", "146.00"),
    array("Ms", "Yates", "1,662.00"),
    array("Dr", "Smith", "84.75"));

foreach($debtors as $debtor)
    print str_replace($fields, $debtor, $letter) . "\n";

The $fields array contains a list of strings that are to be replaced. These strings don't need to follow any particular format; we have chosen to prefix each field name with the # character to clearly identify the fields in the letter. The body of the foreach loop calls str_replace( ) to replace the corresponding fields in $letter with the values for each debtor. The output of this script is as follows:

Dear Mr Cartwright, you owe us $146.00.
Dear Ms Yates, you owe us $1,662.00.
Dear Dr Smith, you owe us $84.75.

If the array of replacement strings is shorter than the array of search strings, the unmatched search strings are replaced with empty strings.
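The following sketch illustrates two variations on the array form: passing a single replacement string for an array of search strings, and passing a replacement array that is shorter than the search array. The input strings are made up:

```php
<?php
// One replacement string is applied to every search string in the array,
// here removing both carriage returns and newlines.
$oneLine = str_replace(array("\r", "\n"), "", "first\r\nsecond");

// Only two replacements for three search strings: "c" is replaced with "".
$short = str_replace(array("a", "b", "c"), array("1", "2"), "aabbcc");

// prints firstsecond and 1122 on separate lines
print "$oneLine\n$short\n";
```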

3.2.5.2 Translating characters and substrings

The strtr( ) function translates characters or substrings in a subject string:

string strtr(string subject, string from, string to)
string strtr(string subject, array map)

When called with three arguments, strtr( ) translates the characters in the subject string that match those in the from string into the corresponding characters in the to string. When called with two arguments, you must use an associative array called a map. Occurrences of the map keys in subject are replaced with the corresponding map values.

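The following example uses strtr( ) to replace all lowercase vowels with the corresponding umlauted character. The translation works byte by byte, so the example assumes a single-byte encoding such as ISO-8859-1; in a multi-byte encoding such as UTF-8, each umlauted character occupies more than one byte and the result is garbled: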

$mischief = strtr("command.com", "aeiou", "äëïöü");
print $mischief;  // prints cömmänd.cöm

When an associative array is passed as a translation map, strtr( ) replaces substrings rather than characters. The following example shows how strtr( ) can expand acronyms:

// Create an unintelligible email
$geekMail = "BTW, IMHO (IOW) you're wrong!";

// Short list of acronyms used in e-mail
$glossary = array("BTW"=>"by the way",
                  "IMHO"=>"in my humble opinion",
                  "IOW"=>"in other words",
                  "OTOH"=>"on the other hand");

// Maybe now I can understand
// Prints: by the way, in my humble opinion (in other words) you're wrong!
print strtr($geekMail, $glossary);
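One practical difference between strtr( ) with a map and str_replace( ) with arrays is worth a sketch. strtr( ) scans the subject once and never translates text it has just inserted, whereas str_replace( ) applies each search and replace pair in turn to the result of the previous one. The strings below are made up for illustration:

```php
<?php
$text = "cat chases dog";
$swap = array("cat" => "dog", "dog" => "cat");

// strtr( ) scans the subject once and never re-translates inserted text,
// so the two words are swapped cleanly.
$swapped = strtr($text, $swap);

// str_replace( ) applies each pair in turn, so the second pair also
// rewrites the "dog" produced by the first pair.
$chained = str_replace(array_keys($swap), array_values($swap), $text);

print $swapped . "\n";   // dog chases cat
print $chained . "\n";   // cat chases cat
```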