8.6 Creating Filters

PHP 5 comes with very few stream filters, but it introduces the ability to write your own filters in PHP, effectively giving you unlimited possibilities. PHP 4 requires all filters to be implemented in C.

The filter interface is much less complex than the wrapper interface. It has only three different methods, and it's common to implement only a single method, filter( ). Table 8-3 contains an overview of the API.

Table 8-3. Filter methods

Method                                                              Description
int filter(resource in, resource out, int &consumed, bool closing)  Called during data filtering; may be called multiple times per stream
bool onCreate(void)                                                 Called during filter instantiation; return true on success
void onClose(void)                                                  Called during filter destruction

The filter( ) method is where all the action takes place. Inside this method, you're required to process the incoming data, filter it when you can, and then tell the stream how much of the data you've processed.

This section shows how to implement two different filters: one that encodes special characters (such as & and <) to their HTML entity equivalents, and one that does the reverse, by transforming HTML entities back into characters.

8.6.1 Converting to HTML Entities

Example 8-6 shows a filter that encodes HTML entities using the htmlentities( ) function.

Example 8-6. Encoding HTML entities with a filter
class htmlentitiesFilter extends php_user_filter {
    function filter($in, $out, &$consumed, $closing) {
        // Take each bucket off the input, encode it, and pass it along
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = htmlentities($bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }

        return PSFS_PASS_ON;
    }
}

This class looks complicated because there are many new functions and objects, but it's actually quite boring and simple.

Filters written in PHP are implemented as objects. There's a predefined base class, php_user_filter, which is automatically available for you to use; it does not need to be included. All filters must extend this class and implement the filter( ) method.

The filter( ) method is where the actual data conversion takes place. This method takes four parameters: $in, $out, &$consumed, and $closing. The first two parameters are the input and output bucket resources. These resources hold stream data.

The third parameter, &$consumed, is where you report the amount of data in the stream already processed, or "eaten up," by the filter. This parameter must always be passed as a reference. The final argument is $closing. It's set to true when filter( ) is called for the last time, so you can be sure to flush any remaining data if necessary.

Your primary goal inside filter( ) is to take data from the input bucket, convert it, and then add it to the output bucket. However, you can't just operate on the bucket resources directly; instead, you need to call a few helper functions to convert the data from a resource to an object that is modifiable in PHP.

The stream_bucket_make_writeable( ) function retrieves a portion of the data from the input bucket and converts it to a PHP bucket object. This object has two properties: data and datalen. The data property is a string holding the bucket's data, whereas datalen is its length.

Since this filter wants to modify all the input data to escape HTML entities, pass $bucket->data to htmlentities( ) and assign the return value back to $bucket->data. This alters the data inside the bucket object.

Your next step is to let the filter know you've processed some data. Do this by incrementing the value of $consumed by the length of the data you've filtered. Since in this case you've filtered the entire data property, you can just add the value of $bucket->datalen.

The last step is to take the bucket object and add it as output. The stream_bucket_append( ) function takes a bucket resource and appends a bucket object to it. Therefore, to add the converted bucket object back to the stream, call stream_bucket_append($out, $bucket).

This entire process takes place inside a while loop because the stream passes data to you in chunks instead of sending the entire dataset at once. When there's no more data, stream_bucket_make_writeable( ) returns a value that evaluates to false and the loop terminates.

Once the loop is completed, you must return one of three constants: PSFS_PASS_ON, PSFS_FEED_ME, or PSFS_ERR_FATAL. When everything goes as planned, return PSFS_PASS_ON. When everything's okay but your filter cannot yet return any data because it needs additional data to complete the filtering process, return PSFS_FEED_ME. Whenever there's an unrecoverable error, return PSFS_ERR_FATAL.
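To watch the chunking and the bucket properties in action, here is a minimal sketch of a pass-through filter that only counts what flows through it. The class name byteCountFilter, the filter name demo.bytecount, and the use of php://memory as a stand-in data source are all illustrative assumptions, and the return type declaration assumes PHP 7.0 or later.

```php
<?php
// Hypothetical pass-through filter: it leaves the data untouched and only
// tallies what passes through it.
class byteCountFilter extends php_user_filter
{
    public static $calls = 0;  // number of times filter() ran
    public static $bytes = 0;  // total bytes seen across all buckets

    public function filter($in, $out, &$consumed, $closing): int
    {
        self::$calls++;
        while ($bucket = stream_bucket_make_writeable($in)) {
            // datalen is the length of the data string in this bucket
            self::$bytes += $bucket->datalen;
            $consumed    += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('demo.bytecount', 'byteCountFilter')
    or die('Failed to register filter');

// Fill an in-memory stream with 20,000 bytes, then read it back
// through the filter.
$fp = fopen('php://memory', 'w+');
fwrite($fp, str_repeat('x', 20000));
rewind($fp);
stream_filter_append($fp, 'demo.bytecount', STREAM_FILTER_READ);
$copy = stream_get_contents($fp);
fclose($fp);

printf("%d bytes in %d call(s) to filter()\n",
    byteCountFilter::$bytes, byteCountFilter::$calls);
```

The byte total always matches the size of the data, but the number of calls depends on how PHP happens to chunk the stream, so a filter should never assume a particular bucket size.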

Register the htmlentitiesFilter class as a filter with stream_filter_register( ):

stream_filter_register('convert.htmlentities', 'htmlentitiesFilter')
    or die('Failed to register filter');

$html = 'I am <b>bold</b>. I am <i>italic</i>.';

// php://output is write-only, so open it for writing
$fp = fopen('php://output', 'w');
stream_filter_prepend($fp, 'convert.htmlentities');
fwrite($fp, $html);
fclose($fp);

This prints:

I am &lt;b&gt;bold&lt;/b&gt;. I am &lt;i&gt;italic&lt;/i&gt;.
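The same filter can also sit on the read side of a stream, which makes its output easy to capture in a variable. The sketch below repeats the class from Example 8-6 so that it runs on its own. It assumes PHP 7.0 or later for the return type, uses php://memory as a stand-in for a real file, and attaches the filter with stream_filter_append( ) and the STREAM_FILTER_READ flag rather than stream_filter_prepend( ).

```php
<?php
// Same logic as Example 8-6, repeated here so the sketch is self-contained.
class htmlentitiesFilter extends php_user_filter
{
    public function filter($in, $out, &$consumed, $closing): int
    {
        while ($bucket = stream_bucket_make_writeable($in)) {
            $bucket->data = htmlentities($bucket->data);
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }
}

stream_filter_register('convert.htmlentities', 'htmlentitiesFilter')
    or die('Failed to register filter');

// php://memory stands in for a file on disk in this sketch.
$fp = fopen('php://memory', 'w+');
fwrite($fp, 'I am <b>bold</b>.');
rewind($fp);

// Attach to the read chain only; the data is encoded as it is read back.
stream_filter_append($fp, 'convert.htmlentities', STREAM_FILTER_READ);
$encoded = stream_get_contents($fp);
fclose($fp);

echo $encoded, "\n";  // I am &lt;b&gt;bold&lt;/b&gt;.
```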

8.6.2 Converting from HTML Entities

The inverse operation, decoding HTML entities, requires special handling. Because HTML entities are multiple characters, they may span buckets. For example, one bucket could end with &a and the next one could begin with mp;.
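The problem is easy to reproduce without any streams at all. This short sketch decodes a string once as a whole and once as two halves split in the middle of an entity, the way two buckets might arrive. The sample string and variable names are made up for illustration.

```php
<?php
$whole  = 'Fish &amp; chips';

// The same text, split in the middle of the entity.
$first  = 'Fish &a';
$second = 'mp; chips';

$decodedWhole   = html_entity_decode($whole);
$decodedInParts = html_entity_decode($first) . html_entity_decode($second);

var_dump($decodedWhole);    // string(12) "Fish & chips"
var_dump($decodedInParts);  // string(16) "Fish &amp; chips"
```

Neither half contains a complete entity, so html_entity_decode( ) leaves both untouched and the entity survives in the output.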

When you don't have a good algorithm for identifying partial sequences, store the entire dataset into a property and wait until the $closing parameter is true, as in Example 8-7.

Example 8-7. Decoding HTML entities with a filter
class dehtmlentitiesFilter extends php_user_filter {
    function onCreate( ) {
        $this->data = '';
        return true;
    }

    function filter($in, $out, &$consumed, $closing) {
        // Buffer every bucket; nothing is passed on yet
        while ($bucket = stream_bucket_make_writeable($in)) {
            $this->data .= $bucket->data;
            $this->bucket = $bucket;
            $consumed = 0;
        }

        if ($closing) {
            $consumed += strlen($this->data);

            // decode named entities
            $this->data = html_entity_decode($this->data);

            $this->bucket->data = $this->data;
            $this->bucket->datalen = strlen($this->data);
            stream_bucket_append($out, $this->bucket);
            return PSFS_PASS_ON;
        }

        return PSFS_FEED_ME;
    }
}

Unlike the filter in Example 8-6, Example 8-7 uses the onCreate( ) method to initialize a property. PHP calls this method when the filter is instantiated, before any data is filtered. There's also an onClose( ) method that's called when the filter is destroyed, but neither of the two example filters uses it.
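To see when each of the three methods runs, a filter can simply log its own calls. The following is a minimal sketch under a few assumptions: the class name lifecycleFilter and the filter name demo.lifecycle are invented for illustration, php://memory stands in for a real destination, and the return type declarations require PHP 7.1 or later.

```php
<?php
// Hypothetical pass-through filter that logs each method PHP calls on it.
class lifecycleFilter extends php_user_filter
{
    public static $log = [];

    public function onCreate(): bool
    {
        self::$log[] = 'onCreate';
        return true;  // returning false makes attaching the filter fail
    }

    public function filter($in, $out, &$consumed, $closing): int
    {
        self::$log[] = $closing ? 'filter (closing)' : 'filter';
        while ($bucket = stream_bucket_make_writeable($in)) {
            $consumed += $bucket->datalen;
            stream_bucket_append($out, $bucket);
        }
        return PSFS_PASS_ON;
    }

    public function onClose(): void
    {
        self::$log[] = 'onClose';
    }
}

stream_filter_register('demo.lifecycle', 'lifecycleFilter')
    or die('Failed to register filter');

$fp = fopen('php://memory', 'w+');
// Attach to the write chain only, so just one filter instance is involved.
stream_filter_append($fp, 'demo.lifecycle', STREAM_FILTER_WRITE);
fwrite($fp, 'some data');
fclose($fp);

echo implode("\n", lifecycleFilter::$log), "\n";
```

The log begins with onCreate and ends with onClose, with the filter( ) calls in between, including a last one made with $closing set to true when the stream is closed.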

Instead of reading in buckets and converting data as it arrives, the filter( ) method in Example 8-7 stores all the data locally and executes the conversion only at the end. This way you're assured that you're never caught mid-entity.

Inside the while, append each bucket's data to the $data property. You also store the bucket in the $bucket property; otherwise, PHP destroys the bucket when the loop terminates. Finally, since none of the data is consumed right now, always set $consumed to 0.

After processing all the buckets, check to see if the stream is closing. If $closing is true, you know all the data has arrived, so you can begin the conversion.

First, update $consumed to the length of the data. Then, decode the HTML entities with html_entity_decode( ), storing the results in $this->data.

With the decoding complete, it's time to update the bucket. Its data property gets the converted data, and its datalen property is set to the data's length. This is not the same as $consumed. It is usually shorter because you're converting &amp; back into &, for example.

Now append the bucket to the output and return PSFS_PASS_ON. However, when $closing is false, omit the conversions and return PSFS_FEED_ME to let the stream know that your filter needs additional data to process.

Use this filter like the HTML entity encoding filter:

stream_filter_register("convert.dehtmlentities", "dehtmlentitiesFilter")
    or die("Failed to register filter");

$ascii = 'I am &lt;b&gt;bold&lt;/b&gt;. I am &lt;i&gt;italic&lt;/i&gt;.';

$fp = fopen("php://output", "w");
stream_filter_prepend($fp, "convert.dehtmlentities");
fwrite($fp, $ascii);

// The filter emits its output only when the stream closes
fclose($fp);

This prints:

I am <b>bold</b>. I am <i>italic</i>.