10.2 Gathering the News

The code in Example 10-3 does the heavy lifting required to load the news. Notice that a URL for the PopDex service (http://www.popdex.com/), which also provides feeds, is declared as well, although the code that uses it is commented out. Another popular service, DayPop (http://www.daypop.com/), offers a list of popular weblog memes. To change the content of the news service, simply substitute or add RSS feeds. For example, if you're interested in health issues, you might browse Yahoo's RSS offerings (http://news.yahoo.com/rss) for a more suitable, health-related feed.

Example 10-3. Building the news sheet

package com.cascadetg.ch10;

import java.util.Timer;

import de.nava.informa.core.ChannelIF;
import de.nava.informa.core.ItemIF;
import de.nava.informa.impl.basic.ChannelBuilder;
import de.nava.informa.parsers.RSSParser;
import com.google.soap.search.*;
import com.cascadetg.ch04.DeveloperTokens;

public class NewsSheet extends java.util.TimerTask
{
    static NewsSheet runner = null;

    // Written by the timer thread and read by callers of init( ),
    // so it is declared volatile.
    public static volatile boolean isReady = false;

    public static String blogdexURL =
        "http://blogdex.net/xml/index.asp";

    public static String popdexURL =
        "http://www.popdex.com/rss.xml";

    public static String yahooURL =
        "http://rss.news.yahoo.com/rss/topstories";

    public static Story[] yahooStories = new Story[3];
    public static Story[] blogdexStories = new Story[3];
//    public static Story[] popdexStories = new Story[3];

    // Common words removed from a headline before it is sent to
    // Google. Each entry includes its surrounding spaces so that
    // only whole words are matched.
    public static String[] ignore_words =
        {
            " the ",
            " to ",
            " for ",
            " of ",
            " and ",
            " a ",
            " as ",
            " in ",
            " on ",
            "A ",
            "For ",
            "The ",
            "On ",
            "As ",
            "In " };

    public static Story getStory(int x, int y)
    {
        if (x == 0)
            return yahooStories[y];

        return blogdexStories[y];
        //return popdexStories[y];
    }

    static String stripCommon(String in)
    {
        String current = in;
        for (int i = 0; i < ignore_words.length; i++)
        {
            current = replaceToken(current, ignore_words[i], " ");
        }
        return current;
    }

As shown in the getGoogleResults() method at the start of Example 10-4, the application strips common words (such as "and," "the," and "of") from the search term before sending the query to Google. Google ignores these words, called stop words in Google's parlance, by default. The word list used here is by no means definitive; most search engines (including Google) don't publish a stop-word list. Even so, removing a known subset of stop words keeps them from counting against Google's 10-word limit, which increases the likelihood that the application will get an appropriate reference from Google's web service.
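The same idea can also be expressed word by word rather than by substring replacement. The following is a minimal sketch, not the code used by NewsSheet: it splits a headline on whitespace, drops stop words regardless of case, and keeps at most ten terms. The class name StopWordFilter, the MAX_TERMS constant, and the word list itself are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper; not part of Example 10-3 or Example 10-4.
final class StopWordFilter {
    // Illustrative subset only; no official stop-word list is assumed.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
        "the", "to", "for", "of", "and", "a", "as", "in", "on"));

    // The 10-word query limit described in the text.
    static final int MAX_TERMS = 10;

    // Returns the headline with stop words removed, capped at MAX_TERMS words.
    static String strip(String headline) {
        List<String> kept = new ArrayList<>();
        for (String word : headline.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty headline yields one empty token
            }
            if (STOP_WORDS.contains(word.toLowerCase(Locale.ROOT))) {
                continue; // whole-word, case-insensitive match
            }
            if (kept.size() == MAX_TERMS) {
                break; // stay within the query-term limit
            }
            kept.add(word);
        }
        return String.join(" ", kept);
    }

    public static void main(String[] args) {
        System.out.println(strip("The State of the Union on Friday"));
    }
}
```

Because whole words are compared, a word such as "one" is never damaged by the entry for "on", and no separate capitalized entries are needed.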

While this application removes words to ensure a better match, if the content for the site were more specific, the application could automatically include certain words in the search. For example, a Linux news site might include the word Linux in every search to help ensure that Google returns only relevant links.
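Adding a fixed topic word could be sketched as follows. This is an illustration under stated assumptions rather than part of the application: the TopicQuery class and its withTopic() method are hypothetical names, and the result would be passed to the search in place of the bare headline.

```java
// Hypothetical helper; not part of Example 10-3 or Example 10-4.
final class TopicQuery {
    // Prepends the topic word unless the query already contains it
    // as a whole word (compared without regard to case).
    static String withTopic(String query, String topic) {
        String trimmed = query.trim();
        for (String word : trimmed.split("\\s+")) {
            if (word.equalsIgnoreCase(topic)) {
                return trimmed; // topic already present; avoid a duplicate
            }
        }
        return trimmed.isEmpty() ? topic : topic + " " + trimmed;
    }

    public static void main(String[] args) {
        System.out.println(withTopic("kernel 2.6 released", "Linux"));
    }
}
```

Keep in mind that the added word counts toward the query-term limit, so it is worth adding only after the common words have been stripped.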

Example 10-4. Building the news sheet, Part II

    static GoogleSearchResultElement[] getGoogleResults(String in)
    {
        String searchTerm = in;
        System.out.println(searchTerm);
        searchTerm = stripCommon(searchTerm);
        System.out.println(searchTerm);

        GoogleSearch search = new GoogleSearch( );

        // Set mandatory attributes
        search.setKey(DeveloperTokens.googleKey);
        search.setQueryString(searchTerm);

        // Set optional attributes
        search.setSafeSearch(true);

        // Invoke the actual search
        GoogleSearchResult result = null;
        try
        {
            result = search.doSearch( );
        } catch (GoogleSearchFault e)
        {
            e.printStackTrace( );
        }

        // If the search failed, return no results rather than
        // dereferencing a null result.
        if (result == null || result.getResultElements( ) == null)
            return new GoogleSearchResultElement[0];

        return result.getResultElements( );
    }

    static void addGoogleToStory(Story story)
    {
        GoogleSearchResultElement[] mySearchElements =
            getGoogleResults(story.title);

        for (int i = 0; i < story.related_title.length; i++)
        {
            if (i < mySearchElements.length)
            {
                story.related_title[i] = mySearchElements[i].getTitle( );

                // Strip the markup Google inserts into each snippet
                String temp = mySearchElements[i].getSnippet( );
                temp = replaceToken(temp, "<b>", "");
                temp = replaceToken(temp, "</b>", "");
                temp = replaceToken(temp, "<br>", " ");
                story.related_description[i] = temp;
                story.related_link[i] = mySearchElements[i].getURL( );
            }
        }
    }

    static void addGoogleNotes(Story[] stories)
    {
        for (int i = 0; i < stories.length; i++)
        {
            // A slot is still null if the feed could not be loaded
            if (stories[i] != null)
                addGoogleToStory(stories[i]);
        }
    }

    static void refreshSiteStories(String feedURL, Story[] stories)
    {
        try
        {
            java.net.URL inpFile = new java.net.URL(feedURL);
            ChannelIF channel =
                RSSParser.parse(new ChannelBuilder( ), inpFile);

            Object[] items = channel.getItems( ).toArray( );

            // Guard against feeds with fewer items than story slots
            int count = Math.min(stories.length, items.length);
            for (int i = 0; i < count; i++)
            {
                stories[i] = new Story( );
                ItemIF current = (ItemIF)items[i];

                if (current.getTitle( ) != null)
                    stories[i].title = current.getTitle( );

                if (current.getLink( ) != null)
                    stories[i].link = current.getLink( ).toString( );

                if (current.getSubject( ) != null)
                    if (current.getSubject( ).length( ) > 0)
                        System.out.println(
                            "#"
                                + (i + 1)
                                + " Subject: "
                                + current.getSubject( ));

                if (current.getDescription( ) != null)
                {
                    // Truncate long descriptions, then escape any markup
                    String temp = current.getDescription( );
                    if (temp.length( ) > 1024)
                    {
                        temp = temp.substring(0, 1020);
                        temp = temp + "...";
                    }
                    temp = replaceToken(temp, "<", "&lt;");
                    temp = replaceToken(temp, ">", "&gt;");
                    stories[i].description = temp;
                }
            }
        } catch (Exception e)
        {
            e.printStackTrace( );
        }
        addGoogleNotes(stories);
    }

    public static void refreshStories( )
    {
        long timing = System.currentTimeMillis( );

        refreshSiteStories(yahooURL, yahooStories);
        refreshSiteStories(blogdexURL, blogdexStories);
        //refreshSiteStories(popdexURL, popdexStories);
        System.out.println(System.currentTimeMillis( ) - timing);
        isReady = true;
    }

    public static void main(String[] args)
    {
        // Wait for the first refresh to finish, sleeping between
        // checks rather than spinning.
        while (!init( ))
        {
            try
            {
                Thread.sleep(500);
            } catch (InterruptedException e)
            {
                return;
            }
        }
        for (int i = 0; i < yahooStories.length; i++)
        {
            System.out.println("X" + yahooStories[i].title);
            System.out.println(
                "foo" + yahooStories[i].related_title[0]);
        }
        System.out.println("Program complete!");
    }

    /** All parameters must not be null. */
    static public String replaceToken(
        String input,
        String token,
        String value)
    {
        if (input == null)
            throw new NullPointerException(
            "replaceToken input should not be null");
        if (token == null)
            throw new NullPointerException(
            "replaceToken token should not be null");
        if (value == null)
            throw new NullPointerException(
            "replaceToken value should not be null");

        boolean done = false;
        int current = 0;
        int last = 0;
        StringBuffer results = new StringBuffer("");
        while (!done)
        {
            last = current;
            current = input.indexOf(token, current);
            if (current == -1)
                done = true;
            if (!done)
            {
                results.append(input.substring(last, current));
                results.append(value);
                current = current + token.length( );
            } else
            {
                results.append(input.substring(last));
            }
        }

        if (input.length( ) > 0)
            if (results.toString( ).length( ) == 0)
                return input;

        return results.toString( );
    }

    public synchronized static boolean init( )
    {
        if (runner == null)
        {
            runner = new NewsSheet( );
            Timer myTimer = new Timer(true);
            // Refresh immediately, then once an hour
            myTimer.schedule(runner, 0, 1000L * 60L * 60L);
        }
        return isReady;
    }

    public void run( )
    {
        System.out.print(
            new java.util.Date( ).toString( ) + " Refreshing...");

        refreshStories( );
        System.out.println("Done");
    }

}

Notice that a couple of aspects of the code help you cope (at least partially) with the vagaries of RSS. For example, the addGoogleToStory() method strips out the B and BR tags that Google inserts into the search summary. Google places these tags in its responses consistently, regardless of your needs. Similarly, while Yahoo offers a very "pure" feed (only plain text is sent in its articles), the stories selected by BlogDex typically contain all sorts of odd formatting. For example, in the screenshot shown in Figure 10-2, the right column is significantly wider than the left column; this is due to a long URL with no breaks or spaces in one of the feeds. As touched on in the previous chapter, writing a parser engine that can handle every bizarre interpretation of an RSS file can quickly become an exercise in frustration.
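One way to keep a long, unbroken URL from stretching a column is to shorten any overlong word before the text is displayed. The following is a minimal sketch of that idea, not something the NewsSheet class does; the LongTokenTrimmer class, its method name, and the length limit are all assumptions chosen for illustration. Note that it also collapses runs of whitespace into single spaces.

```java
// Hypothetical helper; not part of Example 10-3 or Example 10-4.
final class LongTokenTrimmer {
    // Shortens every whitespace-delimited token longer than maxLength
    // to maxLength characters, ending in "...".
    static String shortenLongTokens(String text, int maxLength) {
        if (maxLength < 4) {
            throw new IllegalArgumentException("maxLength must be at least 4");
        }
        StringBuilder out = new StringBuilder();
        for (String token : text.trim().split("\\s+")) {
            if (token.length() > maxLength) {
                // Leave room for the three-character ellipsis
                token = token.substring(0, maxLength - 3) + "...";
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(token);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(shortenLongTokens(
            "see http://example.com/a/very/long/path now", 20));
    }
}
```

A helper like this would need to run on the raw description, before characters such as < and > are replaced with entities, so that an entity is never cut in half.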

As you can see, RSS and Google search are powerful tools for enhancing a site; they provide inexpensive but interesting access to content. Despite the limitations of RSS, it's easy to see the power of aggregation for both readers and publishers of content. Intelligent automation can go a long way toward providing useful content.