9.2 Putting Speech to Work

Once you've decided to experiment with the speech API, putting it into action is simple. This section shows how to add speech to basic dialog boxes, and then moves on to more useful applications of text-to-speech and speech recognition.

9.2.1 Getting Set Up

As mentioned earlier, these steps assume that you are a member of the Apple Developer Connection (ADC), for which you can sign up for free. Visit https://connect.apple.com and log in to the developer connection. You'll be given several menus and submenus on the left. Select "Download Software" and then "Java." Then download the Speech Framework as a Mac binary file (in .dmg format). Once you have mounted the disk image, start the included installer.

The installer will place several items of interest on your disk. First, it will place a JAR file, JavaSpeechFramework.jar, in the standard extensions directory of your JavaVM.framework folder, at /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/lib/ext/ (see Chapter 2 for more information on the Mac OS X JVM directory layout). It will place documentation in the directory /Developer/Research/JavaSpeechFramework/Documentation/, and sample code in the directory /Developer/Research/JavaSpeechFramework/Examples/.

Of these items, the JavaSpeechFramework.jar file matters most. This library must be available to both your compiler and your application before you can use the framework.

As long as this JAR file stays in the ext directory, where the installer places it, you don't have to worry about classpath issues.

9.2.2 The TalkingJDialog Class

The class TalkingJDialog, shown in Example 9-1, is a simple extension to the standard Swing JDialog. It adds a text property to the basic dialog box and speaks that text when the dialog is shown.

This class is not cross-platform and will fail on non-Mac OS X systems. Chapter 5 and Chapter 6 show how to provide support for Apple-specific extensions while retaining cross-platform compatibility.

Check for both the Speech Framework and the Mac OS X platform to ensure that users who don't have the update won't be confused by error messages.
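
One way to make that check is sketched below. It rests on two assumptions: that the os.name system property begins with "Mac OS X" on the target platform, and that the framework can be detected by looking up com.apple.speech.synthesis.Synthesizer (the class imported in Example 9-1) by name. The SpeechSupport class and its method names are invented for this illustration and are not part of the Speech Framework.

```java
// Hypothetical helper; not part of the Speech Framework.
class SpeechSupport {

    // True when the given os.name value identifies Mac OS X.
    static boolean isMacOSX(String osName) {
        return osName != null && osName.startsWith("Mac OS X");
    }

    // True when the named class can be found by this class's loader.
    // Looking the class up by name avoids a hard dependency on it.
    static boolean isClassAvailable(String className) {
        try {
            Class.forName(className, false,
                          SpeechSupport.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            return false;
        }
    }

    // Both conditions must hold before any speech class is touched.
    static boolean isSpeechAvailable() {
        return isMacOSX(System.getProperty("os.name"))
            && isClassAvailable("com.apple.speech.synthesis.Synthesizer");
    }

    public static void main(String[] args) {
        System.out.println("Speech available: " + isSpeechAvailable());
    }
}
```

An application could call isSpeechAvailable( ) before creating a talking dialog and fall back to a plain JDialog when it returns false.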

Example 9-1. Extending JDialog for speech
package com.wiverson.macosbook.speech;

/* This single class does the vast bulk of the
   heavy lifting of actually making Mac OS X talk.
 
   Don't blink or you'll miss it.
 */
import com.apple.speech.synthesis.Synthesizer;

/* This class describes a very generic version of
   JDialog with a few methods added for speech synthesis
   and related user interface. It's extraordinarily
   straightforward.
 */

public class TalkingJDialog extends javax.swing.JDialog
    implements java.awt.event.MouseListener
{
    
    public TalkingJDialog(  )
    {
        this.setResizable(false);
        this.addMouseListener(this);
    }
    
    /* This method is used to allow the user to click
     anywhere and immediately cancel out of the
     speech playback - even if the dialog isn't
     dismissed
     */
    public void mousePressed(java.awt.event.MouseEvent mouseEvent)
    {
        if(mySynthesizer != null)
            mySynthesizer.stopSpeech(  );
    }
    
    // Needed to complete the MouseListener interface
    public void mouseReleased(java.awt.event.MouseEvent mouseEvent)
    {}
    public void mouseExited(java.awt.event.MouseEvent mouseEvent)
    {}
    public void mouseEntered(java.awt.event.MouseEvent mouseEvent)
    {}
    public void mouseClicked(java.awt.event.MouseEvent mouseEvent)
    {}
    
    public void dispose(  )
    {
        super.dispose(  );
    }
    
    public void hide(  )
    {
        super.hide(  );
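        // If the dialog goes away, be sure to stop talking.
        // The synthesizer is created lazily in show(  ), so it
        // is still null if the dialog was never shown.
        if(mySynthesizer != null)
            mySynthesizer.stopSpeech(  );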
    }
    
    private Synthesizer mySynthesizer = null;
    
    public void show(  )
    {
        super.show(  );
        // Get a synthesizer for this dialog
        // if one isn't already available
        if(mySynthesizer == null)
            mySynthesizer = new Synthesizer(  );
        // Start talking!
        mySynthesizer.speakText(getNotificationText(  ));
    }
    
    // Storage & accessors for the text to be spoken
    private String spokenText;
    public void setNotificationText(String inString)
    {
        spokenText = inString;
    }
    public String getNotificationText(  )
    {
        return spokenText;
    }
}

9.2.3 A Talking Dialog Box

On its own, this class is pretty useless, as is JDialog without an additional extension. To use it, extend TalkingJDialog with your own dialog box and listen to Mac OS X read your messages. Example 9-2 provides a simple, user-friendly standalone example of a talking dialog.

Example 9-2. A simple talking alert box
package com.wiverson.macosbook.speech;

public class TalkingAlertJDialog 
    extends com.wiverson.macosbook.speech.TalkingJDialog
{
        
    /** Creates new form TalkingAlertJDialog */
    public TalkingAlertJDialog(String alert)
    {
        setNotificationText(alert);
        initComponents(  );
        this.getRootPane(  ).setDefaultButton(okButton);
        pack(  );
        java.awt.Dimension screenSize = 
             java.awt.Toolkit.getDefaultToolkit().getScreenSize(  );
        setSize(new java.awt.Dimension(374, 128));
        setLocation((screenSize.width-374)/2,(screenSize.height-128)/4);
    }
    
    private void initComponents(  )
    {
        alertText = new javax.swing.JLabel(  );
        stylePanel = new javax.swing.JPanel(  );
        okButton = new javax.swing.JButton(  );

        setTitle("Alert");
        setResizable(false);
        alertText.setText(getNotificationText(  ));
        alertText.setHorizontalAlignment(javax.swing.SwingConstants.CENTER);
        getContentPane(  ).add(alertText, java.awt.BorderLayout.CENTER);

        okButton.setText("OK");
        okButton.addActionListener(new java.awt.event.ActionListener(  )
        {
            public void actionPerformed(java.awt.event.ActionEvent evt)
            {
                okButtonActionPerformed(evt);
            }
        });

        stylePanel.add(okButton);

        getContentPane(  ).add(stylePanel, java.awt.BorderLayout.SOUTH);

    }

    private void okButtonActionPerformed(java.awt.event.ActionEvent evt)
    {
        setVisible(false);
    }
   
    public static void main(String args[])
    {
        new TalkingAlertJDialog("Help! I've fallen and I can't get up!").show(  );
    }
    
    private javax.swing.JLabel alertText;
    private javax.swing.JPanel stylePanel;
    private javax.swing.JButton okButton;
}

While a picture may be worth a thousand words, you'll have to try this one out on your own to really appreciate Mac OS X's speech features. Still, Figure 9-2 shows TalkingAlertJDialog in action.

Figure 9-2. A talking alert box
figs/XJG_0902.gif

9.2.4 Ask Mac OS X

Next, write a small utility application that sits in the background and answers common questions. This section shows you how to set up the voice recognizer, teach it a few phrases, and make it answer common questions. This lesson should familiarize you with other useful applications of the Speech Framework. Example 9-3 includes the source listing for this utility.

Example 9-3. Speech utility listener
package com.wiverson.macosbook.speech;

import javax.swing.JLabel;
import javax.swing.JComboBox;
import java.awt.BorderLayout;

public class SpeechListener 
    extends javax.swing.JDialog 
    implements java.awt.event.ActionListener,
               com.apple.speech.recognition.UnrecognizedEventListener,
               com.apple.speech.recognition.DetectedEventListener,
               com.apple.speech.recognition.DoneEventListener
{
    
    // Set up the speech recognition engine
    static com.apple.speech.recognition.Recognizer mySpeechRecognizer = null;
    static com.apple.speech.recognition.LanguageModel myLanguageModel = null;
    
    // Set up the text-to-speech engine
    static com.apple.speech.synthesis.Synthesizer mySynthesizer = null;
    
    
    public SpeechListener(  )
    {
        this.getContentPane().setLayout(new BorderLayout(  ));
        statusLabel = new JLabel("Ready.");
        statusLabel.setHorizontalTextPosition(JLabel.LEFT);
        this.getContentPane(  ).add(statusLabel, BorderLayout.CENTER);
        
        manualCommandMenu = new JComboBox(  );
        manualCommandMenu.setModel(new javax.swing.DefaultComboBoxModel(tasks));
        manualCommandMenu.addActionListener(this);
        
        this.getContentPane(  ).add(manualCommandMenu, BorderLayout.EAST);
        
        this.pack(  );
        this.setSize(300, 50);
        this.setTitle("Address me as " + computerName);
        
        // Set up to have the computer talk back.
        if(mySynthesizer == null)
            mySynthesizer = new com.apple.speech.synthesis.Synthesizer(  );
        
        try
        {
            // Hack for workaround of bug which
            // prevents Java apps from receiving
            // AppleEvents in Mac OS X 10.0
            com.apple.ae.AppleEventFunctions.initAE(  );
            
            // Create the speech recognizer.
            // Speech is activated lazily upon startup.
            mySpeechRecognizer = new com.apple.speech.recognition.Recognizer(  );
            
            // Create & setup the LanguageModel which we will add our phrases to.
            myLanguageModel = new com.apple.speech.recognition.LanguageModel(  );
            mySpeechRecognizer.setLanguageModel(myLanguageModel);
            
            // Add the phrases we are looking for.
            // Note that we need to add the computer's address first.
            // Still, easier than using the more complex API
            String[] full_tasks = new String[tasks.length];
            for(int i = 0; i < tasks.length; i++)
                full_tasks[i] = computerName + tasks[i];
            
            myLanguageModel.setPhrases(full_tasks);
            
            // Start the recognizer
            mySpeechRecognizer.start(  );
            
            // Listen for speech events
            mySpeechRecognizer.addDoneEventListener(this);
            mySpeechRecognizer.addUnrecognizedEventListener(this);
            mySpeechRecognizer.addDetectedEventListener(this);
        }
        catch(Exception e)
        {
            e.printStackTrace(  );
        }
    }
    
    private JLabel statusLabel;
    private JComboBox manualCommandMenu;
    private String computerName = "Computer ";
    
    static final private int DAY = 0;
    static final private int SONG = 1;
    static final private int QUIT = 2;
    static final private int BEEP = 3;
    
    private String[] tasks =
    {
        "what day is it",
        "sing a song",
        "quit",
        "beep"
    };
    
    // main(  ) must be public for the java launcher to find it
    public static void main(String[] args)
    {
        (new SpeechListener()).show(  );
    }
    
    public void doCommand(String input)
    {
        statusLabel.setText("I heard " + input);
        
        if(input.compareTo(tasks[DAY]) == 0)
        {
            mySynthesizer.speakText(new java.util.Date().toString(  ));
        }
        
        if(input.compareTo(tasks[SONG]) == 0)
        {
            mySynthesizer.speakText("Sorry, I'm shy");
        }
        
        if(input.compareTo(tasks[QUIT]) == 0)
        {
            System.exit(0);
        }
        
        if(input.compareTo(tasks[BEEP]) == 0)
        {
            java.awt.Toolkit.getDefaultToolkit().beep(  );
        }
    }
    
    public void handleDoneEvent(com.apple.speech.recognition.DoneEvent doneEvent)
    {
        String command = doneEvent.getPhraseRecognized(  );
        if(command != null)
        {
            command = command.substring(computerName.length(), command.length(  ) );
            doCommand(command);
        } else
        {
            statusLabel.setText("Can't understand...?");
        }
    }
    
    public void actionPerformed(java.awt.event.ActionEvent actionEvent)
    {
        if(actionEvent.getSource(  ) instanceof JComboBox)
        {
            JComboBox source = (JComboBox)actionEvent.getSource(  );
            doCommand(source.getSelectedItem().toString(  ));
        }
    }
    
    public void handleDetectedEvent(
         com.apple.speech.recognition.DetectedEvent detectedEvent)
    {
        statusLabel.setText("Listening...");
    }
    
    public void handleUnrecognizedEvent(
         com.apple.speech.recognition.UnrecognizedEvent unrecognizedEvent)
    {
        statusLabel.setText("Unrecognized...");
    }
}

Fire up this application:

java com.wiverson.macosbook.speech.SpeechListener

Make sure you've got the Mac OS X speech packages in your classpath before using this program, or you won't be able to compile or execute it.

Once started, the program sits quietly in the background, waiting for the user to speak a phrase such as "Computer, what day is it?" The computer will then respond, using the voice synthesizer to answer the question.

To add tasks to the example above, add the new phrases to the tasks array, add matching branches to the doCommand( ) method, and implement the behavior for each one.
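
As the list of tasks grows, a chain of if statements becomes harder to maintain. One alternative, offered here as a sketch and not as anything the Speech Framework provides, keeps each phrase next to its handler in a map, so adding a task is a single registration. The CommandTable class is hypothetical, the code assumes a current JDK (generics and lambdas), and the handlers only print text so that the sketch runs without the framework installed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical phrase-to-handler table; not part of the Speech Framework.
class CommandTable {
    // LinkedHashMap keeps phrases in the order they were registered.
    private final Map<String, Runnable> handlers =
        new LinkedHashMap<String, Runnable>();

    // Associate a phrase with the action to run when it is heard.
    void register(String phrase, Runnable action) {
        handlers.put(phrase, action);
    }

    // All registered phrases, for example to build a phrase array
    // for the recognizer or the items of a combo box.
    String[] phrases() {
        return handlers.keySet().toArray(new String[0]);
    }

    // Run the action for a phrase; returns false for unknown phrases.
    boolean dispatch(String phrase) {
        Runnable action = handlers.get(phrase);
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    public static void main(String[] args) {
        CommandTable table = new CommandTable();
        table.register("what day is it",
            () -> System.out.println(new java.util.Date()));
        table.register("sing a song",
            () -> System.out.println("Sorry, I'm shy"));

        table.dispatch("sing a song");
        if (!table.dispatch("dance")) {
            System.out.println("Unrecognized...");
        }
    }
}
```

With this arrangement, the body of a method like doCommand( ) reduces to a single dispatch( ) call plus handling for the unknown-phrase case.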

If you're adding support for voice recognition, you'll probably want to integrate the voice commands into your application's existing event dispatching system. Ideally, you should provide a customizable interface that lets users choose the phrases that trigger events.
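
One possible shape for that kind of customization, stated as an assumption, is to store the spoken phrase for each command under a stable key in a java.util.Properties list and to fall back to built-in defaults for keys the user has not changed. The PhraseSettings class and the key names (day, song, quit, beep) are invented for this sketch; only the default phrases come from Example 9-3.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical store for user-customizable trigger phrases.
class PhraseSettings {
    private final Properties phrases;

    PhraseSettings(Properties userOverrides) {
        // Built-in phrases, used when the user has not overridden a key.
        Properties defaults = new Properties();
        defaults.setProperty("day", "what day is it");
        defaults.setProperty("song", "sing a song");
        defaults.setProperty("quit", "quit");
        defaults.setProperty("beep", "beep");

        phrases = new Properties(defaults);
        phrases.putAll(userOverrides);
    }

    // The phrase currently assigned to a command key.
    String phraseFor(String commandKey) {
        return phrases.getProperty(commandKey);
    }

    // Reverse lookup: the command key for a spoken phrase, or null.
    String commandFor(String spoken) {
        for (String key : phrases.stringPropertyNames()) {
            if (phrases.getProperty(key).equalsIgnoreCase(spoken)) {
                return key;
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for a user-edited settings file.
        Properties overrides = new Properties();
        overrides.load(new StringReader("day=what is the date\n"));

        PhraseSettings settings = new PhraseSettings(overrides);
        System.out.println(settings.phraseFor("day"));
        System.out.println(settings.commandFor("sing a song"));
    }
}
```

Because the application logic refers only to the command keys, users can reword a phrase without any change to the code that handles the command.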

9.2.5 Custom Language Models

Besides adding tasks, you can install your own "grammar" by creating more complex language models. This allows you to build much more sophisticated applications, but it is also considerably more difficult to configure and develop.

A custom language model, represented by the com.apple.speech.recognition.LanguageModel class, has a list of zero or more words, phrases, or paths. For example, suppose that you want the system to handle commands such as "call Will" and "schedule a lunch with Brent next Tuesday" (perhaps with other names and days as well). Backus-Naur Form (BNF) is one way to describe such a model. Example 9-4 shows a BNF description of a relatively simple language model.

Example 9-4. A BNF description of a language model
<TopLM> = <call> <person> | schedule meeting with <person> | view today's schedule;
<call> = call | phone | dial;
<person> = Will | Brent | Cynthia | Diane;

Building up a custom language model allows your application to mix and match names and phrases, rather than learning each phrase with each possible name and action.
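
To see what the grammar in Example 9-4 saves, the sketch below expands it by hand into the flat list of phrases that a simple phrase list would need. It uses only plain Java collections and does not touch the LanguageModel API; the GrammarExpansion class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: flattens the grammar of Example 9-4.
class GrammarExpansion {
    // <call> = call | phone | dial;
    static final String[] CALL = { "call", "phone", "dial" };
    // <person> = Will | Brent | Cynthia | Diane;
    static final String[] PERSON = { "Will", "Brent", "Cynthia", "Diane" };

    // Every phrase that <TopLM> can produce.
    static List<String> expand() {
        List<String> phrases = new ArrayList<String>();

        // <call> <person>
        for (String verb : CALL) {
            for (String person : PERSON) {
                phrases.add(verb + " " + person);
            }
        }

        // schedule meeting with <person>
        for (String person : PERSON) {
            phrases.add("schedule meeting with " + person);
        }

        // view today's schedule
        phrases.add("view today's schedule");
        return phrases;
    }

    public static void main(String[] args) {
        List<String> phrases = expand();
        System.out.println(phrases.size() + " phrases");
        for (String phrase : phrases) {
            System.out.println(phrase);
        }
    }
}
```

Three verbs and four names already produce 17 flat phrases, and each additional name adds four more (three for calling and one for scheduling), while the grammar itself grows by a single word.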

If your application requires this sort of sophistication, investigate the installed documentation at /Developer/Research/JavaSpeechFramework/Documentation/com/apple/speech/recognition/Model.html. Using a custom language model precludes using the simpler API shown in the sample applications. Custom models are not covered in this book, largely because support for speech is still missing in JDK 1.4. For projects complex enough to require custom language models, you'll probably want to investigate a commercial package such as IBM's ViaVoice (http://www.apple.com/macosx/applications/viavoice/).