Once you've decided that you want to play around with the speech API, it's actually pretty simple to put it into action. This section will discuss how to put speech into basic dialog boxes, as well as more useful applications of text-to-speech and speech recognition.
As mentioned earlier, these steps assume that you are a member of the Apple Developer Connection (ADC), for which you can sign up for free. Visit https://connect.apple.com and log in to the developer connection. You'll be given several menus and submenus on the left. Select "Download Software" and then "Java." Then download the Speech Framework as a Mac binary file (in .dmg format). Once you have mounted the disk image, start the included installer.
The installer will place several items of interest on your disk. First, it will place a JAR file, JavaSpeechFramework.jar, in the standard extensions directory of your JavaVM.framework folder, at /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/lib/ext/ (see Chapter 2 for more information on the Mac OS X JVM directory layout). It will place documentation in the directory /Developer/Research/JavaSpeechFramework/Documentation/, and sample code in the directory /Developer/Research/JavaSpeechFramework/Examples/.
Of these items, JavaSpeechFramework.jar is the one that matters most. This library must be on the classpath of both your compiler and your running application before you can use the framework.
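As a quick sanity check that the library really is visible to your application, you can ask the class loader for one of the framework's classes. The following is a minimal sketch, not part of Apple's framework: the SpeechFrameworkCheck class and its isClassAvailable method are hypothetical names invented here, and the only Apple name used is the Synthesizer class that Example 9-1 imports. It is written in modern Java and does not need the framework in order to compile.

```java
// Hypothetical helper (not part of Apple's API): reports whether the
// Speech Framework classes can be found on the current classpath.
final class SpeechFrameworkCheck {

    // Class name taken from the import used in Example 9-1.
    static final String SYNTHESIZER_CLASS = "com.apple.speech.synthesis.Synthesizer";

    // Returns true if the named class is visible to this class's loader.
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: locate the class without running its static initializers
            Class.forName(className, false, SpeechFrameworkCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        } catch (LinkageError e) {
            // The class was found but could not be linked (for example, a missing dependency)
            return false;
        }
    }

    public static void main(String[] args) {
        if (isClassAvailable(SYNTHESIZER_CLASS)) {
            System.out.println("Speech Framework found on the classpath.");
        } else {
            System.out.println("Speech Framework not found; check your classpath.");
        }
    }
}
```

An application could run a check like this at startup and fall back to silent dialogs when the framework is missing, rather than failing with a NoClassDefFoundError later.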
The class TalkingJDialog, shown in Example 9-1, is a simple extension to the standard Swing JDialog. It adds storage for a text message and speaks that message aloud whenever the dialog is shown.
package com.wiverson.macosbook.speech;

/* This single class does the vast bulk of the heavy lifting
   of actually making Mac OS X talk. Don't blink or you'll
   miss it.
*/
import com.apple.speech.synthesis.Synthesizer;

/* This class describes a very generic version of JDialog
   with a few methods added for speech synthesis and the
   related user interface. It's extraordinarily straightforward.
*/
public class TalkingJDialog extends javax.swing.JDialog
    implements java.awt.event.MouseListener
{
    public TalkingJDialog( )
    {
        this.setResizable(false);
        this.addMouseListener(this);
    }

    /* This method is used to allow the user to click
       anywhere and immediately cancel out of the speech
       playback - even if the dialog isn't dismissed
    */
    public void mousePressed(java.awt.event.MouseEvent mouseEvent)
    {
        if(mySynthesizer != null)
            mySynthesizer.stopSpeech( );
    }

    // Needed to complete the MouseListener interface
    public void mouseReleased(java.awt.event.MouseEvent mouseEvent) {}
    public void mouseExited(java.awt.event.MouseEvent mouseEvent) {}
    public void mouseEntered(java.awt.event.MouseEvent mouseEvent) {}
    public void mouseClicked(java.awt.event.MouseEvent mouseEvent) {}

    public void dispose( )
    {
        super.dispose( );
    }

    public void hide( )
    {
        super.hide( );
        // If the dialog goes away, be sure to stop talking.
        // The synthesizer is created lazily in show( ), so it
        // may still be null if the dialog was never shown.
        if(mySynthesizer != null)
            mySynthesizer.stopSpeech( );
    }

    private Synthesizer mySynthesizer = null;

    public void show( )
    {
        super.show( );

        // Get a synthesizer for this dialog
        // if one isn't already available
        if(mySynthesizer == null)
            mySynthesizer = new Synthesizer( );

        // Start talking!
        mySynthesizer.speakText(getNotificationText( ));
    }

    // Storage & accessors for the text to be spoken
    private String spokenText;

    public void setNotificationText(String inString)
    {
        spokenText = inString;
    }

    public String getNotificationText( )
    {
        return spokenText;
    }
}
On its own, this class is pretty useless, as is JDialog without an additional extension. To use it, extend TalkingJDialog with your own dialog box and listen to Mac OS X read your messages. Example 9-2 provides a simple, user-friendly standalone example of a talking dialog.
package com.wiverson.macosbook.speech;

public class TalkingAlertJDialog
    extends com.wiverson.macosbook.speech.TalkingJDialog
{
    /** Creates new form TalkingAlertJDialog */
    public TalkingAlertJDialog(String alert)
    {
        setNotificationText(alert);
        initComponents( );
        this.getRootPane( ).setDefaultButton(okButton);
        pack( );

        java.awt.Dimension screenSize =
            java.awt.Toolkit.getDefaultToolkit().getScreenSize( );
        setSize(new java.awt.Dimension(374, 128));
        setLocation((screenSize.width-374)/2,(screenSize.height-128)/4);
    }

    private void initComponents( )
    {
        alertText = new javax.swing.JLabel( );
        stylePanel = new javax.swing.JPanel( );
        okButton = new javax.swing.JButton( );

        setTitle("Alert");
        setResizable(false);

        alertText.setText(getNotificationText( ));
        alertText.setHorizontalAlignment(javax.swing.SwingConstants.CENTER);
        getContentPane( ).add(alertText, java.awt.BorderLayout.CENTER);

        okButton.setText("OK");
        okButton.addActionListener(new java.awt.event.ActionListener( )
        {
            public void actionPerformed(java.awt.event.ActionEvent evt)
            {
                okButtonActionPerformed(evt);
            }
        });
        stylePanel.add(okButton);
        getContentPane( ).add(stylePanel, java.awt.BorderLayout.SOUTH);
    }

    private void okButtonActionPerformed(java.awt.event.ActionEvent evt)
    {
        setVisible(false);
    }

    public static void main(String args[])
    {
        new TalkingAlertJDialog("Help! I've fallen and I can't get up!").show( );
    }

    private javax.swing.JLabel alertText;
    private javax.swing.JPanel stylePanel;
    private javax.swing.JButton okButton;
}
While a picture may be worth a thousand words, you'll have to try this one out on your own to really appreciate Mac OS X's speech features. Still, Figure 9-2 shows TalkingAlertJDialog in action.
Next, write a small utility application that sits in the background and answers common questions. This section shows you how to set up the voice recognizer, teach it a few phrases, and make it answer common questions. This lesson should familiarize you with other useful applications of the Speech Framework. Example 9-3 includes the source listing for this utility.
package com.wiverson.macosbook.speech;

import javax.swing.JLabel;
import javax.swing.JComboBox;
import java.awt.BorderLayout;

public class SpeechListener extends javax.swing.JDialog
    implements java.awt.event.ActionListener,
               com.apple.speech.recognition.UnrecognizedEventListener,
               com.apple.speech.recognition.DetectedEventListener,
               com.apple.speech.recognition.DoneEventListener
{
    // Set up the speech recognition engine
    static com.apple.speech.recognition.Recognizer mySpeechRecognizer = null;
    static com.apple.speech.recognition.LanguageModel myLanguageModel = null;

    // Set up the text-to-speech engine
    static com.apple.speech.synthesis.Synthesizer mySynthesizer = null;

    public SpeechListener( )
    {
        this.getContentPane().setLayout(new BorderLayout( ));

        statusLabel = new JLabel("Ready.");
        statusLabel.setHorizontalTextPosition(JLabel.LEFT);
        this.getContentPane( ).add(statusLabel, BorderLayout.CENTER);

        manualCommandMenu = new JComboBox( );
        manualCommandMenu.setModel(new javax.swing.DefaultComboBoxModel(tasks));
        manualCommandMenu.addActionListener(this);
        this.getContentPane( ).add(manualCommandMenu, BorderLayout.EAST);

        this.pack( );
        this.setSize(300, 50);
        this.setTitle("Address me as " + computerName);

        // Set up to have the computer talk back.
        if(mySynthesizer == null)
            mySynthesizer = new com.apple.speech.synthesis.Synthesizer( );

        try
        {
            // Hack for workaround of bug which
            // prevents Java apps from receiving
            // AppleEvents in Mac OS X 10.0
            com.apple.ae.AppleEventFunctions.initAE( );

            // Create the speech recognizer.
            // Speech is activated lazily upon startup.
            mySpeechRecognizer = new com.apple.speech.recognition.Recognizer( );

            // Create & set up the LanguageModel which we will add our phrases to.
            myLanguageModel = new com.apple.speech.recognition.LanguageModel( );
            mySpeechRecognizer.setLanguageModel(myLanguageModel);

            // Add the phrases we are looking for.
            // Note that we need to add the computer's address first.
            // Still, easier than using the more complex API
            String[] full_tasks = new String[tasks.length];
            for(int i = 0; i < tasks.length; i++)
                full_tasks[i] = computerName + tasks[i];
            myLanguageModel.setPhrases(full_tasks);

            // Start the recognizer
            mySpeechRecognizer.start( );

            // Listen for speech events
            mySpeechRecognizer.addDoneEventListener(this);
            mySpeechRecognizer.addUnrecognizedEventListener(this);
            mySpeechRecognizer.addDetectedEventListener(this);
        }
        catch(Exception e)
        {
            e.printStackTrace( );
        }
    }

    private JLabel statusLabel;
    private JComboBox manualCommandMenu;

    private String computerName = "Computer ";

    static final private int DAY = 0;
    static final private int SONG = 1;
    static final private int QUIT = 2;
    static final private int BEEP = 3;

    private String[] tasks =
    {
        "what day is it",
        "sing a song",
        "quit",
        "beep"
    };

    // main must be public, or the java launcher will refuse to run the class
    public static void main(String[] args)
    {
        (new SpeechListener()).show( );
    }

    public void doCommand(String input)
    {
        statusLabel.setText("I heard " + input);

        if(input.equals(tasks[DAY]))
        {
            mySynthesizer.speakText(new java.util.Date().toString( ));
        }
        if(input.equals(tasks[SONG]))
        {
            mySynthesizer.speakText("Sorry, I'm shy");
        }
        if(input.equals(tasks[QUIT]))
        {
            System.exit(0);
        }
        if(input.equals(tasks[BEEP]))
        {
            java.awt.Toolkit.getDefaultToolkit().beep( );
        }
    }

    public void handleDoneEvent(com.apple.speech.recognition.DoneEvent doneEvent)
    {
        String command = doneEvent.getPhraseRecognized( );
        if(command != null)
        {
            // Strip the computer's name from the front of the phrase
            command = command.substring(computerName.length(), command.length( ));
            doCommand(command);
        }
        else
        {
            statusLabel.setText("Can't understand...?");
        }
    }

    public void actionPerformed(java.awt.event.ActionEvent actionEvent)
    {
        if(actionEvent.getSource( ) instanceof JComboBox)
        {
            doCommand(
                ((JComboBox)actionEvent.getSource( )).getSelectedItem().toString( )
            );
        }
    }

    public void handleDetectedEvent(
        com.apple.speech.recognition.DetectedEvent detectedEvent)
    {
        statusLabel.setText("Listening...");
    }

    public void handleUnrecognizedEvent(
        com.apple.speech.recognition.UnrecognizedEvent unrecognizedEvent)
    {
        statusLabel.setText("Unrecognized...");
    }
}
Fire up this application:
java com.wiverson.macosbook.speech.SpeechListener
Once started, the program sits quietly in the background, waiting for the user to speak a phrase such as "Computer, what day is it?" The computer will then respond, using the voice synthesizer to answer the question.
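Because every phrase registered with the recognizer begins with the computer's name, the listing strips that prefix before acting on the command. A slightly more defensive version of that step might look like the sketch below. The AddressPrefix class and its strip method are hypothetical names introduced only for illustration; they are not part of the Speech Framework or of the listing.

```java
// Hypothetical helper: separates the spoken "address" (the computer's name)
// from the command that follows it.
final class AddressPrefix {

    // Returns the command portion of a recognized phrase, or null when the
    // phrase is null or does not begin with the computer's name.
    static String strip(String recognized, String computerName) {
        if (recognized == null || !recognized.startsWith(computerName)) {
            return null;
        }
        // Drop the name, then any leftover whitespace around the command
        return recognized.substring(computerName.length()).trim();
    }
}
```

Returning null for an unexpected phrase lets the caller show a "can't understand" status instead of acting on a truncated string.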
If you're adding support for voice recognition, you'll probably want to integrate the voice commands into your application's existing event dispatching system. Ideally, you should provide a customizable interface for users to specify the specific phrases they'd like to use to trigger events.
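One way to approach this is to keep a table that maps phrases to actions, so that voice commands and menu commands flow through the same dispatch path and users can change the trigger phrases. The following is a rough sketch under that assumption. PhraseDispatcher and all of its methods are hypothetical names, it is written in modern Java (generics and lambdas), and it deliberately avoids the Speech Framework so that it can be compiled anywhere.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: a user-editable table of spoken phrases and the
// actions they trigger.
final class PhraseDispatcher {

    private final Map<String, Runnable> actions = new LinkedHashMap<>();

    // Register (or replace) the action triggered by a phrase.
    void register(String phrase, Runnable action) {
        actions.put(normalize(phrase), action);
    }

    // Let the user pick a new trigger phrase for an existing action.
    // Returns false if the old phrase was never registered.
    boolean rename(String oldPhrase, String newPhrase) {
        Runnable action = actions.remove(normalize(oldPhrase));
        if (action == null) {
            return false;
        }
        actions.put(normalize(newPhrase), action);
        return true;
    }

    // Run the action for a recognized phrase; returns false if none matches.
    boolean dispatch(String recognized) {
        if (recognized == null) {
            return false;
        }
        Runnable action = actions.get(normalize(recognized));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // The current phrases, in insertion order, ready to hand to a recognizer.
    String[] phrases() {
        return actions.keySet().toArray(new String[0]);
    }

    // Compare phrases without regard to case or surrounding whitespace.
    private static String normalize(String phrase) {
        return phrase.trim().toLowerCase(Locale.ROOT);
    }
}
```

In an application like the one in Example 9-3, the array returned by phrases() could be passed to the language model, and the done-event handler could call dispatch() with the recognized text.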
Besides adding tasks, you can install your own "grammar" by creating more complex language models. This allows you to build much more sophisticated applications, but it is also considerably more difficult to configure and develop.
A custom language model, represented by the com.apple.speech.recognition.LanguageModel class, holds a list of zero or more words, phrases, or paths. For example, suppose that you want the system to handle commands such as "call Will" and "schedule a lunch with Brent next Tuesday" (perhaps with other names and days as well). One way to describe such a model is Backus-Naur Form (BNF). Example 9-4 shows a BNF description of a relatively simple language model.
<TopLM> = <call> <person> | schedule meeting with <person> | view today's schedule;
<call> = call | phone | dial;
<person> = Will | Brent | Cynthia | Diane;
Building up a custom language model allows your application to mix and match names and phrases, rather than learning each phrase with each possible name and action.
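To see what the grammar saves you, consider what the flat phrase list used in Example 9-3 would need to contain to cover the same commands. The sketch below expands the grammar from Example 9-4 by hand: three verbs times four names, plus four scheduling phrases, plus one fixed phrase, for 17 phrases in all. PhraseExpander and expand are hypothetical names used only for this illustration, and the code does not touch the Speech Framework.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands "slots" of alternatives into every phrase
// they can form, the way a flat phrase list would have to spell them out.
final class PhraseExpander {

    // Returns every combination of one entry from each slot, joined by spaces.
    static List<String> expand(String[]... slots) {
        List<String> results = new ArrayList<>();
        results.add("");
        for (String[] slot : slots) {
            List<String> next = new ArrayList<>();
            for (String prefix : results) {
                for (String word : slot) {
                    next.add(prefix.isEmpty() ? word : prefix + " " + word);
                }
            }
            results = next;
        }
        return results;
    }

    public static void main(String[] args) {
        // Alternatives taken from the grammar in Example 9-4
        String[] call = { "call", "phone", "dial" };
        String[] person = { "Will", "Brent", "Cynthia", "Diane" };

        List<String> phrases = new ArrayList<>();
        phrases.addAll(expand(call, person));
        phrases.addAll(expand(new String[] { "schedule meeting with" }, person));
        phrases.add("view today's schedule");

        for (String phrase : phrases) {
            System.out.println(phrase);
        }
    }
}
```

Adding a fifth name to the grammar is a one-word change, whereas the flat list grows by four phrases, and the gap widens with every new verb or name.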
If your application requires this sort of sophistication, investigate the installed documentation at /Developer/Research/JavaSpeechFramework/Documentation/com/apple/speech/recognition/Model.html. Using a custom language model precludes the use of the simpler API shown in the sample applications. Custom language models were left out of this book, largely because of the still-missing support for speech in JDK 1.4. For projects complex enough to require them, you'll probably want to investigate a commercial package such as IBM's ViaVoice (http://www.apple.com/macosx/applications/viavoice/).