9.1 The Speech API

The Speech API has two main parts: one creates spoken words from text strings, and the other listens to words spoken by the user. This section covers a few basic classes you should be familiar with when working with the Speech API.

9.1.1 The Synthesizer Class

When you simply want Mac OS X to speak text, you'll work primarily with the com.apple.speech.synthesis package. Synthesizer is the most important class for converting text to spoken words. This class has a few basic methods to work with, such as speakText(String) and stopSpeech( ). In addition, several methods allow control over other speech options, including:

  • Notification of specific events while speaking text, including when individual words or individual phonemes are spoken, or when speech is started, finished, or paused.

  • The ability to embed special commands via delimiters (http://developer.apple.com/techpubs/mac/Sound/Sound-200.html#HEADING200-0).

  • Changing of pitch, pitch modulation, rate, voice, and volume.

  • Pausing of the current speech synthesis immediately, or at the end of the current sentence or word.

All methods for these classes are detailed in the included Javadoc documentation for the Speech Framework. Rather than deal with each individually, the rest of this chapter will put the framework into action, giving you practical experience in working with OS X, Java, and speech.
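As a rough orientation, the two basic calls named above can be sketched as follows. The real Synthesizer class only exists on a Mac with the Speech Framework installed, so this sketch deliberately does not import it. Instead it declares a hypothetical SpeechOutput interface that mirrors just the two method names mentioned in the text (speakText(String) and stopSpeech( )), plus a hypothetical RecordingSpeechOutput that logs calls rather than producing sound. Everything other than those two method names is an assumption made for illustration; consult the Javadoc for the actual signatures, exceptions, and blocking behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SynthesizerSketch {

    /**
     * Hypothetical stand-in that mirrors the two Synthesizer method names
     * mentioned in the text. It is NOT the real Apple class.
     */
    interface SpeechOutput {
        void speakText(String text);
        void stopSpeech();
    }

    /** Hypothetical implementation that records calls instead of speaking. */
    static class RecordingSpeechOutput implements SpeechOutput {
        final List<String> log = new ArrayList<String>();
        private boolean speaking = false;

        @Override
        public void speakText(String text) {
            speaking = true;
            log.add("speak: " + text);
        }

        @Override
        public void stopSpeech() {
            // Only record a stop if something was started.
            if (speaking) {
                speaking = false;
                log.add("stop");
            }
        }
    }

    /** Hands each line to the speech output in order, then stops speech. */
    static void announce(SpeechOutput out, List<String> lines) {
        for (String line : lines) {
            out.speakText(line);
        }
        out.stopSpeech();
    }

    public static void main(String[] args) {
        RecordingSpeechOutput out = new RecordingSpeechOutput();
        announce(out, Arrays.asList("Hello", "Build finished"));
        for (String entry : out.log) {
            System.out.println(entry);
        }
    }
}
```

Coding against a small interface like this is one way to keep the rest of an application testable on machines without speech support; an adapter that wraps the real Synthesizer could then be swapped in on Mac OS X.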

9.1.2 Setting Speech Defaults

Although the programmatic options described above control speech playback, the Speech pane in System Preferences sets the default speech configuration, as shown in Figure 9-1.

Figure 9-1. Speech preferences

9.1.3 Speech Recognition

Recognizing speech from a user is a bit trickier than generating speech from text, and is handled by the com.apple.speech.recognition package. The package's core class is Recognizer, which lets you specify which words and phrases are known by the recognition package. You also need to specify the language style to be used, through the LanguageModel class. This class allows you to specify the type of speech so the recognition engine can try to make intelligent decisions about combinations of words it "hears." You'll then add phrases to the model and attach that model to the Recognizer by calling Recognizer.setLanguageModel( ).

Once you've registered all the words and phrases, you then need to add event handlers to the Recognizer. These handlers let you respond both when a phrase is recognized and when it is not. You can launch programs, continue listening, show (or speak) error messages if a phrase isn't understood, and do anything else that Java programming supports.
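The recognition workflow described above (build a model, add phrases, attach the model to the recognizer, register handlers) can be sketched in plain Java. This is a simulation under stated assumptions, not the Apple API: PhraseModel, ResultHandler, SketchRecognizer, and the hear( ) method are all hypothetical names invented for this illustration, and "recognition" here is just a case-insensitive text match. Only the setLanguageModel( ) method name is borrowed from the text.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class RecognitionSketch {

    /** Hypothetical stand-in for a language model: a set of known phrases. */
    static class PhraseModel {
        private final Set<String> phrases = new LinkedHashSet<String>();

        void addPhrase(String phrase) {
            phrases.add(normalize(phrase));
        }

        boolean contains(String utterance) {
            return phrases.contains(normalize(utterance));
        }

        private static String normalize(String s) {
            return s.trim().toLowerCase(Locale.ROOT);
        }
    }

    /** Hypothetical callbacks for recognized and unrecognized input. */
    interface ResultHandler {
        void recognized(String phrase);
        void unrecognized(String utterance);
    }

    /** Hypothetical recognizer that matches text instead of audio. */
    static class SketchRecognizer {
        private PhraseModel model;
        private final List<ResultHandler> handlers = new ArrayList<ResultHandler>();

        // Method name mirrors the one mentioned in the text.
        void setLanguageModel(PhraseModel model) {
            this.model = model;
        }

        void addHandler(ResultHandler handler) {
            handlers.add(handler);
        }

        /** Simulates hearing an utterance and notifies every handler. */
        void hear(String utterance) {
            boolean known = model != null && model.contains(utterance);
            for (ResultHandler handler : handlers) {
                if (known) {
                    handler.recognized(utterance);
                } else {
                    handler.unrecognized(utterance);
                }
            }
        }
    }

    /** Wires the pieces together and returns a log of what the handler did. */
    static List<String> demo(String... utterances) {
        final List<String> log = new ArrayList<String>();

        PhraseModel model = new PhraseModel();
        model.addPhrase("open mail");
        model.addPhrase("what time is it");

        SketchRecognizer recognizer = new SketchRecognizer();
        recognizer.setLanguageModel(model);
        recognizer.addHandler(new ResultHandler() {
            public void recognized(String phrase) {
                log.add("run: " + phrase);
            }
            public void unrecognized(String utterance) {
                log.add("sorry: " + utterance);
            }
        });

        for (String utterance : utterances) {
            recognizer.hear(utterance);
        }
        return log;
    }

    public static void main(String[] args) {
        for (String line : demo("Open Mail", "make coffee")) {
            System.out.println(line);
        }
    }
}
```

The handler is where application behavior lives: the recognized branch would launch a program or perform a command, while the unrecognized branch would show or speak an error message and keep listening.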