Chapter 9. The Mac OS X Speech Framework

Your Macintosh wants to talk to you . . . and it's even willing to listen to what you have to say. Mac OS computers have been able to speak for a long time, ever since the introduction of PlainTalk and Speech Recognition in the pre-Mac OS 8 days (the early 1990s).

Speech is a very interesting concept, but it's one that has been sadly under-supported by most Mac OS applications. One of Classic Mac OS's most appealing features was its support for "talking dialogs." You could specify a few basic options, and the alerts that appeared would be spoken automatically. No application support was needed, as the appropriate text string was detected automatically by the alert/dialog API. This feature wasn't reimplemented for Mac OS X until the release of Jaguar (Mac OS X 10.2). In addition to this basic functionality, Mac OS X features a number of other speech capabilities.

First, Mac OS X can perform speech recognition. Broadly speaking, there are two classes of speech recognition: systems that can understand specific words or phrases (such as the engine in Mac OS X) and systems that are capable of full dictation services. Some packages available from third parties provide full dictation for Mac OS X, but they require an independent commercial license and are beyond the scope of this book. This chapter focuses on the ability of an OS X system to recognize words and on how your Java programs can use that functionality.

Additionally, Mac OS X still supports text-to-speech conversion. This conversion allows plain text, such as that typed into TextEdit or a Microsoft Word document, to be converted into a binary audio format and read back to the user. This conversion is a bit of a niche feature, but is pretty cool and worth knowing about.

Apple has made Java-based frameworks for both text-to-speech and voice recognition available as freely downloadable packages from the Apple Developer Connection (ADC). The native support is already included in Mac OS X, but the downloadable frameworks include the required Java bindings and documentation to make them useful programmatically. You'll need to register with the ADC to download the toolkit; registration is free.

Currently, the Speech Framework relies on Apple's JDirect implementation (as described in Chapter 5). Since JDirect isn't included in the Mac OS X JDK 1.4 implementation, it may be some time before an implementation of the Speech Framework is made available for JDK 1.4-based applications. Check the ADC for the latest information. In the meantime, you'll have to consider speech a JDK 1.3-only feature.

Before diving into the code, consider this advice on using speech in your applications:

  • You can't require speech input for your application unless you are willing to constrain the use of your application. I wrote most of this chapter in a coffee shop. Text-to-speech worked well with my headphones, but I wasn't bold enough to talk to my iBook in public. If I'm hesitant, your users might be, too.

  • It's easier and (arguably) more useful to add text-to-speech to your application than to add speech recognition. Also, just because you add one, it doesn't mean that you must add both. I'd suggest adding text-to-speech capabilities first and speech recognition second.

  • When using text-to-speech, include an easy way for the user to stop the system from speaking. If you use a talking alert dialog, turn off the sound if the user clicks the mouse anywhere, not just on a button. Include an option that turns speech off and on easily and globally in your application. If you're working on a game, pause the speech engine when you pause the rest of the game.

  • Don't forget that the hardware and environment can affect the utility of both technologies. Also, non-native English speakers can sometimes find speaking systems difficult to use or understand.
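
The advice about stopping speech and offering a global on/off switch can be sketched in plain Java. This is only an illustration of the design, not the Speech Framework's API: SpeechEngine, SpeechGate, and RecordingEngine are hypothetical names invented for this example, and SpeechEngine stands in for whatever object actually produces speech in your application.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the real speech engine (an assumption, not Apple's API). */
interface SpeechEngine {
    void speak(String text);
    void stop();
}

/** Routes all speech through one place so it can be silenced globally. */
class SpeechGate {
    private final SpeechEngine engine;
    private boolean enabled = true;

    SpeechGate(SpeechEngine engine) {
        this.engine = engine;
    }

    /** Speaks only while the user has left speech turned on. */
    void say(String text) {
        if (enabled) {
            engine.speak(text);
        }
    }

    /** Global on/off option; turning speech off also cuts off anything in progress. */
    void setEnabled(boolean on) {
        enabled = on;
        if (!on) {
            engine.stop();
        }
    }

    boolean isEnabled() {
        return enabled;
    }

    /** Call this from a mouse click, a key press, or a game's pause handler. */
    void interrupt() {
        engine.stop();
    }
}

/** Engine that records calls instead of producing audio, handy for trying the gate out. */
class RecordingEngine implements SpeechEngine {
    final List<String> log = new ArrayList<String>();

    public void speak(String text) {
        log.add("speak:" + text);
    }

    public void stop() {
        log.add("stop");
    }
}
```

If every spoken alert goes through a single SpeechGate, a mouse listener attached to the whole dialog (not just its buttons) can call interrupt(), and a preferences checkbox can call setEnabled(). Nothing else in the application needs to know whether speech is currently allowed.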