Speech Application Language Tags (SALT)

The Speech Application Language Tags (SALT) technology was created in 2001 by Cisco, Comverse, Intel, Microsoft, Philips, and SpeechWorks. These companies, along with many others, formed the SALT Forum to oversee future development and standardization of the SALT specification. Their goal is to provide a royalty-free, platform-independent standard for creating multimodal and telephony-enabled applications that can be accessed from PCs, telephones, and PDAs. SALT extends common markup languages (including HTML, XHTML, cHTML, and WML) with a spoken dialog interface for Internet applications. It is designed for both voice-only applications and multimodal applications that combine voice and visual displays.

SALT Elements

SALT is a lightweight specification consisting of a small number of XML elements, with associated attributes and DOM object properties, events, and methods. SALT has four main top-level elements, as follows:

  • <listen...>. Used for speech recognition and information input.

  • <prompt...>. Used for prompt playing and speech synthesis configuration.

  • <dtmf...>. Used for configuration and control of DTMF input.

  • <smex...>. Used for general-purpose communication with the host platform.

In addition to these top-level elements there are also <grammar>, <bind>, and <record> elements for specifying grammars, processing recognition results, and recording audio input. In total, SALT has only around 10 elements, making it a very simple language. In most cases, these elements are used in conjunction with ECMAScript to allow for a programmatic approach to application development. This simplicity makes SALT easy to use in conjunction with other markup languages such as HTML or WML.
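
To make the element roles concrete, the following is a minimal sketch rather than an excerpt from the SALT specification. It embeds a hypothetical page fragment in which a spoken prompt and a listener sit next to an ordinary HTML text box, then uses Python's standard XML parser to list the SALT elements the fragment uses. The namespace URI, the attribute names (targetelement, value), the grammar file name city.grxml, and the ids are assumptions made for illustration and should be checked against the SALT specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI assumed for illustration; verify against the SALT specification.
SALT_NS = "http://www.saltforum.org/2002/SALT"

# Hypothetical fragment: a visual text box, a spoken prompt, and a listener
# whose recognition result is bound back into the text box.
PAGE = f"""
<html xmlns:salt="{SALT_NS}">
  <body>
    <input name="txtCity" type="text" />
    <salt:prompt id="askCity">Which city are you flying to?</salt:prompt>
    <salt:listen id="recoCity">
      <salt:grammar src="city.grxml" />
      <salt:bind targetelement="txtCity" value="//city" />
    </salt:listen>
  </body>
</html>
"""


def salt_elements(markup):
    """Return the local names of all SALT-namespace elements, in document order."""
    root = ET.fromstring(markup.strip())
    prefix = "{" + SALT_NS + "}"
    return [el.tag[len(prefix):] for el in root.iter() if el.tag.startswith(prefix)]


if __name__ == "__main__":
    print(salt_elements(PAGE))
```

In this sketch the grammar and bind elements are nested inside listen, reflecting the description above: the grammar constrains what can be recognized, and the bind copies part of the recognition result into the visual text box. In a real page, ECMAScript would typically start the prompt and the listener.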

There are two main usage scenarios for SALT:

  • Multimodal. SALT can be used to extend existing Internet applications with speech for both input and output. For example, a user can request a set of information using voice and have the response returned visually using HTML. Conversely, a user can enter a request using HTML and have the response spoken back to them.

  • Voice only. In applications without a visual display, SALT can be used to control the application flow, input, and output. The interaction occurs over any telephone (wireline or wireless), allowing for universal access to Internet content.

The SALT specification is designed to benefit a variety of users. It gives end users more options for interacting with applications: SALT-enabled applications allow for speech, text, or graphical interfaces, on their own or in combination with each other. Because SALT integrates into existing markup languages, developers can continue to use the tools and technologies they are comfortable with while adding advanced speech interfaces. These speech interfaces can help reduce both overall application cost and application complexity. In addition, businesses that use SALT can continue to rely on their existing Web infrastructure and expertise.

Competition between SALT and VoiceXML

VoiceXML and SALT are competitors in the speech application market. VoiceXML is an established specification with a growing user base. Its focus has been to replace or augment existing interactive voice response (IVR) systems, whose software and hardware are proprietary, resulting in higher costs and vendor lock-in. VoiceXML is changing that by taking a standard approach for creating cross-platform, advanced voice applications. SALT, on the other hand, is more focused on multimodal applications, where voice and text are combined to provide more effective interfaces to Internet applications.

The promoters of SALT argue that VoiceXML is inflexible and does not work well with existing Web development tools and server platforms. They plan to address these failings with SALT, which, because it extends existing markup languages, can be used to add voice to Internet applications in a simplified manner. Microsoft is also spending significant resources on supporting SALT throughout its platform. Visual Studio .NET, ASP.NET, Internet Explorer, and PocketIE will all add support for SALT in the near future. This should increase SALT adoption, as both Visual Studio .NET and ASP.NET are widely used for Internet application development.

Not surprisingly, VoiceXML promoters contend there is no need for SALT, citing the fact that VoiceXML has been around since 1999 and has a strong industry following, proven by the many VoiceXML-based applications that have been successfully deployed. SALT, of course, is still in its conception stage, without any proven market success.

In SALT's favor, it can be used with existing speech recognition and text-to-speech technology, which will enable many of the existing VoiceXML vendors to add SALT support to their solutions. This will help SALT gain some market momentum without requiring companies to put forth a lot of effort. Still, SALT lags considerably behind VoiceXML in terms of both market adoption and maturity of the specification. The VoiceXML Forum has nearly 400 member companies, has released the second version of its specification, and is recognized as an official standard by the W3C. The SALT Forum, by contrast, has just over 50 member companies, and only in August 2002 submitted SALT version 1.0 to the W3C Multimodal Interaction Working Group and Voice Browser Working Group for standardization.

The point here is that, regardless of whether VoiceXML or SALT becomes the dominant voice technology, an increasing amount of Internet content will clearly become available through a voice interface. This will enable the more than 1 billion wireline and wireless telephones to access the Internet through a simple call. (For a complete overview of VoiceXML technology see Chapter 15, "Voice Applications with VoiceXML.")