Hack #53: A Talking, Lip-Synched Avatar


Synchronize an animated head to the speech synthesizer.

After completing the speech synthesizer [Hack #52], I showed it to Adam Phillips, and a few days later, he'd drawn a suitably robotic character named Charlie, shown in Figure 7-3.

Figure 7-3. Charlie the 'droid
figs/flhk_0703.gif


Adam also provided a complete set of mouth shapes for lip sync animations, as shown in Figure 7-4. Many of the mouth shapes are used for more than one allophone. For example, the symbol JShCh shows the mouth shape used for three different allophones: "j for jump," "sh for ship," and "ch for Charlie."

Figure 7-4. The full set of mouth shapes for English speech
figs/flhk_0704.gif


The next step was to map each of the 77 allophones to one of the 13 mouth shapes, in a way that allows a script to recognize when each allophone is spoken and display the appropriate mouth shape.

First, I arranged the 13 mouth shapes as keyframes in a movie clip instance named mouth, as shown in Figure 7-5.

Figure 7-5. The mouth clip's timeline
figs/flhk_0705.gif


I then created an array to link the frame numbers in mouth with the allophone names. Rather than create a separate array element for each allophone, I created a separate array element for each mouth shape. Why? Because there are far fewer mouth shapes than allophones (13 mouth shapes versus 77 allophones), which let me take advantage of a quick way of searching through an array [Hack #79].

Here's my solution.

First, I created an array of strings, one per mouth shape. Each string consists of all the allophones that are associated with that mouth shape. The allophones are padded on either side by spaces, making it possible to distinguish between "aer" and "r."

var shapes = new Array( );
// Define an array of mouth shapes with the corresponding allophones.
shapes[0]  = " space ";
shapes[1]  = " b bb m p ";
shapes[2]  = " a aer ay ee er err i ii ";
shapes[3]  = " aa ";
shapes[4]  = " r ";
shapes[5]  = " o ";
shapes[6]  = " or ow oy ";
shapes[7]  = " oo ou ouu w wh ";
shapes[8]  = " ck d dd dth g gg h hh n ng nn s t tt z zh ";
shapes[9]  = " c e ear k y yy ";
shapes[10] = " f u uh ";
shapes[11] = " ch sh j ";
shapes[12] = " l ll ";
shapes[13] = " th ";
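To see why the space padding matters, here is a minimal JavaScript sketch of the same lookup (JavaScript stands in for ActionScript here; only the first five shape strings from the array above are used, and `findShape` is an illustrative name, not part of the hack's code):

```javascript
// Sketch of the padded-space lookup used by mouthAnim().
// Only the first five mouth-shape strings are reproduced here.
var shapes = [
  " space ",
  " b bb m p ",
  " a aer ay ee er err i ii ",
  " aa ",
  " r ",
];

// Padding the search key with spaces guarantees whole-token matches:
// " r " is found only in shapes[4], even though the letter "r" also
// occurs inside "aer" and "err" in shapes[2].
function findShape(phone) {
  for (var i = 0; i < shapes.length; i++) {
    if (shapes[i].indexOf(" " + phone + " ") != -1) {
      return i; // mouthAnim() would jump to frame (i + 1) * 10
    }
  }
  return -1; // unknown allophone
}

console.log(findShape("r"));   // 4, not 2
console.log(findShape("aer")); // 2
```

Without the padding, a search for "r" would match the "r" inside "aer" in shapes[2] and display the wrong mouth shape.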

The mouthAnim( ) function looks up an allophone string (such as "th") in the shapes array. It then uses the element number of the search result to jump to the frame in movie clip mouth that contains the most appropriate mouth shape. The first mouth shape is at frame 10, the second at frame 20, and so on. To better see how this works, uncomment the trace action (shown in bold) when you run the FLA.

function mouthAnim(phone) {
  for (var i = 0; i < shapes.length; i++) {
    if (shapes[i].indexOf(" " + phone + " ") != -1) {
      //  trace(phone + " found in " + shapes[i]);
      mouth.gotoAndStop((i+1)*10);
      break;
    }
  }
}

The full changes to the code from the earlier speech synthesizer hack [Hack #52] are shown in bold:

makePhrase = function ( ) {
  if (soundCount < soundMax) {
    soundCount++;
    speech.attachSound(aPhones[soundCount]);
    mouthAnim(aPhones[soundCount]);
    speech.start( );
  } else {
    delete speech.onSoundComplete;
  }
};

function say(phrase) {
  var i = j = 0;
  aPhones = new Array( );
  for (i = 0; i < phrase.length; i++) {
    if (phrase.charAt(i) != "|") {
      aPhones[j] += phrase.charAt(i);
      if (phrase.charAt(i) == " ") {
        aPhones[j] = "space";
      }
    } else {
      j++;
    }
  }
  speech.attachSound(aPhones[0]);
  mouthAnim(aPhones[0]);
  speech.start( );
  speech.onSoundComplete = makePhrase;
  soundCount = 0;
  soundMax = j-1;
}

function mouthAnim(phone) {
  for (var i = 0; i < shapes.length; i++) {
    if (shapes[i].indexOf(" " + phone + " ") != -1) {
      //  trace(phone + " found in " + shapes[i]);
      mouth.gotoAndStop((i+1)*10);
      break;
    }
  }
}

var speech = new Sound(this);
var shapes = new Array( );
shapes[0]  = " space ";
shapes[1]  = " b bb m p ";
shapes[2]  = " a aer ay ee er err i ii ";
shapes[3]  = " aa ";
shapes[4]  = " r ";
shapes[5]  = " o ";
shapes[6]  = " or ow oy ";
shapes[7]  = " oo ou ouu w wh ";
shapes[8]  = " ck d dd dth g gg h hh n ng nn s t tt z zh ";
shapes[9]  = " c e ear k y yy ";
shapes[10] = " f u uh ";
shapes[11] = " ch sh j ";
shapes[12] = " l ll ";
shapes[13] = " th ";
say("h|e|ll|oo| | | | | |h|ow| |ar| |y|ouu| | | | |tt|u|d|ay| |");
stop( );
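The control flow is easier to study outside Flash. The JavaScript sketch below imitates it: `FakeSound` is a stand-in for Flash's Sound object that "finishes" each sound instantly and fires onSoundComplete, so the whole chain unwinds synchronously (in the Flash Player the event is asynchronous, which is why the original can safely assign the handler after calling start). The `tokenize` helper is illustrative only, using split( ) where the original walks the string character by character, and the mouthAnim( ) call is omitted:

```javascript
// Sketch of the say()/makePhrase() control flow, with a fake,
// synchronous stand-in for Flash's Sound object.
function FakeSound() { this.played = []; }
FakeSound.prototype.attachSound = function (id) { this.current = id; };
FakeSound.prototype.start = function () {
  this.played.push(this.current);
  if (this.onSoundComplete) this.onSoundComplete();
};

// Tokenize the pipe-delimited phrase: drop the empty element left by
// the trailing "|", and turn pauses (" ") into the "space" allophone.
function tokenize(phrase) {
  return phrase.split("|")
    .filter(function (p) { return p !== ""; })
    .map(function (p) { return p === " " ? "space" : p; });
}

var aPhones = tokenize("h|e|ll|oo| |"); // ["h","e","ll","oo","space"]
var soundCount = 0;
var soundMax = aPhones.length - 1;
var speech = new FakeSound();

function makePhrase() {
  if (soundCount < soundMax) {
    soundCount++;
    speech.attachSound(aPhones[soundCount]);
    speech.start();                  // re-enters makePhrase() here
  } else {
    delete speech.onSoundComplete;   // end of phrase: stop the chain
  }
}

speech.attachSound(aPhones[0]);
speech.onSoundComplete = makePhrase; // set before start() only because
speech.start();                      // FakeSound completes synchronously

console.log(speech.played.join(" ")); // "h e ll oo space"
```

Each completed sound triggers the next one, so the phrase plays itself out with no timeline loop; deleting onSoundComplete after the last allophone is what stops the chain.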

The Poser application [Hack #32] can be made to lip sync an animation (although you may have to buy a separate application, Mimic by DAZ, http://www.daz3d.com, to do this). This lets you either create animated speaking characters/avatars or create guide images to help you draw your own animation keyframes. Oddcast (http://www.oddcast.com) is a good example of a professional application using speaking avatars.

Art input and animation expertise from Adam Phillips