I've seen the stories (on boingboing, remember the news) about the Elmo kids' book that has interactive audio and has been telling kids, "Who wants to die!" in an apparent prank by someone involved with making the book. I've also read the press release today by the publisher:
the track was recorded as 'Uh oh, who has to go' and due to compression of the digital audio file, some consumers hear a different phrase... We are absolutely certain that the audio file was not tampered with.
Covering their ass, I thought, until I heard the audio sample in this news video from KNDU and now I believe that the publisher is correct. The Elmo sentence under question is an excellent example of the psychological principle of priming, whereby what you perceive can be affected by your expectations. Listen to the sample expecting to hear "Who wants to die," and that is exactly what you hear. However, listen expecting to hear "Who has to go," and then the correct phrase becomes what you hear. Listen to the video a couple of times and force yourself to "expect" the two different phrases, and many of you will in fact switch what you hear depending on your expectation.
Of course, the first person who misidentified the sample as "Who wants to die" wasn't expecting to hear this frightening threat, so priming wasn't the reason they had their misunderstanding (even though the sentence demonstrates priming very well). How did consumers hear this unintended death threat, then, if priming wasn't the reason?
There are two main confusions with the sentence in question: "has" is confused with "wants", and "go" is confused with "die". The Elmo book certainly uses some severe compression scheme to reduce the bit rate necessary to store the speech in the book as the publisher stated--that's obvious just by listening to it. This compression scheme distorts the speech (in addition to the speech distortion that occurs from the annoying Elmo voice), adds a certain amount of noise, and reduces the speech bandwidth. All of these could lead to confusions in consonants and vowels perceived in the sentence. I decided to pull out some research papers on speech confusion and see if there's an explanation for this mix-up.
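To make two of those degradations concrete, here is a minimal Python sketch that adds noise at a chosen signal-to-noise ratio and crudely narrows the bandwidth of a signal. It is not the book's actual codec, which I know nothing about: the function names, the 8000 Hz sample rate, the test tones, and the moving-average filter are all assumptions picked for illustration.

```python
import math
import random

def power(samples):
    """Mean squared amplitude of a list of samples."""
    return sum(x * x for x in samples) / len(samples)

def add_noise(signal, snr_db, seed=0):
    """Add Gaussian noise scaled so that signal power / noise power matches snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    target_noise_power = power(signal) / (10 ** (snr_db / 10))
    scale = math.sqrt(target_noise_power / power(noise))
    return [s + scale * n for s, n in zip(signal, noise)]

def moving_average(signal, width):
    """Crude low-pass filter: each output is the mean of the last `width` inputs."""
    out = []
    running = 0.0
    for i, x in enumerate(signal):
        running += x
        if i >= width:
            running -= signal[i - width]
        out.append(running / min(i + 1, width))
    return out

def tone(freq_hz, rate=8000, seconds=1.0):
    """A pure sine tone, standing in for one frequency component of speech."""
    count = int(rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * n / rate) for n in range(count)]

# Assumed stand-ins: a low component (300 Hz) and a high component (3000 Hz).
low, high = tone(300), tone(3000)
noisy_low = add_noise(low, snr_db=12)
filtered_low = moving_average(low, width=8)
filtered_high = moving_average(high, width=8)

print("low tone power kept: ", power(filtered_low) / power(low))
print("high tone power kept:", power(filtered_high) / power(high))
```

The low tone keeps most of its power while the high tone is almost wiped out, which is the sense in which a narrow bandwidth throws away the higher-frequency cues that help tell consonants apart.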
Classic research on consonant confusion by Miller and Nicely in 1955 looked at the impact of noise and bandwidth on consonant confusions. According to their research, for speech at a +12 dB signal-to-noise ratio and a bandwidth of 200-1200 Hz (probably not a bad approximation to the severe compression applied to the Elmo speech), the phoneme /g/ will be incorrectly identified as a /d/ as often as it is correctly identified as a /g/ (click on the figure to the right to see the full-sized confusion matrix--the data of relevance is highlighted in yellow). This begins to explain confusing "go" with "die": the word sounds like it starts with a /d/ instead of a /g/ due to the crappy compression system.
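For anyone who hasn't read a confusion matrix before, here is a small Python sketch of how one is interpreted. The counts are made up for illustration and are NOT Miller and Nicely's data; I only chose them so that the /g/ row shows the "heard as /d/ as often as /g/" pattern described above. The function names are my own.

```python
# Hypothetical counts, NOT Miller and Nicely's data: each row is the consonant
# that was spoken, each column is the consonant the listener reported.
counts = {
    "g": {"g": 40, "d": 40, "b": 20},
    "d": {"g": 25, "d": 60, "b": 15},
    "b": {"g": 10, "d": 15, "b": 75},
}

def response_probabilities(counts):
    """Turn each row of raw counts into P(heard | spoken)."""
    probs = {}
    for spoken, row in counts.items():
        total = sum(row.values())
        probs[spoken] = {heard: n / total for heard, n in row.items()}
    return probs

def most_common_error(probs, spoken):
    """The wrong answer listeners gave most often for a spoken consonant."""
    errors = {heard: p for heard, p in probs[spoken].items() if heard != spoken}
    return max(errors, key=errors.get)

probs = response_probabilities(counts)
print("P(hear g | said g):", probs["g"]["g"])
print("P(hear d | said g):", probs["g"]["d"])
print("most common error for /g/:", most_common_error(probs, "g"))
```

Reading along a row tells you what happens to one spoken consonant; the diagonal entry is the chance it survives intact.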
The vowel confusion is a little more difficult to explain, but I'll try assuming that they are represented by the diphthongs /OW/ and /AY/. The vowel sound in "go" has a similar first formant time-course to the vowel in "die" (according to Rabiner and Juang), so again a compression system that limits the bandwidth of speech might make the two vowel sounds more alike.
So now I've explained from a scientific basis how Elmo's "go" could be misinterpreted as "die".
A similar explanation can be made for the vowels in the confusion of "has" with "wants": both words have similar first formants. The consonant confusion with these words is more difficult to explain. Confusing /h/ with /w/ isn't common according to research by Wang and Bilger in 1973 (Miller and Nicely's paper did not look at these consonants). The /h/ is a fricative, the /w/ is voiced--the two are rarely confused. I suspect that the compression distortion obliterated the soft consonant /h/ and allowed the listener to imagine whatever consonant they want.
This opens a whole new line of work for linguists--alerting companies when their crappy compression systems may cause customers mental anguish (or worse if it's in a car's GPS system). You don't need to mind your p's and q's but be careful because, according to Miller and Nicely under the noisy conditions I considered above, the phoneme /t/ is more likely to be heard incorrectly as a /p/ than correctly as a /t/. So, if you get your face slapped at a noisy bar asking a woman if she wants to see your cool trick, at least now you know why.
I think the [w] of "wants" is picked up from the diphthong at the end of [huw]. That leaves only the nasal unaccounted for, since the [t] in "wants" is evanescent in any rapid speech.
Posted by: John Cowan | February 03, 2006 at 08:24 AM
Thanks, John. Your explanations sound right.
Posted by: bwedwards | February 03, 2006 at 07:12 PM
Agreed. On the discrepancy of H vs. W, I have had first hand experience with this issue. A friend of mine introduced me to a song called Papu Yaar http://www.youtube.com/watch?v=1yRL39PTRbo. Aside from being pretty silly and ripping off Jimi Hendrix, this song is also very addictive, and it made me want to sing it all throughout the day. After doing so for a few hours (I do not speak the language) my friend informed me that I was pronouncing it incorrectly. I was saying something like "Papu Waar" instead of "Papu Yaar." Having taught English for a while in a different country, I was excited by the linguistic challenge. I slowed my speaking for a while and realized that the "W" sound was occurring because of the shape of the vowel sound immediately preceding it: oooo. Since I was not accustomed to the transition of the language, and the fellow singing doesn't hit a hard Y on "yaar," my lips--being curled up in the beginning form of a W--pulled back into that sound.
SO
If the H sound was compressed significantly away, the "oo" sound of "Who" could easily elide into a word starting with a W sound immediately after.
ALSO
H is an un-vocalized sound (like the difference between Z and S: S is just air, Z is air and a hum. Try to whisper the phrase "Zebras at the Zoo" and you will get "Seebras at the Soo"). H, like S, is un-vocalized. However, if the compression added enough noise to the track, a distortion in this area could be perceived as vocalization, rendering the H an airy extension of the consonant in front of it....
This may have combined with the shape of the "Whooo" which lent itself to shaping a W vowel sound. If the H was distorted to the degree that it sounds vocalized, you have an airy pronunciation of waz. Since N is a vocalized sound, the nasal doesn't stand much of a chance.
However, if the original vocal performer did not pronounce the short "ah" sound like a country singer, then priming may have come into effect here as well: a long-term kind of priming due to the socialization of a decision making process for ambiguous sounds. The closer the vowel gets to a schwa, the easier it would be to make this mistake. The phrase "Who was to...?" makes absolutely no sense in the context of the book. I don't know whether it is a bathroom reference or an assessment of who needs to travel somewhere, but in any case, if the book was talking about a trip or using the restroom, it primes the idea "I have to go to the bathroom" or "He HAS to go to the bathroom." If it was a store or some other needed trip, we might say "I have to go to the store," or "He HAS to go to the store."
I also posit that the important distinction between need and desire could play an important role in this decision making process:
Consider this conversation:
“Mommy, Johnny has to ride the slide!”
“Does he Have to, or Want to?”
“He wants to.”
The two options distinguished here as possible choices are "has" and "wants." So for the question
Who...to (verb)?
"Has" or "Wants" will be understood before "Was" will be. You are primed by the implied decision as to the assessment of need, which results in a primed response bank! Since this implied decision only offers two options, and one has been destroyed by the distortion of the H, the only viable response from the primed response bank is "Wants."
Priming reigns once more!
I wonder what that would sound like if I compressed it?
Oh well. Who WANTS TO talk about something else?
Posted by: socialscientist | January 07, 2009 at 10:11 AM
I know this is old news, but I've just come across it because of the new story about the programmable Elmo doll that allegedly began saying, "Kill James!" after its batteries were changed.
Anyway, I found another story about this book that says the book should say, "Who wants to try to go potty?" and not "Who has to go?" as stated in your post.
I haven't actually listened to the book itself, and I haven't heard the audio clip because it's been taken down from the site linked above.
But I think the trip to "Who wants to die?" is much shorter if you start from "Who wants to try?" than if you start from "Who has to go?"
This is only speculation, of course. Without hearing the book myself, I can't say for sure. I'm sure curiosity will get the best of me and drive me to find it and take a listen the next time I'm at the bookstore.
http://www.clickorlando.com/news/5784303/detail.html
Posted by: Ce Bon | November 30, 2009 at 02:00 AM