Nov. 16, 2003. 07:59 AM
Even babies know how to separate speech from song

So why do the best of our computers find it so difficult?

PETER CALAMAI
SCIENCE WRITER

U2 artist Bono didn't sing his message when he spoke about international development to the Liberal leadership convention in Toronto Friday.

The effect of his performance would have been decidedly different if he'd sung "Sunday Bloody Sunday" or any other U2 classic, but his change of medium would have gone undetected by even the most powerful and sophisticated computer. In fact, such a computer likely wouldn't even recognize that Bono was singing, rather than speaking.

Even babies quickly grasp the difference between speech and song, so why do computers find it difficult, and does it matter? After all, as anyone who has ever phoned to check a credit card balance or get a new directory listing realizes, we're already using computerized speech recognition for simple daily tasks.

David Gerhard, a professor of computer science at the University of Regina, says the speech-song distinction is important if we expect computers ever to come close to the standards of artificial intelligence regularly depicted in science fiction movies, where machines interact smoothly with human beings and their surroundings.

Computers programmed to recognize and analyze the sung voice could have numerous practical applications: speech therapy, transcribing words and musical notes from a song, training singers, even retrieving songs that fit your personal tastes from the immense and growing online music collections.

Gerhard is among a small coterie of researchers tackling the song and music limitations of existing computerized sound-recognition systems. And the researchers are confident their work will not, as some might fear, erase yet one more element of chance and mystery from life and human interaction.

An accomplished guitarist and singer, Gerhard believes properly programmed computers can deepen our appreciation of music's magic and mysteries.
He says the inspiration for his current research, described last week at the Acoustical Society of America meeting in Austin, Texas, struck when he was directing a choir.

"I was listening to the voices and how they all adjusted and blended together and I thought: I wonder just what is going on here," he says.

At the core of that question is determining the elements people use to distinguish sung lyrics from spoken words. How important is vibrato or pitch? And how does the human computer, our brain, handle the huge range between the vibrato of an operatic basso profundo and a quivering elderly voice, yet realize that both are examples of singing?

"Computers really have only one powerful processing site," notes Gerhard, "but our brains have millions of simple interconnected sites.

"It's what's called massively parallel computing.

"A single chip has to make a yes-no decision on some sound which can lie on a fuzzy continuum that runs through poetry and rap music.

"The brain can play that back and forth between different sites to realize that things which sound very similar really fall into different categories."

So Gerhard set out to discover precisely which elements people rely on in deciding when words are being sung and not spoken. That alone turned out to be a big job, enough to produce a successful Ph.D. thesis.

A key stage in the research was gathering hundreds of examples of people using the same words in speech and song, then having scores of other people listen and note the differences they heard.

To hear examples of David Gerhard's speech-song pairings, go to http://www.thestar.com/calamai and click on this article.