When calling to make a plane reservation or verify a bank statement, the first "voice" Americans usually hear is not a real human but a computerized communication system. Most of these systems ask callers to press a specific telephone key to hear certain information, but more and more of them now give callers the option of saying what they want.
Voice recognition systems have become more accurate, but they are not infallible. They still have trouble with different speaking styles, and many fail completely when presented with a foreign accent. One linguist is helping computers understand us.
When dialing an information line, callers often hear a prerecorded message that sends them through a maze of numbered selections. But these person-to-machine interactions are quickly becoming more engaging, thanks to language experts who are fascinated by how people themselves communicate.
"It's the most basic thing we do. We do it all day," says Professor Dan Jurafsky. "We spend more time doing language than any other one thing in our day, except maybe sleeping, which is much less interesting to study."
Mr. Jurafsky is a professor of linguistics, which means he studies human speech, in all its varieties. He pursues his favorite subject at the University of Colorado in Boulder, where he's developed software that makes sense of language subtleties, helping computers understand and talk to people. One problem facing those designing the computer systems is the way people talk.
"You know, when you hear speech, there's no spaces between the words. Right? Speech, it all runs together," he notes. "So how do you know where the words are? How do you hear the boundaries between the words?"
Mr. Jurafsky has focused his efforts on computerized language systems, which allow users to speak their instructions rather than type them on a keyboard. His basic research has led to computer systems such as the CU Communicator at the University of Colorado Center for Spoken Language Research. Wayne Ward, a colleague of Mr. Jurafsky's who helped design the system, demonstrates how it works by calling to book a flight. The CU Communicator doesn't really make plane reservations, but it uses real reservation data.
Mr. Ward "converses" with the reservations system in ordinary speech, complete with a slight southern drawl. For example, the computer asks, "What are your travel plans?" When Mr. Ward replies, "I'd like to go from Denver to Boston the morning of November 3," the computer rephrases his request, and asks him to confirm that the computer got it right.
Now, Mr. Ward spoke clearly and simply to this computerized travel agent. When I phone the CU Communicator, you may notice that I'm a little more disorganized, and I also have a Midwestern twang. I eventually tell the computer I want to go to the Washington Monument, which is obviously not an airport at all. The poor computer tries very hard, but it completely misinterprets my request, routing me from a nonexistent place (Albuquerque, Hawaii) to Kansas City.
Despite a few mistakes, it's impressive how well this computer handled our different conversational styles, along with Mr. Ward's southern drawl and my slight Midwestern twang. Drawls and twangs may seem like subtle things, but Dan Jurafsky says minor speech variations can cause major problems for computerized speech recognizers.
"They break very badly on children, on very old people with crackly voices, and people with strong foreign accents," he concedes. "One of the most interesting research areas on the engineering side is, how can we make our speech recognizers robust to people with these kinds of accents. What makes a Spanish accent different than a Midwestern American English accent? What's different about the choice of words or the rhythm of the speech?"
Because language is constantly changing, he says America's computer programmers will always need to come up with new answers to those questions. "We're a nation of immigrants," he points out. "Every generation we get immigrants with different accents, but it'll always be someone from somewhere with an accent."
Mr. Jurafsky says he hopes that someday, all computers will understand a broad range of accents, as well as languages, because this will make computer technology more accessible to people around the world. What's more, since a computer that talks also seems to have a personality, the linguistics professor theorizes that better speech recognizers will improve our relationship with these increasingly complex machines. In recognition of his efforts to help man and machine communicate, Dan Jurafsky has just received a $500,000 MacArthur "genius" grant.