By Will Butler
[Image: two overlapping speech bubbles. Flickr: Marc Wathieu]
You don't have to be a laryngologist to understand how the human voice leaves a deep and unique impression, like a social fingerprint. There may be a small percentage of the population with near-athletic vocal command, but most people aren't totally comfortable with their speaking voice. And some people can't speak at all. For anyone who struggles with reading, understanding, or speaking, a synthetic voice can be an integral tool. But for most people with serious speech disorders, the perfect prosthesis does not exist.
"We wouldn't dream of fitting a little girl with the prosthetic limb of a grown man. Why would we give her the same prosthetic voice?" That's Rupal Patel, speaking in a TEDWomen talk earlier this month. Patel directs Northeastern University's Communication Analysis and Design Laboratory, and her new project, VocalID, treats the voice like any other limb.
The idea is fairly simple. As described on VocalID's site:
To build custom crafted voices, we extract properties from a target talker's disordered speech (whatever sounds the target talker can produce) and apply these features to a synthetic voice that was created from a surrogate voice donor that resembles the target talker in age, size, sex, etc. The result is a synthetic voice that contains as much of the vocal identity of the target talker as possible, and the speech clarity of the surrogate voice donor.
Creating a unique prosthetic voice sounds labor-intensive, but as Patel told AllThingsD, once a donor has done their part, VocalID can create a new voice in "literally a few minutes."
Personally, I find this exciting. I'm lucky enough to have the neurological and physical apparatus for intelligible speech, but as a blind/low vision person who uses the internet constantly, reading thousands of words a day, I understand the absurdity of only having a few workable voices.
Blind and low vision people have lots of ways of using their computers, but on my MacBook I have two ways of reading: zooming in, sometimes to 1000% or more, or using the Mac's text-to-speech or VoiceOver function. The only voice I can tolerate is "Alex." Text-to-speech software in Mac OS X and on the Kindle has come closer to natural speech patterns, but as you might be able to hear in the video below, it lacks a certain… flow:
This puts me in a tough spot. Do I suffer through the one decent synthetic voice, trying to extract meaning from the monotone, or tax my residual vision? As someone who hasn't fully let go of their eyeballs, I find myself reverting to the latter: a painful and unsustainable way of working, straining to scan line after line of blown-up text. And I'm not the only person who wastes their faculties like this.
My situation is one thing, but the real potential for projects such as VocalID lies in communication and social settings. As the TED blog points out, Stephen Hawking is British, but his voice is not British-sounding. When Patel fitted William, a 9-year-old boy, with his own VocalID match, he remarked that he'd "never heard me before."
Whether we realize it or not, we scrutinize voices very carefully, and make judgments based on pitch, tone, inflection, cadence, sibilance, and breath, among other things. A mismatched, robotic voice can put anyone at a disadvantage. If people rely on synthetic expression their whole lives, why shouldn't they at least get their own voice?