The real voice of Siri explains the art of voiceover

Posted on Jun 24, 2015 in Voice-Over

Sometimes it’s hard to appreciate that the countless electronic voices we hear, from the prompt at the self-checkout to the disembodied tone coming from our phones, were provided by a real person. Where do those voices come from? To find out, I asked the original voice of the iPhone assistant Siri, Susan Bennett.

She’s a voice actor who, in addition to her iconic iPhone oeuvre, does commercials, sings, and provides voices for many other companies and services. And she explained just how her unique industry works.

How Susan Bennett became the original voice of Siri — and what it says about voice acting

Susan Bennett in her home studio.

Susan Bennett

Talking to Susan Bennett is surreal — at one moment she sounds completely normal, except she has the most pleasant voice you’ve ever heard. But in a flash she can turn on the Siri voice, and you start thinking you’re talking to your computer.

Bennett is a native of Burlington, Vermont, who moved to upstate New York when she was young, and her background gave her a neutral American speaking style. After acting and singing at Brown University, she went to twangy Atlanta, where her clear, unaccented voice has given her a unique competitive advantage.

One way or another, she’s been humanizing computers for decades

In the 1970s, Bennett broke into voice acting by humanizing a very different computer than Siri, singing the jingle for Tillie the All Time Teller, one of the first ATMs. For decades, Bennett recorded the narration for answering services, PA systems, and other clients that range from big and corporate to small and local. That experience led her to her most recognizable gig.

When Bennett recorded the voice for Siri in 2005, she had no idea it would end up on the iPhone. She recorded it well before the company that built Siri was bought by Apple, and she didn’t even know she was the voice of Siri until the product debuted in the App Store in 2010 and then appeared on the iPhone 4S in 2011. But as seamless as Bennett sounds as Siri, capturing her voice was a surprisingly difficult project.

How a digital assistant like Siri is recorded

An image of Siri shortly after her 2011 debut.

Oli Scarff/Getty Images

Siri needs to be able to say just about everything in the English language, and that took a lot of hard work.

“I recorded four hours a day, five days a week for the month of July,” Bennett says. For a voice actor, that workload causes a lot of strain. “That’s a long time to be talking constantly. Consequently, you get tired.”

Different places need different Siris

Apple’s voice, like all digital voices, is heard around the world, which means different voice actors record Siri in different languages and dialects. Sadly, Bennett says there’s no “Siri convention,” though she has exchanged emails with Jon Briggs, the first voice of the British English Siri, known as “Daniel.” Briggs is also recognized for his work as the announcer on The Weakest Link game show.

The original Siri “was to sound otherworldly and have a dry sense of humor,” Bennett says. She added that to her take on the character, even as she focused on staying consistent and clear.

Voice acting always requires some technical acumen — as Bennett says, it’s about “being able to read 65 seconds’ worth of copy in 60 seconds.” But recording for a computerized voice like Siri is especially difficult. These marathon vocal sessions didn’t involve reading full words or sentences. Instead, she recorded the raw materials for speech — basic sounds.

The technique of using sophisticated computer programs to build words and sentences from basic sounds is called concatenative speech synthesis, or concatenation (Vox sister site The Verge described the process of linking those sounds in 2013). The goal is to capture every possible sound (usually drawn from a syllable-long building block) so the sounds can be assembled in every possible combination for every possible word.

To record these, voice actors are forced to recite gibberish-like sentences that include all of the English language’s different sounds.

At her home studio, Bennett recorded a few phrases for me. She’d saved an old script for a digital voice that she’d done earlier for Lucent Technologies, including absurd phrases like “oil your mills jewel weed today.” Bennett calls it “digital voice poetry,” and she suggests you get a glass of wine while listening.

The process can take a while because the goal is to record as many varieties and types of sounds as possible, in order to produce better, more human-sounding speech. For example, actors like Bennett don’t just need to record an “s” sound; they need to record the varying “s” sounds in words like “hiss,” “snakes,” and “rose.” Eventually, a computer stitches the sounds together, with the goal of ever more natural-sounding speech.
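For readers curious about what that stitching looks like in practice, here is a toy sketch in Python. It is an illustration only, not how Siri or any real speech system is built. The `UNITS` table, its made-up sample values, and the `stitch` function are all hypothetical. It shows only the basic idea: each recording is stored along with the word it came from, so the “s” in “hiss” and the “s” in “rose” stay separate, and a word is built by joining the right recordings in order.

```python
# Toy sketch of concatenation. All names and numbers here are hypothetical;
# real systems store actual audio and smooth the joins between units.
# Each recorded unit is keyed by (sound, context), because the same letter
# can sound different depending on the word ("s" in "hiss" vs. "rose").
UNITS = {
    ("h", "hiss"): [0.0, 0.05],
    ("i", "hiss"): [0.5, 0.4],
    ("s", "hiss"): [0.1, 0.3, 0.1],
    ("s", "rose"): [0.2, -0.2, 0.2],
}


def stitch(sequence, units=UNITS):
    """Join the recorded samples for each (sound, context) pair, in order."""
    samples = []
    for key in sequence:
        if key not in units:
            # A missing unit means the actor never recorded that sound.
            raise KeyError(f"no recording for {key!r}")
        samples.extend(units[key])
    return samples


# Build "hiss" from three separately recorded pieces.
hiss = stitch([("h", "hiss"), ("i", "hiss"), ("s", "hiss")])
```

The more units the table holds, the more often the program can find a recording that fits its neighbors, which is one reason the recording sessions run so long.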

Bennett thinks some new recordings have probably been incorporated into the current version of Siri, to improve it and provide more options for users. That means the digital assistant you hear on your phone today is likely a mashup of different human voices, including Bennett and others, strung together into one helpful program.

New technology has turned voice acting into a highly competitive business

An elaborate home studio is typical for a voice actor.

Shutterstock

Still, it’s more important than ever that Bennett be able to say she was the original voice of Siri. It serves as a unique marker in a business where there’s always new talent trying to get the next gig. And that competitive spirit extends to Bennett’s home studio, which would make any audiophile envious.

ISDN is still relevant, even with cable internet

To consumers accustomed to fiber-optic internet, ISDN may seem quaint — it sends audio through old-school copper wire. But in the voiceover world, it remains a top way to send high-quality sound quickly. This Sessionville article describes why ISDN stuck around, even as high-speed internet has become ubiquitous.

Bennett’s studio is built on rubber feet to absorb sound, and she uses it every day. There’s foam on the wall, a desk with a pre-amp and mixer, and a Neumann TLM 193 microphone (average price: $1,599). Sitting on an adjustable stool, she reads her scripts off an iPad and has a computer monitor to see how the recording is going.

She’s invested seriously in her studio because a majority of her recording occurs at home, which is typical of many voice actors. Thanks to worldwide high-quality connections, beginning with ISDN lines and extending to today’s fiber-optic broadband, it’s possible for actors around the world to record from home and compete with one another. As in so many industries, technology changed everything for voice actors.

“You could choose a talent from anywhere and record that person from anywhere else,” Bennett says. “All the people from any city no longer were limited to their local group of actors. They could go anywhere in the world.”

She installed her ISDN in 1996, and to remain competitive, many voice actors did the same. Technology has brought big opportunities to the business, as well as stiffer competition.

But as competitive as voiceover is, voices will always be necessary

Siri, ready to answer your questions in 2011.

Hadrian/Shutterstock

Bennett takes care of her voice: drinking tepid water sometimes instead of tea, occasionally having some honey, and avoiding clearing her throat.

But there’s no magic strategy to becoming a voice actor, because something about the voice is innate.

“I think that voices are very personal,” she says, “and I think that’s one of the reasons why people love Siri and all the other digital assistants, because they do bring a bit of humanity to all this machinery we’re dealing with.”

That’s unlikely to change, even as computerized voices become more common. Something about a voice can’t be simulated. That’s very clear when you talk to Susan Bennett and hear her sound just like Siri. But it’s even clearer when she breaks character and starts to laugh.
