AI and the Shadow People

Shadows of a woman and her dog on a road. AI voices are like shadows of real humans.

If I asked you to describe this picture, what would you say? Would you say it was a person (maybe a woman) with her dog? Or would you say it was a picture of the shadow of a person with the shadow of a dog?

This is actually a picture of me and my chihuahua. Can you tell from this photo that I’m wearing my old purple down coat and I’m too hot because spring is in the air? Can you tell that my dog is actually scared because I stopped to take this photo on our morning walk very close to some noisy construction trucks? Can you tell from this picture that I’m wearing my very favorite sneakers, my Hokas, which gave me the resilience to start jogging for the first time ever?

If a picture is worth a thousand words, and a video is worth a million words, then how many words is a shadow worth?

Now, I happen to love shadows. I love how they let me see everyday objects in new ways. I love the shapes they make. But shadows are only about shape. There is no inner light. There is no heart. You can only get so much of the story from a shadow.

Now let’s talk about voices. Have you ever been moved to tears or laughter by a single spoken word? A voice can express an enormous range of emotions. The capacity of the human body to vocalize, to shape sound into words and words into meaning, is extraordinary. A human voice can evoke color, texture, pitch, even taste and smell. It can stoke our imaginations until all of our senses are involved. That is the job of a voice actor. Whether it’s an audiobook narration, an animated voice over, or a commercial, the voice actor’s job is to bring the heart, to move us to understanding, feeling, and action.

When the Rainbows are Missing

Shadow that looks like a dog with a rainbow ear.

A lot of voice actors are worried right now about the future of AI. Artificial intelligence is quickly becoming a cheap and easy way to manufacture voices that sound pretty darned convincing. Is AI beginning to replace professional voice acting? It’s certainly trying to. You’ll find many AI voice over examples at Eleven Labs Prime Voice AI and Murf AI. But if AI is the perfectly convincing shadow of a dog in the photo above, the voice actors are the rainbows.

See, it all comes down to your definition of “replace.” A quick Google search for a definition yielded this: “Replace, displace, supplant, supersede means to put out of a usual or proper place or into the place of another. ‘Replace’ implies a filling of a place once occupied by something lost, destroyed, or no longer usable or adequate.” Can an AI voice generator sound like a voice actor or a famous person? Yes. There are many realistic AI voices. Can AI REPLACE that human being? No. Because the fact of the matter is, an AI-generated voice is only a shadow of a person, and it will only ever be a shadow of a person. There are many instances where the shadow of a person suits a project just fine, and where, I’d argue, it’s even useful. For example, an AI voice makes a great placeholder, or scratch track, for a corporate narration or dubbing project, helping to guide the professional voice actor in the final voice over. AI voice is also very useful in text-to-speech applications for people who are visually impaired and for anyone who needs to hear an assignment or other text read aloud in order to fully comprehend it.

If you try to “replace” the professional voice talent entirely, though, it can’t be done. Voice actors invest large amounts of money and time in coaching, equipment, practicing the craft of voice acting, and living human lives. We bring a vast wealth of experience and understanding, our interpersonal relatability, and our human feelings to the table. We have been neither lost nor destroyed, nor rendered unusable or inadequate. Voice actors simply cannot be replaced. Like a dog that can sense fear from a mile away, people (consumers) can sense when a message or emotion rings hollow. It may sound good on the surface. It may have all the contours and shaping of the human voice and human experience. But the rainbows, the life, the breath, the human journey, will always be absent.

Human Versus Machine—The Power of the Physical Body

Have you ever watched a video of a famous actor voicing an animated character? Notice how much they move around? Notice how their posture changes depending on the emotion they’re expressing? How they gesticulate? How they scrunch up their face, and raise their eyebrows, and turn their head, and half smile or frown? This physicality changes the shapes of the words and creates nuance.

AI has no bodily expression, and even if it did, its movements wouldn’t affect its vocalizations. Humans are built differently. We communicate with each other in so many ways that go beyond words. There is meaning in our movements, our pauses, our breath. Even when we are only listening, without the visual cues of watching someone speak, we can hear those natural shifts. When we voice actors understand what we are saying, we embody that understanding, and we communicate not only the words but also that embodiment.

If you listen to the AI voices on Murf AI, you will hear AI imitations of real people’s voices. These AI-generated voices are pretty good, but they are still imitations. The voice actors, the human beings, are the creators; the AI is just imitating what it hears. Listen closely and you’ll notice an odd patchiness to the sound, a strangeness in word emphasis or pacing. AI voice cloning just isn’t the same as the real thing. Again, that’s not to say it doesn’t have its place; it just can’t RE-place.

How do you tell a story?

I know I’ve just spent a lot of time showing you why AI can’t generate the same sorts of emotions and human connections in its words that humans can.

I should point out that not all humans communicate and express emotions in the same way, either. Each of our brains is wired differently. Some people on the autism spectrum, for example, are described as having a “flat affect”: minimal inflection and a sort of “flat” voice that doesn’t particularly convey emotion. This doesn’t mean that the emotions don’t exist. They just aren’t expressed vocally. Similarly, people with schizophrenia or clinical depression may exhibit a flat affect.

It’s dangerous to imply that there’s only one “human” way to communicate, and I want to be clear that this is not what I’m suggesting. What I am saying is that voice actors train in the craft of communicating in a particular way that feels familiar and authentic to a large human audience because it conveys an immediate expression of emotion and understanding. We train in the craft of storytelling, something most people find to be engaging and moving. Human beings have inner experiences and emotions, and the excellent voice actor is able to externalize those inner experiences through language and movement in a way that connects us.

“But what about the inner life of AI?” you may be asking. There’s been a lot of talk about the future of sentient AI. Is it possible that the more people feed their voices into sites like Vocaroo and Descript, the more AI can be trained to talk, and perhaps even think, like humans? For all I know, Google’s LaMDA, the “Google AI come to life,” really does have a rich inner experience, and I’m not here to refute that. I do think that some AI may in fact have an understanding of what it’s saying, and it’s possible that one day we will find ourselves with sentient AI. But the way voice actors tell stories using our human bodies and human experiences is always going to sound different from the way AI tells stories. As a result, voice actors will keep on acting and AI will keep on imitating. Maybe one day humans and AI will be able to listen to each other’s stories with appreciation and respect. Meanwhile, we each excel at providing important, but different, kinds of services.
