Create a Unique Vocal Brand Identity with a Custom TTS

Aug 11, 2020

Andrew Richards

Implementing a custom, branded voice assistant for your product, service, or app begins with setting goals and deciding what you want your brand to sound like. Skipping this process or taking shortcuts by not getting enough people involved can lead to a brand mismatch once the voice assistant has been implemented. Selecting the voice talent and choosing a Text-to-Speech (TTS) voice that matches the attributes and tone of voice users expect to hear from your brand is a critical step toward making a lasting impression with your voice assistant.

When you’re ready to determine which voice will represent your brand, involve as many stakeholders as possible so you can reach consensus on the type of voice you want to implement. At that crucial stage of the process, agree on answers to these four key questions:

Who’s talking? Define your voice to match your brand
What’s your voice assistant’s personality? Determine tone and accent
Where will it be used? Make decisions based on the environment and context
How does it perform? Test quantitatively and qualitatively

1. Carefully define your TTS to match your brand identity

Whether you plan to use an existing text-to-speech (TTS) voice, a custom voice, neural TTS, or concatenated TTS, it’s a good idea to start by defining the identity of your ideal voice. The voice you choose is going to represent your brand, so it’s important to spend time on this with internal stakeholders. This should ideally include a discussion within your team, market research about how your customers perceive your brand, and additional research about the ideal age, gender, and desired accent of your voice assistant.

Spending time upfront to define who your voice assistant will be is critical to its eventual success. Don’t let the default TTS voice that comes built into some platforms define who you are as a company. You’ll risk sounding like everyone else, and creating a mismatch between how you want to be perceived and how users actually perceive your brand.

A good example of a discrepancy between identity and voice is Lost Voice Guy, also known as Lee Ridley. He’s a comedian from the UK who lost his ability to speak at a young age, and he now uses a speech-enabled tablet PC in his act. The default voice he uses has a BBC-like British accent, whereas Lee is a down-to-earth guy from the North of England. He uses this mismatch to comedic effect, and it works really well. Lee’s application of a voice mismatch is unique. Unless you’re attempting comedy, you’ll want the voice identity to match your brand.

An example of a great voice match is the voice the BBC gave Beeb, their new voice assistant. The voice identity was carefully crafted after months of research. As a result, they decided to break away from the prestigious Southern British accent traditionally associated with the BBC and gave the app a Northern British accent instead. They also decided to use a male voice as opposed to a female voice to avoid perpetuating gender stereotypes.

2. It’s not always what the TTS says that matters, but how it’s said

How you say something can be as important as what you say. Putting emphasis on a specific word or changing the intonation of a full sentence can change how speech is perceived. Therefore, it may be necessary for your TTS voice to adapt to the context of your users. For example, adopting an apologetic tone when correcting a mistake made by the voice assistant can alter the user experience from a purely negative one to something more palatable.

A cheerful voice isn’t always the right default. It may sound good on paper, until you hear it announce a flight delay or bad weather. Some TTS voices can be directed via a markup language, such as SSML, to change the speaking style so that it better matches the context of the conversation. As TTS continues to advance, we may see voice assistants adapt their speaking style automatically based on some form of semantic analysis or user context.
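
As a rough illustration of style markup, the Python sketch below wraps a sentence in Speech Synthesis Markup Language (SSML), the W3C standard that many TTS engines accept. The style presets and the function name `to_ssml` are assumptions made for this example, and the prosody values are not tuned for any particular voice. Engines also differ in which SSML elements they honor, and some require version and namespace attributes on `<speak>`.

```python
from xml.sax.saxutils import escape

# Hypothetical style presets; names and values are illustrative only.
# "rate" and "pitch" are standard attributes of the SSML <prosody> element.
STYLES = {
    "neutral": {},
    "apologetic": {"rate": "slow", "pitch": "low"},
    "cheerful": {"rate": "medium", "pitch": "high"},
}


def to_ssml(text: str, style: str = "neutral") -> str:
    """Wrap text in SSML, adding a <prosody> element for non-neutral styles."""
    body = escape(text)  # escape &, <, > so the markup stays well-formed
    attrs = STYLES[style]
    if attrs:
        attr_text = " ".join(f'{name}="{value}"' for name, value in attrs.items())
        body = f"<prosody {attr_text}>{body}</prosody>"
    return f"<speak>{body}</speak>"


# Bad news is delivered more slowly and at a lower pitch than the default.
print(to_ssml("Sorry, your flight is delayed by 40 minutes.", "apologetic"))
```

Keeping the presets in one table means the brand team can adjust how "apologetic" or "cheerful" sounds without touching the dialogue text itself.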

Today, most TTS engines come with a pre-processing module that converts things like times, dates, and telephone numbers into standardized speech. Proper nouns like artists’ names, company names, or product names can sometimes be tricky—especially when the same word can have multiple pronunciations or is spelled in an unexpected way.

In addition, abbreviations, acronyms, and initialisms that are common to your product or service may require training your TTS. For example, if your voice assistant said that your bank balance is “dollar sign one two seven,” you’d find that really strange. Make sure that you choose a TTS that allows for flexibility and customization via a markup language and a user-defined lexicon.
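
To make the idea of pre-processing and a user-defined lexicon concrete, here is a minimal Python sketch. It does not reflect how any particular TTS engine works: the `LEXICON` entries, the function names, and the rule that only handles whole-dollar amounts up to 999 are all assumptions chosen to keep the example short.

```python
import re

# Hypothetical user-defined lexicon: written form -> spoken form.
LEXICON = {
    "TTS": "text to speech",
    "VUI": "voice user interface",
}

ONES = ("zero one two three four five six seven eight nine ten eleven twelve "
        "thirteen fourteen fifteen sixteen seventeen eighteen nineteen").split()
TENS = ["", ""] + "twenty thirty forty fifty sixty seventy eighty ninety".split()


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999."""
    if not 0 <= n <= 999:
        raise ValueError("only 0-999 is supported in this sketch")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize(text: str) -> str:
    """Expand whole-dollar amounts, then apply the lexicon to whole words."""
    def speak_dollars(match):
        amount = int(match.group(1))
        unit = "dollar" if amount == 1 else "dollars"
        return f"{number_to_words(amount)} {unit}"

    # Match "$127" but leave "$127.50", "$1,270", or "$1270" untouched.
    text = re.sub(r"\$(\d{1,3})(?![\d.,]*\d)", speak_dollars, text)
    for written, spoken in LEXICON.items():
        text = re.sub(rf"\b{re.escape(written)}\b", spoken, text)
    return text


print(normalize("Your balance is $127."))
# prints: Your balance is one hundred twenty-seven dollars.
```

Amounts the sketch does not understand are passed through unchanged, so a production system would need additional rules for cents, thousands separators, and other currencies.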

3. Choose a TTS that best meets your needs

An important distinction to make when choosing TTS is the type of technology used to develop the voice.

Currently, there are two options: neural TTS and concatenated TTS. Which one you use will depend on your budget, your voice requirements, and your users’ environment. While neural TTS is more humanlike, natural, and pleasant than concatenated TTS, it can only run in a cloud environment and it’s a lot more expensive. Most TTS providers charge up to four times more for neural TTS than for concatenated TTS. If voice quality is one of your top goals and you have the budget, the cloud connectivity, and the required CPU capacity, then neural TTS is the better option.

If, however, your voice assistant will be running in a noisy environment, neural TTS can be less intelligible, which can be problematic for some users and use cases. Although costs for neural TTS are likely to go down over time, there are advantages to using a less-expensive concatenated TTS that can be more easily embedded in products without cloud connectivity and is more intelligible in noisy environments.

4. Balance TTS accuracy with pleasantness

Evaluating TTS options includes measuring accuracy quantitatively as well as gauging the emotional response the voice evokes qualitatively. You can test the accuracy of TTS by seeing how it handles addresses, names, numbers, foreign words, or homographs. Evaluating the effectiveness of the voice can be really subjective because people react emotionally to voices, whether they realize it or not. We can be more forgiving of a pleasant-sounding voice that makes lots of mistakes than of an unpleasant voice with impeccable pronunciation skills.

Mean opinion score (MOS) tests, in which listeners rate voice samples on a scale of 1 to 5, are typically used to evaluate voices in the most objective way possible. These tests can be conducted internally or outsourced. In either case, it’s important to choose the criteria carefully to include things like naturalness, pleasantness, and intelligibility. As the technology gets more natural and intelligible, the emotional connection users have to your voice assistant will become much more important. Ensure that the test subjects are representative of the users who are going to interact with your voice assistant to create the best match possible.
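
As a sketch of how such ratings might be summarized, the Python example below computes a mean opinion score per voice and criterion, along with a rough 95% confidence interval. The ratings are made up for illustration, the voice names are hypothetical, and the interval uses a normal approximation that is only indicative for small listener panels.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up listener ratings on a 1-5 scale, for illustration only.
RATINGS = {
    "voice_a": {
        "naturalness": [4, 5, 4, 3, 4, 5],
        "pleasantness": [5, 4, 4, 4, 5, 4],
        "intelligibility": [3, 4, 3, 4, 3, 4],
    },
    "voice_b": {
        "naturalness": [3, 3, 4, 3, 2, 3],
        "pleasantness": [3, 4, 3, 3, 3, 4],
        "intelligibility": [5, 5, 4, 5, 5, 4],
    },
}


def mos(scores):
    """Return (mean opinion score, half-width of an approximate 95% CI)."""
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("ratings must be between 1 and 5")
    half_width = 0.0
    if len(scores) > 1:
        # Normal approximation; a rough guide when there are few listeners.
        half_width = 1.96 * stdev(scores) / sqrt(len(scores))
    return mean(scores), half_width


for voice, criteria in RATINGS.items():
    for criterion, scores in criteria.items():
        score, half_width = mos(scores)
        print(f"{voice:8} {criterion:16} MOS {score:.2f} +/- {half_width:.2f}")
```

Reporting the interval alongside the score helps stakeholders see whether a difference between two voices is meaningful or just noise from a small panel.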

When you’re working with internal stakeholders to reach agreement about the sound of your branded voice, keep in mind that the goal is to find a voice that will be a positive extension of your brand and your ambassador in the world at large.

If you find a voice in a TTS catalogue that meets your needs, go ahead and use it. Just be aware that you may end up sounding like everyone else and lose the opportunity to communicate your unique brand identity and differentiate yourself from the competition. On the other hand, a custom vocal identity can personalize your brand in a market soon to be filled with other branded voices.

At SoundHound Inc., we have all the tools and expertise needed to create unique VUIs and a vocal brand identity. Explore Houndify’s independent voice AI platform at Houndify.com and register for a free account. Want to learn more? Talk to us about how we can help you bring your voice strategy to life.

Andrew Richards is director of business development at SoundHound Inc., based in France. He’s been working in the voice tech space for almost 20 years and has spent more than 15 years working with text-to-speech technology.
