Today, most TTS engines come with a pre-processing module that converts things like times, dates, and telephone numbers into standardized speech. Proper nouns like artists’ names, company names, or product names can sometimes be tricky—especially when the same word can have multiple pronunciations or is spelled in an unexpected way.
In addition, abbreviations, acronyms, and initialisms that are common to your product or service may require training your TTS. Symbols and numbers need the same care: if your voice assistant said that your bank balance is "dollar sign one two seven," you'd find that really strange. Make sure that you choose a TTS that allows for flexibility and customization via a markup language and a user-defined lexicon.
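To make the markup and lexicon idea concrete, here is a minimal Python sketch that wraps text in SSML (the W3C Speech Synthesis Markup Language) and uses the standard `<sub alias="...">` element to tell the engine how to read a term. The `LEXICON` entries, the `to_ssml` helper, and the whole-dollar rule are illustrative assumptions, not any vendor's API. Some engines also require `version` and `xmlns` attributes on `<speak>`, so check your provider's documentation.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Hypothetical user-defined lexicon: written form -> how it should be spoken.
LEXICON = {"APR": "annual percentage rate", "ATM": "A T M"}

# Whole-dollar amounts only (e.g. "$127"); amounts with cents are left untouched.
_DOLLARS = r"\$(?P<dollars>\d+)\b(?!\.\d)"


def to_ssml(text, lexicon):
    """Return SSML in which lexicon terms and dollar amounts carry a spoken alias."""
    alternatives = [_DOLLARS]
    terms = sorted(lexicon, key=len, reverse=True)  # try longer terms first
    if terms:
        alternatives.append(r"\b(?P<term>" + "|".join(map(re.escape, terms)) + r")\b")
    pattern = re.compile("|".join(alternatives))

    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(escape(text[pos:m.start()]))  # plain text, XML-escaped
        if m.group("dollars") is not None:
            alias = f"{m.group('dollars')} dollars"
        else:
            alias = lexicon[m.group("term")]
        out.append(f"<sub alias={quoteattr(alias)}>{escape(m.group(0))}</sub>")
        pos = m.end()
    out.append(escape(text[pos:]))
    return "<speak>" + "".join(out) + "</speak>"


if __name__ == "__main__":
    print(to_ssml("Your balance is $127 and your APR is low.", LEXICON))
```

A real deployment would more likely keep the lexicon in the TTS provider's own lexicon format and use that provider's number and currency handling. The point of the sketch is only that the written text and the spoken form are kept separate.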
3. Choose a TTS that best meets your needs
An important distinction to make when choosing TTS is the type of technology used to develop the voice.
Currently, there are two options: neural TTS and concatenative TTS. Which one you use will depend on your budget, voice requirements, and your users' environment. While neural TTS sounds more humanlike, natural, and pleasant than concatenative TTS, it typically requires a cloud environment and is considerably more expensive. Most TTS providers charge up to four times more for neural TTS than for concatenative TTS. If voice quality is one of your top goals and you have the budget, the cloud connectivity, and the required CPU capacity, then neural TTS is the better option.
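To see what the "up to four times" price difference can mean for a budget, a rough estimate like the following may help. The per-character rate and the monthly volume are hypothetical placeholders, not real vendor prices, and the multiplier is the upper bound quoted above.

```python
def monthly_cost(chars_per_month, price_per_million_chars):
    """Estimated monthly TTS bill for a given character volume."""
    return chars_per_month / 1_000_000 * price_per_million_chars


# Hypothetical figures for illustration only.
BASE_PRICE = 4.00        # assumed USD per 1M characters for the cheaper voice
NEURAL_MULTIPLIER = 4    # upper bound: "up to four times more"
VOLUME = 5_000_000       # assumed characters synthesized per month

base = monthly_cost(VOLUME, BASE_PRICE)
neural = monthly_cost(VOLUME, BASE_PRICE * NEURAL_MULTIPLIER)

if __name__ == "__main__":
    print(f"cheaper voice: ${base:.2f}/month, neural (upper bound): ${neural:.2f}/month")
```

Plugging in your own expected volume and your provider's published rates gives a quick sense of whether the quality gain fits the budget.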