Available voices in Studio

Voices available in Studio differ in quality, control options, and stability.

At the moment, Studio supports the following voice tiers:

  • S-tier

  • A-tier

  • B-tier

  • C-tier

  • Special voices

⚠️ Not all languages available in Studio support all voice tiers.

S-tier voices

S-tier voices use a new synthesis module provided by a third-party vendor.

Pros

  • Highly natural, human-like sound

  • Strong intonation, pacing, and emotional nuance

  • Support for explicit reactions inside chunks, for example:
    [giggles], [sighs], [laughter]

Limitation

  • Pause breaks may not work reliably for some voices
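
The reaction tags mentioned above are plain bracketed words inside the chunk text, so a script can be scanned for them before it goes to synthesis. The sketch below is illustrative only: the tag pattern (lowercase letters in square brackets) and the helper names find_reactions and strip_reactions are assumptions, not part of Studio.

```python
import re

# Assumed tag syntax: a lowercase word in square brackets, e.g. [giggles].
# The exact syntax Studio accepts is not specified in this article.
REACTION_TAG = re.compile(r"\[([a-z]+)\]")


def find_reactions(chunk_text: str) -> list[str]:
    """Return the reaction names found in a chunk, in order of appearance."""
    return REACTION_TAG.findall(chunk_text)


def strip_reactions(chunk_text: str) -> str:
    """Remove reaction tags and collapse the whitespace they leave behind."""
    without_tags = REACTION_TAG.sub("", chunk_text)
    return " ".join(without_tags.split())


chunk = "Oh, you really did that? [giggles] Well, okay. [sighs]"
print(find_reactions(chunk))
print(strip_reactions(chunk))
```

A check like this can help when reviewing which reactions a script relies on before choosing a voice for it.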

A-tier voices

  • Good balance of naturalness and clarity

  • Available across multiple locales
    (for example, EN-US–based English, LATAM-based Spanish)

  • Support custom emotions and prompts to improve expressiveness
    (results may vary)

  • Support pauses for more natural pacing

B-tier voices

  • More synthetic and monotone compared to higher tiers

  • Do not support prompts or custom emotions

  • Do not support version-based synthesis

  • Support phonemes

  • Support pauses for pacing control

C-tier voices

  • Most synthetic and monotone tier

  • Do not support prompts or custom emotions

  • Do not support version-based synthesis

  • Support phonemes

  • Support pauses for pacing control
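
The differences between the S, A, B, and C tiers described above can be summarized as a small lookup table. This is a sketch for planning purposes, not a Studio API: the feature keys and the tiers_with helper are hypothetical, and any feature the article does not mention for a tier is marked "unstated" rather than guessed.

```python
# Support levels used in the table below.
YES, NO, PARTIAL, UNSTATED = "yes", "no", "partial", "unstated"

# Only what this article states is encoded; everything else is UNSTATED.
# "prompts" covers both prompts and custom emotions.
TIER_FEATURES = {
    "S": {"reactions": YES, "prompts": UNSTATED, "versions": UNSTATED,
          "phonemes": UNSTATED, "pauses": PARTIAL},  # pauses unreliable for some voices
    "A": {"reactions": UNSTATED, "prompts": YES, "versions": UNSTATED,
          "phonemes": UNSTATED, "pauses": YES},
    "B": {"reactions": UNSTATED, "prompts": NO, "versions": NO,
          "phonemes": YES, "pauses": YES},
    "C": {"reactions": UNSTATED, "prompts": NO, "versions": NO,
          "phonemes": YES, "pauses": YES},
}


def tiers_with(feature: str) -> list[str]:
    """Return the tiers for which the article states full support of a feature."""
    return [tier for tier, features in TIER_FEATURES.items()
            if features[feature] == YES]


print(tiers_with("pauses"))
print(tiers_with("phonemes"))
```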

Special voices

  • Based on a single actor and a specific locale

  • More stable in terms of language and accent
    (for example, French-Canadian voices reliably produce French-Canadian speech)

  • Accent leakage is uncommon

  • Emotions and prompts are less predictable than with A-tier voices

  • Less controllable, but generally more stable

Moderated Special voices

Some voices in Studio are marked with the Moderated voice tag.

This means that all text synthesized with this voice is checked in real time against a list of prohibited topics and words (e.g., “suicide” or “violence”).
This moderation is applied at the request of the voice talent who provided the voice.

If a chunk contains prohibited words or topics, that chunk is not synthesized.

What to expect

  • Moderation is applied automatically

  • The check happens during synthesis

  • Only the affected chunks are blocked

  • Other chunks in the task can still be processed
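
The per-chunk behavior listed above can be pictured with a minimal sketch. It is not Studio's implementation: the term list is a placeholder built from the two examples in this article, the real check also covers topics rather than only exact words, and the names ChunkResult, is_blocked, and process_task are hypothetical.

```python
from dataclasses import dataclass

# Placeholder list; the actual prohibited topics and words are not published here.
PROHIBITED_TERMS = {"suicide", "violence"}


@dataclass
class ChunkResult:
    text: str
    synthesized: bool


def is_blocked(chunk_text: str) -> bool:
    """Return True if any word in the chunk matches a prohibited term."""
    words = {word.strip(".,!?;:\"'()").lower() for word in chunk_text.split()}
    return not words.isdisjoint(PROHIBITED_TERMS)


def process_task(chunks: list[str]) -> list[ChunkResult]:
    """Check each chunk independently; a blocked chunk does not stop the others."""
    return [ChunkResult(text=chunk, synthesized=not is_blocked(chunk))
            for chunk in chunks]


task = ["Hello there.", "A story about violence.", "Goodbye."]
for result in process_task(task):
    print(result.synthesized, result.text)
```

In this sketch only the second chunk is skipped, while the first and third are still marked for synthesis.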

Models and methods

Studio relies on speech synthesis models provided by external vendors.

The exception is Emotion Transfer, a technology developed in-house by Dubformer to improve emotional expressiveness.