Available voices in Studio
Voices available in Studio differ in quality, control options, and stability.
At the moment, Studio supports the following voice tiers:
S-tier
A-tier
B-tier
C-tier
Special voices
⚠️ Not all languages available in Studio support all voice tiers.
S-tier voices
S-tier voices use a new synthesis module provided by a third-party vendor.
Pros
Highly natural, human-like sound
Strong intonation, pacing, and emotional nuance
Support for explicit reactions inside chunks, for example:
[giggles], [sighs], [laughter]
Limitation
Pause breaks may not work reliably for some voices
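As an illustration of the bracketed reaction syntax, here is a minimal Python sketch that lists the reaction tags found in a chunk's text before it is sent to an S-tier voice. The helper names and the lowercase-word pattern are assumptions made for this example, not part of Studio. The known set contains only the three tags named above, so the full list of supported reactions may be longer.

```python
import re

# Only the reactions named in this article; the complete supported set is not listed here.
KNOWN_REACTIONS = {"giggles", "sighs", "laughter"}

# Assumption: a reaction is a single lowercase word wrapped in square brackets.
TAG_PATTERN = re.compile(r"\[([a-z]+)\]")


def find_reactions(chunk_text: str) -> list[str]:
    """Return the bracketed tags in the chunk, in order of appearance."""
    return TAG_PATTERN.findall(chunk_text)


def unknown_reactions(chunk_text: str) -> list[str]:
    """Return tags that are not among the reactions named in this article."""
    return [tag for tag in find_reactions(chunk_text) if tag not in KNOWN_REACTIONS]
```

For example, find_reactions("Oh no [sighs] not again [giggles]") returns the two tags in order, and unknown_reactions can flag a tag that deserves a listening check before relying on it.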
A-tier voices
Good balance of naturalness and clarity
Available across multiple locales (for example, EN-US–based English, LATAM-based Spanish)
Support custom emotions and prompts to improve expressiveness (results may vary)
Support pauses for more natural pacing
B-tier voices
More synthetic and monotone compared to higher tiers
Do not support prompts or custom emotions
Do not support version-based synthesis
Support phonemes
Support pauses for pacing control
C-tier voices
Most synthetic and monotone tier
Do not support prompts or custom emotions
Do not support version-based synthesis
Support phonemes
Support pauses for pacing control
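The differences between the S, A, B, and C tiers can be summarized as a lookup table. The Python sketch below is only a reading aid: the dictionary, its keys, and the tiers_supporting helper are hypothetical names and do not reflect a Studio API. Features that this article does not mention for a tier are marked "unstated" rather than guessed.

```python
# Values: "yes", "no", "unreliable" (may fail for some voices),
# or "unstated" (this article does not say either way).
CAPABILITIES = {
    "S": {"reactions": "yes", "prompts": "unstated",
          "phonemes": "unstated", "pauses": "unreliable"},
    "A": {"reactions": "unstated", "prompts": "yes",
          "phonemes": "unstated", "pauses": "yes"},
    "B": {"reactions": "unstated", "prompts": "no",
          "phonemes": "yes", "pauses": "yes"},
    "C": {"reactions": "unstated", "prompts": "no",
          "phonemes": "yes", "pauses": "yes"},
}


def tiers_supporting(feature: str) -> list[str]:
    """Return the tiers for which the article explicitly confirms the feature."""
    return [tier for tier, caps in CAPABILITIES.items() if caps.get(feature) == "yes"]
```

A table like this makes it easy to check a requirement against the tiers, for example which tiers are confirmed to support phonemes, while keeping unconfirmed combinations visibly separate from unsupported ones.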
Special voices
Created based on a single actor and a specific locale
More stable in terms of language and accent (for example, French-Canadian voices reliably produce French-Canadian speech)
Accent leakage is uncommon
Emotions and prompts are less predictable than with A-tier voices
Less controllable, but generally more stable
Moderated Special voices
Some voices in Studio are marked with the Moderated voice tag.
This means that all text synthesized with this voice is checked in real time against a list of prohibited topics and words (e.g., “suicide”, “violence”).
This moderation is applied at the request of the voice talent who provided the voice.
If a chunk contains restricted words or topics, the chunk will not be synthesized.

What to expect
Moderation is applied automatically
The check happens during synthesis
Only the affected chunks are blocked
Other chunks in the task can still be processed
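The per-chunk behavior can be modeled with a short Python sketch. This is not how Studio implements moderation: the real check also covers topics and uses a list that is not published here. The word set below contains only the two examples mentioned in this article, and the function names are hypothetical. The sketch shows just the outcome described above, where blocked chunks are skipped and the remaining chunks stay eligible for synthesis.

```python
import re

# Illustrative only: the two example words from this article, not the real list.
PROHIBITED_WORDS = {"suicide", "violence"}


def is_blocked(chunk_text: str) -> bool:
    """Return True if the chunk contains any word from the illustrative list."""
    words = set(re.findall(r"[a-z']+", chunk_text.lower()))
    return not words.isdisjoint(PROHIBITED_WORDS)


def split_by_moderation(chunks: list[str]) -> tuple[list[int], list[int]]:
    """Return (allowed, blocked) chunk indexes, checking each chunk independently."""
    allowed, blocked = [], []
    for index, chunk in enumerate(chunks):
        if is_blocked(chunk):
            blocked.append(index)
        else:
            allowed.append(index)
    return allowed, blocked
```

With three chunks where only the second contains a listed word, the first and third are returned as allowed and the second as blocked, which mirrors the rule that one blocked chunk does not stop the rest of the task.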
Models and methods
Studio relies on speech synthesis models provided by external vendors.
The exception is Emotion Transfer, which is a Dubformer-specific technology developed in-house and used to improve emotional expressiveness.