Emotion Transfer: Best Practices
Emotion Transfer often produces high-quality results out of the box. For additional control and refinement, the following editing methods help you fine-tune performance, timing, and expression.
Improve Voice Output Using Text Edits
A lot can be achieved simply by editing the text of your translated phrase. Commas and exclamation marks bring out the desired intonation. Capitalized words or quotation marks influence logical stress. If the synthesis mispronounces a foreign word or a proper name, you can spell it “as it sounds”.
Improve Voice Output Using Versions
With Emotion Transfer, you can generate an unlimited number of synthesis versions for a phrase. Versions are currently generated two at a time.
While editing a phrase, click on the versions button.

Generate as many versions as you need.
Select the best one.

Adjust Voice Expressiveness
You can make a voice more or less expressive by changing the value of the expressiveness parameter. Try generating versions with different expressiveness values and see which one works best in your case.

Duration Control
Duration control is a powerful tool for aligning synthesized speech with timing requirements in dubbing and voiceover workflows. When generating new versions, you can specify the expected duration of the synthesized phrase.
The AI voice will automatically adjust its speaking rate — slower or faster — to match the target duration.
This allows you to control pacing naturally, without changing playback speed or introducing audio artifacts.
If needed, the resulting phrase can still be fine-tuned on the timeline by stretching or squeezing it.
“Auto” duration is used by default. With this option, the voice speaks at the speed that sounds most natural for the reference.
If you feel that the automatically picked duration doesn’t match the pace and rhythm of the original voice, uncheck “Auto” (Fig. 1), and you will be able to change the “Expected duration” parameter (Fig. 2). This feature is in beta, so the synthesis will not always meet your expected duration. Please allow for some margin.
The “Orig duration” button lets you copy the duration of the original phrase into the “Expected duration” field.
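As a rough planning aid, the relationship between the expected duration and the speaking rate can be sketched in a few lines of Python. This is only an illustration: the function names and the 10% margin are assumptions made for the example, not values or functions provided by the product.

```python
def rate_factor(auto_duration_s: float, expected_duration_s: float) -> float:
    """Return how much faster (>1) or slower (<1) the voice must speak
    compared with its natural ("Auto") pace to hit the expected duration."""
    if auto_duration_s <= 0 or expected_duration_s <= 0:
        raise ValueError("durations must be positive")
    return auto_duration_s / expected_duration_s


def within_margin(actual_s: float, expected_s: float, margin: float = 0.10) -> bool:
    """Return True if the synthesized duration is within +/- margin
    (a fraction, assumed to be 10% here) of the expected duration."""
    return abs(actual_s - expected_s) <= margin * expected_s


# A phrase that takes 4.0 s at natural pace, squeezed into a 3.2 s slot.
factor = rate_factor(auto_duration_s=4.0, expected_duration_s=3.2)
print(f"speaking rate factor: {factor:.2f}")

# Check whether a generated version is close enough to the target.
print(within_margin(actual_s=3.4, expected_s=3.2))
```

A factor far from 1.0 suggests the target may be hard to reach naturally; in that case, shortening or lengthening the text itself is likely to work better than relying on duration control alone.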


Changing Reference Phrases
For better consistency or expression, you can generate a new voice using different reference cues.
Select the phrase and click on the “Adjust emotion transfer” icon.

Select one or two original phrases that you want to use as a reference for a new voice. You can listen to them by clicking the play button. Try to use phrases without background noise.

Select the chunks that contain the desired emotion.
Click “Create voice from selected chunks”.
Use the default test text, or enter your own text to better evaluate the result.
Click “Use this voice” to apply it.
You can generate multiple voices and compare them before selecting the best option.

⚙️ Prompting:
Prompts are not yet supported for Emotion Transfer, but support is planned for the future.
⚙️ Pause tags:
Pause tags are not currently supported. If you need to create a pause within a phrase, the following approaches are recommended: 1) use punctuation, 2) generate longer versions (see Duration Control above), 3) split the phrase in two and adjust the timings.
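For the third approach, the timings of the two new phrases can be worked out ahead of time. The sketch below is a hypothetical helper, not part of the product: the Segment type, the function name, and the example values are all assumptions used to illustrate the arithmetic.

```python
from typing import NamedTuple, Tuple


class Segment(NamedTuple):
    start_s: float  # start time on the timeline, in seconds
    end_s: float    # end time on the timeline, in seconds


def split_with_pause(phrase: Segment, first_part_s: float,
                     pause_s: float) -> Tuple[Segment, Segment]:
    """Split a phrase slot into two segments separated by a silent gap.

    first_part_s: time given to the first half of the text.
    pause_s: length of the silent gap between the two halves.
    """
    first = Segment(phrase.start_s, phrase.start_s + first_part_s)
    second_start = first.end_s + pause_s
    if first_part_s <= 0 or pause_s < 0 or second_start >= phrase.end_s:
        raise ValueError("parts and pause do not fit inside the phrase slot")
    return first, Segment(second_start, phrase.end_s)


# A 6-second phrase split after 2.5 s, with a half-second pause.
first, second = split_with_pause(Segment(10.0, 16.0), first_part_s=2.5, pause_s=0.5)
print(first, second)
```

The length of each resulting segment can then serve as the “Expected duration” when generating versions for the two halves, keeping in mind that the expected duration is not always met exactly.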