Auto-mixing

What are some common usage scenarios?

Choosing the Mixing Mode:

How to choose the Voice Replacement mode:
Simply set the original_sound_volume parameter to 0.0. This will eliminate the background sound, and no mixing will actually take place. All other parameters that control the background sound and sound overlay will be ignored.
How to choose the Voiceover mode:
If original_sound_volume is not equal to 0.0, and the list of speaker IDs to replace with synthesis is empty (narrator_speaker_ids=[]), then the synthesis overlay mode is used for all speakers.
Note: It's important that the parameter remove_voice is set to 0 with these settings. Otherwise, all speakers will be passed to narrator_speaker_ids, and the final mode will be set to Smart Voiceover.
How to choose the Custom Voiceover mode:
If original_sound_volume is not equal to 0.0, and the list of speaker IDs for voice replacement with synthesis (narrator_speaker_ids) is not empty, then speech replacement will be performed for the listed speakers, and synthesis will be overlaid on the speech of the remaining speakers.
How to choose the Smart Voiceover mode:
If original_sound_volume is not equal to 0.0, and all speaker IDs are listed in narrator_speaker_ids, then voice replacement will occur for all speakers, resulting in the Smart Voiceover mode. Similarly, if the list of speaker IDs for voice replacement is empty (narrator_speaker_ids=[]) but remove_voice=1, then all speakers will be passed to narrator_speaker_ids, which also results in the Smart Voiceover mode.
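The selection rules above can be summarized as a small decision function. This is only an illustrative sketch, not the service's actual implementation: the function name resolve_mixing_mode and the all_speaker_ids argument are hypothetical, and the case where remove_voice=1 is combined with a non-empty narrator_speaker_ids list is not described above, so the sketch simply keeps the list as given.

```python
def resolve_mixing_mode(original_sound_volume, narrator_speaker_ids,
                        remove_voice, all_speaker_ids):
    """Return the name of the mixing mode implied by the parameters.

    all_speaker_ids is a hypothetical argument listing every speaker
    detected in the video; it is not a documented parameter.
    """
    # A volume of 0.0 removes the original sound entirely: no mixing happens.
    if original_sound_volume == 0.0:
        return "Voice Replacement"

    narrators = set(narrator_speaker_ids)
    everyone = set(all_speaker_ids)

    # An empty list with remove_voice=1 means "replace every speaker".
    if not narrators and remove_voice == 1:
        narrators = everyone

    if not narrators:
        return "Voiceover"          # synthesis overlaid on all speakers
    if narrators >= everyone:
        return "Smart Voiceover"    # every speaker is replaced
    return "Custom Voiceover"       # some replaced, the rest overlaid


print(resolve_mixing_mode(0.5, [1], 0, [1, 2, 3]))
```

Here only speaker 1 is replaced while speakers 2 and 3 keep their original speech under the synthesis, so the sketch reports the Custom Voiceover mode.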

Speaker Control for Speech Replacement and Synthesis Overlay:

I want no sounds other than synthesis at all.
Choose the Voice Replacement mode.
I want synthesis to be overlaid on the voice of all speakers.
Choose the Voiceover mode.
I want synthesis to replace the original speech for all speakers.
Choose the Smart Voiceover mode.
I want to select the speakers whose voices are replaced, and overlay the synthesis on top of the original speech for the rest.
Choose the Custom Voiceover mode.
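As a rough illustration, the four goals above could be translated into parameter sets like this. The helper build_mixing_params, its mode strings, and the default original_sound_volume of 1.0 are hypothetical; only the parameter names original_sound_volume, narrator_speaker_ids, and remove_voice come from this page. For Smart Voiceover the sketch uses the empty list with remove_voice=1; listing every speaker ID explicitly should be equivalent.

```python
def build_mixing_params(mode, replaced_speaker_ids=(), original_sound_volume=1.0):
    """Build a parameter dict for one of the four mixing modes (sketch)."""
    if mode == "voice_replacement":
        # Nothing but synthesis: all other mixing parameters are ignored.
        return {"original_sound_volume": 0.0}

    if original_sound_volume == 0.0:
        raise ValueError("a volume of 0.0 always selects Voice Replacement")

    params = {"original_sound_volume": original_sound_volume}
    if mode == "voiceover":
        # Overlay synthesis on every speaker; remove_voice must stay 0.
        params.update(narrator_speaker_ids=[], remove_voice=0)
    elif mode == "smart_voiceover":
        # Replace every speaker: empty list plus remove_voice=1.
        params.update(narrator_speaker_ids=[], remove_voice=1)
    elif mode == "custom_voiceover":
        if not replaced_speaker_ids:
            raise ValueError("Custom Voiceover needs at least one speaker ID")
        # Replace only the listed speakers; the rest are overlaid.
        params.update(narrator_speaker_ids=list(replaced_speaker_ids))
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return params


print(build_mixing_params("custom_voiceover", replaced_speaker_ids=[2, 5]))
```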

**Adjusting Volume/Sounds within Synthesis Overlay Segments:**

When voiceover is used, the original speakers’ speech is too loud compared to the synthesis.
The parameter voiceover_muffle_db controls how much to reduce the volume of the original speech overlaid with the synthesis. The higher the value of this parameter, the quieter the original speech will be.
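To get a feel for the scale of this parameter, the sketch below converts a decibel reduction into a linear amplitude factor. It assumes that voiceover_muffle_db is an attenuation in decibels applied to the signal amplitude, which the parameter name suggests but this page does not state explicitly; the function attenuation_gain is hypothetical.

```python
def attenuation_gain(muffle_db):
    """Linear amplitude factor for a reduction of muffle_db decibels.

    Assumes the standard amplitude convention: gain = 10 ** (-dB / 20).
    """
    return 10.0 ** (-muffle_db / 20.0)


# Larger values give a smaller factor, i.e. quieter original speech.
for db in (0, 6, 12, 20):
    print(db, round(attenuation_gain(db), 3))
```

Under this assumption, a value of 6 roughly halves the amplitude of the original speech, and 20 reduces it to one tenth.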
The synthesized voice is too quiet/loud compared to the background track.
To adjust the volume of the synthesis, you can use the transfer_voice_loudness=True flag. This transfers the loudness of each line of the original speech to the corresponding synthesis line. Alternatively, you can manually adjust the overall loudness of the synthesis track by setting the tts_loudness parameter. Changing transfer_voice_loudness and tts_loudness at the same time is not recommended, as their combined effect on the track volume may be unpredictable.
Important: Changing these parameters does affect the volume of the original speech.
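A simple pre-flight check can catch the discouraged combination before a request is sent. This is a sketch under assumptions: the settings are held in a plain dict, and the presence of a tts_loudness key is treated as "the value was changed", since the default is not given here. The function name has_loudness_conflict and the example value are hypothetical.

```python
def has_loudness_conflict(params):
    """Return True if both loudness controls are set in the same request."""
    uses_transfer = bool(params.get("transfer_voice_loudness", False))
    sets_tts_loudness = "tts_loudness" in params
    return uses_transfer and sets_tts_loudness


# The value -16.0 is purely illustrative, not a recommended setting.
settings = {"transfer_voice_loudness": True, "tts_loudness": -16.0}
if has_loudness_conflict(settings):
    print("Set either transfer_voice_loudness or tts_loudness, not both.")
```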
Artifacts/distortions of the background are audible during the synthesis overlay.
If no client background track is provided, we use AI to extract our own background track, and artifacts may appear. By default, we leave the background volume unchanged and only make the original speech quieter. This behavior can be disabled by setting voiceover_fix_background_volume=False, in which case the background volume will also be reduced, and potential artifacts/distortions will be less noticeable.
There is a client background track, but in moments of synthesis overlay, the background is very loud.
If a client background track is provided and it needs to be quieter during the synthesis overlay, you can set the background_muffle_db parameter. The higher its value, the quieter the background will be during those moments.

**Adjusting the Volume/Sounds outside Synthesis Overlay Segments:**

The original speaker’s voice is loud but becomes quieter during the synthesis moments and returns to the original, which is very noticeable. How can it be made consistently quiet throughout the video?
To keep the volume of the original speech consistent throughout the entire video, you can set the voiceover_fix_vocals_volume=True parameter. The original voice is then held at a fixed, reduced level for the whole video, lowered by the selected value of voiceover_muffle_db.
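The difference between the two behaviors can be pictured as a gain curve for the original voice over time. The sketch below is an illustration only: the function original_voice_gain, the overlay_segments list of (start, end) times in seconds, and the interpretation of voiceover_muffle_db as an amplitude reduction in decibels are all assumptions, not the actual mixing code.

```python
def original_voice_gain(times, overlay_segments, voiceover_muffle_db,
                        voiceover_fix_vocals_volume=False):
    """Linear gain applied to the original voice at each time point."""
    reduced = 10.0 ** (-voiceover_muffle_db / 20.0)
    gains = []
    for t in times:
        in_overlay = any(start <= t < end for start, end in overlay_segments)
        if voiceover_fix_vocals_volume or in_overlay:
            # Reduced level: everywhere when fixed, else only under synthesis.
            gains.append(reduced)
        else:
            gains.append(1.0)
    return gains


times = [0.5, 1.5, 2.5]
segments = [(1.0, 2.0)]  # synthesis plays between 1.0 s and 2.0 s
print(original_voice_gain(times, segments, 20))
print(original_voice_gain(times, segments, 20, voiceover_fix_vocals_volume=True))
```

In the first call the gain dips only inside the overlay segment, which is the audible jump described in the question; in the second call the gain stays at the reduced level for every time point.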