In the following tables, W denotes using the WeNet feature only, W + W denotes using both WeNet and Whisper features, and W + W + C denotes using WeNet, Whisper, and ContentVec features.
Source | TransformerSVC (W) | TransformerSVC (W + W) | TransformerSVC (W + W + C) | VitsSVC (W) | VitsSVC (W + W) | VitsSVC (W + W + C) | DiffWaveNetSVC (W) | DiffWaveNetSVC (W + W) | DiffWaveNetSVC (W + W + C) |
---|---|---|---|---|---|---|---|---|---|