Performance of Different Semantic-based Features


Table: Objective evaluation results of different semantic-based features and their integration.

On this page, we show some representative cases to compare the different semantic-based features and their integration.

We select DiffWaveNetSVC as the base model and conduct experiments under the Recording Studio Setting. We use Opencpop as the target singer, whose training corpus consists of 5.2 hours of studio-recorded singing voice. Here are some reference samples:


Spectrogram Reconstruction

First, we evaluate the reconstruction performance of the different semantic-based features. The following samples are from Opencpop's test set:

MCD (lower is better):

Sample   Ground Truth   WeNet   WeNet + Whisper   WeNet + Whisper + ContentVec
#1       0.00           12.56   8.00              5.02
#2       0.00           11.20   6.21              3.66
#3       0.00           13.40   8.82              6.51

[Mel spectrograms of each sample under each setting; images not reproduced here.]

Observation: As more semantic-based features are integrated, the mel spectrograms move closer to the ground truth: MCD drops from between 11.20 and 13.40 with WeNet alone to between 3.66 and 6.51 with all three features. The overall audio quality also improves.
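MCD compares the mel-cepstra of a reconstructed utterance with those of the ground truth, which is why the Ground Truth column is 0.00. The page does not state how MCD was computed, so the following is a minimal sketch of the commonly used frame-averaged definition, MCD = (10 / ln 10) * sqrt(2 * sum_d (c_d - c'_d)^2), assuming the two mel-cepstral sequences are already extracted and time-aligned and that the 0th (energy) coefficient is skipped. The function name and the `skip_c0` parameter are illustrative only; the toolkit behind the reported numbers may differ, for example in DTW alignment or the number of coefficients.

```python
import math
from typing import Sequence

# 10 / ln(10) * sqrt(2): converts the Euclidean cepstral distance to dB.
_MCD_SCALE = 10.0 / math.log(10.0) * math.sqrt(2.0)


def mel_cepstral_distortion(
    ref: Sequence[Sequence[float]],
    syn: Sequence[Sequence[float]],
    skip_c0: bool = True,
) -> float:
    """Frame-averaged MCD between two time-aligned mel-cepstral sequences.

    Assumption (hypothetical helper): `ref` and `syn` already have the same
    number of frames and coefficients; no DTW alignment is performed here.
    """
    if len(ref) != len(syn) or not ref:
        raise ValueError("sequences must be non-empty and time-aligned")
    start = 1 if skip_c0 else 0  # c0 carries energy and is usually excluded
    total = 0.0
    for r_frame, s_frame in zip(ref, syn):
        if len(r_frame) != len(s_frame):
            raise ValueError("frames must have the same dimension")
        # Squared Euclidean distance over the selected coefficients.
        sq = sum((r - s) ** 2 for r, s in zip(r_frame[start:], s_frame[start:]))
        total += _MCD_SCALE * math.sqrt(sq)
    return total / len(ref)
```

Calling it on identical inputs returns 0.0, matching the Ground Truth column; larger values mean the reconstructed spectral envelope deviates further from the reference.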



Melody Modeling

Next, we use M4Singer as the source audio for singing voice conversion, with Opencpop as the target singer, and evaluate the melody modeling capability of the different semantic-based features.

F0CORR (higher is better):

Sample   Source   WeNet   WeNet + Whisper   WeNet + Whisper + ContentVec
#1       1.00     -0.44   0.90              0.92
#2       1.00     -0.22   0.61              0.68
#3       1.00     -0.16   0.22              0.72

[F0 contours of each sample under each setting; images not reproduced here.]

Observation: As more semantic-based features are integrated, the F0 trajectories of the converted audio move closer to those of the source; WeNet alone even yields negative correlations, while all three features together reach 0.68 to 0.92. However, semantic-based features alone still struggle to model melody adequately, and the converted singing can sound out of tune to human listeners. Explicit melody modeling (such as F0 features) therefore remains necessary for SVC with current techniques.
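F0CORR measures how closely the converted F0 contour follows the source contour; the Source column is 1.00 because a contour is perfectly correlated with itself. The page does not define the metric precisely, so the following is a minimal sketch assuming F0CORR is the Pearson correlation computed over frames that are voiced in both contours, with unvoiced frames marked as 0 Hz. `f0_correlation` is a hypothetical helper, not the evaluation code behind this page.

```python
import math
from typing import Sequence


def f0_correlation(src_f0: Sequence[float], cvt_f0: Sequence[float]) -> float:
    """Pearson correlation between two frame-aligned F0 contours in Hz.

    Assumption: unvoiced frames have F0 <= 0, and only frames voiced in
    both contours are compared.
    """
    if len(src_f0) != len(cvt_f0):
        raise ValueError("F0 contours must be frame-aligned")
    # Keep only jointly voiced frames.
    pairs = [(a, b) for a, b in zip(src_f0, cvt_f0) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two jointly voiced frames")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a flat contour")
    return cov / math.sqrt(var_a * var_b)
```

Under this definition the metric ignores linear scaling, so a conversion that raises the whole melody by an octave (doubling F0) still scores 1.0; it rewards contour shape rather than absolute pitch. Negative values, like WeNet's, indicate a converted contour that tends to move in the opposite direction to the source.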



Lyrics Modeling

In this section, we evaluate the lyrics modeling capability of the different semantic-based features. As before, the source audio comes from M4Singer and the target singer is Opencpop.

Each sample lists the reference lyrics (with pinyin), followed by the recognized lyrics of the source audio and of each converted audio, with pinyin and character error rate (CER, lower is better).

#1 Reference: 这只是刚刚入门接下来你还会会弹琴会写歌会双截棍
     zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì huì tán qín huì xiě gē huì shuāng jié gùn
   Source (CER: 18.2%): 这只是刚刚入门接下来你还会弹琴会写歌会上街滚
     zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shàng jiē gǔn
   WeNet (CER: 81.8%): 这只是当童话吗地安利啊狗货看清单报上节归
     zhè zhǐ shì dāng tóng huà ma dì xià ān lì a gǒu huò kàn qīng dān xiě bào shàng jié guī
   WeNet + Whisper (CER: 13.6%): 这只是刚刚入门接下来你还会弹琴会写歌会双戒柜
     zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shuāng jiè guì
   WeNet + Whisper + ContentVec (CER: 9.1%): 只是刚刚入门接下来你还会弹琴会写歌会双截棍
     jiù zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shuāng jié gùn

#2 Reference: 我好想对你对你宠爱才短短几个礼拜心情坏因为你不在
     wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huài yīn wèi nǐ bù zài
   Source (CER: 4.2%): 我好想对你对你宠爱才短短几个礼拜心情因为你不在
     wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huì yīn wèi nǐ bù zài
   WeNet (CER: 90.5%): 同好心愿意大胆即刻必败应尽力为你不
     tóng hào xīn yuàn yì ài chū ài dà dǎn jí kè bì bài yīng jìn lì wèi nǐ bù cāi
   WeNet + Whisper (CER: 20.8%): 我好对你对你宠爱才等等几个礼拜心情欢意为你不在
     wǒ hǎo xīn duì nǐ duì nǐ chǒng ài cái děng děng jǐ gè lǐ bài xīn qíng huān yì wèi nǐ bù zài
   WeNet + Whisper + ContentVec (CER: 8.33%): 我好想对你对你宠爱才短短几个礼拜心情欢意为你不在
     wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huān yì wèi nǐ bù zài

#3 Reference: 夕阳下我拉着你一起望着天天慢的静静的度过旧旧的时间
     xī yáng xià wǒ lā zhe nǐ yì qǐ wàng zhe tiān tiān màn de jìng jìng de dù guò jiù jiù de shí jiān
   Source (CER: 24.0%): 夕阳下拉着你一起望着天静静度过久久的时间
     xī yáng xià lā zhe nǐ yì qǐ wàng zhe tiān jìng màn jìng jìng dù guò jiǔ jiǔ de shí jiān
   WeNet (CER: 73.9%): 心跳不欢声你一望着天慢的听见多古典就失恋
     xīn tiào bù huān shēng nǐ yī qiè wàng zhe tiān xīn màn de tīng jiàn de duō gǔ diǎn jiù shī liàn
   WeNet + Whisper (CER: 29.2%): 深夜拉着你一起望着天慢的静静的的时间
     shēn yè xià lā zhe nǐ yì qǐ wàng zhe tiān tīng màn de jìng jìng de dōu guò de shí jiān
   WeNet + Whisper + ContentVec (CER: 15.4%): 夕阳下我拉着你一起望着天给你慢的静静的度过久久的时间
     xī yáng xià wǒ lā zhe nǐ yì qǐ wàng zhe tiān gěi nǐ màn de jìng jìng de dù guò jiǔ jiǔ de shí jiān

#4 Reference: 顾不顾将相王侯管不管万世千秋求只求爱化解这万丈红尘纷乱永无休
     gù bù gù jiàng xiàng wáng hóu guǎn bù guǎn wàn shì qiān qiū qiú zhǐ qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū
   Source (CER: 16.7%): 孤不孤将相望后管不管万世千秋求求爱化解这万丈红尘纷乱永无休
     gū bù gū jiàng xiàng wàng hòu guǎn bù guǎn wàn shì qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū
   WeNet (CER: 72.3%): 牵挂顾登场问候问完把时间救当初仇我将
     qiān guà gù dēng chǎng wèn hòu wèn wán bǎ shí jiān jiù dāng chū chóu ài wǒ jiāng
   WeNet + Whisper (CER: 30.0%): 顾不顾正向往后还关为师千秋求求爱化解这万丈红尘纷乱永无休
     gù bù gù zhèng xiàng wǎng hòu hái guān wèi shī qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū
   WeNet + Whisper + ContentVec (CER: 20.0%): 顾不顾往后管不管为师千秋求求爱化解这万丈红尘纷乱永无休
     gù bù gù zhǎng xiàng wǎng hòu guǎn bù guǎn wèi shī qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū

#5 Reference: 让我断了气铁了心爱的过火一回头就找到出路让我成为了无情的K歌之王
     ràng wǒ duàn le qì tiě le xīn ài de guò huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de K gē zhī wáng
   Source (CER: 12.5%): 让我断了气了心火一回头就找到出路让我成为了无情的K之王
     ràng wǒ duàn le qì tiē le xīn tài de huǒ huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de K zhī wáng
   WeNet (CER: 63.3%): 让我几天的拍电话你回头都早老住让我成为了无黑洞失望
     ràng wǒ děng le jǐ tiān de xīn pāi diàn huà nǐ huí tóu dōu zǎo lǎo zhù ràng wǒ chéng wéi liǎo wú xíng de hēi dòng shī wàng
   WeNet + Whisper (CER: 21.9%): 让我断了气了心火一回头就找到出路让我成为了无时光
     ràng wǒ duàn le qì tiē le xīn hǎi de huǒ huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú xíng de pèi shí guāng
   WeNet + Whisper + ContentVec (CER: 15.6%): 让我断了七天的的过火一回头就找到出路让我成为了无情的歌之王
     ràng wǒ duàn le qī tiān de xīn hǎi de guò huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de pèi gē zhī wáng

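Observation: As more semantic-based features are integrated, the intelligibility of the converted audio improves: CER falls from between 63.3% and 90.5% with WeNet alone to between 8.33% and 20.0% with all three features, and the full combination beats WeNet + Whisper on every sample.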
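CER is the character-level edit distance (substitutions, deletions and insertions) between the recognized lyrics and the reference lyrics, divided by the number of reference characters. The page does not say which recognizer or text normalization produced these transcriptions, so the sketch below covers only the scoring step, using a standard Levenshtein dynamic program; `character_error_rate` is a hypothetical name, not the evaluation code behind this page.

```python
def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = (substitutions + deletions + insertions) / len(reference).

    Each Python character (Unicode code point) counts as one unit, which
    suits Chinese lyrics where every Hanzi is one character.
    """
    if not reference:
        raise ValueError("reference must be non-empty")
    # prev[j] = edit distance between the processed reference prefix
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, h in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of a reference char
                curr[j - 1] + 1,         # insertion of a hypothesis char
                prev[j - 1] + (r != h),  # substitution, or free match
            )
        prev = curr
    return prev[-1] / len(reference)
```

Applied to the strings of sample #2, it gives 1/24, 5/24 and 2/24 for the Source, WeNet + Whisper and WeNet + Whisper + ContentVec transcriptions, consistent with the reported 4.2%, 20.8% and 8.33%. Other rows may not reproduce exactly, since their text normalization is not stated.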



Speaker Similarity

Finally, we evaluate the speaker similarity of the converted audio.

[Samples, not reproduced here: Source, WeNet, Whisper, ContentVec, WeNet + Whisper, WeNet + ContentVec, Whisper + ContentVec, WeNet + Whisper + ContentVec]

Observation: The speaker similarity achieved with the different semantic-based features and their combinations is comparable and hard to rank by human perception.