DiverseSemanticSVC Demo Page

Performance of Different Semantic-based Features

Table: Objective evaluation results of different semantic-based features and their integration.

On this page, we show some representative cases to:

(1) display the gap that arises when semantic-based features are used alone for the singing voice conversion (SVC) task;
(2) reveal the complementary roles of the diverse semantic-based features for SVC.

We select DiffWaveNetSVC as the base model and conduct experiments under the Recording Studio Setting. The target singer is Opencpop, whose training corpus consists of 5.2 hours of studio-recorded singing. Here are some reference samples:

Spectrogram Reconstruction

First, we evaluate the reconstruction performance of the different semantic-based features. The following samples are from Opencpop's test set:

	Ground Truth	WeNet	WeNet + Whisper	WeNet + Whisper + ContentVec
#1
MCD	0.00	12.56	8.00	5.02
Mel Spectrogram
#2
MCD	0.00	11.20	6.21	3.66
Mel Spectrogram
#3
MCD	0.00	13.40	8.82	6.51
Mel Spectrogram

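Observation: After integrating the diverse semantic-based features, the mel spectrograms move closer to the ground truth: across the three samples, MCD drops from between 11.20 and 13.40 with WeNet alone to between 3.66 and 6.51 with all three features. The overall audio quality also improves.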
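To make the MCD rows easier to interpret, here is a minimal sketch of how mel-cepstral distortion is commonly computed. It is not the evaluation code behind this page: it assumes the widely used dB formulation, frames that are already time-aligned, and mel-cepstral coefficients extracted beforehand. The function name `mel_cepstral_distortion` is hypothetical.

```python
import math


def mel_cepstral_distortion(ref, syn):
    """Mean MCD (dB) between two time-aligned mel-cepstrum sequences.

    `ref` and `syn` are lists of frames; each frame is a list of
    mel-cepstral coefficients. Assumption: coefficient 0 (energy) is
    excluded, which is a common convention but not confirmed by the page.
    """
    if not ref or len(ref) != len(syn):
        raise ValueError("sequences must be non-empty and time-aligned")
    const = 10.0 / math.log(10.0)
    total = 0.0
    for r, s in zip(ref, syn):
        if len(r) != len(s):
            raise ValueError("frames must have the same number of coefficients")
        # Squared Euclidean distance over coefficients 1..D-1.
        sq = sum((a - b) ** 2 for a, b in zip(r[1:], s[1:]))
        total += const * math.sqrt(2.0 * sq)
    # Average the per-frame distortion over all frames.
    return total / len(ref)
```

Identical inputs give an MCD of 0, which matches the Ground Truth column; larger values indicate a spectral envelope that is further from the reference.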

Melody Modeling

Next, we use recordings from M4Singer as the source audio for singing voice conversion, with Opencpop as the target singer. Here we evaluate the melody modeling capability of the different semantic-based features.

	Source	WeNet	WeNet + Whisper	WeNet + Whisper + ContentVec
#1
F0CORR	1.00	-0.44	0.90	0.92
F0
#2
F0CORR	1.00	-0.22	0.61	0.68
F0
#3
F0CORR	1.00	-0.16	0.22	0.72
F0

Observation: After integrating the diverse semantic-based features, the melody trajectories of the converted audio move closer to those of the source: F0CORR rises from negative values with WeNet alone to between 0.68 and 0.92 with all three features. However, semantic-based features alone still do not model melody adequately, and the converted audio can sound "out of tune" to human listeners. Therefore, explicit melody modeling (such as F0 features) remains necessary for SVC with current technology.
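As an illustration of the F0CORR rows, the sketch below computes a Pearson correlation between two F0 contours. This is an assumption about the metric, not the page's actual evaluation code: it presumes frame-aligned contours of equal length, with unvoiced frames encoded as 0 and skipped. The function name `f0_correlation` is hypothetical.

```python
import math


def f0_correlation(f0_src, f0_conv):
    """Pearson correlation of two frame-aligned F0 contours (Hz).

    Assumption: a value of 0 marks an unvoiced frame; only frames that
    are voiced in both contours contribute to the correlation.
    """
    if len(f0_src) != len(f0_conv):
        raise ValueError("contours must have the same number of frames")
    pairs = [(a, b) for a, b in zip(f0_src, f0_conv) if a > 0 and b > 0]
    if len(pairs) < 2:
        raise ValueError("need at least two frames voiced in both contours")
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant contour")
    return cov / math.sqrt(var_a * var_b)
```

Under this definition, a contour that follows the source melody with a constant pitch offset still scores close to 1, while a negative score, as in the WeNet column, means the converted pitch tends to move against the source melody.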

Lyrics Modeling

In this section, we evaluate the lyrics modeling capability of the different semantic-based features. As before, the source audio is from M4Singer and the target singer is Opencpop.

	Source	WeNet	WeNet + Whisper	WeNet + Whisper + ContentVec
#1	这只是刚刚入门接下来你还会会弹琴会写歌会双截棍 zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì huì tán qín huì xiě gē huì shuāng jié gùn

	这只是刚刚入门接下来你还会弹琴会写歌会上街滚 (CER: 18.2%)	这只是当童话吗地下安利啊狗货看清单写报上节归 (CER: 81.8%)	这只是刚刚入门接下来你还会弹琴会写歌会双戒柜 (CER: 13.6%)	就只是刚刚入门接下来你还会弹琴会写歌会双截棍 (CER: 9.1%)
	zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shàng jiē gǔn	zhè zhǐ shì dāng tóng huà ma dì xià ān lì a gǒu huò kàn qīng dān xiě bào shàng jié guī	zhè zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shuāng jiè guì	jiù zhǐ shì gāng gāng rù mén jiē xià lái nǐ hái huì tán qín huì xiě gē huì shuāng jié gùn
#2	我好想对你对你宠爱才短短几个礼拜心情坏因为你不在 wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huài yīn wèi nǐ bù zài

	我好想对你对你宠爱才短短几个礼拜心情会因为你不在 (CER: 4.2%)	同好心愿意爱出爱大胆即刻必败应尽力为你不猜 (CER: 90.5%)	我好心对你对你宠爱才等等几个礼拜心情欢意为你不在 (CER: 20.8%)	我好想对你对你宠爱才短短几个礼拜心情欢意为你不在 (CER: 8.3%)
	wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huì yīn wèi nǐ bù zài	tóng hào xīn yuàn yì ài chū ài dà dǎn jí kè bì bài yīng jìn lì wèi nǐ bù cāi	wǒ hǎo xīn duì nǐ duì nǐ chǒng ài cái děng děng jǐ gè lǐ bài xīn qíng huān yì wèi nǐ bù zài	wǒ hǎo xiǎng duì nǐ duì nǐ chǒng ài cái duǎn duǎn jǐ gè lǐ bài xīn qíng huān yì wèi nǐ bù zài
#3	夕阳下我拉着你一起望着天天慢的静静的度过旧旧的时间 xī yáng xià wǒ lā zhe nǐ yì qǐ wàng zhe tiān tiān màn de jìng jìng de dù guò jiù jiù de shí jiān

	夕阳下午拉着你一起望着天静慢地静静地度过久久的时间 (CER: 24.0%)	心跳不欢声你一切望着天心慢的听见的多古典就失恋 (CER: 73.9%)	深夜下午拉着你一起望着天听慢的静静的兜过去的时间 (CER: 29.2%)	夕阳下我拉着你一起望着天给你慢的静静的度过久久的时间 (CER: 15.4%)
	xī yáng xià wǔ lā zhe nǐ yì qǐ wàng zhe tiān jìng màn dì jìng jìng dì dù guò jiǔ jiǔ de shí jiān	xīn tiào bù huān shēng nǐ yī qiè wàng zhe tiān xīn màn de tīng jiàn de duō gǔ diǎn jiù shī liàn	shēn yè xià wǔ lā zhe nǐ yì qǐ wàng zhe tiān tīng màn de jìng jìng de dōu guò qù de shí jiān	xī yáng xià wǒ lā zhe nǐ yì qǐ wàng zhe tiān gěi nǐ màn de jìng jìng de dù guò jiǔ jiǔ de shí jiān
#4	顾不顾将相王侯管不管万世千秋求只求爱化解这万丈红尘纷乱永无休 gù bù gù jiàng xiàng wáng hóu guǎn bù guǎn wàn shì qiān qiū qiú zhǐ qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū

	孤不孤将相望后管不管万世千秋求知求爱化解这万丈红尘纷乱永无休 (CER: 16.7%)	牵挂不顾登场问候问不完把时间救当初仇爱我将 (CER: 72.3%)	顾不顾正向往后还不关为师千秋求知求爱化解这万丈红尘纷乱永无休 (CER: 30.0%)	顾不顾长相往后管不管为师千秋求知求爱化解这万丈红尘纷乱永无休 (CER: 20.0%)
	gū bù gū jiàng xiàng wàng hòu guǎn bù guǎn wàn shì qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū	qiān guà bù gù dēng chǎng wèn hòu wèn bù wán bǎ shí jiān jiù dāng chū chóu ài wǒ jiāng	gù bù gù zhèng xiàng wǎng hòu hái bù guān wèi shī qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū	gù bù gù zhǎng xiàng wǎng hòu guǎn bù guǎn wèi shī qiān qiū qiú zhī qiú ài huà jiě zhè wàn zhàng hóng chén fēn luàn yǒng wú xiū
#5	让我断了气铁了心爱的过火一回头就找到出路让我成为了无情的K歌之王 ràng wǒ duàn le qì tiě le xīn ài de guò huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de K gē zhī wáng

	让我断了气贴了心态的火火一回头就找到出路让我成为了无情的K哥之王 (CER: 12.5%)	让我等了几天的心拍电话你回头都早老住让我成为了无形的黑洞失望 (CER: 63.3%)	让我断了气贴了心海的火火一回头就找到出路让我成为了无形的配歌时光 (CER: 21.9%)	让我断了七天的心海的过火一回头就找到出路让我成为了无情的配歌之王 (CER: 15.6%)
	ràng wǒ duàn le qì tiē le xīn tài de huǒ huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de K gē zhī wáng	ràng wǒ děng le jǐ tiān de xīn pāi diàn huà nǐ huí tóu dōu zǎo lǎo zhù ràng wǒ chéng wéi liǎo wú xíng de hēi dòng shī wàng	ràng wǒ duàn le qì tiē le xīn hǎi de huǒ huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú xíng de pèi gē shí guāng	ràng wǒ duàn le qī tiān de xīn hǎi de guò huǒ yī huí tóu jiù zhǎo dào chū lù ràng wǒ chéng wéi liǎo wú qíng de pèi gē zhī wáng
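For readers unfamiliar with the CER figures in the table above, the following sketch shows the textbook character error rate: the character-level edit distance divided by the reference length. The page does not state which tool or normalization produced its numbers, so this sketch may not reproduce them exactly. The function names `edit_distance` and `character_error_rate` are hypothetical.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference character
                curr[j - 1] + 1,         # insertion of a hypothesis character
                prev[j - 1] + (r != h),  # substitution, or match at no cost
            ))
        prev = curr
    return prev[-1]


def character_error_rate(ref, hyp):
    """CER = edit distance / reference length (standard definition)."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)
```

Because Chinese lyrics are compared character by character, a homophone substitution counts as a full error even when the pinyin, and therefore the sung pronunciation, is nearly identical. The pinyin rows help to tell such cases apart from genuine pronunciation errors.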