In the paper, h_bert is described as the output of the Prosodic Text Encoder module. In the code, however, I believe that output corresponds to the variable d_en, which is not the variable being fed into the other components.
train_second.py
309 bert_dur = model.bert(texts, attention_mask=(~text_mask).int())
310 d_en = model.bert_encoder(bert_dur).transpose(-1, -2)
In the implementation, bert_dur plays the role of h_bert from the paper, but as noted above, h_bert should be the Prosodic Text Encoder's output, which I believe is d_en in the implementation.
321 s_preds = sampler(noise=torch.randn_like(s_trg).unsqueeze(1).to(device),
322 embedding=bert_dur,
323 embedding_scale=1,
324 features=ref, # reference from the same speaker as the embedding
325 embedding_mask_proba=0.1,
326 num_steps=num_steps).squeeze(1)
327 loss_diff = model.diffusion(s_trg.unsqueeze(1), embedding=bert_dur, features=ref).mean() # EDM loss
328 loss_sty = F.l1_loss(s_preds, s_trg.detach()) # style reconstruction loss
Am I missing something?