Hi Guys,
Just FYI.
I found a mistake in the former implementation, a forgotten activation function (ReLU), and have fixed it now: 30c43cf
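For context, here is a minimal sketch of what the fix amounts to, assuming the Conv1d-based point-wise feed-forward block used in this repo (class and argument names here are illustrative, not necessarily the exact ones in the code):

```python
import torch

class PointWiseFeedForward(torch.nn.Module):
    """Position-wise FFN; the fix inserts the ReLU between the two Conv1d layers."""
    def __init__(self, hidden_units, dropout_rate):
        super().__init__()
        self.conv1 = torch.nn.Conv1d(hidden_units, hidden_units, kernel_size=1)
        self.dropout1 = torch.nn.Dropout(p=dropout_rate)
        self.relu = torch.nn.ReLU()  # the previously forgotten activation
        self.conv2 = torch.nn.Conv1d(hidden_units, hidden_units, kernel_size=1)
        self.dropout2 = torch.nn.Dropout(p=dropout_rate)

    def forward(self, inputs):
        # Conv1d expects (N, C, L), so swap the feature and time axes around the convs
        x = inputs.transpose(-1, -2)
        x = self.dropout2(self.conv2(self.relu(self.dropout1(self.conv1(x)))))
        return x.transpose(-1, -2) + inputs  # residual connection
```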
But according to my personal experiments on TiSASRec, you may sometimes prefer to undo it for better performance; please check the numbers below for details:
Currently, for SASRec only, after training for 601 epochs:

- w/ ReLU: test (NDCG@10: 0.5626, HR@10: 0.8073)
- w/o ReLU: test (NDCG@10: 0.5715, HR@10: 0.8157)
- w/o ReLU, with AdamW instead of Adam (see the optimizer sketch below): test (NDCG@10: 0.5781, HR@10: 0.8096)

These are still a bit far from the paper-reported test (NDCG@10: 0.5905, HR@10: 0.8245).
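The AdamW run above is just a one-line optimizer swap; a minimal sketch (the model and hyperparameter values here are placeholders, not the repo's exact settings):

```python
import torch

model = torch.nn.Linear(50, 50)  # placeholder for the actual SASRec model

# Before: Adam, where weight decay acts as an L2 term inside the adaptive update
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001, betas=(0.9, 0.98))

# After: AdamW decouples weight decay from the gradient-based update
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001,
                              betas=(0.9, 0.98), weight_decay=0.01)
```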
All experiments set maxlen=200. My guess is that replacing the MHA in PyTorch 1.6 with the self-made MHA (https://github.com/pmixer/TiSASRec.pytorch/blob/e87342ead6e90898234432f7d9b86e76695008bc/model.py#L25), which may leak a bit of future information, could remove the gap; a sketch of the masking involved is below.
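To illustrate the concern, here is a minimal sketch of the causal mask applied with PyTorch's built-in MultiheadAttention (sizes are illustrative, except maxlen=200 from the experiments above); a hand-rolled MHA that skips or mishandles this mask lets each timestep attend to later positions:

```python
import torch

seq_len, hidden = 200, 50  # maxlen=200 as above; hidden size is illustrative

# Causal mask: True marks positions a query must NOT attend to,
# i.e. everything strictly above the diagonal (the future).
attn_mask = ~torch.tril(torch.ones((seq_len, seq_len), dtype=torch.bool))

mha = torch.nn.MultiheadAttention(hidden, num_heads=1)
seqs = torch.rand(seq_len, 1, hidden)  # (L, N, E) layout used by PyTorch 1.6 MHA
out, _ = mha(seqs, seqs, seqs, attn_mask=attn_mask)

# A self-made MHA that drops (or softens) this mask lets timestep t peek at
# t+1..L during training/evaluation, which can inflate NDCG/HR and so
# "remove the gap" spuriously.
```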
Regards,
Zan