Hello, me again! I read in your paper that the info loss should be computed on the distribution of the subgraph given the original graph and the model parameters.
However, your code does the following, in order: 1) compute this distribution as logits, 2) sample from it with the Gumbel-softmax trick, and 3) apply the info loss to the sampled subgraph. As I understand it, you should instead 1) compute the distribution as logits, 2) convert the logits into probabilities using the same temperature as in the Gumbel-softmax code, 3) apply the info loss to that distribution, and 4) apply the Gumbel-softmax trick to the logits separately, for use in the other parts of the code.
Mathematically, I think your ordering adds a lot of noise to the back-propagated gradients of the info loss, and I would expect the loss to be cleaner and more effective with the order I propose. That is, apply the info loss to `(att_log_logits / temp).sigmoid()` (with `temp` set to 1 in your code) rather than to `self.sampling(att_log_logits, epoch, training)`.
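A minimal PyTorch sketch of the two orderings, useful for checking the gradient-noise argument. It assumes a GSAT-style info loss (KL between Bernoulli(att) and a Bernoulli(r) prior) and a binary concrete ("Gumbel-sigmoid") sampler; the names `info_loss`, `concrete_sample`, `loss_on_sample`, `loss_on_probs`, `grad_spread` and the value `r = 0.7` are hypothetical and for illustration only, not the repository's actual code.

```python
import torch

# Assumption: GSAT-style info loss = KL(Bernoulli(att) || Bernoulli(r)),
# averaged over edges; r = 0.7 is an illustrative prior, not a tuned value.
def info_loss(att: torch.Tensor, r: float = 0.7, eps: float = 1e-6) -> torch.Tensor:
    att = att.clamp(eps, 1 - eps)  # avoid log(0)
    kl = att * torch.log(att / r) + (1 - att) * torch.log((1 - att) / (1 - r))
    return kl.mean()


# Binary concrete ("Gumbel-sigmoid") relaxation of Bernoulli(sigmoid(logits)).
def concrete_sample(logits: torch.Tensor, temp: float = 1.0) -> torch.Tensor:
    u = torch.rand_like(logits).clamp(1e-10, 1 - 1e-10)
    logistic_noise = torch.log(u) - torch.log(1 - u)
    return ((logits + logistic_noise) / temp).sigmoid()


def loss_on_sample(logits: torch.Tensor, temp: float = 1.0):
    """Ordering described for the current code: sample first, info loss on the sample."""
    att = concrete_sample(logits, temp)
    return info_loss(att), att


def loss_on_probs(logits: torch.Tensor, temp: float = 1.0):
    """Proposed ordering: info loss on sigmoid(logits / temp); sample used downstream only."""
    p = (logits / temp).sigmoid()
    att = concrete_sample(logits, temp)  # still fed to the rest of the model
    return info_loss(p), att


def grad_spread(loss_fn, logits: torch.Tensor, n: int = 200) -> float:
    """Mean over edges of the std. dev. of d(info loss)/d(logits) across n draws."""
    grads = []
    for _ in range(n):
        x = logits.detach().clone().requires_grad_(True)
        loss, _ = loss_fn(x)
        (g,) = torch.autograd.grad(loss, x)
        grads.append(g)
    return torch.stack(grads).std(dim=0).mean().item()


if __name__ == "__main__":
    torch.manual_seed(0)
    logits = torch.randn(50)  # stand-in for att_log_logits over 50 edges
    print("info loss on sampled att:", grad_spread(loss_on_sample, logits))
    print("info loss on probabilities:", grad_spread(loss_on_probs, logits))  # ~0: no noise
```

Because `loss_on_probs` never sees the sampling noise, its info-loss gradient is the same on every draw, while `loss_on_sample` changes from draw to draw. This only illustrates the variance argument; whether that extra noise hurts or helps (e.g. as regularization) is an empirical question.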
What do you think? Have I missed something?
I would love to read your opinion on the matter.
P.S. Thanks again for your paper and for responding so quickly to my previous issues!
This is another very good question! We did this intentionally in GSAT, and we explained it briefly here. What you suggest is mathematically correct; our implementation was an empirical choice to add more regularization. If I remember correctly, using $\alpha$ in the info loss gave better performance on the Spurious-Motif datasets.
However, in our follow-up work, LRI, we ran more experiments on more realistic datasets and found that using $\alpha$ or $p$ in the info loss makes no significant difference, so we adopted the mathematically correct formulation in that implementation, i.e., here.
Thanks again for your suggestions! I should add some documentation to the GSAT code to make this point clear :)