Infoloss before or after sampling #10

Closed
simoons95 opened this issue Mar 3, 2023 · 2 comments

Comments

@simoons95

Hello, me again!

I read in your paper that your info loss should be based on the distribution of the subgraphs conditioned on the original graph and the parameters.
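For reference, here is the info loss as we understand it from GSAT: a per-edge KL divergence between the learned Bernoulli attention and a Bernoulli prior with rate $r$. The symbols $z_{uv}$ (an edge logit, i.e. an entry of `att_log_logits`), $\tau$ (temperature), $g_{uv}$ (logistic noise) and the per-edge averaging are our own notation, not taken verbatim from the paper or the code:

$$
\mathcal{L}_{\text{info}} = \frac{1}{|E|} \sum_{(u,v) \in E} \left[ p_{uv} \log \frac{p_{uv}}{r} + (1 - p_{uv}) \log \frac{1 - p_{uv}}{1 - r} \right], \qquad p_{uv} = \sigma\!\left(\frac{z_{uv}}{\tau}\right)
$$

while the relaxed (Gumbel-softmax / binary concrete) sample used downstream is

$$
\alpha_{uv} = \sigma\!\left(\frac{z_{uv} + g_{uv}}{\tau}\right), \qquad g_{uv} = \log u_{uv} - \log(1 - u_{uv}), \quad u_{uv} \sim \mathcal{U}(0, 1).
$$

The question in this issue is whether the bracket should be evaluated at $p_{uv}$ or at $\alpha_{uv}$.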

However, your code does things in this order: 1) compute this distribution as logits, 2) sample with the Gumbel-softmax trick, and 3) apply the info loss to the sampled subgraph. As I understand it, you should instead 1) compute the distribution as logits, 2) transform the logits into probabilities using the same temperature as in the Gumbel-softmax code, 3) apply the info loss to that distribution, and 4) apply the Gumbel-softmax trick to the logits for use in the other parts of the code.
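The two orderings can be contrasted in a minimal, dependency-free sketch. All function names here (`concrete_sample`, `bernoulli_kl`, `info_loss`, `current_order`, `proposed_order`) are hypothetical stand-ins, not the repository's code; the prior rate `r` and the plain-Python lists in place of tensors are assumptions for illustration.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def concrete_sample(logit: float, temp: float, rng: random.Random) -> float:
    """Binary concrete (Gumbel-softmax) relaxation of Bernoulli(sigmoid(logit))."""
    u = rng.uniform(1e-10, 1.0 - 1e-10)
    noise = math.log(u) - math.log(1.0 - u)  # logistic noise
    return sigmoid((logit + noise) / temp)


def bernoulli_kl(a: float, r: float, eps: float = 1e-6) -> float:
    """KL(Bern(a) || Bern(r)), with a clamped away from 0 and 1."""
    a = min(max(a, eps), 1.0 - eps)
    return a * math.log(a / r) + (1.0 - a) * math.log((1.0 - a) / (1.0 - r))


def info_loss(att: list[float], r: float) -> float:
    """Per-edge Bernoulli KL toward the prior r, averaged over edges."""
    return sum(bernoulli_kl(a, r) for a in att) / len(att)


def current_order(logits, r, temp, rng):
    alpha = [concrete_sample(z, temp, rng) for z in logits]  # 2) sample
    return alpha, info_loss(alpha, r)                        # 3) info loss on the sample


def proposed_order(logits, r, temp, rng):
    p = [sigmoid(z / temp) for z in logits]                  # 2) probabilities
    loss = info_loss(p, r)                                   # 3) info loss on p
    alpha = [concrete_sample(z, temp, rng) for z in logits]  # 4) sample for downstream use
    return alpha, loss
```

With the same logits, `proposed_order` returns the same loss for every RNG seed, whereas the loss from `current_order` changes from draw to draw; both still hand a noisy `alpha` to the rest of the model.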

Mathematically, I think your approach brings a lot of noise into the back-propagated gradients of the info loss, and I would expect the loss to be more efficient and cleaner if you followed the order I propose. That is, apply the info loss to `(att_log_logits / temp).sigmoid()` (with `temp` set to 1 in your code) rather than to `self.sampling(att_log_logits, epoch, training)`.
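The noise argument can be made concrete with a short derivation, using our own notation ($z$ for one edge logit, $\tau$ for the temperature, $g$ for its logistic noise). Since $\frac{\partial}{\partial x}\,\mathrm{KL}(\mathrm{Bern}(x)\,\|\,\mathrm{Bern}(r)) = \operatorname{logit} x - \operatorname{logit} r$ and $\sigma' = \sigma(1-\sigma)$, the gradient with respect to the logit is

$$
\frac{\partial}{\partial z}\,\mathrm{KL}\big(\mathrm{Bern}(p)\,\|\,\mathrm{Bern}(r)\big) = \frac{p(1-p)}{\tau}\left(\frac{z}{\tau} - \operatorname{logit} r\right), \qquad p = \sigma\!\left(\frac{z}{\tau}\right)
$$

when the loss is applied to $p$, and

$$
\frac{\partial}{\partial z}\,\mathrm{KL}\big(\mathrm{Bern}(\alpha)\,\|\,\mathrm{Bern}(r)\big) = \frac{\alpha(1-\alpha)}{\tau}\left(\frac{z + g}{\tau} - \operatorname{logit} r\right), \qquad \alpha = \sigma\!\left(\frac{z + g}{\tau}\right)
$$

when it is applied to the sample $\alpha$. The first is deterministic given $z$; the second depends on the random draw $g$ both through the prefactor and through the shifted logit.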

What do you think? Have I missed something?
I would love to read your opinion on the matter.

P.S.: Thanks again for your paper and for your quick responses to my previous issues!

@siqim
Member

siqim commented Mar 3, 2023

Hi again!

This is another very good question! We did this intentionally in GSAT, and we explained it briefly here. What you suggest is mathematically correct; our implementation was more of an empirical choice to get more regularization. If I remember correctly, I found that using $\alpha$ in the info loss yielded better performance on the spurious-motif datasets.
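One way to see the extra regularization, offered as an illustration rather than the authors' explanation: the Bernoulli KL is convex in its first argument, so averaging it over noisy samples $\alpha$ adds a penalty for their spread. At logit $0$ (so $p = 0.5$) with prior $r = 0.5$, the $p$-based loss is exactly zero while the $\alpha$-based loss stays positive. The Monte Carlo sketch below assumes temperature 1; `mean_kl_on_samples` and `bernoulli_kl` are hypothetical helpers.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli_kl(a: float, r: float, eps: float = 1e-6) -> float:
    """KL(Bern(a) || Bern(r)), with a clamped away from 0 and 1."""
    a = min(max(a, eps), 1.0 - eps)
    return a * math.log(a / r) + (1.0 - a) * math.log((1.0 - a) / (1.0 - r))


def mean_kl_on_samples(logit: float, r: float, n: int = 20000, seed: int = 0) -> float:
    """Monte Carlo average of KL(Bern(alpha) || Bern(r)), alpha a temperature-1 concrete sample."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = rng.uniform(1e-10, 1.0 - 1e-10)
        alpha = sigmoid(logit + math.log(u) - math.log(1.0 - u))  # logistic noise added to the logit
        total += bernoulli_kl(alpha, r)
    return total / n


kl_on_p = bernoulli_kl(sigmoid(0.0), 0.5)   # p = 0.5 already matches the prior
kl_on_alpha = mean_kl_on_samples(0.0, 0.5)  # averaged over relaxed samples
print(kl_on_p, kl_on_alpha)
```

At logit 0 and temperature 1, $\sigma(\log u - \log(1-u)) = u$, so $\alpha$ is uniform on $(0, 1)$ and the average should approach $\ln 2 - \tfrac{1}{2} \approx 0.193$.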

But in our follow-up work, LRI, we ran more experiments on more realistic datasets and found that using $\alpha$ or $p$ in the info loss does not seem to make a significant difference, so we adopted the mathematically correct implementation in that follow-up work, i.e., here.

Thanks again for your suggestions! I guess I need to add some documentation to the GSAT code to make this point clear :)

Best,
Siqi

@siqim siqim pinned this issue Mar 3, 2023
@simoons95
Author

Indeed, I had missed that. Thank you for your fast and clear answer!

@siqim siqim closed this as completed Mar 6, 2023