GeneWalk for novel gene function prediction #77

IwanParf · 2024-10-22T07:32:43Z

Dear GeneWalk developers,
Thank you for this easy-to-use tool!
In your paper you state: "Currently, only connected GO terms are considered for identification of function relevance, but we imagine that GeneWalk could be extended to predict novel gene functions because of high similarity scores between a gene and unconnected GO terms."

This is exactly what I am interested in. I haven't tried it yet, but I guess I could open the node_vectors_X.pkl files, extract the vector embedding of a gene of interest and compare it with different GO-BP vector embeddings via cosine similarity? I could do that for all calculated GeneWalk graphs (I used the default of 3) and for the randomized graphs, and then perform a t-test with the H0 that the cosine similarities do not differ between the graphs and their randomized counterparts. Is that roughly what you had in mind, or is it more complicated? Do you have another suggestion for how to do it, or do you already have code in place that you would be open to sharing?
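
The comparison described above could look roughly like the sketch below. It assumes that, once a node_vectors_X.pkl file is loaded, the embeddings can be read as a mapping from node name to vector; the actual object stored in the file may differ, so that part is an assumption. The helper names `cosine_similarity` and `rank_go_terms` and the toy identifiers are hypothetical and only serve the illustration.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return float(np.dot(u, v) / denom)


def rank_go_terms(vectors, gene, go_ids):
    """Rank GO terms by cosine similarity to one gene.

    `vectors` is assumed to behave like a dict: node name -> vector.
    GO terms missing from `vectors` are skipped.
    """
    gene_vec = vectors[gene]
    scores = {
        go: cosine_similarity(gene_vec, vectors[go])
        for go in go_ids
        if go in vectors
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy stand-in for the embeddings of one GeneWalk run (not real data).
toy_vectors = {
    "GENE_A": np.array([1.0, 0.0, 0.0]),
    "GO:0000001": np.array([0.9, 0.1, 0.0]),
    "GO:0000002": np.array([0.0, 1.0, 0.0]),
}

ranked = rank_go_terms(toy_vectors, "GENE_A", ["GO:0000001", "GO:0000002"])
for go_id, sim in ranked:
    print(go_id, round(sim, 3))
```

The same function would be applied once per replicate run, and once per randomized graph to collect the similarities used for comparison.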

IwanParf · 2024-10-22T09:07:39Z

Author

Okay, reading the paper again: I can determine the null distribution by taking all (right?) gene-GO pair cosine similarities from my 3 replicate random graphs, and obtain the p-value for a particular gene-GO pair of interest by assuming its cosine similarity comes from the same null distribution? Which distribution are you using for H0, though? Normal, t?

ri23 Oct 23, 2024
Maintainer

"I could open the node_vectors_X.pkl files, extract the vector embedding of a gene of interest and compare it with different GO-BP vector embeddings via cosine similarity?"
Yes, correct.

"I can determine the null distribution by taking all (right?) gene-GO pair cosine similarities from my 3 replicate random graphs and obtain the p value for a particular gene-GO pair of interest by assuming its cosine similarity comes from the same null distribution"
Yes, you are correct: the distribution of random cosine similarities is the null distribution.

You can then determine the p-value of the gene-GO term cosine similarity you want to test against this null distribution by looking at its rank, as done in this psim function:
https://github.com/churchmanlab/genewalk/blob/master/genewalk/perform_statistics.py#L247
Then, if needed, average over the replicate runs, as is also done in perform_statistics.py.
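
For illustration, a rank-based p-value against an empirical null could be sketched as below. This is not the GeneWalk code: the function name `empirical_pvalue`, the add-one correction and the plain arithmetic mean over replicates are assumptions made for the sketch, and the numbers are synthetic. The rank approach needs no parametric assumption (normal or t) about the null distribution.

```python
import numpy as np


def empirical_pvalue(similarity, null_similarities):
    """One-sided p-value: share of null similarities >= the observed one.

    The add-one correction keeps the p-value above zero for a finite
    null sample; it is a choice made for this sketch.
    """
    null_sorted = np.sort(np.asarray(null_similarities, dtype=float))
    n = null_sorted.size
    # Index of the first null value >= similarity, so n - idx counts them.
    idx = np.searchsorted(null_sorted, similarity, side="left")
    n_ge = n - idx
    return float((n_ge + 1) / (n + 1))


# Synthetic stand-ins: one null sample per randomized replicate graph and
# one observed gene-GO similarity per GeneWalk replicate run.
rng = np.random.default_rng(0)
null_per_replicate = [rng.uniform(-1.0, 1.0, size=1000) for _ in range(3)]
observed_per_replicate = [0.95, 0.90, 0.97]

p_values = [
    empirical_pvalue(obs, null)
    for obs, null in zip(observed_per_replicate, null_per_replicate)
]

# Averaging choice is an assumption; check perform_statistics.py for the
# averaging that GeneWalk itself applies.
mean_p = float(np.mean(p_values))
print(p_values, mean_p)
```

When many gene-GO pairs are tested this way, a multiple-testing correction on the resulting p-values would also be needed.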
