Try to figure out which genes from the user model were the most important to a given term in the similarity.
Some thoughts on this from Slack:
Given the two models with coefficients [a_1, a_2, …, a_p] and [b_1, b_2, …, b_p] (where p is the number of embedding dimensions), one trained on the input and the other for the ‘similar’ term, a simple way to do this might be the following:
Given each gene’s embedding vector [x_1, x_2, …, x_p], calculate a score for the gene as the cosine similarity between its embedding vector and the vector [a_1+b_1, a_2+b_2, …, a_p+b_p], rank genes by this score, and report the top few.
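A rough sketch of the scoring idea above, in Python with NumPy. The function name `rank_genes_by_combined_coefs`, the argument names, and the toy data are all made up for illustration; nothing here is existing project code. It assumes the embeddings are a genes-by-p array and that neither the gene vectors nor the summed coefficient vector are all zeros.

```python
import numpy as np


def rank_genes_by_combined_coefs(embeddings, coefs_a, coefs_b, gene_ids, top_k=10):
    """Rank genes by cosine similarity to the summed coefficient vector.

    embeddings: array of shape (n_genes, p), one embedding vector per gene.
    coefs_a, coefs_b: length-p coefficient vectors of the two models.
    gene_ids: n_genes identifiers, in the same order as the embedding rows.
    All names are hypothetical; this is a sketch, not project code.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    # Combined direction [a_1 + b_1, ..., a_p + b_p].
    target = np.asarray(coefs_a, dtype=float) + np.asarray(coefs_b, dtype=float)

    # Cosine similarity of every gene embedding with the combined direction.
    # Assumes no zero-length vectors (otherwise this divides by zero).
    norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(target)
    scores = embeddings @ target / norms

    # Highest score first; a stable sort keeps input order for ties.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(gene_ids[i], float(scores[i])) for i in order]


# Toy example with p = 2 and four made-up genes.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
toy_ranking = rank_genes_by_combined_coefs(
    toy_embeddings, [1.0, 0.0], [0.0, 1.0], ["g0", "g1", "g2", "g3"], top_k=2
)
print(toy_ranking)
```

In the toy example the combined direction is [1, 1], so "g2" (which points the same way) ranks first. Cosine similarity ignores the length of the embedding vectors, so this only captures direction.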
However, some things to think about:
If a term is not similar, would the top genes then be confusing?
How should the genes be scored (more z-scores?). Presenting the top ten might not mean much.