Get list of genes that are important for similarity tables #324

ChristopherMancuso · 2024-09-12T14:54:05Z

Try to figure out which genes from the user model were most important to a given term in the similarity table.

Some thoughts on this from Slack:

Given the two models with coefficients [a_1, a_2, …, a_p] and [b_1, b_2, …, b_p] (where p is the number of embedding dimensions), one trained on the input and the other for the ‘similar’ term, a simple way to do this might be the following:
Given each gene’s embedding vector [x_1, x_2, …, x_p], calculate a score for the gene as the cosine similarity between its embedding vector and the vector [a_1+b_1, a_2+b_2, …, a_p+b_p], rank genes by this score, and report the top few.
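
A minimal sketch of the scoring idea above, assuming the gene embeddings are available as an (n_genes x p) NumPy array and the two coefficient vectors have length p. The function name `rank_genes_by_combined_model` and its arguments are hypothetical and not taken from the codebase.

```python
import numpy as np


def rank_genes_by_combined_model(gene_embeddings, coef_a, coef_b, gene_ids, top_k=10):
    """Rank genes by cosine similarity to the summed coefficient vector a + b.

    All names here are illustrative; the real data structures may differ.
    """
    X = np.asarray(gene_embeddings, dtype=float)  # shape (n_genes, p)
    combined = np.asarray(coef_a, dtype=float) + np.asarray(coef_b, dtype=float)  # shape (p,)

    # Cosine similarity = dot product divided by the product of the two norms.
    norms = np.linalg.norm(X, axis=1) * np.linalg.norm(combined)

    # Genes with a zero-norm embedding (or a zero combined vector) get a score of 0.
    scores = np.divide(X @ combined, norms, out=np.zeros(X.shape[0]), where=norms > 0)

    # Sort by descending score and keep the top few.
    order = np.argsort(-scores, kind="stable")[:top_k]
    return [(gene_ids[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    # Toy data for illustration only.
    embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, -1.0]]
    ids = ["g0", "g1", "g2", "g3"]
    for gene, score in rank_genes_by_combined_model(embeddings, [1.0, 0.0], [0.0, 1.0], ids, top_k=3):
        print(gene, round(score, 3))
```

Because cosine similarity ignores vector length, this scores genes by direction only; a gene with a small-magnitude embedding can still rank highly.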

However, some things to think about:

If a term is not similar, would showing its top genes be confusing?
How should the genes be scored (more z-scores?). Presenting the top ten might not mean much on its own.
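
One possible reading of the z-score question, as a rough sketch only: standardize each gene's raw score against the scores of all genes, so a listed gene can be read as some number of standard deviations above the average gene. The helper name `zscore_gene_scores` is hypothetical, and whether this is the right normalization here is an open question.

```python
import numpy as np


def zscore_gene_scores(scores):
    """Standardize raw gene scores to zero mean and unit variance across genes."""
    s = np.asarray(scores, dtype=float)
    std = s.std()
    if std == 0:
        # All genes scored identically, so no gene stands out.
        return np.zeros_like(s)
    return (s - s.mean()) / std


if __name__ == "__main__":
    # Toy cosine-similarity scores for illustration only.
    raw = [0.10, 0.12, 0.11, 0.45]
    print(zscore_gene_scores(raw))
```

With z-scores, a cutoff could replace a fixed top ten, so a term with no standout genes would simply show none.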

ChristopherMancuso added the enhancement New feature or request label Sep 26, 2024

ChristopherMancuso self-assigned this Sep 26, 2024
