Get list of genes that are important for similarity tables #324

Open
ChristopherMancuso opened this issue Sep 12, 2024 · 0 comments

Comments

@ChristopherMancuso
Contributor

Try to figure out which genes from the user model were the most important to a given term in the similarity tables.

Some thoughts on this from Slack

Given the two models with coefficients [a_1, a_2, …, a_p] and [b_1, b_2, …, b_p], where p is the number of embedding dimensions (one trained on the input and the other for the 'similar' term), a simple way to do this might be the following:
Given each gene's embedding vector [x_1, x_2, …, x_p], calculate a score for the gene as the cosine similarity between its embedding vector and the vector [a_1+b_1, a_2+b_2, …, a_p+b_p], rank genes by this score, and report the top few.
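
A minimal sketch of the scoring idea above, in plain Python. The function names (`cosine_similarity`, `rank_genes`), the dict-of-lists layout for embeddings, and the toy gene names and numbers are all hypothetical and only for illustration; they are not taken from the project's code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_genes(gene_embeddings, coef_a, coef_b, top_k=10):
    """Score each gene against the summed coefficient vector and return the top_k.

    gene_embeddings: dict mapping gene name -> embedding vector [x_1, ..., x_p]
    coef_a, coef_b:  coefficient vectors [a_1, ..., a_p] and [b_1, ..., b_p]
    """
    # Element-wise sum [a_1 + b_1, ..., a_p + b_p]
    combined = [a + b for a, b in zip(coef_a, coef_b)]
    scores = {
        gene: cosine_similarity(vec, combined)
        for gene, vec in gene_embeddings.items()
    }
    # Highest cosine similarity first
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# Toy example with p = 3 (made-up values)
gene_embeddings = {
    "GENE_A": [1.0, 0.0, 0.0],
    "GENE_B": [0.0, 1.0, 0.0],
    "GENE_C": [1.0, 1.0, 0.0],
}
coef_a = [0.9, 0.1, 0.0]  # model trained on the input
coef_b = [0.6, 0.4, 0.0]  # model for the 'similar' term

top = rank_genes(gene_embeddings, coef_a, coef_b, top_k=2)
for gene, score in top:
    print(f"{gene}\t{score:.3f}")
```

Because cosine similarity ignores vector length, this ranks genes by direction only; whether the magnitude of the coefficients should also matter is left open here.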

However, some things to think about:

  1. If a term is not similar, would the top genes then be confusing?
  2. How to get scores for the genes (more z-scores?). Presenting the top ten might not mean much.
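
On point 2, one possible reading of "more z-scores" is to standardize each gene's cosine score against the scores of all genes, so a reported gene comes with a sense of how far it sits from the bulk. The sketch below assumes exactly that; the function name `z_scores` and the toy values are hypothetical, and the issue does not settle which background distribution to use.

```python
import statistics


def z_scores(scores):
    """Standardize per-gene scores against the mean and standard deviation over all genes.

    scores: dict mapping gene name -> raw score (e.g. a cosine similarity)
    Returns a dict mapping gene name -> z-score (all 0.0 if the scores do not vary).
    """
    values = list(scores.values())
    mean = statistics.fmean(values)
    # Population standard deviation, since the full gene set is being described
    sd = statistics.pstdev(values)
    if sd == 0.0:
        return {gene: 0.0 for gene in scores}
    return {gene: (score - mean) / sd for gene, score in scores.items()}


# Toy raw scores (made-up values)
raw = {"GENE_A": 0.9, "GENE_B": 0.3, "GENE_C": 0.6}
z = z_scores(raw)
for gene, value in sorted(z.items(), key=lambda item: item[1], reverse=True):
    print(f"{gene}\t{value:+.2f}")
```

With z-scores, a cutoff could replace a fixed top ten, which might also help with point 1: for a term that is not similar, few or no genes may stand out.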

ChristopherMancuso added the enhancement (New feature or request) label Sep 26, 2024
ChristopherMancuso self-assigned this Sep 26, 2024