feature selection method using pearson residuals discrepancies #3452

Open
2 of 3 tasks
jflucier opened this issue Jan 24, 2025 · 1 comment
Labels
Triage 🩺 This issue needs to be triaged by a maintainer

Comments

jflucier commented Jan 24, 2025

Please make sure these conditions are met

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of scanpy.
  • (optional) I have confirmed this bug exists on the main branch of scanpy.

What happened?

Hi,

I have tried several ways of doing feature selection with Pearson residuals, and the methods return different results.

Method 1 plot:
Image

Method 2 plot:
Image

I don't understand why these two methods don't produce the same result.

Based on these plots, I think method 2 is the better one, but I would like to understand why the two don't produce similar plots.

Thanks in advance for your help

Minimal code sample

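# `adata` is assumed to be an AnnData object holding raw counts
import matplotlib.pyplot as plt
import scanpy as sc
import seaborn as sns
from scipy.sparse import csr_matrix

### 1st method:
# compute Pearson residuals, store them as a layer, then select genes on that layer
analytic_pearson = sc.experimental.pp.normalize_pearson_residuals(adata, inplace=False)
adata.layers["analytic_pearson_residuals"] = csr_matrix(analytic_pearson["X"])
sc.pp.highly_variable_genes(adata, layer="analytic_pearson_residuals", n_top_genes=4000)
ax = sns.scatterplot(
    data=adata.var, x="means", y="dispersions", hue="highly_variable", s=5
)

### 2nd method:
# select genes directly with the experimental Pearson residuals flavor
fig, ax = plt.subplots(1, 1, figsize=(10, 5))
sc.experimental.pp.highly_variable_genes(
    adata, flavor="pearson_residuals", n_top_genes=4000
)
ax = sns.scatterplot(
    data=adata.var, x="means", y="dispersions", hue="highly_variable", s=5
)
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_title("Feature selection using Pearson residuals normalisation (from highly_variable_genes flavor pearson_residuals)")
# `pdf` is not defined in this snippet (presumably an open matplotlib PdfPages object)
pdf.savefig(bbox_inches="tight")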
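
One possible reason for the difference, offered as a sketch rather than a diagnosis: as far as the scanpy documentation describes it, `sc.pp.highly_variable_genes` defaults to `flavor="seurat"`, which expects log-normalized data and ranks genes by normalized dispersion, while the experimental `pearson_residuals` flavor expects raw counts and ranks genes by the variance of their residuals, reporting `residual_variances` rather than `dispersions`. If that is right, the `dispersions` column plotted for method 2 may be left over from method 1. The NumPy sketch below shows the residual-variance ranking on simulated counts. The helper names are hypothetical, and `theta=100` and clipping at `sqrt(n_cells)` are assumptions meant to mirror the documented defaults, not scanpy's actual code.

```python
import numpy as np


def pearson_residuals(counts, theta=100.0):
    """Analytic Pearson residuals under a negative binomial null model.

    Assumes a dense cells x genes matrix of raw counts in which every
    gene has at least one count (otherwise the division yields NaN).
    """
    counts = np.asarray(counts, dtype=float)
    cell_totals = counts.sum(axis=1, keepdims=True)
    gene_totals = counts.sum(axis=0, keepdims=True)
    # expected count if each gene took the same share of every cell
    mu = cell_totals * gene_totals / counts.sum()
    residuals = (counts - mu) / np.sqrt(mu + mu**2 / theta)
    # clip extreme residuals to +/- sqrt(number of cells)
    clip = np.sqrt(counts.shape[0])
    return np.clip(residuals, -clip, clip)


def top_genes_by_residual_variance(counts, n_top, theta=100.0):
    """Rank genes by the variance of their Pearson residuals."""
    residual_var = pearson_residuals(counts, theta).var(axis=0)
    order = np.argsort(residual_var)[::-1]  # highest variance first
    return order[:n_top], residual_var


# simulated data: 200 cells, 50 genes, gene 0 elevated in half the cells
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(200, 50))
counts[:100, 0] += 30
top, residual_var = top_genes_by_residual_variance(counts, n_top=5)
```

With real data, plotting `adata.var["residual_variances"]` against `adata.var["means"]` after method 2 would be the closer analogue of this ranking, assuming those columns are present.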

Error output

Versions


jflucier added the Triage 🩺 (This issue needs to be triaged by a maintainer) label Jan 24, 2025

jflucier (Author) commented

Another observation.

If I run scry on the same data, I get this plot:
Image

The same plot, but using highly_variable_genes with flavor pearson_residuals:
Image

I thought the pearson_residuals flavor was a reimplementation of scry, but maybe I got this wrong.

Thanks for your help understanding this
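
A possible explanation for the scry comparison, stated with some caution: scry's `devianceFeatureSelection` ranks genes by deviance (binomial by default, as far as its documentation says), which is a different statistic from the variance of Pearson residuals, so the two plots need not agree. The sketch below computes a per-gene binomial deviance against a null model in which each gene takes the same share of every cell. The function name is hypothetical, and whether this matches what produced the scry plot above is an assumption.

```python
import numpy as np


def binomial_deviance(counts):
    """Per-gene binomial deviance against a constant-proportion null model.

    `counts` is a dense cells x genes matrix of raw counts.
    """
    counts = np.asarray(counts, dtype=float)
    n = counts.sum(axis=1, keepdims=True)            # total counts per cell
    p = counts.sum(axis=0, keepdims=True) / n.sum()  # pooled share of each gene
    expected = n * p
    miss = n - counts
    # terms with a zero count contribute nothing, so mask them out
    with np.errstate(divide="ignore", invalid="ignore"):
        hit_term = np.where(counts > 0, counts * np.log(counts / expected), 0.0)
        miss_term = np.where(miss > 0, miss * np.log(miss / (n - expected)), 0.0)
    return 2.0 * (hit_term + miss_term).sum(axis=0)


# gene 0 takes exactly 10% of every cell: no deviation from the null model
flat = binomial_deviance([[10, 90], [20, 180], [30, 270]])
# gene 0 takes 50%, 5% and 10% of the three cells: clear deviation
uneven = binomial_deviance([[50, 50], [10, 190], [30, 270]])
```

A gene whose share of the counts is the same in every cell gets a deviance near zero, while a gene whose share changes from cell to cell gets a large one, so the ranking rewards a different kind of signal than a residual variance does.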
