Pseudobulk Analysis with Decoupler and edgeR #5617

Open
wants to merge 47 commits into
base: main

dianichj
Contributor

@dianichj dianichj commented Dec 6, 2024

🚀 Pseudobulk Analysis with Decoupler and edgeR

Summary

This tutorial introduces pseudobulk analysis using Decoupler and edgeR in Galaxy 🌌. It covers data preparation, generating pseudobulk expression matrices, and performing differential expression analysis, including a Volcano Plot for final visualization of results 🌋📊.

🔗 Zenodo Link

https://zenodo.org/records/13929549

🎯 Objectives

  • 🧬 Understand pseudobulk analysis principles.
  • 🛠️ Generate pseudobulk expression matrices with Decoupler.
  • 📈 Perform differential expression analysis with edgeR.

✨ Key Points

  • 🔄 Pseudobulk bridges single-cell and bulk RNA-seq data.
  • 🧮 Decoupler enables pseudobulk matrix generation.
  • 🛡️ edgeR is robust for differential expression in pseudobulk data.
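
As a rough illustration of what a pseudobulk matrix is, the sketch below sums per-cell counts into one profile per (sample, cell type) group. It is plain Python written for this note and is not Decoupler's implementation; the function name `pseudobulk`, the gene names, and the toy counts are all made up for the example.

```python
from collections import defaultdict


def pseudobulk(counts, cell_to_group):
    """Sum per-cell gene counts into one profile per group.

    counts: {cell_id: {gene: count}}
    cell_to_group: {cell_id: group_key}, e.g. a (sample, cell_type) tuple
    """
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in counts.items():
        group = cell_to_group[cell]
        for gene, n in gene_counts.items():
            totals[group][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {group: dict(genes) for group, genes in totals.items()}


# Toy data (hypothetical): three cells from one sample, two cell types.
toy_counts = {
    "c1": {"CD3E": 2, "MS4A1": 0},
    "c2": {"CD3E": 3, "MS4A1": 1},
    "c3": {"CD3E": 0, "MS4A1": 4},
}
toy_groups = {
    "c1": ("s1", "T cell"),
    "c2": ("s1", "T cell"),
    "c3": ("s1", "B cell"),
}
toy_result = pseudobulk(toy_counts, toy_groups)
```

Each resulting profile can then be treated like one bulk RNA-seq sample in the differential expression step.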

📋 Pending Items

  • Finalize tutorial last steps and instructions.
  • Add final plots and figures 🖼️.
  • Revise explanations for parameters and key steps 🧐.
  • Review formatting and style ✍️.

💡 Feel free to review and share your feedback—your input is much appreciated! 🙌

@dianichj dianichj marked this pull request as ready for review January 17, 2025 13:05
@dianichj dianichj requested a review from a team as a code owner January 17, 2025 13:05
@bgruening
Member

The failing test is because of:

Liquid Exception: Liquid syntax error (line 325): Tag '{% Remove columns %}' was not properly terminated with regexp: /%}/ in topics/single-cell/tutorials/pseudobulk-analysis/tutorial.md
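
A quick way to hunt for tags like this before pushing might be a small scan such as the one below. This is only a heuristic sketch: it flags `{%` openers with no `%}` later on the same line, and tags whose first word is not in an allow-list. The allow-list `KNOWN_TAGS` is a hypothetical example, not the GTN's official set of tags, and Liquid tags that legitimately span several lines would be reported as false positives.

```python
import re

# Hypothetical allow-list of tag names; adjust to the tags your site actually defines.
KNOWN_TAGS = {"cite", "tool", "icon", "snippet", "include", "raw", "endraw"}

# Matches a tag opener and captures the first word after it.
TAG_OPEN = re.compile(r"\{%-?\s*([^\s%]*)")


def suspicious_liquid_tags(text, known_tags=KNOWN_TAGS):
    """Return (line_number, reason) pairs for Liquid tags that look malformed."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in TAG_OPEN.finditer(line):
            name = match.group(1)
            if "%}" not in line[match.end():]:
                findings.append((number, "missing closing %}"))
            elif name not in known_tags:
                findings.append((number, f"unknown tag name {name!r}"))
    return findings


sample = (
    "Intro ({% cite Liu2015 %})\n"
    "> 1. {% Remove columns %} tool\n"
    "> 2. {% tool [x](y)\n"
)
report = suspicious_liquid_tags(sample)
```

On the sample text, line 2 is reported because `Remove` is not a known tag name, and line 3 because its tag is never closed.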

Member

@shiltemann shiltemann left a comment

Thanks a lot @dianichj! This looks good, just a couple broken boxes and formatting tweaks (see below)

Just out of curiosity, are you all set up to get a local preview of your tutorial? This can also be done online using CodeSpaces now (see tutorial)


In this tutorial, we will guide you through a pseudobulk analysis workflow using the **Decoupler** and **edgeR** tools available in Galaxy ({% cite Badia-iMompel2022 %}) ({% cite Liu2015 %}). These tools facilitate functional and differential expression analysis, and their output can be integrated with other Galaxy tools to visualize results, such as creating Volcano Plots, which we will also cover in this tutorial.

> <agenda-title>Pseudobulk Analysis Pipeline Agenda</agenda-title>
Member

the agenda will be automatically generated based on your section titles, please replace this with

> <agenda-title></agenda-title>
>
> In this tutorial, we will cover:
>
> 1. TOC
> {:toc}
>
{: .agenda}

> 1. What are the output(s) of the edgeR tool?
> 2. How can we interpret our output result file?
>
> <solution-title>edgeR Outputs and Interpretation</solution-title>
Member

since these boxes are nested, make sure to add a > in front of this line and every line below until the end of the solution box
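
If retyping the prefix by hand is error-prone, a throwaway helper along these lines could add one nesting level to a copied block. It is a minimal sketch, and `nest_in_box` is a made-up name, not part of any GTN tooling.

```python
def nest_in_box(block: str) -> str:
    """Prefix every line of a Markdown box with one extra '>' level."""
    nested = []
    for line in block.splitlines():
        # Empty lines still need a bare '>' so the outer box is not interrupted.
        nested.append("> " + line if line.strip() else ">")
    return "\n".join(nested)


solution_box = (
    "> <solution-title>edgeR Outputs and Interpretation</solution-title>\n"
    ">\n"
    "> Answer text\n"
    "{: .solution}"
)
nested_box = nest_in_box(solution_box)
```

Every line of the solution box, including its closing `{: .solution}` line, ends up one level deeper, so it stays inside the enclosing question box.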

# Key Takeaways and Recommendations

## Key Takeaways
- **Pseudobulk Analysis Advantage:** Pseudobulk analysis bridges single-cell and bulk RNA-seq approaches, combining high resolution with statistical robustness.
Member

anything you add in key_points metadata will be added in a box at the end of the tutorial, so you could consider moving these there.

@dianichj
Contributor Author

Thanks a lot @dianichj! This looks good, just a couple broken boxes and formatting tweaks (see below)

Just out of curiosity, are you all set up to get a local preview of your tutorial? This can also be done online using CodeSpaces now (see tutorial)

Thanks a lot for all your comments, Saskia! I will check it again locally after going over all your revisions and will ping you with a comment. 😊

@dadrasarmin
Contributor

Thanks a lot @dianichj! This looks good, just a couple broken boxes and formatting tweaks (see below)

Just out of curiosity, are you all set up to get a local preview of your tutorial? This can also be done online using CodeSpaces now (see tutorial)

I did not know this, Saskia. Thanks for mentioning it! I was following this. I noticed the link you mentioned here is available here, but I think the link in the README is the one that a new user would check.

Thanks.

@shiltemann
Member

@dadrasarmin that is a very good point, I tend to forget about the README file, I will update that!

@dianichj
Contributor Author

Hi @MarisaJL, here is the tutorial. Thanks a lot for your help <3!

Member

@pavanvidem pavanvidem left a comment

@dianichj looks super nice with great explanations!!


Beyond enhancing statistical validity, pseudobulk analysis enables the identification of cell-type-specific gene expression and functional changes across biological conditions. It balances the detailed resolution of single-cell data with the statistical power of bulk RNA-seq, providing insights into the functional transcriptomic landscape relevant to biological questions. Overall, for differential expression analysis in multi-sample single-cell experiments, pseudobulk approaches demonstrate superior performance compared to single-cell-specific DE methods ({% cite Squair2021 %}).

In this tutorial, we will guide you through a pseudobulk analysis workflow using the **Decoupler** and **edgeR** tools available in Galaxy ({% cite Badia-iMompel2022 %}) ({% cite Liu2015 %}). These tools facilitate functional and differential expression analysis, and their output can be integrated with other Galaxy tools to visualize results, such as creating Volcano Plots, which we will also cover in this tutorial.
Member

Suggested change
In this tutorial, we will guide you through a pseudobulk analysis workflow using the **Decoupler** and **edgeR** tools available in Galaxy ({% cite Badia-iMompel2022 %}) ({% cite Liu2015 %}). These tools facilitate functional and differential expression analysis, and their output can be integrated with other Galaxy tools to visualize results, such as creating Volcano Plots, which we will also cover in this tutorial.
In this tutorial, we will guide you through a pseudobulk analysis workflow using the **Decoupler** ({% cite Badia-iMompel2022 %}) and **edgeR** ({% cite Liu2015 %}) tools available in Galaxy. These tools facilitate functional and differential expression analysis, and their output can be integrated with other Galaxy tools to visualize results, such as creating Volcano Plots, which we will also cover in this tutorial.

Member

Why not cite the original edgeR paper from 2009? https://doi.org/10.1093/bioinformatics/btp616

key_points:
- Pseudobulk analysis approach bridges the gap between single-cell and bulk RNA-seq data
- Decoupler tool generates a pseudobulk count matrix, enabling downstream differential expression and functional analyses
- edgeR is a robust tool for differential expression in pseudobulk datasets
Member

please add pbmc and combining sc datasets tutorials as requirements here


> <hands-on-title> Decoupler Pseudobulk </hands-on-title>
>
> 1. {% tool [Decoupler pseudo-bulk](toolshed.g2.bx.psu.edu/repos/ebi-gxa/decoupler_pseudobulk/decoupler_pseudobulk/1.4.0+galaxy5) %} tool with the following parameters:
Member

can you instead use the latest version (+galaxy8) of the tool?

Contributor Author

Will check if it is working =D


> <hands-on-title> Use Manipulate AnnData Tools to extract observations </hands-on-title>
>
> 1. Use the {% tool [Manipulate AnnData](toolshed.g2.bx.psu.edu/repos/iuc/anndata_manipulate/anndata_manipulate/0.10.9+galaxy0) %} tool with the following parameters:
Member

In the latest version, this function does not exist anymore. We moved this function to Scanpy Filter tool. Please adjust this hands-on step accordingly and also update the workflow.

> - *"Value"*: `T cell` (the name of the cluster of interest for subset analysis)
{: .hands_on}

After using the {% tool [Manipulate AnnData](toolshed.g2.bx.psu.edu/repos/iuc/anndata_manipulate/anndata_manipulate/0.10.9+galaxy0) %} tool to subset the cell type of interest, go back to the top of this tutorial to the hands-on **Pseudobulk with Decoupler** step, and you may perform once again the same steps in this smaller AnnData object that now should only include your T cells. Results from this analysis will correspond to differential expression between conditions only for T cells.
Member

Also, replace this with Scanpy Filter

{: .hands_on}

After using the {% tool [Manipulate AnnData](toolshed.g2.bx.psu.edu/repos/iuc/anndata_manipulate/anndata_manipulate/0.10.9+galaxy0) %} tool to subset the cell type of interest, go back to the top of this tutorial to the hands-on **Pseudobulk with Decoupler** step, and you may perform once again the same steps in this smaller AnnData object that now should only include your T cells. Results from this analysis will correspond to differential expression between conditions only for T cells.

Member

please include the results (plots and some questions with number of genes etc.) of the T cell subsampled data so that the users can compare results.
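
For the gene-count questions, something like the sketch below could produce the numbers to compare between the full dataset and the T cell subset. It assumes the edgeR result table has been read into rows with `logFC` and `FDR` fields. The cutoffs (FDR below 0.05, absolute logFC of at least 1) are placeholders chosen for illustration, not values taken from the tutorial, and `count_de_genes` is a made-up name.

```python
def count_de_genes(rows, fdr_cutoff=0.05, lfc_cutoff=1.0):
    """Count up-regulated, down-regulated and non-significant genes.

    rows: iterable of dicts with numeric 'logFC' and 'FDR' values.
    The default cutoffs are illustrative assumptions only.
    """
    rows = list(rows)
    up = sum(1 for r in rows if r["FDR"] < fdr_cutoff and r["logFC"] >= lfc_cutoff)
    down = sum(1 for r in rows if r["FDR"] < fdr_cutoff and r["logFC"] <= -lfc_cutoff)
    return {"up": up, "down": down, "not_significant": len(rows) - up - down}


# Toy rows (hypothetical values, not results from the tutorial data).
toy_rows = [
    {"logFC": 2.1, "FDR": 0.001},   # significant, up
    {"logFC": -1.5, "FDR": 0.01},   # significant, down
    {"logFC": 3.0, "FDR": 0.2},     # large change but not significant
    {"logFC": 0.3, "FDR": 0.001},   # significant but small change
]
summary = count_de_genes(toy_rows)
```

Running the same function on both result tables would give directly comparable counts for the suggested questions.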

layout: tutorial_hands_on

title: Pseudobulk Analysis with Decoupler and EdgeR
zenodo_link: https://zenodo.org/records/13929549
Member

please include answer_histories:
