Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompositionVis & modelVis output #135

Closed
SoleilSu opened this issue Aug 6, 2023 · 7 comments
Closed

CompositionVis & modelVis output #135

SoleilSu opened this issue Aug 6, 2023 · 7 comments
Assignees
Labels
question Further information is requested

Comments

@SoleilSu
Copy link

SoleilSu commented Aug 6, 2023

Hi Francisco,

I have tried to use metaGEM on my metagenomic samples by following the tutorial, and I am able to run the jobs without error. However, I have only 5 reassembled bins and 0 GEMs as shown in the GEMs.stats. I am not sure if it's because my sequence depth is not good enough, so I ran the toy datasets, but I got similar results. Especially, in both cases my compostionVis results look different from what is being presented in the tutorial. The graph does not show the distribution of different samples. I have attached the graph that I obtained from toy datasets below.
image

@SoleilSu SoleilSu added the question Further information is requested label Aug 6, 2023
@franciscozorrilla
Copy link
Owner

Hi @SoleilSu,

Regarding GEMs, were you able to install CarveMe and the academic version of the CPLEX solver? Please check your log files for the CarveMe jobs to see what the errror message there was.

Regarding the plot, I suspect that perhaps the abundance estimation output folders are empy, did you run the abundance calculation jobs?

Best,
Francisco

@SoleilSu
Copy link
Author

SoleilSu commented Aug 8, 2023

Thank you for the prompt response! I think Carveme works fine and I am able to get the XML files, and when I ran FBA on the XML files, it looks like there are reactions. Here is my modelVis pdf:
image

I ran the abundance job and I got a nonempty abundance.stats files:
image

@SoleilSu
Copy link
Author

SoleilSu commented Aug 8, 2023

I just fixed GEMs issues. I am using Carveme 1.5.2 so I need to change something in the snakefile when I ran the modelVis #52

But I still have a problem with CompositionVis.

@franciscozorrilla
Copy link
Owner

Great! You have your GEMs, taxonomy assignments, and abundance mappings 👍
Seems like the problem then is coming from the plotting script itself compositionVis.R

library(tidyverse)
library(tidytext)
taxonomy=read.delim("GTDBTk.stats",header=TRUE) %>%
select(user_genome,classification) %>%
separate(.,classification,into = c("kingdom","phylum","class","order","family","genus","species"),sep = ";")
abundance=read.delim("abundance.stats",header=FALSE)
colnames(abundance)=c("user_genome","absolute_ab","rel_ab")
taxab = left_join(taxonomy,abundance,by="user_genome")
taxab$sample = gsub("\\..*$","",taxab$user_genome)
taxab$species = gsub("s__$","Undefined sp.",taxab$species)
taxab$species = gsub("s__","",taxab$species)
ggplot(taxab%>% filter(species!="Undefined sp.")) +
geom_bar(aes(x=reorder_within(species,-rel_ab,sample),y=rel_ab*100),stat="identity") +
scale_x_reordered() +
facet_wrap(~sample,scales = "free") +
ylab("Relative abundance (%)") +
xlab("Species") +
coord_flip()
ggsave("compositionVis.pdf",width = 12,height=8)

Could you please share with me your GTDBTk.stats and abundance.stats so that I can try to reproduce your error?
I suspect maybe the two files are not being joined properly for some reason and therefore you get a blank plot.

Best,
Francisco

@SoleilSu
Copy link
Author

SoleilSu commented Aug 9, 2023

Yea, I just noticed that the files were not joined as expected because the sampleID was not appended to the user_genome in the GTDBTk.stats like this issue here: #110
But it seems that my bug occurs because the same variable is not accessible in the context of my Snakefile. So I made a change in the following code in CompositionRule in the Snakefile.

        for folder in */;do 
            samp=$(echo $folder|sed 's|^.*/||');
            cat $folder/classify/*summary.tsv| sed 's/orig/o/g' | sed 's/permissive/p/g' | sed 's/strict/s/g' | sed "s/^/$samp./g";
        done > GTDBTk.stats

I changed the $samp to $(basename $folder) like the following:

        for folder in */;do 
            cat $folder/classify/*summary.tsv| sed 's/orig/o/g' | sed 's/permissive/p/g' | sed 's/strict/s/g' | sed "s/^/$(basename $folder)./g";
        done > GTDBTk.stats

In this way, my GTDBTk.stats has the sampleID, and my plot looks like this now:
image

However, I am expecting that the different sampleID have different y-axis of species similar to the one in the tutorial. So I am not sure if I should edit the compositionVis.R

Thank you!

@SoleilSu SoleilSu changed the title CompositionVis output CompositionVis output & modelVis Aug 9, 2023
@SoleilSu SoleilSu changed the title CompositionVis output & modelVis CompositionVis & modelVis output Aug 9, 2023
@franciscozorrilla
Copy link
Owner

In line 19 of the script shown in my last comment, play around with the scales value from free to free_x or free_y, or simply remove this parameter from the facet_wrap() arguments

@SoleilSu
Copy link
Author

Ok thank you so much! It seems that all issues have been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
question Further information is requested
Projects
None yet
Development

No branches or pull requests

2 participants