-
Notifications
You must be signed in to change notification settings - Fork 899
update propd new genewise #8154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems to me that you changed quite some things in the R script. I would bump the version of the module then in the main.nf file aswell :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird that it is not running the tests for the subworkflows that contain propd...
could you also pass thoses tests please?
|
||
# get unique genes | ||
D=ncol(res@counts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you use more descriptive variable names? otherwise it would be difficult for any contributor or any person that wants to check the code to follow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also please add more detailed comments: what each step is doing, etc
# get unique genes | ||
D=ncol(res@counts) | ||
M=matrix(0,D,4) | ||
colnames(M)=c("rcDdis","Nsig","lrmD","LFC") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while it was great for ionas to use these names as column, i think it would be nice to use more descriptive column names for the pipeline
rownames(M)=colnames(res@counts) | ||
print("preparing matrices") | ||
pset=which(res@results[,"FDR"]<cut) #only significant pairs | ||
re1=prepmat(pset,D, res) #transform pairwise table to DxD matrix #TODO: Ask Ionas if its ok to pass res as an argument |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and yes, always give the arguments through the function calls instead of relying on global variables. So yes you should pass res
genes <- unique(c(results\$Pair, results\$Partner)) | ||
n_genes <- length(genes) | ||
aset=c(1:nrow(res@results)) #for LFC we need all pairs | ||
re2=prepmat(aset,D, res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you would need to explain with detail why to use the entire set here and not above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also what is the difference between re1 and re2 numerically? I would imagine that simply the pairs of non significant genes would have emtpy value?
|
||
pset=subs | ||
|
||
mat1=matrix(0,D,D) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably you don't need to create pa
manually
getResults
and getSignificantResults
functions from propr
should already return the results data frame with the columns named with gene names instead of index
# In other words, the set of genes that are significantly proportional to the gene, | ||
# hence connected to the gene in the network. | ||
set=which(re1[all,g]!=0) | ||
M[g,"lrmD"]=median(c(re1[intersect(set1,set),g],-re1[intersect(set2,set),g]))/log(2) #divide by log2 to get base 2 log |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these values should be equal to the old implementation mat['lfc']
, could you make sure?
@@ -411,7 +452,7 @@ if (opt\$permutation == 0) { | |||
|
|||
# parse genewise information from pairwise results | |||
|
|||
results_genewise <- get_genewise_information(results_pairwise) | |||
results_genewise <- get_genewise_information(pd, cut=0.01) # TODO: Ask Ionas if the value should be provided by the user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the cut is given already in the module parameters -> opt$fdr
par(mfrow = c(1, 2)) | ||
# Define significant genes | ||
meanset <- which(results[,"Nsig"] > mean(results[,"Nsig"])) # Genes above avg significance | ||
downset <- which(results[,"LFC"] < 0) # Downregulated genes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so lfc is calculated by considering all the lrm1 - lrm2 for a given gene.
Therefore it is > 0 if the gene is higher in the condition 1 than 2.
To make it coherent with the up/down regulated definition, you need to make sure you have the target as cond1 and reference as cond2.
This you could do it when parsing the input for the propd function
PR checklist
Closes #XXX
versions.yml
file.label
nf-core modules test <MODULE> --profile docker
nf-core modules test <MODULE> --profile singularity
nf-core modules test <MODULE> --profile conda
nf-core subworkflows test <SUBWORKFLOW> --profile docker
nf-core subworkflows test <SUBWORKFLOW> --profile singularity
nf-core subworkflows test <SUBWORKFLOW> --profile conda