Unexpectedly High Proportion Genes Showing Relaxed Selection in HyPhy RELAX Analysis #59
Comments
Dear @jdaron, Generally, when you test a single branch in RELAX you are going to get incredibly noisy estimates of
Could you share one or two genes with me so that I could run
Best,
Dear Sergei,

Thanks for your quick answer. I am currently working with HyPhy version 2.5.2, but I will try updating to a more recent version. I followed your advice and ran RELAX using Drosophila melanogaster as the foreground branch, and I got 533 and 42 genes under relaxed or constrained selection with a significant p-value (< 0.05), which is consistent with my previous result but quite unlikely from a biological perspective.

(dmel{Foreground},((albi,(aatro,(angam,ansteph))),(urlow,(cuquin,((tosep,(wysmi,sabcya)),(aekor,(arsuba,(aaegL5,aalb))))))));

However, in this analysis I am more interested in the signal in the rest of the tree, especially some of the basal nodes, such as Node10 or Node3. If you don't recommend using the General Descriptive model to infer K at each node, would you suggest testing each node individually and counting the number of genes that are significantly under relaxed or constrained selection?

Here is a folder of 20 genes (10 with significant K<1 or K>1) that I am using, if you want to have a look at them.

Best,
Dear @jdaron,
Could you, perhaps, provide some more background on the specific hypothesis you are interested in testing? This will help me tailor my response better.

A couple of things to note (I am using

Results could be sensitive to run options. Here are some examples (* = p-value < 0.01)

Vanilla default
Do not run all models (but just the minimal models)
Total 17 of which 2 are relaxed and 15 are intensified
Add synonymous rate variation, simplify distribution for small datasets, increase the number of starting points
Synonymous rate variation is a significant confounder in rate selection tests. Having 3 dN/dS rates for small alignments is overkill.
Total 10 of which 0 are relaxed and 10 are intensified
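Tallies like the ones above can be reproduced across many genes with a short script. The following is a minimal sketch, not part of the original reply: it assumes each RELAX run has been parsed into a dictionary with a `"test results"` entry holding `"p-value"` and `"relaxation or intensification parameter"` (K). Those key names are an assumption about the JSON layout and should be checked against your own output files; the toy values are made up.

```python
from typing import Iterable, Mapping

# Assumed JSON layout; verify these key names against your own RELAX output.
TEST_KEY = "test results"
P_KEY = "p-value"
K_KEY = "relaxation or intensification parameter"


def tally_relax(results: Iterable[Mapping], alpha: float = 0.01) -> dict:
    """Count genes with a significant RELAX test, split by the direction of K."""
    counts = {"total": 0, "relaxed": 0, "intensified": 0}
    for result in results:
        test = result[TEST_KEY]
        if test[P_KEY] >= alpha:
            continue  # not significant at the chosen threshold
        counts["total"] += 1
        if test[K_KEY] < 1:
            counts["relaxed"] += 1      # K < 1: relaxation
        else:
            counts["intensified"] += 1  # K >= 1: intensification
    return counts


# Toy input standing in for parsed JSON files (values are made up).
toy = [
    {TEST_KEY: {P_KEY: 0.001, K_KEY: 0.4}},
    {TEST_KEY: {P_KEY: 0.200, K_KEY: 0.1}},
    {TEST_KEY: {P_KEY: 0.005, K_KEY: 3.2}},
]
print(tally_relax(toy))
```

In practice each dictionary would come from `json.load` on one result file per gene, and running the same tally on each set of run options makes the comparison between settings explicit.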
Sanity Checks
This is reported in
Heterogeneity across branches.
One other sanity check to look for is this: are there "outlier" branches? For example, if you have a single "background" branch with a very high ω (e.g. a short branch with only non-synonymous substitutions), this could pull the entire reference distribution away from the test branch. One way to deal with this is to narrow your reference set, if that's appropriate. For example, here I am using
Best,
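The outlier-branch check described in this comment could be sketched as follows. This is an illustration only: it assumes you already have one ω estimate per branch from some branch-level fit, and it flags background branches whose ω exceeds a multiple of the background median. The function name, the factor of 10, and the numbers are all hypothetical choices, not something taken from HyPhy.

```python
from statistics import median


def flag_outlier_branches(omega_by_branch, test_branches, factor=10.0):
    """Return background branches whose omega is far above the background median."""
    # Keep only the branches that would serve as the reference set.
    background = {
        name: omega
        for name, omega in omega_by_branch.items()
        if name not in test_branches
    }
    cutoff = factor * median(background.values())  # arbitrary cutoff (assumption)
    return sorted(name for name, omega in background.items() if omega > cutoff)


# Made-up per-branch omega values, for illustration only.
omegas = {"dmel": 0.05, "albi": 0.08, "urlow": 0.10, "Node3": 0.07, "cuquin": 25.0}
print(flag_outlier_branches(omegas, test_branches={"dmel"}))
```

Any branch flagged this way is a candidate to leave out of the reference set, if doing so is appropriate for the hypothesis being tested.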
Dear HyPhy developers,
I have recently been utilizing the HyPhy RELAX tool to analyze a set of ~2000 orthologous genes identified across various species of Drosophila and mosquitoes. In my analyses, I estimate the selection intensity parameter K at every node of the phylogenetic tree using the General Descriptive model.
Below is an example of a command line I used for launching the analysis:
hyphy RELAX --alignment $orthoGene.codonAlign.aln --tree m1.tree --test Foreground
Here is the input tree provided to RELAX:
(dmel,((albi,(aatro,(ansteph,angam)Node7)Node5)Node3,(urlow,(cuquin,((tosep,(sabcya,wysmi)Node17)Node15,(aekor,(arsuba,(aalb,aaegL5{Foreground})Node24)Node22)Node20)Node14)Node12)Node10));
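To test a different branch, the `{Foreground}` label has to be moved to that tip or node in the tree string. Below is a minimal sketch of one way to do that; the helper `label_foreground` is hypothetical and uses a simple regular expression rather than a Newick parser. It assumes node names are unique and that the tree carries no other `{...}` annotations you want to keep. The label is attached to the named branch only, not to its descendants.

```python
import re


def label_foreground(newick: str, node: str, tag: str = "Foreground") -> str:
    """Remove existing {...} labels and tag exactly one tip or internal node."""
    stripped = re.sub(r"\{[^}]*\}", "", newick)
    # Match the name only as a whole token: preceded by '(' ',' or ')'
    # and followed by ',' ')' ':' or ';'.
    pattern = re.compile(r"(?<=[(,)])" + re.escape(node) + r"(?=[,):;])")
    labelled, n = pattern.subn(lambda m: m.group(0) + "{" + tag + "}", stripped)
    if n != 1:
        raise ValueError(f"expected exactly one match for {node!r}, found {n}")
    return labelled


tree = (
    "(dmel,((albi,(aatro,(ansteph,angam)Node7)Node5)Node3,"
    "(urlow,(cuquin,((tosep,(sabcya,wysmi)Node17)Node15,"
    "(aekor,(arsuba,(aalb,aaegL5{Foreground})Node24)Node22)Node20)Node14)Node12)Node10));"
)
print(label_foreground(tree, "Node10"))
```

Writing one relabelled tree per node of interest gives one RELAX run per gene and node, which is the "test each node individually" design raised in the discussion.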
In the bar graph below, I plotted the distribution of the exponent K for all tips and nodes. Surprisingly, most Drosophila melanogaster (dmel) genes have a K value less than 1 (K<1). In the plot, the dotted colored line represents the median value across the dataset, which I plotted alongside the tree for clearer visualization.
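For readers less familiar with K: in the RELAX model, each ω class fitted to the reference branches is raised to the power K on the test branches, so K < 1 moves rates toward 1 (relaxation) and K > 1 moves them away from 1 (intensification). K therefore compares test branches with reference branches rather than measuring dN/dS directly. A small sketch with made-up ω values:

```python
def apply_k(reference_omegas, k):
    """Map reference-branch omega classes to test-branch classes: omega ** K."""
    return [omega ** k for omega in reference_omegas]


# Made-up omega classes: strong purifying, weak purifying, positive selection.
reference = [0.01, 0.5, 4.0]
for k in (0.5, 1.0, 2.0):
    print(k, [round(omega, 4) for omega in apply_k(reference, k)])
```

With K = 1 the test distribution equals the reference distribution, which is the null hypothesis of the RELAX test.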
I replicated this analysis with CODEML, using the same gene set and alignments, and obtained a dN/dS value of 0.002. This confirms that something is wrong in the way I am conducting the analysis with HyPhy RELAX, and that Drosophila melanogaster cannot have so many genes under relaxed selection.
I would greatly appreciate any suggestions or insights you could provide to help understand what might be going wrong in my analysis with HyPhy RELAX.
Best regards,
Josquin
relax.Kdistribution.pdf
relax.tree.pdf