StrainChoosr examines a phylogenetic tree and reports the N strains that together represent the most diversity within it. Given today's deluge of sequencing data, choosing strains for more detailed analysis can be difficult. StrainChoosr ensures that your scaled-down set of sequences captures the maximum possible diversity for the number of strains you ask for. The algorithm behind it is described in Pardi2005 and Steel2005, so please be sure to cite them if you use StrainChoosr.
StrainChoosr lives on PyPI, so you can install it via pip/pip3:
pip install strainchoosr
Alternatively, you can install the latest version from git. Note that there may be breaking changes pushed to the repository, so this isn't necessarily advised.
pip install git+https://github.com/lowandrew/StrainChoosr.git
To use StrainChoosr, all you need is a Newick-formatted tree file.
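If you don't have a tree on hand and just want to try things out, a small Newick file can be written by hand. The sketch below is only an illustration: the strain names, branch lengths, and the `example_tree.nwk` file name are made up for this example and are not part of StrainChoosr.

```python
import os
import tempfile

# A made-up four-strain tree in Newick format: nested parentheses describe
# the groupings, the number after each colon is a branch length, and the
# tree ends with a semicolon.
newick = "((strain_A:0.1,strain_B:0.2):0.3,(strain_C:0.4,strain_D:0.5):0.6);"

# Write it to a temporary location (hypothetical file name).
tree_path = os.path.join(tempfile.gettempdir(), "example_tree.nwk")
with open(tree_path, "w") as handle:
    handle.write(newick + "\n")

print(tree_path)
```

The printed path can then be used wherever a tree file is expected.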
To pick the 5 most diverse strains from a tree, type the following on the command line:
strainchoosr --treefile /path/to/tree.nwk --number 5
This will print the names of the 5 most diverse strains in your tree to the terminal, as well as create a file called strainchoosr_output.html in your current working directory that lets you visualize the output in any web browser.
To do the same within Python:
from strainchoosr import strainchoosr
strainchoosr.run_strainchoosr(treefile='/path/to/tree.nwk', number_representatives=[5])
In addition to printing the strains to the terminal, run_strainchoosr will return a dictionary where keys are the number of representatives and values are lists of the strains selected for that number of representatives.
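As a rough illustration of working with a dictionary of that shape, the sketch below turns it into tab-separated lines. The `results` value here is hand-written, not real output, and it assumes the strains are plain name strings; `format_results` is a hypothetical helper, not part of the StrainChoosr API.

```python
# Hand-written stand-in for a returned dictionary: keys are the number of
# representatives, values are lists of selected strains (assumed to be names).
results = {
    2: ["strain_A", "strain_D"],
    3: ["strain_A", "strain_D", "strain_F"],
}


def format_results(results):
    """Return one tab-separated line per requested number of representatives."""
    lines = []
    for number, strains in sorted(results.items()):
        lines.append(f"{number}\t{','.join(strains)}")
    return "\n".join(lines)


report = format_results(results)
print(report)
```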
Alternatively, if all you want is the list of strains, without generating HTML reports:
from strainchoosr import strainchoosr
import ete3
tree = ete3.Tree('path/to/treefile.nwk')
diverse_strains = strainchoosr.pd_greedy(tree=tree, number_tips=5, starting_strains=[])
This returns a list of ete3.TreeNode objects representing the 5 most diverse strains. You can then call strainchoosr.get_leaf_names_from_nodes(diverse_strains) to get a list of their names.
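For intuition about what a greedy diversity search does, here is a minimal, dependency-free sketch of the general idea: start from the two tips that are furthest apart, then repeatedly add whichever tip contributes the most extra branch length. This is not StrainChoosr's implementation. The toy tree, the edge-list representation, and the helper names (`build_adjacency`, `phylogenetic_diversity`, `greedy_pd`) are all assumptions made for this example.

```python
from itertools import combinations

# Toy unrooted tree as (node, node, branch_length) edges; n1..n3 are internal.
EDGES = [
    ("A", "n1", 1.0),
    ("B", "n1", 1.0),
    ("n1", "n2", 0.5),
    ("C", "n2", 3.0),
    ("n2", "n3", 0.5),
    ("D", "n3", 2.0),
    ("E", "n3", 0.2),
]


def build_adjacency(edges):
    """Map each node to a dict of {neighbour: branch_length}."""
    adjacency = {}
    for u, v, length in edges:
        adjacency.setdefault(u, {})[v] = length
        adjacency.setdefault(v, {})[u] = length
    return adjacency


def phylogenetic_diversity(adjacency, leaves):
    """Total branch length of the smallest subtree connecting `leaves`."""
    leaves = list(leaves)
    if len(leaves) < 2:
        return 0.0
    # Root the tree at the first leaf and record each node's parent.
    root = leaves[0]
    parent = {root: None}
    stack = [root]
    while stack:
        node = stack.pop()
        for neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                stack.append(neighbour)
    # The subtree is the union of the paths from every other leaf to the root.
    used_edges = set()
    for leaf in leaves[1:]:
        node = leaf
        while parent[node] is not None:
            used_edges.add((node, parent[node]))
            node = parent[node]
    return sum(adjacency[child][par] for child, par in used_edges)


def greedy_pd(edges, number_tips):
    """Greedily pick tips; this sketch assumes number_tips >= 2."""
    adjacency = build_adjacency(edges)
    all_leaves = sorted(n for n, nbrs in adjacency.items() if len(nbrs) == 1)
    # Seed with the pair of tips separated by the longest path.
    selected = list(max(combinations(all_leaves, 2),
                        key=lambda pair: phylogenetic_diversity(adjacency, pair)))
    while len(selected) < min(number_tips, len(all_leaves)):
        remaining = [leaf for leaf in all_leaves if leaf not in selected]
        # Add the tip that increases the spanned branch length the most.
        best = max(remaining,
                   key=lambda leaf: phylogenetic_diversity(adjacency, selected + [leaf]))
        selected.append(best)
    return selected


print(greedy_pd(EDGES, 3))
```

In this toy tree, C and D sit on the longest branches and are chosen first; A and B tie for the third slot, and the sketch breaks the tie alphabetically. E adds almost nothing because it sits right next to D.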
Complete documentation on the strainchoosr API can be found at https://strainchoosr.readthedocs.io/api.html.
If you have any problems or want a feature implemented, please feel free to open an issue. Similarly, if you want to add a feature or otherwise improve things, feel free to open a pull request.