16s.md
Lines changed: 284 additions & 1 deletion
@@ -324,6 +324,18 @@ If you have time, copy all the commands from this tutorial in a file, a try to m
### PhyloSeq Analysis

First, install and load the phyloseq package:

```R
# biocLite() is the legacy Bioconductor installer; newer releases use BiocManager::install('phyloseq')
source('http://bioconductor.org/biocLite.R')
biocLite('phyloseq')

library("phyloseq")
library("ggplot2")
library("plyr")
theme_set(theme_bw()) # set the ggplot theme
```

The PhyloSeq package has an `import_mothur` function that you can use to import the files you generated with mothur. For instance, import the example mothur data that ships with phyloseq:
```R
@@ -348,5 +360,276 @@ For the rest of this tutorial, we will work with an example dataset provided by
```R
data(enterotype)
data("GlobalPatterns")
```
#### Ordination and distance-based analysis

Let's do some preliminary filtering. Remove the OTUs that collect all of the unassigned sequences (labeled "-1").
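A minimal sketch of this filtering step on the `enterotype` dataset. It assumes that the unassigned sequences are labeled "-1" in the `Genus` rank, which is how the dataset appears to be shipped with phyloseq; check `rank_names(enterotype)` if your data differ.

```R
library("phyloseq")
data(enterotype)

# Assumption: unassigned OTUs carry the label "-1" in the Genus rank
enterotype <- subset_taxa(enterotype, Genus != "-1")
enterotype
```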
Note that in this case, the Fisher calculation results in a warning (but the plot is still drawn). We can avoid this by passing a `measures` argument to `plot_richness`, which restricts the plot to just the alpha-diversity measures that we want.

```R
plot_richness(GP, measures=c("Chao1", "Shannon"))
```

We can specify a sample variable on which to group/organize samples along the horizontal (x) axis. An experimentally meaningful categorical variable is usually a good choice; in this case, the "SampleType" variable works much better than attempting to interpret the sample names directly (as in the previous plot):
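A sketch of what that call could look like. The definition of `GP` is not shown in this section, so the example assumes it is `GlobalPatterns` with the all-zero taxa pruned away.

```R
library("phyloseq")
data("GlobalPatterns")

# Assumption: GP is GlobalPatterns without taxa whose total count is zero
GP <- prune_taxa(taxa_sums(GlobalPatterns) > 0, GlobalPatterns)

# Group the samples on the x axis by their SampleType
p <- plot_richness(GP, x = "SampleType", measures = c("Chao1", "Shannon"))
print(p)
```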
Now suppose we want to use an external variable in the plot that isn't already in the GP dataset, for example a logical that indicates whether or not the samples are human-associated. First, define this new variable, `human`, as a factor (other vectors, or any other data you have describing the samples, could also work).

Now tell `plot_richness` to map the new `human` variable to the horizontal axis, and shade the points in different color groups according to the "SampleType" they belong to.
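The two steps could be sketched as follows. The list of sample types counted as human-associated is an assumption made for illustration, and `GP` is again assumed to be `GlobalPatterns` with the all-zero taxa removed.

```R
library("phyloseq")
data("GlobalPatterns")
GP <- prune_taxa(taxa_sums(GlobalPatterns) > 0, GlobalPatterns)

# Assumption: these four sample types are treated as human-associated
human_types <- c("Feces", "Mock", "Skin", "Tongue")
human <- get_variable(GP, "SampleType") %in% human_types

# Store the new variable in the sample data so the plot functions can find it
sample_data(GP)$human <- factor(human)

p <- plot_richness(GP, x = "human", color = "SampleType",
                   measures = c("Chao1", "Shannon"))
print(p)
```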
Dots are annotated next to tips (OTUs) in the tree, one for each sample in which that OTU was observed. Let's color the dots by taxonomic rank and by sample covariates:
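A possible version of these plots, as a sketch. The tree is restricted to the first 50 OTUs of `GlobalPatterns` purely to keep the figure readable; that subset (and the name `physeq`) is an assumption, not something defined in this section.

```R
library("phyloseq")
data("GlobalPatterns")

# Assumption: keep only the first 50 OTUs so the tree stays legible
physeq <- prune_taxa(taxa_names(GlobalPatterns)[1:50], GlobalPatterns)

# Color the per-sample dots by a sample covariate ...
p_sample <- plot_tree(physeq, color = "SampleType")
# ... or by a taxonomic rank
p_class <- plot_tree(physeq, color = "Class")

print(p_sample)
print(p_class)
```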
Making a radial tree is easy with ggplot2: our vertically oriented tree is a Cartesian mapping of the data to a graphic, and a radial tree is the same mapping with polar coordinates instead.
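As a sketch, switching to polar coordinates only takes one extra ggplot2 layer. The 50-OTU subset of `GlobalPatterns` used here is an assumption chosen to keep the example small.

```R
library("phyloseq")
library("ggplot2")
data("GlobalPatterns")

# Assumption: a small subset of OTUs, for a readable tree
physeq <- prune_taxa(taxa_names(GlobalPatterns)[1:50], GlobalPatterns)

# Same tree as before, drawn with polar instead of Cartesian coordinates
p <- plot_tree(physeq, color = "SampleType", ladderize = "left") +
  coord_polar(theta = "y")
print(p)
```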
The dataset is plotted with every sample mapped individually to the horizontal (x) axis, and abundance values mapped to the vertical (y) axis. At each sample’s horizontal position, the abundance values for each OTU are stacked in order from greatest to least, separated by a thin horizontal line. As long as the parameters you choose to separate the data result in more than one OTU abundance value at the respective position in the plot, the values will be stacked in order, which displays the total while still representing the individual OTU abundances.

The bar plot will be clearer with color to represent the Genus to which each OTU belongs.

```R
plot_bar(gp.ch, fill="Genus")
```

Now keep the same fill color, and group the samples together by the SampleType variable, which is essentially the environment from which the sample was taken and sequenced.
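A sketch of the grouped bar plot. The definition of `gp.ch` is not shown in this section; the example assumes it is the Chlamydiae subset of `GlobalPatterns`.

```R
library("phyloseq")
data("GlobalPatterns")

# Assumption: gp.ch holds only the taxa from the phylum Chlamydiae
gp.ch <- subset_taxa(GlobalPatterns, Phylum == "Chlamydiae")

# Same fill color as before, with samples grouped by SampleType on the x axis
p <- plot_bar(gp.ch, x = "SampleType", fill = "Genus")
print(p)
```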
The following two lines subset the dataset to just the top 300 most abundant Bacteria taxa across all samples (in this case with no prior preprocessing, which is not recommended but quick).
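One way to write this subsetting, as a sketch; the object name `gpt` is an assumption chosen for illustration.

```R
library("phyloseq")
data("GlobalPatterns")

# Keep the Bacteria, then keep the 300 taxa with the largest total counts
gpt <- subset_taxa(GlobalPatterns, Kingdom == "Bacteria")
gpt <- prune_taxa(names(sort(taxa_sums(gpt), decreasing = TRUE)[1:300]), gpt)
```

Sorting `taxa_sums()` in decreasing order and taking the first 300 names ranks taxa by their total count over all samples, which is the sense of "most abundant" used here.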
There is a random aspect to some of the network layout methods. For complete reproducibility of the images produced later in this tutorial, it is possible to set the random number generator seed explicitly:

`set.seed(711L)`

Because we want to use the enterotype designations as a plot feature in these plots, we need to remove the 9 samples for which no enterotype designation was assigned. This saves us the hassle of some pesky warning messages; everything would still work without this step, since the offending samples are omitted anyway.
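A sketch of that removal, assuming the designation is stored in a sample variable named `Enterotype` with `NA` for the unassigned samples:

```R
library("phyloseq")
data(enterotype)

# Keep only the samples that have an enterotype designation
enterotype <- subset_samples(enterotype, !is.na(Enterotype))
enterotype
```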
Create an igraph-based network based on the default distance method, “Jaccard”, and a maximum distance between connected nodes of 0.3.

```R
ig <- make_network(enterotype, max.dist=0.3)
plot_network(ig, enterotype)
```

The previous graphic displayed some interesting structure, with one or two major subgraphs comprising a majority of samples. Furthermore, there seemed to be a correlation between the sample naming scheme and position within the network. Instead of trying to read all of the sample names to understand the pattern, let’s map some of the sample variables onto this graphic as color and shape:
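A sketch of such a plot. The choice of `SeqTech` for color and `Enterotype` for shape is an assumption about which sample variables are worth mapping; the line weight is likewise illustrative.

```R
library("phyloseq")
data(enterotype)
enterotype <- subset_samples(enterotype, !is.na(Enterotype))

set.seed(711L)  # fix the random layout
ig <- make_network(enterotype, max.dist = 0.3)

# Drop the sample-name labels and encode two sample variables instead
p <- plot_network(ig, enterotype, color = "SeqTech", shape = "Enterotype",
                  line_weight = 0.4, label = NULL)
print(p)
```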
In the previous examples, the choices of maximum distance and distance method were informed, but arbitrary. Let’s see what happens when the maximum distance is lowered, decreasing the number of edges in the network.
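As a sketch, we can build the network at both thresholds and compare the edge counts before plotting. The lower cutoff of 0.2 is an assumption picked for illustration.

```R
library("phyloseq")
data(enterotype)
enterotype <- subset_samples(enterotype, !is.na(Enterotype))

ig_03 <- make_network(enterotype, max.dist = 0.3)
ig_02 <- make_network(enterotype, max.dist = 0.2)  # assumed lower cutoff

# A lower cutoff can only remove edges, never add them
igraph::ecount(ig_03)
igraph::ecount(ig_02)

set.seed(711L)
p <- plot_network(ig_02, enterotype, color = "SeqTech", shape = "Enterotype",
                  line_weight = 0.4, label = NULL)
print(p)
```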