The goal of qtlbim is to perform multivariate bayesian QTL mapping
after the methods developed in several journal articles by Yi, et
al. Currently, it works only with two-parent crosses, but we’d like to
develop methods for multiparent crosses like the Diversity Outbred mice.
This particular repository was forked from the CRAN Github repository and subsequently updated to work with new versions of gcc.
qtlbim isn’t on CRAN right now.
You can install the development version from GitHub with:
# install.packages("devtools")
devtools::install_github("fboehm/qtlbim")Detailed use instructions are available in the vignettes. We reproduce below part of the text from one vignette.
This is a basic example:
library(qtlbim)
#> Loading required package: qtl
#> Loading required package: lattice
#> Loading required package: coda
#> Loading required package: tools
#> Loading required package: MASS
## basic example codeThis document focuses on the hyper dataset from the qtl (Broman et
al. 2003) R package, which was initially studied in Sugiyama et
al. (2001). The hyper dataset is stored in qtl as a cross object.
The R/qtlbim package processes this cross object to create a qb
object called qbHyper, containing the MCMC samples. The hyper demo
shows how this is done.
It is possible to directly load the already saved qb object with the
data command. Following this by a call to qb.cross extracts a
version of the cross object used to create the qb object.
data(qbHyper)
hyper <- qb.cross(qbHyper)Alternatively, a object can be created by the following sequence of
commands. First load the hyper data set from R/qtl, and subset on the
autosomes, as R/qtlbim does not yet handle the X chromosome properly.
data(hyper)
hyper <- subset(hyper, chr=1:19)To run the MCMC sampler on the hyper data we use the command
hyper <- qb.genoprob(hyper, step=2)
# Now run the MCMC model selection algorithm.
# This can take several minutes.
qbHyper <- qb.mcmc(hyper, pheno.col = 1, seed = 1616)
#> qb.mcmc is running 60,600 iterations. The current iterations are saved:
#> 200
#> 400
#> 600
#> 800
#> 1000
#> 1200
#> 1400
#> 1600
#> 1800
#> 2000
#> 2200
#> 2400
#> 2600
#> 2800
#> 3000
#> MCMC sample has been saved to: ./bp_Feb-14-190121.
#> Bayesian MCMC took 0.74 minutes.
#> Warning: Removing external directory ./bp_Feb-14-190121The option seed sets the pseudorandom number seed so that this run can
be repeated exactly. The qb object called qbHyper is used throughout
this vignette.
This section describes in more detail how to create Markov chain Monte
Carlo (MCMC) samples from the Bayesian posterior to be used for QTL
mapping. The next step to mapping with the R/qtl package would be to
use the function calc.genoprob to create genotype probabilities based
on a Hidden Markov model. However, for Bayesian model selection, we
replace calc.genoprob with the R/qtlbim function qb.genoprob. The
function qb.genoprob performs some bookkeeping before calling
calc.genoprob with the variable stepwidth option for pseudomarker
positions. The probabilities for genotypes at pseudomarkers and at
markers with missing data are calculated by calc.genoprob from the
observed marker data using the multipoint method (Jiang and Zeng, 1997).
The MCMC samples are created by qb.mcmc after running qb.genoprob.
In the simplest case, MCMC samples are created with the following two
calls:
hyper <- qb.genoprob(hyper, step=2)
qbHyper <- qb.mcmc(hyper, pheno.col = 1)
#> qb.mcmc is running 60,600 iterations. The current iterations are saved:
#> 200
#> 400
#> 600
#> 800
#> 1000
#> 1200
#> 1400
#> 1600
#> 1800
#> 2000
#> 2200
#> 2400
#> 2600
#> 2800
#> 3000
#> MCMC sample has been saved to: ./bp_Feb-14-190206.
#> Bayesian MCMC took 0.75 minutes.
#> Warning: Removing external directory ./bp_Feb-14-190206By default the qb.mcmc function prints out progress messages of the
number of iterations completed. These progress messages can be
suppressed by setting verbose=FALSE. Arguments for the routines
qb.data and qb.model, described below, can be passed through
qb.mcmc. Otherwise, default values are used. The detail below for
qb.data.qb.model, andqb.mcmc` routines could be skipped in favor of
default settings.
The function qb.data specifies the traits to be analyzed, their
underlying distribution, the random and/or fixed covariates and whether
to standardize or to use a boxcox transformation. Note that, the cross
object can have several phenotypes and some of which could be used as
covariates.
qbData <- qb.data(hyper, pheno.col = 1, trait = "normal", fixcov = 0, rancov = 0)The R/qtlbim routines handle normal, binary and ordinal data. In
addition, the user can specify fixed (fixcov) and random (rancov)
covariate(s). [The pheno.col, fixcov, and rancov values can be
numeric indices to the phenotype names, or character strings with exact
phenotype names.] Fixed covariates can be included as interacting
covariates with the intcov option to qb.model (see below).
The function qb.model defines the model parameters, using defaults
that work well in most settings. Users are probably most interested in
specifying if epistasis is considered, the prior expected number of
main effect QTLs (main.nqtl), and the prior expected total number of
QTLs (mean.nqtl), which includes additional QTLs with only epistatic
effects. A user may set main.nqtl and mean.nqtl based on previous
QTL analysis, for example using R/qtl. Setting the maximum number of
QTLs overall (max.nqtl) or per chromosome (chr.nqtl), and setting
the minimum interval between linked QTL, can be used to restrict
sampling as needed.
Typically a real data set has several traits which can be considered as
covariates. The intcov option specifies which covariate(s) can
interact with QTLs, or equivalently, which environmental factors may
have GxE interactions. The intcov should be a vector of 0s and 1s of
the same length as the fixcov option specified for qb.data (see
above).
qbModel <- qb.model(hyper, epistasis = TRUE, main.nqtl = 3,
interval = rep(5,nchr(hyper)), chr.nqtl = rep(2,nchr(hyper)),
depen = FALSE, prop = c(0.5, 0.1, 0.05))The function qb.mcmc creates MCMC samples on the data and model
specified. The results are initially saved in a unique directory under
mydir, which is removed at completion of the command. Options for
qb.data and qb.model can be passed directly to qb.mcmc, or as the
objects created above.
qb <- qb.mcmc(hyper, data = qbData, model = qbModel, mydir = ".",
n.iter = 3000, n.thin = 20, genoupdate = TRUE)
#> qb.mcmc is running 60,600 iterations. The current iterations are saved:
#> 200
#> 400
#> 600
#> 800
#> 1000
#> 1200
#> 1400
#> 1600
#> 1800
#> 2000
#> 2200
#> 2400
#> 2600
#> 2800
#> 3000
#> MCMC sample has been saved to: ./bp_Feb-14-190251.
#> Bayesian MCMC took 0.75 minutes.
#> Warning: Removing external directory ./bp_Feb-14-190251The genoupdate option simulates pseudomarker and missing marker
genotypes if TRUE, or uses a Haley-Knott (1992) type approach if
FALSE; the latter is faster, but not generally recommended if there
are many missing genotypes or selective genotyping. n.iter samples are
saved, thinning to one in n.thin from the MCMC samples to reduce
serial correlation. That is, n.iter * n.thin samples are drawn, after
an initial n.burnin samples (1% of total by default) are discarded to
allow the chain to converge closer to the posterior distribution.