SConES (Selecting Connected Explanatory SNPs)
MATLAB code implementing the method described in:
C.-A. Azencott, D. Grimm, M. Sugiyama, Y. Kawahara and K. Borgwardt (2013) Efficient network-guided multi-locus association mapping with graph cuts, Bioinformatics 29 (13), i171-i179 doi:10.1093/bioinformatics/btt238
- EasyGWAS: SConES has been integrated into EasyGWAS, a framework for the analysis and meta-analysis of GWAS data. In particular, this offers a Python interface.
- sfan: For the feature selection part (i.e. after the GWAS data has been processed and the SNPs scored), sfan uses a different (faster) maxflow solver, is written in Python, and also incorporates the multi-task version proposed in
  M. Sugiyama, C.-A. Azencott, D. Grimm, Y. Kawahara and K. Borgwardt (2014) Multi-task feature selection on multiple networks via maximum flows, SIAM ICDM, 199-207 doi:10.1137/1.9781611973440.23
- Multi-SConES: For the original implementation of this multi-task version, see Multi-SConES.
In the code folder, there is a MATLAB script demo.m.
To run the demo, start MATLAB and type:
demo()
- X = genotype matrix of size n x s, where n is the number of samples and s is the number of SNPs
- Y = phenotype vector of size n x 1, where n is the number of samples
- W = sparse network of size s x s
Demo files are provided in the data folder.
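As a rough illustration of the shapes described above, the following sketch builds toy inputs from scratch. The dimensions, the 0/1/2 genotype encoding, and the symmetric chain network are assumptions made for this example and are not taken from the demo files.

```matlab
% Hypothetical toy dimensions (not taken from the demo files)
n = 20;   % number of samples
s = 50;   % number of SNPs

rng(0);                       % fix the random seed for reproducibility
X = randi([0 2], n, s);       % genotype matrix, n x s (0/1/2 encoding is an assumption)
Y = randn(n, 1);              % phenotype vector, n x 1

% Sparse s x s network: here a simple chain linking neighbouring SNPs
idx = (1:s-1)';
W = sparse(idx, idx + 1, 1, s, s);
W = W + W';                   % symmetric adjacency matrix (assumed, not required by the text)

fprintf('X: %d x %d, Y: %d x %d, W: %d x %d (%d nonzeros)\n', ...
    size(X, 1), size(X, 2), size(Y, 1), size(Y, 2), size(W, 1), size(W, 2), nnz(W));
```

Storing W with sparse keeps memory proportional to the number of edges rather than to s x s, which matters once s reaches the size of a real SNP panel.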
[indicators, objectives] = scones(data, options)
To run SConES, two parameters are needed. The first one is a data struct:
- data.X is the genotype data
- data.Y is the phenotype
- data.W is the sparse network
- data.selected_PCs is the number of principal components that should be used for population structure correction
- data.lambda_values is a vector of size 1 x k with k values for lambda
- data.eta_values is a vector of size 1 x h with h values for eta
The second parameter is an options struct (optional; default values are given below):
- options.automatic: if this parameter is true, data.lambda_values and data.eta_values are determined automatically (default: true)
- options.number_parameters: this parameter specifies the number of eta and lambda values when options.automatic is set to true (default: 10)
- options.stdout: if this parameter is true, output is printed to the terminal window (default: true)
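The two arguments described above could be assembled as in the following sketch. All values here (toy sizes, the number of principal components, the lambda and eta grids) are arbitrary placeholders, and the reading that options.automatic = false makes SConES use the supplied grids is an assumption inferred from the parameter description. The call itself is only attempted when scones is found on the MATLAB path.

```matlab
% Toy inputs with hypothetical sizes; replace with real data in practice
n = 20; s = 50;
rng(0);

data = struct();
data.X = randi([0 2], n, s);               % genotype data, n x s
data.Y = randn(n, 1);                      % phenotype, n x 1
idx = (1:s-1)';
data.W = sparse(idx, idx + 1, 1, s, s);    % sparse s x s network (toy chain)
data.W = data.W + data.W';
data.selected_PCs  = 2;                    % placeholder number of principal components
data.lambda_values = [0.1 1 10];           % 1 x k grid, arbitrary values
data.eta_values    = [0.1 1 10];           % 1 x h grid, arbitrary values

options = struct();
options.automatic = false;   % assumed to make SConES use the grids given above
options.stdout    = true;

% Only call SConES if the repository's code folder is on the path
if exist('scones', 'file') == 2
    [indicators, objectives] = scones(data, options);
end
```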
The function returns:
- indicators = indicator matrix of size n x k x h, where n is the length of the vector c, k is the length of the vector lambda_values and h is the length of the vector eta_values
- objectives = matrix of size k x h containing the objective values for the grid of lambda x eta values
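To make the layout of the two outputs concrete, the sketch below indexes stand-in arrays with the documented shapes. The arrays are random placeholders, and treating nonzero indicator entries as "selected" is an assumption about the encoding rather than something stated here.

```matlab
% Stand-in outputs with the documented shapes (random, for illustration only)
n = 6; k = 3; h = 2;
rng(1);
indicators = rand(n, k, h) > 0.5;   % n x k x h indicator array
objectives = rand(k, h);            % k x h objective values

% Entries selected for the i-th lambda value and the j-th eta value
i = 2; j = 1;
selected = find(indicators(:, i, j));

% Objective value for the same (lambda, eta) pair
obj_ij = objectives(i, j);

% Number of selected entries for every (lambda, eta) pair, as a k x h matrix
num_selected = reshape(sum(indicators, 1), [k h]);
```

Using reshape rather than squeeze keeps the result k x h even when one of the grids has a single value.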
[indicators, objectives] = scones_crossvalidation(data, options)
The first parameter (data) is the same as described above.
The second parameter (options) can additionally take the following values:
- options.nfold: the number of cross-validation folds (default: 10)
- options.seed: the seed used for splitting the data (default: 0)
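The sketch below sets the two cross-validation options and shows one common way a seed can yield a reproducible split into folds. The fold assignment is an illustration only and is not necessarily how scones_crossvalidation splits the data; the sample count and option values are arbitrary.

```matlab
options = struct();
options.nfold = 5;    % number of folds (arbitrary value for this example)
options.seed  = 42;   % seed for the split (arbitrary value for this example)

% Illustration only: a seeded, balanced assignment of n samples to folds
n = 20;                                   % hypothetical number of samples
rng(options.seed);                        % same seed gives the same permutation
perm = randperm(n);
fold_id = zeros(n, 1);
fold_id(perm) = mod(0:n-1, options.nfold) + 1;

% With the repository's code folder on the path and a data struct prepared:
% [indicators, objectives] = scones_crossvalidation(data, options);
```

Fixing the seed makes results comparable across runs, since the same samples land in the same folds each time.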
Any questions can be directed to Chloe-Agathe Azencott: chloe-agathe.azencott [at] mines-paristech.fr