Bayesian Gene Heritability Analysis

The Bayesian Gene Heritability Analysis software (BAGHERA) estimates how much of the heritability of a trait or disease is explained by all the SNPs in the genome (genome-wide heritability) and by the SNPs near protein-coding genes (gene-level heritability).

BAGHERA requires only summary statistics from a genome-wide association study (GWAS), LD scores calculated from a population matching the ethnicity of the GWAS cohort, and a gene annotation file in GTF format.

Installation

The easiest and fastest way to install BAGHERA is with conda:

$ conda install -c stracquadaniolab -c bioconda -c conda-forge baghera

Tutorial

A typical BAGHERA analysis consists of 3 steps:

  1. Build a SNP annotation file, in which each SNP is annotated to a gene and assigned an LD score. We used precomputed LD scores (https://github.com/bulik/ldsc) for the set of European-population variants in 1000 Genomes, and protein-coding genes as annotated in Gencode v31 (https://www.gencodegenes.org/releases/current.html). Overlapping genes within 50 kb were considered together, which yields a dataset of 15,000 non-overlapping genes. To build your own annotation files, run the following command:
    $ baghera-tool create-files -l <ldscore_folder> -a <annotation.gtf> -s <ld_annotated_snps> -g <genes_table>
  2. Annotate summary statistics with the SNP annotation built in step 1. We used the summary statistics available at http://www.nealelab.is/uk-biobank and annotated them with the command below:
    $ baghera-tool generate-snp-file -s <stats_file> -i <input_type> -o <snps_file> -a <ld_annotated_snps>
  3. Run the regression.
    $ baghera-tool gene-heritability <snps_file> <results_table> <summary_table> <log_file> --sweeps <samples> --burnin <tuning> --n-chains <chains> --n-cores <cores> -m <models>
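Step 1 mentions that overlapping genes within 50 kb were considered together. The sketch below illustrates one possible reading of that rule: genes on the same chromosome are merged whenever the gap between them is at most 50 kb. This is an assumption made for illustration, not BAGHERA's own code; the function name `merge_nearby_genes`, the tuple layout and the gap definition are all hypothetical, and `create-files` may apply the window differently.

```python
from collections import defaultdict


def merge_nearby_genes(genes, max_gap=50_000):
    """Merge genes on the same chromosome that lie within `max_gap` bp.

    `genes` is an iterable of (chrom, start, end, name) tuples.
    Returns a list of (chrom, start, end, names) tuples, where `names`
    is a tuple of the gene names merged into that region.
    Illustrative only: not taken from the BAGHERA source code.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        by_chrom[chrom].append((start, end, name))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # region being built: [start, end, [names]]
        for start, end, name in sorted(by_chrom[chrom]):
            if current is not None and start - current[1] <= max_gap:
                # Close enough to the current region: extend it.
                current[1] = max(current[1], end)
                current[2].append(name)
            else:
                # Too far away (or first gene): flush and start a new region.
                if current is not None:
                    merged.append((chrom, current[0], current[1], tuple(current[2])))
                current = [start, end, [name]]
        if current is not None:
            merged.append((chrom, current[0], current[1], tuple(current[2])))
    return merged
```

For example, two genes on chromosome 1 separated by less than 50 kb would be returned as a single region carrying both names, while a gene several hundred kb away stays on its own.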

Example

The commands below run BAGHERA on the UK Biobank summary statistics for breast cancer, using EUR LD scores and the Gencode annotation:

  $ baghera-tool create-files -l data/eur_w_ld_chr/ -a data/gencode.v31lift37.basic.annotation.gtf -s data/ld_annotated_gencode_v31.csv -g data/genes_gencode_v31.csv
  $ baghera-tool generate-snp-file -s data/C50.gwas.imputed_v3.both_sexes.tsv -i position_ukbb -o data/c50.snps.csv -a data/ld_annotated_gencode_v31.csv
  $ baghera-tool gene-heritability data/c50.snps.csv data/results_normal_c50.csv data/summary_normal_c50.csv data/log_normal_c50.txt --sweeps 10000 --burnin 2500 --n-chains 4 --n-cores 4 -m normal
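If you prefer to drive the three commands from a script, they can be assembled as argument lists and passed to `subprocess`. The sketch below is a minimal, hypothetical wrapper: the function names `build_baghera_commands` and `run_pipeline` are not part of BAGHERA, and only the subcommands and flags shown in the commands above are used. By default it only prints the commands; pass `dry_run=False` to execute them, which assumes `baghera-tool` is on your `PATH`.

```python
import shlex
import subprocess


def build_baghera_commands(*, ldscore_dir, gtf, ld_annotated_snps, genes_table,
                           stats_file, input_type, snps_file,
                           results_table, summary_table, log_file,
                           sweeps, burnin, n_chains, n_cores, model):
    """Return the three baghera-tool invocations as argument lists."""
    return [
        # Step 1: build the SNP annotation and gene table.
        ["baghera-tool", "create-files", "-l", ldscore_dir, "-a", gtf,
         "-s", ld_annotated_snps, "-g", genes_table],
        # Step 2: annotate the GWAS summary statistics.
        ["baghera-tool", "generate-snp-file", "-s", stats_file,
         "-i", input_type, "-o", snps_file, "-a", ld_annotated_snps],
        # Step 3: run the regression.
        ["baghera-tool", "gene-heritability", snps_file, results_table,
         summary_table, log_file,
         "--sweeps", str(sweeps), "--burnin", str(burnin),
         "--n-chains", str(n_chains), "--n-cores", str(n_cores),
         "-m", model],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; run it as well when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)


# Values copied from the breast cancer example above.
example_commands = build_baghera_commands(
    ldscore_dir="data/eur_w_ld_chr/",
    gtf="data/gencode.v31lift37.basic.annotation.gtf",
    ld_annotated_snps="data/ld_annotated_gencode_v31.csv",
    genes_table="data/genes_gencode_v31.csv",
    stats_file="data/C50.gwas.imputed_v3.both_sexes.tsv",
    input_type="position_ukbb",
    snps_file="data/c50.snps.csv",
    results_table="data/results_normal_c50.csv",
    summary_table="data/summary_normal_c50.csv",
    log_file="data/log_normal_c50.txt",
    sweeps=10000, burnin=2500, n_chains=4, n_cores=4, model="normal",
)
run_pipeline(example_commands)  # dry run: prints the three commands
```

Keeping the commands as lists avoids shell quoting problems with file paths, and the dry-run default makes it easy to inspect what would be executed before starting a long run.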

Workflow

Alongside BAGHERA, we provide a Snakemake workflow (https://github.com/stracquadaniolab/workflow-baghera), which includes sample data for testing our method.

Authors

Citation

The landscape of the heritable cancer genome. Viola Fanfani, Luca Citi, Adrian L Harris, Francesco Pezzella and Giovanni Stracquadanio. Cancer Res, March 17 2021. DOI: 10.1158/0008-5472.CAN-20-3348

@article {Fanfani2021,
	author = {Fanfani, Viola and Citi, Luca and Harris, Adrian L and Pezzella, Francesco and Stracquadanio, Giovanni},
	title = {The landscape of the heritable cancer genome},
	elocation-id = {canres.3348.2020},
	year = {2021},
	doi = {10.1158/0008-5472.CAN-20-3348},
	publisher = {American Association for Cancer Research},
	issn = {0008-5472},
	journal = {Cancer Research}
}

Issues

We just released a major upgrade of the code; please report any issues.