-
Notifications
You must be signed in to change notification settings - Fork 3
[Tutorial] Generating a list of extrusion barrier from ChIP seq data
This tutorial presents one way of generating a list of extrusion barriers from ChIP-seq data and CTCF motif information.
awkbedtoolsgzipmeme>=5.5modle>1.0.1
Installing packages with conda
conda create -n modle-tut \
-c conda-forge \
-c bioconda \
bedtools \
gawk \
gzip \
'meme>=5.5' \
'modle>1.0.1'
conda activate modle-tut- Reference genome: hg38 - download link
- CTCF freq. matrix: MA0139.1 - download link
- CTCF BigWig (fold change over control): ENCFF851FYN - download link
- CTCF narrowPeak: ENCFF330SHG - download link
Alternatives to fold change over control
If you do not have a BigWig file with FC over control, you can instead use peak summits.
Assuming you have summits in BED format, these are the additional step you have to follow before proceeding with the tutorial:
- Convert BED to bedGraph
cut -f 1-3,5 summits.bed | sort -k1,1 -k2,2n > summits.bedgraph - Convert bedGraph to bigWig
bedGraphToBigWig summits.bedgraph hg38.chrom.sizes summits.bw
Now use summits.bw in place of ENCFF851FYN.bigWig throughout the tutorial.
# Extract the reference genome
user@modle-tut:/tmp$ gzip -dk hg38.fa.gz
# Find CTCF binding sites using MAST and write results in BED format
user@modle-tut:/tmp$ mast -hit_list MA0139.1.meme hg38.fa |
grep -v '^#' |
awk 'BEGIN { FS="[[:space:]]+"; OFS="\t"; } { print $1,$5,$6,$3";"$4,"0",substr($2, 1, 1) }' |
gzip -9 > ctcf_binding_sites.bed.gzThis will generate a gzip-compressed BED6 file like the following:
user@modle-tut:/tmp$ zcat ctcf_binding_sites.bed.gz | head
chr1 10471 10489 MA0139.1;CTCF 0 -
chr1 11164 11182 MA0139.1;CTCF 0 -
chr1 11223 11241 MA0139.1;CTCF 0 -
chr1 11281 11299 MA0139.1;CTCF 0 -
chr1 11340 11358 MA0139.1;CTCF 0 -
chr1 11402 11420 MA0139.1;CTCF 0 -
chr1 14231 14249 MA0139.1;CTCF 0 -
chr1 15228 15246 MA0139.1;CTCF 0 -
chr1 15627 15645 MA0139.1;CTCF 0 -
chr1 16651 16669 MA0139.1;CTCF 0 +This file contains all CTCF binding sites identified by MAST (regardless of whether they are actually bound by CTCF)
For this reason, it is crucial to filter this list using e.g. CTCF ChIP-seq data.
# Run bedtools to filter out CTCF binding sites not overlapping CTCF ChIP peaks
user@modle-tut:/tmp$ bedtools intersect -wa -a ctcf_binding_sites.bed.gz -b ENCFF330SHG.bed.gz |
sort -u |
gzip -9 > ctcf_binding_sites.filtered.bed.gzuser@modle-tut:/tmp$ modle_tools annotate-barriers ENCFF851FYN.bigWig ctcf_binding_sites.filtered.bed.gz |
gzip -9 > extrusion_barriers.bed.gzFor each interval defined in the BED file, modle_tools annotate-barriers will fetch the peak summit from the .bigWig file provided as input, and transform it to a value between 0 and 1 using a logistic function (refer to section 6 of the suppl. text for MoDLE's paper and section "Converting CTCF ChIP-Seq to Orientation-Specific BE Permeability" from Fudenberg et al 2016 for more details).
This number is written to the score field of the output BED file, and will be interpreted by MoDLE as the average occupancy for a given extrusion barrier.
extrusion_barriers.bed.gz can be passed directly to modle simulate using the -b option to simulate loop extrusion with extrusion barrier occupancies set based on ChIP-seq data.