Skip to content

[Tutorial] Generating a list of extrusion barrier from ChIP seq data

Roberto Rossini edited this page Jul 3, 2023 · 6 revisions

This tutorial presents one way of generating a list of extrusion barriers from ChIP-seq data and CTCF motif information.

Requirements

Packages

  • awk
  • bedtools
  • gzip
  • meme>=5.5
  • modle>1.0.1
Installing packages with conda
conda create -n modle-tut \
    -c conda-forge \
    -c bioconda \
    bedtools \
    gawk \
    gzip \
    'meme>=5.5' \
    'modle>1.0.1'

conda activate modle-tut

Files

Alternatives to fold change over control

If you do not have a BigWig file with FC over control, you can instead use peak summits.

Assuming you have summits in BED format, these are the additional step you have to follow before proceeding with the tutorial:

  1. Convert BED to bedGraph cut -f 1-3,5 summits.bed | sort -k1,1 -k2,2n > summits.bedgraph
  2. Convert bedGraph to bigWig bedGraphToBigWig summits.bedgraph hg38.chrom.sizes summits.bw

Now use summits.bw in place of ENCFF851FYN.bigWig throughout the tutorial.

Steps

1. Finding CTCF binding sites

# Extract the reference genome
user@modle-tut:/tmp$ gzip -dk hg38.fa.gz

# Find CTCF binding sites using MAST and write results in BED format
user@modle-tut:/tmp$ mast -hit_list MA0139.1.meme hg38.fa |
    grep -v  '^#' |
    awk 'BEGIN { FS="[[:space:]]+"; OFS="\t"; } { print $1,$5,$6,$3";"$4,"0",substr($2, 1, 1) }' |
    gzip -9 > ctcf_binding_sites.bed.gz

This will generate a gzip-compressed BED6 file like the following:

user@modle-tut:/tmp$ zcat ctcf_binding_sites.bed.gz | head

chr1	10471	10489	MA0139.1;CTCF	0	-
chr1	11164	11182	MA0139.1;CTCF	0	-
chr1	11223	11241	MA0139.1;CTCF	0	-
chr1	11281	11299	MA0139.1;CTCF	0	-
chr1	11340	11358	MA0139.1;CTCF	0	-
chr1	11402	11420	MA0139.1;CTCF	0	-
chr1	14231	14249	MA0139.1;CTCF	0	-
chr1	15228	15246	MA0139.1;CTCF	0	-
chr1	15627	15645	MA0139.1;CTCF	0	-
chr1	16651	16669	MA0139.1;CTCF	0	+

This file contains all CTCF binding sites identified by MAST (regardless of whether they are actually bound by CTCF)

For this reason, it is crucial to filter this list using e.g. CTCF ChIP-seq data.

2. Filtering candidate barrier sites

# Run bedtools to filter out CTCF binding sites not overlapping CTCF ChIP peaks
user@modle-tut:/tmp$ bedtools intersect -wa -a ctcf_binding_sites.bed.gz -b ENCFF330SHG.bed.gz |
    sort -u |
    gzip -9 > ctcf_binding_sites.filtered.bed.gz

3. Computing average occupancy of CTCF binding sites

user@modle-tut:/tmp$ modle_tools annotate-barriers ENCFF851FYN.bigWig ctcf_binding_sites.filtered.bed.gz |
    gzip -9 > extrusion_barriers.bed.gz

For each interval defined in the BED file, modle_tools annotate-barriers will fetch the peak summit from the .bigWig file provided as input, and transform it to a value between 0 and 1 using a logistic function (refer to section 6 of the suppl. text for MoDLE's paper and section "Converting CTCF ChIP-Seq to Orientation-Specific BE Permeability" from Fudenberg et al 2016 for more details).

This number is written to the score field of the output BED file, and will be interpreted by MoDLE as the average occupancy for a given extrusion barrier.

extrusion_barriers.bed.gz can be passed directly to modle simulate using the -b option to simulate loop extrusion with extrusion barrier occupancies set based on ChIP-seq data.