Gene expression plays a crucial role in the development and function of the human body. An intricate system of regulatory mechanisms is necessary to ensure that each cell can perform its role in the survival and health of the overall organism. Dysregulation of gene expression has been shown to cause potentially fatal developmental disorders as well as mental and physical diseases. Histone modifications and variants are fundamentally involved in gene expression regulation, and their individual and combinatorial effects on gene expression are an active field of research. Similarly, motifs composed of transcription factor and microRNA pairs that co-regulate shared target genes modulate gene expression in many processes.
This thesis explored the relationship between differential gene expression and differential histone marks, specifically H3K4me2, H3K4me3, H3K27ac and H2A.Z, in these co-regulatory motifs by examining six human cell differentiation transitions. After differential gene expression and differential histone mark analysis, the correlation between expression and histone marks for different motif types was statistically compared to the genome-wide correlation, as well as to motifs in randomised gene regulatory networks, and an annotation enrichment analysis was performed.
Observed correlation patterns genome-wide and in co-regulatory motifs were generally consistent with previous characterisations of individual histone marks. Motif correlations were overall stronger than genome-wide correlations, significantly so for some motif types in specific cell differentiation transitions. Some motif types were significantly enriched compared to randomised networks, and motif correlation was unusually strong in comparison to motifs in randomised networks in some instances.
Process annotations of motifs were enriched in regulation of transcription, development, cell differentiation and cell proliferation, while functional annotations showed enrichment of regulatory molecular binding, including to modifiers of epigenetic states. Additionally, there were annotation and correlation pattern differences between different motif types. Further and more in-depth study of histone marks in transcription factor and microRNA co-regulatory motifs seems warranted.
The main pipeline and its functionality were implemented in Python 3 (version 3.6.8). The packages numpy (version 1.16.3) and pandas (version 0.24.2) were used for array and data frame handling. For the statistical analyses, the implementations of Pearson's correlation coefficient, the hypergeometric test and the χ²-test in the scipy (version 1.2.1) package were used. Plots were generated with the matplotlib (version 3.0.3) and seaborn (version 0.9.0) packages.
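The following is a minimal sketch of how the three statistical tests named above can be called through scipy. It is not the thesis code: the simulated fold changes, the gene counts in the enrichment example and the contingency table are made-up illustrative values, and all variable names are assumptions.

```python
import numpy as np
from scipy import stats

# Hypothetical data: log2 fold changes of expression and of one histone mark
# for 200 genes (simulated here; real values would come from the pipeline).
rng = np.random.RandomState(0)
expr_lfc = rng.normal(size=200)
mark_lfc = 0.6 * expr_lfc + rng.normal(scale=0.8, size=200)

# Pearson's correlation coefficient and its two-sided p-value.
r, p_r = stats.pearsonr(expr_lfc, mark_lfc)

# Hypergeometric enrichment test (all numbers are illustrative assumptions):
# M genes in total, n of them carry an annotation, N genes are in the motif
# set, and k of those carry the annotation. sf(k - 1) gives P(X >= k).
M, n, N, k = 20000, 400, 150, 12
p_enrich = stats.hypergeom.sf(k - 1, M, n, N)

# Chi-squared test on a hypothetical 2x2 contingency table, e.g. motif type
# (rows) against presence of a differential mark (columns).
table = np.array([[30, 70], [45, 155]])
chi2, p_chi2, dof, expected = stats.chi2_contingency(table)

print("Pearson r = {:.3f} (p = {:.3g})".format(r, p_r))
print("Enrichment p = {:.3g}".format(p_enrich))
print("Chi-squared = {:.3f}, dof = {}, p = {:.3g}".format(chi2, dof, p_chi2))
```

Using the survival function with `k - 1` rather than `1 - cdf(k)` is a common choice for one-sided enrichment p-values because it includes the observed count itself.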
Due to the large amount of data processed by the pipeline, intermediate pre-processing and analysis results were stored in SQLite databases, which were accessed with Python's built-in sqlite3 module. To improve runtime, the built-in multiprocessing module was used.
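One way to combine the two modules mentioned above is to let worker processes compute results and have the parent process write them to SQLite, so that only one process writes to the database. The sketch below illustrates that pattern under stated assumptions: the table name `gene_summary`, the function names and the per-gene "mean signal" computation are hypothetical placeholders, not the schema or analysis used in the thesis.

```python
import sqlite3
from multiprocessing import Pool


def summarise_gene(record):
    """Worker function: reduce one gene's values to a mean (placeholder analysis)."""
    gene_id, values = record
    return gene_id, sum(values) / len(values)


def summarise_parallel(records, processes=2):
    """Apply summarise_gene to all records using a pool of worker processes."""
    with Pool(processes=processes) as pool:
        return pool.map(summarise_gene, records)


def store_rows(conn, rows):
    """Write (gene_id, mean_signal) rows into a hypothetical results table."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute(
            "CREATE TABLE IF NOT EXISTS gene_summary "
            "(gene_id TEXT PRIMARY KEY, mean_signal REAL)"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO gene_summary VALUES (?, ?)", rows
        )


def load_rows(conn):
    """Read the stored rows back, ordered by gene identifier."""
    query = "SELECT gene_id, mean_signal FROM gene_summary ORDER BY gene_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    # Made-up input; an in-memory database keeps the demo self-contained.
    demo = [("geneA", [1.0, 3.0]), ("geneB", [2.0, 2.0, 5.0])]
    connection = sqlite3.connect(":memory:")
    store_rows(connection, summarise_parallel(demo))
    print(load_rows(connection))
    connection.close()
```

Keeping the worker function at module level and the pool start-up behind the `__main__` guard lets the sketch run on platforms where worker processes are spawned rather than forked.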
The C++ program Salmon (version 0.13.1) was employed to build the reference transcriptome index and to quantify the RNA-seq data for the differential gene expression analysis. The resulting transcript counts were summarised to the gene level and prepared for the subsequent differential analysis with the R (version 3.4.4) package tximport (version 1.6.0). The differential gene expression analysis itself was conducted with the R package DESeq2 (version 1.18.1) using default parameters.
The ChIP-seq BAM files were checked for duplicates, indexed and merged with the program SAMtools (version 1.7) in preparation for quantification and differential analysis. For the differential histone modification and histone variant analysis itself, the R package histoneHMM (version 1.7) was used with default parameters. The results were mapped onto the reference gene and promoter regions with the C++ program BEDTools (version 2.26.0). The gene regulatory networks were randomised with the help of the R package igraph (version 1.2.4.1).
These programs and R scripts were executed automatically on the command line with Python's built-in subprocess module.
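As an illustration of this kind of automation, the sketch below wraps `subprocess.run` in a small helper that fails loudly when an external tool returns a non-zero exit code. The helper names `run_tool` and `rscript_command` and the script name `differential_expression.R` are hypothetical; the exact command lines used in the thesis are not given here, so the demonstration calls the Python interpreter itself instead of Salmon, SAMtools, BEDTools or R.

```python
import subprocess
import sys


def run_tool(command):
    """Run an external command, raise on a non-zero exit code, return stdout."""
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
        check=True,               # raise CalledProcessError on failure
    )
    return completed.stdout


def rscript_command(script, *args):
    """Build the argument list for calling an R script via Rscript."""
    return ["Rscript", script] + [str(arg) for arg in args]


# Demonstration with the current Python interpreter as a stand-in for an
# external tool, so the sketch runs without the bioinformatics programs.
output = run_tool([sys.executable, "-c", "print('pipeline step finished')"])
print(output.strip())

# How an R script call could be assembled (not executed here; the script
# name and its arguments are assumptions made for illustration).
print(rscript_command("differential_expression.R", "quant_dir", "results.csv"))
```

Passing the command as a list rather than a single shell string avoids shell quoting problems with file paths, and `check=True` stops the pipeline at the first failing step instead of continuing with incomplete output.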