This repository contains most of the data and all the scripts used in my final degree project (Apr-Jun 2020).
DATA: Contains all the data generated by uORFs_identifier and by the process of adding further features to each uORF. The complete files are in all_scores.
FGP_RESULTS: Contains the custom track that can be uploaded to the UCSC Genome Browser, the code needed to create it, and the files with the uORFs scored using the five proposed methods.
REFERENCE_DATA: Contains the reference data used to complete the uORF files with all the additional information (much of this data is not included here because of its size; see Data Availability).
SCRIPTS: Contains the scripts for identifying all the uORFs in a given genome (uORFs_identifier) and for adding the additional features.
replicate_yale_analysis: Contains the complete analysis pipeline written by McGillivray et al. (2018). We used their scripts and adapted them to our methods and data to obtain a proper classification. The R scripts and data are inside.
time: Contains a file that records the time taken by each execution of the file.
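The repository does not document the format of this timing file. As an illustration only, a minimal POSIX shell helper such as the hypothetical timed_run below could record the wall-clock time of each run. The function name, the tab-separated layout and the use of `date +%s` (supported by GNU and BSD date, although not required by POSIX) are all assumptions, not the repository's actual mechanism.

```sh
# Hypothetical helper: run a command and append one tab-separated line
# "<timestamp> <elapsed seconds> <command>" to a log file.
# The log layout is an assumption, not the format used in time/.
timed_run() {
  local log="$1" start end status
  shift
  start=$(date +%s)
  # Capture the command's exit status without aborting under `set -e`.
  if "$@"; then
    status=0
  else
    status=$?
  fi
  end=$(date +%s)
  # Resolution is one second, since date +%s returns whole seconds.
  printf '%s\t%s\t%s\n' "$(date '+%Y-%m-%dT%H:%M:%S')" "$((end - start))" "$*" >> "$log"
  return "$status"
}
```

For instance, prefixing a run with `timed_run time/executions.log` (a hypothetical log name) would append one line per execution while preserving the command's own exit status.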
GENCODE annotation files are available at:
All conservation score source files can be downloaded from these sites:
- phastCons:
- phyloP:
- phyloCSF: all PhyloCSF*.bw files
All the Ribo-seq data we used came from the GWIPS table browser (
All of this data should be downloaded and placed in the appropriate directories.
The uORFs identifier tool can be run with a command like this:
perl -i ../../DATA/input_data/input_file.txt -of ../../DATA/raw_data/raw_uORFs/output_file.tsv -m number_of_processes -sp species 2> error
In our study, we ran:
perl -i ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt -of ../../DATA/raw_data/raw_uORFs/allENSG_15-5-2020_at_17-12.tsv -m 100 -sp Human 2> error
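Before launching a long run (the study used 100 processes), it can help to validate the arguments first. The sketch below is a hypothetical POSIX shell wrapper, not part of the repository: run_uorfs_identifier and its argument order are assumptions, and the identifier script's path is passed in explicitly because its exact name is not given in the commands above. It uses the same -i, -of, -m and -sp flags and sends stderr to an error log, as the documented command does.

```sh
# Hypothetical wrapper around the uORFs identifier command.
# Usage: run_uorfs_identifier SCRIPT INPUT OUTPUT NPROC SPECIES [ERRLOG]
# SCRIPT is the path to the identifier script (its name is an assumption).
run_uorfs_identifier() {
  local script="$1" input="$2" output="$3" nproc="$4" species="$5"
  local errlog="${6:-error}"

  # Fail early if the input list of Ensembl gene IDs does not exist.
  if [ ! -f "$input" ]; then
    echo "input file not found: $input" >&2
    return 1
  fi

  # -m is the number of parallel processes: require a positive integer
  # without leading zeros.
  case "$nproc" in
    ''|*[!0-9]*|0*)
      echo "number of processes must be a positive integer: $nproc" >&2
      return 1
      ;;
  esac

  # Make sure the output directory exists before the script writes to it.
  mkdir -p "$(dirname "$output")"

  # Same flags as the documented command; stderr goes to the error log.
  perl "$script" -i "$input" -of "$output" -m "$nproc" -sp "$species" 2> "$errlog"
}
```

For example, `run_uorfs_identifier path/to/identifier_script ../../DATA/input_data/all_human_EnsemblGeneIDs_v34.txt ../../DATA/raw_data/raw_uORFs/output_file.tsv 100 Human` mirrors the study's call, with path/to/identifier_script as a placeholder.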
The Ensembl Core API must be installed; see the Ensembl API installation instructions for the procedure. Installing GitTools is also recommended, as it makes version updates easier to manage.
Running the tool also requires some additional Perl modules. The following commands install them on a Debian- or Ubuntu-based Linux system, but these dependencies can also be installed in other ways (for example, from CPAN).
sudo apt-get install -y libparallel-forkmanager-perl
sudo apt-get install -y libdbd-mysql-perl
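After running the apt-get commands above, a quick way to confirm that the modules are visible to Perl is to try loading each one. The sketch below is a hypothetical helper (check_perl_modules is not part of the repository) that relies only on perl's standard -M switch. Parallel::ForkManager and DBD::mysql are the modules provided by the two Debian packages; Bio::EnsEMBL::Registry belongs to the Ensembl Core API and will only load if that API's modules directory is on PERL5LIB.

```sh
# Hypothetical helper: report whether each Perl module given as an
# argument can be loaded. Returns non-zero if at least one is missing.
check_perl_modules() {
  local missing=0 mod
  for mod in "$@"; do
    # `perl -MModule -e 1` exits 0 only if the module loads successfully.
    if perl -M"$mod" -e 1 2>/dev/null; then
      printf 'ok      %s\n' "$mod"
    else
      printf 'MISSING %s\n' "$mod"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_perl_modules Parallel::ForkManager DBD::mysql Bio::EnsEMBL::Registry` prints one line per module and returns a non-zero status if any of them cannot be loaded.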