Chromatin accessibility is a measure of the ability of nuclear macromolecules to physically contact DNA, and is essential for understanding regulatory mechanisms.
OpenAnnotate facilitates the chromatin accessibility annotation for massive genomic regions by allowing ultra-efficient annotation across various biosample types based on chromatin accessibility profiles accumulated in public repositories (1236 samples from ENCODE and 1493 samples from ATACdb).
For more information, please refer to the website: http://health.tsinghua.edu.cn/openannotate/
We have also developed an R package called OpenAnnotateR, which can be accessed through this link.
Because the web server has moved to a new address, users should first set the latest address before using the service by calling setAddress on an Annotate object (oaa in the examples below):
oaa.setAddress('166.111.5.185', '80')
Anaconda users can first create a new Python environment and activate it via (this is unnecessary if your Python environment is managed in other ways):
conda create python=3.9 -n OpenAnnotatePy
conda activate OpenAnnotatePy
OpenAnnotatePy is available on PyPI and can be installed via
pip install OpenAnnotatePy
Code | Function |
---|---|
testWebserver() | test whether the web server is working normally |
setAddress(IP, port) | set the address of the web server |
help() | get a list of the various functions and arguments that the package contains. |
getParams() | get params list |
getCelltypeList(protocol, species) | get cell types for annotation |
getTissueList(protocol, species) | get tissue for annotation |
getSystemList(protocol, species) | get systems for annotation |
searchCelltype(protocol, species, keyword) | search for cell types that contain keyword |
searchTissue(protocol, species, keyword) | search for tissues that contain keyword and the corresponding cell types |
searchSystem(protocol, species, keyword) | search for systems that contain keyword and the corresponding cell types |
setParams(species, protocol, cell_type, perbase) | set parameters |
runAnnotate(input) | upload file to server |
getProgress(task_id) | you can view the annotation progress |
getAnnoResult(result_type,task_id,cell_type) | download the annotation result |
getInputFile(save_path, task_id) | get your input file from server |
viewParams(task_id) | view parameters |
getExampleTaskID() | get example task id |
getExampleInputFile(save_path) | get example input file to the save_path |
fromOpen2EpiScanpy(data_path, head_path) | generate anndata from annotation result |
Upload a region file to the web server and download the head and the readopen of the annotation result to the local path, then initialize an anndata for downstream analysis (Annotation in Per-region mode).
from OpenAnnotatePy import OpenAnnotateApi
oaa=OpenAnnotateApi.Annotate()
# GRCh37/hg19 DNase-seq All-biosamples Per-region annotation mode
oaa.setParams(species=1, protocol=1, cell_type=1, perbase=1)
task_id=oaa.runAnnotate(input='./EXAMPLE.bed.gz')
anno_data = oaa.getAnnoResult(result_type = 2,task_id = task_id ,cell_type = 1)
anno_head = oaa.getAnnoResult(result_type = 1,task_id = task_id ,cell_type = 1)
ann_data = oaa.fromOpen2EpiScanpy(anno_data, anno_head)
Import
The package includes a module named OpenAnnotateApi. All functions are called as methods of an Annotate object instantiated from it.
from OpenAnnotatePy import OpenAnnotateApi
Instantiate object
Instantiate an Annotate object.
oaa=OpenAnnotateApi.Annotate()
Help
Get a list of the various functions and arguments that the package contains.
oaa.help()
'''
testWebserver() : test whether the web server is working normally
setAddress(IP, port) : set the address of the web server
getParams() : get params list
getCelltypeList(protocol,species) : get cell type list
getTissueList(protocol,species) : get tissue list
getSystemList(protocol,species) : get system list
searchCelltype(protocol, species, keyword) : search for cell types that contain keyword
searchTissue(protocol, species, keyword) : search for tissues that contain keyword and the corresponding cell types
searchSystem(protocol, species, keyword) : search for systems that contain keyword and the corresponding cell types
setParams(assay,species,cell_type,perbase) : set params list
runAnnotate(input) : Upload file to server
getProgress(task_id) : query the annotation progress
getAnnoResult(result_type,task_id,cell_type) : download annotation result to local path
getInputFile(save_path, task_id) : get your input file from server
viewParams(task_id) : view parameters
getExampleTaskID() : get example task id
getExampleInputFile(save_path) : get example input file to the save_path
fromOpen2EpiScanpy(data, head) : generate anndata from annotation result
'''
Get parameters
Get the parameters to be set.
# get basic parameters you need to set
oaa.getParams()
# get the corresponding cell type list
oaa.getCelltypeList(protocol, species)
# get the corresponding tissues list
oaa.getTissueList(protocol, species)
# get the corresponding systems list
oaa.getSystemList(protocol, species)
# search cell type
oaa.searchCelltype(protocol, species, keyword)
# search tissue and corresponding cell types
oaa.searchTissue(protocol, species, keyword)
# search system and corresponding cell types
oaa.searchSystem(protocol, species, keyword)
getParams(): Return the parameter list of species, protocol and Annotate method.
getCelltypeList(protocol, species): Return the cell type list of the corresponding protocol and species.
species:
- 1 : GRCh37/hg19
- 2 : GRCh38/hg38
- 3 : GRCm37/mm9
- 4 : GRCm38/mm10
protocol:
- 1 : DNase-seq(ENCODE)
- 2 : ATAC-seq(ENCODE)
- 3 : ATAC-seq(ATACdb)
keyword: Key word for search, such as K562 and Blood.
Set parameters
Set parameters for your object.
oaa.setParams(species, protocol, cell_type, perbase)
species:
- 1 : GRCh37/hg19
- 2 : GRCh38/hg38
- 3 : GRCm37/mm9
- 4 : GRCm38/mm10
protocol:
- 1 : DNase-seq(ENCODE)
- 2 : ATAC-seq(ENCODE)
- 3 : ATAC-seq(ATACdb)
cell_type: refer to the function getCelltypeList().
perbase:
- 1 : Region based
- 2 : Per-base based
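The integer codes above can be kept in small lookup tables so that scripts stay readable. The following is a minimal sketch and not part of OpenAnnotatePy: the dictionaries are copied from the lists in this document, and describe_params is a hypothetical helper name. It only checks that each code exists; it does not check whether a given protocol is available for a given species.

```python
# Lookup tables copied from the parameter lists in this README.
SPECIES = {1: "GRCh37/hg19", 2: "GRCh38/hg38", 3: "GRCm37/mm9", 4: "GRCm38/mm10"}
PROTOCOL = {1: "DNase-seq(ENCODE)", 2: "ATAC-seq(ENCODE)", 3: "ATAC-seq(ATACdb)"}
PERBASE = {1: "Region based", 2: "Per-base based"}


def describe_params(species, protocol, perbase):
    """Hypothetical helper: validate the integer codes and return their labels."""
    for name, code, table in (
        ("species", species, SPECIES),
        ("protocol", protocol, PROTOCOL),
        ("perbase", perbase, PERBASE),
    ):
        if code not in table:
            raise ValueError(f"unknown {name} code: {code!r}")
    return {
        "species": SPECIES[species],
        "protocol": PROTOCOL[protocol],
        "perbase": PERBASE[perbase],
    }


if __name__ == "__main__":
    # Same codes as in the quick-start call to setParams.
    print(describe_params(species=1, protocol=1, perbase=1))
```

Running such a check before calling setParams catches a mistyped code locally instead of after the upload.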
Example file
The format of the chromatin regions in the input file.
chr1 10732070 10733118 . . .
chr1 10781239 10781744 . . .
chr1 10795106 10799241 . . .
chr1 10851570 10852173 . . .
chr1 10965129 10966144 . . .
chr1 11906876 11908666 . . .
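Each line of the input describes one region by a chromosome name, a start coordinate and an end coordinate; the remaining columns in the example are placeholders. A minimal sketch of a check that could be run before uploading is shown here. It assumes whitespace-separated columns and that a valid region satisfies 0 <= start < end; parse_region is a hypothetical helper, not part of OpenAnnotatePy.

```python
def parse_region(line):
    """Hypothetical helper: split one BED-style line into (chrom, start, end).

    Assumes whitespace-separated columns with at least three fields and
    0 <= start < end; extra columns (the '.' placeholders) are ignored.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}: {line!r}")
    chrom, start, end = fields[0], int(fields[1]), int(fields[2])
    if not 0 <= start < end:
        raise ValueError(f"invalid coordinates in line: {line!r}")
    return chrom, start, end


if __name__ == "__main__":
    # First region of the example file above.
    example = "chr1\t10732070\t10733118\t.\t.\t."
    chrom, start, end = parse_region(example)
    print(chrom, start, end, "length:", end - start)
```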
Example task_id and EXAMPLE.bed file.
oaa.getExampleInputFile(save_path)
task_id=oaa.getExampleTaskID()
task_id: The 16-digit identity of the submitted task.
Submit
Submit your file to the server. The call returns a task_id that is used to query the progress and download the results.
task_id=oaa.runAnnotate(input)
input: The path of the '.bed' or '.bed.gz' file, or a list/pandas.DataFrame format variable, to be uploaded, such as '/Users/example/example.bed'.
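Since runAnnotate accepts a list as well as a file path, regions can be read into memory first, for example to filter them before the upload. The following is a minimal sketch using only the standard library. load_regions is a hypothetical helper, and the assumption that each list element is the list of tab-separated columns of one line follows the list example later in this document.

```python
import gzip


def load_regions(path):
    """Hypothetical helper: read a '.bed' or '.bed.gz' file into a list of rows.

    Each row is the list of tab-separated columns of one non-empty line.
    """
    opener = gzip.open if path.endswith(".gz") else open
    regions = []
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                regions.append(line.split("\t"))
    return regions


if __name__ == "__main__":
    import os
    import tempfile

    # Write a tiny gzip-compressed example so the sketch runs on its own.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.bed.gz")
        with gzip.open(path, "wt") as handle:
            handle.write("chr1\t10732070\t10733118\t.\t.\t.\n")
        print(load_regions(path))
```

The resulting list could then be passed as oaa.runAnnotate(input=regions).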
Get Result
Get the current progress according to the task_id, then download the result file to the local path.
# You can view the annotation progress
oaa.getProgress(task_id)
# You can view the parameters you set before
oaa.viewParams(task_id)
# You can list the available result types
oaa.getResultType()
'''
1 - head
2 - readopen
3 - peakopen
4 - spotopen
5 - foreread
'''
# download the annotate result
oaa.getAnnoResult(result_type, task_id ,cell_type )
# download the bed file from web server
oaa.getInputFile(save_path, task_id)
result_type: The file type of the result: 1 - head, 2 - readopen, 3 - peakopen, 4 - spotopen, 5 - foreread.
save_path: Path to save the downloaded file.
task_id: The 16-digit identity of the submitted task.
cell_type: You can choose one or more specific cell types, given as a list.
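The numeric result_type codes are easy to mix up in scripts. The following minimal sketch names them with an IntEnum and adds a simple format check for task IDs. Both ResultType and is_valid_task_id are hypothetical conveniences that are not part of OpenAnnotatePy; the member values are copied from the list above, and the 16-digit check is based only on the example IDs shown in this document.

```python
from enum import IntEnum


class ResultType(IntEnum):
    """Hypothetical names for the result_type codes listed in this README."""
    HEAD = 1
    READOPEN = 2
    PEAKOPEN = 3
    SPOTOPEN = 4
    FOREREAD = 5


def is_valid_task_id(task_id):
    """Check that a task ID has 16 decimal digits, like 2023122816404225."""
    text = str(task_id)
    return len(text) == 16 and text.isascii() and text.isdigit()


if __name__ == "__main__":
    print(int(ResultType.READOPEN), is_valid_task_id(2023122816404225))
```

When passing a member to getAnnoResult, converting it explicitly with int(ResultType.HEAD) avoids depending on how the enum is formatted as a string.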
Then we provide an interface to anndata, which can embed openness data into an anndata structure for downstream analysis.
# build ann data matrix from openness annotation result
oaa.fromOpen2EpiScanpy(data, head)
data: path to the openness result file or the output from the function getAnnoResult()
head: path to the openness head file or the output from the function getAnnoResult(result_type = 1)
# initialize and get parameters
from OpenAnnotatePy import OpenAnnotateApi
oaa=OpenAnnotateApi.Annotate()
oaa.help()
oaa.getParams()
output:
Species list :
1 - GRCh37/hg19
2 - GRCh38/hg38
3 - GRCm37/mm9
4 - GRCm38/mm10
Protocol list :
1 - DNase-seq(ENCODE)
2 - ATAC-seq(ENCODE)
3 - ATAC-seq(ATACdb)
Annotate mode :
1 - Region based
2 - Per-base based
# get example bed and task id.
# download bed file from server
task_id=oaa.getExampleTaskID()
oaa.getExampleInputFile(save_path='.')
oaa.getInputFile(save_path='.', task_id=2023122816404225)
output:
Example task id: 2020121013091517
get the result to ./EXAMPLE.bed.gz
get the result to ./2023122816404225.bed
Then search for the system, tissue and cell type. After setting parameters, you can submit your job to the server.
oaa.getCelltypeList(protocol=1, species=1)
oaa.getTissueList(protocol=1, species=1)
oaa.getSystemList(protocol=1, species=1)
oaa.searchCelltype(protocol=1, species=1, keyword='K562')
oaa.searchTissue(protocol=1, species=1, keyword='blood')
oaa.searchSystem(protocol=1, species=1, keyword='Stem')
oaa.setParams(species=1, protocol=1, cell_type=1, perbase=1)
task_id=oaa.runAnnotate(input='./EXAMPLE.bed.gz')
# view parameters
oaa.viewParams(task_id=2023122816404225)
Or you can submit the regions as a list or a pd.DataFrame:
import pandas as pd

# read the tab-separated regions into a list of rows
regions = []
with open("./EXAMPLE.bed", "r") as file:
    for line in file:
        regions.append(line.rstrip('\n').split('\t'))
# submit the regions as a list
task_id=oaa.runAnnotate(input=regions)
# or submit the same regions as a pandas.DataFrame
pd_regions = pd.DataFrame(regions)
task_id=oaa.runAnnotate(input=pd_regions)
output (Omit cell type):
Your task id is: 2023122816404225
You can get the progress of your task through getProgress(task_id=2023122816404225)
Your task's parameters:
Protocol: DNase-seq(ENCODE)
Species: GRCh37/hg19
Cell type: All biosample types
Annotate mode: Region based
# download the result
oaa.getProgress(task_id=2023122816404225)
head = oaa.getAnnoResult(result_type=1, task_id=2023122816404225,cell_type=1)
output:
Your task has been completed!
You can get the result file type first through getResultType()
You can download result file through getAnnoResult(result_type, 2023122816404225)
get the result to ./head.txt.gz
# build anndata from the downloaded result files
anndata = oaa.fromOpen2EpiScanpy('./results/readopen_2023122816404225.txt', './results/head_2023122816404225.txt')