Skip to content

Commit

Permalink
Merge pull request #236 from bioinfo-chru-strasbourg/add_snpsift
Browse files Browse the repository at this point in the history
Add snpsift
  • Loading branch information
antonylebechec authored May 8, 2024
2 parents 5af156c + 38f4b30 commit c3ee6f7
Show file tree
Hide file tree
Showing 8 changed files with 188 additions and 48 deletions.
8 changes: 4 additions & 4 deletions docs/help.param.html
Original file line number Diff line number Diff line change
Expand Up @@ -995,7 +995,7 @@ <h4 id="annotationsnpeffcsvstats">annotation::snpeff::csvStats</h4>
<pre class="hljs"><code><div><span class="hljs-string">"csvStats"</span>: <span class="hljs-string">"OUTPUT.csv"</span>
</div></code></pre>
<h3 id="annotationsnpsift">annotation::snpsift</h3>
<p>Annotation process using snpSift. Provide a list of database files and annotation fields (can nnot be renamed).</p>
<p>Annotation process using snpSift. Provide a list of database files and annotation fields.</p>
<p>Examples:</p>
<blockquote>
<p>Annotation with multiple databases in multiple formats</p>
Expand All @@ -1015,7 +1015,7 @@ <h3 id="annotationsnpsift">annotation::snpsift</h3>
</div></code></pre>
<h4 id="annotationsnpsiftannotations">annotation::snpsift::annotations</h4>
<p>Specify the list of database files in formats VCF. Files need to be compressed and indexed.</p>
<p>This parameter enables users to select specific database fields (e.g. '&quot;field&quot;: null'). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '&quot;INFO&quot;: null', '&quot;ALL&quot;: null').</p>
<p>This parameter enables users to select specific database fields and optionally rename them (e.g. '&quot;field&quot;: null' to keep field name, '&quot;field&quot;: &quot;new_name&quot;' to rename field). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '&quot;INFO&quot;: null', '&quot;ALL&quot;: null').</p>
<p>If a full path is not provided, the system will automatically detect files within database folders (see Configuration doc) and assembly (see Parameter option).</p>
<p>Examples:</p>
<blockquote>
Expand All @@ -1032,8 +1032,8 @@ <h4 id="annotationsnpsiftannotations">annotation::snpsift::annotations</h4>
</blockquote>
<pre class="hljs"><code><div><span class="hljs-string">"annotations"</span>: {
<span class="hljs-attr">"tests/databases/annotations/current/hg19/dbnsfp42a.vcf.gz"</span>: {
<span class="hljs-attr">"Polyphen2_HDIV_pred"</span>: <span class="hljs-literal">null</span>,
<span class="hljs-attr">"ClinPred_pred"</span>: <span class="hljs-literal">null</span>,
<span class="hljs-attr">"Polyphen2_HDIV_pred"</span>: <span class="hljs-string">"PolyPhen"</span>,
<span class="hljs-attr">"ClinPred_pred"</span>: <span class="hljs-string">"ClinVar"</span>,
<span class="hljs-attr">"REVEL_score"</span>: <span class="hljs-literal">null</span>
}
}
Expand Down
8 changes: 4 additions & 4 deletions docs/help.param.md
Original file line number Diff line number Diff line change
Expand Up @@ -653,7 +653,7 @@ Examples:

### annotation::snpsift

Annotation process using snpSift. Provide a list of database files and annotation fields (can nnot be renamed).
Annotation process using snpSift. Provide a list of database files and annotation fields.

Examples:

Expand All @@ -678,7 +678,7 @@ Examples:

Specify the list of database files in formats VCF. Files need to be compressed and indexed.

This parameter enables users to select specific database fields (e.g. '"field": null'). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '"INFO": null', '"ALL": null').
This parameter enables users to select specific database fields and optionally rename them (e.g. '"field": null' to keep field name, '"field": "new_name"' to rename field). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '"INFO": null', '"ALL": null').


If a full path is not provided, the system will automatically detect files within database folders (see Configuration doc) and assembly (see Parameter option).
Expand All @@ -699,8 +699,8 @@ Examples:
```json
"annotations": {
"tests/databases/annotations/current/hg19/dbnsfp42a.vcf.gz": {
"Polyphen2_HDIV_pred": null,
"ClinPred_pred": null,
"Polyphen2_HDIV_pred": "PolyPhen",
"ClinPred_pred": "ClinVar",
"REVEL_score": null
}
}
Expand Down
Binary file modified docs/help.param.pdf
Binary file not shown.
8 changes: 4 additions & 4 deletions docs/json/help.param.json
Original file line number Diff line number Diff line change
Expand Up @@ -469,7 +469,7 @@
},
"snpsift": {
"__help": [
"Annotation process using snpSift. Provide a list of database files and annotation fields (can nnot be renamed)."
"Annotation process using snpSift. Provide a list of database files and annotation fields."
],
"__examples_code": [
"# Annotation with multiple databases in multiple formats",
Expand All @@ -490,7 +490,7 @@
"__auto": false,
"__help": [
"Specify the list of database files in formats VCF. Files need to be compressed and indexed.",
"This parameter enables users to select specific database fields (e.g. '\"field\": null'). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '\"INFO\": null', '\"ALL\": null').\n",
"This parameter enables users to select specific database fields and optionally rename them (e.g. '\"field\": null' to keep field name, '\"field\": \"new_name\"' to rename field). Use 'INFO' or 'ALL' keyword to select all fields within the database INFO/Tags header (e.g. '\"INFO\": null', '\"ALL\": null').\n",
"If a full path is not provided, the system will automatically detect files within database folders (see Configuration doc) and assembly (see Parameter option)."
],
"__examples_code": [
Expand All @@ -503,8 +503,8 @@
"# Annotation with dbNSFP (only PolyPhen, ClinVar and REVEL score)",
"\"annotations\": {",
" \"tests/databases/annotations/current/hg19/dbnsfp42a.vcf.gz\": {",
" \"Polyphen2_HDIV_pred\": null,",
" \"ClinPred_pred\": null,",
" \"Polyphen2_HDIV_pred\": \"PolyPhen\",",
" \"ClinPred_pred\": \"ClinVar\",",
" \"REVEL_score\": null",
" }",
"}",
Expand Down
80 changes: 44 additions & 36 deletions howard/objects/variants.py
Original file line number Diff line number Diff line change
Expand Up @@ -2439,6 +2439,7 @@ def update_from_vcf_duckdb(self, vcf_file: str) -> None:
AND table_parquet.\"POS\" = table_variants.\"POS\"
AND table_parquet.\"ALT\" = table_variants.\"ALT\"
AND table_parquet.\"REF\" = table_variants.\"REF\"
AND table_parquet.INFO NOT IN ('','.')
)
)
;
Expand Down Expand Up @@ -3059,6 +3060,19 @@ def annotation_snpsift(self, threads: int = None) -> None:
log.error(msg_err)
raise ValueError(msg_err)

# Config - bcftools
bcftools_bin_command = get_bin_command(
bin="bcftools",
tool="bcftools",
bin_type="bin",
config=config,
default_folder=f"{DEFAULT_TOOLS_FOLDER}/bcftools",
)
if not bcftools_bin_command:
msg_err = f"Annotation failed: no bcftools bin '{bcftools_bin_command}'"
log.error(msg_err)
raise ValueError(msg_err)

# Config - BCFTools databases folders
databases_folders = set(
self.get_config()
Expand Down Expand Up @@ -3221,6 +3235,7 @@ def annotation_snpsift(self, threads: int = None) -> None:
# Number of fields
nb_annotation_field = 0
annotation_list = []
annotation_infos_rename_list = []

for annotation_field in annotation_fields:

Expand All @@ -3242,6 +3257,12 @@ def annotation_snpsift(self, threads: int = None) -> None:
f"Annotation '{annotation_name}' - '{annotation_field}' -> '{annotation_fields_new_name}'"
)

# BCFTools annotate param to rename fields
if annotation_field != annotation_fields_new_name:
annotation_infos_rename_list.append(
f"{annotation_fields_new_name}:=INFO/{annotation_field}"
)

# Add INFO field to header
db_hdr_vcf_header_infos_number = (
db_hdr_vcf_header_infos[annotation_field].num or "."
Expand Down Expand Up @@ -3306,22 +3327,9 @@ def annotation_snpsift(self, threads: int = None) -> None:

if annotation_infos != "":

# tmp_annotation_vcf = NamedTemporaryFile(
# prefix=self.get_prefix(),
# dir=self.get_tmp_dir(),
# suffix=".vcf",
# delete=False,
# )
# tmp_annotation_vcf_name = tmp_annotation_vcf.name
# tmp_files.append(tmp_annotation_vcf_name)
# tmp_annotation_vcf_name_err = (
# tmp_annotation_vcf_name + ".err"
# )
# err_files.append(tmp_annotation_vcf_name_err)

# Annotated VCF (and error file)
tmp_annotation_vcf_name = os.path.join(
tmp_dir, os.path.basename(annotation) + ".vcf"
tmp_dir, os.path.basename(annotation) + ".vcf.gz"
)
tmp_annotation_vcf_name_err = (
tmp_annotation_vcf_name + ".err"
Expand All @@ -3333,8 +3341,16 @@ def annotation_snpsift(self, threads: int = None) -> None:
else:
annotation_infos_option = ""

# Info fields rename
if annotation_infos_rename_list:
annotation_infos_rename = " -c " + ",".join(
annotation_infos_rename_list
)
else:
annotation_infos_rename = ""

# Annotate command
command_annotate = f"{snpsift_bin_command} annotate {annotation_infos_option} {db_file} {tmp_vcf_name} 1>{tmp_annotation_vcf_name} 2>>{tmp_annotation_vcf_name_err} "
command_annotate = f"{snpsift_bin_command} annotate {annotation_infos_option} {db_file} {tmp_vcf_name} | {bcftools_bin_command} annotate --threads={threads} {annotation_infos_rename} -Oz1 -o {tmp_annotation_vcf_name} 2>>{tmp_annotation_vcf_name_err} "

# Add command
commands[command_annotate] = tmp_annotation_vcf_name
Expand All @@ -3348,6 +3364,7 @@ def annotation_snpsift(self, threads: int = None) -> None:
add_samples=False,
index=True,
)
shutil.copyfile(tmp_vcf_name, "/tmp/input.vcf")

# Num command
nb_command = 0
Expand All @@ -3358,8 +3375,12 @@ def annotation_snpsift(self, threads: int = None) -> None:
log.info(
f"Annotation - Annotate [{nb_command}/{len(commands)}]..."
)
log.debug(f"command_annotate={command_annotate}")
run_parallel_commands([command_annotate], threads)

# Debug
shutil.copyfile(commands[command_annotate], "/tmp/snpsift.vcf")

# Update variants
log.info(
f"Annotation - Updating [{nb_command}/{len(commands)}]..."
Expand Down Expand Up @@ -3555,17 +3576,6 @@ def annotation_bcftools(self, threads: int = None) -> None:
+ str(annotation_fields)
)

# Create file for field rename
log.debug("Create file for field rename")
tmp_rename = NamedTemporaryFile(
prefix=self.get_prefix(),
dir=self.get_tmp_dir(),
suffix=".rename",
delete=False,
)
tmp_rename_name = tmp_rename.name
tmp_files.append(tmp_rename_name)

# Number of fields
nb_annotation_field = 0
annotation_list = []
Expand Down Expand Up @@ -3623,7 +3633,13 @@ def annotation_bcftools(self, threads: int = None) -> None:
)
)

annotation_list.append(annotation_field)
# annotation_list.append(annotation_field)
if annotation_field != annotation_fields_new_name:
annotation_list.append(
f"{annotation_fields_new_name}:=INFO/{annotation_field}"
)
else:
annotation_list.append(annotation_field)

nb_annotation_field += 1

Expand Down Expand Up @@ -3674,14 +3690,6 @@ def annotation_bcftools(self, threads: int = None) -> None:

log.debug("Chromosomes found: " + str(list(chomosomes_list)))

# Add rename info
run_parallel_commands(
[
f"echo 'INFO/{annotation_field} {annotation_fields_new_name}' >> {tmp_rename_name}"
],
1,
)

# BED columns in the annotation file
if db_file_type in ["bed"]:
annotation_infos = "CHROM,POS,POS," + annotation_infos
Expand Down Expand Up @@ -3748,7 +3756,7 @@ def annotation_bcftools(self, threads: int = None) -> None:
)

# Command
command_annotate = f"{bcftools_bin_command} annotate --pair-logic exact --regions-file={tmp_bed_name} -a {db_file} -h {tmp_header_vcf_name} -c {annotation_infos} --rename-annots={tmp_rename_name} {tmp_vcf_name} -o {tmp_annotation_vcf_name} -Oz 2>>{tmp_annotation_vcf_name_err} && tabix {tmp_annotation_vcf_name} 2>>{tmp_annotation_vcf_name_err} "
command_annotate = f"{bcftools_bin_command} annotate --pair-logic exact --regions-file={tmp_bed_name} -a {db_file} -h {tmp_header_vcf_name} -c {annotation_infos} {tmp_vcf_name} -o {tmp_annotation_vcf_name} -Oz1 2>>{tmp_annotation_vcf_name_err} && tabix {tmp_annotation_vcf_name} 2>>{tmp_annotation_vcf_name_err} "

# Add command
commands.append(command_annotate)
Expand Down
44 changes: 44 additions & 0 deletions tests/test_commons.py
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,50 @@
from test_needed import *


def test_help():

# Init
help_json_file = os.path.join(folder_main, "docs", "json", "help.param.json")

# Help JSON to MD
help_content_md = help_generation_from_json(
help_json_file=help_json_file,
output_type="markdown",
title="test",
code_type="json",
)
assert help_content_md != ""

# Help JSON to HTML
help_content_html = help_generation_from_json(
help_json_file=help_json_file,
output_type="html",
title="test",
code_type="json",
)
assert help_content_html != ""

# Help
from howard.tools.tools import arguments, commands_arguments, shared_arguments

setup_cfg = f"{main_folder}/../../setup.cfg"
arguments_dict = {
"arguments": arguments,
"commands_arguments": commands_arguments,
"shared_arguments": shared_arguments,
}
help_content = help_generation(
arguments_dict=arguments_dict, setup=setup_cfg, output_type="markdown"
)
assert help_content != ""

# Gooey argument
argument_gooey = get_argument_gooey(
arguments=arguments, arg=list(arguments.keys())[0]
)
assert argument_gooey != ""


def test_identical_with_identical_files():
with tempfile.NamedTemporaryFile(
mode="w", delete=False
Expand Down
44 changes: 44 additions & 0 deletions tests/test_variants_annotations_bcftools.py
Original file line number Diff line number Diff line change
Expand Up @@ -70,6 +70,50 @@ def test_annotation_bcftools():
assert False


def test_annotation_bcftools_rename_field():
"""
This function tests the annotation of a VCF file using bcftools annotations.
"""

with TemporaryDirectory(dir=tests_folder) as tmp_dir:

# Init files
input_vcf = tests_data_folder + "/example.vcf.gz"
annotation_parquet = os.path.join(tests_annotations_folder, "nci60.vcf.gz")
output_vcf = f"{tmp_dir}/output.vcf.gz"

# Construct param dict
param = {
"annotation": {
"bcftools": {"annotations": {annotation_parquet: {"nci60": "nci61"}}}
}
}

# Create object
variants = Variants(
conn=None, input=input_vcf, output=output_vcf, param=param, load=True
)

# Remove if output file exists
remove_if_exists([output_vcf])

# Annotation
variants.annotation()

# query annotated variant
result = variants.get_query_to_df(
"""SELECT 1 AS count FROM variants WHERE "#CHROM" = 'chr7' AND POS = 55249063 AND REF = 'G' AND ALT = 'A' AND INFO = 'DP=125;nci61=0.66'"""
)
assert len(result) == 1

# Check if VCF is in correct format with pyVCF
variants.export_output()
try:
vcf.Reader(filename=output_vcf)
except:
assert False


def test_annotation_bcftools_bed():
"""
This function tests the annotation of a VCF file using bcftools and a bed file.
Expand Down
Loading

0 comments on commit c3ee6f7

Please sign in to comment.