Skip to content

predict-liftover-stability.R step not working #3

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
nwiltsie opened this issue Jul 11, 2024 · 4 comments
Closed

predict-liftover-stability.R step not working #3

nwiltsie opened this issue Jul 11, 2024 · 4 comments
Assignees

Comments

@nwiltsie
Copy link
Member

I have an almost complete pipeline in the nwiltsie-bootstrap branch, but something is going wrong with predict-liftover-stability.R:

[1] 14030    21

Predicting liftover stability with RF-train_Mutect2_ntree100_nodesize5_classratio0.Rds
Error in predict.ranger.forest(forest, data, predict.all, num.trees, type,  :
  Error: One or more independent variables not found in data.
Calls: predict -> predict.ranger -> predict -> predict.ranger.forest
Execution halted

@nkwang24, can you help me debug?

Details

I have an NFTest case set up using the inputs from the original script:

rf_model = "/hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/sSNV/stableLift/train_CPCG-40QC_Mutect2/RF-train_Mutect2_ntree100_nodesize5_classratio0.Rds"
// Reference files
funcotator_data {
data_source = "/hot/ref/tool-specific-input/Funcotator/somatic/funcotator_dataSources.v1.7.20200521s"
src_reference_id = "hg19"
dest_reference_id = "hg38"
}
// The source reference sequence (FASTA)
src_fasta_ref = "/hot/ref/reference/GRCh37-EBI-hs37d5/hs37d5.fa"
// The destination reference sequence (FASTA)
dest_fasta_ref = "/hot/ref/reference/GRCh38-BI-20160721/Homo_sapiens_assembly38.fasta"
chain_file = "/hot/resource/genomics/liftover_chain_files/hg19ToHg38.over.chain"
repeat_bed = "/hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/intervals/GRCh38_RepeatMasker_intervals.bed"

---
sample_id: ExampleID
input:
vcf: /hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/validation/TCGA-SARC_WGS/GRCh37/Mutect2/TCGA-SARC_WGS_Mutect2_merge.vcf.gz

That successfully ran and produced the following directory tree:

$ tree /hot/software/pipeline/pipeline-StableLift/Nextflow/development/unreleased/nwiltsie-bootstrap \
  -I '*begin|*out|*sh|*trace|*run|*err'
/hot/software/pipeline/pipeline-StableLift/Nextflow/development/unreleased/nwiltsie-bootstrap
├── Example-test
│   └── pipeline-StableLift-0.1.0
│       └── ExampleID
│           ├── bcftools-branch-nwiltsie-bootstrap
│           │   └── intermediate
│           │       ├── apply_annotations:run_repeatmasker
│           │       │   └── RepeatMasker-ExampleID.vcf.gz
│           │       ├── apply_annotations:run_trinucleotide_annotate
│           │       │   └── Trinucleotide-ExampleID.vcf.gz
│           │       └── raw_liftover
│           │           ├── LiftOver-ExampleID-hg19-to-hg38.vcf.gz
│           │           └── LiftOver-ExampleID-hg19-to-hg38.vcf.gz.tbi
│           ├── bedtools-2.31.0
│           │   └── intermediate
│           │       └── apply_annotations:run_trinucleotide_context
│           │           ├── Trinucleotide-ExampleID.bed
│           │           └── Trinucleotide-ExampleID-full.tsv
│           ├── GATK-4.2.4.1
│           │   └── intermediate
│           │       └── run_funcotator
│           │           └── Funcotator-ExampleID-hg38.vcf.gz
│           ├── log-pipeline-StableLift-0.1.0-20240710T205201Z
│           │   ├── nextflow-log
│           │   │   ├── params.json
│           │   │   ├── report.html
│           │   │   ├── timeline.html
│           │   │   └── trace.txt
│           │   └── process-log
│           │       ├── apply_annotations
│           │       │   ├── run_compress_and_index
│           │       │   │   └── log.command.log
│           │       │   ├── run_repeatmasker
│           │       │   │   └── log.command.log
│           │       │   ├── run_trinucleotide_annotate
│           │       │   │   └── log.command.log
│           │       │   └── run_trinucleotide_context
│           │       │       └── log.command.log
│           │       ├── extract_features
│           │       │   ├── predict_variant_stability
│           │       │   │   └── log.command.log
│           │       │   └── run_extract_vcf_features
│           │       │       └── log.command.log
│           │       ├── raw_liftover
│           │       │   └── log.command.log
│           │       ├── run_funcotator
│           │       │   └── log.command.log
│           │       └── run_validate_PipeVal_with_metadata
│           │           ├── run_validate_PipeVal_with_metadata-1
│           │           │   └── log.command.log
│           │           └── run_validate_PipeVal_with_metadata-2
│           │               └── log.command.log
│           ├── samtools-branch-nwiltsie-bootstrap
│           │   └── intermediate
│           │       └── apply_annotations:run_compress_and_index
│           │           ├── output.tsv.gz
│           │           └── output.tsv.gz.tbi
│           ├── stablelift-branch-nwiltsie-bootstrap
│           │   └── intermediate
│           │       └── extract_features:run_extract_vcf_features
│           │           └── stablelift-ExampleID.Rds
│           └── validation
│               └── input_validation.txt
├── log-nftest-20240710T205036Z.log
├── log-nftest-20240710T205118Z.log
└── log-nftest-20240710T205153Z.log

37 directories, 28 files

The offending process log is in /hot/software/pipeline/pipeline-StableLift/Nextflow/development/unreleased/nwiltsie-bootstrap/Example-test/pipeline-StableLift-0.1.0/ExampleID/log-pipeline-StableLift-0.1.0-20240710T205201Z/process-log/extract_features/predict_variant_stability/log.command.log

Here is the command it ran:

$ cat Example-test/pipeline-StableLift-0.1.0/ExampleID/log-pipeline-StableLift-0.1.0-20240710T205201Z/process-log/extract_features/predict_variant_stability/log.command.sh
#!/bin/bash -ue
Rscript "/hot/code/nwiltsie/pipelines/pipeline-StableLift/./module/scripts/predict-liftover-stability.R"         --features-dt "features.Rds"         --rf-model "RF-train_Mutect2_ntree100_nodesize5_classratio0.Rds"         --variant-caller "Mutect2"         --output-tsv "stablelift-ExampleID.tsv"

I did make changes to that script compared to your original script - see 8f7530d and 9bfd132. For simplicity I was trying to separate all the computation from the plotting, and StableLift.sh did not pass the --discordance-file, so I stripped all of that code.

I also can't confirm that the prior steps are doing the "right" things - they run to completion but I don't have the know-how to verify the outputs. Here is a table matching filenames between your /hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/sSNV/stableLift/validate_TCGA-SARC_WGS_Mutect2 directory and the NFTest directory /hot/software/pipeline/pipeline-StableLift/Nextflow/development/unreleased/nwiltsie-bootstrap/Example-test/pipeline-StableLift-0.1.0/ExampleID:

Original Mine
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38.vcf.gz ./bcftools-branch-nwiltsie-bootstrap/intermediate/raw_liftover/LiftOver-ExampleID-hg19-to-hg38.vcf.gz
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38.vcf.gz.tbi ./bcftools-branch-nwiltsie-bootstrap/intermediate/raw_liftover/LiftOver-ExampleID-hg19-to-hg38.vcf.gz.tbi
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_Funcotator.vcf.gz ./GATK-4.2.4.1/intermediate/run_funcotator/Funcotator-ExampleID-hg38.vcf.gz
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_Funcotator.vcf.gz.tbi I'm not producing this file - maybe that's an issue?
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_RepeatMasker.vcf.gz ./bcftools-branch-nwiltsie-bootstrap/intermediate/apply_annotations:run_repeatmasker/RepeatMasker-ExampleID.vcf.gz
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_Trinucleotide.bed ./bedtools-2.31.0/intermediate/apply_annotations:run_trinucleotide_context/Trinucleotide-ExampleID.bed
Compressed into the next file ./bedtools-2.31.0/intermediate/apply_annotations:run_trinucleotide_context/Trinucleotide-ExampleID-full.tsv
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_Trinucleotide.tsv.gz ./samtools-branch-nwiltsie-bootstrap/intermediate/apply_annotations:run_compress_and_index/output.tsv.gz
TCGA-SARC_WGS_Mutect2_StableLift-GRCh38_stability-scores.tsv.gz.tbi ./samtools-branch-nwiltsie-bootstrap/intermediate/apply_annotations:run_compress_and_index/output.tsv.gz.tbi
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_annotated.vcf.gz ./bcftools-branch-nwiltsie-bootstrap/intermediate/apply_annotations:run_trinucleotide_annotate/Trinucleotide-ExampleID.vcf.gz
TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_annotated.Rds ./stablelift-branch-nwiltsie-bootstrap/intermediate/extract_features:run_extract_vcf_features/stablelift-ExampleID.Rds
TCGA-SARC_WGS_Mutect2_StableLift-GRCh38_stability-prediction.Rds This is the process at issue!
@nkwang24
Copy link
Collaborator

Hi, sorry let me provide you with an updated model that should clear this.

@nkwang24
Copy link
Collaborator

Try with this:

/hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/sSNV/stableLift/train_CPCG-40QC_Mutect2/RF-train_Mutect2_ntree2000_nodesize5_classratio0.Rds

I just tested this command so should be good to go!

Rscript /hot/code/nkwang/GitHub/uclahs-cds/project-method-AlgorithmEvaluation-BNCH-000142-GRCh37v38/report/manuscript/publish/predict-liftover-stability.R \
    --variant-caller Mutect2 \
    --features-dt /hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/sSNV/stableLift/validate_TCGA-SARC_WGS_Mutect2/TCGA-SARC_WGS_Mutect2_LiftOver-GRCh38_annotated.Rds \
    --rf-model /hot/project/method/AlgorithmEvaluation/BNCH-000142-GRCh37v38/sSNV/stableLift/train_CPCG-40QC_Mutect2/RF-train_Mutect2_ntree2000_nodesize5_classratio0.Rds

@nkwang24
Copy link
Collaborator

You might need to sync up your version of the predict-liftover-stability.R script with mine.

@nwiltsie
Copy link
Member Author

nwiltsie commented Jul 11, 2024

That solved it, thanks! Now it's failing because I excluded ROCR from the image... I suppose I'll give up and let that back in.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants