Problems running ASimulatoR from within DICAST #2

akahles opened this issue Jan 7, 2022 · 14 comments



akahles commented Jan 7, 2022

Dear DICAST team,

After some setup issues (outlined in #1), I was able to start the GUI successfully. I chose to simulate reads with ASimulatoR and am currently stuck at the following error message:

Loading required package: GenomeInfoDb
Loading required package: polyester
Loading required package: pbmcapply
found the following fasta files: 1.fa, 10.fa, 11.fa, 12.fa, 13.fa, 14.fa, 15.fa, 16.fa, 17.fa, 18.fa, 19.fa, 2.fa, 20.fa, 21.fa, 22.fa, 3.fa, 4.fa, 5.fa, 6.fa, 7.fa, 8.fa, 9.fa, Homo_sapiens.GRCh38.dna.primary_assembly.fa, MT.fa, placeholder.fa, X.fa, Y.fa
note that splice variants will only be constructed from chromosomes that have a corresponding fasta file

set data.table threads to 8
loading superset...
finished loading superset

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’
Calls: ... lapply -> FUN -> <Anonymous> -> <Anonymous> -> <Anonymous>
In addition: Warning message:
In .check_input_dir(input_dir) :
  found more than one gtf/gff file in input directory. using /input/Homo_sapiens.GRCh38.105.gtf...
Execution halted
mv: rename /opt/DICAST/src/ASimulatoR/out/*gtf to input/ASimulatoR.gtf: No such file or directory
mv: rename /opt/DICAST/src/ASimulatoR/out/*gff3 to input/ASimulatoR.gff3: No such file or directory
ls: /opt/DICAST/src/ASimulatoR/out/*fastq: No such file or directory

As a setup procedure, I used the script to populate the input structure and unzipped its contents. Then I started the GUI, selected the input directory, acknowledged possible overwrites and ticked the box for "Do you want to run ASimulatoR?".

Let me know if you need additional info from my side.



@amitfenn amitfenn self-assigned this Feb 2, 2022

amitfenn commented Feb 3, 2022

Hi Andre,

Pardon the long wait. This is my first public repository on git and my first issue, and somehow I missed linking the notifications to my email. :) Welcome to DICAST's git.
Can I ask whether you changed the file scripts/ASimulatoR_config.R? If so, could you share this file so that I can try to reproduce your bug exactly?

For now, I suspect that you've got max_genes = "NULL", which gives me a similar bug. Can I ask that you limit max_genes to a value such as 100 or 15000 and try running it again? If it still doesn't work, I'd also try removing all files matching src/ASimulatoR/in/*.rda.

Meanwhile I'll reach out to ASimulatoR guys about this bug.

Thanks for your inputs so far. I hope we can get ASimulatoR running for you soon.
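
The suggestion above to remove cached files matching src/ASimulatoR/in/*.rda could be scripted roughly as follows. This is only a sketch: the function name `clear_superset_cache` is made up for illustration, and it assumes that the cached exon superset is the only kind of `.rda` file kept in that directory.

```shell
# Hypothetical helper (not part of DICAST): delete cached *.rda files so that
# the exon superset has to be rebuilt on the next ASimulatoR run.
clear_superset_cache() {
  indir=$1
  # Only touch regular files directly inside the input directory,
  # and print each file name as it is removed.
  find "$indir" -maxdepth 1 -type f -name '*.rda' -print -delete
}

# Example call from the DICAST working directory:
#   clear_superset_cache src/ASimulatoR/in
```

Printing the removed file names makes it easy to confirm that only the superset cache was deleted and none of the fasta or gtf inputs.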


akahles commented Feb 4, 2022

Hi Amit,

thanks for your reply and no worries about the delay. I had a look and I made no modifications to scripts/ASimulatoR_config.R. I also verified that the current setting in the file is max_genes = 100.

I tried your suggestion to remove the files src/ASimulatoR/in/*.rda, but the run failed again. Here is the log again. The only difference is that it re-created the superset this time.

rule run_asimulator_rule:
    input: output/snakemake/log_pulled_base_os.txt
    output: output/snakemake/log_ran_asimulator.txt
    jobid: 5
    resources: tmpdir=/var/folders/17/yzmf8ktd6m5c05vzf7k59y0c0000gn/T

./src/ASimulatoR/ line 14: /opt/DICAST/scripts/ No such file or directory
ln: /opt/DICAST/src/ASimulatoR/in/1.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/10.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/11.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/12.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/13.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/14.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/15.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/16.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/17.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/18.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/19.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/2.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/20.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/21.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/22.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/3.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/4.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/5.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/6.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/7.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/8.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/9.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/MT.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/X.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/Y.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/Homo_sapiens.GRCh38.dna.primary_assembly.fa: File exists
ln: /opt/DICAST/src/ASimulatoR/in/Homo_sapiens.GRCh38.105.gtf: File exists
Loading required package: data.table
Loading required package: rtracklayer
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append,, basename, cbind, colnames,
    dirname,, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax,, pmin,, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:data.table’:

    first, second

The following object is masked from ‘package:base’:


Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:data.table’:


Loading required package: GenomeInfoDb
Loading required package: polyester
Loading required package: pbmcapply
found the following fasta files: 1.fa, 10.fa, 11.fa, 12.fa, 13.fa, 14.fa, 15.fa, 16.fa, 17.fa, 18.fa, 19.fa, 2.fa, 20.fa, 21.fa, 22.fa, 3.fa, 4.fa, 5.fa, 6.fa, 7.fa, 8.fa, 9.fa, Homo_sapiens.GRCh38.dna.primary_assembly.fa, MT.fa, placeholder.fa, X.fa, Y.fa
note that splice variants will only be constructed from chromosomes that have a corresponding fasta file

set data.table threads to 8
importing gtf/gff...
finished importing gtf/gff

creating superset...
finished creating superset

saving superset...
finished saving superset

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’
Calls: ... lapply -> FUN -> <Anonymous> -> <Anonymous> -> <Anonymous>
In addition: Warning messages:
1: In .check_input_dir(input_dir) :
  found more than one gtf/gff file in input directory. using /input/Homo_sapiens.GRCh38.105.gtf...
2: In mclapply(X, FUN, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule,  :
  scheduled cores 1, 12, 17, 18, 19, 20 did not deliver results, all values of the jobs will be affected
Execution halted
mv: rename /opt/DICAST/src/ASimulatoR/out/*gtf to input/ASimulatoR.gtf: No such file or directory
mv: rename /opt/DICAST/src/ASimulatoR/out/*gff3 to input/ASimulatoR.gff3: No such file or directory
ls: /opt/DICAST/src/ASimulatoR/out/*fastq: No such file or directory
Removing temporary output file output/snakemake/log_pulled_base_os.txt.

Please let me know if any other information is required from my end.

Lastly, one more question. Is the simulation data you used for the evaluations in your DICAST preprint publicly available?

Thanks and Cheers,



amitfenn commented Feb 4, 2022

Hi Andre,

Sorry, we haven't made the data public because it is several hundred GB in size. If you can bear with us a little longer, ASimulatoR will let you generate many more datasets to try DICAST with. Furthermore, the author of ASimulatoR will join us by next week if we haven't solved this by then.

Can I ask for the outputs of ls -lah src/ASimulatoR/in and cat scripts/ ?

Thanks in advance.


amitfenn commented Feb 4, 2022

Also, if it's faster, I would ask you to install ASimulatoR directly in your R environment.

Unfortunately, I still haven't been able to replicate your error message. Funny, we were hoping that Docker on Linux and Mac would behave the same.


akahles commented Feb 4, 2022

Hi Amit,

As requested, here is the output of ls -lah src/ASimulatoR/in:

(dicast-snakemake) akahles@host:/opt/DICAST$ ls -lah src/ASimulatoR/in
total 14959056
drwxrwxrwx  33 root     wheel   1.0K Feb  4 11:10 .
drwxrwxrwx  18 root     wheel   576B Jan  7 15:44 ..
-rw-r--r--   2 akahles  wheel   241M Jan  7 14:54 1.fa
-rw-r--r--   2 akahles  wheel   130M Jan  7 14:54 10.fa
-rw-r--r--   2 akahles  wheel   131M Jan  7 14:54 11.fa
-rw-r--r--   2 akahles  wheel   129M Jan  7 14:54 12.fa
-rw-r--r--   2 akahles  wheel   111M Jan  7 14:54 13.fa
-rw-r--r--   2 akahles  wheel   104M Jan  7 14:54 14.fa
-rw-r--r--   2 akahles  wheel    99M Jan  7 14:54 15.fa
-rw-r--r--   2 akahles  wheel    88M Jan  7 14:54 16.fa
-rw-r--r--   2 akahles  wheel    81M Jan  7 14:54 17.fa
-rw-r--r--   2 akahles  wheel    78M Jan  7 14:54 18.fa
-rw-r--r--   2 akahles  wheel    57M Jan  7 14:54 19.fa
-rw-r--r--   2 akahles  wheel   235M Jan  7 14:54 2.fa
-rw-r--r--   2 akahles  wheel    62M Jan  7 14:54 20.fa
-rw-r--r--   2 akahles  wheel    45M Jan  7 14:54 21.fa
-rw-r--r--   2 akahles  wheel    49M Jan  7 14:54 22.fa
-rw-r--r--   2 akahles  wheel   192M Jan  7 14:54 3.fa
-rw-r--r--   2 akahles  wheel   184M Jan  7 14:54 4.fa
-rw-r--r--   2 akahles  wheel   176M Jan  7 14:54 5.fa
-rw-r--r--   2 akahles  wheel   166M Jan  7 14:54 6.fa
-rw-r--r--   2 akahles  wheel   154M Jan  7 14:54 7.fa
-rw-r--r--   2 akahles  wheel   141M Jan  7 14:54 8.fa
-rw-r--r--   2 akahles  wheel   134M Jan  7 14:54 9.fa
-rw-r--r--   2 akahles  wheel   1.3G Jan  7 14:22 Homo_sapiens.GRCh38.105.gtf
-rw-r--r--   1 akahles  wheel   5.9M Feb  4 11:10 Homo_sapiens.GRCh38.105.gtf.exon_superset.rda
-rw-r--r--   2 akahles  wheel   2.9G Jan  7 14:21 Homo_sapiens.GRCh38.dna.primary_assembly.fa
-rw-r--r--   2 akahles  wheel    17K Jan  7 14:54 MT.fa
-rw-r--r--   2 akahles  wheel   151M Jan  7 14:54 X.fa
-rw-r--r--   2 akahles  wheel    55M Jan  7 14:54 Y.fa
-rwxr-xr-x   1 akahles  wheel     0B Jan  7 14:58 placeholder.fa
-rwxrwxrwx   1 root     wheel     0B Jan  7 09:24 placeholder.gtf
-rwxr-xr-x   1 akahles  wheel   1.3K Feb  4 11:04 runASS.R


cat scripts/
#     Basic Parameters     #

ncores=4                                                	#number of cores or threads the tool will use
workdir=/MOUNT			                         	#name of the base directory inside the Docker
outdir=$workdir/output/${tool:-unspecific}-output       	#name of the output directory; will be named after the specific tool that was used
read_length=100                                          	#length of reads inside fastq files
#     Input Directories     #

controlfolder=$inputdir/controldir         			#base directory for all needed input files (when no differential comparison, control inputs when differential AS Event Detection)
casefolder=$inputdir/casedir					#base directory for only case files (for AS Event detection)
fastqdir=$controlfolder/fastqdir       				#directory for fastqfiles
bamdir=$controlfolder/bamdir           				#directory for bamfiles
samdir=$controlfolder/bamdir           				#directory for samfiles
fastadir=$inputdir              				#directory for fastafile (might vary for specific tools -> see mapping or as-specific config file)
gtfdir=$inputdir                				#directory for gtffile
gffdir=$inputdir                				#directory for gfffile

#     Input Parameters     #

asimulator_gtf=Homo_sapiens.GRCh38.105.gtf			#name of the GTF file used to generate simulated data within ASimulatoR R library.
fastaname=Homo_sapiens.GRCh38.dna.primary_assembly.fa           #name of the genome reference file (fasta format), directory=$fastadir
gtfname=ASimulatoR.gtf        	       	                #name of gtf reference file, directory=$gtffile; set to ASimulatoR_gtf.gtf, when ASimulator is true
gffname=ASimulatoR.gff3					#set to ASimulatoR_gff.gff3, when ASimulator is true

fasta=${fastadir}/$fastaname                        #fasta full path
gtf=${gtfdir}/$gtfname                              #gtf full path
gff=${gffdir}/$gffname                              #gff full path

#    Mapping tool Parameters    #
### used only in mapping tools ###

outname=$tool	# basename of output file (will usually be prefixed with the fastq file name and suffixed with .sam)

#     Index     #

recompute_index=false						#force index to be computed even if index with $indexname already exists
indexname=${fastaname}_index					#basename of index (without eg. .1.bt2 for bowtie index)
star_index=$workdir/index/star_index                            #folder containing a star index built with the $gtf and $fasta files (used by: IRFinder, KisSplice, rMATS)
indexdir=$workdir/index/${tool:-unspecific}_index 		#directory of index

I can also give ASimulatoR a try directly. I assume I could then still use the datasets within DICAST, as long as they are stored in the pre-defined structure?



amitfenn commented Feb 9, 2022

Hi Andre,

Thank you for your patience. I'd ask you to try a quick hack for me, so that I can tell whether this bug comes from something funny DICAST does or from something I should talk to the authors of ASimulatoR about.

Can you please put the code below into a quick bash script in a new directory of your choice and see if it works? It runs ASimulatoR with the same configuration as the default run on DICAST and re-downloads the essential files needed for ASimulatoR.

This is the minimal code needed to run ASimulatoR independently.


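# Put -p before the directory names so this also works with BSD/macOS mkdir.
mkdir -p ASimulator/{in,out}

# Downloading human references from Ensembl's FTP.
# NOTE: $link is not set in this snippet; it has to hold the base URL of the
# FTP directory that lists the per-chromosome fasta files.

# Downloading bowtie genome fastas for each chromosome.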
for chromosomes in $(curl $link | cut -d ' ' -f2 | cut -d '"' -f3 | grep -v "nonchromosomal\|primary\|toplevel\|dna_\|alt" | grep Homo_sapiens|sed 's/...>//g'| tr -d '>'); do echo Downloading $chromosomes chromosome; curl -o ASimulator/in/$(echo $chromosomes | cut -d '.' -f 5-) $link$chromosomes; done

# Downloading the gtf
curl -o ASimulator/in/Homo_sapiens.GRCh38.105.gtf.gz
gzip -d ASimulator/in/Homo_sapiens.GRCh38.105.gtf.gz
gzip -d ASimulator/in/*fa.gz

# Modify this file after download, to customize your dataset
curl -o ASimulator/in/runASS.R

# Command to run ASimulatoR through the official docker.
docker run --rm --name $USER-$RANDOM-dicast-$tool --user $(id -u):$(id -g) -v $(pwd)/ASimulator/in:/input -v $(pwd)/ASimulator/out:/output biomedbigdata/asimulator

Afterwards, please copy the .fastq files from the newly created ASimulator/out/ to <DICAST-working-dir>/input/controldir/fastqdir/ so that the rest of DICAST can evaluate them.

Furthermore, please copy ASimulator/out/event_annotation.tsv from this run to <DICAST-working-dir>/src/ASimulatoR/out/event_annotation.tsv.

Everything else should work fine.
I hope this gives you the files needed to run ASimulatoR and the rest of DICAST.
Let me know how it goes, this should give me a lot more clues to try and narrow down the bug from my side.
Thanking you in advance.
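
The two copy steps described above could be wrapped in a small function like the one below. This is a sketch under assumptions: the function name `stage_asimulator_output` is hypothetical, the first argument is the output directory of the standalone run, and the second is the DICAST working directory.

```shell
# Hypothetical helper (not part of DICAST): copy the results of a standalone
# ASimulatoR run into the directory layout described above.
#   $1 = output directory of the standalone run (e.g. ASimulator/out)
#   $2 = DICAST working directory
stage_asimulator_output() {
  sim_out=$1
  dicast_dir=$2
  # Create the target directories in case they do not exist yet.
  mkdir -p "$dicast_dir/input/controldir/fastqdir" \
           "$dicast_dir/src/ASimulatoR/out" || return 1
  # Simulated reads go to the fastq input directory.
  cp "$sim_out"/*.fastq "$dicast_dir/input/controldir/fastqdir/" || return 1
  # The event annotation goes next to the other ASimulatoR outputs.
  cp "$sim_out/event_annotation.tsv" "$dicast_dir/src/ASimulatoR/out/" || return 1
}

# Example call, with the DICAST working directory given as a placeholder:
#   stage_asimulator_output ASimulator/out <DICAST-working-dir>
```

The function returns a non-zero status as soon as one copy fails, for example when the simulation produced no .fastq files.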


akahles commented Feb 9, 2022

Hi Amit,

thanks for posting the script. I gave it a try, but got the same error in the end. I'll skip the output of the download section at the beginning and only paste the log that follows:

Loading required package: data.table
Loading required package: rtracklayer
Loading required package: GenomicRanges
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: ‘BiocGenerics’

The following objects are masked from ‘package:parallel’:

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from ‘package:stats’:

    IQR, mad, sd, var, xtabs

The following objects are masked from ‘package:base’:

    anyDuplicated, append,, basename, cbind, colnames,
    dirname,, duplicated, eval, evalq, Filter, Find, get, grep,
    grepl, intersect, is.unsorted, lapply, Map, mapply, match, mget,
    order, paste, pmax,, pmin,, Position, rank,
    rbind, Reduce, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors

Attaching package: ‘S4Vectors’

The following objects are masked from ‘package:data.table’:

    first, second

The following object is masked from ‘package:base’:


Loading required package: IRanges

Attaching package: ‘IRanges’

The following object is masked from ‘package:data.table’:


Loading required package: GenomeInfoDb
Loading required package: polyester
Loading required package: pbmcapply
found the following fasta files: 1.fa, 10.fa, 11.fa, 12.fa, 13.fa, 14.fa, 15.fa, 16.fa, 17.fa, 18.fa, 19.fa, 2.fa, 20.fa, 21.fa, 22.fa, 3.fa, 4.fa, 5.fa, 6.fa, 7.fa, 8.fa, 9.fa, MT.fa, X.fa, Y.fa
note that splice variants will only be constructed from chromosomes that have a corresponding fasta file

set data.table threads to 8
importing gtf/gff...
finished importing gtf/gff

creating superset...
finished creating superset

saving superset...
finished saving superset

Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’
Calls: ... lapply -> FUN -> <Anonymous> -> <Anonymous> -> <Anonymous>
In addition: Warning message:
In mclapply(X, FUN, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule,  :
  scheduled cores 9, 16, 17, 18, 19, 20 did not deliver results, all values of the jobs will be affected
Execution halted

I did not make any changes to the script that you posted above. Let me know if I should have.

Thanks and Cheers,



amitfenn commented Feb 9, 2022

Thank you Andre, for your quick response.
No, there were no changes to the previous script needed.

Are you running this in a Mac environment? Can we have the specs of your machine and of the OS? Do you have access to a Linux machine you could use? If not, we can try to figure out how to transfer the data we generated with ASimulatoR, for transparency's sake.

Unfortunately this might be where we learn that ASimulatoR doesn't run on Mac, and maybe DICAST doesn't either :(.
I'll wait to hear back from ASimulatoR's author.


akahles commented Feb 9, 2022

Hi Amit,

I am running on a Mac with the following setup:

  • MacBook Pro 16 inch 2019; 2.3 GHz 8-Core Intel Core i9; 32 GB 2667 MHz DDR4
  • macOS Monterey 12.2 (21D49)
  • Docker version 20.10.12, build e91ed57

Let me know if you need any more details.

Happy to try it on a Linux machine. I will let you know how it goes.

Just out of curiosity, would the setup also run with Singularity instead of Docker?

Thanks and Cheers,



amitfenn commented Feb 9, 2022

Thanks Andre,

This is perfect. We do plan to support Singularity soon, but unfortunately this is still future work. We wanted to start with Docker and then port the Docker images to Singularity images. Stay tuned to this repo for further news.

Thanks for your support so far.


Hi Andre,

I finally found some time to look at this issue, and the only thing I can identify is that the exon_superset file somehow gets corrupted. You have helped us a lot already, but might I ask you to try the attached superset instead of the one you generated? You'll have to rename it to Homo_sapiens.GRCh38.105.gtf.exon_superset.rda because GitHub doesn't allow that file extension.

Thank you in advance.


Hi Quirin,
I received the following error message when I tried to run ASimulatoR.

finished loading superset

assign variants to supersets...
create splicing variants and annotation. This may take a while...
finished creating splicing variants and annotation

exporting gtf for read simulation...
Error in file(file, ifelse(append, "a", "w")) :
cannot open the connection
Calls: ... -> export -> export -> .local -> cat -> file
In addition: Warning messages:
1: In mclapply(X, FUN, ..., mc.cores = mc.cores, mc.preschedule = mc.preschedule, :
scheduled cores 2, 3, 4, 10, 11, 18 did not deliver results, all values of the jobs will be affected
2: In file(file, ifelse(append, "a", "w")) :
cannot open file '/output//splicing_variants.gtf': Permission denied
Execution halted
mv: cannot stat '/mnt/301e9812-51b9-4539-a38f-225aa84a1d0b/DICAST/DICAST/src/ASimulatoR/out/*gtf': No such file or directory
mv: cannot stat '/mnt/301e9812-51b9-4539-a38f-225aa84a1d0b/DICAST/DICAST/src/ASimulatoR/out/*gff3': No such file or directory
ls: cannot access '/mnt/301e9812-51b9-4539-a38f-225aa84a1d0b/DICAST/DICAST/src/ASimulatoR/out/*fastq': No such file or directory
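
The `Permission denied` warning above refers to writing `/output//splicing_variants.gtf` inside the container. One possible cause, which is an assumption and not confirmed in this thread, is that the host directory mounted as `/output` is not writable by the user the container runs as. A minimal pre-flight check might look like this; the function name `check_output_writable` is hypothetical.

```shell
# Hypothetical pre-flight check (not part of DICAST or ASimulatoR): report
# whether the current user can write to the host directory that will be
# mounted as /output in the container.
check_output_writable() {
  outdir=$1
  if [ -d "$outdir" ] && [ -w "$outdir" ]; then
    echo "writable: $outdir"
  else
    # Missing directory or missing write permission for the current user.
    echo "not writable or missing: $outdir" >&2
    return 1
  fi
}

# Example call from the DICAST working directory:
#   check_output_writable src/ASimulatoR/out
```

Running the check as the same user that later starts the container matters here, because the docker command shown earlier in this thread passes the invoking user's uid and gid via `--user`.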


Hi @NormanRog,
is this related to this issue?
If so, did you use the script posted above?
If not, please open another issue, in the ASimulatoR repository, with more information on how you called the function.
I will probably only find time to look at this at the end of the week, so I would appreciate it if you could provide as much information as possible.



After looking into this more deeply, this looks like a memory issue because of too many processes being spawned.
ncores seems to be set to 20 although only 8 were available.

set data.table threads to 8


scheduled cores 9, 16, 17, 18, 19, 20 did not deliver results, all values of the jobs will be affected

Forking 20 processes probably took too much memory, which led to some of them being killed and not delivering results in both steps: creating the superset and/or the variants. A corrupted superset will lead to the error "unable to find an inherited method for function ‘seqnames’ for signature ‘"NULL"’".

I added a few lines to ASimulatoR for better documentation and limited ncores to the number of available cores. This might still not be enough. I would recommend monitoring the memory usage while simulating.

Hope this helps.
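
The idea of capping the requested core count at the number of available cores can also be sketched on the calling side. This is an illustration only, not the change made in ASimulatoR: the function name `clamp_cores` is hypothetical, and the core count is detected with `nproc` on Linux or `sysctl -n hw.ncpu` on macOS, falling back to 1.

```shell
# Hypothetical helper (not the actual ASimulatoR change): never use more
# cores than are available.
#   $1 = requested number of cores
#   $2 = available number of cores
clamp_cores() {
  requested=$1
  available=$2
  if [ "$requested" -gt "$available" ]; then
    echo "$available"
  else
    echo "$requested"
  fi
}

# Detect the available cores: nproc on Linux, sysctl on macOS, else assume 1.
available=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1)
# 20 is the value that appears to have been requested in the failing runs.
ncores=$(clamp_cores 20 "$available")
echo "using $ncores of $available cores"
```

Capping the core count only bounds the number of forked processes. It does not bound the memory each process needs, so watching memory usage during the simulation is still worthwhile.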
