failure on empty input sequence file #193

Closed
nick-youngblut opened this issue May 19, 2023 · 3 comments

Comments

@nick-youngblut

Bug Description

Error:

Plugin error from feature-classifier:
    Command '['grep', '-c', '^>', '/tmp/qiime2/root/data/959c1e2d-7e0c-4ceb-ac4e-f24d1e8782ef/data/dna-sequences.fasta']' returned non-zero exit status 1.

Why is grep used instead of just reading the file and iterating over its lines in Python? I realize that is a bit slower than grep, but the FASTA file shouldn't be very large.

An advantage of counting sequences in Python instead of with grep is that a failure yields a more informative stack trace. For comparison, here is the current code:

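    import subprocess
    from math import ceil

    # we really only want to calculate this if running in parallel
    if n_jobs != 1:
        # grep exits with status 1 when no lines match (it still prints 0),
        # so with check=True an empty FASTA raises CalledProcessError here
        seq_count = subprocess.run(
            ['grep', '-c', '^>', str(reads)], check=True,
            stdout=subprocess.PIPE)
        # set a max value to avoid blowing up memory
        return min(int(ceil(int(seq_count.stdout.decode('utf-8')) / n_jobs)),
                   20000)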
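A minimal sketch of the pure-Python counting suggested above. The function names (`count_headers`, `count_fasta_records`, `reads_per_batch`) are hypothetical and not part of the feature-classifier plugin, and `n_jobs` is assumed to be a positive number of workers. Unlike `grep -c`, an empty file simply yields a count of 0, so any error that follows is raised from Python with a readable traceback:

```python
from math import ceil


def count_headers(lines):
    """Count FASTA header lines, i.e. lines starting with '>' (like grep '^>')."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_records(path):
    """Stream a FASTA file line by line; an empty file gives 0, not an error."""
    with open(path, encoding="utf-8") as handle:
        return count_headers(handle)


def reads_per_batch(seq_count, n_jobs, max_batch=20000):
    """Split seq_count evenly across n_jobs workers, capped at max_batch
    (the 20000 cap mirrors the snippet above)."""
    return min(ceil(seq_count / n_jobs), max_batch)


if __name__ == "__main__":
    lines = [">seq1\n", "ACGT\n", ">seq2\n", "GGCC\n", ">seq3\n", "TTAA\n"]
    n = count_headers(lines)
    print(n, reads_per_batch(n, n_jobs=2))  # 3 2
```

Note that with zero sequences `reads_per_batch` returns 0, so this change alone only moves the failure downstream; an explicit empty-input check would still be needed.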

The error only occurs on AWS Batch; the same Docker image works when run locally. I've provided >400 GB of memory, so memory is not the issue, and there should be plenty of disk space.

Steps to reproduce the behavior

I'm using quay.io/qiime2/core:2023.2 as the Docker image and running q2-classifier on AWS Batch via Nextflow (with Fusion & Wave).

Computation Environment

quay.io/qiime2/core:2023.2

@nick-youngblut
Author

The error is caused by the input qza file containing no sequences. In this case, that was due to a failure of the upstream job, which did not produce any sequences for conversion to qza and subsequent classification with q2-classifier.
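This diagnosis matches the traceback: grep reports "no lines selected" with exit status 1, even with `-c`, where it still prints a count of 0. Because the current code passes `check=True`, `subprocess.run` turns that status into a `CalledProcessError`. A small demonstration, assuming a `grep` binary is on `PATH`; the helper names `grep_header_count` and `try_fasta` are hypothetical:

```python
import os
import subprocess
import tempfile


def grep_header_count(path):
    """Run `grep -c '^>' path` WITHOUT check=True.

    Returns (exit_status, printed_count) so the exit status can be inspected
    instead of being turned into a CalledProcessError.
    """
    result = subprocess.run(
        ["grep", "-c", "^>", path],
        stdout=subprocess.PIPE,
        universal_newlines=True,  # decode stdout to str
    )
    return result.returncode, result.stdout.strip()


def try_fasta(content):
    """Write `content` to a temporary FASTA file and grep-count its headers."""
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
        handle.write(content)
        path = handle.name
    try:
        return grep_header_count(path)
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(try_fasta(">seq1\nACGT\n>seq2\nGGCC\n"))  # (0, '2')
    print(try_fasta(""))  # (1, '0'): empty file, like the failed upstream output
```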

It appears that q2-classifier assumes the input qza contains at least one sequence. Adding a check for empty input might be helpful.
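One possible shape for such a check, as a sketch only: `require_sequences` is a hypothetical name, and where the feature-classifier plugin would call it (and which exception type it would raise) is an assumption. It counts records in Python and fails early with a descriptive message before any batching happens:

```python
def require_sequences(lines, source="input FASTA"):
    """Hypothetical guard: count FASTA records and fail early with a clear
    message when there are none, instead of an opaque subprocess error."""
    count = sum(1 for line in lines if line.startswith(">"))
    if count == 0:
        raise ValueError(
            f"{source} contains no sequences; check the upstream step "
            "that produced it before running classification."
        )
    return count


if __name__ == "__main__":
    print(require_sequences([">seq1\n", "ACGT\n"]))  # 1
    try:
        require_sequences([])
    except ValueError as err:
        print(f"error: {err}")
```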

@gregcaporaso changed the title from "'['grep', '-c', '^>', ...]' failing when q2-cls run on AWS Batch" to "failure on empty input sequence file" on Sep 25, 2023
@gregcaporaso
Member

@nick-youngblut, did you encounter this when running classify-sklearn, or something else?

@gregcaporaso
Member

This seems to be the same issue as #175. I'm going to close this in favor of that issue.
