Edit suggestions for metagenomics-binning #5721

Open
jennaj opened this issue Jan 28, 2025 · 0 comments

Tutorial -> https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/metagenomics-binning/tutorial.html

Great newer tutorial!! Thanks all :)

I noticed a few problems while attempting to run through this. Maybe we can address these, or am I misunderstanding how to use it (totally possible!)?

The issues seem to be with the two steps where data is loaded into a history, and then with the step immediately after each of those, where that data is used.

Initial data upload

The data are assembled contigs, but the language and hands-on actions around the upload step are written for paired reads. These data would just go into a simple list collection, not a paired one, correct?

Tutorial step hands-on-upload-data-into-galaxy

Screenshot. Notice how the upload data are named as reads, then the collection creation/naming step is for paired read data.

Image

First analysis step using that initial uploaded data

Naming that collection "Raw reads" at upload disconnects it from the very next step here, where the input is named "assembly fasta files".

I think this would be more understandable, and flow better, if we used the folder icon (collection) selection example and the exact name of the list collection of uploaded contigs.

Tutorial step hands-on-individual-binning-of-short-reads-with-metabat-2

Screenshot. Notice how "advanced options" is not wrapped onto a distinct line separating that first input (which is not advanced) from the remainder of the settings (which are advanced/nested). Maybe a newline is needed, or we could reuse a section from other tutorials that do this? There are other examples of this throughout the tutorial -- we can probably fix them all at the same time.

Image

Downstream upload of bins

The Upload tool rejects the links in the copy box.

Failed to fetch url https://zenodo.org/record/7845138/files/74_%20MetaBAT 2%20on%20data%20ERR2231572_%20Bins.zip. URL can't contain control characters. '/record/7845138/files/74_%20MetaBAT 2%20on%20data%20ERR2231572_%20Bins.zip' (found at least ' ')
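As a rough illustration of why this URL is rejected: the path contains a raw space in "MetaBAT 2" alongside spaces that are already encoded as %20. The sketch below percent-encodes only the unsafe characters using Python's standard library. The helper name `encode_url_path` is made up for this example. Note that this only makes the URL syntactically valid; it does not confirm that a file with that exact name exists on Zenodo (the working link quoted further down uses "MetaBAT2" with no space).

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url_path(url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the URL path.

    '%' is kept as a safe character so that sequences that are already
    encoded (e.g. '%20') are not encoded a second time.
    """
    parts = urlsplit(url)
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment)
    )


# URL copied from the error message above; note the raw space in "MetaBAT 2".
bad_url = (
    "https://zenodo.org/record/7845138/files/"
    "74_%20MetaBAT 2%20on%20data%20ERR2231572_%20Bins.zip"
)
fixed_url = encode_url_path(bad_url)
print(fixed_url)
```

Applying the helper to an already-encoded URL leaves it unchanged, so it should be safe to run over every link in the copy box.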

The correct link can be captured from Zenodo, and that uploads, but with the default settings a zip datatype is assigned and there isn't a good way to get the data out into a plain text file with the correct datatype assigned. The only "convert" option is directory, and changing the datatype directly to fasta results in a mismatch between datatype and content (since the file is still compressed).

Loading with the correct links, being clear about how to set the datatype, and then putting that data into another list collection seems good here. Since the first upload step put the data into a simple list collection after Upload, maybe switch it up: "We are going to make another list collection, and there is a faster way!", e.g. using the Collection Upload.

https://zenodo.org/records/7845138/files/26_%20MetaBAT2%20on%20data%20ERR2231567_%20Bins.zip?download=1

Tutorial step hands-on-import-generated-assembly-files

Screenshot. If the Zenodo versions of the URLs are used instead AND the datatype is specified as fasta, Galaxy will convert each uploaded file to a single multi-sequence fasta file. Also consider removing the "rename your pairs with the sample name" instruction. These are not pairs -- should this be "simplify the name of the contig files to just the sample name"? That would need to be done before making the list collection, and would be even faster if done per file at upload, but there could be a rename collection step (scope creep!).

Too complicated -- maybe just link to the collection operation general tutorial and expect the student to figure it out? I can't think of a cleaner way to do this but maybe the authors can. :)

The questions under here are a bit odd. The second makes sense, but the first doesn't. Do we mean how many total samples? These are not all based on that single sample name, right?

  • How many bins has been for ERR2231567 sample? -- Well, only "1", right? Not 6.
  • How many sequences are contained in the second bin? -- This is good.
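For anyone wanting to sanity-check answers to these two questions outside Galaxy, here is a minimal sketch that counts archive members and FASTA header lines in a bins zip. It assumes each archive member is one bin's FASTA file, which this issue does not confirm. The function name and the tiny in-memory archive are made up for illustration and are not the tutorial data.

```python
import io
import zipfile


def count_sequences_per_member(zip_bytes: bytes) -> dict[str, int]:
    """Map each file in a zip archive to its number of FASTA header lines."""
    counts = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if name.endswith("/"):
                continue  # skip directory entries
            text = archive.read(name).decode("utf-8", errors="replace")
            counts[name] = sum(
                1 for line in text.splitlines() if line.startswith(">")
            )
    return counts


# Made-up stand-in for a downloaded bins archive (not the tutorial data).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bin_1.fasta", ">c1\nACGT\n>c2\nGGCC\n")
    archive.writestr("bin_2.fasta", ">c3\nAAAA\n>c4\nTTTT\n>c5\nCCCC\n")

counts = count_sequences_per_member(buffer.getvalue())
print(len(counts), "members")
print(counts)
```

Under that assumption, the number of members would answer the first question and the count for the second member would answer the second.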

Image

Next, how those data load with autodetect versus specifying fasta. Notice how the zip is sort of "trapped".

Image

Then, how just assigning fasta after upload results in a mismatch between file content and datatype.

Image

Image

Then the "good" way -- specify fasta and the file is uncompressed at upload and given the correct datatype.

Image
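The mismatch described above, where a dataset is labeled fasta while its content is still a zip archive, can be checked locally with a small content sniff. This is only a sketch using Python's standard library, not how Galaxy itself detects datatypes. The function name `sniff_content` and the sample record are invented for the example.

```python
import io
import zipfile


def sniff_content(data: bytes) -> str:
    """Guess whether raw bytes are a zip archive, FASTA text, or neither."""
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "zip"
    if data.lstrip().startswith(b">"):
        return "fasta"
    return "unknown"


# Build a tiny in-memory zip holding one made-up FASTA record.
fasta_text = b">contig_1\nACGTACGT\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("bin_1.fasta", fasta_text)
zipped = buffer.getvalue()

print(sniff_content(fasta_text))  # plain FASTA text
print(sniff_content(zipped))      # still a zip, whatever label is attached
```

Changing a datatype label does not change the bytes, which is why the compressed file keeps failing downstream until it is actually decompressed.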

Using downstream uploaded bins

Maybe adjust the input to be very clear about the collection type of input (folder icon) and again use the exact list collection name given at the upload step. Also fix the wrapping with the advanced options breakout.

Tutorial step hands-on-assessing-the-completeness-and-contamination-of-genome-bins-using-lineage-specific-marker-sets-with-checkm-lineage-wf

Screenshot.

Image

Thanks all! I'd like to start helping people better with this one! If anyone has an "answer key" history they could share back, that would be great. :)

Testing history where the screenshots above are coming from
