That's a neat trick, and I definitely wouldn't have thought of that approach, so thanks for that!
After writing this out, I realized it was super long, so here are my questions up front. That way you can decide whether to read the details. Thanks!
1. How do I output the quality scores when converting from FASTQ to FASTA?
2. Does the SAM-to-interval tool output only mapped reads by looking at the flag values?
3. Why am I getting the workflow error described below, and is there a way to resolve it?
Here are the details:
- I don't see an option to output both the sequences and the quality scores. I found two FASTQ-to-FASTA converters (one under "Convert Formats" and the other in the FASTX-Toolkit), and both output only a single FASTA file containing the sequences. Am I missing something, or should I be using a different tool to get both the sequences and the quality scores?
- The Extract Genomic Sequences tool seems to want an Interval file as input, not a list of IDs. Does that mean I should convert the filtered SAM output to Interval? Currently I'm using the SAM-to-interval conversion to extract the mapped reads and make the data more manageable in one step (I'm pretty sure I picked that up from one of the tutorials). I assumed that, by definition, it can only output an interval for a read that was mapped, in which case I couldn't convert the unmapped reads to Interval anyway. Is that wrong?
- I was setting up a workflow with Bowtie and I noticed that the Workflow Editor does show options to output unmapped reads. But when I try to output them, I get this error:
"Error due to input mapping of 'Compute quality statistics' in 'output_unmapped_reads_l'. A common cause of this is conditional outputs that cannot be determined until runtime, please review your workflow."
Superficially, this seems silly: a "conditional output" obviously can't be determined until runtime, because it depends on something else. So why is that an error? I have tried connecting the unmapped reads to a few different tools, so the problem doesn't seem specific to the downstream tool (in this case, "Compute Quality Statistics").
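For concreteness, here's what I mean by outputting both: a minimal Python sketch (run locally, not a Galaxy tool) that writes a FASTA file of sequences plus a separate QUAL file of numeric scores. It assumes unwrapped four-line FASTQ records with Sanger/Phred+33 encoding (what Galaxy calls fastqsanger), and the function names are my own. The QUAL layout (a `>name` line followed by space-separated integer scores) is the usual 454-style convention, as far as I know.

```python
import io
from typing import Iterator, TextIO, Tuple

# Assumption: Sanger / Illumina 1.8+ ("fastqsanger") encoding, i.e. Phred+33.
PHRED_OFFSET = 33


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (name, sequence, quality) from unwrapped 4-line FASTQ records."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:  # EOF (or a trailing blank line)
            return
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def fastq_to_fasta_qual(fastq: TextIO, fasta: TextIO, qual_out: TextIO) -> int:
    """Write sequences to `fasta` and numeric Phred scores to `qual_out`.

    Returns the number of records converted.
    """
    count = 0
    for name, seq, qual in read_fastq(fastq):
        fasta.write(f">{name}\n{seq}\n")
        # Each quality character encodes chr(score + PHRED_OFFSET).
        scores = " ".join(str(ord(ch) - PHRED_OFFSET) for ch in qual)
        qual_out.write(f">{name}\n{scores}\n")
        count += 1
    return count


if __name__ == "__main__":
    demo = io.StringIO("@read1\nACGT\n+\nII#5\n")
    fa, ql = io.StringIO(), io.StringIO()
    fastq_to_fasta_qual(demo, fa, ql)
    print(fa.getvalue(), end="")
    print(ql.getvalue(), end="")
```

With the demo record, the QUAL output is `>read1` followed by `40 40 2 20`, since `I`, `#` and `5` are ASCII 73, 35 and 53.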
Any thoughts, insights, or even other approaches to the original problem would be great. Right now I think my best bet is to filter out the unmapped reads locally with a Perl script and re-upload them, but that feels like overkill and will get time-consuming, since I'll inevitably want to tweak or re-run things. Installing a local instance isn't an option for me yet (though it should be in a few months). In any case, I really appreciate your help!
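As for the local route, here's a minimal sketch of the filtering step I had in mind, written in Python rather than Perl. It also reflects how I understand the flag question in my second point: per the SAM spec, bit 0x4 of the FLAG column marks a read as unmapped, though I haven't confirmed that this is what the SAM-to-interval tool checks internally. Assumptions: single-end reads, plain-text SAM (not BAM), and that secondary (0x100) and supplementary (0x800) lines should be skipped so each read is counted once; `split_sam` and `revcomp` are my own hypothetical names.

```python
import io
from typing import Iterable, TextIO, Tuple

# FLAG bits as defined in the SAM format specification.
FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement; ambiguity codes other than N are reversed but not complemented."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_sam(lines: Iterable[str], unmapped_fastq: TextIO) -> Tuple[int, int]:
    """Count mapped reads and write unmapped reads to `unmapped_fastq` as FASTQ.

    Returns (mapped_reads, unmapped_reads_written).
    """
    mapped = unmapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            raise ValueError(f"expected >= 11 SAM columns, got {len(fields)}")
        qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # extra alignment lines for a read counted elsewhere
        if not flag & FLAG_UNMAPPED:
            mapped += 1
            continue
        if seq == "*" or qual == "*":
            continue  # no stored sequence/qualities to rebuild a FASTQ record
        if flag & FLAG_REVERSE:
            # SAM stores reverse-strand reads reverse complemented; undo that.
            seq, qual = revcomp(seq), qual[::-1]
        unmapped_fastq.write(f"@{qname}\n{seq}\n+\n{qual}\n")
        unmapped += 1
    return mapped, unmapped


if __name__ == "__main__":
    sam = io.StringIO(
        "@HD\tVN:1.0\n"
        "r1\t0\tchr1\t100\t42\t4M\t*\t0\t0\tACGT\tIIII\n"
        "r2\t4\t*\t0\t0\t*\t*\t0\t0\tAACG\tII#5\n"
    )
    out = io.StringIO()
    print(split_sam(sam, out))
    print(out.getvalue(), end="")
```

In practice I'd feed it the SAM file downloaded from Galaxy and re-upload the resulting FASTQ, which is exactly the round trip I was hoping to avoid.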
Thanks, again!
Mayank Tandon