On Mon, Sep 17, 2012 at 9:18 AM, Praveen Raj Somarajan <Praveen.s@ocimumbio.com> wrote:
Thanks Brad for the reply, but the format conversion sounds bad when dealing with multiple samples, especially Paired-End or Mate-Pair samples. It doubles the task. Hence, I’d be more interested in providing csfasta/qual files with the -f and --Q1/--Q2 options, as described in the Bowtie manual (shown below):
“bowtie also handles input in the form of parallel .csfasta and _QV.qual files. Use -f to specify the .csfasta files and -Q (for unpaired reads) or --Q1/--Q2 (for paired-end reads) to specify the corresponding _QV.qual files. It is not necessary to first convert to FASTQ, though bowtie also handles FASTQ-formatted colorspace reads (with -q, the default)”
Why should the system spend time converting the files when the tool itself provides the capability of accepting the original formats?
Please share your thoughts.
Raj
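For illustration, the invocation described in the manual excerpt above could be assembled roughly as follows. This is only a sketch, not the Galaxy wrapper's code: the function name, file names, and index name are hypothetical, and it assumes Bowtie 1 with a colorspace index (built with bowtie-build -C), the -C colorspace switch, and the usual "index, reads, output" argument order.

```python
def bowtie_colorspace_args(index, csfasta, qual, csfasta2=None, qual2=None,
                           output="hits.map"):
    """Build a Bowtie 1 argument list for parallel .csfasta/_QV.qual input.

    Hypothetical helper for illustration only; all paths are placeholders.
    """
    # -C: colorspace alignment, -f: reads are in FASTA-style (.csfasta) files
    args = ["bowtie", "-C", "-f"]
    if csfasta2 is None:
        # Unpaired reads: one reads file, qualities passed with -Q
        args += ["-Q", qual, index, csfasta]
    else:
        if qual2 is None:
            raise ValueError("paired reads need a qual file for each mate")
        # Paired reads: two reads files (-1/-2), qualities with --Q1/--Q2
        args += ["--Q1", qual, "--Q2", qual2,
                 index, "-1", csfasta, "-2", csfasta2]
    args.append(output)
    return args


if __name__ == "__main__":
    # Prints the command line; nothing is executed here.
    print(" ".join(bowtie_colorspace_args(
        "my_cs_index", "s1_F3.csfasta", "s1_F3_QV.qual",
        "s1_R3.csfasta", "s1_R3_QV.qual")))
```

Note that the paired case already needs four input files per sample, compared with two for FASTQ.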
Since Bowtie itself supports colorspace FASTA+QUAL, in theory the Galaxy wrapper could too. Galaxy does have the file formats "csfasta" and "qualsolid" defined, neither of which is currently used here - just "fastqcssanger" (FASTQ color-space, Sanger encoding): https://bitbucket.org/galaxy/galaxy-central/src/fe12d92febf9/tools/sr_mappin...
However, the fact that this requires twice the number of input files would make this quite complex to implement - and also harder for the end user to use. Going to (colorspace) FASTQ as early as possible simplifies data management (you don't have to keep the two files in sync) and as a bonus saves you disk space (QUAL is very inefficient).
If you are linking your Galaxy directly to your sequencing LIMS (as some people are for Illumina at least), doing conversion to FASTQ as part of that would make a nicer end user experience.
Peter
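As a rough illustration of the early conversion suggested above, a csfasta/qual pair can be merged into colorspace FASTQ with Sanger-style (Phred+33) quality characters along these lines. This is a minimal sketch and not Galaxy's own converter: the function names are hypothetical, and clamping negative SOLiD quality values to 0 and keeping the leading primer base in the sequence line are assumptions that should be checked against whatever tool consumes the output.

```python
from itertools import zip_longest


def read_records(lines):
    """Yield (name, body) pairs from FASTA-like lines, skipping '#' comments."""
    name, body = None, []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, " ".join(body)
            name, body = line[1:], []
        else:
            body.append(line)
    if name is not None:
        yield name, " ".join(body)


def csfasta_qual_to_fastq(csfasta_lines, qual_lines):
    """Yield one FASTQ record (as a string) per matching csfasta/qual pair."""
    pairs = zip_longest(read_records(csfasta_lines), read_records(qual_lines))
    for seq_rec, qual_rec in pairs:
        if seq_rec is None or qual_rec is None:
            raise ValueError("csfasta and qual files have different lengths")
        (name, seq), (qual_name, quals) = seq_rec, qual_rec
        if name != qual_name:
            # This is the "keeping two files in sync" problem in practice.
            raise ValueError("out of sync: %r vs %r" % (name, qual_name))
        # Assumption: negative scores (missing calls) are clamped to 0.
        scores = [max(0, int(q)) for q in quals.split()]
        encoded = "".join(chr(q + 33) for q in scores)  # Phred+33
        yield "@%s\n%s\n+\n%s\n" % (name, seq.replace(" ", ""), encoded)


if __name__ == "__main__":
    reads = [">r1_F3", "T0123"]
    quals = [">r1_F3", "10 20 -1 40"]
    for record in csfasta_qual_to_fastq(reads, quals):
        print(record, end="")
```

After this step there is one file per read set, so the name check that guards against mismatched inputs is only needed once, at conversion time.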