Dear Galaxy Team and users,

I have some 454 reads that I would like to map against a contig assembly using LASTZ. I have already mapped the reads, uploaded in fasta format, against the assembly. However, because mapping from fasta ignores the base qualities that would be present in a fastq file, I am concerned that I may be losing information that could be crucial in deciding on 'real' SNPs later down the line.

As LASTZ apparently recognises fastq (*see below), I converted the reads to fastq using the 'Combine fasta and qual' tool in Galaxy and am now trying to map them to the assembly. However, Galaxy would not recognise the fastq reads on the LASTZ input page, so I tried to fool it by changing the data type of the fastq to fasta using the 'Edit attributes' function of the history. This kept the fastq info but allowed Galaxy to accept the file as input for LASTZ. This mapping has now been running for almost 24 hours, so I am concerned that there is an error.

Is anyone able to offer any help with why Galaxy does not recognise the reads in fastq format prior to mapping with LASTZ?

Here is what the first two reads look like in fastq format:

@GIQ547K01A7QJK length=76 xy=0381_0142 region=1 run=R_2010_06_11_16_16_09_
GCTTCGTGTGCGACGACACTCGTCATCGACAACGCAAGACTGGCGCTATCGCAATTGGACACACAACATGTGACCG
+
27 19 17 17 18 19 11 14 14 17 19 17 22 17 17 14 14 14 17 19 17 19 17 17 19 19 25 25 22 17 12 13 14 19 21 21 21 27 21 19 19 17 17 19 17 24 25 22 20 22 22 17 16 16 12 12 12 12 19 22 17 17 17 20 20 22 27 21 25 22 20 20 22 21 16 12
@GIQ547K01AE4BG length=40 xy=0055_0266 region=1 run=R_2010_06_11_16_16_09_
GTGACTAGATACATGCAATCAATTGTCCATGTCATTCGAG
+
27 23 23 19 19 19 18 19 21 19 18 19 18 25 27 27 26 26 27 27 27 27 27 19 19 18 19 19 27 27 25 24 25 25 21 21 22 22 22 18

* Input formats (copied from the LASTZ input page in Galaxy)

LASTZ accepts reference and reads in FASTA format. However, because Galaxy supports implicit format conversion the tool will recognize fastq and other method specific formats.

With thanks,
Chris
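The quality lines in the two reads above hold space-separated decimal scores, whereas Sanger-style fastq stores one ASCII character per base (Phred score + 33). As a rough illustration only, and assuming the numbers are Phred scores and that Sanger (offset 33) encoding is the target, the conversion could be sketched in Python as below. The function name `encode_phred33` is hypothetical and not part of Galaxy or LASTZ.

```python
def encode_phred33(decimal_scores: str) -> str:
    """Convert space-separated Phred scores to a Sanger (Phred+33) quality string."""
    scores = [int(tok) for tok in decimal_scores.split()]
    for q in scores:
        # Printable ASCII ends at '~' (126), so Phred+33 can represent 0..93.
        if not 0 <= q <= 93:
            raise ValueError(f"Phred score out of range: {q}")
    return "".join(chr(q + 33) for q in scores)


# First eight scores of the second read shown above.
quality = encode_phred33("27 23 23 19 19 19 18 19")
print(quality)
```

Each score maps to exactly one character, so the encoded string has the same length as the read sequence it belongs to.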
Hello Chris,

Using a fasta file as input for LASTZ is the correct way to run the tool right now. You discovered a small mismatch between our documentation and the version of the LASTZ wrapper on the main Galaxy instance. Fastq is not directly accepted, contrary to what the documentation states, so either using an original fasta file or using the "FASTQ to FASTA converter" tool would be required before running LASTZ.

Currently, LASTZ itself does not use quality scores for the alignment process, but it will pass these values (ascii format) into the output file (SAM) for use in downstream analysis. The public Galaxy instance will likely be updated in the future to support this option, but there is no set timeline. To be clear, the alignment results themselves would be the same with or without the quality scores being passed through a fastq input/SAM output.

Our apologies for the confusion!

Best,
Jen
Galaxy team

On 9/26/11 7:49 PM, Chris.Howard@csiro.au wrote:
Dear Galaxy Team and users,
I have some 454 reads that I would like to map against a contig assembly using LASTZ. I have already mapped the reads uploaded in fasta format against the assembly but, as mapping the reads in fasta ignores the base qualities that would be present in a fastq file, I am concerned that I might need the base quality information that may be crucial in deciding on ‘real’ SNPs later down the line. So, as LASTZ apparently recognises fastq (*see below), I converted the reads to fastq using the ‘Combine fasta and qual’ tool in Galaxy and I am now currently trying to map the reads to the assembly. However, Galaxy would not recognise the fastq reads in the LASTZ input page. So I tried to fool it by changing the data type of the fastq to fasta using the ‘Edit attributes’ function of the history. This kept the fastq info but allowed Galaxy to recognise the file as input for LASTZ. However, this mapping has been running for almost 24 hours now and so I am concerned that there is an error.
Is anyone able to offer any help with why Galaxy does not recognise the reads in fastq format prior to mapping with LASTZ?
Here is what the first two reads look like in fastq format:
@GIQ547K01A7QJK length=76 xy=0381_0142 region=1 run=R_2010_06_11_16_16_09_
GCTTCGTGTGCGACGACACTCGTCATCGACAACGCAAGACTGGCGCTATCGCAATTGGACACACAACATGTGACCG
+
27 19 17 17 18 19 11 14 14 17 19 17 22 17 17 14 14 14 17 19 17 19 17 17 19 19 25 25 22 17 12 13 14 19 21 21 21 27 21 19 19 17 17 19 17 24 25 22 20 22 22 17 16 16 12 12 12 12 19 22 17 17 17 20 20 22 27 21 25 22 20 20 22 21 16 12
@GIQ547K01AE4BG length=40 xy=0055_0266 region=1 run=R_2010_06_11_16_16_09_
GTGACTAGATACATGCAATCAATTGTCCATGTCATTCGAG
+
27 23 23 19 19 19 18 19 21 19 18 19 18 25 27 27 26 26 27 27 27 27 27 19 19 18 19 19 27 27 25 24 25 25 21 21 22 22 22 18
* Input formats (copied from the LASTZ input page in Galaxy)
LASTZ accepts reference and reads in FASTA format. However, because Galaxy supports implicit format conversion the tool will recognize fastq and other method specific formats.
With thanks,
Chris
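For readers working outside Galaxy, the fastq-to-fasta step that Jen recommends before running LASTZ can be sketched in a few lines of Python. This is a minimal sketch, not the Galaxy "FASTQ to FASTA converter" itself: it assumes strict four-line records (no wrapped sequence lines), and the function name `fastq_to_fasta` is hypothetical.

```python
from typing import Iterable, Iterator


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[str]:
    """Yield FASTA lines from four-line FASTQ records, discarding qualities."""
    # Drop trailing newlines and skip blank lines.
    stripped = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(stripped) % 4 != 0:
        raise ValueError("expected four lines per FASTQ record")
    for i in range(0, len(stripped), 4):
        header, seq, plus, _qual = stripped[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record number {i // 4 + 1}")
        yield ">" + header[1:]  # '@' becomes '>', rest of the header is kept
        yield seq


# The second read from the message above (quality line shortened for brevity).
record = [
    "@GIQ547K01AE4BG length=40 xy=0055_0266 region=1 run=R_2010_06_11_16_16_09_",
    "GTGACTAGATACATGCAATCAATTGTCCATGTCATTCGAG",
    "+",
    "27 23 23 19",
]
for line in fastq_to_fasta(record):
    print(line)
```

Because the quality line is read by position and then dropped, the sketch works whether the scores are ASCII-encoded or decimal, which is consistent with the point that the alignment itself does not depend on them.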
-- Jennifer Jackson http://usegalaxy.org http://galaxyproject.org/Support