Hi,

I am trying to join two groomed fastq files from a paired-end Illumina read using the fastq joiner tool. The drop-down menus correctly identify the groomed fastq files, but after cranking for a few minutes the tool produces empty output:

"FASTQ joiner on data 5 and data 4
empty
format: fastqsanger, database: ?
Info: There were 3497909 known sequence reads not utilized.
Joined 0 of 3497909 read pairs (0.00%)."

The files have the same number of reads (3497909), reads have the same number of bases (102), and the joiner tool doesn't have any options (other than choosing the two files to join). I have tried this with Sanger and with Illumina 1.3+ quality scores, and in both (left-right) orientations. I've pasted the beginnings of the two files below my signature in case this is useful for diagnosing the problem.

Can anyone tell me what I'm doing wrong?

Thanks,

--
Matthew D. Herron, PhD
Department of Zoology
University of British Columbia
X.princeps@gmail.com
http://www.eebweb.arizona.edu/grads/mherron/

Sample of read 1:
@HWI-ST765:83:D091AACXX:1:1101:1202:2130 1:N:0:ACTTGA
TTATTCCGTTTACCTTCACGCTGTTATGGCTCTCGGTGTTCGGCAATAGCGCGCTGTATGAAATTATCCACGGCGGCGCGGCATTTGCCGAGGAAGCGATG
+
CCCFFFFFHHHHHJJJJJJJJJJJJIJIJJJIEHII?FGIIJJJJJJJIJJJHFFDDEEEDEDDDDEDDDDDDDDDDDDDDDDDDD@CCD3<BDD@DDDBD

Sample of read 2:
@HWI-ST765:83:D091AACXX:1:1101:1202:2130 2:N:0:ACTTGA
ATTCCCCAGCACCAGCGCCCCGGAGTCCGCCGAGGTCACATAAAACAGCAGGCCAGTAATGGTGGCGACGGAGGCGCTAAAGGTAAACGCCGGATACTGCG
+
CCCFFFFFFGHHHJJJJJJJJJIJIFHIIJJJJIG:BEFFFFEEEDDCBDDDDDDD@CDDDCACDDDBDDDDDDBDDDDDDDC>@CCC@@DDDDDDDDEDB
I'm having the same issue (though with the interlacer). I suspect it comes down to how the forward and reverse reads are identified. My headers look like yours: the forward read is marked 1:N and the reverse 2:N, instead of the /1 and /2 suffixes that the tool says it expects. Our data are from Illumina pipeline 1.9 (HiSeq), so maybe that's the problem?

I don't actually know whether that is the cause, though, or how to fix it. It was just interesting to run into this problem today and then see your email.

-Lucinda

On Tue, Nov 8, 2011 at 2:19 PM, Matthew Herron <xprinceps@gmail.com> wrote:
> Hi, I am trying to join two groomed fastq files from a paired-end Illumina read using the fastq joiner tool. [...]

___________________________________________________________
The Galaxy User list should be used for the discussion of Galaxy analysis and other features on the public server at usegalaxy.org. Please keep all replies on the list by using "reply all" in your mail client. For discussion of local Galaxy instances and the Galaxy source code, please use the Galaxy Development list:
http://lists.bx.psu.edu/listinfo/galaxy-dev
To manage your subscriptions to this and other Galaxy lists, please use the interface at:
--
Lucinda Lawson
Postdoctoral Research Computational Biologist
USDA-ARS
Gainesville, FL
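To make the header difference Lucinda describes concrete, here is a minimal Python sketch. It is not the Galaxy joiner's code, and the function names `to_slash_style` and `pair_key` are hypothetical. It assumes headers follow the pattern seen in Matthew's samples (`@ID 1:N:0:ACTTGA`) and rewrites them into the older `/1` and `/2` suffix style, so that both mates share a common key.

```python
import re

# Pattern for headers like the samples in this thread:
# "@<read id> <read number>:<filter flag>:<control number>:<index>"
# This is an assumption based on those samples, not a full format specification.
NEW_STYLE = re.compile(r"^@(\S+) ([12]):[YN]:\d+:\S*$")


def to_slash_style(header: str) -> str:
    """Convert '@ID 1:N:0:ACTTGA' to '@ID/1'; return other headers unchanged."""
    header = header.strip()
    match = NEW_STYLE.match(header)
    if match is None:
        return header
    read_id, read_number = match.groups()
    return f"@{read_id}/{read_number}"


def pair_key(header: str) -> str:
    """Return the part of the header shared by both mates of a pair."""
    converted = to_slash_style(header)
    if converted.endswith(("/1", "/2")):
        return converted[:-2]
    return converted


forward = "@HWI-ST765:83:D091AACXX:1:1101:1202:2130 1:N:0:ACTTGA"
reverse = "@HWI-ST765:83:D091AACXX:1:1101:1202:2130 2:N:0:ACTTGA"

print(to_slash_style(forward))
print(pair_key(forward) == pair_key(reverse))
```

A tool that compares the full header lines would see the two mates above as different reads, because they differ after the space (`1:N:0` versus `2:N:0`). Comparing `pair_key` values instead treats them as a pair.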
Hello Matthew,

Lucinda is correct, this tool does not interpret the new ID format correctly. I have opened a bitbucket ticket to track the issue:
https://bitbucket.org/galaxy/galaxy-central/issue/677/update-joiner-tool-to-...

For now, there is a work-around:

1 - Make certain to run the FASTQ Groomer with input quality scores set to "Sanger" and leave the rest of the form options as default.
2 - Use the tool "NGS: QC and manipulation -> FASTX-Toolkit for FASTQ data -> Rename sequences" to set the sequence names as "numeric". Do #1 & #2 for each file.
3 - Run Joiner with the file orders as appropriate for left/right.

Thanks for reporting the issue!

Best,

Jen
Galaxy team

On 11/8/11 11:19 AM, Matthew Herron wrote:
> Hi, I am trying to join two groomed fastq files from a paired-end Illumina read using the fastq joiner tool. [...]
--
Jennifer Jackson
http://usegalaxy.org
http://galaxyproject.org/wiki/Support
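As a rough illustration of why step 2 of the work-around helps, the Python sketch below renames every read in a file to a running counter, so the forward and reverse files end up with identical identifiers. It only mimics the effect of the "numeric" renaming option and is not the FASTX-Toolkit implementation; `rename_numeric` is a hypothetical helper, the sequences and qualities are shortened placeholders, and the sketch assumes plain four-line FASTQ records with mates listed in the same order in both files.

```python
from typing import Iterable, Iterator


def rename_numeric(lines: Iterable[str]) -> Iterator[str]:
    """Replace each FASTQ header with a running counter (@1, @2, ...).

    Assumes four-line records: header, sequence, '+' line, qualities.
    """
    for index, line in enumerate(lines):
        position = index % 4
        if position == 0:
            # Header line: keep only the record number.
            yield f"@{index // 4 + 1}"
        elif position == 2:
            # Separator line: drop any repeated identifier.
            yield "+"
        else:
            # Sequence and quality lines pass through unchanged.
            yield line.rstrip("\n")


# Shortened placeholder records using the headers from this thread.
forward_records = [
    "@HWI-ST765:83:D091AACXX:1:1101:1202:2130 1:N:0:ACTTGA",
    "TTATTCC",
    "+",
    "CCCFFFF",
]
reverse_records = [
    "@HWI-ST765:83:D091AACXX:1:1101:1202:2130 2:N:0:ACTTGA",
    "ATTCCCC",
    "+",
    "CCCFFFF",
]

renamed_forward = list(rename_numeric(forward_records))
renamed_reverse = list(rename_numeric(reverse_records))
print(renamed_forward[0], renamed_reverse[0])
```

Because the new names depend only on position in the file, this kind of renaming pairs reads correctly only when both files contain the same reads in the same order, as is the case for the two 3497909-read files described in the original question.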