FASTQ joiner fails to join PE data.
Hi,

I have HiSeq2000 paired-end sequence data in two separate FASTQ files. I need to filter out the low-quality sequences to get a good assembly, so I decided to join the PE reads and then filter them in Galaxy.

First I groomed the data with FASTQ Groomer, keeping "Sanger" as the input FASTQ quality scores type. Then I tried to join the PE sequences with FASTQ joiner. However, FASTQ joiner did not join the PE sequences and only showed the following failure info:

*FASTQ joiner on data 8 and data 9*
0 bytes
format: fastqsanger, database: ?
Info: There were 4000000 known sequence reads not utilized.
Joined 0 of 4000000 read pairs (0.00%).

I am a new user and have no idea where I am going wrong. Please suggest how I can overcome this problem.

Thanks.
Hello,

The FASTQ Joiner tool is currently being updated to work with the newer sequence ID format. The progress of this change can be tracked here:
https://bitbucket.org/galaxy/galaxy-central/issue/677/update-joiner-tool-to-...

Meanwhile, quality filtering can be done on each file independently. To then sync up the two files (in case sequences are lost from one of the files for quality reasons), a workaround is:

- "NGS: QC and manipulation -> FASTQ to Tabular" on both files
- "Join, Subtract and Group -> Join two Datasets" on c1 from both files
    "Keep lines of first input that do not join with second input:" as yes
    "Keep lines of first input that are incomplete:" as no
    "Fill empty columns:" as no
- "NGS: QC and manipulation -> Tabular to FASTQ" run twice
    Recreate both FASTQ files from the same tabular file. The same sequence identifier column will be used in both runs.

Hopefully this helps until we have the regular FASTQ manipulation tools updated,

Jen
Galaxy team

On 4/3/12 12:44 PM, meganathan pr wrote:
> Hi, I have HiSeq2000 paired end sequence data in two separate FASTQ files. [...]
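For readers who want to see the idea behind the sync-up workaround outside of Galaxy, here is a minimal Python sketch. It is not the Galaxy tools' implementation. It assumes that the "newer sequence Id format" means headers where the mate number follows a space (for example `@name 1:N:0:ATCACG`) rather than a trailing `/1` or `/2`, and it keeps only reads whose identifier appears in both files. The function names `read_fastq`, `pair_key`, and `sync_pairs` are hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
import io


def read_fastq(handle):
    """Yield (header, sequence, quality) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def pair_key(header):
    """Reduce a read header to an identifier shared by both mates.

    Assumption: mates differ only by a trailing /1 or /2 (older style)
    or by the text after the first whitespace (newer style).
    """
    name = header[1:].split()[0]  # drop '@' and anything after whitespace
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]
    return name


def sync_pairs(forward, reverse):
    """Keep only records whose identifier occurs in both inputs, in forward order."""
    reverse_by_key = {pair_key(rec[0]): rec for rec in reverse}
    kept_fwd, kept_rev = [], []
    for rec in forward:
        mate = reverse_by_key.get(pair_key(rec[0]))
        if mate is not None:
            kept_fwd.append(rec)
            kept_rev.append(mate)
    return kept_fwd, kept_rev


# Tiny made-up example: one read survives filtering in both files.
fwd_text = "@M1:1:1 1:N:0:ATCACG\nACGT\n+\nIIII\n@M1:1:2 1:N:0:ATCACG\nGGCC\n+\nIIII\n"
rev_text = "@M1:1:2 2:N:0:ATCACG\nTTAA\n+\nIIII\n@M1:1:3 2:N:0:ATCACG\nCCGG\n+\nIIII\n"

fwd_kept, rev_kept = sync_pairs(
    list(read_fastq(io.StringIO(fwd_text))),
    list(read_fastq(io.StringIO(rev_text))),
)
```

The reverse file is held in a dictionary, so this only suits files that fit in memory; for full HiSeq runs the Galaxy workaround above or a streaming approach would be more appropriate.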
-- Jennifer Jackson http://galaxyproject.org
Participants (2): Jennifer Jackson, meganathan pr