Hello,
I am trying to use the FASTX tools on the Main server to trim and manipulate FASTQ files extracted directly from an SFF file (again using the main Galaxy web server). When I run the Clip and Reverse Complement tools, I get the following error:
An error occurred running this job: fastx_reverse_complement: (or fastx_clip) found invalid nucleotide sequence (gactGCGACTCACGTACAGCAATGCACATACTATATTATATC) on line 2
gzip: stdout: Broken pipe
I have downloaded the FASTX-Toolkit, installed it locally on my computer, and hit the same problem, so I suspect a format error. My FASTQ file looks like this:
@HA3HTSF01AH1DX
gactGCGACTCACGTACAGCAATGCACATACTATATTATATCAACCtcacaacacactacacgacacacaggagagagnnn
+
IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIFFAAB88..--,,/152///0///14,,,,///00//////!!!
@HA3HTSF01B3YMI
gactTTTGGGATTTGTTAACTTAGTTTGTAGGTGGTGGTTATAATGTTTGTGTGGTGTTATGTTTGGTGTAGTGttttgggttggtttgtattatttgtggatttggagttcgtgggtaggcaaggcacactagggggattaggtngngtnntntntttnntgntgtgntctgtgtggttgtggtgattttttactttgagtgttttgcgatttgggtgtagtaattagtcggtttttacgtaatttttaggttttgtggggttttgatttgggtttaggttttaatgcttttgatggtgggtgggggttcttacacgggtttaggttagagttggttgggctatgcatatgggtgtttgggggtttcttatttgggatgttttaggtgccatgtaatatttgtttangcgagtcgttttggttcttttaattttattacttgttttaggattgtttttaggtttaatagaaaatttntgactttttttaagggtttgggttaacatggcagttttttggggtttgggaattttttagtacgttagtttaggttaggttggttaggttagttagttagttgttgttggttgttagttgtttggttggttaggtttggttttgggttttgggttttaggttttnnnnnnnnnnn
+
??<//00==BA@>>A=BBCFDD:9444GGIIIIEIIIIIIIIIIG9444GGIIEGFDDEIB8443?;:;;@880----0000088001A@EEEECCCGEEEEB@?<<=900033--,/7..000511156675,,,,,,01/,,,!5!11!!0!,!,//!!40!0588!6665511111/,,,,/11,,,,,,,,--/:688<7<77>>>;333---13511///53////////111132.11,,,,,,,,,,,,/,444666772886::43.,,,..,,,,,,066....565...,,,,111....11111.,,,,,,,.2222442,,--------8777:::99900-2----,,,,,,,,-,,,233,,-774-----099564422000-,,,,-,,,,!-02222223433..,,.33,,,,22226//5888,,,,,,446622---..0,,,,0110220000,,,!.222,++++++++---001/-1--..33331001------++++++++++00++++++++++++++---00------+++++++++011111000000------0+++00000+++00--+++00----+++00000--++++0++++++-+++++++--++++!!!!!!!!!!!
@HA3HTSF01BHUOB
gactCATGCTTCAGATCAAGCAAGTCTTCGCTGACTACGTGCGCCATCGCCGCGAGGcgcagcacggaagtaggagcggcgtcagctcgccaacatgacgcgcgaattccacgaagagacacaaagacaccttgggaacctcttcaagtacgcggaggagaagatacgccgcgtcgcacaggaggagcacaaggcacacaggggataggnn
+
IIIIIIIIIIIGIIIGG@@@H>>>@IIIIIIIIIIIIIIIIIIIIIIIIEECGGE74426<AA812211/5/111/53385;::;AE>?330113==AACCC<=993333923333<DEGHF:::GIIIIIIIGIIIFCCEBB?:99<9;??=?99//053337<456333;?D@@<;==?943355@@@998895546:4444:4...!!
@HA3HTSF01ASNA7
gactATTGATATATTGCAACAGCTATCCAAATACACAAATATAATACTTGTTATAGGCAAATGCGACATTTTTGAGGCGAGTGAGTTGTTGCAATATATCAAtagtcgtgggaggcaaggcacacaggggatagg
+
IIIIIIIIIIIIIIIIIIIIIIIIIC<<111?1G??111D5IIIIIIIIIIIIIIIIIBBBIIIDIC?00000BBDCFCFFIIIIIIHHHIIIIIHHHIIIIIIIIIHH555GHIIIIIIIIIIIIFHHHIIIII
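One visible difference between these reads and "plain" FASTQ is the lowercase bases at both ends of every read (e.g. `gact...` and `...nnn`). A minimal sketch for checking this, assuming the FASTX tools accept only uppercase A, C, G, T and N (an assumption, not confirmed by the error text), is to scan each sequence line for any other character and report its line number, the way the error message does:

```python
import sys

# ASSUMPTION: the FASTX tools accept only these characters; the real
# validation logic in fastx_clip / fastx_reverse_complement may differ.
VALID_BASES = set("ACGTN")


def find_invalid_sequences(lines):
    """Return (line_number, sequence) for every sequence line containing a
    character outside VALID_BASES.

    Assumes simple 4-line FASTQ records (header, sequence, '+', quality)
    with no wrapped sequence or quality lines.
    """
    problems = []
    for index, line in enumerate(lines):
        if index % 4 == 1:  # the second line of each record is the sequence
            seq = line.rstrip("\n")
            if not set(seq) <= VALID_BASES:
                # Report 1-based line numbers, like "on line 2" in the error
                problems.append((index + 1, seq))
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python check_fastq.py reads.fastq
    with open(sys.argv[1]) as handle:
        for line_no, seq in find_invalid_sequences(handle):
            print(f"line {line_no}: {seq[:40]}")
```

Run against the file above, every record would be flagged, starting with line 2, because each sequence line contains lowercase letters.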
Can someone tell me what the format should be? I have tried converting the FASTQ file into a different format, to no avail. I have also extracted the FASTA and FASTA.qual files directly from the SFF file and recombined them, and the tools still failed to clip or reverse-complement the sequences.
So I am at a loss as to what the format should look like. All of the extractions were done directly on the SFF file uploaded to Galaxy.
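If the lowercase bases do turn out to be the problem, one workaround to try is converting only the sequence lines to uppercase before running the tools. This is a minimal sketch under the same assumptions as before (4-line records, uppercase-only input expected); the function name is hypothetical, not part of FASTX or Galaxy:

```python
import sys


def uppercase_sequences(lines):
    """Yield FASTQ lines with each sequence line converted to uppercase.

    Headers, '+' separators and quality strings pass through unchanged,
    because quality characters are case-sensitive ('I' and 'i' encode
    different scores). Assumes 4-line records with no wrapped lines.
    """
    for index, line in enumerate(lines):
        yield line.upper() if index % 4 == 1 else line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python uppercase_fastq.py in.fastq > out.fastq
    with open(sys.argv[1]) as src:
        sys.stdout.writelines(uppercase_sequences(src))
```

Note that uppercasing throws away whatever the lowercase marking meant; in SFF-derived reads it typically marks regions outside the clip points, so trimming those ends instead may be the better choice.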
Thanks
James Borrone