6-1 ACCATTCGAGCATAC 7-1 +TGATTTCCAGAGCCAAT +>8-1 +TTACCTCACGATATTGTAATA +>9-1 +TGTATTTACAATGACTAGAAA +>10-1 +CCTTGTAGTGGATTCTGATGA +>11-1 CGATTGCCGAAGTCTACCA ->8-5 -TTCAACGCCGCCGTGAAC ->9-1 -ATGACTTCATCGTCCACCCTTTAGAACT ->10-15 -AGTACAAGGACATGC ->11-1 -TCAAATTCTAGATTTTTACGG 12-1 -TGTATTTACAATGACTAGAAA +ATGACTTCATCGTCCACCCTTTAGAACT \ No newline at end of file diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_formatter1.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_formatter1.fasta Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,100 @@ +>Scaffold3648 +AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTC +CCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTG +TTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACA +GTATAACTTTTCAAATACTTTTGTTTTACAACTTTTCTCTCTGGACTTAT +ATTAAAGTCAATTTTAATGAACATGTAGTAAAAACTAATACATGTACATC +TACAGTTTATTTATTTTTTTCTTCTTCTTTTTGTATTTCTTGTGTTACAT +TATTTCACTTCACGTTCATGTTACCAACCTTGCCCCCTTGCTTTCCATGC +AAAAAAAGAAAAAAAAGAAGCAATACTTACACTTACCCTTGAGATATCTT +GATCTGAATGCTTTAACATTCTATATGTACAATAAATTTTTGTATCTATA +GCCTATTATTATATATGTTGCTATGTCAGGCACATTGACAACATTCTCAG +AAGGTTAGAAGATGGTATTGTTCTGAAATGCCTGGAATGCCTTGTGAACT +AAGATGATTACTCATGTCATTAAAGTCCCCTAACCCAGGTATTTCCTCCT +TCCCATGACGAAAACAGTCCATTTAAACTTCACCCCACTTTGGACCCGAA +AGTGGGGTGCATTTTGGTGGTAAGCTCACCACAGAGCAAGAGAGAGTTAG +AGTCCCTAATCTGCAGTGTAAACAAACTTTGCCAGGACATCACCAGCCCA +ACCTTGATAAGTACTGCTTGGAACTCCTCCATGATGTTCTAGTCTTATTC +GCAGTCTCATATAGGTTCGGATTTTGTCCATTCTCATAGCTACCAGTATA +CATGGGAGATGCCAGTTTCATCTTCCTTGCTTCACTTTATAAGCATAGTT +ATATCANGAACTTCCTGGTTATAATTATGTTCCTTTCAAGTTTCATCATA +ATTGTCTAGTTCGATATAGTACATGGACACAATTAAATATGATATTGTCT +>Scaffold9299 +CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG +TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG +GAAAAGCATCCTTGTTTGTTTCACTATGCTTTTTAATGGTTGACGTTAAa +ggtaaagaccagtattggaaacgccccaatttcaaaaaatgaaatggaag +ctctcattaccaatcatgtgaaagaatatgttttgactaatacatgatga +taaaaaaattgccgggaaaccgcctactaattcatatatttagtaaattt +gtttctctcatggtctgtgagagatatagggtagtcccatatacatcttt 
+ctgtgtatagtgcttgtaactttacgaagaatgggccaaatttcttatca +ttttgatgattccagaaccttgcagatgcgagatggtagatgatcaacct +tttctgatcgattccataacgtttctttcacaatgcaatcgcatgaccat +aactggtctttacctTTAAGTTGTAGGTCTTAATTGATAACACTATATAG +TTTTTTTCTTTTTACTGTTTTTATTAATGACCTCTGTAATTTGCCCTATT +GTGAAAATACTAAAATATGTTTATACGCCGATGATGCGGCAATATTTTGC +CAAGGCAAAGAAATTGCCCTTGTTGAGAAAACTCTTAAATGTGAGTTTAA +AAAAATAGTTGATCACATTGAAAAAGATGACTTAATGTTGAATATCAAGA +AGTGTAAGATCATGTTATTTGGGACAAGAAAACGAATCAAAAATCAAAGT +GTACGCTTGATTTACAGAGATAATGTTATCGAAGTTGTAAATGAATTTAA +ATATCTTGGTGTATTATTTGATAATTATTTAAAGTGGGATATACATATAT +CGAAAACTGCCTCCAAAATATCTAGAACCATATCATGTATAAAACGAATT +AAATATTATTTGCCGAAAAGAATTTTAAAATTGTTATATGATAGTTTGAT +ATTGTCACATATTGACTACGGTATTGTTTTGTGGGGATGTTCAGCAAAGT +GTCATTTGGAAAAGTTACAAAAGTTACAAAATCGTTATGCCCGTTTAATA +CTAAACGTAGATATTTTGACACCTCGTATTATATTATTATCCTCTCTAAG +ATGGCAATCAGTTGTTCAGAGAGTGCAATACCAA +>Scaffold9309 +GAAGGAAGAAGAGGAAAATAATGATGAATTTGTAGAATTTCTATAACGTA +TGAAAACATAAACAACATGAAAAAGTATGAACCGACAGAAGAATGAAAAT +TTCAATCATATAACATGTCATTCACTTCTCTTCTCTGACTGTCAAGTATT +AGGTATTCCTTTTTATTTCCTCTTAAAATGATCATAGTTTCCTATTTCTT +TTACACCATTGGGAAGGGAATTCCAATGTTTTATGGCATTGTAATAAAAC +GAATTTCCAATACTACCTACTCTTTCTGGTAAGTTAAAGTTGAATCGGCT +ATTTCTTGTATTATAATCATGTACGTCAGTAACAAGATCGAAGTTGGATC +GAATATAATGATTCGACCTAGTATGATATATTTTATGCACGTGATGCAAT +ACGAGTTGTTTTGATCTTTGGTCGACTTCAAGAAAACCAGCTTTAGAAAG +TTCGCTGTAGCCAACATGAGTTCTTGCCTTGGACTAGAACAGTTGATAAA +TCTCACCATTTTGTTCTTTAAGATGGGTAGAAGAATCCCTGCAATCTAAA +TGGTCAATTACTGTGAAGTTATTTTTACTGGATGCACCCAATAttttttt +gataatttttttttctttgataatttttttctttttctttaataaatttt +ttggataatttttttttggataaatagttcttttttgataattctaataa +tttttttatttattttttttttttctataattttttttaaaaaatttatt +aatttttaattaaaaaaaaaataaGAGTTAACAGATTAAGGGAAACTGAC +AATTCAAAAAAAAAAAAAA +>Scaffold9310 +GCGGGGGCTGGGGAGGAAGGGGTGGCGTTATTTCACTTCCGATCTAATAC +GCTTTCTTAAGACACTGAAATATCAGTAGGTATTGGTATAGAGAATTACT +TTTTATTTTTAATTAAAACATTATCGAAATGAAGATACAGAGAAAAACGA +TGAGATGTAAGAAGTGCGCGTATTTAtgtgtgtgggtgcgtgtgtgtgtg 
+tgtgtgtgtgttgtgtgcgtgcgtgtgtgtggtggtgtgtACTAATTTTG +ATGTGTGTTGTGGCACAATTGCAATCATCAGTATCTTCATGAAAATGATA +ACCAGAAGCACAAAAAGGAGGgtgcgtgtgtgtgtgtgtgtgtttagtgt +gcgtgcgtgtgagggtgtttaagtgtgtatgtCGGAAATGTGGCACAATT +GCAATCATCTGTATCTTCATGAAAATGATAACCAGAAGAACAAAAAAAAA +AAACATTGAGAGAACATGTTTTTTTGATGGAAGACAAGAAGTTCTCGTAA +CGTAGGATCTCCGAGACATGATGGGGTCAACTTAAAAAGAGAGCAGTGAG +AGGCATTTATATCGAAGGTCAGGGAAAGGCAAACAAAGAAAGAAAAAAAA +AAGGCTCACAGGAGAACGAAAACACGGGCCAAAATAATAAACAGGAGCAA +GTGAACGGGCAGTTTGGTAGCTACTTCATTTACCGGCTTTTAAaggtact +atgtcccatttgcaggtcaaaaaaaatgaaaaagttaaattccaactgca +tttgaaagataatactaatttacaacttccctaaaaaaggtggggcttga +aaatgtcttcaagtgcggaaaataacgactattagttgtcaaatcgactt +tagggCTATAGAGCCCAAAAGTAATAGTCTTGA +>Scaffold11911 +TTCTTGGCACCCCCCCCCCCCCCACACTCCTGCACTGAAGAACTACTCAA +GTTTAAACTTTGCATTGCTTTTCTTTCTTTTTCAGTATTTTTTGCTTGGT +ACATGTTTCTCTTAATATCTGTCGTATAGatttttaatatttttatttat +atCTACGTCAATCTGGCTGttctttttcttgtcttctttttttttctctc +tcttttttttcctcgtattttGTATTGATCCTTACCCTAGTTTTTGAACT +TGAACAGCAATTTGCAGCACTCAAATTTCTTTAAAATTACCTTCTCTTAT +TTGtctctgttcccctctccccccctctctctctctctctctctctctct +ctctctctctttcATCTCCCATATCATAATTTGAAGTACCATCTATGGTG +TTTTCAGATTGATCTTTCTTGCTTTCCCCACCCTCCCCCTTTATGCAGTT +AATTTTCAGTCTATTTGTGTTTTCTGTGGTTGATTCTAATCATATTCTAA +CTCTTATTTTACATTTTACTTCACTAACAACTGGTTTATTATATTTGTTA +CTAATTTTGAATTAAACTATTTACCATTCTGAACGAACTGAAAGATTAAA +GATCAAACTATCTATGAATAGAATGGTATTTCTTCAATTTATTCAAATTT +CTCTCTCTTTAACCCCCTTTTTCTGCTTGCATTTTTATCCCTTTGCCGTG +GACTTCACTGGATATTTTGCTTTGATGCCAATCCAACAATTTTGCATATA +TTA diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_formatter1.out --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_formatter1.out Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,10 @@ +>Scaffold3648 
+AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACAGTATAACTTTTCAAATACTTTTGTTTTACAACTTTTCTCTCTGGACTTATATTAAAGTCAATTTTAATGAACATGTAGTAAAAACTAATACATGTACATCTACAGTTTATTTATTTTTTTCTTCTTCTTTTTGTATTTCTTGTGTTACATTATTTCACTTCACGTTCATGTTACCAACCTTGCCCCCTTGCTTTCCATGCAAAAAAAGAAAAAAAAGAAGCAATACTTACACTTACCCTTGAGATATCTTGATCTGAATGCTTTAACATTCTATATGTACAATAAATTTTTGTATCTATAGCCTATTATTATATATGTTGCTATGTCAGGCACATTGACAACATTCTCAGAAGGTTAGAAGATGGTATTGTTCTGAAATGCCTGGAATGCCTTGTGAACTAAGATGATTACTCATGTCATTAAAGTCCCCTAACCCAGGTATTTCCTCCTTCCCATGACGAAAACAGTCCATTTAAACTTCACCCCACTTTGGACCCGAAAGTGGGGTGCATTTTGGTGGTAAGCTCACCACAGAGCAAGAGAGAGTTAGAGTCCCTAATCTGCAGTGTAAACAAACTTTGCCAGGACATCACCAGCCCAACCTTGATAAGTACTGCTTGGAACTCCTCCATGATGTTCTAGTCTTATTCGCAGTCTCATATAGGTTCGGATTTTGTCCATTCTCATAGCTACCAGTATACATGGGAGATGCCAGTTTCATCTTCCTTGCTTCACTTTATAAGCATAGTTATATCANGAACTTCCTGGTTATAATTATGTTCCTTTCAAGTTTCATCATAATTGTCTAGTTCGATATAGTACATGGACACAATTAAATA TGATATTGTCT +>Scaffold9299 
+CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGGAAAAGCATCCTTGTTTGTTTCACTATGCTTTTTAATGGTTGACGTTAAaggtaaagaccagtattggaaacgccccaatttcaaaaaatgaaatggaagctctcattaccaatcatgtgaaagaatatgttttgactaatacatgatgataaaaaaattgccgggaaaccgcctactaattcatatatttagtaaatttgtttctctcatggtctgtgagagatatagggtagtcccatatacatctttctgtgtatagtgcttgtaactttacgaagaatgggccaaatttcttatcattttgatgattccagaaccttgcagatgcgagatggtagatgatcaaccttttctgatcgattccataacgtttctttcacaatgcaatcgcatgaccataactggtctttacctTTAAGTTGTAGGTCTTAATTGATAACACTATATAGTTTTTTTCTTTTTACTGTTTTTATTAATGACCTCTGTAATTTGCCCTATTGTGAAAATACTAAAATATGTTTATACGCCGATGATGCGGCAATATTTTGCCAAGGCAAAGAAATTGCCCTTGTTGAGAAAACTCTTAAATGTGAGTTTAAAAAAATAGTTGATCACATTGAAAAAGATGACTTAATGTTGAATATCAAGAAGTGTAAGATCATGTTATTTGGGACAAGAAAACGAATCAAAAATCAAAGTGTACGCTTGATTTACAGAGATAATGTTATCGAAGTTGTAAATGAATTTAAATATCTTGGTGTATTATTTGATAATTATTTAAAGTGGGATATACATATATCGAAAACTGCCTCCAAAATATCTAGAACCATATCATGTATAAAACGAATTAAATATTATTTGCCGAAAAGAATTTTAAAATTGTTATAT GATAGTTTGATATTGTCACATATTGACTACGGTATTGTTTTGTGGGGATGTTCAGCAAAGTGTCATTTGGAAAAGTTACAAAAGTTACAAAATCGTTATGCCCGTTTAATACTAAACGTAGATATTTTGACACCTCGTATTATATTATTATCCTCTCTAAGATGGCAATCAGTTGTTCAGAGAGTGCAATACCAA +>Scaffold9309 
+GAAGGAAGAAGAGGAAAATAATGATGAATTTGTAGAATTTCTATAACGTATGAAAACATAAACAACATGAAAAAGTATGAACCGACAGAAGAATGAAAATTTCAATCATATAACATGTCATTCACTTCTCTTCTCTGACTGTCAAGTATTAGGTATTCCTTTTTATTTCCTCTTAAAATGATCATAGTTTCCTATTTCTTTTACACCATTGGGAAGGGAATTCCAATGTTTTATGGCATTGTAATAAAACGAATTTCCAATACTACCTACTCTTTCTGGTAAGTTAAAGTTGAATCGGCTATTTCTTGTATTATAATCATGTACGTCAGTAACAAGATCGAAGTTGGATCGAATATAATGATTCGACCTAGTATGATATATTTTATGCACGTGATGCAATACGAGTTGTTTTGATCTTTGGTCGACTTCAAGAAAACCAGCTTTAGAAAGTTCGCTGTAGCCAACATGAGTTCTTGCCTTGGACTAGAACAGTTGATAAATCTCACCATTTTGTTCTTTAAGATGGGTAGAAGAATCCCTGCAATCTAAATGGTCAATTACTGTGAAGTTATTTTTACTGGATGCACCCAATAtttttttgataatttttttttctttgataatttttttctttttctttaataaattttttggataatttttttttggataaatagttcttttttgataattctaataatttttttatttattttttttttttctataattttttttaaaaaatttattaatttttaattaaaaaaaaaataaGAGTTAACAGATTAAGGGAAACTGACAATTCAAAAAAAAAAAAAA +>Scaffold9310 +GCGGGGGCTGGGGAGGAAGGGGTGGCGTTATTTCACTTCCGATCTAATACGCTTTCTTAAGACACTGAAATATCAGTAGGTATTGGTATAGAGAATTACTTTTTATTTTTAATTAAAACATTATCGAAATGAAGATACAGAGAAAAACGATGAGATGTAAGAAGTGCGCGTATTTAtgtgtgtgggtgcgtgtgtgtgtgtgtgtgtgtgttgtgtgcgtgcgtgtgtgtggtggtgtgtACTAATTTTGATGTGTGTTGTGGCACAATTGCAATCATCAGTATCTTCATGAAAATGATAACCAGAAGCACAAAAAGGAGGgtgcgtgtgtgtgtgtgtgtgtttagtgtgcgtgcgtgtgagggtgtttaagtgtgtatgtCGGAAATGTGGCACAATTGCAATCATCTGTATCTTCATGAAAATGATAACCAGAAGAACAAAAAAAAAAAACATTGAGAGAACATGTTTTTTTGATGGAAGACAAGAAGTTCTCGTAACGTAGGATCTCCGAGACATGATGGGGTCAACTTAAAAAGAGAGCAGTGAGAGGCATTTATATCGAAGGTCAGGGAAAGGCAAACAAAGAAAGAAAAAAAAAAGGCTCACAGGAGAACGAAAACACGGGCCAAAATAATAAACAGGAGCAAGTGAACGGGCAGTTTGGTAGCTACTTCATTTACCGGCTTTTAAaggtactatgtcccatttgcaggtcaaaaaaaatgaaaaagttaaattccaactgcatttgaaagataatactaatttacaacttccctaaaaaaggtggggcttgaaaatgtcttcaagtgcggaaaataacgactattagttgtcaaatcgactttagggCTATAGAGCCCAAAAGTAATAGTCTTGA +>Scaffold11911 
+TTCTTGGCACCCCCCCCCCCCCCACACTCCTGCACTGAAGAACTACTCAAGTTTAAACTTTGCATTGCTTTTCTTTCTTTTTCAGTATTTTTTGCTTGGTACATGTTTCTCTTAATATCTGTCGTATAGatttttaatatttttatttatatCTACGTCAATCTGGCTGttctttttcttgtcttctttttttttctctctcttttttttcctcgtattttGTATTGATCCTTACCCTAGTTTTTGAACTTGAACAGCAATTTGCAGCACTCAAATTTCTTTAAAATTACCTTCTCTTATTTGtctctgttcccctctccccccctctctctctctctctctctctctctctctctctctttcATCTCCCATATCATAATTTGAAGTACCATCTATGGTGTTTTCAGATTGATCTTTCTTGCTTTCCCCACCCTCCCCCTTTATGCAGTTAATTTTCAGTCTATTTGTGTTTTCTGTGGTTGATTCTAATCATATTCTAACTCTTATTTTACATTTTACTTCACTAACAACTGGTTTATTATATTTGTTACTAATTTTGAATTAAACTATTTACCATTCTGAACGAACTGAAAGATTAAAGATCAAACTATCTATGAATAGAATGGTATTTCTTCAATTTATTCAAATTTCTCTCTCTTTAACCCCCTTTTTCTGCTTGCATTTTTATCCCTTTGCCGTGGACTTCACTGGATATTTTGCTTTGATGCCAATCCAACAATTTTGCATATATTA diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_formatter2.out --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_formatter2.out Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,84 @@ +>Scaffold3648 +AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCA +GGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGG +TATTGTGCATTATATGACCTGTAATACACAGTATAACTTTTCAAATACTTTTGTTTTACA +ACTTTTCTCTCTGGACTTATATTAAAGTCAATTTTAATGAACATGTAGTAAAAACTAATA +CATGTACATCTACAGTTTATTTATTTTTTTCTTCTTCTTTTTGTATTTCTTGTGTTACAT +TATTTCACTTCACGTTCATGTTACCAACCTTGCCCCCTTGCTTTCCATGCAAAAAAAGAA +AAAAAAGAAGCAATACTTACACTTACCCTTGAGATATCTTGATCTGAATGCTTTAACATT +CTATATGTACAATAAATTTTTGTATCTATAGCCTATTATTATATATGTTGCTATGTCAGG +CACATTGACAACATTCTCAGAAGGTTAGAAGATGGTATTGTTCTGAAATGCCTGGAATGC +CTTGTGAACTAAGATGATTACTCATGTCATTAAAGTCCCCTAACCCAGGTATTTCCTCCT +TCCCATGACGAAAACAGTCCATTTAAACTTCACCCCACTTTGGACCCGAAAGTGGGGTGC +ATTTTGGTGGTAAGCTCACCACAGAGCAAGAGAGAGTTAGAGTCCCTAATCTGCAGTGTA +AACAAACTTTGCCAGGACATCACCAGCCCAACCTTGATAAGTACTGCTTGGAACTCCTCC +ATGATGTTCTAGTCTTATTCGCAGTCTCATATAGGTTCGGATTTTGTCCATTCTCATAGC +TACCAGTATACATGGGAGATGCCAGTTTCATCTTCCTTGCTTCACTTTATAAGCATAGTT +ATATCANGAACTTCCTGGTTATAATTATGTTCCTTTCAAGTTTCATCATAATTGTCTAGT 
+TCGATATAGTACATGGACACAATTAAATATGATATTGTCT +>Scaffold9299 +CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCA +TAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGGAAAAGCATCCTTGTTTGTT +TCACTATGCTTTTTAATGGTTGACGTTAAaggtaaagaccagtattggaaacgccccaat +ttcaaaaaatgaaatggaagctctcattaccaatcatgtgaaagaatatgttttgactaa +tacatgatgataaaaaaattgccgggaaaccgcctactaattcatatatttagtaaattt +gtttctctcatggtctgtgagagatatagggtagtcccatatacatctttctgtgtatag +tgcttgtaactttacgaagaatgggccaaatttcttatcattttgatgattccagaacct +tgcagatgcgagatggtagatgatcaaccttttctgatcgattccataacgtttctttca +caatgcaatcgcatgaccataactggtctttacctTTAAGTTGTAGGTCTTAATTGATAA +CACTATATAGTTTTTTTCTTTTTACTGTTTTTATTAATGACCTCTGTAATTTGCCCTATT +GTGAAAATACTAAAATATGTTTATACGCCGATGATGCGGCAATATTTTGCCAAGGCAAAG +AAATTGCCCTTGTTGAGAAAACTCTTAAATGTGAGTTTAAAAAAATAGTTGATCACATTG +AAAAAGATGACTTAATGTTGAATATCAAGAAGTGTAAGATCATGTTATTTGGGACAAGAA +AACGAATCAAAAATCAAAGTGTACGCTTGATTTACAGAGATAATGTTATCGAAGTTGTAA +ATGAATTTAAATATCTTGGTGTATTATTTGATAATTATTTAAAGTGGGATATACATATAT +CGAAAACTGCCTCCAAAATATCTAGAACCATATCATGTATAAAACGAATTAAATATTATT +TGCCGAAAAGAATTTTAAAATTGTTATATGATAGTTTGATATTGTCACATATTGACTACG +GTATTGTTTTGTGGGGATGTTCAGCAAAGTGTCATTTGGAAAAGTTACAAAAGTTACAAA +ATCGTTATGCCCGTTTAATACTAAACGTAGATATTTTGACACCTCGTATTATATTATTAT +CCTCTCTAAGATGGCAATCAGTTGTTCAGAGAGTGCAATACCAA +>Scaffold9309 +GAAGGAAGAAGAGGAAAATAATGATGAATTTGTAGAATTTCTATAACGTATGAAAACATA +AACAACATGAAAAAGTATGAACCGACAGAAGAATGAAAATTTCAATCATATAACATGTCA +TTCACTTCTCTTCTCTGACTGTCAAGTATTAGGTATTCCTTTTTATTTCCTCTTAAAATG +ATCATAGTTTCCTATTTCTTTTACACCATTGGGAAGGGAATTCCAATGTTTTATGGCATT +GTAATAAAACGAATTTCCAATACTACCTACTCTTTCTGGTAAGTTAAAGTTGAATCGGCT +ATTTCTTGTATTATAATCATGTACGTCAGTAACAAGATCGAAGTTGGATCGAATATAATG +ATTCGACCTAGTATGATATATTTTATGCACGTGATGCAATACGAGTTGTTTTGATCTTTG +GTCGACTTCAAGAAAACCAGCTTTAGAAAGTTCGCTGTAGCCAACATGAGTTCTTGCCTT +GGACTAGAACAGTTGATAAATCTCACCATTTTGTTCTTTAAGATGGGTAGAAGAATCCCT +GCAATCTAAATGGTCAATTACTGTGAAGTTATTTTTACTGGATGCACCCAATAttttttt +gataatttttttttctttgataatttttttctttttctttaataaattttttggataatt 
+tttttttggataaatagttcttttttgataattctaataatttttttatttatttttttt +ttttctataattttttttaaaaaatttattaatttttaattaaaaaaaaaataaGAGTTA +ACAGATTAAGGGAAACTGACAATTCAAAAAAAAAAAAAA +>Scaffold9310 +GCGGGGGCTGGGGAGGAAGGGGTGGCGTTATTTCACTTCCGATCTAATACGCTTTCTTAA +GACACTGAAATATCAGTAGGTATTGGTATAGAGAATTACTTTTTATTTTTAATTAAAACA +TTATCGAAATGAAGATACAGAGAAAAACGATGAGATGTAAGAAGTGCGCGTATTTAtgtg +tgtgggtgcgtgtgtgtgtgtgtgtgtgtgttgtgtgcgtgcgtgtgtgtggtggtgtgt +ACTAATTTTGATGTGTGTTGTGGCACAATTGCAATCATCAGTATCTTCATGAAAATGATA +ACCAGAAGCACAAAAAGGAGGgtgcgtgtgtgtgtgtgtgtgtttagtgtgcgtgcgtgt +gagggtgtttaagtgtgtatgtCGGAAATGTGGCACAATTGCAATCATCTGTATCTTCAT +GAAAATGATAACCAGAAGAACAAAAAAAAAAAACATTGAGAGAACATGTTTTTTTGATGG +AAGACAAGAAGTTCTCGTAACGTAGGATCTCCGAGACATGATGGGGTCAACTTAAAAAGA +GAGCAGTGAGAGGCATTTATATCGAAGGTCAGGGAAAGGCAAACAAAGAAAGAAAAAAAA +AAGGCTCACAGGAGAACGAAAACACGGGCCAAAATAATAAACAGGAGCAAGTGAACGGGC +AGTTTGGTAGCTACTTCATTTACCGGCTTTTAAaggtactatgtcccatttgcaggtcaa +aaaaaatgaaaaagttaaattccaactgcatttgaaagataatactaatttacaacttcc +ctaaaaaaggtggggcttgaaaatgtcttcaagtgcggaaaataacgactattagttgtc +aaatcgactttagggCTATAGAGCCCAAAAGTAATAGTCTTGA +>Scaffold11911 +TTCTTGGCACCCCCCCCCCCCCCACACTCCTGCACTGAAGAACTACTCAAGTTTAAACTT +TGCATTGCTTTTCTTTCTTTTTCAGTATTTTTTGCTTGGTACATGTTTCTCTTAATATCT +GTCGTATAGatttttaatatttttatttatatCTACGTCAATCTGGCTGttctttttctt +gtcttctttttttttctctctcttttttttcctcgtattttGTATTGATCCTTACCCTAG +TTTTTGAACTTGAACAGCAATTTGCAGCACTCAAATTTCTTTAAAATTACCTTCTCTTAT +TTGtctctgttcccctctccccccctctctctctctctctctctctctctctctctctct +ttcATCTCCCATATCATAATTTGAAGTACCATCTATGGTGTTTTCAGATTGATCTTTCTT +GCTTTCCCCACCCTCCCCCTTTATGCAGTTAATTTTCAGTCTATTTGTGTTTTCTGTGGT +TGATTCTAATCATATTCTAACTCTTATTTTACATTTTACTTCACTAACAACTGGTTTATT +ATATTTGTTACTAATTTTGAATTAAACTATTTACCATTCTGAACGAACTGAAAGATTAAA +GATCAAACTATCTATGAATAGAATGGTATTTCTTCAATTTATTCAAATTTCTCTCTCTTT +AACCCCCTTTTTCTGCTTGCATTTTTATCCCTTTGCCGTGGACTTCACTGGATATTTTGC +TTTGATGCCAATCCAACAATTTTGCATATATTA diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_nuc_changer1.fasta --- /dev/null Thu Jan 01 00:00:00 1970 
+0000 +++ b/test-data/fasta_nuc_changer1.fasta Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,50 @@ +>cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 +TGAGGTAGTAGGTTGTATAGTT +>cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 +TCCCTGAGACCTCAAGTGTGA +>cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 +TGGAATGTAAAGAAGTATGTA +>cel-miR-2 MIMAT0000004 Caenorhabditis elegans miR-2 +TATCACAGCCAGCTTTGATGTGC +>cel-miR-34 MIMAT0000005 Caenorhabditis elegans miR-34 +AGGCAGTGTGGTTAGCTGGTTG +>cel-miR-35 MIMAT0000006 Caenorhabditis elegans miR-35 +TCACCGGGTGGAAACTAGCAGT +>cel-miR-36 MIMAT0000007 Caenorhabditis elegans miR-36 +TCACCGGGTGAAAATTCGCATG +>cel-miR-37 MIMAT0000008 Caenorhabditis elegans miR-37 +TCACCGGGTGAACACTTGCAGT +>cel-miR-38 MIMAT0000009 Caenorhabditis elegans miR-38 +TCACCGGGAGAAAAACTGGAGT +>cel-miR-39 MIMAT0000010 Caenorhabditis elegans miR-39 +TCACCGGGTGTAAATCAGCTTG +>cel-miR-40 MIMAT0000011 Caenorhabditis elegans miR-40 +TCACCGGGTGTACATCAGCTAA +>cel-miR-41 MIMAT0000012 Caenorhabditis elegans miR-41 +TCACCGGGTGAAAAATCACCTA +>cel-miR-42 MIMAT0000013 Caenorhabditis elegans miR-42 +TCACCGGGTTAACATCTACAGA +>cel-miR-43 MIMAT0000014 Caenorhabditis elegans miR-43 +TATCACAGTTTACTTGCTGTCGC +>cel-miR-44 MIMAT0000015 Caenorhabditis elegans miR-44 +TGACTAGAGACACATTCAGCT +>cel-miR-45 MIMAT0000016 Caenorhabditis elegans miR-45 +TGACTAGAGACACATTCAGCT +>cel-miR-46 MIMAT0000017 Caenorhabditis elegans miR-46 +TGTCATGGAGTCGCTCTCTTCA +>cel-miR-47 MIMAT0000018 Caenorhabditis elegans miR-47 +TGTCATGGAGGCGCTCTCTTCA +>cel-miR-48 MIMAT0000019 Caenorhabditis elegans miR-48 +TGAGGTAGGCTCAGTAGATGCGA +>cel-miR-49 MIMAT0000020 Caenorhabditis elegans miR-49 +AAGCACCACGAGAAGCTGCAGA +>cel-miR-50 MIMAT0000021 Caenorhabditis elegans miR-50 +TGATATGTCTGGTATTCTTGGG +>cel-miR-51 MIMAT0000022 Caenorhabditis elegans miR-51 +TACCCGTAGCTCCTATCCATGTT +>cel-miR-52 MIMAT0000023 Caenorhabditis elegans miR-52 +CACCCGTACATATGTTTCCGTGCT +>cel-miR-53 MIMAT0000024 Caenorhabditis elegans miR-53 
+CACCCGTACATTTGTTTCCGTGCT +>cel-miR-54 MIMAT0000025 Caenorhabditis elegans miR-54 +TACCCGTAATCTTCATAATCCGAG diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_nuc_changer1.out --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_nuc_changer1.out Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,50 @@ +>cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 +UGAGGUAGUAGGUUGUAUAGUU +>cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 +UCCCUGAGACCUCAAGUGUGA +>cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 +UGGAAUGUAAAGAAGUAUGUA +>cel-miR-2 MIMAT0000004 Caenorhabditis elegans miR-2 +UAUCACAGCCAGCUUUGAUGUGC +>cel-miR-34 MIMAT0000005 Caenorhabditis elegans miR-34 +AGGCAGUGUGGUUAGCUGGUUG +>cel-miR-35 MIMAT0000006 Caenorhabditis elegans miR-35 +UCACCGGGUGGAAACUAGCAGU +>cel-miR-36 MIMAT0000007 Caenorhabditis elegans miR-36 +UCACCGGGUGAAAAUUCGCAUG +>cel-miR-37 MIMAT0000008 Caenorhabditis elegans miR-37 +UCACCGGGUGAACACUUGCAGU +>cel-miR-38 MIMAT0000009 Caenorhabditis elegans miR-38 +UCACCGGGAGAAAAACUGGAGU +>cel-miR-39 MIMAT0000010 Caenorhabditis elegans miR-39 +UCACCGGGUGUAAAUCAGCUUG +>cel-miR-40 MIMAT0000011 Caenorhabditis elegans miR-40 +UCACCGGGUGUACAUCAGCUAA +>cel-miR-41 MIMAT0000012 Caenorhabditis elegans miR-41 +UCACCGGGUGAAAAAUCACCUA +>cel-miR-42 MIMAT0000013 Caenorhabditis elegans miR-42 +UCACCGGGUUAACAUCUACAGA +>cel-miR-43 MIMAT0000014 Caenorhabditis elegans miR-43 +UAUCACAGUUUACUUGCUGUCGC +>cel-miR-44 MIMAT0000015 Caenorhabditis elegans miR-44 +UGACUAGAGACACAUUCAGCU +>cel-miR-45 MIMAT0000016 Caenorhabditis elegans miR-45 +UGACUAGAGACACAUUCAGCU +>cel-miR-46 MIMAT0000017 Caenorhabditis elegans miR-46 +UGUCAUGGAGUCGCUCUCUUCA +>cel-miR-47 MIMAT0000018 Caenorhabditis elegans miR-47 +UGUCAUGGAGGCGCUCUCUUCA +>cel-miR-48 MIMAT0000019 Caenorhabditis elegans miR-48 +UGAGGUAGGCUCAGUAGAUGCGA +>cel-miR-49 MIMAT0000020 Caenorhabditis elegans miR-49 +AAGCACCACGAGAAGCUGCAGA +>cel-miR-50 MIMAT0000021 Caenorhabditis elegans miR-50 +UGAUAUGUCUGGUAUUCUUGGG +>cel-miR-51 
MIMAT0000022 Caenorhabditis elegans miR-51 +UACCCGUAGCUCCUAUCCAUGUU +>cel-miR-52 MIMAT0000023 Caenorhabditis elegans miR-52 +CACCCGUACAUAUGUUUCCGUGCU +>cel-miR-53 MIMAT0000024 Caenorhabditis elegans miR-53 +CACCCGUACAUUUGUUUCCGUGCU +>cel-miR-54 MIMAT0000025 Caenorhabditis elegans miR-54 +UACCCGUAAUCUUCAUAAUCCGAG diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_nuc_changer2.fasta --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_nuc_changer2.fasta Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,50 @@ +>cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 +UGAGGUAGUAGGUUGUAUAGUU +>cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 +UCCCUGAGACCUCAAGUGUGA +>cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 +UGGAAUGUAAAGAAGUAUGUA +>cel-miR-2 MIMAT0000004 Caenorhabditis elegans miR-2 +UAUCACAGCCAGCUUUGAUGUGC +>cel-miR-34 MIMAT0000005 Caenorhabditis elegans miR-34 +AGGCAGUGUGGUUAGCUGGUUG +>cel-miR-35 MIMAT0000006 Caenorhabditis elegans miR-35 +UCACCGGGUGGAAACUAGCAGU +>cel-miR-36 MIMAT0000007 Caenorhabditis elegans miR-36 +UCACCGGGUGAAAAUUCGCAUG +>cel-miR-37 MIMAT0000008 Caenorhabditis elegans miR-37 +UCACCGGGUGAACACUUGCAGU +>cel-miR-38 MIMAT0000009 Caenorhabditis elegans miR-38 +UCACCGGGAGAAAAACUGGAGU +>cel-miR-39 MIMAT0000010 Caenorhabditis elegans miR-39 +UCACCGGGUGUAAAUCAGCUUG +>cel-miR-40 MIMAT0000011 Caenorhabditis elegans miR-40 +UCACCGGGUGUACAUCAGCUAA +>cel-miR-41 MIMAT0000012 Caenorhabditis elegans miR-41 +UCACCGGGUGAAAAAUCACCUA +>cel-miR-42 MIMAT0000013 Caenorhabditis elegans miR-42 +UCACCGGGUUAACAUCUACAGA +>cel-miR-43 MIMAT0000014 Caenorhabditis elegans miR-43 +UAUCACAGUUUACUUGCUGUCGC +>cel-miR-44 MIMAT0000015 Caenorhabditis elegans miR-44 +UGACUAGAGACACAUUCAGCU +>cel-miR-45 MIMAT0000016 Caenorhabditis elegans miR-45 +UGACUAGAGACACAUUCAGCU +>cel-miR-46 MIMAT0000017 Caenorhabditis elegans miR-46 +UGUCAUGGAGUCGCUCUCUUCA +>cel-miR-47 MIMAT0000018 Caenorhabditis elegans miR-47 +UGUCAUGGAGGCGCUCUCUUCA +>cel-miR-48 MIMAT0000019 Caenorhabditis elegans 
miR-48 +UGAGGUAGGCUCAGUAGAUGCGA +>cel-miR-49 MIMAT0000020 Caenorhabditis elegans miR-49 +AAGCACCACGAGAAGCUGCAGA +>cel-miR-50 MIMAT0000021 Caenorhabditis elegans miR-50 +UGAUAUGUCUGGUAUUCUUGGG +>cel-miR-51 MIMAT0000022 Caenorhabditis elegans miR-51 +UACCCGUAGCUCCUAUCCAUGUU +>cel-miR-52 MIMAT0000023 Caenorhabditis elegans miR-52 +CACCCGUACAUAUGUUUCCGUGCU +>cel-miR-53 MIMAT0000024 Caenorhabditis elegans miR-53 +CACCCGUACAUUUGUUUCCGUGCU +>cel-miR-54 MIMAT0000025 Caenorhabditis elegans miR-54 +UACCCGUAAUCUUCAUAAUCCGAG diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_nuc_changer2.out --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fasta_nuc_changer2.out Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,50 @@ +>cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 +TGAGGTAGTAGGTTGTATAGTT +>cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 +TCCCTGAGACCTCAAGTGTGA +>cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 +TGGAATGTAAAGAAGTATGTA +>cel-miR-2 MIMAT0000004 Caenorhabditis elegans miR-2 +TATCACAGCCAGCTTTGATGTGC +>cel-miR-34 MIMAT0000005 Caenorhabditis elegans miR-34 +AGGCAGTGTGGTTAGCTGGTTG +>cel-miR-35 MIMAT0000006 Caenorhabditis elegans miR-35 +TCACCGGGTGGAAACTAGCAGT +>cel-miR-36 MIMAT0000007 Caenorhabditis elegans miR-36 +TCACCGGGTGAAAATTCGCATG +>cel-miR-37 MIMAT0000008 Caenorhabditis elegans miR-37 +TCACCGGGTGAACACTTGCAGT +>cel-miR-38 MIMAT0000009 Caenorhabditis elegans miR-38 +TCACCGGGAGAAAAACTGGAGT +>cel-miR-39 MIMAT0000010 Caenorhabditis elegans miR-39 +TCACCGGGTGTAAATCAGCTTG +>cel-miR-40 MIMAT0000011 Caenorhabditis elegans miR-40 +TCACCGGGTGTACATCAGCTAA +>cel-miR-41 MIMAT0000012 Caenorhabditis elegans miR-41 +TCACCGGGTGAAAAATCACCTA +>cel-miR-42 MIMAT0000013 Caenorhabditis elegans miR-42 +TCACCGGGTTAACATCTACAGA +>cel-miR-43 MIMAT0000014 Caenorhabditis elegans miR-43 +TATCACAGTTTACTTGCTGTCGC +>cel-miR-44 MIMAT0000015 Caenorhabditis elegans miR-44 +TGACTAGAGACACATTCAGCT +>cel-miR-45 MIMAT0000016 Caenorhabditis elegans miR-45 +TGACTAGAGACACATTCAGCT 
+>cel-miR-46 MIMAT0000017 Caenorhabditis elegans miR-46 +TGTCATGGAGTCGCTCTCTTCA +>cel-miR-47 MIMAT0000018 Caenorhabditis elegans miR-47 +TGTCATGGAGGCGCTCTCTTCA +>cel-miR-48 MIMAT0000019 Caenorhabditis elegans miR-48 +TGAGGTAGGCTCAGTAGATGCGA +>cel-miR-49 MIMAT0000020 Caenorhabditis elegans miR-49 +AAGCACCACGAGAAGCTGCAGA +>cel-miR-50 MIMAT0000021 Caenorhabditis elegans miR-50 +TGATATGTCTGGTATTCTTGGG +>cel-miR-51 MIMAT0000022 Caenorhabditis elegans miR-51 +TACCCGTAGCTCCTATCCATGTT +>cel-miR-52 MIMAT0000023 Caenorhabditis elegans miR-52 +CACCCGTACATATGTTTCCGTGCT +>cel-miR-53 MIMAT0000024 Caenorhabditis elegans miR-53 +CACCCGTACATTTGTTTCCGTGCT +>cel-miR-54 MIMAT0000025 Caenorhabditis elegans miR-54 +TACCCGTAATCTTCATAATCCGAG diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fastq_stats1.out --- a/test-data/fastq_stats1.out Mon Sep 14 15:27:55 2009 -0400 +++ b/test-data/fastq_stats1.out Mon Sep 14 17:03:17 2009 -0400 @@ -1,37 +1,37 @@ -column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count -1 9 23 34 288 32.00 33 33 33 0 33 33 3 1 4 1 0 -2 9 28 33 287 31.89 31 33 33 2 28 33 3 3 2 1 0 -3 9 13 34 268 29.78 28 33 33 5 21 34 5 1 0 3 0 -4 9 17 33 261 29.00 30 33 33 3 26 33 1 2 3 3 0 -5 9 22 33 269 29.89 30 33 33 3 26 33 3 3 3 0 0 -6 9 22 33 277 30.78 30 33 33 3 26 33 5 3 0 1 0 -7 9 21 33 258 28.67 24 33 33 9 21 33 4 1 3 1 0 -8 9 12 33 263 29.22 32 33 33 1 31 33 2 1 1 5 0 -9 9 29 33 290 32.22 33 33 33 0 33 33 3 3 2 1 0 -10 9 23 33 277 30.78 32 33 33 1 31 33 1 4 2 2 0 -11 9 12 33 245 27.22 21 31 33 12 12 33 5 2 1 1 0 -12 9 13 33 214 23.78 15 24 33 18 13 33 2 4 2 1 0 -13 9 5 33 249 27.67 29 31 33 4 23 33 2 1 1 5 0 -14 9 5 33 233 25.89 24 33 33 9 11 33 3 3 2 1 0 -15 9 15 33 251 27.89 24 33 33 9 15 33 5 1 1 2 0 -16 9 23 34 269 29.89 24 33 33 9 23 34 3 1 2 3 0 -17 9 13 34 266 29.56 33 33 33 0 33 33 2 3 1 3 0 -18 9 21 34 272 30.22 31 33 33 2 28 34 0 5 1 3 0 -19 9 5 34 244 27.11 27 30 33 6 18 34 4 4 1 0 0 -20 9 11 34 241 26.78 23 32 33 10 11 34 3 4 2 0 0 
-21 9 13 33 240 26.67 24 27 33 9 13 33 1 4 0 4 0 -22 9 5 33 190 21.11 13 21 33 20 5 33 1 4 0 3 1 -23 9 5 33 205 22.78 16 26 33 17 5 33 4 4 1 0 0 -24 9 5 33 247 27.44 28 31 33 5 21 33 1 5 1 2 0 -25 9 11 34 241 26.78 24 33 33 9 11 34 3 4 0 2 0 -26 9 5 33 212 23.56 18 31 33 15 5 33 0 6 0 3 0 -27 9 5 33 227 25.22 21 26 33 12 5 33 3 4 1 1 0 -28 9 21 33 255 28.33 24 31 33 9 21 33 2 4 3 0 0 -29 9 5 33 228 25.33 21 30 33 12 5 33 2 4 1 2 0 -30 9 10 33 213 23.67 16 28 33 17 10 33 3 4 2 0 0 -31 9 5 33 236 26.22 21 31 33 12 5 33 1 4 1 3 0 -32 9 5 33 210 23.33 12 29 33 21 5 33 3 3 0 3 0 -33 9 5 33 183 20.33 9 21 33 24 5 33 1 4 2 2 0 -34 9 5 33 150 16.67 7 17 22 15 5 33 3 4 1 1 0 -35 9 13 33 217 24.11 21 24 29 8 13 33 1 4 1 3 0 -36 9 5 33 195 21.67 18 21 32 14 5 33 3 2 1 3 0 +column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count Max_count +1 9 23 34 288 32.00 33 33 33 0 33 33 3 1 4 1 0 9 +2 9 28 33 287 31.89 31 33 33 2 28 33 3 3 2 1 0 9 +3 9 13 34 268 29.78 28 33 33 5 21 34 5 1 0 3 0 9 +4 9 17 33 261 29.00 30 33 33 3 26 33 1 2 3 3 0 9 +5 9 22 33 269 29.89 30 33 33 3 26 33 3 3 3 0 0 9 +6 9 22 33 277 30.78 30 33 33 3 26 33 5 3 0 1 0 9 +7 9 21 33 258 28.67 24 33 33 9 21 33 4 1 3 1 0 9 +8 9 12 33 263 29.22 32 33 33 1 31 33 2 1 1 5 0 9 +9 9 29 33 290 32.22 33 33 33 0 33 33 3 3 2 1 0 9 +10 9 23 33 277 30.78 32 33 33 1 31 33 1 4 2 2 0 9 +11 9 12 33 245 27.22 21 31 33 12 12 33 5 2 1 1 0 9 +12 9 13 33 214 23.78 15 24 33 18 13 33 2 4 2 1 0 9 +13 9 5 33 249 27.67 29 31 33 4 23 33 2 1 1 5 0 9 +14 9 5 33 233 25.89 24 33 33 9 11 33 3 3 2 1 0 9 +15 9 15 33 251 27.89 24 33 33 9 15 33 5 1 1 2 0 9 +16 9 23 34 269 29.89 24 33 33 9 23 34 3 1 2 3 0 9 +17 9 13 34 266 29.56 33 33 33 0 33 33 2 3 1 3 0 9 +18 9 21 34 272 30.22 31 33 33 2 28 34 0 5 1 3 0 9 +19 9 5 34 244 27.11 27 30 33 6 18 34 4 4 1 0 0 9 +20 9 11 34 241 26.78 23 32 33 10 11 34 3 4 2 0 0 9 +21 9 13 33 240 26.67 24 27 33 9 13 33 1 4 0 4 0 9 +22 9 5 33 190 21.11 13 21 33 20 5 33 1 4 0 3 1 9 +23 9 5 33 205 
22.78 16 26 33 17 5 33 4 4 1 0 0 9 +24 9 5 33 247 27.44 28 31 33 5 21 33 1 5 1 2 0 9 +25 9 11 34 241 26.78 24 33 33 9 11 34 3 4 0 2 0 9 +26 9 5 33 212 23.56 18 31 33 15 5 33 0 6 0 3 0 9 +27 9 5 33 227 25.22 21 26 33 12 5 33 3 4 1 1 0 9 +28 9 21 33 255 28.33 24 31 33 9 21 33 2 4 3 0 0 9 +29 9 5 33 228 25.33 21 30 33 12 5 33 2 4 1 2 0 9 +30 9 10 33 213 23.67 16 28 33 17 10 33 3 4 2 0 0 9 +31 9 5 33 236 26.22 21 31 33 12 5 33 1 4 1 3 0 9 +32 9 5 33 210 23.33 12 29 33 21 5 33 3 3 0 3 0 9 +33 9 5 33 183 20.33 9 21 33 24 5 33 1 4 2 2 0 9 +34 9 5 33 150 16.67 7 17 22 15 5 33 3 4 1 1 0 9 +35 9 13 33 217 24.11 21 24 29 8 13 33 1 4 1 3 0 9 +36 9 5 33 195 21.67 18 21 32 14 5 33 3 2 1 3 0 9 diff -r 0f97b3048bc3 -r 40c5e1853a66 tool-data/fastx_clipper_sequences.txt --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool-data/fastx_clipper_sequences.txt Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,13 @@ +# +# Adapter/Linker sequences for FASTX-Clipper tool. +# +# Format: +# Adapter Sequence <TAB> Descriptive name +# +# Example: +# AAATTTGATAAGATA Our-Adapter +# +# Some adapters can be found here: +# http://seqanswers.com/forums/showthread.php?t=198
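The adapter list format documented in the comment header above (one adapter per line, sequence and descriptive name separated by a TAB, `#` lines as comments) can be read with a few lines of Python. This is only a sketch under those stated assumptions; the function name `parse_adapters` is hypothetical and is not part of the FASTX-Toolkit or of Galaxy.

```python
def parse_adapters(text):
    """Return (sequence, name) pairs from an adapter list.

    Assumed format, taken from the file's own comment header:
    '#' starts a comment line, blank lines are ignored, and every
    other line is 'Adapter Sequence <TAB> Descriptive name'.
    """
    adapters = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip comments and blank lines
        # partition() tolerates a missing name: it then yields an empty string
        sequence, _, name = line.partition("\t")
        adapters.append((sequence.strip(), name.strip()))
    return adapters
```

Splitting on the first TAB only keeps any further TABs or spaces inside the descriptive name, which seems the safer reading of the format line.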
details: http://www.bx.psu.edu/hg/galaxy/rev/40c5e1853a66 changeset: 2691:40c5e1853a66 user: gua110 date: Mon Sep 14 17:03:17 2009 -0400 description: Updating FASTX tool-set to the latest version v0.0.10 28 file(s) affected in this change: static/fastx_icons/fasta_clipping_histogram_3.png static/fastx_icons/fasta_clipping_histogram_4.png test-data/fasta_collapser1.out test-data/fasta_formatter1.fasta test-data/fasta_formatter1.out test-data/fasta_formatter2.out test-data/fasta_nuc_changer1.fasta test-data/fasta_nuc_changer1.out test-data/fasta_nuc_changer2.fasta test-data/fasta_nuc_changer2.out test-data/fastq_stats1.out tool-data/fastx_clipper_sequences.txt tool_conf.xml.sample tools/fastx_toolkit/fasta_clipping_histogram.xml tools/fastx_toolkit/fasta_collapser.xml tools/fastx_toolkit/fasta_formatter.xml tools/fastx_toolkit/fasta_nucleotide_changer.xml tools/fastx_toolkit/fastq_nucleotides_distribution.xml tools/fastx_toolkit/fastq_qual_conv.xml tools/fastx_toolkit/fastq_qual_stat.xml tools/fastx_toolkit/fastq_quality_converter.xml tools/fastx_toolkit/fastx_barcode_splitter.xml tools/fastx_toolkit/fastx_barcode_splitter_galaxy_wrapper.sh tools/fastx_toolkit/fastx_clipper.xml tools/fastx_toolkit/fastx_collapser.xml tools/fastx_toolkit/fastx_nucleotides_distribution.xml tools/fastx_toolkit/fastx_quality_statistics.xml tools/fastx_toolkit/fastx_renamer.xml diffs (1772 lines): diff -r 0f97b3048bc3 -r 40c5e1853a66 static/fastx_icons/fasta_clipping_histogram_3.png Binary file static/fastx_icons/fasta_clipping_histogram_3.png has changed diff -r 0f97b3048bc3 -r 40c5e1853a66 static/fastx_icons/fasta_clipping_histogram_4.png Binary file static/fastx_icons/fasta_clipping_histogram_4.png has changed diff -r 0f97b3048bc3 -r 40c5e1853a66 test-data/fasta_collapser1.out --- a/test-data/fasta_collapser1.out Mon Sep 14 15:27:55 2009 -0400 +++ b/test-data/fasta_collapser1.out Mon Sep 14 17:03:17 2009 -0400 @@ -1,24 +1,24 @@ ->1-3 +>1-15 +AGTACAAGGACATGC +>2-11 
+ATTGCTGCTCGGATGGTCCGGCTGTGCACAC +>3-5 +TTCAACGCCGCCGTGAAC +>4-3 CTGCTGCGATCGGTGTGC ->2-1 -TTACCTCACGATATTGTAATA ->3-1 -CCTTGTAGTGGATTCTGATGA ->4-1 -TGATTTCCAGAGCCAAT ->5-11 -ATTGCTGCTCGGATGGTCCGGCTGTGCACAC +>5-1 +TCAAATTCTAGATTTTTACGG + +TGTAGGCC Dummy-Adapter (don't use me) diff -r 0f97b3048bc3 -r 40c5e1853a66 tool_conf.xml.sample --- a/tool_conf.xml.sample Mon Sep 14 15:27:55 2009 -0400 +++ b/tool_conf.xml.sample Mon Sep 14 17:03:17 2009 -0400 @@ -299,24 +299,26 @@ <tool file="fasta_tools/tabular_to_fasta.xml" /> </section> <section name="FASTA/Q Information" id="cshl_library_information"> - <tool file="fastx_toolkit/fastq_qual_stat.xml" /> + <tool file="fastx_toolkit/fastx_quality_statistics.xml" /> <tool file="fastx_toolkit/fastq_quality_boxplot.xml" /> - <tool file="fastx_toolkit/fastq_nucleotides_distribution.xml" /> - <!-- <tool file="fastx_toolkit/fasta_clipping_histogram.xml" /> --> + <tool file="fastx_toolkit/fastx_nucleotides_distribution.xml" /> + <tool file="fastx_toolkit/fasta_clipping_histogram.xml" /> </section> <section name="FASTA/Q Preprocessing" id="cshl_fastx_manipulation"> <tool file="fastx_toolkit/fastq_to_fasta.xml" /> - <tool file="fastx_toolkit/fastq_qual_conv.xml" /> - <!-- <tool file="fastx_toolkit/fastx_clipper.xml" /> --> + <tool file="fastx_toolkit/fastq_quality_converter.xml" /> + <tool file="fastx_toolkit/fastx_clipper.xml" /> <tool file="fastx_toolkit/fastx_trimmer.xml" /> + <tool file="fastx_toolkit/fastx_renamer.xml" /> <tool file="fastx_toolkit/fastx_reverse_complement.xml" /> + <tool file="fastx_toolkit/fasta_formatter.xml" /> + <tool file="fastx_toolkit/fasta_nucleotide_changer.xml" /> <tool file="fastx_toolkit/fastx_artifacts_filter.xml" /> <tool file="fastx_toolkit/fastq_quality_filter.xml" /> - <!-- <tool file="fastx_toolkit/fasta_collapser.xml" /> --> - <!-- <tool file="fastx_toolkit/fastx_barcode_splitter.xml" /> --> + <tool file="fastx_toolkit/fastx_collapser.xml" /> + <!--<tool 
file="fastx_toolkit/fastx_barcode_splitter.xml" />--> </section> - <section name="Short Read QC and Manipulation" id="short_read_analysis"> <tool file="metag_tools/short_reads_figure_score.xml" /> <tool file="metag_tools/short_reads_figure_high_quality_length.xml" /> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fasta_clipping_histogram.xml --- a/tools/fastx_toolkit/fasta_clipping_histogram.xml Mon Sep 14 15:27:55 2009 -0400 +++ b/tools/fastx_toolkit/fasta_clipping_histogram.xml Mon Sep 14 17:03:17 2009 -0400 @@ -13,7 +13,7 @@ **What it does** -This tool creates a histogram image of sequence lengths distribution in a given fasta data set file. +This tool creates a histogram image of sequence lengths distribution in a given fasta dataset file. **TIP:** Use this tool after clipping your library (with **FASTX Clipper tool**), to visualize the clipping results. @@ -21,17 +21,82 @@ **Output Examples** - In the following library, most sequences are 24-mers to 27-mers. This could indicate an abundance of endo-siRNAs (depending of course of what you've tried to sequence in the first place). -.. image:: ../static/fastx_icons/fasta_clipping_histogram_1.png +.. image:: ./static/fastx_icons/fasta_clipping_histogram_1.png In the following library, most sequences are 19,22 or 23-mers. This could indicate an abundance of miRNAs (depending of course of what you've tried to sequence in the first place). -.. image:: ../static/fastx_icons/fasta_clipping_histogram_2.png +.. image:: ./static/fastx_icons/fasta_clipping_histogram_2.png + + +----- + + +**Input Formats** + +This tool accepts short-reads FASTA files. 
The reads don't have to be short, but they do have to be on a single line, like so:: + + >sequence1 + AGTAGTAGGTGATGTAGAGAGAGAGAGAGTAG + >sequence2 + GTGTGTGTGGGAAGTTGACACAGTA + >sequence3 + CCTTGAGATTAACGCTAATCAAGTAAAC + + +If the sequences span over multiple lines:: + + >sequence1 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG + TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG + aactggtctttacctTTAAGTTG + +Use the **FASTA Width Formatter** tool to re-format the FASTA into a single-lined sequences:: + + >sequence1 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG + + +----- + + + +**Multiplicity counts (a.k.a reads-count)** + +If the sequence identifier (the text after the '>') contains a dash and a number, it is treated as a multiplicity count value (i.e. how many times that individual sequence repeated in the original FASTA file, before collapsing). + +Example 1 - The following FASTA file *does not* have multiplicity counts:: + + >seq1 + GGATCC + >seq2 + GGTCATGGGTTTAAA + >seq3 + GGGATATATCCCCACACACACACAC + +Each sequence is counts as one, to produce the following chart: + +.. image:: ./static/fastx_icons/fasta_clipping_histogram_3.png + + +Example 2 - The following FASTA file have multiplicity counts:: + + >seq1-2 + GGATCC + >seq2-10 + GGTCATGGGTTTAAA + >seq3-3 + GGGATATATCCCCACACACACACAC + +The first sequence counts as 2, the second as 10, the third as 3, to produce the following chart: + +.. image:: ./static/fastx_icons/fasta_clipping_histogram_4.png + +Use the **FASTA Collapser** tool to create FASTA files with multiplicity counts. 
</help> </tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fasta_collapser.xml --- a/tools/fastx_toolkit/fasta_collapser.xml Mon Sep 14 15:27:55 2009 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,75 +0,0 @@ -<tool id="cshl_fasta_collapser" name="Collapse"> - <description>sequences</description> - <command>fasta_collapser.pl $input $output</command> - - <inputs> - <param format="fasta" name="input" type="data" label="Library to collapse" /> - </inputs> - - <tests> - <test> - <param name="input" value="fasta_collapser1.fasta" /> - <output name="output" file="fasta_collapser1.out" /> - </test> - </tests> - - <outputs> - <data format="fasta" name="output" metadata_source="input" /> - </outputs> - <help> - -**What it does** - -This tool collapses identical sequences in a FASTA file into a single sequence. - --------- - -**Example** - -Example Input File (Sequence "ATAT" appears multiple times):: - - >CSHL_2_FC0042AGLLOO_1_1_605_414 - TGCG - >CSHL_2_FC0042AGLLOO_1_1_537_759 - ATAT - >CSHL_2_FC0042AGLLOO_1_1_774_520 - TGGC - >CSHL_2_FC0042AGLLOO_1_1_742_502 - ATAT - >CSHL_2_FC0042AGLLOO_1_1_781_514 - TGAG - >CSHL_2_FC0042AGLLOO_1_1_757_487 - TTCA - >CSHL_2_FC0042AGLLOO_1_1_903_769 - ATAT - >CSHL_2_FC0042AGLLOO_1_1_724_499 - ATAT - -Example Output file:: - - >1-1 - TGCG - >2-4 - ATAT - >3-1 - TGGC - >4-1 - TGAG - >5-1 - TTCA - -.. class:: infomark - -Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded. - -The output seqeunce name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value. - -The following output:: - - >2-4 - ATAT - -means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file. 
- -</help> -</tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fasta_formatter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fasta_formatter.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,79 @@ +<tool id="cshl_fasta_formatter" name="FASTA Width"> + <description>formatter</description> + <!-- + Note: + fasta_formatter also has a tabular output mode (-t), + but Galaxy already contains such a tool, so no need + to offer the user a duplicated tool. + + So this XML tool only changes the width (line-wrapping) of a + FASTA file. + --> + <command>zcat -f '$input' | fasta_formatter -w $width -o $output</command> + <inputs> + <param format="fasta" name="input" type="data" label="Library to re-format" /> + + <param name="width" type="integer" value="0" label="New width for nucleotides strings" help="Use 0 for single line outout." /> + </inputs> + + <tests> + <test> + <!-- Re-format a FASTA file into a single line --> + <param name="input" value="fasta_formatter1.fasta" /> + <param name="width" value="0" /> + <output name="output" file="fastx_formatter1.out" /> + </test> + <test> + <!-- Re-format a FASTA file into multiple lines wrapping at 60 charactes --> + <param name="input" value="fasta_formatter1.fasta" /> + <param name="width" value="60" /> + <output name="output" file="fasta_formatter2.out" /> + </test> + </tests> + + <outputs> + <data format="input" name="output" metadata_source="input" /> + </outputs> + +<help> +**What it does** + +This tool re-formats a FASTA file, changing the width of the nucleotides lines. + +**TIP:** Outputting a single line (with **width = 0**) can be useful for scripting (with **grep**, **awk**, and **perl**). Every odd line is a sequence identifier, and every even line is a nucleotides line. 
+ +-------- + +**Example** + +Input FASTA file (each nucleotides line is 50 characters long):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTC + CCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTG + TTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACA + ATTAAAGTCAATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAG + TCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAG + aactggtctttacctTTAAGTTG + + +Output FASTA file (with width=80):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCAGGGACCTACCTGTTTTTGTT + ATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACAATTAAAGTCA + ATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTAC + GTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG + +Output FASTA file (with width=0 => single line):: + + >Scaffold3648 + AGGAATGATGACTACAATGATCAACTTAACCTATCTATTTAATTTAGTTCCCTAATGTCAGGGACCTACCTGTTTTTGTTATGTTTGGGTTTTGTTGTTGTTGTTTTTTTAATCTGAAGGTATTGTGCATTATATGACCTGTAATACACAATTAAAGTCAATTTTAATGAACATGTAGTAAAAACT + >Scaffold9299 + CAGCATCTACATAATATGATCGCTATTAAACTTAAATCTCCTTGACGGAGTCTTCGGTCATAACACAAACCCAGACCTACGTATATGACAAAGCTAATAGaactggtctttacctTTAAGTTG +</help> +</tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fasta_nucleotide_changer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fasta_nucleotide_changer.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,65 @@ +<tool id="cshl_fasta_nucleotides_changer" name="RNA/DNA" > + <description>converter</description> + <command>zcat -f '$input' | fasta_nucleotide_changer $mode -v -o $output</command> + <inputs> + <param format="fasta" name="input" type="data" label="Library to convert" /> + + <param name="mode" type="select" label="Convert"> + <option value="-d">RNA to DNA (U to T)</option> + <option value="-r">DNA to RNA (T to U)</option> + </param> + </inputs> + + <tests> + <test> + <!-- 
DNA-to-RNA --> + <param name="input" value="fasta_nuc_changer1.fasta" /> + <param name="mode" value="-r" /> + <output name="output" file="fasta_nuc_change1.out" /> + </test> + <test> + <!-- RNA-to-DNA --> + <param name="input" value="fasta_nuc_changer2.fasta" /> + <param name="mode" value="-d" /> + <output name="output" file="fasta_nuc_change2.out" /> + </test> + </tests> + + + <outputs> + <data format="input" name="output" metadata_source="input" /> + </outputs> + +<help> +**What it does** + +This tool converts RNA FASTA files to DNA (and vice-versa). + +In **RNA-to-DNA** mode, U's are changed into T's. + +In **DNA-to-RNA** mode, T's are changed into U's. + +-------- + +**Example** + +Input RNA FASTA file ( from Sanger's mirBase ):: + + >cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 + UGAGGUAGUAGGUUGUAUAGUU + >cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 + UCCCUGAGACCUCAAGUGUGA + >cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 + UGGAAUGUAAAGAAGUAUGUA + +Output DNA FASTA file (with RNA-to-DNA mode):: + + >cel-let-7 MIMAT0000001 Caenorhabditis elegans let-7 + TGAGGTAGTAGGTTGTATAGTT + >cel-lin-4 MIMAT0000002 Caenorhabditis elegans lin-4 + TCCCTGAGACCTCAAGTGTGA + >cel-miR-1 MIMAT0000003 Caenorhabditis elegans miR-1 + TGGAATGTAAAGAAGTATGTA + +</help> +</tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastq_nucleotides_distribution.xml --- a/tools/fastx_toolkit/fastq_nucleotides_distribution.xml Mon Sep 14 15:27:55 2009 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,66 +0,0 @@ -<tool id="cshl_fastq_nucleotides_distribution" name="Nucleotides Distribution"> - <description>chart</description> - <command>fastq_nucleotide_distribution_graph.sh -t '$input.name' -i $input -o $output</command> - - <inputs> - <param format="txt" name="input" type="data" label="Statistics Text File (output of 'FASTQ Statistics' tool)" /> - </inputs> - - <outputs> - <data format="png" name="output" metadata_source="input" /> - </outputs> -<help> - 
-**What it does** - -Creates a stacked-histogram graph for the nucleotide distribution in the Solexa library. - -.. class:: infomark - -**TIP:** Use the **FASTQ Statistics** tool to generate the report file needed for this tool. - ------ - -**Output Examples** - - - -The following chart clearly shows the barcode used at the 5'-end of the library: **GATCT** - -.. image:: ../static/fastx_icons/fastq_nucleotides_distribution_1.png - - - - - - - -In the following chart, one can almost 'read' the most abundant sequence by looking at the dominant values: **TGATA TCGTA TTGAT GACTG AA...** - -.. image:: ../static/fastx_icons/fastq_nucleotides_distribution_2.png - - - - - - - - -The following chart shows a growing number of unknown (N) nucleotides towards later cycles (which might indicate a sequencing problem): - -.. image:: ../static/fastx_icons/fastq_nucleotides_distribution_3.png - - - - - - - - -But most of the time, the chart will look rather random: - -.. image:: ../static/fastx_icons/fastq_nucleotides_distribution_4.png - -</help> -</tool> -<!-- FASTQ-Nucleotides-Distribution is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastq_qual_conv.xml --- a/tools/fastx_toolkit/fastq_qual_conv.xml Mon Sep 14 15:27:55 2009 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,82 +0,0 @@ -<tool id="cshl_fastq_qual_conv" name="Quality format converter"> - <description>(ASCII-Numeric)</description> - <command>zcat -f $input | fastq_quality_converter $QUAL_FORMAT -o $output</command> - <inputs> - <param format="fastqsolexa" name="input" type="data" label="Library to convert" /> - - <param name="QUAL_FORMAT" type="select" label="Desired output format"> - <option value="-a">ASCII (letters) quality scores</option> - <option value="-n">Numeric quality scores</option> - </param> - </inputs> - - <tests> - <test> - <!-- ASCII to NUMERIC --> - <param name="input" value="fastq_qual_conv1.fastq" /> - <param 
name="QUAL_FORMAT" value="Numeric quality scores" /> - <output name="output" file="fastq_qual_conv1.out" /> - </test> - <test> - <!-- ASCII to ASCII (basically, a no-op, but it should still produce a valid output --> - <param name="input" value="fastq_qual_conv1.fastq" /> - <param name="QUAL_FORMAT" value="ASCII (letters) quality scores" /> - <output name="output" file="fastq_qual_conv1a.out" /> - </test> - <test> - <!-- NUMERIC to ASCII --> - <param name="input" value="fastq_qual_conv2.fastq" /> - <param name="QUAL_FORMAT" value="ASCII (letters) quality scores" /> - <output name="output" file="fastq_qual_conv2.out" /> - </test> - <test> - <!-- NUMERIC to NUMERIC (basically, a no-op, but it should still produce a valid output --> - <param name="input" value="fastq_qual_conv2.fastq" /> - <param name="QUAL_FORMAT" value="Numeric quality scores" /> - <output name="output" file="fastq_qual_conv2n.out" /> - </test> - </tests> - - <outputs> - <data format="fastqsolexa" name="output" metadata_source="input" /> - </outputs> -<help> - -**What it does** - -Converts a solexa FASTQ file to/from numeric or ASCII quality format. - -.. class:: warningmark - -Re-scaling is **not** performed. (e.g. conversion from Phred scale to Solexa scale). 
- - ------ - -FASTQ with Numeric quality scores:: - - @CSHL__2_FC042AGWWWXX:8:1:120:202 - ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC - +CSHL__2_FC042AGWWWXX:8:1:120:202 - 40 40 40 40 20 40 40 40 40 6 40 40 28 40 40 25 40 20 40 -1 30 40 14 27 40 8 1 3 7 -1 11 10 -1 21 10 8 - @CSHL__2_FC042AGWWWXX:8:1:103:1185 - ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC - +CSHL__2_FC042AGWWWXX:8:1:103:1185 - 40 40 40 40 40 35 33 31 40 40 40 32 30 22 40 -0 9 22 17 14 8 36 15 34 22 12 23 3 10 -0 8 2 4 25 30 2 - - -FASTQ with ASCII quality scores:: - - @CSHL__2_FC042AGWWWXX:8:1:120:202 - ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC - +CSHL__2_FC042AGWWWXX:8:1:120:202 - hhhhThhhhFhh\hhYhTh?^hN[hHACG?KJ?UJH - @CSHL__2_FC042AGWWWXX:8:1:103:1185 - ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC - +CSHL__2_FC042AGWWWXX:8:1:103:1185 - hhhhhca_hhh`^Vh@IVQNHdObVLWCJ@HBDY^B - - -</help> -</tool> -<!-- FASTQ-Quality-Converter is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastq_qual_stat.xml --- a/tools/fastx_toolkit/fastq_qual_stat.xml Mon Sep 14 15:27:55 2009 -0400 +++ /dev/null Thu Jan 01 00:00:00 1970 +0000 @@ -1,100 +0,0 @@ -<tool id="cshl_fastq_qual_stat" name="Quality Statistics"> - <description></description> - <command>zcat -f $input | fastq_quality_stats -o $output</command> - - <inputs> - <param format="fastqsolexa" name="input" type="data" label="Library to analyse" /> - </inputs> - - <tests> - <test> - <param name="input" value="fastq_stats1.fastq" /> - <output name="output" file="fastq_stats1.out" /> - </test> - </tests> - - <outputs> - <data format="txt" name="output" metadata_source="input" /> - </outputs> - -<help> - -**What it does** - -Creates quality statistics report for the given Solexa/FASTQ library. - -.. class:: infomark - -**TIP:** This statistics report can be used as input for **Quality Score** and **Nucleotides Distribution** tools. 
- ------ - -**The output file will contain the following fields:** - -* column = column number (1 to 36 for a 36-cycles read solexa file) -* count = number of bases found in this column. -* min = Lowest quality score value found in this column. -* max = Highest quality score value found in this column. -* sum = Sum of quality score values for this column. -* mean = Mean quality score value for this column. -* Q1 = 1st quartile quality score. -* med = Median quality score. -* Q3 = 3rd quartile quality score. -* IQR = Inter-Quartile range (Q3-Q1). -* lW = 'Left-Whisker' value (for boxplotting). -* rW = 'Right-Whisker' value (for boxplotting). -* A_Count = Count of 'A' nucleotides found in this column. -* C_Count = Count of 'C' nucleotides found in this column. -* G_Count = Count of 'G' nucleotides found in this column. -* T_Count = Count of 'T' nucleotides found in this column. -* N_Count = Count of 'N' nucleotides found in this column. - - - - - - -**Output Example**:: - - column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count - 1 6362991 -4 40 250734117 39.41 40 40 40 0 40 40 1396976 1329101 678730 2958184 0 - 2 6362991 -5 40 250531036 39.37 40 40 40 0 40 40 1786786 1055766 1738025 1782414 0 - 3 6362991 -5 40 248722469 39.09 40 40 40 0 40 40 2296384 984875 1443989 1637743 0 - 4 6362991 -5 40 247654797 38.92 40 40 40 0 40 40 1683197 1410855 1722633 1546306 0 - 5 6362991 -4 40 248214827 39.01 40 40 40 0 40 40 2536861 1167423 1248968 1409739 0 - 6 6362991 -5 40 248499903 39.05 40 40 40 0 40 40 1598956 1236081 1568608 1959346 0 - 7 6362991 -4 40 247719760 38.93 40 40 40 0 40 40 1692667 1822140 1496741 1351443 0 - 8 6362991 -5 40 245745205 38.62 40 40 40 0 40 40 2230936 1343260 1529928 1258867 0 - 9 6362991 -5 40 245766735 38.62 40 40 40 0 40 40 1702064 1306257 1336511 2018159 0 - 10 6362991 -5 40 245089706 38.52 40 40 40 0 40 40 1519917 1446370 1450995 1945709 0 - 11 6362991 -5 40 242641359 38.13 40 40 40 0 40 40 1717434 1282975 
1387804 1974778 0 - 12 6362991 -5 40 242026113 38.04 40 40 40 0 40 40 1662872 1202041 1519721 1978357 0 - 13 6362991 -5 40 238704245 37.51 40 40 40 0 40 40 1549965 1271411 1973291 1566681 1643 - 14 6362991 -5 40 235622401 37.03 40 40 40 0 40 40 2101301 1141451 1603990 1515774 475 - 15 6362991 -5 40 230766669 36.27 40 40 40 0 40 40 2344003 1058571 1440466 1519865 86 - 16 6362991 -5 40 224466237 35.28 38 40 40 2 35 40 2203515 1026017 1474060 1651582 7817 - 17 6362991 -5 40 219990002 34.57 34 40 40 6 25 40 1522515 1125455 2159183 1555765 73 - 18 6362991 -5 40 214104778 33.65 30 40 40 10 15 40 1479795 2068113 1558400 1249337 7346 - 19 6362991 -5 40 212934712 33.46 30 40 40 10 15 40 1432749 1231352 1769799 1920093 8998 - 20 6362991 -5 40 212787944 33.44 29 40 40 11 13 40 1311657 1411663 2126316 1513282 73 - 21 6362991 -5 40 211369187 33.22 28 40 40 12 10 40 1887985 1846300 1300326 1318380 10000 - 22 6362991 -5 40 213371720 33.53 30 40 40 10 15 40 542299 3446249 516615 1848190 9638 - 23 6362991 -5 40 221975899 34.89 36 40 40 4 30 40 347679 1233267 926621 3855355 69 - 24 6362991 -5 40 194378421 30.55 21 40 40 19 -5 40 433560 674358 3262764 1992242 67 - 25 6362991 -5 40 199773985 31.40 23 40 40 17 -2 40 944760 325595 1322800 3769641 195 - 26 6362991 -5 40 179404759 28.20 17 34 40 23 -5 40 3457922 156013 1494664 1254293 99 - 27 6362991 -5 40 163386668 25.68 13 28 40 27 -5 40 1392177 281250 3867895 821491 178 - 28 6362991 -5 40 156230534 24.55 12 25 40 28 -5 40 907189 981249 4174945 299437 171 - 29 6362991 -5 40 163236046 25.65 13 28 40 27 -5 40 1097171 3418678 1567013 280008 121 - 30 6362991 -5 40 151309826 23.78 12 23 40 28 -5 40 3514775 2036194 566277 245613 132 - 31 6362991 -5 40 141392520 22.22 10 21 40 30 -5 40 1569000 4571357 124732 97721 181 - 32 6362991 -5 40 143436943 22.54 10 21 40 30 -5 40 1453607 4519441 38176 351107 660 - 33 6362991 -5 40 114269843 17.96 6 14 30 24 -5 40 3311001 2161254 155505 734297 934 - 34 6362991 -5 40 140638447 22.10 10 20 40 30 -5 40 
1501615 1637357 18113 3205237 669 - 35 6362991 -5 40 138910532 21.83 10 20 40 30 -5 40 1532519 3495057 23229 1311834 352 - 36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 4074444 1402980 63287 822035 245 - - -</help> -</tool> -<!-- FASTQ-Statistics is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastq_quality_converter.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastq_quality_converter.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,88 @@ +<tool id="cshl_fastq_quality_converter" name="Quality format converter"> + <description>(ASCII-Numeric)</description> + <command>zcat -f $input | fastq_quality_converter $QUAL_FORMAT -o $output -Q $offset</command> + <inputs> + <param format="fastqsolexa" name="input" type="data" label="Library to convert" /> + + <param name="QUAL_FORMAT" type="select" label="Desired output format"> + <option value="-a">ASCII (letters) quality scores</option> + <option value="-n">Numeric quality scores</option> + </param> + + <param name="offset" size="4" type="integer" value="33" label="FASTQ ASCII offset" /> + </inputs> + + <tests> + <test> + <!-- ASCII to NUMERIC --> + <param name="input" value="fastq_qual_conv1.fastq" /> + <param name="QUAL_FORMAT" value="Numeric quality scores" /> + <param name="offset" value="64" /> + <output name="output" file="fastq_qual_conv1.out" /> + </test> + <test> + <!-- ASCII to ASCII (basically, a no-op, but it should still produce a valid output --> + <param name="input" value="fastq_qual_conv1.fastq" /> + <param name="QUAL_FORMAT" value="ASCII (letters) quality scores" /> + <param name="offset" value="64" /> + <output name="output" file="fastq_qual_conv1a.out" /> + </test> + <test> + <!-- NUMERIC to ASCII --> + <param name="input" value="fastq_qual_conv2.fastq" /> + <param name="QUAL_FORMAT" value="ASCII (letters) quality scores" /> + <param name="offset" value="64" /> + <output name="output" 
file="fastq_qual_conv2.out" /> + </test> + <test> + <!-- NUMERIC to NUMERIC (basically, a no-op, but it should still produce a valid output --> + <param name="input" value="fastq_qual_conv2.fastq" /> + <param name="QUAL_FORMAT" value="Numeric quality scores" /> + <param name="offset" value="64" /> + <output name="output" file="fastq_qual_conv2n.out" /> + </test> + </tests> + + <outputs> + <data format="fastqsolexa" name="output" metadata_source="input" /> + </outputs> +<help> + +**What it does** + +Converts a solexa FASTQ file to/from numeric or ASCII quality format. + +.. class:: warningmark + +Re-scaling is **not** performed. (e.g. conversion from Phred scale to Solexa scale). + + +----- + +FASTQ with Numeric quality scores:: + + @CSHL__2_FC042AGWWWXX:8:1:120:202 + ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC + +CSHL__2_FC042AGWWWXX:8:1:120:202 + 40 40 40 40 20 40 40 40 40 6 40 40 28 40 40 25 40 20 40 -1 30 40 14 27 40 8 1 3 7 -1 11 10 -1 21 10 8 + @CSHL__2_FC042AGWWWXX:8:1:103:1185 + ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC + +CSHL__2_FC042AGWWWXX:8:1:103:1185 + 40 40 40 40 40 35 33 31 40 40 40 32 30 22 40 -0 9 22 17 14 8 36 15 34 22 12 23 3 10 -0 8 2 4 25 30 2 + + +FASTQ with ASCII quality scores:: + + @CSHL__2_FC042AGWWWXX:8:1:120:202 + ACGATAGATCGGAAGAGCTAGTATGCCGTTTTCTGC + +CSHL__2_FC042AGWWWXX:8:1:120:202 + hhhhThhhhFhh\hhYhTh?^hN[hHACG?KJ?UJH + @CSHL__2_FC042AGWWWXX:8:1:103:1185 + ATCACGATAGATCGGCAGAGCTCGTTTACCGTCTTC + +CSHL__2_FC042AGWWWXX:8:1:103:1185 + hhhhhca_hhh`^Vh@IVQNHdObVLWCJ@HBDY^B + + +</help> +</tool> +<!-- FASTQ-Quality-Converter is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_barcode_splitter.xml --- a/tools/fastx_toolkit/fastx_barcode_splitter.xml Mon Sep 14 15:27:55 2009 -0400 +++ b/tools/fastx_toolkit/fastx_barcode_splitter.xml Mon Sep 14 17:03:17 2009 -0400 @@ -1,6 +1,6 @@ <tool id="cshl_fastx_barcode_splitter" name="Barcode Splitter"> <description></description> - 
<command>fastx_barcode_splitter_galaxy_wrapper.sh $BARCODE $input "$input.name" --mismatches $mismatches --partial $partial $EOL > $output </command> + <command interpreter="sh">fastx_barcode_splitter_galaxy_wrapper.sh $BARCODE $input "$input.name" "$output.files_path" --mismatches $mismatches --partial $partial $EOL > $output </command> <inputs> <param format="txt" name="BARCODE" type="data" label="Barcodes to use" /> @@ -61,7 +61,7 @@ **Output Example** -.. image:: ../static/fastx_icons/barcode_splitter_output_example.png +.. image:: ./static/fastx_icons/barcode_splitter_output_example.png </help> </tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_barcode_splitter_galaxy_wrapper.sh --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastx_barcode_splitter_galaxy_wrapper.sh Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,80 @@ +#!/bin/sh + +# FASTX-toolkit - FASTA/FASTQ preprocessing tools. +# Copyright (C) 2009 A. Gordon (gordon@cshl.edu) +# +# This program is free software: you can redistribute it and/or modify +# it under the terms of the GNU Affero General Public License as +# published by the Free Software Foundation, either version 3 of the +# License, or (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU Affero General Public License for more details. +# +# You should have received a copy of the GNU Affero General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. + +# +#This is a shell script wrapper for 'fastx_barcode_splitter.pl' +# +# 1. Output files are saved at the dataset's files_path directory. +# +# 2. 'fastx_barcode_splitter.pl' outputs a textual table. 
+# This script turns it into pretty HTML with working URL +# (so lazy users can just click on the URLs and get their files) + +BARCODE_FILE="$1" +FASTQ_FILE="$2" +LIBNAME="$3" +OUTPUT_PATH="$4" +shift 4 +# The rest of the parameters are passed to the split program + +if [ "$OUTPUT_PATH" == "" ]; then + echo "Usage: $0 [BARCODE FILE] [FASTQ FILE] [LIBRARY_NAME] [OUTPUT_PATH]" >&2 + exit 1 +fi + +#Sanitize library name, make sure we can create a file with this name +LIBNAME=${LIBNAME//\.gz/} +LIBNAME=${LIBNAME//\.txt/} +LIBNAME=${LIBNAME//[^[:alnum:]]/_} + +if [ ! -r "$FASTQ_FILE" ]; then + echo "Error: Input file ($FASTQ_FILE) not found!" >&2 + exit 1 +fi +if [ ! -r "$BARCODE_FILE" ]; then + echo "Error: barcode file ($BARCODE_FILE) not found!" >&2 + exit 1 +fi +mkdir -p "$OUTPUT_PATH" +if [ ! -d "$OUTPUT_PATH" ]; then + echo "Error: failed to create output path '$OUTPUT_PATH'" >&2 + exit 1 +fi + +PUBLICURL="" +BASEPATH="$OUTPUT_PATH/" +#PREFIX="$BASEPATH"`date "+%Y-%m-%d_%H%M__"`"${LIBNAME}__" +PREFIX="$BASEPATH""${LIBNAME}__" +SUFFIX=".txt" + +RESULTS=`zcat -f "$FASTQ_FILE" | fastx_barcode_splitter.pl --bcfile "$BARCODE_FILE" --prefix "$PREFIX" --suffix "$SUFFIX" "$@"` +if [ $? 
!= 0 ]; then + echo "error" +fi + +# +# Convert the textual tab-separated table into simple HTML table, +# with the local path replaces with a valid URL +echo "<html><body><table border=1>" +echo "$RESULTS" | sed -r "s|$BASEPATH(.*)|<a href=\"\\1\">\\1</a>|" | sed ' +i<tr><td> +s|\t|</td><td>|g +a<\/td><\/tr> +' +echo "<p>" +echo "</table></body></html>" diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_clipper.xml --- a/tools/fastx_toolkit/fastx_clipper.xml Mon Sep 14 15:27:55 2009 -0400 +++ b/tools/fastx_toolkit/fastx_clipper.xml Mon Sep 14 17:03:17 2009 -0400 @@ -1,15 +1,11 @@ <tool id="cshl_fastx_clipper" name="Clip" version="1.0.1" > <description>adapter sequences</description> <command> - zcat -f $input | fastx_clipper -s $maxmismatches -l $minlength -a $clip_source.clip_sequence -d $keepdelta -o $output -v $KEEP_N $DISCARD_OPTIONS + zcat -f $input | fastx_clipper -l $minlength -a $clip_source.clip_sequence -d $keepdelta -o $output -v $KEEP_N $DISCARD_OPTIONS </command> <inputs> <param format="fasta,fastqsolexa" name="input" type="data" label="Library to clip" /> - - <param name="maxmismatches" size="4" type="integer" value="2"> - <label>Maximum number of mismatches allowed (when matching the adapter sequence)</label> - </param> <param name="minlength" size="4" type="integer" value="15"> <label>Minimum sequence length (after clipping, sequences shorter than this length will be discarded)</label> @@ -52,22 +48,23 @@ </param> </inputs> - + <!-- + #functional test with param value starting with - fails. 
<tests> <test> <!-- Clip a FASTQ file --> <param name="input" value="fastx_clipper1.fastq" /> <param name="maxmismatches" value="2" /> <param name="minlength" value="15" /> - <param name="clip_source.clip_source_list" value="user" /> - <param name="clip_source.clip_sequence" value="CAATTGGTTAATCCCCCTATATA" /> + <param name="clip_source_list" value="user" /> + <param name="clip_sequence" value="CAATTGGTTAATCCCCCTATATA" /> <param name="keepdelta" value="0" /> <param name="KEEP_N" value="-n" /> <param name="DISCARD_OPTIONS" value="-c" /> <output name="output" file="fastx_clipper1a.out" /> </test> </tests> - + --> <outputs> <data format="input" name="output" metadata_source="input" /> </outputs> @@ -82,7 +79,7 @@ **Clipping Illustration:** -.. image:: ../static/fastx_icons/fastx_clipper_illustration.png +.. image:: ./static/fastx_icons/fastx_clipper_illustration.png @@ -93,7 +90,7 @@ **Clipping Example:** -.. image:: ../static/fastx_icons/fastx_clipper_example.png +.. image:: ./static/fastx_icons/fastx_clipper_example.png diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_collapser.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastx_collapser.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,75 @@ +<tool id="cshl_fastx_collapser" name="Collapse"> + <description>sequences</description> + <command>zcat -f '$input' | fastx_collapser -v -o '$output' </command> + + <inputs> + <param format="fastqsolexa,fasta" name="input" type="data" label="Library to collapse" /> + </inputs> + + <tests> + <test> + <param name="input" value="fasta_collapser1.fasta" /> + <output name="output" file="fasta_collapser1.out" /> + </test> + </tests> + + <outputs> + <data format="fasta" name="output" metadata_source="input" /> + </outputs> + <help> + +**What it does** + +This tool collapses identical sequences in a FASTA file into a single sequence. 
+ +-------- + +**Example** + +Example Input File (Sequence "ATAT" appears multiple times):: + + >CSHL_2_FC0042AGLLOO_1_1_605_414 + TGCG + >CSHL_2_FC0042AGLLOO_1_1_537_759 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_774_520 + TGGC + >CSHL_2_FC0042AGLLOO_1_1_742_502 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_781_514 + TGAG + >CSHL_2_FC0042AGLLOO_1_1_757_487 + TTCA + >CSHL_2_FC0042AGLLOO_1_1_903_769 + ATAT + >CSHL_2_FC0042AGLLOO_1_1_724_499 + ATAT + +Example Output file:: + + >1-1 + TGCG + >2-4 + ATAT + >3-1 + TGGC + >4-1 + TGAG + >5-1 + TTCA + +.. class:: infomark + +Original Sequence Names / Lane descriptions (e.g. "CSHL_2_FC0042AGLLOO_1_1_742_502") are discarded. + +The output seqeunce name is composed of two numbers: the first is the sequence's number, the second is the multiplicity value. + +The following output:: + + >2-4 + ATAT + +means that the sequence "ATAT" is the second sequence in the file, and it appeared 4 times in the input FASTA file. + +</help> +</tool> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_nucleotides_distribution.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastx_nucleotides_distribution.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,66 @@ +<tool id="cshl_fastx_nucleotides_distribution" name="Nucleotides Distribution"> + <description>chart</description> + <command>fastx_nucleotide_distribution_graph.sh -t '$input.name' -i $input -o $output</command> + + <inputs> + <param format="txt" name="input" type="data" label="Statistics Text File (output of 'FASTX Statistics' tool)" /> + </inputs> + + <outputs> + <data format="png" name="output" metadata_source="input" /> + </outputs> +<help> + +**What it does** + +Creates a stacked-histogram graph for the nucleotide distribution in the Solexa library. + +.. class:: infomark + +**TIP:** Use the **FASTQ Statistics** tool to generate the report file needed for this tool. 
+ +----- + +**Output Examples** + + + +The following chart clearly shows the barcode used at the 5'-end of the library: **GATCT** + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_1.png + + + + + + + +In the following chart, one can almost 'read' the most abundant sequence by looking at the dominant values: **TGATA TCGTA TTGAT GACTG AA...** + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_2.png + + + + + + + + +The following chart shows a growing number of unknown (N) nucleotides towards later cycles (which might indicate a sequencing problem): + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_3.png + + + + + + + + +But most of the time, the chart will look rather random: + +.. image:: ./static/fastx_icons/fastq_nucleotides_distribution_4.png + +</help> +</tool> +<!-- FASTQ-Nucleotides-Distribution is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_quality_statistics.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastx_quality_statistics.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,102 @@ +<tool id="cshl_fastx_quality_statistics" name="Quality Statistics"> + <description></description> + <command>zcat -f $input | fastx_quality_stats -o $output -Q $offset</command> + + <inputs> + <param format="fasta,fastqsolexa" name="input" type="data" label="Library to analyse" /> + <param name="offset" size="4" type="integer" value="33" label="FASTQ ASCII offset" /> + </inputs> + + <tests> + <test> + <param name="input" value="fastq_stats1.fastq" /> + <param name="offset" value="64" /> + <output name="output" file="fastq_stats1.out" /> + </test> + </tests> + + <outputs> + <data format="txt" name="output" metadata_source="input" /> + </outputs> + +<help> + +**What it does** + +Creates quality statistics report for the given Solexa/FASTQ library. + +.. 
class:: infomark + +**TIP:** This statistics report can be used as input for the **Quality Score** and **Nucleotides Distribution** tools. + +----- + +**The output file will contain the following fields:** + +* column = column number (1 to 36 for a 36-cycle Solexa read file) +* count = number of bases found in this column. +* min = Lowest quality score value found in this column. +* max = Highest quality score value found in this column. +* sum = Sum of quality score values for this column. +* mean = Mean quality score value for this column. +* Q1 = 1st quartile quality score. +* med = Median quality score. +* Q3 = 3rd quartile quality score. +* IQR = Inter-Quartile range (Q3-Q1). +* lW = 'Left-Whisker' value (for boxplotting). +* rW = 'Right-Whisker' value (for boxplotting). +* A_Count = Count of 'A' nucleotides found in this column. +* C_Count = Count of 'C' nucleotides found in this column. +* G_Count = Count of 'G' nucleotides found in this column. +* T_Count = Count of 'T' nucleotides found in this column. +* N_Count = Count of 'N' nucleotides found in this column. 
+ + + + + + +**Output Example**:: + + column count min max sum mean Q1 med Q3 IQR lW rW A_Count C_Count G_Count T_Count N_Count + 1 6362991 -4 40 250734117 39.41 40 40 40 0 40 40 1396976 1329101 678730 2958184 0 + 2 6362991 -5 40 250531036 39.37 40 40 40 0 40 40 1786786 1055766 1738025 1782414 0 + 3 6362991 -5 40 248722469 39.09 40 40 40 0 40 40 2296384 984875 1443989 1637743 0 + 4 6362991 -5 40 247654797 38.92 40 40 40 0 40 40 1683197 1410855 1722633 1546306 0 + 5 6362991 -4 40 248214827 39.01 40 40 40 0 40 40 2536861 1167423 1248968 1409739 0 + 6 6362991 -5 40 248499903 39.05 40 40 40 0 40 40 1598956 1236081 1568608 1959346 0 + 7 6362991 -4 40 247719760 38.93 40 40 40 0 40 40 1692667 1822140 1496741 1351443 0 + 8 6362991 -5 40 245745205 38.62 40 40 40 0 40 40 2230936 1343260 1529928 1258867 0 + 9 6362991 -5 40 245766735 38.62 40 40 40 0 40 40 1702064 1306257 1336511 2018159 0 + 10 6362991 -5 40 245089706 38.52 40 40 40 0 40 40 1519917 1446370 1450995 1945709 0 + 11 6362991 -5 40 242641359 38.13 40 40 40 0 40 40 1717434 1282975 1387804 1974778 0 + 12 6362991 -5 40 242026113 38.04 40 40 40 0 40 40 1662872 1202041 1519721 1978357 0 + 13 6362991 -5 40 238704245 37.51 40 40 40 0 40 40 1549965 1271411 1973291 1566681 1643 + 14 6362991 -5 40 235622401 37.03 40 40 40 0 40 40 2101301 1141451 1603990 1515774 475 + 15 6362991 -5 40 230766669 36.27 40 40 40 0 40 40 2344003 1058571 1440466 1519865 86 + 16 6362991 -5 40 224466237 35.28 38 40 40 2 35 40 2203515 1026017 1474060 1651582 7817 + 17 6362991 -5 40 219990002 34.57 34 40 40 6 25 40 1522515 1125455 2159183 1555765 73 + 18 6362991 -5 40 214104778 33.65 30 40 40 10 15 40 1479795 2068113 1558400 1249337 7346 + 19 6362991 -5 40 212934712 33.46 30 40 40 10 15 40 1432749 1231352 1769799 1920093 8998 + 20 6362991 -5 40 212787944 33.44 29 40 40 11 13 40 1311657 1411663 2126316 1513282 73 + 21 6362991 -5 40 211369187 33.22 28 40 40 12 10 40 1887985 1846300 1300326 1318380 10000 + 22 6362991 -5 40 213371720 33.53 30 40 40 10 15 
40 542299 3446249 516615 1848190 9638 + 23 6362991 -5 40 221975899 34.89 36 40 40 4 30 40 347679 1233267 926621 3855355 69 + 24 6362991 -5 40 194378421 30.55 21 40 40 19 -5 40 433560 674358 3262764 1992242 67 + 25 6362991 -5 40 199773985 31.40 23 40 40 17 -2 40 944760 325595 1322800 3769641 195 + 26 6362991 -5 40 179404759 28.20 17 34 40 23 -5 40 3457922 156013 1494664 1254293 99 + 27 6362991 -5 40 163386668 25.68 13 28 40 27 -5 40 1392177 281250 3867895 821491 178 + 28 6362991 -5 40 156230534 24.55 12 25 40 28 -5 40 907189 981249 4174945 299437 171 + 29 6362991 -5 40 163236046 25.65 13 28 40 27 -5 40 1097171 3418678 1567013 280008 121 + 30 6362991 -5 40 151309826 23.78 12 23 40 28 -5 40 3514775 2036194 566277 245613 132 + 31 6362991 -5 40 141392520 22.22 10 21 40 30 -5 40 1569000 4571357 124732 97721 181 + 32 6362991 -5 40 143436943 22.54 10 21 40 30 -5 40 1453607 4519441 38176 351107 660 + 33 6362991 -5 40 114269843 17.96 6 14 30 24 -5 40 3311001 2161254 155505 734297 934 + 34 6362991 -5 40 140638447 22.10 10 20 40 30 -5 40 1501615 1637357 18113 3205237 669 + 35 6362991 -5 40 138910532 21.83 10 20 40 30 -5 40 1532519 3495057 23229 1311834 352 + 36 6362991 -5 40 117158566 18.41 7 15 30 23 -5 40 4074444 1402980 63287 822035 245 + + +</help> +</tool> +<!-- FASTQ-Statistics is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) --> diff -r 0f97b3048bc3 -r 40c5e1853a66 tools/fastx_toolkit/fastx_renamer.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/fastx_toolkit/fastx_renamer.xml Mon Sep 14 17:03:17 2009 -0400 @@ -0,0 +1,56 @@ +<tool id="cshl_fastx_renamer" name="Rename" version="0.0.11" > + <description>sequence identifiers</description> + <command>zcat -f $input | fastx_renamer -n $TYPE -o $output -v </command> + + <inputs> + <param format="fastqsolexa,fasta,fastqsanger" name="input" type="data" label="FASTQ/A Library to rename" /> + + <param name="TYPE" type="select" label="Rename sequence identifiers to"> + <option value="SEQ">Nucleotides 
sequence</option> + <option value="COUNT">Numeric Counter</option> + </param> + </inputs> + + <outputs> + <data format="input" name="output" metadata_source="input" /> + </outputs> + +<help> + +**What it does** + +This tool renames the sequence identifiers in a FASTQ/A file. + +.. class:: infomark + +Use this tool at the beginning of your workflow, as a way to keep the original sequence (before trimming, clipping, barcode-removal, etc.). + +-------- + +**Example** + +The following Solexa-FASTQ file:: + + @CSHL_4_FC042GAMMII_2_1_517_596 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +CSHL_4_FC042GAMMII_2_1_517_596 + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +Renamed to **nucleotides sequence**:: + + @GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + +Renamed to **numeric counter**:: + + @1 + GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT + +1 + 40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40 + + +</help> +</tool> +<!-- FASTQ-to-FASTA is part of the FASTX-toolkit, by A.Gordon (gordon@cshl.edu) -->
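
Reviewer note: the two renaming modes shown in the `fastx_renamer.xml` help text above can be sketched in a few lines of Python. This is only an illustration of the behaviour the help text documents, not the `fastx_renamer` implementation; the function name `rename_fastq` and its `mode` argument are hypothetical, and the sketch assumes strict four-line FASTQ records (no wrapped sequences).

```python
def rename_fastq(lines, mode="COUNT"):
    """Yield FASTQ lines with the identifiers replaced.

    mode="SEQ"   -> identifier becomes the read's own nucleotide sequence
    mode="COUNT" -> identifier becomes a running counter starting at 1
    Hypothetical helper; assumes each record is exactly four lines.
    """
    if mode not in ("SEQ", "COUNT"):
        raise ValueError("mode must be 'SEQ' or 'COUNT'")
    record = []
    count = 0
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            count += 1
            _old_id, seq, _old_plus, qual = record
            new_id = seq if mode == "SEQ" else str(count)
            yield "@" + new_id   # header line
            yield seq            # sequence is kept unchanged
            yield "+" + new_id   # the '+' line repeats the new identifier
            yield qual           # quality line is kept unchanged
            record = []


# The record used in the help text's example.
sample = [
    "@CSHL_4_FC042GAMMII_2_1_517_596",
    "GGTCAATGATGAGTTGGCACTGTAGGCACCATCAAT",
    "+CSHL_4_FC042GAMMII_2_1_517_596",
    "40 40 40 40 40 40 40 40 40 40 38 40 40 40 40 40 14 40 40 40 "
    "40 40 36 40 13 14 24 24 9 24 9 40 10 10 15 40",
]

for out_line in rename_fastq(sample, mode="SEQ"):
    print(out_line)
```

Calling `rename_fastq(sample, mode="COUNT")` instead yields `@1` and `+1` as the identifier lines, matching the "numeric counter" example in the help text.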