galaxy-dev

[hg] galaxy 1507: add SHRiMP mapper for short reads analysis.
by greg＠scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/842f1883cf53
changeset: 1507:842f1883cf53
user: wychung
date: Mon Sep 15 15:04:41 2008 -0400
description:
add SHRiMP mapper for short reads analysis.
6 file(s) affected in this change:
test-data/shrimp_phix_anc.fa
test-data/shrimp_wrapper_test1.fastq
test-data/shrimp_wrapper_test1.out1
tool_conf.xml.sample
tools/metag_tools/shrimp_wrapper.py
tools/metag_tools/shrimp_wrapper.xml
diffs (853 lines):
diff -r 26825f08d362 -r 842f1883cf53 test-data/shrimp_phix_anc.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/shrimp_phix_anc.fa Mon Sep 15 15:04:41 2008 -0400
@@ -0,0 +1,2 @@
+>PHIX174
+GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTTGATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAAATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTAGATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTTCGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTACGGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTACGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCaGAAGGAGTGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGCCCCTTACTT
GAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTACTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTTGGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACAACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGCTCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGCATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCgTGATGTTATTTCTTCATTTGGAGGTAAAACCTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTTGATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGCCGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTGTATGGCAACTTGCCGCCG
CGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGATTATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAACGCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGCTTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTATATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTGTCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTGAATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGCCGGGCAATAAtGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTGCTATTGCTGGCGGTATTGCTTCTGCTC
TTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCTGGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTGGTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTATCTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGTTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGACCAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTATGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTTCTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGATACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCGTCAGGATTGACACCCTCCCAATTGTATGTTTTCATG
CCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATTGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGCATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATGTTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGAATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGGGACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCCCTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTaCTATTCAGCGTTTGATGAATGCAATGCGACAGGCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGCCGTCTTCATTTCCATGCGGTGCAtTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTCGTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCATCGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTT
GTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGCAGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACCTGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
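generate_sub_table() reads the reference FASTA (such as the PHIX174 record above) into a dictionary keyed by title. It skips blank and '#' lines and keeps the first record when a title repeats. Below is a minimal Python 3 sketch of that step, assuming the reference fits in memory. read_fasta is a hypothetical name, and the sketch omits the per-chromosome coverage dictionary that the wrapper also creates:

```python
import io


def read_fasta(handle):
    """Return {title: sequence} for a multi-record FASTA stream.

    Blank lines and '#' comment lines are skipped; when a title appears
    more than once, the first record wins (as in generate_sub_table()).
    """
    refseq = {}
    title, chunks = None, []

    def flush():
        # Store the finished record unless its title was already seen.
        if title is not None and chunks and title not in refseq:
            refseq[title] = "".join(chunks)

    for line in handle:
        line = line.rstrip()
        if not line or line.startswith("#"):
            continue
        if line.startswith(">"):
            flush()
            title, chunks = line[1:], []
        else:
            chunks.append(line)
    flush()  # the last record has no following '>' line
    return refseq


example = io.StringIO(">PHIX174\nGAGTTTTATC\nGCTTCCATGA\n>PHIX174\nAAAA\n")
print(read_fasta(example))  # {'PHIX174': 'GAGTTTTATCGCTTCCATGA'}
```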
diff -r 26825f08d362 -r 842f1883cf53 test-data/shrimp_wrapper_test1.fastq
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/shrimp_wrapper_test1.fastq Mon Sep 15 15:04:41 2008 -0400
@@ -0,0 +1,40 @@
+@HWI-EAS91_1_306UPAAXX:6:1:959:874
+GCGGGCTGCGACATAAAGCATACCGCCTGGGCGGCG
++HWI-EAS91_1_306UPAAXX:6:1:959:874
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:1630:1975
+GAAAGAAAATCAGCAACAGTGGCATCGATTTTACGG
++HWI-EAS91_1_306UPAAXX:6:1:1630:1975
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:770:994
+GCAGGCAGCGTGCTGCGAGTCTTTTCGAATGATAAG
++HWI-EAS91_1_306UPAAXX:6:1:770:994
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:1274:306
+GTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGC
++HWI-EAS91_1_306UPAAXX:6:1:1274:306
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh\h
+@HWI-EAS91_1_306UPAAXX:6:1:1339:209
+GTTTGGTCAGTTCCATCAACATCATAGCCAGATGCC
++HWI-EAS91_1_306UPAAXX:6:1:1339:209
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:203:1240
+GATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTA
++HWI-EAS91_1_306UPAAXX:6:1:203:1240
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:869:448
+GCTGGCCATCAGTTCGCGGATACCGGCGGCAAACAT
++HWI-EAS91_1_306UPAAXX:6:1:869:448
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhKhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:939:928
+GGAGGCCTCCAGCAATCTTGAACACTCATCCTTAAT
++HWI-EAS91_1_306UPAAXX:6:1:939:928
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:1756:1476
+GCGTAGAGGCTTTACTATTCAGCGTTTGATGAATGC
++HWI-EAS91_1_306UPAAXX:6:1:1756:1476
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+@HWI-EAS91_1_306UPAAXX:6:1:1528:181
+GGCTGGTCAGTATTTTACCAATGACCAAATCAAAGA
++HWI-EAS91_1_306UPAAXX:6:1:1528:181
+hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
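The quality lines above are FASTQ-Solexa ASCII strings. convert_fastqsolexa_to_fasta_qual() converts each character to an integer as ord(char) - 64, so 'h' becomes 40, 'K' becomes 11 and '\' becomes 28. If a line has one extra leading character, the offset is taken from that character instead. A line of integers is copied unchanged. Below is a Python 3 sketch of these rules, assuming one quality line per read. decode_solexa_quality is a hypothetical helper. It differs slightly from the wrapper, which only checks the first token: the sketch treats a line as numeric only when every token parses as an integer.

```python
def decode_solexa_quality(qual_line, read_length, default_offset=64):
    """Convert one FASTQ-Solexa quality line to a list of integer scores."""
    tokens = qual_line.split()
    try:
        # Whitespace-separated integers are used as-is.
        return [int(tok) for tok in tokens]
    except ValueError:
        pass  # not numeric: fall through to ASCII decoding

    if len(qual_line) == read_length + 1:
        # One extra leading character supplies the offset.
        offset, qual_line = ord(qual_line[0]), qual_line[1:]
    elif len(qual_line) == read_length:
        offset = default_offset
    else:
        raise ValueError("%d quality scores for %d bases"
                         % (len(qual_line), read_length))
    return [ord(ch) - offset for ch in qual_line]


print(decode_solexa_quality("hhhhK", 5))  # [40, 40, 40, 40, 11]
```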
diff -r 26825f08d362 -r 842f1883cf53 test-data/shrimp_wrapper_test1.out1
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/shrimp_wrapper_test1.out1 Mon Sep 15 15:04:41 2008 -0400
@@ -0,0 +1,7 @@
+#FORMAT: readname contigname strand contigstart contigend readstart readend readlength score editstring
+>HWI-EAS91_1_306UPAAXX:6:1:1528:181 PHIX174 + 3644 3679 1 36 36 3600 36
+>HWI-EAS91_1_306UPAAXX:6:1:1756:1476 PHIX174 + 4505 4540 1 36 36 3600 36
+>HWI-EAS91_1_306UPAAXX:6:1:203:1240 PHIX174 + 310 345 1 36 36 3600 36
+>HWI-EAS91_1_306UPAAXX:6:1:1274:306 PHIX174 + 933 968 1 36 36 3600 36
+>HWI-EAS91_1_306UPAAXX:6:1:939:928 PHIX174 - 4458 4493 1 36 36 3600 36
+>HWI-EAS91_1_306UPAAXX:6:1:1339:209 PHIX174 - 1732 1767 1 36 36 3600 36
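The last column of each SHRiMP output line is an edit string. generate_sub_table() interprets the symbols as follows:

- A number is a run of matching bases.
- A letter is a mismatched read base.
- '-' is a gap in the read.
- '(...)' holds bases inserted in the read.
- 'x' marks a crossover, which the wrapper treats as a read error and skips.

Below is a Python 3 sketch of a tokenizer based on that interpretation. It reflects the wrapper's reading, not a full description of SHRiMP's format. parse_editstring and reference_span are hypothetical names. Unknown symbols raise ValueError instead of printing a warning.

```python
import re

# A run of digits, a parenthesised insertion, or any other single character.
_TOKEN = re.compile(r"\d+|\([^)]*\)|.")


def parse_editstring(edit):
    """Yield (op, value) pairs for a SHRiMP edit string."""
    for tok in _TOKEN.findall(edit):
        if tok.isdigit():
            yield ("match", int(tok))
        elif len(tok) > 1 and tok[0] == "(":
            yield ("insertion", tok[1:-1])   # bases present only in the read
        elif tok == "x":
            yield ("crossover", None)        # ignored by the wrapper
        elif tok == "-":
            yield ("read_gap", 1)            # reference base missing in read
        elif tok.isalpha():
            yield ("mismatch", tok)          # read base differs from reference
        else:
            raise ValueError("unknown edit symbol %r in %r" % (tok, edit))


def reference_span(edit):
    """Number of reference bases covered by the alignment."""
    span = 0
    for op, value in parse_editstring(edit):
        if op == "match":
            span += value
        elif op in ("mismatch", "read_gap"):
            span += 1
    return span


print(reference_span("3G16G13"))  # 34
```

reference_span() should equal contigend - contigstart + 1. For the docstring sample 3G16G13 it returns 34, which matches 36586595 - 36586562 + 1. For each test line above, the edit string 36 spans the 36 reference bases between contigstart and contigend.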
diff -r 26825f08d362 -r 842f1883cf53 tool_conf.xml.sample
--- a/tool_conf.xml.sample Sun Sep 14 14:58:50 2008 -0400
+++ b/tool_conf.xml.sample Mon Sep 15 15:04:41 2008 -0400
@@ -276,6 +276,7 @@
<tool file="metag_tools/blat_coverage_report.xml" />
</section>
<section name="Short Read Mapping" id="solexa_tools">
+ <tool file="metag_tools/shrimp_wrapper.xml" />
<tool file="sr_mapping/lastz_wrapper.xml" />
<tool file="metag_tools/megablast_wrapper.xml" />
<tool file="metag_tools/megablast_xml_parser.xml" />
diff -r 26825f08d362 -r 842f1883cf53 tools/metag_tools/shrimp_wrapper.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/metag_tools/shrimp_wrapper.py Mon Sep 15 15:04:41 2008 -0400
@@ -0,0 +1,577 @@
+#! /usr/bin/python
+
+"""
+SHRiMP wrapper
+
+Inputs:
+ reference seq and reads
+
+Outputs:
+ table of 8 columns:
+ chrom ref_loc read_id read_loc ref_nuc read_nuc quality coverage
+ SHRiMP output
+
+Parameters:
+ -s Spaced Seed (default: 111111011111)
+ -n Seed Matches per Window (default: 2)
+ -t Seed Hit Taboo Length (default: 4)
+ -9 Seed Generation Taboo Length (default: 0)
+ -w Seed Window Length (default: 115.00%)
+ -o Maximum Hits per Read (default: 100)
+ -r Maximum Read Length (default: 1000)
+ -d Kmer Std. Deviation Limit (default: -1 [None])
+
+ -m S-W Match Value (default: 100)
+ -i S-W Mismatch Value (default: -150)
+ -g S-W Gap Open Penalty (Reference) (default: -400)
+ -q S-W Gap Open Penalty (Query) (default: -400)
+ -e S-W Gap Extend Penalty (Reference) (default: -70)
+ -f S-W Gap Extend Penalty (Query) (default: -70)
+ -h S-W Hit Threshold (default: 68.00%)
+
+Command:
+%rmapper -s spaced_seed -n seed_matches_per_window -t seed_hit_taboo_length -9 seed_generation_taboo_length -w seed_window_length -o max_hits_per_read -r max_read_length -d kmer -m sw_match_value -i sw_mismatch_value -g sw_gap_open_ref -q sw_gap_open_query -e sw_gap_ext_ref -f sw_gap_ext_query -h sw_hit_threshold <query> <target> > <output> 2> <log>
+
+SHRiMP output:
+>7:2:1147:982/1 chr3 + 36586562 36586595 2 35 36 2900 3G16G13
+>7:2:1147:982/1 chr3 + 95338194 95338225 4 35 36 2700 9T7C14
+>7:2:587:93/1 chr3 + 14913541 14913577 1 35 36 2960 19--16
+
+Testing:
+%python shrimp_wrapper.py single ~/Desktop/shrimp_wrapper/phix_anc.fa tmp tmp1 ~/Desktop/shrimp_wrapper/phix.10.solexa.fastq
+%python shrimp_wrapper.py paired ~/Desktop/shrimp_wrapper/eca_ref_chrMT.fa tmp tmp1 ~/Desktop/shrimp_wrapper/eca.5.solexa_1.fastq ~/Desktop/shrimp_wrapper/eca.5.solexa_2.fastq
+
+"""
+
+import os, sys, tempfile, os.path
+
+assert sys.version_info[:2] >= (2, 4)
+
+def stop_err( msg ):
+
+ sys.stderr.write( "%s\n" % msg )
+ sys.exit()
+
+def reverse_complement(s):
+
+ complement_dna = {"A":"T", "T":"A", "C":"G", "G":"C", "a":"t", "t":"a", "c":"g", "g":"c", "N":"N", "n":"n" , ".":".", "-":"-"}
+ reversed_s = []
+ for i in s:
+ reversed_s.append(complement_dna[i])
+ reversed_s.reverse()
+ return "".join(reversed_s)
+
+def generate_sub_table(result_file, ref_file, score_files, table_outfile, hit_per_read):
+
+ """
+ TODO: the cross-over error has not been addressed yet.
+ """
+
+ insertion_size = 600
+
+ all_score_file = score_files.split('&')
+
+    if len(all_score_file) != hit_per_read: stop_err('Unequal number of files!')
+
+ temp_table_name = tempfile.NamedTemporaryFile().name
+ temp_table = open(temp_table_name, 'w')
+
+ outfile = open(table_outfile,'w')
+
+    # reference may hold several FASTA records, not just one
+ refseq = {}
+ chrom_cov = {}
+ seq = ''
+
+ for i, line in enumerate(file(ref_file)):
+ line = line.rstrip()
+ if not line or line.startswith('#'): continue
+
+ if line.startswith('>'):
+ if seq:
+ if refseq.has_key(title):
+ pass
+ else:
+ refseq[title] = seq
+ chrom_cov[title] = {}
+ seq = ''
+ title = line[1:]
+ else:
+ seq += line
+ if seq:
+ if not refseq.has_key(title):
+ refseq[title] = seq
+ chrom_cov[title] = {}
+
+ # find hits : one end and/or the other
+ hits = {}
+ for i, line in enumerate(file(result_file)):
+ line = line.rstrip()
+ if not line or line.startswith('#'): continue
+
+ #FORMAT: readname contigname strand contigstart contigend readstart readend readlength score editstring
+ fields = line.split('\t')
+ readname = fields[0][1:]
+ chrom = fields[1]
+ strand = fields[2]
+ chrom_start = int(fields[3]) - 1
+ chrom_end = int(fields[4])
+ read_start = fields[5]
+ read_end = fields[6]
+ read_len = fields[7]
+ score = fields[8]
+ editstring = fields[9]
+
+ if hit_per_read == 1:
+ endindex = '1'
+ else:
+ readname, endindex = readname.split('/')
+
+ if hits.has_key(readname):
+ if hits[readname].has_key(endindex):
+ hits[readname][endindex].append([strand, editstring, chrom_start, chrom_end, read_start, chrom])
+ else:
+ hits[readname][endindex] = [[strand, editstring, chrom_start, chrom_end, read_start, chrom]]
+ else:
+ hits[readname] = {}
+ hits[readname][endindex] = [[strand, editstring, chrom_start, chrom_end, read_start, chrom]]
+
+ # find score : one end and the other end
+ hits_score = {}
+ readname = ''
+ score = ''
+ for num_score_file in range(len(all_score_file)):
+ score_file = all_score_file[num_score_file]
+ for i, line in enumerate(file(score_file)):
+ line = line.rstrip()
+ if not line or line.startswith('#'): continue
+
+ if line.startswith('>'):
+ if score:
+ if hits.has_key(readname):
+ if len(hits[readname]) == hit_per_read:
+ if hits_score.has_key(readname):
+ if hits_score[readname].has_key(endindex):
+ pass
+ else:
+ hits_score[readname][endindex] = score
+ else:
+ hits_score[readname] = {}
+ hits_score[readname][endindex] = score
+ score = ''
+ if hit_per_read == 1:
+ readname = line[1:]
+ endindex = '1'
+ else:
+ readname, endindex = line[1:].split('/')
+ else:
+ score = line
+ if score: # the last one
+ if hits.has_key(readname):
+ if len(hits[readname]) == hit_per_read:
+ if hits_score.has_key(readname):
+ if hits_score[readname].has_key(endindex):
+ pass
+ else:
+ hits_score[readname][endindex] = score
+ else:
+ hits_score[readname] = {}
+ hits_score[readname][endindex] = score
+
+    # call mutations for all mappings
+ for readkey in hits.keys():
+ if len(hits[readkey]) != hit_per_read: continue
+
+ matches = []
+ match_count = 0
+
+ if hit_per_read == 1:
+ matches = [ hits[readkey]['1'] ]
+ match_count = 1
+ else:
+ end1_data = hits[readkey]['1']
+ end2_data = hits[readkey]['2']
+
+ for i, end1_hit in enumerate(end1_data):
+ crin_strand = {'+': False, '-': False}
+ crin_insertSize = {'+': False, '-': False}
+
+ crin_strand[end1_hit[0]] = True
+ crin_insertSize[end1_hit[0]] = int(end1_hit[2])
+
+ for j, end2_hit in enumerate(end2_data):
+ crin_strand[end2_hit[0]] = True
+ crin_insertSize[end2_hit[0]] = int(end2_hit[2])
+
+ if end1_hit[-1] != end2_hit[-1] : continue
+
+ if crin_strand['+'] and crin_strand['-']:
+ if (crin_insertSize['-'] - crin_insertSize['+']) <= insertion_size:
+ matches.append([end1_hit, end2_hit])
+ match_count += 1
+
+ if match_count == 1:
+ for x, end_data in enumerate(matches[0]):
+
+ end_strand, end_editstring, end_chr_start, end_chr_end, end_read_start, end_chrom = end_data
+ end_read_start = int(end_read_start) - 1
+
+ if end_strand == '-':
+ refsegment = reverse_complement(refseq[end_chrom][end_chr_start:end_chr_end])
+ else:
+ refsegment = refseq[end_chrom][end_chr_start:end_chr_end]
+
+ match_len = 0
+ editindex = 0
+ gap_read = 0
+
+ while editindex < len(end_editstring):
+ editchr = end_editstring[editindex]
+ chrA = ''
+ chrB = ''
+ locIndex = []
+ if editchr.isdigit():
+ editcode = ''
+ while editchr.isdigit() and editindex < len(end_editstring):
+ editcode += editchr
+ editindex += 1
+ if editindex < len(end_editstring): editchr = end_editstring[editindex]
+ for baseIndex in range(int(editcode)):
+ chrA += refsegment[match_len+baseIndex]
+ chrB = chrA
+ match_len += int(editcode)
+ elif editchr == 'x':
+ # crossover: inserted between the appropriate two bases
+ # Two sequencing errors: 4x15x6 (25 matches with 2 crossovers)
+ # Treated as errors in the reads; Do nothing.
+ editindex += 1
+
+ elif editchr.isalpha():
+ editcode = editchr
+ editindex += 1
+ chrA = refsegment[match_len]
+ chrB = editcode
+ match_len += len(editcode)
+
+ elif editchr == '-':
+ editcode = editchr
+ editindex += 1
+ chrA = refsegment[match_len]
+ chrB = editcode
+ match_len += len(editcode)
+ gap_read += 1
+
+ elif editchr == '(':
+ editcode = ''
+ while editchr != ')' and editindex < len(end_editstring):
+ if editindex < len(end_editstring): editchr = end_editstring[editindex]
+ editcode += editchr
+ editindex += 1
+ editcode = editcode[1:-1]
+ chrA = '-'*len(editcode)
+ chrB = editcode
+
+ else:
+                        print 'Warning! Unknown symbols', editchr; editindex += 1  # skip it so the loop cannot stall
+
+ if end_strand == '-':
+ chrA = reverse_complement(chrA)
+ chrB = reverse_complement(chrB)
+
+ pos_line = ''
+ rev_line = ''
+
+ for mappingIndex in range(len(chrA)):
+ # reference
+ chrAx = chrA[mappingIndex]
+ # read
+ chrBx = chrB[mappingIndex]
+
+ if chrAx and chrBx and chrBx.upper() != 'N':
+ if end_strand == '+':
+ chrom_loc = end_chr_start+match_len-len(chrA)+mappingIndex
+ read_loc = end_read_start+match_len-len(chrA)+mappingIndex-gap_read
+ if chrAx == '-': chrom_loc -= 1
+
+ if chrBx == '-':
+ scoreBx = '-1'
+ else:
+ scoreBx = hits_score[readkey][str(x+1)].split()[read_loc]
+
+ # 1-based on chrom_loc and read_loc
+ pos_line = pos_line + '\t'.join([end_chrom, str(chrom_loc+1), readkey+'/'+str(x+1), str(read_loc+1), chrAx, chrBx, scoreBx]) + '\n'
+ else:
+ chrom_loc = end_chr_end-match_len+mappingIndex
+ read_loc = end_read_start+match_len-1-mappingIndex-gap_read
+ if chrAx == '-': chrom_loc -= 1
+
+ if chrBx == '-':
+ scoreBx = '-1'
+ else:
+ scoreBx = hits_score[readkey][str(x+1)].split()[read_loc]
+
+ # 1-based on chrom_loc and read_loc
+ rev_line = '\t'.join([end_chrom, str(chrom_loc+1), readkey+'/'+str(x+1), str(read_loc+1), chrAx, chrBx, scoreBx]) +'\n' + rev_line
+
+ if chrom_cov.has_key(end_chrom):
+ if chrom_cov[end_chrom].has_key(chrom_loc):
+ chrom_cov[end_chrom][chrom_loc] += 1
+ else:
+ chrom_cov[end_chrom][chrom_loc] = 1
+ else:
+ chrom_cov[end_chrom] = {}
+ chrom_cov[end_chrom][chrom_loc] = 1
+
+ if pos_line: temp_table.write('%s\n' %(pos_line.rstrip('\r\n')))
+ if rev_line: temp_table.write('%s\n' %(rev_line.rstrip('\r\n')))
+
+ temp_table.close()
+
+ # chrom-wide coverage
+ for i, line in enumerate(open(temp_table_name)):
+ line = line.rstrip()
+ if not line or line.startswith('#'): continue
+
+ fields = line.split()
+ chrom = fields[0]
+ eachBp = int(fields[1])
+ readname = fields[2]
+
+ if hit_per_read == 1:
+ fields[2] = readname.split('/')[0]
+
+ if chrom_cov[chrom].has_key(eachBp):
+ outfile.write('%s\t%d\n' %('\t'.join(fields), chrom_cov[chrom][eachBp]))
+ else:
+ outfile.write('%s\t%d\n' %('\t'.join(fields), 0))
+
+ outfile.close()
+
+ if os.path.exists(temp_table_name): os.remove(temp_table_name)
+
+ return True
+
+def convert_fastqsolexa_to_fasta_qual(infile_name, query_fasta, query_qual):
+
+ outfile_seq = open( query_fasta, 'w' )
+ outfile_score = open( query_qual, 'w' )
+
+ seq_title_startswith = ''
+ qual_title_startswith = ''
+
+ default_coding_value = 64
+ fastq_block_lines = 0
+
+ for i, line in enumerate( file( infile_name ) ):
+ line = line.rstrip()
+ if not line or line.startswith( '#' ): continue
+
+ fastq_block_lines = ( fastq_block_lines + 1 ) % 4
+ line_startswith = line[0:1]
+
+ if fastq_block_lines == 1:
+ # first line is @title_of_seq
+ if not seq_title_startswith:
+ seq_title_startswith = line_startswith
+
+ if line_startswith != seq_title_startswith:
+ outfile_seq.close()
+ outfile_score.close()
+ stop_err( 'Invalid fastqsolexa format at line %d: %s.' % ( i + 1, line ) )
+
+ read_title = line[1:]
+ outfile_seq.write( '>%s\n' % line[1:] )
+
+ elif fastq_block_lines == 2:
+ # second line is nucleotides
+ read_length = len( line )
+ outfile_seq.write( '%s\n' % line )
+
+ elif fastq_block_lines == 3:
+ # third line is +title_of_qualityscore ( might be skipped )
+ if not qual_title_startswith:
+ qual_title_startswith = line_startswith
+
+ if line_startswith != qual_title_startswith:
+ outfile_seq.close()
+ outfile_score.close()
+ stop_err( 'Invalid fastqsolexa format at line %d: %s.' % ( i + 1, line ) )
+
+ quality_title = line[1:]
+ if quality_title and read_title != quality_title:
+ outfile_seq.close()
+ outfile_score.close()
+                stop_err( 'Invalid fastqsolexa format at line %d: sequence title "%s" differs from score title "%s".' % ( i + 1, read_title, quality_title ) )
+
+ if not quality_title:
+ outfile_score.write( '>%s\n' % read_title )
+ else:
+ outfile_score.write( '>%s\n' % line[1:] )
+
+ else:
+ # fourth line is quality scores
+ qual = ''
+ fastq_integer = True
+ # peek: ascii or digits?
+ val = line.split()[0]
+ try:
+ check = int( val )
+ fastq_integer = True
+            except ValueError:
+ fastq_integer = False
+
+ if fastq_integer:
+ # digits
+ qual = line
+ else:
+ # ascii
+ quality_score_length = len( line )
+ if quality_score_length == read_length + 1:
+ # first char is qual_score_startswith
+ qual_score_startswith = ord( line[0:1] )
+ line = line[1:]
+ elif quality_score_length == read_length:
+ qual_score_startswith = default_coding_value
+ else:
+ stop_err( 'Invalid fastqsolexa format at line %d: the number of quality scores ( %d ) is not the same as bases ( %d ).' % ( i + 1, quality_score_length, read_length ) )
+
+ for j, char in enumerate( line ):
+ score = ord( char ) - qual_score_startswith # 64
+ qual = "%s%s " % ( qual, str( score ) )
+
+ outfile_score.write( '%s\n' % qual )
+
+ outfile_seq.close()
+ outfile_score.close()
+
+ return True
+
+def __main__():
+
+ # I/O
+ type_of_reads = sys.argv[1] # single or paired
+ input_target = sys.argv[2] # fasta
+ shrimp_outfile = sys.argv[3] # shrimp output
+ table_outfile = sys.argv[4] # table output
+
+ # SHRiMP parameters: total = 15
+ # TODO: put threshold on each of these parameters
+ if len(sys.argv) == 21 or len(sys.argv) == 22:
+ spaced_seed = sys.argv[5]
+ seed_matches_per_window = sys.argv[6]
+ seed_hit_taboo_length = sys.argv[7]
+ seed_generation_taboo_length = sys.argv[8]
+ seed_window_length = sys.argv[9]
+ max_hits_per_read = sys.argv[10]
+ max_read_length = sys.argv[11]
+ kmer = sys.argv[12]
+ sw_match_value = sys.argv[13]
+ sw_mismatch_value = sys.argv[14]
+ sw_gap_open_ref = sys.argv[15]
+ sw_gap_open_query = sys.argv[16]
+ sw_gap_ext_ref = sys.argv[17]
+ sw_gap_ext_query = sys.argv[18]
+ sw_hit_threshold = sys.argv[19]
+
+ # Single-end parameters
+ if type_of_reads == 'single':
+ input_query = sys.argv[20] # single-end
+ hit_per_read = 1
+ query_fasta = tempfile.NamedTemporaryFile().name
+ query_qual = tempfile.NamedTemporaryFile().name
+ else: # Paired-end parameters
+ input_query_end1 = sys.argv[20] # paired-end
+ input_query_end2 = sys.argv[21]
+ hit_per_read = 2
+ query_fasta_end1 = tempfile.NamedTemporaryFile().name
+ query_fasta_end2 = tempfile.NamedTemporaryFile().name
+ query_qual_end1 = tempfile.NamedTemporaryFile().name
+ query_qual_end2 = tempfile.NamedTemporaryFile().name
+ else:
+ spaced_seed = '111111011111'
+ seed_matches_per_window = '2'
+ seed_hit_taboo_length = '4'
+ seed_generation_taboo_length = '0'
+ seed_window_length = '115.0'
+ max_hits_per_read = '100'
+ max_read_length = '1000'
+ kmer = '-1'
+ sw_match_value = '100'
+ sw_mismatch_value = '-150'
+ sw_gap_open_ref = '-400'
+ sw_gap_open_query = '-400'
+ sw_gap_ext_ref = '-70'
+ sw_gap_ext_query = '-70'
+ sw_hit_threshold = '68.0'
+
+ # Single-end parameters
+ if type_of_reads == 'single':
+ input_query = sys.argv[5] # single-end
+ hit_per_read = 1
+ query_fasta = tempfile.NamedTemporaryFile().name
+ query_qual = tempfile.NamedTemporaryFile().name
+ else: # Paired-end parameters
+ input_query_end1 = sys.argv[5] # paired-end
+ input_query_end2 = sys.argv[6]
+ hit_per_read = 2
+ query_fasta_end1 = tempfile.NamedTemporaryFile().name
+ query_fasta_end2 = tempfile.NamedTemporaryFile().name
+ query_qual_end1 = tempfile.NamedTemporaryFile().name
+ query_qual_end2 = tempfile.NamedTemporaryFile().name
+
+
+ # temp file for shrimp log file
+ shrimp_log = tempfile.NamedTemporaryFile().name
+
+ # convert fastq to fasta and quality score files
+ if type_of_reads == 'single':
+ return_value = convert_fastqsolexa_to_fasta_qual(input_query, query_fasta, query_qual)
+ else:
+ return_value = convert_fastqsolexa_to_fasta_qual(input_query_end1, query_fasta_end1, query_qual_end1)
+ return_value = convert_fastqsolexa_to_fasta_qual(input_query_end2, query_fasta_end2, query_qual_end2)
+
+ # SHRiMP command
+ if type_of_reads == 'single':
+ command = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta, input_target, '>', shrimp_outfile, '2>', shrimp_log])
+
+ try:
+ os.system(command)
+ except Exception, e:
+ if os.path.exists(query_fasta): os.remove(query_fasta)
+ if os.path.exists(query_qual): os.remove(query_qual)
+ stop_err(str(e))
+
+ else:
+ command_end1 = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end1, input_target, '>', shrimp_outfile, '2>', shrimp_log])
+ command_end2 = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end2, input_target, '>>', shrimp_outfile, '2>>', shrimp_log])
+
+ try:
+ os.system(command_end1)
+ os.system(command_end2)
+ except Exception, e:
+ if os.path.exists(query_fasta_end1): os.remove(query_fasta_end1)
+ if os.path.exists(query_fasta_end2): os.remove(query_fasta_end2)
+ if os.path.exists(query_qual_end1): os.remove(query_qual_end1)
+ if os.path.exists(query_qual_end2): os.remove(query_qual_end2)
+ stop_err(str(e))
+
+ # convert to table
+ if type_of_reads == 'single':
+ return_value = generate_sub_table(shrimp_outfile, input_target, query_qual, table_outfile, hit_per_read)
+ else:
+ return_value = generate_sub_table(shrimp_outfile, input_target, query_qual_end1+'&'+query_qual_end2, table_outfile, hit_per_read)
+
+ # remove temp. files
+ if type_of_reads == 'single':
+ if os.path.exists(query_fasta): os.remove(query_fasta)
+ if os.path.exists(query_qual): os.remove(query_qual)
+ else:
+ if os.path.exists(query_fasta_end1): os.remove(query_fasta_end1)
+ if os.path.exists(query_fasta_end2): os.remove(query_fasta_end2)
+ if os.path.exists(query_qual_end1): os.remove(query_qual_end1)
+ if os.path.exists(query_qual_end2): os.remove(query_qual_end2)
+
+ if os.path.exists(shrimp_log): os.remove(shrimp_log)
+
+if __name__ == '__main__': __main__()
+
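os.system() reports a failed run through its return value and does not raise an exception. As a result, the try/except blocks around the rmapper-ls calls never detect a failed run. Below is a sketch of an alternative, assuming Python 3.5+ for subprocess.run. The helper names and the DEFAULT_PARAMS list are illustrative; the flags and default values are copied from the wrapper's __main__.

```python
import subprocess

# Flag order and defaults as used by the wrapper's default branch.
DEFAULT_PARAMS = [
    ("-s", "111111011111"), ("-n", "2"), ("-t", "4"), ("-9", "0"),
    ("-w", "115.0"), ("-o", "100"), ("-r", "1000"), ("-d", "-1"),
    ("-m", "100"), ("-i", "-150"), ("-g", "-400"), ("-q", "-400"),
    ("-e", "-70"), ("-f", "-70"), ("-h", "68.0"),
]


def build_rmapper_args(query_fasta, target_fasta, params=DEFAULT_PARAMS,
                       program="rmapper-ls"):
    """Return the argument list for one rmapper-ls run (no shell involved)."""
    args = [program]
    for flag, value in params:
        args.extend([flag, str(value)])
    args.extend([query_fasta, target_fasta])
    return args


def run_rmapper(args, out_path, log_path, append=False):
    """Send stdout/stderr to files; raise CalledProcessError on failure."""
    mode = "a" if append else "w"
    with open(out_path, mode) as out, open(log_path, mode) as log:
        subprocess.run(args, stdout=out, stderr=log, check=True)
```

For paired-end input, pass append=True on the second run. This matches the '>>' and '2>>' redirections the wrapper uses for the second end. Passing a list instead of a joined string also avoids shell-quoting problems with file names.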
diff -r 26825f08d362 -r 842f1883cf53 tools/metag_tools/shrimp_wrapper.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/metag_tools/shrimp_wrapper.xml Mon Sep 15 15:04:41 2008 -0400
@@ -0,0 +1,196 @@
+<tool id="shrimp_wrapper" name="SHRiMP" version="1.0.0">
+ <description>SHort Read Mapping Package</description>
+ <command interpreter="python">
+ #if ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $input_query
+ #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 ${type_of_reads.input1} ${type_of_reads.input2}
+ #elif ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="full"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold $input_query
+ #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="full"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold ${type_of_reads.input1} ${type_of_reads.input2}
+ #end if
+ </command>
+ <inputs>
+ <page>
+ <param name="input_target" type="data" format="fasta" label="Reference sequence" />
+ <conditional name="type_of_reads">
+ <param name="single_or_paired" type="select" label="Single- or Paired-ends">
+ <option value="single">Single-end</option>
+ <option value="paired">Paired-end</option>
+ </param>
+ <when value="single">
+ <param name="input_query" type="data" format="fastqsolexa" label="Sequence file" />
+ </when>
+ <when value="paired">
+ <param name="input1" type="data" format="fastqsolexa" label="One end" />
+ <param name="input2" type="data" format="fastqsolexa" label="The other end" />
+ </when>
+ </conditional>
+ <conditional name="param">
+ <param name="skip_or_full" type="select" label="SHRiMP parameter selection">
+ <option value="skip">Default setting</option>
+ <option value="full">Full list</option>
+ </param>
+ <when value="skip" />
+ <when value="full">
+ <param name="spaced_seed" type="text" size="30" value="111111011111" label="Spaced Seed" />
+ <param name="seed_matches_per_window" type="integer" size="5" value="2" label="Seed Matches per Window" />
+ <param name="seed_hit_taboo_length" type="integer" size="5" value="4" label="Seed Hit Taboo Length" />
+ <param name="seed_generation_taboo_length" type="integer" size="5" value="0" label="Seed Generation Taboo Length" />
+ <param name="seed_window_length" type="float" size="10" value="115.0" label="Seed Window Length" help="in percentage"/>
+ <param name="max_hits_per_read" type="integer" size="10" value="100" label="Maximum Hits per Read" />
+ <param name="max_read_length" type="integer" size="10" value="1000" label="Maximum Read Length" />
+ <param name="kmer" type="integer" size="10" value="-1" label="Kmer Std. Deviation Limit" help="-1 as None"/>
+ <param name="sw_match_value" type="integer" size="10" value="100" label="S-W Match Value" />
+ <param name="sw_mismatch_value" type="integer" size="10" value="-150" label="S-W Mismatch Value" />
+ <param name="sw_gap_open_ref" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Reference)" />
+ <param name="sw_gap_open_query" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Query)" />
+ <param name="sw_gap_ext_ref" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Reference)" />
+ <param name="sw_gap_ext_query" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Query)" />
+ <param name="sw_hit_threshold" type="float" size="10" value="68.0" label="S-W Hit Threshold" help="in percentage"/>
+ </when>
+ </conditional>
+ </page>
+ </inputs>
+ <outputs>
+ <data name="output1" format="tabular"/>
+ <data name="output2" format="tabular"/>
+ </outputs>
+ <requirements>
+ <requirement type="binary">SHRiMP_rmapper</requirement>
+ </requirements>
+ <tests>
+ <test>
+ <param name="single_or_paired" value="single" />
+ <param name="skip_or_full" value="skip" />
+ <param name="input_target" value="shrimp_phix_anc.fa" ftype="fasta" />
+ <param name="input_query" value="shrimp_wrapper_test1.fastq" ftype="fastqsolexa"/>
+ <output name="output1" file="shrimp_wrapper_test1.out1" />
+ </test>
+ <!--
+ <test>
+ <param name="input1" value="shrimp_wrapper_test2_end1.fastq" ftype="fastqsolexa" />
+ <param name="input2" value="shrimp_wrapper_test2_end2.fastq" ftype="fastqsolexa" />
+ <param name="single_or_paired" value="paired" />
+ <param name="skip_or_full" value="skip" />
+ <param name="input_target" value="shrimp_eca_chrMT.fa" ftype="fasta" />
+ <output name="output1" file="shrimp_wrapper_test2.out1" />
+ </test>
+ <test>
+ <param name="single_or_paired" value="single" />
+ <param name="skip_or_full" value="full" />
+ <param name="input_target" value="shrimp_phix_anc.fa" ftype="fasta" />
+ <param name="input_query" value="shrimp_wrapper_test1.fastq" ftype="fastqsolexa"/>
+ <param name="spaced_seed" value="111111011111" />
+ <param name="seed_matches_per_window" value="2" />
+ <param name="seed_hit_taboo_length" value="4" />
+ <param name="seed_generation_taboo_length" value="0" />
+ <param name="seed_window_length" value="115.0" />
+ <param name="max_hits_per_read" value="100" />
+ <param name="max_read_length" value="1000" />
+ <param name="kmer" value="-1" />
+ <param name="sw_match_value" value="100" />
+ <param name="sw_mismatch_value" value="-150" />
+ <param name="sw_gap_open_ref" value="-400" />
+ <param name="sw_gap_open_query" value="-400" />
+ <param name="sw_gap_ext_ref" value="-70" />
+ <param name="sw_gap_ext_query" value="-70" />
+ <param name="sw_hit_threshold" value="68.0" />
+ <output name="output1" file="shrimp_wrapper_test1.out1" />
+ </test>
+ <test>
+ <param name="single_or_paired" value="paired" />
+ <param name="skip_or_full" value="full" />
+ <param name="input_target" value="shrimp_eca_chrMT.fa" ftype="fasta" />
+ <param name="spaced_seed" value="111111011111" />
+ <param name="seed_matches_per_window" value="2" />
+ <param name="seed_hit_taboo_length" value="4" />
+ <param name="seed_generation_taboo_length" value="0" />
+ <param name="seed_window_length" value="115.0" />
+ <param name="max_hits_per_read" value="100" />
+ <param name="max_read_length" value="1000" />
+ <param name="kmer" value="-1" />
+ <param name="sw_match_value" value="100" />
+ <param name="sw_mismatch_value" value="-150" />
+ <param name="sw_gap_open_ref" value="-400" />
+ <param name="sw_gap_open_query" value="-400" />
+ <param name="sw_gap_ext_ref" value="-70" />
+ <param name="sw_gap_ext_query" value="-70" />
+ <param name="sw_hit_threshold" value="68.0" />
+ <param name="input1" value="shrimp_wrapper_test2_end1.fastq" ftype="fastqsolexa"/>
+ <param name="input2" value="shrimp_wrapper_test2_end2.fastq" ftype="fastqsolexa"/>
+ <output name="output1" file="shrimp_wrapper_test2.out1" />
+ </test>
+ -->
+ </tests>
+<help>
+
+.. class:: warningmark
+
+Only nucleotide sequences as query.
+
+-----
+
+**What it does**
+
+Run SHRiMP on letter-space reads.
+
+-----
+
+**Example**
+
+- Input a multiple-fastq file like the following::
+
+ @seq1
+ TACCCGATTTTTTGCTTTCCACTTTATCCTACCCTT
+ +seq2
+ hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
+
+- Use default settings (for detail explanations, please see **Parameters** section)
+
+- Search against your own uploaded file, result will be in the following format::
+
+ +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
+ | id | chrom | strand | t.start | t.end | q.start | q.end | length | score | editstring |
+ +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
+ | >seq1 | chrMT | + | 14712 | 14747 | 1 | 36 | 36 | 3350 | 24T11 |
+ +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
+
+- The result will be formatted Table::
+
+ +-------+---------+---------+----------+---------+----------+---------+----------+
+ | chrom | ref_loc | read_id | read_loc | ref_nuc | read_nuc | quality | coverage |
+ +-------+---------+---------+----------+---------+----------+---------+----------+
+ | chrMT | 14711 | seq1 | 0 | T | T | 40 | 1 |
+ | chrMT | 14712 | seq1 | 1 | A | A | 40 | 1 |
+ | chrMT | 14713 | seq1 | 2 | C | C | 40 | 1 |
+ +-------+---------+---------+----------+---------+----------+---------+----------+
+
+-----
+
+**Parameters**
+
+Parameter list with default value settings::
+
+ -s Spaced Seed (default: 111111011111)
+ -n Seed Matches per Window (default: 2)
+ -t Seed Hit Taboo Length (default: 4)
+ -9 Seed Generation Taboo Length (default: 0)
+ -w Seed Window Length (default: 115.00%)
+ -o Maximum Hits per Read (default: 100)
+ -r Maximum Read Length (default: 1000)
+ -d Kmer Std. Deviation Limit (default: -1 [None])
+
+ -m S-W Match Value (default: 100)
+ -i S-W Mismatch Value (default: -150)
+ -g S-W Gap Open Penalty (Reference) (default: -400)
+ -q S-W Gap Open Penalty (Query) (default: -400)
+ -e S-W Gap Extend Penalty (Reference) (default: -70)
+ -f S-W Gap Extend Penalty (Query) (default: -70)
+ -h S-W Hit Threshold (default: 68.00%)
+
+-----
+
+**Reference**
+
+ **SHRiMP**: Stephen M. Rumble, Michael Brudno, Phil Lacroute, Vladimir Yanovsky, Marc Fiume, Adrian Dalca. shrimp at cs dot toronto dot edu.
+
+</help>
+</tool>
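The SHRiMP help text in this diff shows one hit (1-based `t.start` 14712, `q.start` 1, edit string `24T11`) being expanded into one row per aligned base, starting at 0-based `ref_loc` 14711. Below is a minimal sketch of that expansion. It assumes the coordinates are 1-based, as in the example, and that the edit string contains only match-run lengths and single-letter substitutions. Indel notation is not handled and raises an error. The names `expand_hit` and `AlignedBase` are hypothetical and are not part of `shrimp_wrapper.py`. The sketch records the substitution letter without interpreting whether it denotes the read or the reference base.

```python
import re
from typing import List, NamedTuple


class AlignedBase(NamedTuple):
    ref_loc: int     # 0-based reference coordinate
    read_loc: int    # 0-based read coordinate
    is_match: bool   # False at a substitution
    edit_base: str   # letter taken from the edit string at a substitution, "" otherwise


# A token is either a run of matches ("24") or one substituted base ("T").
_TOKEN = re.compile(r"(\d+)|([ACGTN])")


def expand_hit(t_start: int, q_start: int, editstring: str) -> List[AlignedBase]:
    """Expand a gap-free hit into one row per aligned base.

    Assumes 1-based t_start/q_start and an edit string made only of
    match-run lengths and single-letter substitutions (hypothetical helper).
    """
    rows: List[AlignedBase] = []
    ref, read = t_start - 1, q_start - 1  # convert to 0-based coordinates
    pos = 0
    while pos < len(editstring):
        tok = _TOKEN.match(editstring, pos)
        if tok is None:
            # Indels or any other notation fall outside this sketch's assumptions.
            raise ValueError("unsupported edit string token: %r" % editstring[pos:])
        if tok.group(1):
            run, base = int(tok.group(1)), ""
        else:
            run, base = 1, tok.group(2)
        for _ in range(run):
            rows.append(AlignedBase(ref, read, base == "", base))
            ref += 1
            read += 1
        pos = tok.end()
    return rows
```

For the example hit, `expand_hit(14712, 1, "24T11")` yields 36 rows running from `ref_loc` 14711 to 14746. The last row corresponds to the 1-based `t.end` of 14747, and the single mismatch falls at read position 24.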

[hg] galaxy 1509: Rewrote "Compare two queries" tool in Python.
by greg＠scofield.bx.psu.edu 22 Sep '08
22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/eb941905fd70
changeset: 1509:eb941905fd70
user: guru
date: Tue Sep 16 14:09:16 2008 -0400
description:
Rewrote "Compare two queries" tool in Python.
2 file(s) affected in this change:
tools/filters/compare.xml
tools/filters/joinWrapper.py
diffs (68 lines):
diff -r ec547440ec97 -r eb941905fd70 tools/filters/compare.xml
--- a/tools/filters/compare.xml Tue Sep 16 13:25:42 2008 -0400
+++ b/tools/filters/compare.xml Tue Sep 16 14:09:16 2008 -0400
@@ -1,6 +1,6 @@
<tool id="comp1" name="Compare two Queries">
<description>to find common or distinct rows</description>
- <command interpreter="perl">joinWrapper.pl $input1 $input2 $field1 $field2 $mode "Y" $out_file1</command>
+ <command interpreter="python">joinWrapper.py $input1 $input2 $field1 $field2 $mode $out_file1</command>
<inputs>
<param format="tabular" name="input1" type="data" label="Compare"/>
<param name="field1" label="Using column" type="data_column" data_ref="input1" />
diff -r ec547440ec97 -r eb941905fd70 tools/filters/joinWrapper.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/filters/joinWrapper.py Tue Sep 16 14:09:16 2008 -0400
@@ -0,0 +1,53 @@
+#!/usr/bin/env python
+#Guruprasad Ananda
+"""
+This tool provides the UNIX "join" functionality.
+"""
+import sys, os, tempfile
+
+def stop_err(msg):
+ sys.stderr.write(msg)
+ sys.exit()
+
+def main():
+ infile1 = sys.argv[1]
+ infile2 = sys.argv[2]
+ field1 = int(sys.argv[3])
+ field2 = int(sys.argv[4])
+ mode =sys.argv[5]
+ outfile = sys.argv[6]
+
+ tmpfile1 = tempfile.NamedTemporaryFile()
+ tmpfile2 = tempfile.NamedTemporaryFile()
+
+ try:
+ #Sort the two files based on specified fields
+ os.system("sort -k %d -o %s %s" %(field1, tmpfile1.name, infile1))
+ os.system("sort -k %d -o %s %s" %(field2, tmpfile2.name, infile2))
+ except Exception, exc:
+ stop_err( 'Initialization error -> %s' %str(exc) )
+
+ option = ""
+ for line in file(tmpfile1.name):
+ line = line.strip()
+ if line:
+ elems = line.split('\t')
+ for j in range(1,len(elems)+1):
+ if j == 1:
+ option = "1.1"
+ else:
+ option = option + ",1." + str(j)
+ break
+
+ if mode == "V":
+ cmdline = 'join -v 1 -o %s -1 %d -2 %d %s %s | tr " " "\t" > %s' %(option, field1, field2, tmpfile1.name, tmpfile2.name, outfile)
+ else:
+ cmdline = 'join -o %s -1 %d -2 %d %s %s | tr " " "\t" > %s' %(option, field1, field2, tmpfile1.name, tmpfile2.name, outfile)
+
+ try:
+ os.system(cmdline)
+ except Exception, exj:
+ stop_err('Error joining the two datasets -> %s' %str(exj))
+
+if __name__ == "__main__":
+ main()
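The new wrapper shells out to `sort` and `join` through `os.system`. That call returns the command's exit status rather than raising an exception, so the surrounding `try`/`except` blocks will not catch a failed `sort` or `join`. In addition, `stop_err` calls `sys.exit()` with no argument, which exits with status 0. `join` also splits fields on blanks by default, and the final `tr " " "\t"` converts every space to a tab, including spaces inside field values.

Below is a minimal pure-Python sketch of the same comparison. It assumes tab-delimited input and the 1-based column numbers the tool passes. It handles the two modes the wrapper distinguishes: `"V"` returns rows of the first dataset with no match in the second, and any other value returns rows that do match. The name `compare_rows` is hypothetical. The sketch differs from `join` in two ways. First, it writes each row of the first dataset at most once, even when its key repeats in the second dataset. Second, it keeps the input order instead of sorting.

```python
import sys


def compare_rows(infile1, infile2, field1, field2, mode, outfile):
    """Write rows of infile1 whose key column does (or, with mode "V", does not)
    appear in the key column of infile2. Columns are 1-based; files are tab-delimited.
    """
    # Collect the join keys present in the second dataset.
    keys2 = set()
    with open(infile2) as fh:
        for line in fh:
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) >= field2:
                keys2.add(fields[field2 - 1])

    with open(infile1) as fh, open(outfile, "w") as out:
        for line in fh:
            row = line.rstrip("\r\n")
            if not row:
                continue  # skip blank lines
            fields = row.split("\t")
            if len(fields) < field1:
                continue  # row too short to have the key column; skipped
            matched = fields[field1 - 1] in keys2
            # Keep matched rows normally, unmatched rows in mode "V".
            if matched != (mode == "V"):
                out.write(row + "\n")


def main():
    if len(sys.argv) != 7:
        sys.stderr.write("usage: compare_rows.py in1 in2 col1 col2 mode out\n")
        sys.exit(1)  # non-zero status so the caller sees the failure
    infile1, infile2, field1, field2, mode, outfile = sys.argv[1:7]
    compare_rows(infile1, infile2, int(field1), int(field2), mode, outfile)


if __name__ == "__main__":
    main()
```

The key set is built in one pass, so the comparison needs no temporary sorted copies. If the external tools are kept instead, `subprocess.run(..., check=True)` would raise on a non-zero exit status, unlike `os.system`.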

22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/ec547440ec97
changeset: 1508:ec547440ec97
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Tue Sep 16 13:25:42 2008 -0400
description:
Small update for maf stats tool.
2 file(s) affected in this change:
lib/galaxy/tools/util/maf_utilities.py
tools/maf/maf_stats.py
diffs (99 lines):
diff -r 842f1883cf53 -r ec547440ec97 lib/galaxy/tools/util/maf_utilities.py
--- a/lib/galaxy/tools/util/maf_utilities.py Mon Sep 15 15:04:41 2008 -0400
+++ b/lib/galaxy/tools/util/maf_utilities.py Tue Sep 16 13:25:42 2008 -0400
@@ -199,7 +199,7 @@
yield block
def get_chopped_blocks_with_index_offset_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
for block, idx, offset in index.get_as_iterator_with_index_and_offset( src, region.start, region.end ):
- block = chop_block_by_region( block, src, region, species, mincols )
+ block = chop_block_by_region( block, src, region, species, mincols, force_strand )
if block is not None:
yield block, idx, offset
@@ -209,6 +209,25 @@
else: alignment = RegionAlignment( end - start, primary_species )
return fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand, species, mincols )
+#reduces a block to only positions exisiting in the src provided
+def reduce_block_by_primary_genome( block, species, chromosome, region_start ):
+ #returns ( startIndex, {species:texts}
+ #where texts' contents are reduced to only positions existing in the primary genome
+ src = "%s.%s" % ( species, chromosome )
+ ref = block.get_component_by_src( src )
+ start_offset = ref.start - region_start
+ species_texts = {}
+ for c in block.components:
+ species_texts[ c.src.split( '.' )[0] ] = list( c.text )
+ #remove locations which are gaps in the primary species, starting from the downstream end
+ for i in range( len( species_texts[ species ] ) - 1, -1, -1 ):
+ if species_texts[ species ][i] == '-':
+ for text in species_texts.values():
+ text.pop( i )
+ for spec, text in species_texts.items():
+ species_texts[spec] = ''.join( text )
+ return ( start_offset, species_texts )
+
#fills a region alignment
def fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
region = bx.intervals.Interval( start, end )
@@ -216,22 +235,7 @@
region.strand = strand
primary_src = "%s.%s" % ( primary_species, chrom )
- def reduce_block_by_primary_genome( block ):
- #returns ( startIndex, {species:texts}
- #where texts' contents are reduced to only positions existing in the primary genome
- ref = block.get_component_by_src( primary_src )
- start_offset = ref.start - start
- species_texts = {}
- for c in block.components:
- species_texts[ c.src.split( '.' )[0] ] = list( c.text )
- #remove locations which are gaps in the primary species, starting from the downstream end
- for i in range( len( species_texts[ primary_species ] ) - 1, -1, -1 ):
- if species_texts[ primary_species ][i] == '-':
- for text in species_texts.values():
- text.pop( i )
- for spec, text in species_texts.items():
- species_texts[spec] = ''.join( text )
- return ( start_offset, species_texts )
+
#Order blocks overlaping this position by score, lowest first
blocks = []
@@ -248,7 +252,7 @@
for block_dict in blocks:
block = chop_block_by_region( block_dict[1].get_at_offset( block_dict[2] ), primary_src, region, species, mincols, strand )
if block is None: continue
- start_offset, species_texts = reduce_block_by_primary_genome( block )
+ start_offset, species_texts = reduce_block_by_primary_genome( block, primary_species, chrom, start )
for spec, text in species_texts.items():
try:
alignment.set_range( start_offset, spec, text )
diff -r 842f1883cf53 -r ec547440ec97 tools/maf/maf_stats.py
--- a/tools/maf/maf_stats.py Mon Sep 15 15:04:41 2008 -0400
+++ b/tools/maf/maf_stats.py Tue Sep 16 13:25:42 2008 -0400
@@ -64,19 +64,11 @@
for c in block.components:
spec = c.src.split( '.' )[0]
if spec not in coverage: coverage[spec] = zeros( region.end - region.start, dtype = bool )
- ref = block.get_component_by_src( src )
- #skip gap locations due to insertions in secondary species relative to primary species
- start_offset = ref.start - region.start
- num_gaps = 0
- for i in range( len( ref.text.rstrip().rstrip( "-" ) ) ):
- if ref.text[i] in ["-"]:
- num_gaps += 1
- continue
- #Toggle base if covered
- for comp in block.components:
- spec = comp.src.split( '.' )[0]
- if comp.text and comp.text[i] not in ['-']:
- coverage[spec][start_offset + i - num_gaps] = True
+ start_offset, alignment = maf_utilities.reduce_block_by_primary_genome( block, dbkey, region.chrom, region.start )
+ for i in range( len( alignment[dbkey] ) ):
+ for spec, text in alignment.items():
+ if text[i] != '-':
+ coverage[spec][start_offset + i] = True
if summary:
#record summary
for key in coverage.keys():
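The refactored `reduce_block_by_primary_genome` drops every alignment column that is a gap in the primary species. After that, column *i* of each reduced text lines up with reference position `start_offset + i`. `maf_stats.py` then marks a position as covered wherever a species' reduced text is not `-`.

Below is a minimal sketch of those two steps on plain Python data. It assumes a block is given as a dict mapping species name to aligned text, instead of a bx-python alignment object, together with the primary component's start coordinate. The names `reduce_texts` and `mark_coverage` are hypothetical.

```python
from typing import Dict, List, Tuple


def reduce_texts(texts: Dict[str, str], primary: str, ref_start: int,
                 region_start: int) -> Tuple[int, Dict[str, str]]:
    """Keep only alignment columns where the primary species has a base.

    Returns (start_offset, reduced_texts), mirroring the tuple returned by
    reduce_block_by_primary_genome in the diff.
    """
    keep = [i for i, ch in enumerate(texts[primary]) if ch != "-"]
    reduced = {spec: "".join(text[i] for i in keep) for spec, text in texts.items()}
    return ref_start - region_start, reduced


def mark_coverage(coverage: Dict[str, List[bool]], region_len: int,
                  start_offset: int, reduced: Dict[str, str]) -> None:
    """Set coverage[spec][pos] to True wherever spec has a non-gap base."""
    for spec, text in reduced.items():
        cov = coverage.setdefault(spec, [False] * region_len)
        for i, ch in enumerate(text):
            pos = start_offset + i
            # The original relies on the block already being chopped to the
            # region; this sketch guards the bounds instead.
            if ch != "-" and 0 <= pos < region_len:
                cov[pos] = True
```

Building the list of kept columns once gives the same result as the original's repeated `list.pop(i)` calls. The original performs those calls from the downstream end so that earlier indices stay valid.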
details: http://www.bx.psu.edu/hg/galaxy/rev/c3ce08879473
changeset: 1511:c3ce08879473
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Tue Sep 16 14:26:14 2008 -0400
description:
Merge local heads
0 file(s) affected in this change:
diffs (12 lines):
diff -r eb941905fd70 -r c3ce08879473 lib/galaxy/tools/parameters/validation.py
--- a/lib/galaxy/tools/parameters/validation.py Tue Sep 16 14:09:16 2008 -0400
+++ b/lib/galaxy/tools/parameters/validation.py Tue Sep 16 14:26:14 2008 -0400
@@ -247,7 +247,7 @@
if line_startswith is None or line.startswith( line_startswith ):
fields = line.split( '\t' )
if metadata_column < len( fields ):
- self.valid_values.append( fields[metadata_column] )
+ self.valid_values.append( fields[metadata_column].strip() )
def validate( self, value, history = None ):
if not value: return
if hasattr( value, "metadata" ):
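The one-line change in `validation.py` matters when the metadata column is the last field on a line. If the line is not stripped before `line.split('\t')`, and the diff context does not show that it is, the trailing newline stays attached to that field. A value such as `hg18\n` would then never equal a selected value `hg18`.

Below is a minimal sketch of the column-collection step. It assumes 0-based column indices, as the `metadata_column < len( fields )` check implies, and lines read from a plain text file. `collect_column_values` is a hypothetical name, not Galaxy's API.

```python
from typing import Iterable, List, Optional


def collect_column_values(lines: Iterable[str], column: int,
                          line_startswith: Optional[str] = None) -> List[str]:
    """Return the stripped value of a 0-based column from tab-delimited lines."""
    values = []
    for line in lines:
        if line_startswith is None or line.startswith(line_startswith):
            fields = line.split("\t")
            if column < len(fields):
                # strip() removes the newline left on the last field
                values.append(fields[column].strip())
    return values
```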

[hg] galaxy 1515: Forgot to update tool_conf.sample with the new...
by greg＠scofield.bx.psu.edu 22 Sep '08
22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/280e8b68f845
changeset: 1515:280e8b68f845
user: guru
date: Wed Sep 17 17:14:59 2008 -0400
description:
Forgot to update tool_conf.sample with the new tool details.
1 file(s) affected in this change:
tool_conf.xml.sample
diffs (10 lines):
diff -r 33e06a98b6d8 -r 280e8b68f845 tool_conf.xml.sample
--- a/tool_conf.xml.sample Wed Sep 17 16:42:08 2008 -0400
+++ b/tool_conf.xml.sample Wed Sep 17 17:14:59 2008 -0400
@@ -281,5 +281,6 @@
<tool file="metag_tools/megablast_wrapper.xml" />
<tool file="metag_tools/megablast_xml_parser.xml" />
<tool file="metag_tools/blat_wrapper.xml" />
+ <tool file="metag_tools/mapping_to_ucsc.xml" />
</section>
</toolbox>
details: http://www.bx.psu.edu/hg/galaxy/rev/f1da9b95549b
changeset: 1516:f1da9b95549b
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Thu Sep 18 15:24:51 2008 -0400
description:
Update to latest gmaj.
1 file(s) affected in this change:
static/gmaj/gmaj.jar
diffs (2 lines):
diff -r 280e8b68f845 -r f1da9b95549b static/gmaj/gmaj.jar
Binary file static/gmaj/gmaj.jar has changed
details: http://www.bx.psu.edu/hg/galaxy/rev/4e2ed1801931
changeset: 1504:4e2ed1801931
user: Anton Nekrutenko <anton(a)bx.psu.edu>
date: Fri Sep 12 15:35:50 2008 -0400
description:
Typos
1 file(s) affected in this change:
tools/sr_mapping/lastz_wrapper.xml
diffs (17 lines):
diff -r 777e41dbdf1f -r 4e2ed1801931 tools/sr_mapping/lastz_wrapper.xml
--- a/tools/sr_mapping/lastz_wrapper.xml Fri Sep 12 15:14:20 2008 -0400
+++ b/tools/sr_mapping/lastz_wrapper.xml Fri Sep 12 15:35:50 2008 -0400
@@ -216,11 +216,11 @@
**Full Parameter List**
-The modes gives you a fuller control over lastz. The description of these and other parameters is found at the end of this page. Note, that not all parameters are included in this interface. If you would like to make additional options available through Galaxy, e-mail us at galaxy-bugs(a)bx.psu.edu.
+This modes gives you a fuller control over lastz. The description of these and other parameters is found at the end of this page. Note, that not all parameters are included in this interface. If you would like to make additional options available through Galaxy, e-mail us at galaxy-bugs(a)bx.psu.edu.
------
-** Do you want to modify reference name?**
+**Do you want to modify reference name?**
This option allows you set the name of the reference sequence manually. This is helpful when, for example, you would like to make reference name compatible with the UCSC naming conventions to be able to display your lastz results as a custom track at UCSC Genome Browser.
details: http://www.bx.psu.edu/hg/galaxy/rev/26825f08d362
changeset: 1506:26825f08d362
user: Anton Nekrutenko <anton(a)bx.psu.edu>
date: Sun Sep 14 14:58:50 2008 -0400
description:
Forgot two test datasets
2 file(s) affected in this change:
test-data/B1.fa
test-data/phiX.fa
diffs (1087 lines):
diff -r b6ff467f4522 -r 26825f08d362 test-data/B1.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/B1.fa Sun Sep 14 14:58:50 2008 -0400
@@ -0,0 +1,1000 @@
+>HWI-EAS91_1_306UPAAXX:6:1:1503:1160
+GGTGGTCTATAGTGTTATTAATATCAAGTTGGGGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1564:1179
+GCGAGCAGTAGACTCCTTCTGTTGATAAGCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1704:1082
+GATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1588:1797
+GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1304:1526
+GTAGTTGAAATGGTAATAAGACGACCAATCTGACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1490:1582
+GTCGTGTTCAACAGACCTATAAACATTCTGTGCCGC
+>HWI-EAS91_1_306UPAAXX:6:1:1356:1339
+GTAGACATTTTTACTTTTTATGTCCCTCATCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1311:853
+GGTTGGTTTATCGTTTTTGACACTCTCACGTTGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1257:1552
+GTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTC
+>HWI-EAS91_1_306UPAAXX:6:1:1486:1402
+GTTACTGAGAAGTTAATGGATGAATTGGCACAATGC
+>HWI-EAS91_1_306UPAAXX:6:1:1028:1081
+GGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1167:752
+GGTTTTCTTCATTGCATTCAGATGGATACATCTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:1507:1113
+GTCAACGTTATATTTTGATAGTTTGACGGTTAATTC
+>HWI-EAS91_1_306UPAAXX:6:1:1654:1311
+GGATGAAAATGCTCACAATGACAAATCTGTCCACGG
+>HWI-EAS91_1_306UPAAXX:6:1:1386:1060
+GTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1070:1356
+GGTTACAGTATGCCCATCGCAGTTCGCTACACGCAG
+>HWI-EAS91_1_306UPAAXX:6:1:787:1032
+GCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCA
+>HWI-EAS91_1_306UPAAXX:6:1:834:1017
+GCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1703:1155
+GGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1406:593
+GTTGAGTTCGATAATGGTGATATGTATGTTTACGGC
+>HWI-EAS91_1_306UPAAXX:6:1:1411:886
+GTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:923:972
+GCATGACAAGTAAAGGACGGTTGTCAGCGTCATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1279:1004
+GCCATAGCACCAGAAACAAAACTAGGGGCGGCCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1070:840
+GGTTGTCAGCGTCATAAGAGGTTTTACCTCCAAATG
+>HWI-EAS91_1_306UPAAXX:6:1:1595:1040
+GTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1002:559
+GAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:999:974
+GTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGC
+>HWI-EAS91_1_306UPAAXX:6:1:896:982
+GTGGCTGGAGACAAATAATCTCTTTAATAACCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1366:741
+GTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGC
+>HWI-EAS91_1_306UPAAXX:6:1:749:1469
+GTTTATGGTGAACAGTGGATTAAGTTCATGAAGGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1010:592
+GAGTTTATTGCTGCCGTCATTGCTTATTATGTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1393:650
+GTGACTCATATCTAAACCAGTCCTTGACGAACGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1238:1731
+GAGAAATAAAAGTCTGAAACATGATTAAACTCCTAA
+>HWI-EAS91_1_306UPAAXX:6:1:1629:908
+GATGCGGTTATCCATCTGCTTATGGAAGCCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1560:849
+GCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1029:783
+GAGAAGTTAATGGATGAATTGGCACAATGCTACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1152:1324
+GACAATCAGAAAGAGATTGCCGAGATGCAAAATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1614:2042
+GAAATGCCACAAGCCTCAATAGCAGGTTTAAGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1398:439
+GATGGTTGGTTTATCGTTTTTGACACTCTCACGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:955:616
+GACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1672:753
+GAATGCCAGCAATCTCTTTTTGAGTCTCATTTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1195:1293
+GCAATGCGACAGGCTCATGCTGATGGTTGGTTTATC
+>HWI-EAS91_1_306UPAAXX:6:1:1074:755
+GCAAGAGTAAACATAGTGCCATGCTCAGGAACAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:984:499
+GACTTAGTTCATCAGCAAACGCAGAATCAGCGGTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1452:1833
+GCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:863:710
+GAGTTCGATAATGGTGATATGTATGTTGACGGCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:885:649
+GCAGAAGTTAACACTTTCGGATATTTCTGATGAGTC
+>HWI-EAS91_1_306UPAAXX:6:1:917:1214
+GACAGATGTATCCATCTGAATGCAATGAAGAAAACC
+>HWI-EAS91_1_306UPAAXX:6:1:892:1254
+GCTCAGGAAATGCAGCAGCAAGATAATCACGAGTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1555:1005
+GCATTTGGCGCATAATCTCGGAAACCTGCTGTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1637:1413
+GATGCTGTTCAACCACTAATAGGTAAGAAATCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1102:1567
+GGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAA
+>HWI-EAS91_1_306UPAAXX:6:1:799:1337
+GTATATGCACAAAATGAGATGCTTGCTTATCAACAG
+>HWI-EAS91_1_306UPAAXX:6:1:1353:1843
+GCAGACCCATAATGTCAATAGATGTGGTAGAAGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1196:789
+GCGGCATACGCTCGGCGCCAGTTTGAATATTAGACA
+>HWI-EAS91_1_306UPAAXX:6:1:1056:1676
+GTAAAATACTGACCAGCCGTTTGAGCTTGAGTAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1349:1836
+GGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1027:788
+GGTGTTAATGCCACTCCTCTCCCGACTGTTAACACT
+>HWI-EAS91_1_306UPAAXX:6:1:990:1283
+GCTTAGGGATTTTATTGGTATCAGGGTTAATCGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:904:939
+GAGAAGTTAATGGATGAATTGGCACAATGCTACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1732:793
+GTCAACATACATATCACCATTATCGAACTCAACGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1355:2003
+GTTAGACCAAACCATGAAACCAACATAAACATTATT
+>HWI-EAS91_1_306UPAAXX:6:1:1337:977
+GCACCAGAAACAAAACTAGGGGCGGCCTCATCAGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1605:1175
+GGAGGTAAAACCTCTTATGACGCTGACAACCGTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:1763:1192
+GACAGGCCGTTTGAATGTTGACGGGATGAACATAAT
+>HWI-EAS91_1_306UPAAXX:6:1:722:483
+GTTATTATACCGTCAAGGACTGTGTGACTATTGACT
+>HWI-EAS91_1_306UPAAXX:6:1:1760:1136
+GCAAAGCATTGGGATTATCATAAAACGCCTCTAATC
+>HWI-EAS91_1_306UPAAXX:6:1:1088:798
+GGAAACCTGCTGTTGCTTGGAAAGATTGGTGTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:633:1076
+GCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:673:754
+GTCATGGAAGCGATAAAACTCTGCAGGTTGGATATT
+>HWI-EAS91_1_306UPAAXX:6:1:1759:2019
+GTAAAGGACGGTTGTCAGCGTCATAAGAGGTTTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1064:1797
+GCGGTTATCCATCTGCTTATGGAAGCCAAGCATTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1112:1669
+GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACT
+>HWI-EAS91_1_306UPAAXX:6:1:510:1447
+GCATTAAGCTCAGGAAATGCAGCAGCAAGATAATCA
+>HWI-EAS91_1_306UPAAXX:6:1:877:1573
+GTGCTATTGCTGGCGGTATTTCTTCTTCTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:870:542
+GAATGTCACGCTGATTATTTTGACTTTGAGCGTATC
+>HWI-EAS91_1_306UPAAXX:6:1:966:384
+GCACCTGTTTTACAGACACCTAAAGCTACATCGTCA
+>HWI-EAS91_1_306UPAAXX:6:1:1186:1903
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1632:1742
+GCATCACCCATGCCTACAGTATTGTTATCGGTAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1521:559
+GAGAGCGCCAACGGCGTCCATCTCGAAGGAGTCGCC
+>HWI-EAS91_1_306UPAAXX:6:1:683:454
+GCTTATTATGTTCATCCCGTCAACATTCAAACGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:112:1280
+GTTGGCGCTCTCCGTCTTTCTCCATTTCGTCGTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:891:381
+GACCAGGGCGAGCGCCAGAACGTTTTTTACCTTTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1348:958
+GATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1785:1915
+GCCCCGAAGGGGACNANAAATGGTTTTTAGAGAACG
+>HWI-EAS91_1_306UPAAXX:6:1:1418:42
+GTATGCCCATCGCAGTTCGCTACACGCAGGACGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1421:743
+GGTCAACGCTACCTGTAGGAAGTGTCCGCATAAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1079:790
+GCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:663:740
+GGTATTAAGGATGAGTGTTCAAGATTGCTGGATGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1245:413
+GTTTGAATGTTGACGGGATGAACATAATAAGCAATG
+>HWI-EAS91_1_306UPAAXX:6:1:1378:1035
+GCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGG
+>HWI-EAS91_1_306UPAAXX:6:1:903:1746
+GTACGGGGAAGGACGTCAATAGTCACACAGTCCTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1713:1134
+GGCGTACGGGGAAGGACGTCAATAGTCACACAGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1246:1887
+GCTCTAATCTCTGGGCATCTGGCTATGATGTTGATG
+>HWI-EAS91_1_306UPAAXX:6:1:872:1731
+GGGCGGCCTCATCAGGGTTAGGAACATTAGAGCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1714:1582
+GCTTTCCTGCTCCTGTTGAGTTTATTGCTTCCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1785:763
+GNCGAGAAATAAAANNNTGAAACATGATTAAANTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1684:542
+GAAAAGACAGAATCTCTTCCAAGAGCTTGATGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1581:1665
+GACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATT
+>HWI-EAS91_1_306UPAAXX:6:1:901:1581
+GTGCTGATATTGCTTTTGATGCCGACCCTAAATTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1128:239
+GGTTATTATACCGTCAAGGACTGTGTGACTATTGAC
+>HWI-EAS91_1_306UPAAXX:6:1:969:441
+GGTAAGAAATCATGAGTCAAGTTACTGAACAATCCG
+>HWI-EAS91_1_306UPAAXX:6:1:630:1087
+GCCACCATGATTATGACCAGTGTTTCCAGTCCGTTC
+>HWI-EAS91_1_306UPAAXX:6:1:606:1852
+GGAGACAAATAATCTCTTTAATAACCTGATTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:489:1315
+GAAAGCTCAGTCTCAGGAGGAAGCGGAGCAGTCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:465:1983
+GAGCCAATACCATCAGCTTTACCGTCTTTCCAGAAA
+>HWI-EAS91_1_306UPAAXX:6:1:559:1028
+GAGTGCTTAATCCAACTTACCAAGCTGGGTTACGAC
+>HWI-EAS91_1_306UPAAXX:6:1:1655:1413
+GTATGTTGACGGCCATAAGGCTGCTTCTGACGTTCG
+>HWI-EAS91_1_306UPAAXX:6:1:980:605
+GCCGTTTGAATGTTGACGGGATGAACATAATAAGCA
+>HWI-EAS91_1_306UPAAXX:6:1:1629:1865
+GAAAAGCGGCATGGTCAATATAACCAGTAGTGTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1180:1920
+GCACTCCGTGGACAGATTTGTCATTGTGAGCATTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1116:383
+GCGCAGGAAACACTGACGTTCTTACTGACGCAGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:906:2041
+GTCACGTTTATGGTGAACAGTGGATTAAGTTCATGA
+>HWI-EAS91_1_306UPAAXX:6:1:1514:157
+GTCAATAGATGTGGTAGAAGTCGTCATTTGGCGTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1032:1857
+GCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGT
+>HWI-EAS91_1_306UPAAXX:6:1:638:609
+GATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAT
+>HWI-EAS91_1_306UPAAXX:6:1:74:750
+GATAATCACGAGTATCCTTTCCTTTATCATCTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:486:822
+GTTGACGATGTAGCTTTAGGTGTCTTTAAAACAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:899:473
+GAACAGCATCGGACTCAGATAGTAATCCACGCTCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1613:197
+GTGACATTCAGAAGGGTAATAAGAACGAACCATAAA
+>HWI-EAS91_1_306UPAAXX:6:1:326:1747
+GTTGAGGCTTTCGTTTATTGTACGCTTTGCTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1487:526
+GCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:629:665
+GAAATGCAGCAGCAAGATAATCACGAGTATCCTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:766:744
+GGCCGTCAACATACATATCACCATTATCGAACTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:391:1771
+GTGGTTGATATTTTTCATGGTATTGATAAATCTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:591:1102
+GCTTTGCGTGACTATTTTCGTGATATTGTTCGTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:917:664
+GCCATGATGGTGGTTATTATACCGTCAAGGACTGTG
+>HWI-EAS91_1_306UPAAXX:6:1:217:737
+GTTCAGTTGTTGCATTGGAATATTCAGTTTAAATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1047:839
+GACCATTCAAAGGATAAACATCATAGGCAGTCGGGG
+>HWI-EAS91_1_306UPAAXX:6:1:558:1040
+GCCACCAGCAAGAGCAGAAGCAATACCGCCAGCAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1207:524
+GCCAATACCATCAGCTTTACCGTCTTTCCAGAAATT
+>HWI-EAS91_1_306UPAAXX:6:1:708:1634
+GCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:576:1851
+GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:906:460
+GTAGACATTTTTACTTTTTATGTCCCTCATCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:693:1260
+GCGAAAGGTCGCAAAGTAAGAGCTTCTCGAGCTGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1373:286
+GGACACTTCCTACAGGTAGCGTTGACCCTAATTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:762:41
+GATACTTGGAACAATTTCTGGAAAGACGGTAAAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:475:1091
+GTCACACAGTCCTTGACGGTATAATAACCACCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:791:627
+GCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACC
+>HWI-EAS91_1_306UPAAXX:6:1:336:1791
+GAAGGAGTCGCCAGCGATAACCGGAGTAGTTGAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1483:943
+GCACGTAATTTTTGACGCACGTTTTCTTCTGCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:641:1071
+GATGGGCATACTGTAACCATAAGGCCACGTATTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:196:755
+GAACGCCCTCTTAAGGATATTCGCGATGAGTATAAT
+>HWI-EAS91_1_306UPAAXX:6:1:463:1398
+GTCATAAGAGGTTTTACCTCCAAATGAAGAAATAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1559:460
+GCTCACAATGACAAATCTGTCCACGGAGTGCTTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1625:1561
+GAGGAGTGGCATTAACACCATCCTTCATGAACTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1729:1588
+GCTGATAAAGGAAAGGATACTCGTGATTATCTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:945:393
+GGCCTCATCAGGGTTAGGAACATTAGAGCCTTGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:298:1391
+GTAAAGTTAGACCAAACCATGAAACCAACATAAACA
+>HWI-EAS91_1_306UPAAXX:6:1:1270:1500
+GAATTACTACTGCTTGTTTACGAATTAAATATATGT
+>HWI-EAS91_1_306UPAAXX:6:1:481:1546
+GCTGGCATTCAGTCGGCGACTTCACGCCAGAATACG
+>HWI-EAS91_1_306UPAAXX:6:1:473:1729
+GTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:801:1831
+GCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:536:639
+GCCGACCCTAAATTTTTTGCCTGTTTGGTTCTCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:259:938
+GTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTG
+>HWI-EAS91_1_306UPAAXX:6:1:907:1513
+GGCATGGGTGATGCTGGTATTAAATCTGCCATTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:372:1409
+GATGAGTATAATTACCCCAAAAAGAAAGGTATTAAG
+>HWI-EAS91_1_306UPAAXX:6:1:485:1626
+GATGGCAGCAACGGAAACCATAACGAGCATCATCTT
+>HWI-EAS91_1_306UPAAXX:6:1:583:1679
+GCTCAAAGTCAAAATAATCAGCGTGACATTCAGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:690:1610
+GACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:257:918
+GCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:818:33
+GTGTTAATGCCACTCCTCTCCCGACTGTTAACTCTG
+>HWI-EAS91_1_306UPAAXX:6:1:541:1242
+GGGATTATCATAAAACGCCTCTAATCGGTCGTCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1014:279
+GTAAAAATGTCTACAGTAGAGTCAATAGCAAGGCCC
+>HWI-EAS91_1_306UPAAXX:6:1:672:1790
+GGCCGTTTGAATGTTGACGGGATGAACATAATAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:708:464
+GGAGACAAATAATCTCTTTAATAACCTGATTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:633:1486
+GGGAAAGGTCATGCGGCATACGCTCGGCGCCAGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:328:696
+GTTCCGACTACCCTCCCGACTGCCTATGATGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:259:1389
+GCGTACTTATTCGCCACCATGATTATTACCAGTGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1315:41
+GCTTTCCGTGATGTCACAGCCTGCTTTGATGTGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1647:549
+GCTTAATCCAACTTACCAAGCTGGGTTACGACGCGC
+>HWI-EAS91_1_306UPAAXX:6:1:300:886
+GTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:317:1411
+GTACGCTGTACTTTGTGGGATACCCTCGCTTTCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:321:1819
+GGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:631:70
+GTGGATTACTATCTGAGTCCGATGCTGTTCAACCAC
+>HWI-EAS91_1_306UPAAXX:6:1:624:1040
+GCTGGCGACTCCTTCGAGATGGACGCCGTTTGCGCT
+>HWI-EAS91_1_306UPAAXX:6:1:662:1187
+GGGAGAGGAGTGGCATTAACACCATCCTTCATGACC
+>HWI-EAS91_1_306UPAAXX:6:1:1440:1959
+GAATCAGCGGTATGGCTCCTCTCCTATTTTTGCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:458:1629
+GCTGGTGGCGCCATGTCTAAATTTTTTGGAGGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:216:790
+GGGATGAAAATGCTCACAATGACAAATCTGTCCACG
+>HWI-EAS91_1_306UPAAXX:6:1:1407:1174
+TTACCTATTAGTGGTTGAACAGCATCGGACTCAGAT
+>HWI-EAS91_1_306UPAAXX:6:1:999:1790
+GTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:141:1994
+GGCTTTTTTATGGTTCGTTCTTATTACCCTTCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:225:465
+GTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGC
+>HWI-EAS91_1_306UPAAXX:6:1:649:1760
+GACCCATAATGTCAATAGATGTGGTAGAAGTCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:300:986
+GTTGAACACGACCAGAAAACTGGCCTAACGACGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:478:605
+GAGACTGAGCTTTCTCGCCAAATGACGACTTCTACC
+>HWI-EAS91_1_306UPAAXX:6:1:622:395
+GGTAGCTTTAAGCGGCTCACCTTTAGCATCAACAGG
+>HWI-EAS91_1_306UPAAXX:6:1:1701:574
+GTAAAGCCTCTACGCGATTTCATAGTGGAGGCCTCC
+>HWI-EAS91_1_306UPAAXX:6:1:646:59
+GGAAGTGTCCGCATAAAATGCACCGCATGGAAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:284:2031
+GACAGAATCGTTAGTTGATGGCGAAAGGTCGCAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:22:1009
+GATGGATACATCTGTCAACGCCGCTAATCAGGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:47:1826
+GCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1025:1236
+TGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGT
+>HWI-EAS91_1_306UPAAXX:6:1:773:591
+GAGCAGGAAAGCGAGGGTATCCCACAAAGTCCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1753:527
+GGTGGCATTCAAGGTGATGTGCTTGCTACCGATAAC
+>HWI-EAS91_1_306UPAAXX:6:1:426:1717
+GTAGCGCCAATATGAGAAGAGCCATACCGCTGATTC
+>HWI-EAS91_1_306UPAAXX:6:1:959:818
+TTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:459:1344
+GCCTATGATGTTTATCCTTTGAATGGTCGCCATGAT
+>HWI-EAS91_1_306UPAAXX:6:1:973:1367
+TTCGTGATGAGTTTGTATCTGTTACTGATAAGTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:201:871
+GATTAGAGGCGTTTTATGATAATCCCAATGCTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:713:1672
+GGCGTACGGGGAAGGACGTCAATAGTCACACAGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:444:1435
+TTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGTG
+>HWI-EAS91_1_306UPAAXX:6:1:288:1136
+GCCTTCCATGATGAGACAGGCCGTTTTAATTTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1653:225
+GCAAGGCCACGACGCAATGGAGAAAGACGGAGAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:537:1764
+GCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAA
+>HWI-EAS91_1_306UPAAXX:6:1:196:1854
+GTATCGAGGCTCTTAAACCTGCTATTTAGGCTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:312:1707
+GCGTCATAAGAGGTTTTACCTCCAAATGAAGAAATA
+>HWI-EAS91_1_306UPAAXX:6:1:651:183
+GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:295:694
+GTGATTACTTCATGCAGCGTTACCGTGATGTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:330:1895
+GCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTG
+>HWI-EAS91_1_306UPAAXX:6:1:590:331
+GAAATTTCTATGAATGATGTTTTCCGTTCTGGTGAT
+>HWI-EAS91_1_306UPAAXX:6:1:481:1687
+GCAGATTGCGATAAACGGTCACATTAAATTTAACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1112:1279
+TGTGCATATACCTGGTCTTTCGTATTCTGTCGTGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1099:1216
+TTAGAGCGCATGACAAGTAAAGGACGGTTGTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:221:1238
+GTATCCTTTCCTTTATCATCGGCAGACTTTTCACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1015:364
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:735:1806
+TGTTATTAATATCAAGTTGGGGGAGCACATTGTAGC
+>HWI-EAS91_1_306UPAAXX:6:1:320:411
+GCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1273:1031
+TTAAGGATATTCGCGATGAGTATAATTACCCCAAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1456:1088
+AATAATCAGCGTGACATTCAGAAGGGTAATAAGAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1365:307
+GACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:478:252
+GATGCGGTTATCCATCTGCTTATTGAAGCCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:915:1232
+TATTAATAACACTATAGACCACCGCCCCGAAGGGGC
+>HWI-EAS91_1_306UPAAXX:6:1:680:1357
+TTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGC
+>HWI-EAS91_1_306UPAAXX:6:1:238:1279
+GCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTA
+>HWI-EAS91_1_306UPAAXX:6:1:1583:35
+GCAAATTAGCATAAGCAGCTTGCAGACCCATAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:502:283
+GTTCCGACTACCCTCCCGACTGCCTATGATGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:418:1730
+GAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:596:647
+GCCTCAACGCAGCGACGAGCACGAGAGCGGTCAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:92:1591
+GTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCC
+>HWI-EAS91_1_306UPAAXX:6:1:430:1938
+GCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:212:527
+GGTATTGATAAAGCTGTTGCCGATACTTGTAACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:594:942
+GACGACATTAGAAATATCCTTTGCAGTAGCGCCAAT
+>HWI-EAS91_1_306UPAAXX:6:1:169:1774
+GCCTTCCATGATGAGACAGGCCGTTTTAATTTTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1090:210
+GGAGAGCGCCAACGGCGTCCATCTCGAAGGAGTCGC
+>HWI-EAS91_1_306UPAAXX:6:1:589:96
+GGCGGCCCCATCAGGGTTAGGAACATTAGAGCCTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1477:1231
+TAGGAACATTAGAGCCTTGAATGGCAGATTTAATAC
+>HWI-EAS91_1_306UPAAXX:6:1:707:1076
+TCTGACGTTCGTGATGAGTTTGTATCTTTTTCTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:749:1715
+GAACATAATAAGCAATGACGGCAGCAATAAACTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:1738:1884
+GCTCACCTTTAGCATCAACAGGCCACAACCAACCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1160:1088
+TCACATTTTGTTCATGGTAGAGATTCTCTTGTTGAC
+>HWI-EAS91_1_306UPAAXX:6:1:517:119
+GCAAGGCTAATGATTCACACGCCGACTGCTATCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:1472:716
+TGGTAATGGTGGTTTTCTTCATTTCATTCAGTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:281:441
+GAGCAGTAGACTCCTTCTGTTGATAAGCAAGCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:1101:324
+AATACCATCAGCTTTACCGTCTTTCCAGAAATTGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1225:1494
+TTCTCAAATCCGGCGTCAACCATACCAGCAGAGGAA
+>HWI-EAS91_1_306UPAAXX:6:1:1509:1025
+TTCTTGCTGCCGAGGGTCGCAAGGCTATTGTTTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:592:510
+GATACCAATAAAATCCCTAAGCATTTGTTTCTGGTT
+>HWI-EAS91_1_306UPAAXX:6:1:324:1729
+GAACAAAGAAACGCGGCACAGAATGTTTATAGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:128:1925
+GGAACAACTCACTAAAAACCAAGCTGTCGCTACTTC
+>HWI-EAS91_1_306UPAAXX:6:1:786:893
+TACGGGGAAGGACGTCAATAGTCACACAGTCCTTGC
+>HWI-EAS91_1_306UPAAXX:6:1:248:955
+GCTACAATGTGCTCCCCCAACTTGATATTAATAACA
+>HWI-EAS91_1_306UPAAXX:6:1:388:1127
+GATATTGGTCGTATGGTTCTTGCTGCCTAGTGTCTC
+>HWI-EAS91_1_306UPAAXX:6:1:721:1156
+TCTGGTTGGTTGTGGCCTTTTTATGCTAAATGTTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1564:1468
+TTACTTTTTATGTCCCTCATCGTCACGTTTATGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:750:77
+GGCTCATTCTGATTCTGAACAGCTTCTTGGGAAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:405:487
+GTTGGATTAAGCACTCCGTGGACAGATTTGTCATTT
+>HWI-EAS91_1_306UPAAXX:6:1:836:1204
+TTGCTTCTGCTCTTGCTTGTGGCGCCATGTCTAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:224:1548
+GCTGCCGTCATTGCTTATTATGTTCATCCCTTCAAC
+>HWI-EAS91_1_306UPAAXX:6:1:931:1015
+TTAAGGTACTGAATCTCTTTAGTCGCAGTAGGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:329:579
+GTCCCTCATCGTCACGTTTATGGTGAACAGTGGATT
+>HWI-EAS91_1_306UPAAXX:6:1:260:1145
+GCTTGCGTTTATGGTACGCTGGACTTTTTGTGATAC
+>HWI-EAS91_1_306UPAAXX:6:1:1523:1253
+TTGGTAAAATACTGACCAGCCGTTTGAGCTTGAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:326:1271
+GACCACTCGCGATTCAATCATGACTTCGTGATAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:213:622
+GCACCTGTTTTACAGACACCTAAAGCTACATCGTCA
+>HWI-EAS91_1_306UPAAXX:6:1:274:712
+GCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1549:627
+TATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1714:737
+TCTTTCGTATTCTGGCGTGAAGTCGCCGACTGAATG
+>HWI-EAS91_1_306UPAAXX:6:1:760:1217
+TACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:174:768
+GTTGGCTGACGACCGATTAGAGGCGTTTTTTTATAT
+>HWI-EAS91_1_306UPAAXX:6:1:172:1412
+GGTCGGCAGATTGCGATAAACGTTCACATTAAATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1393:869
+TTCATCCCGTCAACATTCAAACGGCCTGTCTCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:301:481
+GTTATAGATATTCAAATAACCCTGAAACAAATGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:648:1093
+TAACGCTGCATGAAGTAATCACGTTCTTGGTCAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1233:591
+TTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:540:1415
+TTATTAAAGAGATTATTTTTCTCCAGCCACTTATGT
+>HWI-EAS91_1_306UPAAXX:6:1:151:1792
+GCAAGCTGCTTATGCTAATTTGCATACTGACCAAGA
+>HWI-EAS91_1_306UPAAXX:6:1:748:1378
+TGGATTACTATCTGAGTCCGATGCTGTTCAACCACT
+>HWI-EAS91_1_306UPAAXX:6:1:1526:1479
+TGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:985:1093
+TAACCGTCTTCTCGTTCTCTAAAAACCATTTTTCTT
+>HWI-EAS91_1_306UPAAXX:6:1:480:1378
+TCAACCTCAGCACTAACCTTGCGAGTCATTTCTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:903:753
+TGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1697:1737
+GGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:803:1037
+TGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCTACT
+>HWI-EAS91_1_306UPAAXX:6:1:1727:1244
+TTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:253:1162
+GCATTTAGTAGCGGTAAAGTTTGACCAAACCATTAT
+>HWI-EAS91_1_306UPAAXX:6:1:216:856
+GTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCT
+>HWI-EAS91_1_306UPAAXX:6:1:825:886
+TCCCACAAAGTCCAGCGTACCATAAACGCAAGCCTC
+>HWI-EAS91_1_306UPAAXX:6:1:1699:962
+TGATTTCGATTTTCTGACGAGTAACAAAGTTTGGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1210:625
+TCAGATAGTAATCCACGCTCTTTTAAAATGTCAACA
+>HWI-EAS91_1_306UPAAXX:6:1:538:616
+TAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGT
+>HWI-EAS91_1_306UPAAXX:6:1:184:1849
+GCTCACCTTTAGCATCAACAGGCCACAACCAACCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1636:1103
+TATCTGACTTTTTGTTAACGTATTTAGCCACATAGA
+>HWI-EAS91_1_306UPAAXX:6:1:605:223
+GGTTATTTGAATATCTATAACAACTATTTTAAATCG
+>HWI-EAS91_1_306UPAAXX:6:1:256:1052
+GGTAAAGGACTTCTTGAAGGTACGTTGCAGTCTGGC
+>HWI-EAS91_1_306UPAAXX:6:1:300:1515
+GCCATGATGGTGGTTATTATACCGTCAAGGACTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1684:1320
+TGCTTGGCTTCCATAAGCAGATGGATAACCGCATCA
+>HWI-EAS91_1_306UPAAXX:6:1:1186:895
+TCAGATGGATACATCTGTCAACGCCGCTAATCAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1463:754
+TCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:808:1053
+TGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGG
+>HWI-EAS91_1_306UPAAXX:6:1:960:1218
+TTTCTAATGTCGTCACTGATGCTGCTTCTGTTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:521:1646
+GGAAAACGAACAAGCGCAAGAGTAAACATAGTGCCA
+>HWI-EAS91_1_306UPAAXX:6:1:289:1885
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:471:170
+GGTCAGTTCCATCAACATCATAGCCAGATGCCCAGA
+>HWI-EAS91_1_306UPAAXX:6:1:828:754
+TTTGCGTGACTATTTTCGTGATATTGTTCGTATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:924:1679
+TTTAATGTGACCGTTTATCGCAATCTGCCGACCACT
+>HWI-EAS91_1_306UPAAXX:6:1:837:901
+TGCATTTTAGTAAGCTCTTTTTGATTCTCAAATCCG
+>HWI-EAS91_1_306UPAAXX:6:1:543:16
+GCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1482:578
+TCTTTAGCTCCTAGACCTTTAGCAGCAAGGTCCATA
+>HWI-EAS91_1_306UPAAXX:6:1:1254:1668
+TTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1402:898
+TCATGAGTCAAGTTACTGAACAATCCGTACGTTTCC
+>HWI-EAS91_1_306UPAAXX:6:1:764:1534
+TTATACCGTCAAGGACTGTGTGACTATTGACGTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:681:1079
+TGGCGAATAAGTACGCGTTCTTGCAAATCACCAGAA
+>HWI-EAS91_1_306UPAAXX:6:1:672:1350
+TTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1266:493
+TGACCAGCCGTTTGAGCTTGAGTAAGCATTTGGCGC
+>HWI-EAS91_1_306UPAAXX:6:1:118:238
+GACGGTATAATAACCACCATCATGGCGACCATTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:699:433
+TTATTGCCCGGCGTACGGGGAAGGACGTCAATAGTC
+>HWI-EAS91_1_306UPAAXX:6:1:708:1387
+TGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:498:1085
+TTATGATAATCCCAATGCTTTGCGTGACTATTTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1101:1301
+TCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:261:213
+GAATGGTCGCCATGATGGTGGTTATTATACCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1287:1267
+TGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:744:331
+TTAATGGATGAATTGGCACAATGCTACAATGTGCTC
+>HWI-EAS91_1_306UPAAXX:6:1:614:814
+TGTCAGCGTCATAAGAGGTTTTACCTCCAAATGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1362:1063
+TAAACGCAAGCCTCAACGCAGCGACGAGCACGAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1238:1508
+TCAACTAACGATTCTGTCAAAAACTGACGCGTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:904:1130
+TTATCGCAATCTGCCGACCACTCGCGATTCAATCAT
+>HWI-EAS91_1_306UPAAXX:6:1:465:216
+GACCATGCCGCTTTTCTTGGCACGATTAACCCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:844:628
+TAATGTCAATAGATGTGGTAGAAGTCGTCATTTGGC
+>HWI-EAS91_1_306UPAAXX:6:1:684:1444
+TATCCCACAAAGTCCAGCGTACCATAAACGCAAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:515:1373
+TAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:764:1667
+TTGAGTTCGATAATGGTGATATGTATGTTGACGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1722:598
+TGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:670:1188
+TTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGT
+>HWI-EAS91_1_306UPAAXX:6:1:1682:1705
+TAGCCACATAGAAACCAACAGCCATATAACTGGTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1008:1616
+TCCTTTACTTGTCATGCGCTCTAATCTCTGTGCATC
+>HWI-EAS91_1_306UPAAXX:6:1:490:1220
+TAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCT
+>HWI-EAS91_1_306UPAAXX:6:1:891:1437
+TAATGGTGATATGTATGTTTACGTCCATAAGGCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1310:321
+TCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:827:1597
+TGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTC
+>HWI-EAS91_1_306UPAAXX:6:1:1062:1158
+TAGAGTCAATAGCAAGGCCACGACGCAATGGAGAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1419:208
+TGGCGCATAATCTCGGAAACCTGCTGTTGCTTGGAA
+>HWI-EAS91_1_306UPAAXX:6:1:691:1018
+AAATATCAACCACACCAGAAGCAGCATCAGTGACGA
+>HWI-EAS91_1_306UPAAXX:6:1:374:113
+GATAAAGCTGTTGCCGATACTTGGAACAATTTCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1720:784
+TGAGGATAAATTATGTCTAATATTCAAACTGGCGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1424:1394
+ATAAAAATGATTGGCGTATCCAACCTGCAGAGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1063:1760
+TAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1235:729
+TTTTTATGTCCCTCATCGTCACGTTTATGGTGAACA
+>HWI-EAS91_1_306UPAAXX:6:1:167:1507
+TAGTGTTATTAATATCAAGTTTTTGGAGCACATTGT
+>HWI-EAS91_1_306UPAAXX:6:1:717:1569
+TCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:610:765
+TTCAGCGCCTTCCATGATGAGACAGGCCGTTTGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:663:380
+TAAACATTCTGTGCCGCGTTTCTTTGTTCCTTATCT
+>HWI-EAS91_1_306UPAAXX:6:1:790:1358
+TTATCACCTTATTGAAGGCTTATCATTCATTTAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:965:1633
+TAGATGTGGTAGAAGTCGTCATTTGGCGAGAAAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:673:319
+TTCTTGCAAATCACCAGAAGGCGGTTCCTGAATGAT
+>HWI-EAS91_1_306UPAAXX:6:1:684:371
+TAGCGGTAAAGTTAGACCAAACCATGAAACCAACAT
+>HWI-EAS91_1_306UPAAXX:6:1:1147:1444
+ATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:983:678
+ATACCTGGTCTTTCGTATTCTGGCGTGAAGTCGCCG
+>HWI-EAS91_1_306UPAAXX:6:1:1608:1119
+TCACGCGGCGGCAAGTTGCCATACAAAACAGGGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1048:1193
+TAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1265:1429
+ATATTTTTCATGGTATTGATAAAGCTGTTGCCGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1607:1677
+TGTTGCTTGGAAAGATTGGTGTTTTCCATAATAGAC
+>HWI-EAS91_1_306UPAAXX:6:1:1087:1421
+ACGAACGTCAGAAGCAGCCTTATGGCCGTCAACATC
+>HWI-EAS91_1_306UPAAXX:6:1:324:490
+GCACCAAACATAAATCACCTCACTTAAGTGGCTGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1596:614
+TTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:343:83
+GTTACGCAGTTTTGCCGCAAGCTGGCTGCTGTACGC
+>HWI-EAS91_1_306UPAAXX:6:1:203:667
+GCATGAATGTGCTTAATAGAGGCCAAGGCGGTCTAG
+>HWI-EAS91_1_306UPAAXX:6:1:34:480
+GGCAAGTTGCCATACAAAACAGGGTCGCCAGCAATT
+>HWI-EAS91_1_306UPAAXX:6:1:606:1743
+TAGCGACAGCTTGGTTTTTAGTGAGTTGTTCCATTC
+>HWI-EAS91_1_306UPAAXX:6:1:254:1391
+TATAATTACCCCAAAAAGAAAGGTATTAAGGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1568:1750
+TAACCAGTAGTGTTAACAGTCGGGAGAGGAGTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1538:869
+TACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCA
+>HWI-EAS91_1_306UPAAXX:6:1:255:38
+GTCAGGATTGACACCCTCCCAATTGTATGTTTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1543:1555
+TAAAACGCCTCTAATCGGTCGTCAGCCAACGTGAGG
+>HWI-EAS91_1_306UPAAXX:6:1:1365:733
+AGAATCAGCGGTATGGCTCTTCTCCTTTTTTCGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1604:943
+TACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1574:1632
+TCAGTATGCAAATTAGCATAAGCAGCTTGCAGACCC
+>HWI-EAS91_1_306UPAAXX:6:1:565:1799
+TCTTGGTCAGTATGCAAATTAGCATAAGCAGCTTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1004:380
+TATTGACTCTACTGTAGACATTTTTACTTTTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1345:965
+ATTCAAAGGATAAACATCATAGGCAGTCGGGAGGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1704:756
+TGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGG
+>HWI-EAS91_1_306UPAAXX:6:1:310:1346
+TATAACGTTGACGATGTAGCTTTAGTTTTCTTTAAA
+>HWI-EAS91_1_306UPAAXX:6:1:900:1858
+TTTACCGCTTCGGCGTTATAACCTCACACTCAATCT
+>HWI-EAS91_1_306UPAAXX:6:1:1250:1741
+TAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1170:1317
+TCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:149:1896
+GCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATC
+>HWI-EAS91_1_306UPAAXX:6:1:1504:494
+TGTCTACAGTAGAGTCAATAGCAAGGCCACGACGCC
+>HWI-EAS91_1_306UPAAXX:6:1:395:256
+GTCCATATCTGACTTTTTGTTAACGTATTTATCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1110:1109
+ACCGCTTCGGCGTTATAACCTCACACTCAATCTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:895:649
+TTCTGCACGTAATTTTTGACGCACGTTTTCTTCTGC
+>HWI-EAS91_1_306UPAAXX:6:1:827:1378
+TGCAAGCTGCTTATGCTAATTTGCATACTGACCAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1051:1587
+TTTGACACTCTCACGTTGGCTGACGACCGATTAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1656:1549
+AACCTGCTGTTGCTTGGAAAGATTGGTGTTTTCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:366:150
+GGTCAGTAGCAATCCAAACTTTGTTACTCGTCAGAA
+>HWI-EAS91_1_306UPAAXX:6:1:955:1792
+ATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTA
+>HWI-EAS91_1_306UPAAXX:6:1:1340:1403
+ATAAAATGCACCGCATGGAAATGAAGACGGCCATTA
+>HWI-EAS91_1_306UPAAXX:6:1:1693:1017
+TGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1099:1572
+AATTTTTACCGCTTCGGCGTTATAACCTCACACTCA
+>HWI-EAS91_1_306UPAAXX:6:1:218:1148
+TATGCAAATTAGCATAAGCAGCTTGCAGACCCATAT
+>HWI-EAS91_1_306UPAAXX:6:1:403:614
+TGGTGCTGATGCTTCCTCTGCTGGTATGGTTTACGC
+>HWI-EAS91_1_306UPAAXX:6:1:1651:646
+TCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGC
+>HWI-EAS91_1_306UPAAXX:6:1:1566:499
+TGCGGTGCATTTTATGCGGACACTTCCTACAGGTAG
+>HWI-EAS91_1_306UPAAXX:6:1:825:951
+ACAGGCCGTTTGAATGTTTACGGGGTGTACATAATA
+>HWI-EAS91_1_306UPAAXX:6:1:1745:1865
+TTAACTTCTGCGTCATGGAAGCGATAAAACTCTGCG
+>HWI-EAS91_1_306UPAAXX:6:1:973:1992
+TAGTAATTCCTGCTTTATCAAGATAATTTTTCGACT
+>HWI-EAS91_1_306UPAAXX:6:1:171:1653
+TAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:397:363
+TGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGT
+>HWI-EAS91_1_306UPAAXX:6:1:1336:1155
+ATATGTATGTTGACGGCCATAAGGCTGCTTCTGACG
+>HWI-EAS91_1_306UPAAXX:6:1:685:629
+AGTATGCAAATTAGCATAAGCAGCTTGCAGACCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1409:510
+ACATAATAAGCAATGACGGCAGCAATAAACTCAACA
+>HWI-EAS91_1_306UPAAXX:6:1:1631:998
+AACCATCAGCATGAGCCTGTCGCATTGCATTCATCC
+>HWI-EAS91_1_306UPAAXX:6:1:260:1698
+TTATTATGTTCATCCCGTCAACATTCAAACTGCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:578:971
+TTAACGCTACTAAATTCCGCGGATTGGTTTCGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1613:642
+ATAGAAATTTCACGCGGCGGCAAGTTGCCATACAAA
+>HWI-EAS91_1_306UPAAXX:6:1:237:650
+GACGGTATAATAACCACCATCATGGCGACCATTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:1100:1875
+TTATGGTTCGTTCTTATTACCCTTCTGAATGTCACG
+>HWI-EAS91_1_306UPAAXX:6:1:352:32
+GTACCATAAACGCAAGCCTCAACGCAGCGACGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:443:229
+GCAGTAGGCGGAAAACGAACAAGCGCAAGAGTAAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1131:731
+AGCAGTCGGCGTGTGAATCATTAGCCTTGCGACCCT
+>HWI-EAS91_1_306UPAAXX:6:1:133:1089
+AAGGTTAGTGCTGAGGTTGACTTAGTTCATCATCAA
+>HWI-EAS91_1_306UPAAXX:6:1:65:1307
+TGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:905:1493
+TCAGCTTTACCGTCTTTCCAGAAATTGTTCCAAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:733:540
+TGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:161:1707
+TAATGTCGTCACTGATGCTGCTTCTGTTGTTGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:380:1861
+TCTGGGCATCTGGCTATGATGTTGATGGAACTGACC
+>HWI-EAS91_1_306UPAAXX:6:1:1761:566
+TATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1486:651
+TGGCGGTATTTCTTCTTCTCTTTCTTGTTGCGCCCT
+>HWI-EAS91_1_306UPAAXX:6:1:508:1380
+TCCATCAACATCATAGCCAGATGCCCAGAGATTAGA
+>HWI-EAS91_1_306UPAAXX:6:1:1763:855
+TGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1553:553
+TAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAA
+>HWI-EAS91_1_306UPAAXX:6:1:1424:507
+TCCACTGCAACAACTGAACGGACTGGAAACACTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:988:135
+TAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:810:1918
+TTTTCATCCCGAAGTTGCGGCTCATTCTGATTCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:588:559
+TCTGGTTGAACGGCGTCGCGTCGTAACCCAGCTTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1264:1214
+ATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1000:1475
+TCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTG
+>HWI-EAS91_1_306UPAAXX:6:1:1389:160
+TTATTCGCCACCATGATTATGACCAGTGTTTCCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:422:1296
+TTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGT
+>HWI-EAS91_1_306UPAAXX:6:1:1273:856
+TAGCCATAGCACCAGAAACAAAACTAGGGGCGGCCT
+>HWI-EAS91_1_306UPAAXX:6:1:450:969
+TGTTTTCCATAATAGACGCAACGCGAGCAGTAGACT
+>HWI-EAS91_1_306UPAAXX:6:1:1202:828
+ATCGTCAACGTTATATTTTGATAGTTTGACGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1721:1800
+GGGTTAGGGACATTAGAGCCTTGACTGACTGAGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:228:2019
+TTGAGTAAGCATTTGGCGCATAATCTCGGAAACCTG
+>HWI-EAS91_1_306UPAAXX:6:1:1579:1214
+ACGTTTGGTCAGTTCCATCAACATCATAGCCAGATG
+>HWI-EAS91_1_306UPAAXX:6:1:429:1055
+TTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1359:1689
+AAGAGCAGAAGCAATACCGCCAGCAATAGCACCAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1474:1056
+TCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGAC
+>HWI-EAS91_1_306UPAAXX:6:1:105:1818
+TTGGGGATTGAGAAAGAGTAGAAATGCCACAAGCCT
+>HWI-EAS91_1_306UPAAXX:6:1:208:1538
+TAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
+>HWI-EAS91_1_306UPAAXX:6:1:1361:1623
+AATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTA
+>HWI-EAS91_1_306UPAAXX:6:1:595:1670
+TGAATCTCTTTAGTCGCAGTAGGCGGAAAACGAACA
+>HWI-EAS91_1_306UPAAXX:6:1:6:1885
+TCTAATGTCGTCACTGATGCTGCTTCTGGTGTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:706:1085
+TGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTG
+>HWI-EAS91_1_306UPAAXX:6:1:1307:825
+AGCGGTAAAGTTAGACCAAACCATGAAACCAACATA
+>HWI-EAS91_1_306UPAAXX:6:1:762:802
+TGGCATTAACACCATCCTTCATGAACTTAATCCACT
+>HWI-EAS91_1_306UPAAXX:6:1:1657:506
+TTGCGACCCTCGGCAGCAAGAACCATACGACCAATT
+>HWI-EAS91_1_306UPAAXX:6:1:184:811
+TTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1469:1718
+TGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:815:1640
+TGGCGGCGATTGCGTACCCGACGACCCAAATTAGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1580:1388
+AAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1617:1554
+TACGGGGAAGGACGTCAATAGTCACACAGTCCTTGA
+>HWI-EAS91_1_306UPAAXX:6:1:1544:431
+TGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTA
+>HWI-EAS91_1_306UPAAXX:6:1:1604:1541
+TCAGTGACGACATTAGAAATATCCTTTGCAGTAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1485:741
+ATCAAACGCTGAATAGTAAAGCCTCTACGCGATTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1226:393
+TGCCACAAGCCTCAATAGCAGGTTTAAGAGCCTCGA
+>HWI-EAS91_1_306UPAAXX:6:1:1506:973
+ATTAGGGTCAACGCTACCTGTAGGAAGTGTCCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:890:1838
+TGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGC
+>HWI-EAS91_1_306UPAAXX:6:1:453:1527
+TAAGAGGGCGTTCAGCAGCCAGCTTGCGGCAAAACT
+>HWI-EAS91_1_306UPAAXX:6:1:1056:570
+ACATTGTAGCATTGTGCCAATTCATCCATTAACTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1736:74
+TATCCGAAAGTGTTAACTTCTGCGTCATGGAAGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:169:1896
+GTATGCAAATTAGCATAAGCAGCTTGCAGACCCATA
+>HWI-EAS91_1_306UPAAXX:6:1:259:949
+TGAGGATAAATTATGTCTAATATTCAAACTTGCTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1205:893
+ATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:732:1335
+TACTCGTGATTATCTTGCTGCTGCATTTCCTGAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:667:664
+TCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAG
+>HWI-EAS91_1_306UPAAXX:6:1:535:587
+TTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:412:446
+GTGTGGTTGATATTTTTCATGGTATTGATAAAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:507:1599
+TTGCTGGCGGTTTTTCTTTTTTTTTTTTTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:575:1818
+TAAAATGCACCGCATGGAAATGAAGACGGCCATTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1568:1428
+ACCAGTTATATGGCTGGTTGTTTTTTTTTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1417:982
+AACAAGAGAATCTCTACCATGAACAAAATGTGACTC
+>HWI-EAS91_1_306UPAAXX:6:1:280:1340
+GGCCAAACCAGTGGCGATGGCCGCGCTGGAGGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:966:144
+TACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1391:1987
+TAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1280:278
+ATGGAAATGAAGACGGCCATTAGCTGTACCATACTC
+>HWI-EAS91_1_306UPAAXX:6:1:631:858
+TGATATTGGTCGTATGGTTCTTGCTTCCGTGGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:518:573
+TTAGGTGTCTGTAAAACAGGTGCCGAAGAAGCTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:54:981
+TTGACATTTTAAAAGAGCGTGGATTACTATCTGATT
+>HWI-EAS91_1_306UPAAXX:6:1:218:1165
+TATTGACTCTACTGTAGACATTTTTACTTTTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1727:1530
+TCAACGCAGCGACGAGCACGAGAGCGGTCAGTAGCA
+>HWI-EAS91_1_306UPAAXX:6:1:519:657
+TGAACAGCATCGGACTCAGATAGTAATCCACGCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:939:967
+ATACCGTCAAGGACTGTGTGACTATTGACGTCCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:299:1060
+TATAACTGGTAGCTTTAAGCGGCTCACCTTTAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:438:665
+TAATTCGTAAACAAGCAGTAGTAATTCCTGCTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1303:1971
+AGCATTGTGCCAATTCATCCATTAACTTCTCAGTAA
+>HWI-EAS91_1_306UPAAXX:6:1:214:1264
+TCAGCACCAACAGAAACAACCTGATTAGCGGCGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1454:1423
+AACGGAAAACATCCTTCATAGAAATTTCACGCGGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1633:340
+TTCCATAATAGACGCAACGCGAGCAGTAGACTCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:671:1196
+ATACGAAAAGACAGAATCTCTTCCAAGAGCTTGATG
diff -r b6ff467f4522 -r 26825f08d362 test-data/phiX.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/phiX.fa Sun Sep 14 14:58:50 2008 -0400
@@ -0,0 +1,79 @@
+>phiX
+GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT
+GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA
+ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG
+TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA
+GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC
+TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT
+TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT
+CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT
+TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG
+TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC
+GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA
+CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG
+TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT
+AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC
+CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA
+TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC
+TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA
+CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA
+GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT
+GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA
+ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC
+TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT
+TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC
+ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC
+CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT
+GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC
+CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC
+TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG
+TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT
+TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA
+AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT
+TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT
+ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC
+GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC
+TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT
+TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
+TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG
+TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC
+CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG
+AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC
+CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT
+TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG
+CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA
+AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT
+GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG
+GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA
+TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT
+CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG
+TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
+GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC
+CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA
+TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA
+AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC
+TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT
+CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA
+TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG
+TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT
+CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT
+TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC
+ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG
+TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA
+ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG
+GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC
+CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT
+GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG
+GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT
+ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG
+CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC
+CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC
+GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
+CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG
+CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA
+TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT
+TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG
+TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC
+AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC
+TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
+
[hg] galaxy 1505: Update MAF stitcher to be more efficient. Requ...
by greg＠scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/b6ff467f4522
changeset: 1505:b6ff467f4522
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Fri Sep 12 15:50:20 2008 -0400
description:
Update MAF stitcher to be more efficient. Requires bx-python rev >= 449.
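The efficiency gain in this changeset comes from the new `set_range` method in `maf_utilities.py`. A whole run of bases is written with a single `seek`/`write` instead of one call per base, and `set_position` becomes a thin wrapper around it. The sketch below shows that pattern. It assumes each species' row is held in an in-memory `io.StringIO` pre-filled with gap characters; the real stitcher uses temp files. The class name `GappedRows` and the `get` helper are hypothetical. The sketch raises `ValueError`/`IndexError` rather than the string exceptions used in the diff, which Python 2.6 and later reject. Unlike the diff, it also refuses a range that would run past the end of the row.

```python
import io


class GappedRows:
    """Hypothetical stand-in for the stitcher's per-species buffers."""

    def __init__(self, size, fill="-"):
        self.size = size
        self.fill = fill
        self.sequences = {}

    def add_species(self, species):
        # Pre-fill with gaps so any column can be overwritten in place.
        self.sequences[species] = io.StringIO(self.fill * self.size)

    def set_position(self, index, species, base):
        # A single column is just a range of length 1.
        if len(base) != 1:
            raise ValueError("A genomic position can only have a length of 1.")
        self.set_range(index, species, base)

    def set_range(self, index, species, bases):
        if index >= self.size or index < 0:
            raise IndexError("Your index (%i) is out of range (0 - %i)." % (index, self.size - 1))
        if len(bases) == 0:
            raise ValueError("A set of genomic positions can only have a positive length.")
        # Extra check not present in the diff: keep the write inside the row.
        if index + len(bases) > self.size:
            raise IndexError("Range %i-%i runs past the end of the row." % (index, index + len(bases)))
        if species not in self.sequences:
            self.add_species(species)
        row = self.sequences[species]
        # One seek and one write for the whole run of bases.
        row.seek(index)
        row.write(bases)

    def get(self, species):
        return self.sequences[species].getvalue()


rows = GappedRows(10)
rows.set_range(2, "hg18", "ACGT")
rows.set_position(0, "hg18", "T")
print(rows.get("hg18"))  # T-ACGT----
```

Writing a contiguous run in one call is what lets a caller copy a whole aligned slice per species rather than looping column by column.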
2 file(s) affected in this change:
eggs.ini
lib/galaxy/tools/util/maf_utilities.py
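The new `chop_block_by_region` (cut off at the end of this excerpt) first reverse-complements a minus-strand block so the reference coordinates lie on the forward strand. It then clips the requested region to the reference component with `max`/`min`. The sketch below applies the same coordinate logic to plain integers. It assumes 0-based half-open intervals and the MAF convention that a minus-strand start is counted on the reverse-complemented source. `Interval`, `to_forward_strand` and `clip_to_reference` are hypothetical names, not bx-python APIs. Returning `None` for an empty overlap is also an assumption, because the excerpt ends before the diff shows how that case is handled.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    start: int  # 0-based, inclusive
    end: int    # exclusive


def to_forward_strand(start: int, size: int, src_size: int) -> Interval:
    # A "-" strand MAF start is relative to the reverse-complemented source,
    # so map it back onto forward-strand coordinates.
    fwd_start = src_size - (start + size)
    return Interval(fwd_start, fwd_start + size)


def clip_to_reference(region: Interval, ref: Interval) -> Optional[Interval]:
    # Same clamping as slice_start / slice_end in chop_block_by_region.
    slice_start = max(region.start, ref.start)
    slice_end = min(region.end, ref.end)
    if slice_start >= slice_end:
        return None  # region does not overlap this block
    return Interval(slice_start, slice_end)


ref = to_forward_strand(start=10, size=5, src_size=100)
print(ref)                                       # Interval(start=85, end=90)
print(clip_to_reference(Interval(80, 88), ref))  # Interval(start=85, end=88)
print(clip_to_reference(Interval(0, 50), ref))   # None
```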
diffs (188 lines):
diff -r 4e2ed1801931 -r b6ff467f4522 eggs.ini
--- a/eggs.ini Fri Sep 12 15:35:50 2008 -0400
+++ b/eggs.ini Fri Sep 12 15:50:20 2008 -0400
@@ -55,12 +55,12 @@
MySQL_python = _5.0.51a_static
python_lzo = _static
flup = .dev_r2311
-bx_python = _dev_r448
+bx_python = _dev_r449
nose = .dev_r101
; source location, necessary for scrambling
[source]
-bx_python = http://dist.g2.bx.psu.edu/bx-python_dist-r448.tar.bz2
+bx_python = http://dist.g2.bx.psu.edu/bx-python_dist-r449.tar.bz2
Cheetah = http://umn.dl.sourceforge.net/sourceforge/cheetahtemplate/Cheetah-1.0.tar.gz
DRMAA_python = http://gridengine.sunsource.net/files/documents/7/36/DRMAA-python-0.2.tar.gz
MySQL_python = http://superb-west.dl.sourceforge.net/sourceforge/mysql-python/MySQL-python… http://mysql.mirrors.pair.com/Downloads/MySQL-5.0/mysql-5.0.51a.tar.gz
diff -r 4e2ed1801931 -r b6ff467f4522 lib/galaxy/tools/util/maf_utilities.py
--- a/lib/galaxy/tools/util/maf_utilities.py Fri Sep 12 15:35:50 2008 -0400
+++ b/lib/galaxy/tools/util/maf_utilities.py Fri Sep 12 15:50:20 2008 -0400
@@ -54,11 +54,15 @@
#sets a position for a species
def set_position( self, index, species, base ):
+ if len( base ) != 1: raise "A genomic position can only have a length of 1."
+ return self.set_range( index, species, base )
+ #sets a range for a species
+ def set_range( self, index, species, bases ):
if index >= self.size or index < 0: raise "Your index (%i) is out of range (0 - %i)." % ( index, self.size - 1 )
- if len(base) != 1: raise "A genomic position can only have a length of 1."
+ if len( bases ) == 0: raise "A set of genomic positions can only have a positive length."
if species not in self.sequences.keys(): self.add_species( species )
self.sequences[species].seek( index )
- self.sequences[species].write( base )
+ self.sequences[species].write( bases )
#Flush temp file of specified species, or all species
def flush( self, species = None ):
@@ -164,32 +168,40 @@
except:
return ( None, None )
+def chop_block_by_region( block, src, region, species = None, mincols = 0, force_strand = None ):
+ ref = block.get_component_by_src( src )
+ #We want our block coordinates to be from positive strand
+ if ref.strand == "-":
+ block = block.reverse_complement()
+ ref = block.get_component_by_src( src )
+
+ #save old score here for later use
+ old_score = block.score
+ slice_start = max( region.start, ref.start )
+ slice_end = min( region.end, ref.end )
+
+ #slice block by reference species at determined limits
+ block = block.slice_by_component( ref, slice_start, slice_end )
+
+ if block.text_size > mincols:
+ if ( force_strand is None and region.strand != ref.strand ) or ( force_strand is not None and force_strand != ref.strand ):
+ block = block.reverse_complement()
+ # restore old score, may not be accurate, but it is better than 0 for everything
+ block.score = old_score
+ if species is not None:
+ block = block.limit_to_species( species )
+ block.remove_all_gap_columns()
+ return block
+ return None
#generator yielding only chopped and valid blocks for a specified region
def get_chopped_blocks_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
- for block in index.get_as_iterator( src, region.start, region.end ):
- ref = block.get_component_by_src( src )
- #We want our block coordinates to be from positive strand
- if ref.strand == "-":
- block = block.reverse_complement()
- ref = block.get_component_by_src( src )
-
- #save old score here for later use
- old_score = block.score
- slice_start = max( region.start, ref.start )
- slice_end = min( region.end, ref.end )
-
- #slice block by reference species at determined limits
- block = block.slice_by_component( ref, slice_start, slice_end )
-
- if block.text_size > mincols:
- if ( force_strand is None and region.strand != ref.strand ) or ( force_strand is not None and force_strand != ref.strand ):
- block = block.reverse_complement()
- # restore old score, may not be accurate, but it is better than 0 for everything
- block.score = old_score
- if species is not None:
- block = block.limit_to_species( species )
- block.remove_all_gap_columns()
- yield block
+ for block, idx, offset in get_chopped_blocks_with_index_offset_for_region( index, src, region, species, mincols, force_strand ):
+ yield block
+def get_chopped_blocks_with_index_offset_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
+ for block, idx, offset in index.get_as_iterator_with_index_and_offset( src, region.start, region.end ):
+ block = chop_block_by_region( block, src, region, species, mincols )
+ if block is not None:
+ yield block, idx, offset
#returns a filled region alignment for specified regions
def get_region_alignment( index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
@@ -199,46 +211,51 @@
#fills a region alignment
def fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
- #first step through blocks, save index and score in array, then order by score (array will start as 0=index0,scoreX)
- #step through ordered list, step through maf blocks, stopping at index, store, then break inner loop
region = bx.intervals.Interval( start, end )
region.chrom = chrom
region.strand = strand
primary_src = "%s.%s" % ( primary_species, chrom )
-
+
+ def reduce_block_by_primary_genome( block ):
+ #returns ( startIndex, {species:texts}
+ #where texts' contents are reduced to only positions existing in the primary genome
+ ref = block.get_component_by_src( primary_src )
+ start_offset = ref.start - start
+ species_texts = {}
+ for c in block.components:
+ species_texts[ c.src.split( '.' )[0] ] = list( c.text )
+ #remove locations which are gaps in the primary species, starting from the downstream end
+ for i in range( len( species_texts[ primary_species ] ) - 1, -1, -1 ):
+ if species_texts[ primary_species ][i] == '-':
+ for text in species_texts.values():
+ text.pop( i )
+ for spec, text in species_texts.items():
+ species_texts[spec] = ''.join( text )
+ return ( start_offset, species_texts )
+
#Order blocks overlaping this position by score, lowest first
- blocks_order = []
- for i, block in enumerate( get_chopped_blocks_for_region( index, primary_src, region, species, mincols ) ):
- for j in range( 0, len( blocks_order ) ):
- if float( block.score ) < float( blocks_order[j]['score'] ):
- blocks_order.insert( j, {'index':i, 'score':block.score} )
+ blocks = []
+ for block, idx, offset in index.get_as_iterator_with_index_and_offset( primary_src, start, end ):
+ score = float( block.score )
+ for i in range( 0, len( blocks ) ):
+ if score < blocks[i][0]:
+ blocks.insert( i, ( score, idx, offset ) )
break
else:
- blocks_order.append( {'index':i, 'score':block.score} )
+ blocks.append( ( score, idx, offset ) )
- #Loop through ordered block indexes and layer blocks by score
- for block_dict in blocks_order:
- for block_index, block in enumerate( get_chopped_blocks_for_region( index, primary_src, region, species, mincols ) ):
- if block_index == block_dict['index']:
- ref = block.get_component_by_src( primary_src )
- #skip gap locations due to insertions in secondary species relative to primary species
- start_offset = ref.start - start
- num_gaps = 0
- for i in range( len( ref.text.rstrip().rstrip("-") ) ):
- if ref.text[i] in ["-"]:
- num_gaps += 1
- continue
- #Set base for all species
- for spec in [ c.src.split( '.' )[0] for c in block.components ]:
- try:
- #NB: If a gap appears in higher scoring secondary species block,
- #it will overwrite any bases that have been set by lower scoring blocks
- #this seems more proper than allowing, e.g. a single base from lower scoring alignment to exist outside of its genomic context
- alignment.set_position( start_offset + i - num_gaps, spec, block.get_component_by_src_start( spec ).text[i] )
- except:
- #species/sequence for species does not exist
- pass
- break
+ #Loop through ordered blocks and layer by increasing score
+ for block_dict in blocks:
+ block = chop_block_by_region( block_dict[1].get_at_offset( block_dict[2] ), primary_src, region, species, mincols, strand )
+ if block is None: continue
+ start_offset, species_texts = reduce_block_by_primary_genome( block )
+ for spec, text in species_texts.items():
+ try:
+ alignment.set_range( start_offset, spec, text )
+ except:
+ #species/sequence for species does not exist
+ pass
+
return alignment
#returns a filled spliced region alignment for specified region with start and end lists
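The set_position()/set_range() change in this diff replaces one seek-and-write per base with a single seek and write for a whole run of bases. Below is a minimal sketch of that idea for one species row, assuming (as the seek()/write() calls suggest) a temporary file holding one byte per alignment column. SpeciesRow is a hypothetical class; the '-' fill, the extra end-of-row check, and the use of Python exception classes instead of the diff's string raises are assumptions, not part of the changeset.

```python
import tempfile


class SpeciesRow:
    """Hypothetical fixed-length aligned row for one species.

    Backed by a temporary file with one byte per alignment column,
    so a column index is also a file offset.
    """

    def __init__(self, size: int, fill: bytes = b"-") -> None:
        self.size = size
        self._handle = tempfile.TemporaryFile("w+b")
        self._handle.write(fill * size)  # start with every column as a gap

    def set_position(self, index: int, base: str) -> None:
        # A single base is just a range of length 1.
        if len(base) != 1:
            raise ValueError("A genomic position can only have a length of 1.")
        self.set_range(index, base)

    def set_range(self, index: int, bases: str) -> None:
        if index < 0 or index >= self.size:
            raise IndexError(
                "Your index (%i) is out of range (0 - %i)." % (index, self.size - 1))
        if not bases:
            raise ValueError("A set of genomic positions must have a positive length.")
        if index + len(bases) > self.size:
            # Extra check (assumption, not in the diff): never grow the row.
            raise IndexError("Range %i-%i runs past the end of the row."
                             % (index, index + len(bases)))
        # One seek and one write for the whole run of bases.
        self._handle.seek(index)
        self._handle.write(bases.encode("ascii"))

    def text(self) -> str:
        self._handle.seek(0)
        return self._handle.read().decode("ascii")

    def close(self) -> None:
        self._handle.close()
```

For example, writing "ACGT" at index 2 of an 8-column row and then a single "T" at index 7 leaves the row reading "--ACGT-T".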
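The reworked fill_region_alignment() orders the blocks by ascending score, drops every column that is a gap in the primary species (reduce_block_by_primary_genome), and writes each species' remaining text with one set_range() call, so higher-scoring blocks are layered over lower-scoring ones. The following is a minimal pure-Python sketch of that layering, independent of bx-python. ChoppedBlock, reduce_by_primary and stitch are hypothetical names. Blocks are assumed to be already chopped to the region and on the + strand, and unset columns are assumed to read as '-'. It uses a stable sorted() in place of the diff's insertion loop, which gives the same ascending, tie-preserving order.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class ChoppedBlock:
    """Hypothetical stand-in for a MAF block already sliced to the region."""
    score: float
    ref_start: int          # start of the primary species' component
    texts: Dict[str, str]   # species name -> aligned text, gaps as '-'


def reduce_by_primary(block: ChoppedBlock, primary: str,
                      region_start: int) -> Tuple[int, Dict[str, str]]:
    """Keep only the columns where the primary species has a base."""
    keep = [i for i, ch in enumerate(block.texts[primary]) if ch != "-"]
    reduced = {spec: "".join(text[i] for i in keep)
               for spec, text in block.texts.items()}
    return block.ref_start - region_start, reduced


def stitch(blocks: List[ChoppedBlock], primary: str,
           region_start: int, region_end: int) -> Dict[str, str]:
    size = region_end - region_start
    rows: Dict[str, List[str]] = {}
    # Stable ascending sort: the lowest score is written first, so
    # higher-scoring blocks overwrite it where they overlap.
    for block in sorted(blocks, key=lambda b: b.score):
        offset, texts = reduce_by_primary(block, primary, region_start)
        for spec, text in texts.items():
            if not text:
                continue
            if offset < 0 or offset + len(text) > size:
                raise IndexError("block does not lie inside the region")
            row = rows.setdefault(spec, ["-"] * size)
            # One slice assignment per species, like a single set_range().
            row[offset:offset + len(text)] = text
    return {spec: "".join(row) for spec, row in rows.items()}
```

With a score-10 block at 100 (hg18 "AC-GT", mm8 "ACTGT") and a score-50 block at 102 (hg18 "GT", mm8 "CC") over the region 100-106, the mm8 insertion relative to hg18 is dropped and the higher-scoring "CC" wins the overlap, giving hg18 "ACGT--" and mm8 "ACCC--" whatever order the blocks arrive in.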
I see from the parameters code that dynamic_options are to be replaced
with options as part of the workflow buildout.
I'm finding lots of use cases where the dynamic_options returned by
code from an included module make some complicated things really easy
for users. For example, in the new gene expression tools, each
expression experiment is stored as a new Galaxy datatype based on the
Bioconductor representation (affybatch, eset, etc.). Each of those
structures has (optional!) accompanying experimental metadata
(phenodata), which, at the time the affybatch is created, is a
tab-delimited file with a header row. To construct design and
contrast matrices for an analysis, the user has to choose one or more
of those phenodata columns for that experiment, and the choice is
typically limited to columns containing *exactly* two values, i.e.
dichotomous contrasts.
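The column filter described above can be sketched as a plain function over the already-split phenodata rows. dichotomous_columns and dichotomous_options are hypothetical names; treating empty cells as missing values is an assumption, and so is the (label, value, selected) tuple shape for what a dynamic_options function hands back to the select list.

```python
from typing import Dict, List, Sequence, Set, Tuple


def dichotomous_columns(header: Sequence[str],
                        rows: Sequence[Sequence[str]]) -> List[str]:
    """Names of phenodata columns holding exactly two distinct values."""
    distinct: Dict[str, Set[str]] = {name: set() for name in header}
    for row in rows:
        for name, value in zip(header, row):
            value = value.strip()
            if value:  # assumption: an empty cell is a missing value
                distinct[name].add(value)
    return [name for name in header if len(distinct[name]) == 2]


def dichotomous_options(header: Sequence[str],
                        rows: Sequence[Sequence[str]]) -> List[Tuple[str, str, bool]]:
    # Assumed option shape: (label shown to the user, submitted value, selected?)
    return [(name, name, False) for name in dichotomous_columns(header, rows)]
```

For a header of sample, sex, treatment, age and three samples, only sex (M/F) and treatment (drug/placebo) would be offered; the sample and age columns have three distinct values each.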
I have working code that lets the user choose an input experiment
file (e.g. an affybatch) from their history, then choose from among
*only* the dichotomous phenotype columns, and run the analysis. You
cannot imagine what a big deal this is compared with trying to teach
people to generate design and contrast matrices interactively in R!
But of course, these miracles all rely on dynamic_options calling some
code included with the tool.
What's the best way forward when a tool needs this kind of drop-down
list, one that depends on the choice made on a previous page, and it
has to stay compatible with workflows in the long haul?
One approach: when generating the (e.g. affybatch) metadata, I could
create every option list I'm ever going to need as additional metadata
structures, which could then be used the way options from files are
used elsewhere. The catch is that they would all have to be
precomputed rather than computed on the fly by the tool. Is that
reasonable, or is there some way to allow dynamic computation on the
metadata? (It's a little complex: it involves parsing the phenodata,
constructing a concordance of the values in each column and, e.g.,
returning only the columns with exactly two values.)
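The precompute-at-metadata-time idea can be sketched as two functions: one run once when the dataset's metadata is generated, which parses the tab-delimited phenodata and stores a per-column concordance as JSON, and one run when the tool form is built, which only filters that stored result. build_phenodata_metadata, options_from_metadata, the JSON layout and the (label, value, selected) option shape are all hypothetical; how such a value would be attached to a Galaxy datatype's metadata is not shown here.

```python
import csv
import io
import json
from collections import Counter
from typing import Dict, List, Tuple


def build_phenodata_metadata(phenodata_text: str) -> str:
    """Run once, when the experiment's metadata is generated.

    Parses tab-delimited phenodata (header row first), counts how often
    each value occurs in each column, and serializes that as JSON.
    """
    rows = list(csv.reader(io.StringIO(phenodata_text), delimiter="\t"))
    header, body = (rows[0], rows[1:]) if rows else ([], [])
    concordance: Dict[str, Counter] = {name: Counter() for name in header}
    for row in body:
        for name, value in zip(header, row):
            if value.strip():  # assumption: empty cells are missing values
                concordance[name][value.strip()] += 1
    return json.dumps({"columns": header, "concordance": concordance})


def options_from_metadata(metadata_json: str,
                          n_values: int = 2) -> List[Tuple[str, str, bool]]:
    """Run at tool time: filter the stored concordance, no re-parsing."""
    meta = json.loads(metadata_json)
    return [(name, name, False) for name in meta["columns"]
            if len(meta["concordance"][name]) == n_values]
```

Storing the whole concordance, rather than only the finished dichotomous list, keeps the expensive parse at metadata time while still letting a tool ask for a different cut-off (the n_values parameter) without recomputing anything.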
--
python -c "foo = map(None,'moc.liamg(a)surazal.ssor'); foo.reverse();
print ''.join(foo)"