galaxy-dev

[hg] galaxy 1513: Quick 'n easy solution to the EMBOSS stage in/...
by greg@scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/cf17b5a16eff
changeset: 1513:cf17b5a16eff
user: Nate Coraor <nate(a)bx.psu.edu>
date: Wed Sep 17 10:45:20 2008 -0400
description:
Quick 'n easy solution to the EMBOSS stage in/out problem: read the
outputs from the "real" galaxy path instead of the temp stage path.
1 file(s) affected in this change:
lib/galaxy/jobs/runners/pbs.py
diffs (31 lines):
diff -r 1e408bab8941 -r cf17b5a16eff lib/galaxy/jobs/runners/pbs.py
--- a/lib/galaxy/jobs/runners/pbs.py Tue Sep 16 15:23:23 2008 -0400
+++ b/lib/galaxy/jobs/runners/pbs.py Wed Sep 17 10:45:20 2008 -0400
@@ -146,7 +146,7 @@
if self.app.config.pbs_application_server:
pbs_ofile = self.app.config.pbs_application_server + ':' + ofile
pbs_efile = self.app.config.pbs_application_server + ':' + efile
- stagein = self.get_stage_in_out( job_wrapper.get_input_fnames() + job_wrapper.get_output_fnames() )
+ stagein = self.get_stage_in_out( job_wrapper.get_input_fnames() + job_wrapper.get_output_fnames(), symlink=True )
stageout = self.get_stage_in_out( job_wrapper.get_output_fnames() )
job_attrs = pbs.new_attropl(5)
job_attrs[0].name = pbs.ATTR_o
@@ -372,15 +372,15 @@
self.queue.put( self.STOP_SIGNAL )
log.info( "pbs job runner stopped" )
- def get_stage_in_out( self, fnames ):
+ def get_stage_in_out( self, fnames, symlink=False ):
"""Convenience function to create a stagein/stageout list"""
stage = ''
for fname in fnames:
if os.access(fname, os.R_OK):
- if stage != '':
+ if stage:
stage += ','
# pathnames are now absolute
- if self.app.config.pbs_stage_path != '':
+ if symlink and self.app.config.pbs_stage_path:
stage_name = os.path.join(self.app.config.pbs_stage_path, os.path.split(fname)[1])
else:
stage_name = fname
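
Note on changeset 1513: a minimal sketch of the path choice that the new symlink flag controls, written as a standalone Python 3 function rather than as the runner method. The name pick_stage_name and its arguments are hypothetical; only the condition "symlink and pbs_stage_path" and the os.path calls are taken from the diff.

```python
import os


def pick_stage_name(fname, stage_path="", symlink=False):
    """Return the path a staged job should use for fname.

    Hypothetical helper: mirrors the branch in get_stage_in_out, where the
    temp stage path is used only when symlink is requested AND a stage
    path is configured. Otherwise the "real" path is kept.
    """
    if symlink and stage_path:
        # Keep only the file name and place it under the stage directory.
        return os.path.join(stage_path, os.path.split(fname)[1])
    return fname


# Stage-in asks for symlink=True; stage-out uses the default (False),
# so outputs are read back from the real path.
staged_in = pick_stage_name("/galaxy/files/dataset_1.dat", "/tmp/stage", symlink=True)
staged_out = pick_stage_name("/galaxy/files/dataset_1.dat", "/tmp/stage")
```

With the default symlink=False, the stage-out list keeps the original path even when a stage path is configured, which appears to be the point of the fix.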

[hg] galaxy 1512: The MetadataCollection object is now created o...
by greg@scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/1e408bab8941
changeset: 1512:1e408bab8941
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Tue Sep 16 15:23:23 2008 -0400
description:
The MetadataCollection object is now created only once per dataset object instance (and when datatype is changed), instead of each time dataset.metadata is called.
The 'no_value' attribute for a metadata element's spec is returned when the metadata element's value is None.
2 file(s) affected in this change:
lib/galaxy/datatypes/metadata.py
lib/galaxy/model/__init__.py
diffs (89 lines):
diff -r c3ce08879473 -r 1e408bab8941 lib/galaxy/datatypes/metadata.py
--- a/lib/galaxy/datatypes/metadata.py Tue Sep 16 14:26:14 2008 -0400
+++ b/lib/galaxy/datatypes/metadata.py Tue Sep 16 15:23:23 2008 -0400
@@ -151,9 +151,16 @@
"""
def __init__(self, parent, spec):
self.parent = parent
- self.bunch = parent._metadata or dict()
if spec is None: self.spec = MetadataSpecCollection()
else: self.spec = spec
+
+ #set default metadata values
+ if not self.parent._metadata:
+ self.parent._metadata = {}
+ for name, value in self.spec.items():
+ if name not in self.bunch:
+ self.bunch[name] = value.default
+
def __iter__(self):
return self.bunch.__iter__()
def get( self, key, default=None ):
@@ -168,19 +175,21 @@
def __nonzero__(self):
return self.bunch.__nonzero__()
def __getattr__(self, name):
- if self.bunch.get( name ):
- return self.bunch.get( name )
+ if name == "bunch":
+ return self.parent._metadata
+ rval = self.bunch.get( name )
+ if rval is None:
+ rval = self.spec.get( name, None )
+ if rval:
+ rval = rval.no_value
+ return rval
+ def __setattr__(self, name, value):
+ if name in ["parent","spec"]:
+ self.__dict__[name] = value
+ elif name == "bunch":
+ self.parent._metadata = value
else:
- if self.spec.get(name, None):
- return self.spec[name].default
- else:
- return None
- def __setattr__(self, name, value):
- if name in ["parent","bunch","spec"]:
- self.__dict__[name] = value
- else:
- self.__dict__["bunch"][name] = value
- self.bunch = self.parent._metadata = dict( self.bunch )
+ self.bunch[name] = value
MetadataElement = Statement(MetadataElementSpec)
diff -r c3ce08879473 -r 1e408bab8941 lib/galaxy/model/__init__.py
--- a/lib/galaxy/model/__init__.py Tue Sep 16 14:26:14 2008 -0400
+++ b/lib/galaxy/model/__init__.py Tue Sep 16 15:23:23 2008 -0400
@@ -113,7 +113,7 @@
self.peek = peek
self.extension = extension
self.designation = designation
- self._metadata = metadata or dict()
+ self.metadata = metadata or dict()
self.dbkey = dbkey
self.deleted = deleted
self.visible = visible
@@ -159,9 +159,9 @@
return datatypes_registry.get_datatype_by_extension( self.extension )
def get_metadata( self ):
- if not self._metadata:
- self._metadata = dict()
- return MetadataCollection( self, self.datatype.metadata_spec )
+ if not hasattr( self, '_metadata_collection' ):
+ self._metadata_collection = MetadataCollection( self, self.datatype.metadata_spec )
+ return self._metadata_collection
def set_metadata( self, bunch ):
# Needs to accept a MetadataCollection, a bunch, or a dict
self._metadata = dict( bunch.items() )
@@ -191,6 +191,8 @@
def change_datatype( self, new_ext ):
self.clear_associated_files()
+ if hasattr( self, '_metadata_collection' ):
+ del self._metadata_collection
datatypes_registry.change_datatype( self, new_ext )
def get_size( self ):
"""Returns the size of the data on disk"""
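
Note on changeset 1512: a simplified Python 3 sketch of the two ideas in the description, namely building the collection once per dataset instance and falling back to the spec's no_value when the stored value is None. ElementSpec, Dataset and the plain-dict spec are hypothetical stand-ins, not Galaxy's classes; the guard against private names in __getattr__ is an addition that is not in the diff.

```python
class ElementSpec:
    """Hypothetical stand-in for a metadata element spec."""

    def __init__(self, default=None, no_value=None):
        self.default = default
        self.no_value = no_value


class MetadataCollection:
    def __init__(self, parent, spec):
        # Write straight into __dict__ so __setattr__ is not triggered.
        self.__dict__["parent"] = parent
        self.__dict__["spec"] = spec
        if not parent._metadata:
            parent._metadata = {}
        # Fill in defaults for elements that have no stored value yet.
        for name, element in spec.items():
            if name not in parent._metadata:
                parent._metadata[name] = element.default

    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails.
        if name.startswith("_") or name in ("parent", "spec"):
            raise AttributeError(name)
        value = self.parent._metadata.get(name)
        if value is None:
            element = self.spec.get(name)
            if element is not None:
                value = element.no_value
        return value

    def __setattr__(self, name, value):
        # Values live on the parent, so there is a single copy of the data.
        self.parent._metadata[name] = value


class Dataset:
    def __init__(self, spec, metadata=None):
        self._metadata = metadata or {}
        self._spec = spec

    @property
    def metadata(self):
        # Create the collection once and reuse it on later accesses.
        if not hasattr(self, "_metadata_collection"):
            self._metadata_collection = MetadataCollection(self, self._spec)
        return self._metadata_collection

    def change_datatype(self, new_spec):
        self._spec = new_spec
        # Drop the cached collection so it is rebuilt for the new spec.
        if hasattr(self, "_metadata_collection"):
            del self._metadata_collection


ds = Dataset({"columns": ElementSpec(default=None, no_value=0)})
same_object = ds.metadata is ds.metadata
```

Because the collection reads and writes parent._metadata directly, the cached object cannot drift out of sync with the dataset, which is what makes the caching safe in this sketch.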

[hg] galaxy 1515: Forgot to update tool_conf.sample with the new...
by greg@scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/280e8b68f845
changeset: 1515:280e8b68f845
user: guru
date: Wed Sep 17 17:14:59 2008 -0400
description:
Forgot to update tool_conf.sample with the new tool details.
1 file(s) affected in this change:
tool_conf.xml.sample
diffs (10 lines):
diff -r 33e06a98b6d8 -r 280e8b68f845 tool_conf.xml.sample
--- a/tool_conf.xml.sample Wed Sep 17 16:42:08 2008 -0400
+++ b/tool_conf.xml.sample Wed Sep 17 17:14:59 2008 -0400
@@ -281,5 +281,6 @@
<tool file="metag_tools/megablast_wrapper.xml" />
<tool file="metag_tools/megablast_xml_parser.xml" />
<tool file="metag_tools/blat_wrapper.xml" />
+ <tool file="metag_tools/mapping_to_ucsc.xml" />
</section>
</toolbox>

22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/ec547440ec97
changeset: 1508:ec547440ec97
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Tue Sep 16 13:25:42 2008 -0400
description:
Small update for maf stats tool.
2 file(s) affected in this change:
lib/galaxy/tools/util/maf_utilities.py
tools/maf/maf_stats.py
diffs (99 lines):
diff -r 842f1883cf53 -r ec547440ec97 lib/galaxy/tools/util/maf_utilities.py
--- a/lib/galaxy/tools/util/maf_utilities.py Mon Sep 15 15:04:41 2008 -0400
+++ b/lib/galaxy/tools/util/maf_utilities.py Tue Sep 16 13:25:42 2008 -0400
@@ -199,7 +199,7 @@
yield block
def get_chopped_blocks_with_index_offset_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
for block, idx, offset in index.get_as_iterator_with_index_and_offset( src, region.start, region.end ):
- block = chop_block_by_region( block, src, region, species, mincols )
+ block = chop_block_by_region( block, src, region, species, mincols, force_strand )
if block is not None:
yield block, idx, offset
@@ -209,6 +209,25 @@
else: alignment = RegionAlignment( end - start, primary_species )
return fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand, species, mincols )
+#reduces a block to only positions exisiting in the src provided
+def reduce_block_by_primary_genome( block, species, chromosome, region_start ):
+ #returns ( startIndex, {species:texts}
+ #where texts' contents are reduced to only positions existing in the primary genome
+ src = "%s.%s" % ( species, chromosome )
+ ref = block.get_component_by_src( src )
+ start_offset = ref.start - region_start
+ species_texts = {}
+ for c in block.components:
+ species_texts[ c.src.split( '.' )[0] ] = list( c.text )
+ #remove locations which are gaps in the primary species, starting from the downstream end
+ for i in range( len( species_texts[ species ] ) - 1, -1, -1 ):
+ if species_texts[ species ][i] == '-':
+ for text in species_texts.values():
+ text.pop( i )
+ for spec, text in species_texts.items():
+ species_texts[spec] = ''.join( text )
+ return ( start_offset, species_texts )
+
#fills a region alignment
def fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
region = bx.intervals.Interval( start, end )
@@ -216,22 +235,7 @@
region.strand = strand
primary_src = "%s.%s" % ( primary_species, chrom )
- def reduce_block_by_primary_genome( block ):
- #returns ( startIndex, {species:texts}
- #where texts' contents are reduced to only positions existing in the primary genome
- ref = block.get_component_by_src( primary_src )
- start_offset = ref.start - start
- species_texts = {}
- for c in block.components:
- species_texts[ c.src.split( '.' )[0] ] = list( c.text )
- #remove locations which are gaps in the primary species, starting from the downstream end
- for i in range( len( species_texts[ primary_species ] ) - 1, -1, -1 ):
- if species_texts[ primary_species ][i] == '-':
- for text in species_texts.values():
- text.pop( i )
- for spec, text in species_texts.items():
- species_texts[spec] = ''.join( text )
- return ( start_offset, species_texts )
+
#Order blocks overlaping this position by score, lowest first
blocks = []
@@ -248,7 +252,7 @@
for block_dict in blocks:
block = chop_block_by_region( block_dict[1].get_at_offset( block_dict[2] ), primary_src, region, species, mincols, strand )
if block is None: continue
- start_offset, species_texts = reduce_block_by_primary_genome( block )
+ start_offset, species_texts = reduce_block_by_primary_genome( block, primary_species, chrom, start )
for spec, text in species_texts.items():
try:
alignment.set_range( start_offset, spec, text )
diff -r 842f1883cf53 -r ec547440ec97 tools/maf/maf_stats.py
--- a/tools/maf/maf_stats.py Mon Sep 15 15:04:41 2008 -0400
+++ b/tools/maf/maf_stats.py Tue Sep 16 13:25:42 2008 -0400
@@ -64,19 +64,11 @@
for c in block.components:
spec = c.src.split( '.' )[0]
if spec not in coverage: coverage[spec] = zeros( region.end - region.start, dtype = bool )
- ref = block.get_component_by_src( src )
- #skip gap locations due to insertions in secondary species relative to primary species
- start_offset = ref.start - region.start
- num_gaps = 0
- for i in range( len( ref.text.rstrip().rstrip( "-" ) ) ):
- if ref.text[i] in ["-"]:
- num_gaps += 1
- continue
- #Toggle base if covered
- for comp in block.components:
- spec = comp.src.split( '.' )[0]
- if comp.text and comp.text[i] not in ['-']:
- coverage[spec][start_offset + i - num_gaps] = True
+ start_offset, alignment = maf_utilities.reduce_block_by_primary_genome( block, dbkey, region.chrom, region.start )
+ for i in range( len( alignment[dbkey] ) ):
+ for spec, text in alignment.items():
+ if text[i] != '-':
+ coverage[spec][start_offset + i] = True
if summary:
#record summary
for key in coverage.keys():
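
Note on changeset 1508: a small Python 3 sketch of the reduction step and of the simplified coverage loop, using plain strings instead of bx-python alignment blocks. The function names and the sample sequences are made up for illustration; the column-dropping logic follows the diff.

```python
def drop_primary_gap_columns(species_texts, primary):
    """Remove every alignment column where the primary species has a gap."""
    columns = {spec: list(text) for spec, text in species_texts.items()}
    # Walk from the downstream end so earlier indexes stay valid after pop().
    for i in range(len(columns[primary]) - 1, -1, -1):
        if columns[primary][i] == "-":
            for text in columns.values():
                text.pop(i)
    return {spec: "".join(text) for spec, text in columns.items()}


def mark_coverage(coverage, reduced, start_offset):
    """Flag each primary position that a species covers with a real base."""
    for spec, text in reduced.items():
        for i, base in enumerate(text):
            if base != "-":
                coverage[spec][start_offset + i] = True


# Made-up two-species block: the primary (hg18) has one gap column.
reduced = drop_primary_gap_columns({"hg18": "AC-GT", "mm9": "ACTG-"}, "hg18")
coverage = {"hg18": [False] * 6, "mm9": [False] * 6}
mark_coverage(coverage, reduced, 1)
```

After the reduction every remaining column maps to exactly one primary position, so the coverage loop no longer needs the num_gaps bookkeeping that the diff removes from maf_stats.py.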
details: http://www.bx.psu.edu/hg/galaxy/rev/1d326855ba89
changeset: 1517:1d326855ba89
user: wychung
date: Thu Sep 18 15:41:23 2008 -0400
description:
update shrimp_wrapper.
2 file(s) affected in this change:
tools/metag_tools/shrimp_wrapper.py
tools/metag_tools/shrimp_wrapper.xml
diffs (621 lines):
diff -r f1da9b95549b -r 1d326855ba89 tools/metag_tools/shrimp_wrapper.py
--- a/tools/metag_tools/shrimp_wrapper.py Thu Sep 18 15:24:51 2008 -0400
+++ b/tools/metag_tools/shrimp_wrapper.py Thu Sep 18 15:41:23 2008 -0400
@@ -61,17 +61,13 @@
reversed_s.reverse()
return "".join(reversed_s)
-def generate_sub_table(result_file, ref_file, score_files, table_outfile, hit_per_read):
+def generate_sub_table(result_file, ref_file, score_files, table_outfile, hit_per_read, insertion_size):
- """
- TODO: the cross-over error has not been addressed yet.
- """
+ invalid_editstring_char = 0
- insertion_size = 600
+ all_score_file = score_files.split(',')
- all_score_file = score_files.split('&')
-
- if len(all_score_file) != hit_per_read: stop_err('Un-equal number of files!')
+ if len(all_score_file) != hit_per_read: stop_err('One or more query files is missing. Please check your dataset.')
temp_table_name = tempfile.NamedTemporaryFile().name
temp_table = open(temp_table_name, 'w')
@@ -178,7 +174,7 @@
hits_score[readname] = {}
hits_score[readname][endindex] = score
- # mutation call to all mappings
+ # call to all mappings
for readkey in hits.keys():
if len(hits[readkey]) != hit_per_read: continue
@@ -211,6 +207,7 @@
match_count += 1
if match_count == 1:
+
for x, end_data in enumerate(matches[0]):
end_strand, end_editstring, end_chr_start, end_chr_end, end_read_start, end_chrom = end_data
@@ -226,20 +223,26 @@
gap_read = 0
while editindex < len(end_editstring):
+
editchr = end_editstring[editindex]
chrA = ''
chrB = ''
locIndex = []
+
if editchr.isdigit():
editcode = ''
+
while editchr.isdigit() and editindex < len(end_editstring):
editcode += editchr
editindex += 1
if editindex < len(end_editstring): editchr = end_editstring[editindex]
+
for baseIndex in range(int(editcode)):
chrA += refsegment[match_len+baseIndex]
chrB = chrA
+
match_len += int(editcode)
+
elif editchr == 'x':
# crossover: inserted between the appropriate two bases
# Two sequencing errors: 4x15x6 (25 matches with 2 crossovers)
@@ -263,18 +266,21 @@
elif editchr == '(':
editcode = ''
+
while editchr != ')' and editindex < len(end_editstring):
if editindex < len(end_editstring): editchr = end_editstring[editindex]
editcode += editchr
editindex += 1
+
editcode = editcode[1:-1]
chrA = '-'*len(editcode)
chrB = editcode
else:
- print 'Warning! Unknown symbols', editchr
-
+ invalid_editstring_char += 1
+
if end_strand == '-':
+
chrA = reverse_complement(chrA)
chrB = reverse_complement(chrB)
@@ -288,9 +294,12 @@
chrBx = chrB[mappingIndex]
if chrAx and chrBx and chrBx.upper() != 'N':
+
if end_strand == '+':
+
chrom_loc = end_chr_start+match_len-len(chrA)+mappingIndex
read_loc = end_read_start+match_len-len(chrA)+mappingIndex-gap_read
+
if chrAx == '-': chrom_loc -= 1
if chrBx == '-':
@@ -300,9 +309,12 @@
# 1-based on chrom_loc and read_loc
pos_line = pos_line + '\t'.join([end_chrom, str(chrom_loc+1), readkey+'/'+str(x+1), str(read_loc+1), chrAx, chrBx, scoreBx]) + '\n'
+
else:
+
chrom_loc = end_chr_end-match_len+mappingIndex
read_loc = end_read_start+match_len-1-mappingIndex-gap_read
+
if chrAx == '-': chrom_loc -= 1
if chrBx == '-':
@@ -314,11 +326,14 @@
rev_line = '\t'.join([end_chrom, str(chrom_loc+1), readkey+'/'+str(x+1), str(read_loc+1), chrAx, chrBx, scoreBx]) +'\n' + rev_line
if chrom_cov.has_key(end_chrom):
+
if chrom_cov[end_chrom].has_key(chrom_loc):
chrom_cov[end_chrom][chrom_loc] += 1
else:
chrom_cov[end_chrom][chrom_loc] = 1
+
else:
+
chrom_cov[end_chrom] = {}
chrom_cov[end_chrom][chrom_loc] = 1
@@ -329,6 +344,7 @@
# chrom-wide coverage
for i, line in enumerate(open(temp_table_name)):
+
line = line.rstrip()
if not line or line.startswith('#'): continue
@@ -348,6 +364,9 @@
outfile.close()
if os.path.exists(temp_table_name): os.remove(temp_table_name)
+
+ if invalid_editstring_char:
+ print 'Skip ', invalid_editstring_char, ' invalid characters in editstrings'
return True
@@ -359,7 +378,7 @@
seq_title_startswith = ''
qual_title_startswith = ''
- default_coding_value = 64
+ default_coding_value = 64 # Solexa ascii-code
fastq_block_lines = 0
for i, line in enumerate( file( infile_name ) ):
@@ -448,16 +467,63 @@
def __main__():
+ # SHRiMP path
+ shrimp = 'rmapper-ls'
+
# I/O
- type_of_reads = sys.argv[1] # single or paired
- input_target = sys.argv[2] # fasta
- shrimp_outfile = sys.argv[3] # shrimp output
- table_outfile = sys.argv[4] # table output
-
- # SHRiMP parameters: total = 15
- # TODO: put threshold on each of these parameters
- if len(sys.argv) == 21 or len(sys.argv) == 22:
- spaced_seed = sys.argv[5]
+ input_target_file = sys.argv[1] # fasta
+ shrimp_outfile = sys.argv[2] # shrimp output
+ table_outfile = sys.argv[3] # table output
+ single_or_paired = sys.argv[4].split(',')
+
+ insertion_size = 600
+
+ if len(single_or_paired) == 1: # single or paired
+ type_of_reads = 'single'
+ hit_per_read = 1
+ input_query = single_or_paired[0]
+ query_fasta = tempfile.NamedTemporaryFile().name
+ query_qual = tempfile.NamedTemporaryFile().name
+
+ else: # paired-end
+ type_of_reads = 'paired'
+ hit_per_read = 2
+ input_query_end1 = single_or_paired[0]
+ input_query_end2 = single_or_paired[1]
+ insertion_size = int(single_or_paired[2])
+ query_fasta_end1 = tempfile.NamedTemporaryFile().name
+ query_fasta_end2 = tempfile.NamedTemporaryFile().name
+ query_qual_end1 = tempfile.NamedTemporaryFile().name
+ query_qual_end2 = tempfile.NamedTemporaryFile().name
+
+ # SHRiMP parameters: total = 15, default values
+ spaced_seed = '111111011111'
+ seed_matches_per_window = '2'
+ seed_hit_taboo_length = '4'
+ seed_generation_taboo_length = '0'
+ seed_window_length = '115.0'
+ max_hits_per_read = '100'
+ max_read_length = '1000'
+ kmer = '-1'
+ sw_match_value = '100'
+ sw_mismatch_value = '-150'
+ sw_gap_open_ref = '-400'
+ sw_gap_open_query = '-400'
+ sw_gap_ext_ref = '-70'
+ sw_gap_ext_query = '-70'
+ sw_hit_threshold = '68.0'
+
+ # TODO: put the threshold on each of these parameters
+ if len(sys.argv) > 5:
+
+ try:
+ if sys.argv[5].isdigit():
+ spaced_seed = sys.argv[5]
+ else:
+ stop_err('Error in assigning parameter: Spaced seed.')
+ except:
+ stop_err('Spaced seed must be a combination of 1s and 0s.')
+
seed_matches_per_window = sys.argv[6]
seed_hit_taboo_length = sys.argv[7]
seed_generation_taboo_length = sys.argv[8]
@@ -473,53 +539,6 @@
sw_gap_ext_query = sys.argv[18]
sw_hit_threshold = sys.argv[19]
- # Single-end parameters
- if type_of_reads == 'single':
- input_query = sys.argv[20] # single-end
- hit_per_read = 1
- query_fasta = tempfile.NamedTemporaryFile().name
- query_qual = tempfile.NamedTemporaryFile().name
- else: # Paired-end parameters
- input_query_end1 = sys.argv[20] # paired-end
- input_query_end2 = sys.argv[21]
- hit_per_read = 2
- query_fasta_end1 = tempfile.NamedTemporaryFile().name
- query_fasta_end2 = tempfile.NamedTemporaryFile().name
- query_qual_end1 = tempfile.NamedTemporaryFile().name
- query_qual_end2 = tempfile.NamedTemporaryFile().name
- else:
- spaced_seed = '111111011111'
- seed_matches_per_window = '2'
- seed_hit_taboo_length = '4'
- seed_generation_taboo_length = '0'
- seed_window_length = '115.0'
- max_hits_per_read = '100'
- max_read_length = '1000'
- kmer = '-1'
- sw_match_value = '100'
- sw_mismatch_value = '-150'
- sw_gap_open_ref = '-400'
- sw_gap_open_query = '-400'
- sw_gap_ext_ref = '-70'
- sw_gap_ext_query = '-70'
- sw_hit_threshold = '68.0'
-
- # Single-end parameters
- if type_of_reads == 'single':
- input_query = sys.argv[5] # single-end
- hit_per_read = 1
- query_fasta = tempfile.NamedTemporaryFile().name
- query_qual = tempfile.NamedTemporaryFile().name
- else: # Paired-end parameters
- input_query_end1 = sys.argv[5] # paired-end
- input_query_end2 = sys.argv[6]
- hit_per_read = 2
- query_fasta_end1 = tempfile.NamedTemporaryFile().name
- query_fasta_end2 = tempfile.NamedTemporaryFile().name
- query_qual_end1 = tempfile.NamedTemporaryFile().name
- query_qual_end2 = tempfile.NamedTemporaryFile().name
-
-
# temp file for shrimp log file
shrimp_log = tempfile.NamedTemporaryFile().name
@@ -532,7 +551,7 @@
# SHRiMP command
if type_of_reads == 'single':
- command = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta, input_target, '>', shrimp_outfile, '2>', shrimp_log])
+ command = ' '.join([shrimp, '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta, input_target_file, '>', shrimp_outfile, '2>', shrimp_log])
try:
os.system(command)
@@ -541,9 +560,9 @@
if os.path.exists(query_qual): os.remove(query_qual)
stop_err(str(e))
- else:
- command_end1 = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end1, input_target, '>', shrimp_outfile, '2>', shrimp_log])
- command_end2 = ' '.join(['rmapper-ls', '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end2, input_target, '>>', shrimp_outfile, '2>>', shrimp_log])
+ else: # paired
+ command_end1 = ' '.join([shrimp, '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end1, input_target_file, '>', shrimp_outfile, '2>', shrimp_log])
+ command_end2 = ' '.join([shrimp, '-s', spaced_seed, '-n', seed_matches_per_window, '-t', seed_hit_taboo_length, '-9', seed_generation_taboo_length, '-w', seed_window_length, '-o', max_hits_per_read, '-r', max_read_length, '-d', kmer, '-m', sw_match_value, '-i', sw_mismatch_value, '-g', sw_gap_open_ref, '-q', sw_gap_open_query, '-e', sw_gap_ext_ref, '-f', sw_gap_ext_query, '-h', sw_hit_threshold, query_fasta_end2, input_target_file, '>>', shrimp_outfile, '2>>', shrimp_log])
try:
os.system(command_end1)
@@ -557,9 +576,9 @@
# convert to table
if type_of_reads == 'single':
- return_value = generate_sub_table(shrimp_outfile, input_target, query_qual, table_outfile, hit_per_read)
+ return_value = generate_sub_table(shrimp_outfile, input_target_file, query_qual, table_outfile, hit_per_read, insertion_size)
else:
- return_value = generate_sub_table(shrimp_outfile, input_target, query_qual_end1+'&'+query_qual_end2, table_outfile, hit_per_read)
+ return_value = generate_sub_table(shrimp_outfile, input_target_file, query_qual_end1+','+query_qual_end2, table_outfile, hit_per_read, insertion_size)
# remove temp. files
if type_of_reads == 'single':
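
Note on the new argument layout: a Python 3 sketch of how the fourth argument could be decoded into single-end or paired-end settings. parse_reads_arg is a hypothetical helper, not part of the wrapper, and the strict three-field check for paired-end input is an assumption; the wrapper in the diff treats any value with more than one field as paired.

```python
def parse_reads_arg(arg, default_insertion_size=600):
    """Decode 'query' or 'end1,end2,insertion_size' into read settings."""
    parts = arg.split(",")
    if len(parts) == 1:
        return {
            "type_of_reads": "single",
            "hit_per_read": 1,
            "queries": parts,
            "insertion_size": default_insertion_size,
        }
    if len(parts) != 3:
        # Assumption: reject malformed input instead of guessing.
        raise ValueError("expected 'end1,end2,insertion_size', got %r" % arg)
    return {
        "type_of_reads": "paired",
        "hit_per_read": 2,
        "queries": parts[:2],
        "insertion_size": int(parts[2]),
    }


single = parse_reads_arg("reads.fastq")
paired = parse_reads_arg("end1.fastq,end2.fastq,450")
```

Packing the paired-end files and the insertion size into one comma-separated argument keeps the positions of the remaining SHRiMP parameters the same for both read types, which is why the XML command lines in this changeset no longer differ in length.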
diff -r f1da9b95549b -r 1d326855ba89 tools/metag_tools/shrimp_wrapper.xml
--- a/tools/metag_tools/shrimp_wrapper.xml Thu Sep 18 15:24:51 2008 -0400
+++ b/tools/metag_tools/shrimp_wrapper.xml Thu Sep 18 15:41:23 2008 -0400
@@ -1,50 +1,51 @@
<tool id="shrimp_wrapper" name="SHRiMP" version="1.0.0">
<description>SHort Read Mapping Package</description>
<command interpreter="python">
- #if ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $input_query
- #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 ${type_of_reads.input1} ${type_of_reads.input2}
- #elif ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="full"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold $input_query
- #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="full"):#shrimp_wrapper.py $type_of_reads.single_or_paired $input_target $output1 $output2 $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold ${type_of_reads.input1} ${type_of_reads.input2}
+ #if ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $input_target $output1 $output2 $input_query
+ #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="skip"):#shrimp_wrapper.py $input_target $output1 $output2 $type_of_reads.input1,$type_of_reads.input2,$type_of_reads.insertion_size
+ #elif ($type_of_reads.single_or_paired=="single" and $param.skip_or_full=="full"):#shrimp_wrapper.py $input_target $output1 $output2 $input_query $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold
+ #elif ($type_of_reads.single_or_paired=="paired" and $param.skip_or_full=="full"):#shrimp_wrapper.py $input_target $output1 $output2 $type_of_reads.input1,$type_of_reads.input2,$type_of_reads.insertion_size $param.spaced_seed $param.seed_matches_per_window $param.seed_hit_taboo_length $param.seed_generation_taboo_length $param.seed_window_length $param.max_hits_per_read $param.max_read_length $param.kmer $param.sw_match_value $param.sw_mismatch_value $param.sw_gap_open_ref $param.sw_gap_open_query $param.sw_gap_ext_ref $param.sw_gap_ext_query $param.sw_hit_threshold
#end if
</command>
<inputs>
<page>
- <param name="input_target" type="data" format="fasta" label="Reference sequence" />
<conditional name="type_of_reads">
<param name="single_or_paired" type="select" label="Single- or Paired-ends">
<option value="single">Single-end</option>
<option value="paired">Paired-end</option>
</param>
<when value="single">
- <param name="input_query" type="data" format="fastqsolexa" label="Sequence file" />
+ <param name="input_query" type="data" format="fastqsolexa" label="Align sequencing reads" />
</when>
<when value="paired">
- <param name="input1" type="data" format="fastqsolexa" label="One end" />
- <param name="input2" type="data" format="fastqsolexa" label="The other end" />
+ <param name="insertion_size" type="integer" size="5" value="600" label="Insertion length between two ends" help="bp" />
+ <param name="input1" type="data" format="fastqsolexa" label="Align sequencing reads, one end" />
+ <param name="input2" type="data" format="fastqsolexa" label="and the other end" />
</when>
</conditional>
+ <param name="input_target" type="data" format="fasta" label="against reference" />
<conditional name="param">
- <param name="skip_or_full" type="select" label="SHRiMP parameter selection">
- <option value="skip">Default setting</option>
- <option value="full">Full list</option>
+ <param name="skip_or_full" type="select" label="SHRiMP settings to use" help="For most mapping needs use Commonly used settings. If you want full control use Full List">
+ <option value="skip">Commonly used</option>
+ <option value="full">Full Parameter List</option>
</param>
<when value="skip" />
<when value="full">
- <param name="spaced_seed" type="text" size="30" value="111111011111" label="Spaced Seed" />
- <param name="seed_matches_per_window" type="integer" size="5" value="2" label="Seed Matches per Window" />
- <param name="seed_hit_taboo_length" type="integer" size="5" value="4" label="Seed Hit Taboo Length" />
- <param name="seed_generation_taboo_length" type="integer" size="5" value="0" label="Seed Generation Taboo Length" />
- <param name="seed_window_length" type="float" size="10" value="115.0" label="Seed Window Length" help="in percentage"/>
- <param name="max_hits_per_read" type="integer" size="10" value="100" label="Maximum Hits per Read" />
- <param name="max_read_length" type="integer" size="10" value="1000" label="Maximum Read Length" />
- <param name="kmer" type="integer" size="10" value="-1" label="Kmer Std. Deviation Limit" help="-1 as None"/>
- <param name="sw_match_value" type="integer" size="10" value="100" label="S-W Match Value" />
- <param name="sw_mismatch_value" type="integer" size="10" value="-150" label="S-W Mismatch Value" />
- <param name="sw_gap_open_ref" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Reference)" />
- <param name="sw_gap_open_query" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Query)" />
- <param name="sw_gap_ext_ref" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Reference)" />
- <param name="sw_gap_ext_query" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Query)" />
- <param name="sw_hit_threshold" type="float" size="10" value="68.0" label="S-W Hit Threshold" help="in percentage"/>
+ <param name="spaced_seed" type="text" size="30" value="111111011111" label="Spaced Seed" />
+ <param name="seed_matches_per_window" type="integer" size="5" value="2" label="Seed Matches per Window" />
+ <param name="seed_hit_taboo_length" type="integer" size="5" value="4" label="Seed Hit Taboo Length" />
+ <param name="seed_generation_taboo_length" type="integer" size="5" value="0" label="Seed Generation Taboo Length" />
+ <param name="seed_window_length" type="float" size="10" value="115.0" label="Seed Window Length" help="in percentage"/>
+ <param name="max_hits_per_read" type="integer" size="10" value="100" label="Maximum Hits per Read" />
+ <param name="max_read_length" type="integer" size="10" value="1000" label="Maximum Read Length" />
+ <param name="kmer" type="integer" size="10" value="-1" label="Kmer Std. Deviation Limit" help="-1 as None"/>
+ <param name="sw_match_value" type="integer" size="10" value="100" label="S-W Match Value" />
+ <param name="sw_mismatch_value" type="integer" size="10" value="-150" label="S-W Mismatch Value" />
+ <param name="sw_gap_open_ref" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Reference)" />
+ <param name="sw_gap_open_query" type="integer" size="10" value="-400" label="S-W Gap Open Penalty (Query)" />
+ <param name="sw_gap_ext_ref" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Reference)" />
+ <param name="sw_gap_ext_query" type="integer" size="10" value="-70" label="S-W Gap Extend Penalty (Query)" />
+ <param name="sw_hit_threshold" type="float" size="10" value="68.0" label="S-W Hit Threshold" help="in percentage"/>
</when>
</conditional>
</page>
@@ -54,7 +55,7 @@
<data name="output2" format="tabular"/>
</outputs>
<requirements>
- <requirement type="binary">SHRiMP_rmapper</requirement>
+ <requirement type="binary">rmapper-ls</requirement>
</requirements>
<tests>
<test>
@@ -64,13 +65,14 @@
<param name="input_query" value="shrimp_wrapper_test1.fastq" ftype="fastqsolexa"/>
<output name="output1" file="shrimp_wrapper_test1.out1" />
</test>
- <!--
+ <!--
<test>
- <param name="input1" value="shrimp_wrapper_test2_end1.fastq" ftype="fastqsolexa" />
- <param name="input2" value="shrimp_wrapper_test2_end2.fastq" ftype="fastqsolexa" />
<param name="single_or_paired" value="paired" />
<param name="skip_or_full" value="skip" />
<param name="input_target" value="shrimp_eca_chrMT.fa" ftype="fasta" />
+ <param name="input1" value="shrimp_wrapper_test2_end1.fastq" ftype="fastqsolexa" />
+ <param name="input2" value="shrimp_wrapper_test2_end2.fastq" ftype="fastqsolexa" />
+ <param name="insertion_size" value="600" />
<output name="output1" file="shrimp_wrapper_test2.out1" />
</test>
<test>
@@ -116,6 +118,7 @@
<param name="sw_hit_threshold" value="68.0" />
<param name="input1" value="shrimp_wrapper_test2_end1.fastq" ftype="fastqsolexa"/>
<param name="input2" value="shrimp_wrapper_test2_end2.fastq" ftype="fastqsolexa"/>
+ <param name="insertion_size" value="600" />
<output name="output1" file="shrimp_wrapper_test2.out1" />
</test>
-->
@@ -124,67 +127,146 @@
.. class:: warningmark
-Only nucleotide sequences as query.
+Please note that only **nucleotide** sequences (letter-space) can be used as query.
-----
**What it does**
-Run SHRiMP on letter-space reads.
-
+SHRiMP (SHort Read Mapping Package) is a software package for aligning genomic reads against a target genome.
+
+This wrapper post-processes the default SHRiMP/rmapper-ls output and generates a table with all information from the reads and the reference for the mapping. The tool takes single- or paired-end reads. For single-end reads, only uniquely mapped alignments are considered. For paired-end reads, only pairs that meet the following criteria are used to generate the table: 1) the ends fall within the insertion size; 2) the ends are mapped in opposite directions. If a paired-end read still has multiple mappings after these criteria are applied, it is discarded.
+
+
-----
-
-**Example**
-
-- Input a multiple-fastq file like the following::
+
+**Input formats**
+
+A multiple-fastq file, for example::
@seq1
TACCCGATTTTTTGCTTTCCACTTTATCCTACCCTT
- +seq2
+ +seq1
hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh
-- Use default settings (for detail explanations, please see **Parameters** section)
-
-- Search against your own uploaded file, result will be in the following format::
-
- +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
- | id | chrom | strand | t.start | t.end | q.start | q.end | length | score | editstring |
- +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
- | >seq1 | chrMT | + | 14712 | 14747 | 1 | 36 | 36 | 3350 | 24T11 |
- +-------+-------+--------+----------+----------+---------+--------+--------+-------+------------+
-
-- The result will be formatted Table::
-
- +-------+---------+---------+----------+---------+----------+---------+----------+
- | chrom | ref_loc | read_id | read_loc | ref_nuc | read_nuc | quality | coverage |
- +-------+---------+---------+----------+---------+----------+---------+----------+
- | chrMT | 14711 | seq1 | 0 | T | T | 40 | 1 |
- | chrMT | 14712 | seq1 | 1 | A | A | 40 | 1 |
- | chrMT | 14713 | seq1 | 2 | C | C | 40 | 1 |
- +-------+---------+---------+----------+---------+----------+---------+----------+
-----
-**Parameters**
+**Outputs**
-Parameter list with default value settings::
+The tool gives two outputs.
+
+**Table output**
+
+Table output contains 8 columns::
+
+ 1 2 3 4 5 6 7 8
+ ----------------------------------------------------
+ chrM 14711 seq1 0 T A 40 1
+ chrM 14712 seq1 1 T T 40 1
+
+where::
+
+ 1. (chrM) - Reference sequence id
+ 2. (14711) - Position of the mapping in the reference
+ 3. (seq1) - Read id
+ 4. (0) - Position of the mapping in the read
+ 5. (T) - Nucleotide in the reference
+ 6. (A) - Nucleotide in the read
+ 7. (40) - Quality score for the nucleotide in the position of the read
+ 8. (1) - The number of times this position is covered by reads
+
+
+**SHRiMP output**
+
+This is the default output from SHRiMP/rmapper-ls::
+
+ 1 2 3 4 5 6 7 8 9 10
+ -------------------------------------------------------------------
+ seq1 chrM + 3644 3679 1 36 36 3600 36
+
+where::
+
+ 1. (seq1) - Read id
+ 2. (chrM) - Reference sequence id
+ 3. (+) - Strand of the read
+ 4. (3644) - Start position of the alignment in the reference
+ 5. (3679) - End position of the alignment in the reference
+ 6. (1) - Start position of the alignment in the read
+ 7. (36) - End position of the alignment in the read
+ 8. (36) - Length of the read
+ 9. (3600) - Score
+ 10. (36) - Edit string
+
+
+-----
+
+**SHRiMP parameter list**
+
+The commonly used parameters with default value setting::
-s Spaced Seed (default: 111111011111)
+ The spaced seed is a single contiguous string of 0's and 1's.
+ 0's represent wildcards, or positions which will always be
+ considered as matching, whereas 1's dictate positions that
+ must match. A string of all 1's will result in a simple kmer scan.
-n Seed Matches per Window (default: 2)
+ The number of seed matches per window dictates how many seeds
+ must match within some window length of the genome before that
+ region is considered for Smith-Waterman alignment. A lower
+ value will increase sensitivity while drastically increasing
+ running time. Higher values will have the opposite effect.
-t Seed Hit Taboo Length (default: 4)
+ The seed taboo length specifies how many target genome bases
+ or colours must exist prior to a previous seed match in order
+ to count another seed match as a hit.
-9 Seed Generation Taboo Length (default: 0)
+
-w Seed Window Length (default: 115.00%)
+ This parameter specifies the genomic span in bases (or colours)
+ in which *seed_matches_per_window* must exist before the read
+ is given consideration by the Smith-Waterman alignment machinery.
-o Maximum Hits per Read (default: 100)
+ This parameter specifies how many hits to remember for each read.
+ If more hits are encountered, ones with lower scores are dropped
+ to make room.
-r Maximum Read Length (default: 1000)
+ This parameter specifies the maximum length of reads that will
+ be encountered in the dataset. If larger reads than the default
+ are used, an appropriate value must be passed to *rmapper*.
-d Kmer Std. Deviation Limit (default: -1 [None])
+ This option permits pruning read kmers, which occur with
+ frequencies greater than *kmer_std_dev_limit* standard
+ deviations above the average. This can shorten running
+ time at the cost of some sensitivity.
+ *Note*: A negative value disables this option.
+ -m S-W Match Value (default: 100)
+ The value applied to matches during the Smith-Waterman score calculation.
+ -i S-W Mismatch Value (default: -150)
+ The value applied to mismatches during the Smith-Waterman
+ score calculation.
+ -g S-W Gap Open Penalty (Reference) (default: -400)
+ The value applied to gap opens along the reference sequence
+ during the Smith-Waterman score calculation.
+ *Note*: For backward compatibility, if -g is set
+ and -q is not set, the gap open penalty for the query will
+ be set to the same value as specified for the reference.
+ -q S-W Gap Open Penalty (Query) (default: -400)
+ The value applied to gap opens along the query sequence during
+ the Smith-Waterman score calculation.
+ -e S-W Gap Extend Penalty (Reference) (default: -70)
+ The value applied to gap extends during the Smith-Waterman score calculation.
+ *Note*: For backward compatibility, if -e is set
+ and -f is not set, the gap extend penalty for the query will
+ be set to the same value as specified for the reference.
+ -f S-W Gap Extend Penalty (Query) (default: -70)
+ The value applied to gap extends during the Smith-Waterman score calculation.
+ -h S-W Hit Threshold (default: 68.00%)
+ In letter-space, this parameter determines the threshold
+ score for both vectored and full Smith-Waterman alignments.
+ Any values less than this quantity will be thrown away.
+ *Note*: This option differs slightly in meaning between letter-space and colour-space.
- -m S-W Match Value (default: 100)
- -i S-W Mismatch Value (default: -150)
- -g S-W Gap Open Penalty (Reference) (default: -400)
- -q S-W Gap Open Penalty (Query) (default: -400)
- -e S-W Gap Extend Penalty (Reference) (default: -70)
- -f S-W Gap Extend Penalty (Query) (default: -70)
- -h S-W Hit Threshold (default: 68.00%)
-----
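Two of the parameters in the help text above can be made concrete with a small sketch. This is not SHRiMP's implementation. The function names are hypothetical, the score only covers ungapped alignments, and treating the hit threshold as a percentage of `read_length * match` is an assumption. With the default match and mismatch values, the sketch gives the scores quoted in the help examples (3600 for a perfect 36-base hit, 3350 for the edit string 24T11).

```python
def seed_matches(seed, read_kmer, ref_kmer):
    """Return True if the two k-mers agree at every '1' position of the seed.

    Positions marked '0' are wildcards and always count as matching.
    """
    if not (len(seed) == len(read_kmer) == len(ref_kmer)):
        raise ValueError("seed and k-mers must have the same length")
    return all(
        mask == "0" or a == b
        for mask, a, b in zip(seed, read_kmer, ref_kmer)
    )


def ungapped_score(matches, mismatches, match_value=100, mismatch_value=-150):
    """Score of an alignment without gaps, using the -m and -i defaults."""
    return matches * match_value + mismatches * mismatch_value


def passes_hit_threshold(score, read_length, match_value=100, threshold_pct=68.0):
    """Assumption: the threshold is a percentage of the best possible score."""
    return score >= read_length * match_value * threshold_pct / 100.0


DEFAULT_SEED = "111111011111"  # default of -s; index 6 is the wildcard

read_kmer = "TACCCGATTTTT"
ref_wildcard_diff = "TACCCGCTTTTT"  # differs only at the wildcard position
ref_required_diff = "GACCCGATTTTT"  # differs at a position that must match

print(seed_matches(DEFAULT_SEED, read_kmer, ref_wildcard_diff))
print(seed_matches(DEFAULT_SEED, read_kmer, ref_required_diff))
print(ungapped_score(36, 0))  # perfect 36-base alignment
print(ungapped_score(35, 1))  # edit string 24T11: 35 matches, 1 mismatch
```

Gap penalties are left out on purpose, since the help text does not say how the open and extend values combine.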

[hg] galaxy 1514: New tool to format short read mapping data as ...
by greg@scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/33e06a98b6d8
changeset: 1514:33e06a98b6d8
user: guru
date: Wed Sep 17 16:42:08 2008 -0400
description:
New tool to format short read mapping data as a UCSC custom track.
2 file(s) affected in this change:
tools/metag_tools/mapping_to_ucsc.py
tools/metag_tools/mapping_to_ucsc.xml
diffs (415 lines):
diff -r cf17b5a16eff -r 33e06a98b6d8 tools/metag_tools/mapping_to_ucsc.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/metag_tools/mapping_to_ucsc.py Wed Sep 17 16:42:08 2008 -0400
@@ -0,0 +1,204 @@
+#! /usr/bin/python
+
+from galaxy import eggs
+import sys, tempfile, os
+
+assert sys.version_info[:2] >= ( 2, 4 )
+
+def stop_err(msg):
+ sys.stderr.write(msg)
+ sys.exit()
+
+def main():
+
+ out_fname = sys.argv[1]
+ in_fname = sys.argv[2]
+ chr_col = int(sys.argv[3])-1
+ coord_col = int(sys.argv[4])-1
+ track_type = sys.argv[5]
+ if track_type == 'coverage' or track_type == 'both':
+ coverage_col = int(sys.argv[6])-1
+ cname = sys.argv[7]
+ cdescription = sys.argv[8]
+ ccolor = sys.argv[9].replace('-',',')
+ cvisibility = sys.argv[10]
+ if track_type == 'snp' or track_type == 'both':
+ if track_type == 'both':
+ j = 5
+ else:
+ j = 0
+ #sname = sys.argv[7+j]
+ sdescription = sys.argv[6+j]
+ svisibility = sys.argv[7+j]
+ #ref_col = int(sys.argv[10+j])-1
+ read_col = int(sys.argv[8+j])-1
+
+
+ # Sort the input file based on chromosome (alphabetically) and start co-ordinates (numerically)
+ sorted_infile = tempfile.NamedTemporaryFile()
+ try:
+ os.system("sort -k %d,%d -k %dn -o %s %s" %(chr_col+1,chr_col+1,coord_col+1,sorted_infile.name,in_fname))
+ except Exception, exc:
+ stop_err( 'Initialization error -> %s' %str(exc) )
+
+ #generate chr list
+ sorted_infile.seek(0)
+ chr_vals = []
+ for line in file( sorted_infile.name ):
+ line = line.strip()
+ if not(line):
+ continue
+ try:
+ fields = line.split('\t')
+ chr = fields[chr_col]
+ if chr not in chr_vals:
+ chr_vals.append(chr)
+ except:
+ pass
+ if not(chr_vals):
+ stop_err("Skipped all lines as invalid.")
+
+ if track_type == 'coverage' or track_type == 'both':
+ if track_type == 'coverage':
+ fout = open( out_fname, "w" )
+ else:
+ fout = tempfile.NamedTemporaryFile()
+ fout.write('''track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s\n''' \
+ % ( cname, cdescription, ccolor, cvisibility ))
+ if track_type == 'snp' or track_type == 'both':
+ fout_a = tempfile.NamedTemporaryFile()
+ fout_t = tempfile.NamedTemporaryFile()
+ fout_g = tempfile.NamedTemporaryFile()
+ fout_c = tempfile.NamedTemporaryFile()
+ fout_ref = tempfile.NamedTemporaryFile()
+
+ fout_a.write('''track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s\n''' \
+ % ( "Track A", sdescription, '255,0,0', svisibility ))
+ fout_t.write('''track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s\n''' \
+ % ( "Track T", sdescription, '0,255,0', svisibility ))
+ fout_g.write('''track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s\n''' \
+ % ( "Track G", sdescription, '0,0,255', svisibility ))
+ fout_c.write('''track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s\n''' \
+ % ( "Track C", sdescription, '255,0,255', svisibility ))
+
+
+ sorted_infile.seek(0)
+ for line in file( sorted_infile.name ):
+ line = line.strip()
+ if not(line):
+ continue
+ try:
+ fields = line.split('\t')
+ chr = fields[chr_col]
+ start = int(fields[coord_col])
+ assert start > 0
+ except:
+ continue
+ try:
+ ind = chr_vals.index(chr) #encountered chr for the 1st time
+ del chr_vals[ind]
+ prev_start = ''
+ header = "variableStep chrom=%s\n" %(chr)
+ if track_type == 'coverage' or track_type == 'both':
+ coverage = int(fields[coverage_col])
+ line1 = "%s\t%s\n" %(start,coverage)
+ fout.write("%s%s" %(header,line1))
+ if track_type == 'snp' or track_type == 'both':
+ a = t = g = c = 0
+ fout_a.write("%s" %(header))
+ fout_t.write("%s" %(header))
+ fout_g.write("%s" %(header))
+ fout_c.write("%s" %(header))
+ try:
+ #ref_nt = fields[ref_col].capitalize()
+ read_nt = fields[read_col].capitalize()
+ try:
+ nt_ind = ['A','T','G','C'].index(read_nt)
+ if nt_ind == 0:
+ a+=1
+ elif nt_ind == 1:
+ t+=1
+ elif nt_ind == 2:
+ g+=1
+ else:
+ c+=1
+ except ValueError:
+ pass
+ except:
+ pass
+ prev_start = start
+ except ValueError:
+ if start != prev_start:
+ if track_type == 'coverage' or track_type == 'both':
+ coverage = int(fields[coverage_col])
+ fout.write("%s\t%s\n" %(start,coverage))
+ if track_type == 'snp' or track_type == 'both':
+ if a:
+ fout_a.write("%s\t%s\n" %(prev_start,a))
+ if t:
+ fout_t.write("%s\t%s\n" %(prev_start,t))
+ if g:
+ fout_g.write("%s\t%s\n" %(prev_start,g))
+ if c:
+ fout_c.write("%s\t%s\n" %(prev_start,c))
+ a = t = g = c = 0
+ try:
+ #ref_nt = fields[ref_col].capitalize()
+ read_nt = fields[read_col].capitalize()
+ try:
+ nt_ind = ['A','T','G','C'].index(read_nt)
+ if nt_ind == 0:
+ a+=1
+ elif nt_ind == 1:
+ t+=1
+ elif nt_ind == 2:
+ g+=1
+ else:
+ c+=1
+ except ValueError:
+ pass
+ except:
+ pass
+ prev_start = start
+ else:
+ if track_type == 'snp' or track_type == 'both':
+ try:
+ #ref_nt = fields[ref_col].capitalize()
+ read_nt = fields[read_col].capitalize()
+ try:
+ nt_ind = ['A','T','G','C'].index(read_nt)
+ if nt_ind == 0:
+ a+=1
+ elif nt_ind == 1:
+ t+=1
+ elif nt_ind == 2:
+ g+=1
+ else:
+ c+=1
+ except ValueError:
+ pass
+ except:
+ pass
+
+ if track_type == 'snp' or track_type == 'both':
+ if a:
+ fout_a.write("%s\t%s\n" %(prev_start,a))
+ if t:
+ fout_t.write("%s\t%s\n" %(prev_start,t))
+ if g:
+ fout_g.write("%s\t%s\n" %(prev_start,g))
+ if c:
+ fout_c.write("%s\t%s\n" %(prev_start,c))
+
+ fout_a.seek(0)
+ fout_g.seek(0)
+ fout_t.seek(0)
+ fout_c.seek(0)
+
+ if track_type == 'snp':
+ os.system("cat %s %s %s %s >> %s" %(fout_a.name,fout_t.name,fout_g.name,fout_c.name,out_fname))
+ elif track_type == 'both':
+ fout.seek(0)
+ os.system("cat %s %s %s %s %s | cat > %s" %(fout.name,fout_a.name,fout_t.name,fout_g.name,fout_c.name,out_fname))
+if __name__ == "__main__":
+ main()
\ No newline at end of file
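The script above repeats the same nucleotide-counting block three times. As a reading aid, here is a compact Python 3 sketch of that per-position counting for the SNP tracks. It is not the committed code: `count_read_bases` and `wiggle_lines` are hypothetical names, input rows are assumed to be already parsed into `(chrom, position, read_nucleotide)` tuples, and unlike the script it only emits a `variableStep` header for a chromosome when that base has at least one count there.

```python
from collections import Counter, OrderedDict

BASES = ("A", "T", "G", "C")  # same order as the script's index lookup


def count_read_bases(rows):
    """Count read nucleotides per (chrom, position).

    rows: iterable of (chrom, position, read_nucleotide) tuples.
    Rows are sorted by chromosome name, then numerically by position,
    mirroring the `sort` call in the script.
    """
    counts = OrderedDict()
    for chrom, pos, nt in sorted(rows, key=lambda r: (r[0], r[1])):
        nt = nt.upper()
        if pos <= 0 or nt not in BASES:
            continue  # the script also skips non-positive coordinates
        counts.setdefault((chrom, pos), Counter())[nt] += 1
    return counts


def wiggle_lines(counts, base):
    """Return variableStep lines for one nucleotide track."""
    lines = []
    current_chrom = None
    for (chrom, pos), counter in counts.items():
        if counter[base] == 0:
            continue
        if chrom != current_chrom:
            lines.append("variableStep chrom=%s" % chrom)
            current_chrom = chrom
        lines.append("%d\t%d" % (pos, counter[base]))
    return lines


# Read nucleotides taken from the example table in the tool's help text.
rows = [
    ("chrM", 1, "G"), ("chrM", 1, "A"), ("chrM", 1, "A"), ("chrM", 1, "G"),
    ("chrM", 2, "C"), ("chrM", 2, "G"),
    ("chrM", 3, "G"),
    ("chrM", 4, "T"), ("chrM", 4, "G"),
]
counts = count_read_bases(rows)
for base in BASES:
    print("Track %s" % base)
    print("\n".join(wiggle_lines(counts, base)))
```

Keeping the counts in one mapping avoids the four separate counters and temporary files, at the cost of holding all positions in memory.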
diff -r cf17b5a16eff -r 33e06a98b6d8 tools/metag_tools/mapping_to_ucsc.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tools/metag_tools/mapping_to_ucsc.xml Wed Sep 17 16:42:08 2008 -0400
@@ -0,0 +1,202 @@
+<tool id="mapToUCSC" name="Format mapping data" version="1.0.0">
+ <description> as UCSC custom track</description>
+ <command interpreter="python">
+ mapping_to_ucsc.py
+ $out_file1
+ $input
+ $chr_col
+ $coord_col
+ $track.track_type
+ #if $track.track_type == "coverage" or $track.track_type == "both"
+ $track.coverage_col
+ "${track.cname}"
+ "${track.cdescription}"
+ "${track.ccolor}"
+ "${track.cvisibility}"
+ #end if
+ #if $track.track_type == "snp" or $track.track_type == "both"
+ "${track.sdescription}"
+ "${track.svisibility}"
+ $track.col2
+ #end if
+ </command>
+ <inputs>
+ <param format="tabular" name="input" type="data" label="Select mapping data"/>
+ <param name="chr_col" type="data_column" data_ref="input" label="Column for reference chromosome" />
+ <param name="coord_col" type="data_column" data_ref="input" numerical="True" label="Numerical column for reference co-ordinate" />
+ <conditional name="track">
+ <param name="track_type" type="select" label="Display">
+ <option value="snp" selected="true">SNPs</option>
+ <option value="coverage">Read coverage</option>
+ <option value="both">Both</option>
+ </param>
+ <when value = "coverage">
+ <param name="coverage_col" type="data_column" data_ref="input" numerical="True" label="Numerical column for read coverage" />
+ <param name="cname" type="text" size="15" value="User Track" label="Coverage track name">
+ <validator type="length" max="15"/>
+ </param>
+ <param name="cdescription" type="text" size="15" value="User Supplied Coverage Track (from Galaxy)" label="Coverage track description">
+ <validator type="length" max="60"/>
+ </param>
+ <param label="Coverage track Color" name="ccolor" type="select">
+ <option selected="yes" value="0-0-0">Black</option>
+ <option value="255-0-0">Red</option>
+ <option value="0-255-0">Green</option>
+ <option value="0-0-255">Blue</option>
+ <option value="255-0-255">Magenta</option>
+ <option value="0-255-255">Cyan</option>
+ <option value="255-215-0">Gold</option>
+ <option value="160-32-240">Purple</option>
+ <option value="255-140-0">Orange</option>
+ <option value="255-20-147">Pink</option>
+ <option value="92-51-23">Dark Chocolate</option>
+ <option value="85-107-47">Olive green</option>
+ </param>
+ <param label="Coverage track Visibility" name="cvisibility" type="select">
+ <option selected="yes" value="1">Dense</option>
+ <option value="2">Full</option>
+ <option value="3">Pack</option>
+ <option value="4">Squish</option>
+ <option value="0">Hide</option>
+ </param>
+ </when>
+
+ <when value = "snp">
+ <!--
+ <param name="col1" type="data_column" data_ref="input" label="Column containing the reference nucleotide" />
+ -->
+ <param name="col2" type="data_column" data_ref="input" label="Column containing the read nucleotide" />
+ <!--
+ <param name="sname" type="text" size="15" value="User Track-2" label="SNP track name">
+ <validator type="length" max="15"/>
+ </param>
+ -->
+ <param name="sdescription" type="text" size="15" value="User Supplied Track (from Galaxy)" label="SNP track description">
+ <validator type="length" max="60"/>
+ </param>
+ <param label="SNP track Visibility" name="svisibility" type="select">
+ <option selected="yes" value="1">Dense</option>
+ <option value="2">Full</option>
+ <option value="3">Pack</option>
+ <option value="4">Squish</option>
+ <option value="0">Hide</option>
+ </param>
+ </when>
+
+ <when value = "both">
+ <param name="coverage_col" type="data_column" data_ref="input" numerical="True" label="Numerical column for read coverage" />
+ <param name="cname" type="text" size="15" value="User Track" label="Coverage track name">
+ <validator type="length" max="15"/>
+ </param>
+ <param name="cdescription" type="text" size="15" value="User Supplied Track (from Galaxy)" label="Coverage track description">
+ <validator type="length" max="60"/>
+ </param>
+ <param label="Coverage track Color" name="ccolor" type="select">
+ <option selected="yes" value="0-0-0">Black</option>
+ <option value="255-0-0">Red</option>
+ <option value="0-255-0">Green</option>
+ <option value="0-0-255">Blue</option>
+ <option value="255-0-255">Magenta</option>
+ <option value="0-255-255">Cyan</option>
+ <option value="255-215-0">Gold</option>
+ <option value="160-32-240">Purple</option>
+ <option value="255-140-0">Orange</option>
+ <option value="255-20-147">Pink</option>
+ <option value="92-51-23">Dark Chocolate</option>
+ <option value="85-107-47">Olive green</option>
+ </param>
+ <param label="Coverage track Visibility" name="cvisibility" type="select">
+ <option selected="yes" value="1">Dense</option>
+ <option value="2">Full</option>
+ <option value="3">Pack</option>
+ <option value="4">Squish</option>
+ <option value="0">Hide</option>
+ </param>
+ <!--
+ <param name="col1" type="data_column" data_ref="input" label="Column containing the reference nucleotide" />
+ -->
+ <param name="col2" type="data_column" data_ref="input" label="Column containing the read nucleotide" />
+ <!--
+ <param name="sname" type="text" size="15" value="User Track-2" label="SNP track name">
+ <validator type="length" max="15"/>
+ </param>
+ -->
+ <param name="sdescription" type="text" size="15" value="User Supplied Track (from Galaxy)" label="SNP track description">
+ <validator type="length" max="60"/>
+ </param>
+ <param label="SNP track Visibility" name="svisibility" type="select">
+ <option selected="yes" value="1">Dense</option>
+ <option value="2">Full</option>
+ <option value="3">Pack</option>
+ <option value="4">Squish</option>
+ <option value="0">Hide</option>
+ </param>
+ </when>
+ </conditional>
+ </inputs>
+ <outputs>
+ <data format="customtrack" name="out_file1"/>
+ </outputs>
+
+
+ <help>
+
+.. class:: infomark
+
+**What it does**
+
+This tool formats mapping data generated by short read mappers as a custom track that can be displayed in the UCSC Genome Browser.
+
+-----
+
+.. class:: warningmark
+
+**Note**
+
+This tool requires the mapping data to contain at least the following information:
+
+chromosome, genome coordinate, read nucleotide (if option to display is SNPs), read coverage (if option to display is Read coverage).
+
+-----
+
+**Example**
+
+For the following Mapping data::
+
+ #chr g_start read_id read_coord g_nt read_nt qual read_coverage
+ chrM 1 1:29:1672:1127/1 11 G G 40 134
+ chrM 1 1:32:93:933/1 4 G A 40 134
+ chrM 1 1:34:116:2032/1 11 G A 40 134
+ chrM 1 1:39:207:964/1 1 G G 40 134
+ chrM 2 1:3:359:848/1 1 G C 40 234
+ chrM 2 1:40:1435:1013/1 1 G G 40 234
+ chrM 3 1:40:730:972/1 9 G G 40 334
+ chrM 4 1:42:1712:921/2 31 G T 35 434
+ chrM 4 1:44:1649:493/1 4 G G 40 434
+
+running this tool to display both SNPs and Read coverage will return the following tracks, containing aggregated data per genome co-ordinate::
+
+ track type=wiggle_0 name="Coverage Track" description="User Supplied Track (from Galaxy)" color=0,0,0 visibility=1
+ variableStep chrom=chrM
+ 1 134
+ 2 234
+ 3 334
+ 4 434
+ track type=wiggle_0 name="Track A" description="User Supplied SNP Track (from Galaxy)" color=255,0,0 visibility=1
+ variableStep chrom=chrM
+ 1 2
+ track type=wiggle_0 name="Track T" description="User Supplied SNP Track (from Galaxy)" color=0,255,0 visibility=1
+ variableStep chrom=chrM
+ 4 1
+ track type=wiggle_0 name="Track G" description="User Supplied SNP Track (from Galaxy)" color=0,0,255 visibility=1
+ variableStep chrom=chrM
+ 1 2
+ 2 1
+ 3 1
+ 4 1
+ track type=wiggle_0 name="Track C" description="User Supplied SNP Track (from Galaxy)" color=255,0,255 visibility=1
+ variableStep chrom=chrM
+ 2 1
+
+ </help>
+</tool>
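The coverage part of the help example can be reproduced with a short Python 3 sketch. It assumes tab-separated input with the column layout of the example table (chromosome in column 1, coordinate in column 2, coverage in column 8). `coverage_track` is a hypothetical helper, not part of the committed tool, and it keeps the first coverage value seen for each position.

```python
def coverage_track(lines, chr_col=0, coord_col=1, coverage_col=7,
                   name="User Track",
                   description="User Supplied Track (from Galaxy)",
                   color="0,0,0", visibility="1"):
    """Build wiggle_0 variableStep lines with one coverage value per position."""
    out = ['track type=wiggle_0 name="%s" description="%s" color=%s visibility=%s'
           % (name, description, color, visibility)]
    seen = set()
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and the header row
        fields = line.split("\t")
        try:
            chrom = fields[chr_col]
            pos = int(fields[coord_col])
            cov = int(fields[coverage_col])
        except (IndexError, ValueError):
            continue  # skip invalid lines, as the script does
        if pos <= 0 or (chrom, pos) in seen:
            continue
        seen.add((chrom, pos))
        records.append((chrom, pos, cov))
    current_chrom = None
    for chrom, pos, cov in sorted(records):
        if chrom != current_chrom:
            out.append("variableStep chrom=%s" % chrom)
            current_chrom = chrom
        out.append("%d\t%d" % (pos, cov))
    return out


# A subset of the rows from the help example.
example = "\n".join([
    "#chr\tg_start\tread_id\tread_coord\tg_nt\tread_nt\tqual\tread_coverage",
    "chrM\t1\t1:29:1672:1127/1\t11\tG\tG\t40\t134",
    "chrM\t1\t1:32:93:933/1\t4\tG\tA\t40\t134",
    "chrM\t2\t1:3:359:848/1\t1\tG\tC\t40\t234",
    "chrM\t3\t1:40:730:972/1\t9\tG\tG\t40\t334",
    "chrM\t4\t1:42:1712:921/2\t31\tG\tT\t35\t434",
])
track = coverage_track(example.splitlines(), name="Coverage Track")
print("\n".join(track))
```

The column indices are keyword arguments so the same sketch can be pointed at tables with a different layout, much as the tool lets the user pick columns.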
details: http://www.bx.psu.edu/hg/galaxy/rev/26825f08d362
changeset: 1506:26825f08d362
user: Anton Nekrutenko <anton(a)bx.psu.edu>
date: Sun Sep 14 14:58:50 2008 -0400
description:
Forgot two test datasets
2 file(s) affected in this change:
test-data/B1.fa
test-data/phiX.fa
diffs (1087 lines):
diff -r b6ff467f4522 -r 26825f08d362 test-data/B1.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/B1.fa Sun Sep 14 14:58:50 2008 -0400
@@ -0,0 +1,1000 @@
+>HWI-EAS91_1_306UPAAXX:6:1:1503:1160
+GGTGGTCTATAGTGTTATTAATATCAAGTTGGGGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1564:1179
+GCGAGCAGTAGACTCCTTCTGTTGATAAGCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1704:1082
+GATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1588:1797
+GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1304:1526
+GTAGTTGAAATGGTAATAAGACGACCAATCTGACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1490:1582
+GTCGTGTTCAACAGACCTATAAACATTCTGTGCCGC
+>HWI-EAS91_1_306UPAAXX:6:1:1356:1339
+GTAGACATTTTTACTTTTTATGTCCCTCATCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1311:853
+GGTTGGTTTATCGTTTTTGACACTCTCACGTTGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1257:1552
+GTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTC
+>HWI-EAS91_1_306UPAAXX:6:1:1486:1402
+GTTACTGAGAAGTTAATGGATGAATTGGCACAATGC
+>HWI-EAS91_1_306UPAAXX:6:1:1028:1081
+GGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1167:752
+GGTTTTCTTCATTGCATTCAGATGGATACATCTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:1507:1113
+GTCAACGTTATATTTTGATAGTTTGACGGTTAATTC
+>HWI-EAS91_1_306UPAAXX:6:1:1654:1311
+GGATGAAAATGCTCACAATGACAAATCTGTCCACGG
+>HWI-EAS91_1_306UPAAXX:6:1:1386:1060
+GTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1070:1356
+GGTTACAGTATGCCCATCGCAGTTCGCTACACGCAG
+>HWI-EAS91_1_306UPAAXX:6:1:787:1032
+GCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCA
+>HWI-EAS91_1_306UPAAXX:6:1:834:1017
+GCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1703:1155
+GGATTGGTTTCGCTGAATCAGGTTATTAAAGAGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1406:593
+GTTGAGTTCGATAATGGTGATATGTATGTTTACGGC
+>HWI-EAS91_1_306UPAAXX:6:1:1411:886
+GTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:923:972
+GCATGACAAGTAAAGGACGGTTGTCAGCGTCATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1279:1004
+GCCATAGCACCAGAAACAAAACTAGGGGCGGCCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1070:840
+GGTTGTCAGCGTCATAAGAGGTTTTACCTCCAAATG
+>HWI-EAS91_1_306UPAAXX:6:1:1595:1040
+GTTTCTGATAAGTTGCTTGATTTGGTTGGACTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1002:559
+GAGATTGCCGAGATGCAAAATGAGACTCAAAAAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:999:974
+GTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGC
+>HWI-EAS91_1_306UPAAXX:6:1:896:982
+GTGGCTGGAGACAAATAATCTCTTTAATAACCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1366:741
+GTTCAAGATTGCTGGAGGCCTCCACTATGAAATCGC
+>HWI-EAS91_1_306UPAAXX:6:1:749:1469
+GTTTATGGTGAACAGTGGATTAAGTTCATGAAGGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1010:592
+GAGTTTATTGCTGCCGTCATTGCTTATTATGTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1393:650
+GTGACTCATATCTAAACCAGTCCTTGACGAACGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1238:1731
+GAGAAATAAAAGTCTGAAACATGATTAAACTCCTAA
+>HWI-EAS91_1_306UPAAXX:6:1:1629:908
+GATGCGGTTATCCATCTGCTTATGGAAGCCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1560:849
+GCTGTCGCTACTTCCCAAGAAGCTGTTCAGAATCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1029:783
+GAGAAGTTAATGGATGAATTGGCACAATGCTACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1152:1324
+GACAATCAGAAAGAGATTGCCGAGATGCAAAATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1614:2042
+GAAATGCCACAAGCCTCAATAGCAGGTTTAAGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1398:439
+GATGGTTGGTTTATCGTTTTTGACACTCTCACGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:955:616
+GACTAAAGAGATTCAGTACCTTAACGCTAAAGGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1672:753
+GAATGCCAGCAATCTCTTTTTGAGTCTCATTTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1195:1293
+GCAATGCGACAGGCTCATGCTGATGGTTGGTTTATC
+>HWI-EAS91_1_306UPAAXX:6:1:1074:755
+GCAAGAGTAAACATAGTGCCATGCTCAGGAACAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:984:499
+GACTTAGTTCATCAGCAAACGCAGAATCAGCGGTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1452:1833
+GCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:863:710
+GAGTTCGATAATGGTGATATGTATGTTGACGGCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:885:649
+GCAGAAGTTAACACTTTCGGATATTTCTGATGAGTC
+>HWI-EAS91_1_306UPAAXX:6:1:917:1214
+GACAGATGTATCCATCTGAATGCAATGAAGAAAACC
+>HWI-EAS91_1_306UPAAXX:6:1:892:1254
+GCTCAGGAAATGCAGCAGCAAGATAATCACGAGTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1555:1005
+GCATTTGGCGCATAATCTCGGAAACCTGCTGTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1637:1413
+GATGCTGTTCAACCACTAATAGGTAAGAAATCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1102:1567
+GGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAA
+>HWI-EAS91_1_306UPAAXX:6:1:799:1337
+GTATATGCACAAAATGAGATGCTTGCTTATCAACAG
+>HWI-EAS91_1_306UPAAXX:6:1:1353:1843
+GCAGACCCATAATGTCAATAGATGTGGTAGAAGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1196:789
+GCGGCATACGCTCGGCGCCAGTTTGAATATTAGACA
+>HWI-EAS91_1_306UPAAXX:6:1:1056:1676
+GTAAAATACTGACCAGCCGTTTGAGCTTGAGTAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1349:1836
+GGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1027:788
+GGTGTTAATGCCACTCCTCTCCCGACTGTTAACACT
+>HWI-EAS91_1_306UPAAXX:6:1:990:1283
+GCTTAGGGATTTTATTGGTATCAGGGTTAATCGTGC
+>HWI-EAS91_1_306UPAAXX:6:1:904:939
+GAGAAGTTAATGGATGAATTGGCACAATGCTACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1732:793
+GTCAACATACATATCACCATTATCGAACTCAACGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1355:2003
+GTTAGACCAAACCATGAAACCAACATAAACATTATT
+>HWI-EAS91_1_306UPAAXX:6:1:1337:977
+GCACCAGAAACAAAACTAGGGGCGGCCTCATCAGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1605:1175
+GGAGGTAAAACCTCTTATGACGCTGACAACCGTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:1763:1192
+GACAGGCCGTTTGAATGTTGACGGGATGAACATAAT
+>HWI-EAS91_1_306UPAAXX:6:1:722:483
+GTTATTATACCGTCAAGGACTGTGTGACTATTGACT
+>HWI-EAS91_1_306UPAAXX:6:1:1760:1136
+GCAAAGCATTGGGATTATCATAAAACGCCTCTAATC
+>HWI-EAS91_1_306UPAAXX:6:1:1088:798
+GGAAACCTGCTGTTGCTTGGAAAGATTGGTGTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:633:1076
+GCTACTTCCCAAGAAGCTGTTCAGAATCAGAATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:673:754
+GTCATGGAAGCGATAAAACTCTGCAGGTTGGATATT
+>HWI-EAS91_1_306UPAAXX:6:1:1759:2019
+GTAAAGGACGGTTGTCAGCGTCATAAGAGGTTTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1064:1797
+GCGGTTATCCATCTGCTTATGGAAGCCAAGCATTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1112:1669
+GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACT
+>HWI-EAS91_1_306UPAAXX:6:1:510:1447
+GCATTAAGCTCAGGAAATGCAGCAGCAAGATAATCA
+>HWI-EAS91_1_306UPAAXX:6:1:877:1573
+GTGCTATTGCTGGCGGTATTTCTTCTTCTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:870:542
+GAATGTCACGCTGATTATTTTGACTTTGAGCGTATC
+>HWI-EAS91_1_306UPAAXX:6:1:966:384
+GCACCTGTTTTACAGACACCTAAAGCTACATCGTCA
+>HWI-EAS91_1_306UPAAXX:6:1:1186:1903
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1632:1742
+GCATCACCCATGCCTACAGTATTGTTATCGGTAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1521:559
+GAGAGCGCCAACGGCGTCCATCTCGAAGGAGTCGCC
+>HWI-EAS91_1_306UPAAXX:6:1:683:454
+GCTTATTATGTTCATCCCGTCAACATTCAAACGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:112:1280
+GTTGGCGCTCTCCGTCTTTCTCCATTTCGTCGTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:891:381
+GACCAGGGCGAGCGCCAGAACGTTTTTTACCTTTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1348:958
+GATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1785:1915
+GCCCCGAAGGGGACNANAAATGGTTTTTAGAGAACG
+>HWI-EAS91_1_306UPAAXX:6:1:1418:42
+GTATGCCCATCGCAGTTCGCTACACGCAGGACGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1421:743
+GGTCAACGCTACCTGTAGGAAGTGTCCGCATAAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1079:790
+GCCAAATGCTTACTCAAGCTCAAACGGCTGGTCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:663:740
+GGTATTAAGGATGAGTGTTCAAGATTGCTGGATGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1245:413
+GTTTGAATGTTGACGGGATGAACATAATAAGCAATG
+>HWI-EAS91_1_306UPAAXX:6:1:1378:1035
+GCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGG
+>HWI-EAS91_1_306UPAAXX:6:1:903:1746
+GTACGGGGAAGGACGTCAATAGTCACACAGTCCTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1713:1134
+GGCGTACGGGGAAGGACGTCAATAGTCACACAGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1246:1887
+GCTCTAATCTCTGGGCATCTGGCTATGATGTTGATG
+>HWI-EAS91_1_306UPAAXX:6:1:872:1731
+GGGCGGCCTCATCAGGGTTAGGAACATTAGAGCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1714:1582
+GCTTTCCTGCTCCTGTTGAGTTTATTGCTTCCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1785:763
+GNCGAGAAATAAAANNNTGAAACATGATTAAANTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1684:542
+GAAAAGACAGAATCTCTTCCAAGAGCTTGATGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1581:1665
+GACTTTGAGCGTATCGAGGCTCTTAAACCTGCTATT
+>HWI-EAS91_1_306UPAAXX:6:1:901:1581
+GTGCTGATATTGCTTTTGATGCCGACCCTAAATTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1128:239
+GGTTATTATACCGTCAAGGACTGTGTGACTATTGAC
+>HWI-EAS91_1_306UPAAXX:6:1:969:441
+GGTAAGAAATCATGAGTCAAGTTACTGAACAATCCG
+>HWI-EAS91_1_306UPAAXX:6:1:630:1087
+GCCACCATGATTATGACCAGTGTTTCCAGTCCGTTC
+>HWI-EAS91_1_306UPAAXX:6:1:606:1852
+GGAGACAAATAATCTCTTTAATAACCTGATTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:489:1315
+GAAAGCTCAGTCTCAGGAGGAAGCGGAGCAGTCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:465:1983
+GAGCCAATACCATCAGCTTTACCGTCTTTCCAGAAA
+>HWI-EAS91_1_306UPAAXX:6:1:559:1028
+GAGTGCTTAATCCAACTTACCAAGCTGGGTTACGAC
+>HWI-EAS91_1_306UPAAXX:6:1:1655:1413
+GTATGTTGACGGCCATAAGGCTGCTTCTGACGTTCG
+>HWI-EAS91_1_306UPAAXX:6:1:980:605
+GCCGTTTGAATGTTGACGGGATGAACATAATAAGCA
+>HWI-EAS91_1_306UPAAXX:6:1:1629:1865
+GAAAAGCGGCATGGTCAATATAACCAGTAGTGTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1180:1920
+GCACTCCGTGGACAGATTTGTCATTGTGAGCATTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1116:383
+GCGCAGGAAACACTGACGTTCTTACTGACGCAGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:906:2041
+GTCACGTTTATGGTGAACAGTGGATTAAGTTCATGA
+>HWI-EAS91_1_306UPAAXX:6:1:1514:157
+GTCAATAGATGTGGTAGAAGTCGTCATTTGGCGTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1032:1857
+GCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGT
+>HWI-EAS91_1_306UPAAXX:6:1:638:609
+GATTCTGTCAAAAACTGACGCGTTGGATGAGGAGAT
+>HWI-EAS91_1_306UPAAXX:6:1:74:750
+GATAATCACGAGTATCCTTTCCTTTATCATCTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:486:822
+GTTGACGATGTAGCTTTAGGTGTCTTTAAAACAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:899:473
+GAACAGCATCGGACTCAGATAGTAATCCACGCTCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1613:197
+GTGACATTCAGAAGGGTAATAAGAACGAACCATAAA
+>HWI-EAS91_1_306UPAAXX:6:1:326:1747
+GTTGAGGCTTTCGTTTATTGTACGCTTTGCTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1487:526
+GCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:629:665
+GAAATGCAGCAGCAAGATAATCACGAGTATCCTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:766:744
+GGCCGTCAACATACATATCACCATTATCGAACTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:391:1771
+GTGGTTGATATTTTTCATGGTATTGATAAATCTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:591:1102
+GCTTTGCGTGACTATTTTCGTGATATTGTTCGTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:917:664
+GCCATGATGGTGGTTATTATACCGTCAAGGACTGTG
+>HWI-EAS91_1_306UPAAXX:6:1:217:737
+GTTCAGTTGTTGCATTGGAATATTCAGTTTAAATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1047:839
+GACCATTCAAAGGATAAACATCATAGGCAGTCGGGG
+>HWI-EAS91_1_306UPAAXX:6:1:558:1040
+GCCACCAGCAAGAGCAGAAGCAATACCGCCAGCAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1207:524
+GCCAATACCATCAGCTTTACCGTCTTTCCAGAAATT
+>HWI-EAS91_1_306UPAAXX:6:1:708:1634
+GCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:576:1851
+GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:906:460
+GTAGACATTTTTACTTTTTATGTCCCTCATCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:693:1260
+GCGAAAGGTCGCAAAGTAAGAGCTTCTCGAGCTGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1373:286
+GGACACTTCCTACAGGTAGCGTTGACCCTAATTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:762:41
+GATACTTGGAACAATTTCTGGAAAGACGGTAAAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:475:1091
+GTCACACAGTCCTTGACGGTATAATAACCACCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:791:627
+GCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACC
+>HWI-EAS91_1_306UPAAXX:6:1:336:1791
+GAAGGAGTCGCCAGCGATAACCGGAGTAGTTGAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1483:943
+GCACGTAATTTTTGACGCACGTTTTCTTCTGCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:641:1071
+GATGGGCATACTGTAACCATAAGGCCACGTATTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:196:755
+GAACGCCCTCTTAAGGATATTCGCGATGAGTATAAT
+>HWI-EAS91_1_306UPAAXX:6:1:463:1398
+GTCATAAGAGGTTTTACCTCCAAATGAAGAAATAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1559:460
+GCTCACAATGACAAATCTGTCCACGGAGTGCTTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1625:1561
+GAGGAGTGGCATTAACACCATCCTTCATGAACTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1729:1588
+GCTGATAAAGGAAAGGATACTCGTGATTATCTTGCT
+>HWI-EAS91_1_306UPAAXX:6:1:945:393
+GGCCTCATCAGGGTTAGGAACATTAGAGCCTTGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:298:1391
+GTAAAGTTAGACCAAACCATGAAACCAACATAAACA
+>HWI-EAS91_1_306UPAAXX:6:1:1270:1500
+GAATTACTACTGCTTGTTTACGAATTAAATATATGT
+>HWI-EAS91_1_306UPAAXX:6:1:481:1546
+GCTGGCATTCAGTCGGCGACTTCACGCCAGAATACG
+>HWI-EAS91_1_306UPAAXX:6:1:473:1729
+GTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:801:1831
+GCTGAGGTTGACTTAGTTCATCAGCAAACGCAGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:536:639
+GCCGACCCTAAATTTTTTGCCTGTTTGGTTCTCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:259:938
+GTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTG
+>HWI-EAS91_1_306UPAAXX:6:1:907:1513
+GGCATGGGTGATGCTGGTATTAAATCTGCCATTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:372:1409
+GATGAGTATAATTACCCCAAAAAGAAAGGTATTAAG
+>HWI-EAS91_1_306UPAAXX:6:1:485:1626
+GATGGCAGCAACGGAAACCATAACGAGCATCATCTT
+>HWI-EAS91_1_306UPAAXX:6:1:583:1679
+GCTCAAAGTCAAAATAATCAGCGTGACATTCAGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:690:1610
+GACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:257:918
+GCAGGCTGGCACTTCTGCCGTTTCTGATAAGTTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:818:33
+GTGTTAATGCCACTCCTCTCCCGACTGTTAACTCTG
+>HWI-EAS91_1_306UPAAXX:6:1:541:1242
+GGGATTATCATAAAACGCCTCTAATCGGTCGTCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1014:279
+GTAAAAATGTCTACAGTAGAGTCAATAGCAAGGCCC
+>HWI-EAS91_1_306UPAAXX:6:1:672:1790
+GGCCGTTTGAATGTTGACGGGATGAACATAATAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:708:464
+GGAGACAAATAATCTCTTTAATAACCTGATTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:633:1486
+GGGAAAGGTCATGCGGCATACGCTCGGCGCCAGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:328:696
+GTTCCGACTACCCTCCCGACTGCCTATGATGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:259:1389
+GCGTACTTATTCGCCACCATGATTATTACCAGTGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1315:41
+GCTTTCCGTGATGTCACAGCCTGCTTTGATGTGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1647:549
+GCTTAATCCAACTTACCAAGCTGGGTTACGACGCGC
+>HWI-EAS91_1_306UPAAXX:6:1:300:886
+GTTCTTGGTCAGTATGCAAATTAGCATAAGCAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:317:1411
+GTACGCTGTACTTTGTGGGATACCCTCGCTTTCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:321:1819
+GGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:631:70
+GTGGATTACTATCTGAGTCCGATGCTGTTCAACCAC
+>HWI-EAS91_1_306UPAAXX:6:1:624:1040
+GCTGGCGACTCCTTCGAGATGGACGCCGTTTGCGCT
+>HWI-EAS91_1_306UPAAXX:6:1:662:1187
+GGGAGAGGAGTGGCATTAACACCATCCTTCATGACC
+>HWI-EAS91_1_306UPAAXX:6:1:1440:1959
+GAATCAGCGGTATGGCTCCTCTCCTATTTTTGCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:458:1629
+GCTGGTGGCGCCATGTCTAAATTTTTTGGAGGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:216:790
+GGGATGAAAATGCTCACAATGACAAATCTGTCCACG
+>HWI-EAS91_1_306UPAAXX:6:1:1407:1174
+TTACCTATTAGTGGTTGAACAGCATCGGACTCAGAT
+>HWI-EAS91_1_306UPAAXX:6:1:999:1790
+GTCCTGCGTGTAGCGAACTGCGATGGGCATACTGTC
+>HWI-EAS91_1_306UPAAXX:6:1:141:1994
+GGCTTTTTTATGGTTCGTTCTTATTACCCTTCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:225:465
+GTCAGATATGGACCTTGCTGCTAAAGGTCTAGGAGC
+>HWI-EAS91_1_306UPAAXX:6:1:649:1760
+GACCCATAATGTCAATAGATGTGGTAGAAGTCGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:300:986
+GTTGAACACGACCAGAAAACTGGCCTAACGACGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:478:605
+GAGACTGAGCTTTCTCGCCAAATGACGACTTCTACC
+>HWI-EAS91_1_306UPAAXX:6:1:622:395
+GGTAGCTTTAAGCGGCTCACCTTTAGCATCAACAGG
+>HWI-EAS91_1_306UPAAXX:6:1:1701:574
+GTAAAGCCTCTACGCGATTTCATAGTGGAGGCCTCC
+>HWI-EAS91_1_306UPAAXX:6:1:646:59
+GGAAGTGTCCGCATAAAATGCACCGCATGGAAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:284:2031
+GACAGAATCGTTAGTTGATGGCGAAAGGTCGCAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:22:1009
+GATGGATACATCTGTCAACGCCGCTAATCAGGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:47:1826
+GCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1025:1236
+TGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGT
+>HWI-EAS91_1_306UPAAXX:6:1:773:591
+GAGCAGGAAAGCGAGGGTATCCCACAAAGTCCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1753:527
+GGTGGCATTCAAGGTGATGTGCTTGCTACCGATAAC
+>HWI-EAS91_1_306UPAAXX:6:1:426:1717
+GTAGCGCCAATATGAGAAGAGCCATACCGCTGATTC
+>HWI-EAS91_1_306UPAAXX:6:1:959:818
+TTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:459:1344
+GCCTATGATGTTTATCCTTTGAATGGTCGCCATGAT
+>HWI-EAS91_1_306UPAAXX:6:1:973:1367
+TTCGTGATGAGTTTGTATCTGTTACTGATAAGTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:201:871
+GATTAGAGGCGTTTTATGATAATCCCAATGCTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:713:1672
+GGCGTACGGGGAAGGACGTCAATAGTCACACAGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:444:1435
+TTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGTG
+>HWI-EAS91_1_306UPAAXX:6:1:288:1136
+GCCTTCCATGATGAGACAGGCCGTTTTAATTTTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1653:225
+GCAAGGCCACGACGCAATGGAGAAAGACGGAGAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:537:1764
+GCTCCGCTTCCTCCTGAGACTGAGCTTTCTCGCCAA
+>HWI-EAS91_1_306UPAAXX:6:1:196:1854
+GTATCGAGGCTCTTAAACCTGCTATTTAGGCTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:312:1707
+GCGTCATAAGAGGTTTTACCTCCAAATGAAGAAATA
+>HWI-EAS91_1_306UPAAXX:6:1:651:183
+GTATGTTTCTCCTGCTTATCACCTTCTTGAAGGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:295:694
+GTGATTACTTCATGCAGCGTTACCGTGATGTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:330:1895
+GCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTG
+>HWI-EAS91_1_306UPAAXX:6:1:590:331
+GAAATTTCTATGAATGATGTTTTCCGTTCTGGTGAT
+>HWI-EAS91_1_306UPAAXX:6:1:481:1687
+GCAGATTGCGATAAACGGTCACATTAAATTTAACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1112:1279
+TGTGCATATACCTGGTCTTTCGTATTCTGTCGTGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1099:1216
+TTAGAGCGCATGACAAGTAAAGGACGGTTGTCAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:221:1238
+GTATCCTTTCCTTTATCATCGGCAGACTTTTCACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1015:364
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:735:1806
+TGTTATTAATATCAAGTTGGGGGAGCACATTGTAGC
+>HWI-EAS91_1_306UPAAXX:6:1:320:411
+GCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1273:1031
+TTAAGGATATTCGCGATGAGTATAATTACCCCAAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1456:1088
+AATAATCAGCGTGACATTCAGAAGGGTAATAAGAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1365:307
+GACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:478:252
+GATGCGGTTATCCATCTGCTTATTGAAGCCAAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:915:1232
+TATTAATAACACTATAGACCACCGCCCCGAAGGGGC
+>HWI-EAS91_1_306UPAAXX:6:1:680:1357
+TTCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGC
+>HWI-EAS91_1_306UPAAXX:6:1:238:1279
+GCCGAAGCCCCTGCAATTAAAATTGTTGACCACCTA
+>HWI-EAS91_1_306UPAAXX:6:1:1583:35
+GCAAATTAGCATAAGCAGCTTGCAGACCCATAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:502:283
+GTTCCGACTACCCTCCCGACTGCCTATGATGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:418:1730
+GAAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:596:647
+GCCTCAACGCAGCGACGAGCACGAGAGCGGTCAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:92:1591
+GTTCATGAAGGATGGTGTTAATGCCACTCCTCTCCC
+>HWI-EAS91_1_306UPAAXX:6:1:430:1938
+GCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:212:527
+GGTATTGATAAAGCTGTTGCCGATACTTGTAACAAT
+>HWI-EAS91_1_306UPAAXX:6:1:594:942
+GACGACATTAGAAATATCCTTTGCAGTAGCGCCAAT
+>HWI-EAS91_1_306UPAAXX:6:1:169:1774
+GCCTTCCATGATGAGACAGGCCGTTTTAATTTTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:1090:210
+GGAGAGCGCCAACGGCGTCCATCTCGAAGGAGTCGC
+>HWI-EAS91_1_306UPAAXX:6:1:589:96
+GGCGGCCCCATCAGGGTTAGGAACATTAGAGCCTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1477:1231
+TAGGAACATTAGAGCCTTGAATGGCAGATTTAATAC
+>HWI-EAS91_1_306UPAAXX:6:1:707:1076
+TCTGACGTTCGTGATGAGTTTGTATCTTTTTCTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:749:1715
+GAACATAATAAGCAATGACGGCAGCAATAAACTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:1738:1884
+GCTCACCTTTAGCATCAACAGGCCACAACCAACCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1160:1088
+TCACATTTTGTTCATGGTAGAGATTCTCTTGTTGAC
+>HWI-EAS91_1_306UPAAXX:6:1:517:119
+GCAAGGCTAATGATTCACACGCCGACTGCTATCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:1472:716
+TGGTAATGGTGGTTTTCTTCATTTCATTCAGTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:281:441
+GAGCAGTAGACTCCTTCTGTTGATAAGCAAGCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:1101:324
+AATACCATCAGCTTTACCGTCTTTCCAGAAATTGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1225:1494
+TTCTCAAATCCGGCGTCAACCATACCAGCAGAGGAA
+>HWI-EAS91_1_306UPAAXX:6:1:1509:1025
+TTCTTGCTGCCGAGGGTCGCAAGGCTATTGTTTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:592:510
+GATACCAATAAAATCCCTAAGCATTTGTTTCTGGTT
+>HWI-EAS91_1_306UPAAXX:6:1:324:1729
+GAACAAAGAAACGCGGCACAGAATGTTTATAGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:128:1925
+GGAACAACTCACTAAAAACCAAGCTGTCGCTACTTC
+>HWI-EAS91_1_306UPAAXX:6:1:786:893
+TACGGGGAAGGACGTCAATAGTCACACAGTCCTTGC
+>HWI-EAS91_1_306UPAAXX:6:1:248:955
+GCTACAATGTGCTCCCCCAACTTGATATTAATAACA
+>HWI-EAS91_1_306UPAAXX:6:1:388:1127
+GATATTGGTCGTATGGTTCTTGCTGCCTAGTGTCTC
+>HWI-EAS91_1_306UPAAXX:6:1:721:1156
+TCTGGTTGGTTGTGGCCTTTTTATGCTAAATGTTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1564:1468
+TTACTTTTTATGTCCCTCATCGTCACGTTTATGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:750:77
+GGCTCATTCTGATTCTGAACAGCTTCTTGGGAAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:405:487
+GTTGGATTAAGCACTCCGTGGACAGATTTGTCATTT
+>HWI-EAS91_1_306UPAAXX:6:1:836:1204
+TTGCTTCTGCTCTTGCTTGTGGCGCCATGTCTAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:224:1548
+GCTGCCGTCATTGCTTATTATGTTCATCCCTTCAAC
+>HWI-EAS91_1_306UPAAXX:6:1:931:1015
+TTAAGGTACTGAATCTCTTTAGTCGCAGTAGGCGGT
+>HWI-EAS91_1_306UPAAXX:6:1:329:579
+GTCCCTCATCGTCACGTTTATGGTGAACAGTGGATT
+>HWI-EAS91_1_306UPAAXX:6:1:260:1145
+GCTTGCGTTTATGGTACGCTGGACTTTTTGTGATAC
+>HWI-EAS91_1_306UPAAXX:6:1:1523:1253
+TTGGTAAAATACTGACCAGCCGTTTGAGCTTGAGTA
+>HWI-EAS91_1_306UPAAXX:6:1:326:1271
+GACCACTCGCGATTCAATCATGACTTCGTGATAAAT
+>HWI-EAS91_1_306UPAAXX:6:1:213:622
+GCACCTGTTTTACAGACACCTAAAGCTACATCGTCA
+>HWI-EAS91_1_306UPAAXX:6:1:274:712
+GCGGTCAAAAAGCCGCCTCCGGTGGCATTCAAGGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1549:627
+TATGGTTCTTGCTGCCGAGGGTCGCAAGGCTAATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1714:737
+TCTTTCGTATTCTGGCGTGAAGTCGCCGACTGAATG
+>HWI-EAS91_1_306UPAAXX:6:1:760:1217
+TACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:174:768
+GTTGGCTGACGACCGATTAGAGGCGTTTTTTTATAT
+>HWI-EAS91_1_306UPAAXX:6:1:172:1412
+GGTCGGCAGATTGCGATAAACGTTCACATTAAATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1393:869
+TTCATCCCGTCAACATTCAAACGGCCTGTCTCATCT
+>HWI-EAS91_1_306UPAAXX:6:1:301:481
+GTTATAGATATTCAAATAACCCTGAAACAAATGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:648:1093
+TAACGCTGCATGAAGTAATCACGTTCTTGGTCAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:1233:591
+TTCCCATCTTGGCTTCCTTGCTGGTCAGATTGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:540:1415
+TTATTAAAGAGATTATTTTTCTCCAGCCACTTATGT
+>HWI-EAS91_1_306UPAAXX:6:1:151:1792
+GCAAGCTGCTTATGCTAATTTGCATACTGACCAAGA
+>HWI-EAS91_1_306UPAAXX:6:1:748:1378
+TGGATTACTATCTGAGTCCGATGCTGTTCAACCACT
+>HWI-EAS91_1_306UPAAXX:6:1:1526:1479
+TGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:985:1093
+TAACCGTCTTCTCGTTCTCTAAAAACCATTTTTCTT
+>HWI-EAS91_1_306UPAAXX:6:1:480:1378
+TCAACCTCAGCACTAACCTTGCGAGTCATTTCTTTG
+>HWI-EAS91_1_306UPAAXX:6:1:903:753
+TGTGGCCTGTTGATGCTAAAGGTGAGCCGCTTAAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1697:1737
+GGCGACCCTGTTTTGTATGGCAACTTGCCGCCGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:803:1037
+TGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCTACT
+>HWI-EAS91_1_306UPAAXX:6:1:1727:1244
+TTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:253:1162
+GCATTTAGTAGCGGTAAAGTTTGACCAAACCATTAT
+>HWI-EAS91_1_306UPAAXX:6:1:216:856
+GTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCT
+>HWI-EAS91_1_306UPAAXX:6:1:825:886
+TCCCACAAAGTCCAGCGTACCATAAACGCAAGCCTC
+>HWI-EAS91_1_306UPAAXX:6:1:1699:962
+TGATTTCGATTTTCTGACGAGTAACAAAGTTTGGAT
+>HWI-EAS91_1_306UPAAXX:6:1:1210:625
+TCAGATAGTAATCCACGCTCTTTTAAAATGTCAACA
+>HWI-EAS91_1_306UPAAXX:6:1:538:616
+TAAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGT
+>HWI-EAS91_1_306UPAAXX:6:1:184:1849
+GCTCACCTTTAGCATCAACAGGCCACAACCAACCAG
+>HWI-EAS91_1_306UPAAXX:6:1:1636:1103
+TATCTGACTTTTTGTTAACGTATTTAGCCACATAGA
+>HWI-EAS91_1_306UPAAXX:6:1:605:223
+GGTTATTTGAATATCTATAACAACTATTTTAAATCG
+>HWI-EAS91_1_306UPAAXX:6:1:256:1052
+GGTAAAGGACTTCTTGAAGGTACGTTGCAGTCTGGC
+>HWI-EAS91_1_306UPAAXX:6:1:300:1515
+GCCATGATGGTGGTTATTATACCGTCAAGGACTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1684:1320
+TGCTTGGCTTCCATAAGCAGATGGATAACCGCATCA
+>HWI-EAS91_1_306UPAAXX:6:1:1186:895
+TCAGATGGATACATCTGTCAACGCCGCTAATCAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1463:754
+TCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:808:1053
+TGTCCACGGAGTGCTTAATCCAACTTACCAAGCTGG
+>HWI-EAS91_1_306UPAAXX:6:1:960:1218
+TTTCTAATGTCGTCACTGATGCTGCTTCTGTTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:521:1646
+GGAAAACGAACAAGCGCAAGAGTAAACATAGTGCCA
+>HWI-EAS91_1_306UPAAXX:6:1:289:1885
+GCCAGCGATAACCGGAGTAGTTGAAATGGTAATAAG
+>HWI-EAS91_1_306UPAAXX:6:1:471:170
+GGTCAGTTCCATCAACATCATAGCCAGATGCCCAGA
+>HWI-EAS91_1_306UPAAXX:6:1:828:754
+TTTGCGTGACTATTTTCGTGATATTGTTCGTATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:924:1679
+TTTAATGTGACCGTTTATCGCAATCTGCCGACCACT
+>HWI-EAS91_1_306UPAAXX:6:1:837:901
+TGCATTTTAGTAAGCTCTTTTTGATTCTCAAATCCG
+>HWI-EAS91_1_306UPAAXX:6:1:543:16
+GCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1482:578
+TCTTTAGCTCCTAGACCTTTAGCAGCAAGGTCCATA
+>HWI-EAS91_1_306UPAAXX:6:1:1254:1668
+TTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCT
+>HWI-EAS91_1_306UPAAXX:6:1:1402:898
+TCATGAGTCAAGTTACTGAACAATCCGTACGTTTCC
+>HWI-EAS91_1_306UPAAXX:6:1:764:1534
+TTATACCGTCAAGGACTGTGTGACTATTGACGTCCT
+>HWI-EAS91_1_306UPAAXX:6:1:681:1079
+TGGCGAATAAGTACGCGTTCTTGCAAATCACCAGAA
+>HWI-EAS91_1_306UPAAXX:6:1:672:1350
+TTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1266:493
+TGACCAGCCGTTTGAGCTTGAGTAAGCATTTGGCGC
+>HWI-EAS91_1_306UPAAXX:6:1:118:238
+GACGGTATAATAACCACCATCATGGCGACCATTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:699:433
+TTATTGCCCGGCGTACGGGGAAGGACGTCAATAGTC
+>HWI-EAS91_1_306UPAAXX:6:1:708:1387
+TGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTAC
+>HWI-EAS91_1_306UPAAXX:6:1:498:1085
+TTATGATAATCCCAATGCTTTGCGTGACTATTTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:1101:1301
+TCCGTACGTTTCCAGACCGCTTTGGCCTCTATTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:261:213
+GAATGGTCGCCATGATGGTGGTTATTATACCGTCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1287:1267
+TGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:744:331
+TTAATGGATGAATTGGCACAATGCTACAATGTGCTC
+>HWI-EAS91_1_306UPAAXX:6:1:614:814
+TGTCAGCGTCATAAGAGGTTTTACCTCCAAATGAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1362:1063
+TAAACGCAAGCCTCAACGCAGCGACGAGCACGAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1238:1508
+TCAACTAACGATTCTGTCAAAAACTGACGCGTTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:904:1130
+TTATCGCAATCTGCCGACCACTCGCGATTCAATCAT
+>HWI-EAS91_1_306UPAAXX:6:1:465:216
+GACCATGCCGCTTTTCTTGGCACGATTAACCCTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:844:628
+TAATGTCAATAGATGTGGTAGAAGTCGTCATTTGGC
+>HWI-EAS91_1_306UPAAXX:6:1:684:1444
+TATCCCACAAAGTCCAGCGTACCATAAACGCAAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:515:1373
+TAAGTTCATGAAGGATGGTGTTAATGCCACTCCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:764:1667
+TTGAGTTCGATAATGGTGATATGTATGTTGACGTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1722:598
+TGAGTTTATTGCTGCCGTCATTGCTTATTATGTTCT
+>HWI-EAS91_1_306UPAAXX:6:1:670:1188
+TTCTGTCAAAAACTGACGCGTTGGATGAGGAGAAGT
+>HWI-EAS91_1_306UPAAXX:6:1:1682:1705
+TAGCCACATAGAAACCAACAGCCATATAACTGGTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1008:1616
+TCCTTTACTTGTCATGCGCTCTAATCTCTGTGCATC
+>HWI-EAS91_1_306UPAAXX:6:1:490:1220
+TAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCT
+>HWI-EAS91_1_306UPAAXX:6:1:891:1437
+TAATGGTGATATGTATGTTTACGTCCATAAGGCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1310:321
+TCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGT
+>HWI-EAS91_1_306UPAAXX:6:1:827:1597
+TGCGAGGTACTAAAGGCAAGCGTAAAGGCGCTCGTC
+>HWI-EAS91_1_306UPAAXX:6:1:1062:1158
+TAGAGTCAATAGCAAGGCCACGACGCAATGGAGAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1419:208
+TGGCGCATAATCTCGGAAACCTGCTGTTGCTTGGAA
+>HWI-EAS91_1_306UPAAXX:6:1:691:1018
+AAATATCAACCACACCAGAAGCAGCATCAGTGACGA
+>HWI-EAS91_1_306UPAAXX:6:1:374:113
+GATAAAGCTGTTGCCGATACTTGGAACAATTTCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1720:784
+TGAGGATAAATTATGTCTAATATTCAAACTGGCGCC
+>HWI-EAS91_1_306UPAAXX:6:1:1424:1394
+ATAAAAATGATTGGCGTATCCAACCTGCAGAGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1063:1760
+TAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1235:729
+TTTTTATGTCCCTCATCGTCACGTTTATGGTGAACA
+>HWI-EAS91_1_306UPAAXX:6:1:167:1507
+TAGTGTTATTAATATCAAGTTTTTGGAGCACATTGT
+>HWI-EAS91_1_306UPAAXX:6:1:717:1569
+TCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:610:765
+TTCAGCGCCTTCCATGATGAGACAGGCCGTTTGAAT
+>HWI-EAS91_1_306UPAAXX:6:1:663:380
+TAAACATTCTGTGCCGCGTTTCTTTGTTCCTTATCT
+>HWI-EAS91_1_306UPAAXX:6:1:790:1358
+TTATCACCTTATTGAAGGCTTATCATTCATTTAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:965:1633
+TAGATGTGGTAGAAGTCGTCATTTGGCGAGAAAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:673:319
+TTCTTGCAAATCACCAGAAGGCGGTTCCTGAATGAT
+>HWI-EAS91_1_306UPAAXX:6:1:684:371
+TAGCGGTAAAGTTAGACCAAACCATGAAACCAACAT
+>HWI-EAS91_1_306UPAAXX:6:1:1147:1444
+ATCTGTCCACGGAGTGCTTAATCCAACTTACCAAGC
+>HWI-EAS91_1_306UPAAXX:6:1:983:678
+ATACCTGGTCTTTCGTATTCTGGCGTGAAGTCGCCG
+>HWI-EAS91_1_306UPAAXX:6:1:1608:1119
+TCACGCGGCGGCAAGTTGCCATACAAAACAGGGTCG
+>HWI-EAS91_1_306UPAAXX:6:1:1048:1193
+TAGTCAGGTTAAATTTAATGTGACCGTTTATCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1265:1429
+ATATTTTTCATGGTATTGATAAAGCTGTTGCCGATT
+>HWI-EAS91_1_306UPAAXX:6:1:1607:1677
+TGTTGCTTGGAAAGATTGGTGTTTTCCATAATAGAC
+>HWI-EAS91_1_306UPAAXX:6:1:1087:1421
+ACGAACGTCAGAAGCAGCCTTATGGCCGTCAACATC
+>HWI-EAS91_1_306UPAAXX:6:1:324:490
+GCACCAAACATAAATCACCTCACTTAAGTGGCTGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1596:614
+TTACCGCTACTAAATGCCGCGGATTGGTTTCGCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:343:83
+GTTACGCAGTTTTGCCGCAAGCTGGCTGCTGTACGC
+>HWI-EAS91_1_306UPAAXX:6:1:203:667
+GCATGAATGTGCTTAATAGAGGCCAAGGCGGTCTAG
+>HWI-EAS91_1_306UPAAXX:6:1:34:480
+GGCAAGTTGCCATACAAAACAGGGTCGCCAGCAATT
+>HWI-EAS91_1_306UPAAXX:6:1:606:1743
+TAGCGACAGCTTGGTTTTTAGTGAGTTGTTCCATTC
+>HWI-EAS91_1_306UPAAXX:6:1:254:1391
+TATAATTACCCCAAAAAGAAAGGTATTAAGGATGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1568:1750
+TAACCAGTAGTGTTAACAGTCGGGAGAGGAGTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1538:869
+TACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCA
+>HWI-EAS91_1_306UPAAXX:6:1:255:38
+GTCAGGATTGACACCCTCCCAATTGTATGTTTTCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1543:1555
+TAAAACGCCTCTAATCGGTCGTCAGCCAACGTGAGG
+>HWI-EAS91_1_306UPAAXX:6:1:1365:733
+AGAATCAGCGGTATGGCTCTTCTCCTTTTTTCGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:1604:943
+TACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:1574:1632
+TCAGTATGCAAATTAGCATAAGCAGCTTGCAGACCC
+>HWI-EAS91_1_306UPAAXX:6:1:565:1799
+TCTTGGTCAGTATGCAAATTAGCATAAGCAGCTTGC
+>HWI-EAS91_1_306UPAAXX:6:1:1004:380
+TATTGACTCTACTGTAGACATTTTTACTTTTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1345:965
+ATTCAAAGGATAAACATCATAGGCAGTCGGGAGGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1704:756
+TGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGG
+>HWI-EAS91_1_306UPAAXX:6:1:310:1346
+TATAACGTTGACGATGTAGCTTTAGTTTTCTTTAAA
+>HWI-EAS91_1_306UPAAXX:6:1:900:1858
+TTTACCGCTTCGGCGTTATAACCTCACACTCAATCT
+>HWI-EAS91_1_306UPAAXX:6:1:1250:1741
+TAAATCCAAAACGGCAGAAGCCTGAATGAGCTTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:1170:1317
+TCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTC
+>HWI-EAS91_1_306UPAAXX:6:1:149:1896
+GCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATC
+>HWI-EAS91_1_306UPAAXX:6:1:1504:494
+TGTCTACAGTAGAGTCAATAGCAAGGCCACGACGCC
+>HWI-EAS91_1_306UPAAXX:6:1:395:256
+GTCCATATCTGACTTTTTGTTAACGTATTTATCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1110:1109
+ACCGCTTCGGCGTTATAACCTCACACTCAATCTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:895:649
+TTCTGCACGTAATTTTTGACGCACGTTTTCTTCTGC
+>HWI-EAS91_1_306UPAAXX:6:1:827:1378
+TGCAAGCTGCTTATGCTAATTTGCATACTGACCAAG
+>HWI-EAS91_1_306UPAAXX:6:1:1051:1587
+TTTGACACTCTCACGTTGGCTGACGACCGATTAGAG
+>HWI-EAS91_1_306UPAAXX:6:1:1656:1549
+AACCTGCTGTTGCTTGGAAAGATTGGTGTTTTCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:366:150
+GGTCAGTAGCAATCCAAACTTTGTTACTCGTCAGAA
+>HWI-EAS91_1_306UPAAXX:6:1:955:1792
+ATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTA
+>HWI-EAS91_1_306UPAAXX:6:1:1340:1403
+ATAAAATGCACCGCATGGAAATGAAGACGGCCATTA
+>HWI-EAS91_1_306UPAAXX:6:1:1693:1017
+TGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1099:1572
+AATTTTTACCGCTTCGGCGTTATAACCTCACACTCA
+>HWI-EAS91_1_306UPAAXX:6:1:218:1148
+TATGCAAATTAGCATAAGCAGCTTGCAGACCCATAT
+>HWI-EAS91_1_306UPAAXX:6:1:403:614
+TGGTGCTGATGCTTCCTCTGCTGGTATGGTTTACGC
+>HWI-EAS91_1_306UPAAXX:6:1:1651:646
+TCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGC
+>HWI-EAS91_1_306UPAAXX:6:1:1566:499
+TGCGGTGCATTTTATGCGGACACTTCCTACAGGTAG
+>HWI-EAS91_1_306UPAAXX:6:1:825:951
+ACAGGCCGTTTGAATGTTTACGGGGTGTACATAATA
+>HWI-EAS91_1_306UPAAXX:6:1:1745:1865
+TTAACTTCTGCGTCATGGAAGCGATAAAACTCTGCG
+>HWI-EAS91_1_306UPAAXX:6:1:973:1992
+TAGTAATTCCTGCTTTATCAAGATAATTTTTCGACT
+>HWI-EAS91_1_306UPAAXX:6:1:171:1653
+TAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:397:363
+TGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGT
+>HWI-EAS91_1_306UPAAXX:6:1:1336:1155
+ATATGTATGTTGACGGCCATAAGGCTGCTTCTGACG
+>HWI-EAS91_1_306UPAAXX:6:1:685:629
+AGTATGCAAATTAGCATAAGCAGCTTGCAGACCCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1409:510
+ACATAATAAGCAATGACGGCAGCAATAAACTCAACA
+>HWI-EAS91_1_306UPAAXX:6:1:1631:998
+AACCATCAGCATGAGCCTGTCGCATTGCATTCATCC
+>HWI-EAS91_1_306UPAAXX:6:1:260:1698
+TTATTATGTTCATCCCGTCAACATTCAAACTGCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:578:971
+TTAACGCTACTAAATTCCGCGGATTGGTTTCGTTGT
+>HWI-EAS91_1_306UPAAXX:6:1:1613:642
+ATAGAAATTTCACGCGGCGGCAAGTTGCCATACAAA
+>HWI-EAS91_1_306UPAAXX:6:1:237:650
+GACGGTATAATAACCACCATCATGGCGACCATTCAA
+>HWI-EAS91_1_306UPAAXX:6:1:1100:1875
+TTATGGTTCGTTCTTATTACCCTTCTGAATGTCACG
+>HWI-EAS91_1_306UPAAXX:6:1:352:32
+GTACCATAAACGCAAGCCTCAACGCAGCGACGAGCC
+>HWI-EAS91_1_306UPAAXX:6:1:443:229
+GCAGTAGGCGGAAAACGAACAAGCGCAAGAGTAAAC
+>HWI-EAS91_1_306UPAAXX:6:1:1131:731
+AGCAGTCGGCGTGTGAATCATTAGCCTTGCGACCCT
+>HWI-EAS91_1_306UPAAXX:6:1:133:1089
+AAGGTTAGTGCTGAGGTTGACTTAGTTCATCATCAA
+>HWI-EAS91_1_306UPAAXX:6:1:65:1307
+TGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:905:1493
+TCAGCTTTACCGTCTTTCCAGAAATTGTTCCAAGTT
+>HWI-EAS91_1_306UPAAXX:6:1:733:540
+TGCCGCGGATTGGTTTCGCTGAATCAGGTTATTAAT
+>HWI-EAS91_1_306UPAAXX:6:1:161:1707
+TAATGTCGTCACTGATGCTGCTTCTGTTGTTGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:380:1861
+TCTGGGCATCTGGCTATGATGTTGATGGAACTGACC
+>HWI-EAS91_1_306UPAAXX:6:1:1761:566
+TATTGGTCGTATGGTTCTTGCTGCCGAGGGTCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:1486:651
+TGGCGGTATTTCTTCTTCTCTTTCTTGTTGCGCCCT
+>HWI-EAS91_1_306UPAAXX:6:1:508:1380
+TCCATCAACATCATAGCCAGATGCCCAGAGATTAGA
+>HWI-EAS91_1_306UPAAXX:6:1:1763:855
+TGTTTTGTATGGCAACTTGCCGCCGCGTGAAATTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1553:553
+TAATTGCAGGGGCTTCGGCCCCTTACTTGAGGATAA
+>HWI-EAS91_1_306UPAAXX:6:1:1424:507
+TCCACTGCAACAACTGAACGGACTGGAAACACTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:988:135
+TAAGCTGGTTCTCACTTCTGTTACTCCAGCTTCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:810:1918
+TTTTCATCCCGAAGTTGCGGCTCATTCTGATTCTGT
+>HWI-EAS91_1_306UPAAXX:6:1:588:559
+TCTGGTTGAACGGCGTCGCGTCGTAACCCAGCTTGG
+>HWI-EAS91_1_306UPAAXX:6:1:1264:1214
+ATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCAC
+>HWI-EAS91_1_306UPAAXX:6:1:1000:1475
+TCCGTTCTGGTGATTCGTCTAAGAAGTTTAAGATTG
+>HWI-EAS91_1_306UPAAXX:6:1:1389:160
+TTATTCGCCACCATGATTATGACCAGTGTTTCCAGT
+>HWI-EAS91_1_306UPAAXX:6:1:422:1296
+TTCAACTACTCCGGTTATCGCTGGCGACTCCTTCGT
+>HWI-EAS91_1_306UPAAXX:6:1:1273:856
+TAGCCATAGCACCAGAAACAAAACTAGGGGCGGCCT
+>HWI-EAS91_1_306UPAAXX:6:1:450:969
+TGTTTTCCATAATAGACGCAACGCGAGCAGTAGACT
+>HWI-EAS91_1_306UPAAXX:6:1:1202:828
+ATCGTCAACGTTATATTTTGATAGTTTGACGTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1721:1800
+GGGTTAGGGACATTAGAGCCTTGACTGACTGAGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:228:2019
+TTGAGTAAGCATTTGGCGCATAATCTCGGAAACCTG
+>HWI-EAS91_1_306UPAAXX:6:1:1579:1214
+ACGTTTGGTCAGTTCCATCAACATCATAGCCAGATG
+>HWI-EAS91_1_306UPAAXX:6:1:429:1055
+TTTTGCCTGTTTGGTTCGCTTTGAGTCTTCTTCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1359:1689
+AAGAGCAGAAGCAATACCGCCAGCAATAGCACCAAA
+>HWI-EAS91_1_306UPAAXX:6:1:1474:1056
+TCCTCCTGAGACTGAGCTTTCTCGCCAAATGACGAC
+>HWI-EAS91_1_306UPAAXX:6:1:105:1818
+TTGGGGATTGAGAAAGAGTAGAAATGCCACAAGCCT
+>HWI-EAS91_1_306UPAAXX:6:1:208:1538
+TAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
+>HWI-EAS91_1_306UPAAXX:6:1:1361:1623
+AATCCGTACGTTTCCAGACCGCTTTGGCCTCTATTA
+>HWI-EAS91_1_306UPAAXX:6:1:595:1670
+TGAATCTCTTTAGTCGCAGTAGGCGGAAAACGAACA
+>HWI-EAS91_1_306UPAAXX:6:1:6:1885
+TCTAATGTCGTCACTGATGCTGCTTCTGGTGTGTTT
+>HWI-EAS91_1_306UPAAXX:6:1:706:1085
+TGGTTCGTTCTTATTACCCTTCTGAATGTCACGCTG
+>HWI-EAS91_1_306UPAAXX:6:1:1307:825
+AGCGGTAAAGTTAGACCAAACCATGAAACCAACATA
+>HWI-EAS91_1_306UPAAXX:6:1:762:802
+TGGCATTAACACCATCCTTCATGAACTTAATCCACT
+>HWI-EAS91_1_306UPAAXX:6:1:1657:506
+TTGCGACCCTCGGCAGCAAGAACCATACGACCAATT
+>HWI-EAS91_1_306UPAAXX:6:1:184:811
+TTCTGATAAGCTGGTTCTCACTTCTGTTACTCCAGC
+>HWI-EAS91_1_306UPAAXX:6:1:1469:1718
+TGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCTTT
+>HWI-EAS91_1_306UPAAXX:6:1:815:1640
+TGGCGGCGATTGCGTACCCGACGACCCAAATTAGGG
+>HWI-EAS91_1_306UPAAXX:6:1:1580:1388
+AAGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTG
+>HWI-EAS91_1_306UPAAXX:6:1:1617:1554
+TACGGGGAAGGACGTCAATAGTCACACAGTCCTTGA
+>HWI-EAS91_1_306UPAAXX:6:1:1544:431
+TGATGCTAAAGGTGAGCCGCTTAAAGCTACCAGTTA
+>HWI-EAS91_1_306UPAAXX:6:1:1604:1541
+TCAGTGACGACATTAGAAATATCCTTTGCAGTAGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1485:741
+ATCAAACGCTGAATAGTAAAGCCTCTACGCGATTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1226:393
+TGCCACAAGCCTCAATAGCAGGTTTAAGAGCCTCGA
+>HWI-EAS91_1_306UPAAXX:6:1:1506:973
+ATTAGGGTCAACGCTACCTGTAGGAAGTGTCCGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:890:1838
+TGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGC
+>HWI-EAS91_1_306UPAAXX:6:1:453:1527
+TAAGAGGGCGTTCAGCAGCCAGCTTGCGGCAAAACT
+>HWI-EAS91_1_306UPAAXX:6:1:1056:570
+ACATTGTAGCATTGTGCCAATTCATCCATTAACTTC
+>HWI-EAS91_1_306UPAAXX:6:1:1736:74
+TATCCGAAAGTGTTAACTTCTGCGTCATGGAAGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:169:1896
+GTATGCAAATTAGCATAAGCAGCTTGCAGACCCATA
+>HWI-EAS91_1_306UPAAXX:6:1:259:949
+TGAGGATAAATTATGTCTAATATTCAAACTTGCTCC
+>HWI-EAS91_1_306UPAAXX:6:1:1205:893
+ATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCT
+>HWI-EAS91_1_306UPAAXX:6:1:732:1335
+TACTCGTGATTATCTTGCTGCTGCATTTCCTGAGCT
+>HWI-EAS91_1_306UPAAXX:6:1:667:664
+TCTGAGTCCGATGCTGTTCAACCACTAATAGGTAAG
+>HWI-EAS91_1_306UPAAXX:6:1:535:587
+TTAGAGGCGTTTTATGATAATCCCAATGCTTTGCGT
+>HWI-EAS91_1_306UPAAXX:6:1:412:446
+GTGTGGTTGATATTTTTCATGGTATTGATAAAGCTT
+>HWI-EAS91_1_306UPAAXX:6:1:507:1599
+TTGCTGGCGGTTTTTCTTTTTTTTTTTTTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:575:1818
+TAAAATGCACCGCATGGAAATGAAGACGGCCATTAG
+>HWI-EAS91_1_306UPAAXX:6:1:1568:1428
+ACCAGTTATATGGCTGGTTGTTTTTTTTTTTTTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:1417:982
+AACAAGAGAATCTCTACCATGAACAAAATGTGACTC
+>HWI-EAS91_1_306UPAAXX:6:1:280:1340
+GGCCAAACCAGTGGCGATGGCCGCGCTGGAGGTTTT
+>HWI-EAS91_1_306UPAAXX:6:1:966:144
+TACTAAATGCCGCGGATTGGTTTCGCTGAATCAGGT
+>HWI-EAS91_1_306UPAAXX:6:1:1391:1987
+TAATAATGTTTTCCGTAAATTCAGCGCCTTCCATGT
+>HWI-EAS91_1_306UPAAXX:6:1:1280:278
+ATGGAAATGAAGACGGCCATTAGCTGTACCATACTC
+>HWI-EAS91_1_306UPAAXX:6:1:631:858
+TGATATTGGTCGTATGGTTCTTGCTTCCGTGGGTCT
+>HWI-EAS91_1_306UPAAXX:6:1:518:573
+TTAGGTGTCTGTAAAACAGGTGCCGAAGAAGCTGGT
+>HWI-EAS91_1_306UPAAXX:6:1:54:981
+TTGACATTTTAAAAGAGCGTGGATTACTATCTGATT
+>HWI-EAS91_1_306UPAAXX:6:1:218:1165
+TATTGACTCTACTGTAGACATTTTTACTTTTTATTT
+>HWI-EAS91_1_306UPAAXX:6:1:1727:1530
+TCAACGCAGCGACGAGCACGAGAGCGGTCAGTAGCA
+>HWI-EAS91_1_306UPAAXX:6:1:519:657
+TGAACAGCATCGGACTCAGATAGTAATCCACGCTCT
+>HWI-EAS91_1_306UPAAXX:6:1:939:967
+ATACCGTCAAGGACTGTGTGACTATTGACGTCCTTC
+>HWI-EAS91_1_306UPAAXX:6:1:299:1060
+TATAACTGGTAGCTTTAAGCGGCTCACCTTTAGCAT
+>HWI-EAS91_1_306UPAAXX:6:1:438:665
+TAATTCGTAAACAAGCAGTAGTAATTCCTGCTTTAT
+>HWI-EAS91_1_306UPAAXX:6:1:1303:1971
+AGCATTGTGCCAATTCATCCATTAACTTCTCAGTAA
+>HWI-EAS91_1_306UPAAXX:6:1:214:1264
+TCAGCACCAACAGAAACAACCTGATTAGCGGCGTTG
+>HWI-EAS91_1_306UPAAXX:6:1:1454:1423
+AACGGAAAACATCCTTCATAGAAATTTCACGCGGCG
+>HWI-EAS91_1_306UPAAXX:6:1:1633:340
+TTCCATAATAGACGCAACGCGAGCAGTAGACTCCTT
+>HWI-EAS91_1_306UPAAXX:6:1:671:1196
+ATACGAAAAGACAGAATCTCTTCCAAGAGCTTGATG
diff -r b6ff467f4522 -r 26825f08d362 test-data/phiX.fa
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/phiX.fa Sun Sep 14 14:58:50 2008 -0400
@@ -0,0 +1,79 @@
+>phiX
+GAGTTTTATCGCTTCCATGACGCAGAAGTTAACACTTTCGGATATTTCTGATGAGTCGAAAAATTATCTT
+GATAAAGCAGGAATTACTACTGCTTGTTTACGAATTAAATCGAAGTGGACTGCTGGCGGAAAATGAGAAA
+ATTCGACCTATCCTTGCGCAGCTCGAGAAGCTCTTACTTTGCGACCTTTCGCCATCAACTAACGATTCTG
+TCAAAAACTGACGCGTTGGATGAGGAGAAGTGGCTTAATATGCTTGGCACGTTCGTCAAGGACTGGTTTA
+GATATGAGTCACATTTTGTTCATGGTAGAGATTCTCTTGTTGACATTTTAAAAGAGCGTGGATTACTATC
+TGAGTCCGATGCTGTTCAACCACTAATAGGTAAGAAATCATGAGTCAAGTTACTGAACAATCCGTACGTT
+TCCAGACCGCTTTGGCCTCTATTAAGCTCATTCAGGCTTCTGCCGTTTTGGATTTAACCGAAGATGATTT
+CGATTTTCTGACGAGTAACAAAGTTTGGATTGCTACTGACCGCTCTCGTGCTCGTCGCTGCGTTGAGGCT
+TGCGTTTATGGTACGCTGGACTTTGTGGGATACCCTCGCTTTCCTGCTCCTGTTGAGTTTATTGCTGCCG
+TCATTGCTTATTATGTTCATCCCGTCAACATTCAAACGGCCTGTCTCATCATGGAAGGCGCTGAATTTAC
+GGAAAACATTATTAATGGCGTCGAGCGTCCGGTTAAAGCCGCTGAATTGTTCGCGTTTACCTTGCGTGTA
+CGCGCAGGAAACACTGACGTTCTTACTGACGCAGAAGAAAACGTGCGTCAAAAATTACGTGCGGAAGGAG
+TGATGTAATGTCTAAAGGTAAAAAACGTTCTGGCGCTCGCCCTGGTCGTCCGCAGCCGTTGCGAGGTACT
+AAAGGCAAGCGTAAAGGCGCTCGTCTTTGGTATGTAGGTGGTCAACAATTTTAATTGCAGGGGCTTCGGC
+CCCTTACTTGAGGATAAATTATGTCTAATATTCAAACTGGCGCCGAGCGTATGCCGCATGACCTTTCCCA
+TCTTGGCTTCCTTGCTGGTCAGATTGGTCGTCTTATTACCATTTCAACTACTCCGGTTATCGCTGGCGAC
+TCCTTCGAGATGGACGCCGTTGGCGCTCTCCGTCTTTCTCCATTGCGTCGTGGCCTTGCTATTGACTCTA
+CTGTAGACATTTTTACTTTTTATGTCCCTCATCGTCACGTTTATGGTGAACAGTGGATTAAGTTCATGAA
+GGATGGTGTTAATGCCACTCCTCTCCCGACTGTTAACACTACTGGTTATATTGACCATGCCGCTTTTCTT
+GGCACGATTAACCCTGATACCAATAAAATCCCTAAGCATTTGTTTCAGGGTTATTTGAATATCTATAACA
+ACTATTTTAAAGCGCCGTGGATGCCTGACCGTACCGAGGCTAACCCTAATGAGCTTAATCAAGATGATGC
+TCGTTATGGTTTCCGTTGCTGCCATCTCAAAAACATTTGGACTGCTCCGCTTCCTCCTGAGACTGAGCTT
+TCTCGCCAAATGACGACTTCTACCACATCTATTGACATTATGGGTCTGCAAGCTGCTTATGCTAATTTGC
+ATACTGACCAAGAACGTGATTACTTCATGCAGCGTTACCATGATGTTATTTCTTCATTTGGAGGTAAAAC
+CTCTTATGACGCTGACAACCGTCCTTTACTTGTCATGCGCTCTAATCTCTGGGCATCTGGCTATGATGTT
+GATGGAACTGACCAAACGTCGTTAGGCCAGTTTTCTGGTCGTGTTCAACAGACCTATAAACATTCTGTGC
+CGCGTTTCTTTGTTCCTGAGCATGGCACTATGTTTACTCTTGCGCTTGTTCGTTTTCCGCCTACTGCGAC
+TAAAGAGATTCAGTACCTTAACGCTAAAGGTGCTTTGACTTATACCGATATTGCTGGCGACCCTGTTTTG
+TATGGCAACTTGCCGCCGCGTGAAATTTCTATGAAGGATGTTTTCCGTTCTGGTGATTCGTCTAAGAAGT
+TTAAGATTGCTGAGGGTCAGTGGTATCGTTATGCGCCTTCGTATGTTTCTCCTGCTTATCACCTTCTTGA
+AGGCTTCCCATTCATTCAGGAACCGCCTTCTGGTGATTTGCAAGAACGCGTACTTATTCGCCACCATGAT
+TATGACCAGTGTTTCCAGTCCGTTCAGTTGTTGCAGTGGAATAGTCAGGTTAAATTTAATGTGACCGTTT
+ATCGCAATCTGCCGACCACTCGCGATTCAATCATGACTTCGTGATAAAAGATTGAGTGTGAGGTTATAAC
+GCCGAAGCGGTAAAAATTTTAATTTTTGCCGCTGAGGGGTTGACCAAGCGAAGCGCGGTAGGTTTTCTGC
+TTAGGAGTTTAATCATGTTTCAGACTTTTATTTCTCGCCATAATTCAAACTTTTTTTCTGATAAGCTGGT
+TCTCACTTCTGTTACTCCAGCTTCTTCGGCACCTGTTTTACAGACACCTAAAGCTACATCGTCAACGTTA
+TATTTTGATAGTTTGACGGTTAATGCTGGTAATGGTGGTTTTCTTCATTGCATTCAGATGGATACATCTG
+TCAACGCCGCTAATCAGGTTGTTTCTGTTGGTGCTGATATTGCTTTTGATGCCGACCCTAAATTTTTTGC
+CTGTTTGGTTCGCTTTGAGTCTTCTTCGGTTCCGACTACCCTCCCGACTGCCTATGATGTTTATCCTTTG
+AATGGTCGCCATGATGGTGGTTATTATACCGTCAAGGACTGTGTGACTATTGACGTCCTTCCCCGTACGC
+CGGGCAATAACGTTTATGTTGGTTTCATGGTTTGGTCTAACTTTACCGCTACTAAATGCCGCGGATTGGT
+TTCGCTGAATCAGGTTATTAAAGAGATTATTTGTCTCCAGCCACTTAAGTGAGGTGATTTATGTTTGGTG
+CTATTGCTGGCGGTATTGCTTCTGCTCTTGCTGGTGGCGCCATGTCTAAATTGTTTGGAGGCGGTCAAAA
+AGCCGCCTCCGGTGGCATTCAAGGTGATGTGCTTGCTACCGATAACAATACTGTAGGCATGGGTGATGCT
+GGTATTAAATCTGCCATTCAAGGCTCTAATGTTCCTAACCCTGATGAGGCCGCCCCTAGTTTTGTTTCTG
+GTGCTATGGCTAAAGCTGGTAAAGGACTTCTTGAAGGTACGTTGCAGGCTGGCACTTCTGCCGTTTCTGA
+TAAGTTGCTTGATTTGGTTGGACTTGGTGGCAAGTCTGCCGCTGATAAAGGAAAGGATACTCGTGATTAT
+CTTGCTGCTGCATTTCCTGAGCTTAATGCTTGGGAGCGTGCTGGTGCTGATGCTTCCTCTGCTGGTATGG
+TTGACGCCGGATTTGAGAATCAAAAAGAGCTTACTAAAATGCAACTGGACAATCAGAAAGAGATTGCCGA
+GATGCAAAATGAGACTCAAAAAGAGATTGCTGGCATTCAGTCGGCGACTTCACGCCAGAATACGAAAGAC
+CAGGTATATGCACAAAATGAGATGCTTGCTTATCAACAGAAGGAGTCTACTGCTCGCGTTGCGTCTATTA
+TGGAAAACACCAATCTTTCCAAGCAACAGCAGGTTTCCGAGATTATGCGCCAAATGCTTACTCAAGCTCA
+AACGGCTGGTCAGTATTTTACCAATGACCAAATCAAAGAAATGACTCGCAAGGTTAGTGCTGAGGTTGAC
+TTAGTTCATCAGCAAACGCAGAATCAGCGGTATGGCTCTTCTCATATTGGCGCTACTGCAAAGGATATTT
+CTAATGTCGTCACTGATGCTGCTTCTGGTGTGGTTGATATTTTTCATGGTATTGATAAAGCTGTTGCCGA
+TACTTGGAACAATTTCTGGAAAGACGGTAAAGCTGATGGTATTGGCTCTAATTTGTCTAGGAAATAACCG
+TCAGGATTGACACCCTCCCAATTGTATGTTTTCATGCCTCCAAATCTTGGAGGCTTTTTTATGGTTCGTT
+CTTATTACCCTTCTGAATGTCACGCTGATTATTTTGACTTTGAGCGTATCGAGGCTCTTAAACCTGCTAT
+TGAGGCTTGTGGCATTTCTACTCTTTCTCAATCCCCAATGCTTGGCTTCCATAAGCAGATGGATAACCGC
+ATCAAGCTCTTGGAAGAGATTCTGTCTTTTCGTATGCAGGGCGTTGAGTTCGATAATGGTGATATGTATG
+TTGACGGCCATAAGGCTGCTTCTGACGTTCGTGATGAGTTTGTATCTGTTACTGAGAAGTTAATGGATGA
+ATTGGCACAATGCTACAATGTGCTCCCCCAACTTGATATTAATAACACTATAGACCACCGCCCCGAAGGG
+GACGAAAAATGGTTTTTAGAGAACGAGAAGACGGTTACGCAGTTTTGCCGCAAGCTGGCTGCTGAACGCC
+CTCTTAAGGATATTCGCGATGAGTATAATTACCCCAAAAAGAAAGGTATTAAGGATGAGTGTTCAAGATT
+GCTGGAGGCCTCCACTATGAAATCGCGTAGAGGCTTTGCTATTCAGCGTTTGATGAATGCAATGCGACAG
+GCTCATGCTGATGGTTGGTTTATCGTTTTTGACACTCTCACGTTGGCTGACGACCGATTAGAGGCGTTTT
+ATGATAATCCCAATGCTTTGCGTGACTATTTTCGTGATATTGGTCGTATGGTTCTTGCTGCCGAGGGTCG
+CAAGGCTAATGATTCACACGCCGACTGCTATCAGTATTTTTGTGTGCCTGAGTATGGTACAGCTAATGGC
+CGTCTTCATTTCCATGCGGTGCACTTTATGCGGACACTTCCTACAGGTAGCGTTGACCCTAATTTTGGTC
+GTCGGGTACGCAATCGCCGCCAGTTAAATAGCTTGCAAAATACGTGGCCTTATGGTTACAGTATGCCCAT
+CGCAGTTCGCTACACGCAGGACGCTTTTTCACGTTCTGGTTGGTTGTGGCCTGTTGATGCTAAAGGTGAG
+CCGCTTAAAGCTACCAGTTATATGGCTGTTGGTTTCTATGTGGCTAAATACGTTAACAAAAAGTCAGATA
+TGGACCTTGCTGCTAAAGGTCTAGGAGCTAAAGAATGGAACAACTCACTAAAAACCAAGCTGTCGCTACT
+TCCCAAGAAGCTGTTCAGAATCAGAATGAGCCGCAACTTCGGGATGAAAATGCTCACAATGACAAATCTG
+TCCACGGAGTGCTTAATCCAACTTACCAAGCTGGGTTACGACGCGACGCCGTTCAACCAGATATTGAAGC
+AGAACGCAAAAAGAGAGATGAGATTGAGGCTGGGAAAAGTTACTGTAGCCGACGTTTTGGCGGCGCAACC
+TGTGACGACAAATCTGCTCAAATTTATGCGCGCTTCGATAAAAATGATTGGCGTATCCAACCTGCA
+

[hg] galaxy 1505: Update MAF stitcher to be more efficient. Requ...
by greg@scofield.bx.psu.edu 22 Sep '08
details: http://www.bx.psu.edu/hg/galaxy/rev/b6ff467f4522
changeset: 1505:b6ff467f4522
user: Dan Blankenberg <dan(a)bx.psu.edu>
date: Fri Sep 12 15:50:20 2008 -0400
description:
Update MAF stitcher to be more efficient. Requires bx-python rev>=449.
2 file(s) affected in this change:
eggs.ini
lib/galaxy/tools/util/maf_utilities.py
diffs (188 lines):
diff -r 4e2ed1801931 -r b6ff467f4522 eggs.ini
--- a/eggs.ini Fri Sep 12 15:35:50 2008 -0400
+++ b/eggs.ini Fri Sep 12 15:50:20 2008 -0400
@@ -55,12 +55,12 @@
MySQL_python = _5.0.51a_static
python_lzo = _static
flup = .dev_r2311
-bx_python = _dev_r448
+bx_python = _dev_r449
nose = .dev_r101
; source location, necessary for scrambling
[source]
-bx_python = http://dist.g2.bx.psu.edu/bx-python_dist-r448.tar.bz2
+bx_python = http://dist.g2.bx.psu.edu/bx-python_dist-r449.tar.bz2
Cheetah = http://umn.dl.sourceforge.net/sourceforge/cheetahtemplate/Cheetah-1.0.tar.gz
DRMAA_python = http://gridengine.sunsource.net/files/documents/7/36/DRMAA-python-0.2.tar.gz
MySQL_python = http://superb-west.dl.sourceforge.net/sourceforge/mysql-python/MySQL-python… http://mysql.mirrors.pair.com/Downloads/MySQL-5.0/mysql-5.0.51a.tar.gz
diff -r 4e2ed1801931 -r b6ff467f4522 lib/galaxy/tools/util/maf_utilities.py
--- a/lib/galaxy/tools/util/maf_utilities.py Fri Sep 12 15:35:50 2008 -0400
+++ b/lib/galaxy/tools/util/maf_utilities.py Fri Sep 12 15:50:20 2008 -0400
@@ -54,11 +54,15 @@
#sets a position for a species
def set_position( self, index, species, base ):
+ if len( base ) != 1: raise "A genomic position can only have a length of 1."
+ return self.set_range( index, species, base )
+ #sets a range for a species
+ def set_range( self, index, species, bases ):
if index >= self.size or index < 0: raise "Your index (%i) is out of range (0 - %i)." % ( index, self.size - 1 )
- if len(base) != 1: raise "A genomic position can only have a length of 1."
+ if len( bases ) == 0: raise "A set of genomic positions can only have a positive length."
if species not in self.sequences.keys(): self.add_species( species )
self.sequences[species].seek( index )
- self.sequences[species].write( base )
+ self.sequences[species].write( bases )
#Flush temp file of specified species, or all species
def flush( self, species = None ):
@@ -164,32 +168,40 @@
except:
return ( None, None )
+def chop_block_by_region( block, src, region, species = None, mincols = 0, force_strand = None ):
+ ref = block.get_component_by_src( src )
+ #We want our block coordinates to be from positive strand
+ if ref.strand == "-":
+ block = block.reverse_complement()
+ ref = block.get_component_by_src( src )
+
+ #save old score here for later use
+ old_score = block.score
+ slice_start = max( region.start, ref.start )
+ slice_end = min( region.end, ref.end )
+
+ #slice block by reference species at determined limits
+ block = block.slice_by_component( ref, slice_start, slice_end )
+
+ if block.text_size > mincols:
+ if ( force_strand is None and region.strand != ref.strand ) or ( force_strand is not None and force_strand != ref.strand ):
+ block = block.reverse_complement()
+ # restore old score, may not be accurate, but it is better than 0 for everything
+ block.score = old_score
+ if species is not None:
+ block = block.limit_to_species( species )
+ block.remove_all_gap_columns()
+ return block
+ return None
#generator yielding only chopped and valid blocks for a specified region
def get_chopped_blocks_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
- for block in index.get_as_iterator( src, region.start, region.end ):
- ref = block.get_component_by_src( src )
- #We want our block coordinates to be from positive strand
- if ref.strand == "-":
- block = block.reverse_complement()
- ref = block.get_component_by_src( src )
-
- #save old score here for later use
- old_score = block.score
- slice_start = max( region.start, ref.start )
- slice_end = min( region.end, ref.end )
-
- #slice block by reference species at determined limits
- block = block.slice_by_component( ref, slice_start, slice_end )
-
- if block.text_size > mincols:
- if ( force_strand is None and region.strand != ref.strand ) or ( force_strand is not None and force_strand != ref.strand ):
- block = block.reverse_complement()
- # restore old score, may not be accurate, but it is better than 0 for everything
- block.score = old_score
- if species is not None:
- block = block.limit_to_species( species )
- block.remove_all_gap_columns()
- yield block
+ for block, idx, offset in get_chopped_blocks_with_index_offset_for_region( index, src, region, species, mincols, force_strand ):
+ yield block
+def get_chopped_blocks_with_index_offset_for_region( index, src, region, species = None, mincols = 0, force_strand = None ):
+ for block, idx, offset in index.get_as_iterator_with_index_and_offset( src, region.start, region.end ):
+ block = chop_block_by_region( block, src, region, species, mincols )
+ if block is not None:
+ yield block, idx, offset
#returns a filled region alignment for specified regions
def get_region_alignment( index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
@@ -199,46 +211,51 @@
#fills a region alignment
def fill_region_alignment( alignment, index, primary_species, chrom, start, end, strand = '+', species = None, mincols = 0 ):
- #first step through blocks, save index and score in array, then order by score (array will start as 0=index0,scoreX)
- #step through ordered list, step through maf blocks, stopping at index, store, then break inner loop
region = bx.intervals.Interval( start, end )
region.chrom = chrom
region.strand = strand
primary_src = "%s.%s" % ( primary_species, chrom )
-
+
+ def reduce_block_by_primary_genome( block ):
+ #returns ( startIndex, {species:texts}
+ #where texts' contents are reduced to only positions existing in the primary genome
+ ref = block.get_component_by_src( primary_src )
+ start_offset = ref.start - start
+ species_texts = {}
+ for c in block.components:
+ species_texts[ c.src.split( '.' )[0] ] = list( c.text )
+ #remove locations which are gaps in the primary species, starting from the downstream end
+ for i in range( len( species_texts[ primary_species ] ) - 1, -1, -1 ):
+ if species_texts[ primary_species ][i] == '-':
+ for text in species_texts.values():
+ text.pop( i )
+ for spec, text in species_texts.items():
+ species_texts[spec] = ''.join( text )
+ return ( start_offset, species_texts )
+
#Order blocks overlaping this position by score, lowest first
- blocks_order = []
- for i, block in enumerate( get_chopped_blocks_for_region( index, primary_src, region, species, mincols ) ):
- for j in range( 0, len( blocks_order ) ):
- if float( block.score ) < float( blocks_order[j]['score'] ):
- blocks_order.insert( j, {'index':i, 'score':block.score} )
+ blocks = []
+ for block, idx, offset in index.get_as_iterator_with_index_and_offset( primary_src, start, end ):
+ score = float( block.score )
+ for i in range( 0, len( blocks ) ):
+ if score < blocks[i][0]:
+ blocks.insert( i, ( score, idx, offset ) )
break
else:
- blocks_order.append( {'index':i, 'score':block.score} )
+ blocks.append( ( score, idx, offset ) )
- #Loop through ordered block indexes and layer blocks by score
- for block_dict in blocks_order:
- for block_index, block in enumerate( get_chopped_blocks_for_region( index, primary_src, region, species, mincols ) ):
- if block_index == block_dict['index']:
- ref = block.get_component_by_src( primary_src )
- #skip gap locations due to insertions in secondary species relative to primary species
- start_offset = ref.start - start
- num_gaps = 0
- for i in range( len( ref.text.rstrip().rstrip("-") ) ):
- if ref.text[i] in ["-"]:
- num_gaps += 1
- continue
- #Set base for all species
- for spec in [ c.src.split( '.' )[0] for c in block.components ]:
- try:
- #NB: If a gap appears in higher scoring secondary species block,
- #it will overwrite any bases that have been set by lower scoring blocks
- #this seems more proper than allowing, e.g. a single base from lower scoring alignment to exist outside of its genomic context
- alignment.set_position( start_offset + i - num_gaps, spec, block.get_component_by_src_start( spec ).text[i] )
- except:
- #species/sequence for species does not exist
- pass
- break
+ #Loop through ordered blocks and layer by increasing score
+ for block_dict in blocks:
+ block = chop_block_by_region( block_dict[1].get_at_offset( block_dict[2] ), primary_src, region, species, mincols, strand )
+ if block is None: continue
+ start_offset, species_texts = reduce_block_by_primary_genome( block )
+ for spec, text in species_texts.items():
+ try:
+ alignment.set_range( start_offset, spec, text )
+ except:
+ #species/sequence for species does not exist
+ pass
+
return alignment
#returns a filled spliced region alignment for specified region with start and end lists
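The two ideas in the changeset above (dropping alignment columns that are gaps in the primary species, then layering blocks so that higher-scoring text overwrites lower-scoring text) can be sketched in isolation. This is only an illustration in modern Python under simplifying assumptions: it does not use bx-python, the function names drop_primary_gap_columns and layer_by_score are hypothetical, and blocks are modelled as plain strings rather than MAF components.

```python
def drop_primary_gap_columns(species_texts, primary_species, gap="-"):
    """Keep only the alignment columns where the primary species has a base.

    species_texts maps a species name to its aligned text; all texts are
    assumed to have the same length (an assumption of this sketch).
    """
    primary = species_texts[primary_species]
    keep = [i for i, base in enumerate(primary) if base != gap]
    return {
        spec: "".join(text[i] for i in keep)
        for spec, text in species_texts.items()
    }


def layer_by_score(size, pieces, fill="-"):
    """Write (score, start_offset, text) pieces into a buffer of length size.

    Pieces are applied lowest score first, so a higher-scoring piece
    overwrites whatever a lower-scoring piece wrote at the same positions.
    """
    out = [fill] * size
    for score, start, text in sorted(pieces, key=lambda piece: piece[0]):
        if not 0 <= start < size:
            continue  # this sketch simply skips out-of-range pieces
        end = min(size, start + len(text))
        out[start:end] = text[: end - start]
    return "".join(out)


block = {"hg18": "AC-GT", "mm9": "ACTG-"}
print(drop_primary_gap_columns(block, "hg18"))
print(layer_by_score(6, [(50.0, 0, "AAAA"), (10.0, 2, "CCCC")]))
```

Writing a whole run of bases per block, rather than one base per call, is the same design choice the changeset makes by adding set_range alongside set_position.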
details: http://www.bx.psu.edu/hg/galaxy/rev/4e2ed1801931
changeset: 1504:4e2ed1801931
user: Anton Nekrutenko <anton(a)bx.psu.edu>
date: Fri Sep 12 15:35:50 2008 -0400
description:
Typos
1 file(s) affected in this change:
tools/sr_mapping/lastz_wrapper.xml
diffs (17 lines):
diff -r 777e41dbdf1f -r 4e2ed1801931 tools/sr_mapping/lastz_wrapper.xml
--- a/tools/sr_mapping/lastz_wrapper.xml Fri Sep 12 15:14:20 2008 -0400
+++ b/tools/sr_mapping/lastz_wrapper.xml Fri Sep 12 15:35:50 2008 -0400
@@ -216,11 +216,11 @@
**Full Parameter List**
-The modes gives you a fuller control over lastz. The description of these and other parameters is found at the end of this page. Note, that not all parameters are included in this interface. If you would like to make additional options available through Galaxy, e-mail us at galaxy-bugs(a)bx.psu.edu.
+This modes gives you a fuller control over lastz. The description of these and other parameters is found at the end of this page. Note, that not all parameters are included in this interface. If you would like to make additional options available through Galaxy, e-mail us at galaxy-bugs(a)bx.psu.edu.
------
-** Do you want to modify reference name?**
+**Do you want to modify reference name?**
This option allows you set the name of the reference sequence manually. This is helpful when, for example, you would like to make reference name compatible with the UCSC naming conventions to be able to display your lastz results as a custom track at UCSC Genome Browser.
I see from the parameters code that dynamic_options are to be replaced
with options as part of workflow buildout.
I'm finding lots of use cases where the dynamic_options returned by
code from an included module make some complicated things really easy
for users. For example, in the new gene expression tools, each
expression experiment is stored as a new Galaxy datatype based on the
Bioconductor representation (affybatch, eset, etc.). Each of those
structures has (optional!) accompanying experimental metadata
(phenodata) which, at the time the affybatch is being created, is in
the form of a tab-delimited file with a header row. To construct
design and contrast matrices for analyses, the user has to choose one
or more of those phenodata columns for that experiment, and the
choice typically might be limited to those columns containing
*exactly* two values, i.e. dichotomous contrasts.
I have code working that allows the user to choose an input (e.g.
affybatch) experiment file from their history, then to choose from
among *only* the dichotomous phenotype columns, and run the analysis.
You cannot imagine what a big deal this is compared with trying to
teach people to generate design and contrast matrices interactively in
R!
But of course, these miracles all rely on dynamic_options calling some
code included with the tool.
What's the best way forward when we need this kind of drop-down list
for a tool, where the list depends on the choice made on a previous
page, and it has to stay compatible with workflows in the long haul?
One approach: when generating the (e.g. affybatch) metadata, I could
create all the option lists I'm ever going to need as additional
metadata structures, which could then be used the way options from
files are used elsewhere. The catch is that they'd all have to be
precomputed rather than computed on the fly by the tool. Is that
reasonable, or is there some way to allow dynamic computation on the
metadata? (It's a little complex: it involves parsing the phenodata,
constructing a concordance of the values in each column, and, e.g.,
returning only the columns with exactly two values.)
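For concreteness, the concordance step described above might look roughly like the sketch below. It is not the code from the tool in question: the function name dichotomous_columns and the sample phenodata are made up for illustration, and it assumes a tab-delimited file with a single header row in which empty cells are ignored.

```python
import csv
import io
from collections import defaultdict


def dichotomous_columns(handle, delimiter="\t"):
    """Return the header names of columns with exactly two distinct values."""
    reader = csv.reader(handle, delimiter=delimiter)
    header = next(reader)
    # column name -> set of distinct non-empty values (the "concordance")
    values = defaultdict(set)
    for row in reader:
        for name, cell in zip(header, row):
            cell = cell.strip()
            if cell:
                values[name].add(cell)
    # preserve header order so the drop-down list matches the file layout
    return [name for name in header if len(values[name]) == 2]


# hypothetical phenodata, for illustration only
example = io.StringIO(
    "sample\tsex\ttreatment\tage\n"
    "s1\tM\tdrug\t34\n"
    "s2\tF\tplacebo\t41\n"
    "s3\tF\tdrug\t29\n"
)
print(dichotomous_columns(example))  # ['sex', 'treatment']
```

The same function could be run once when the metadata is generated (the precomputed option) or each time the form is rendered (the dynamic option); the logic is identical either way.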
--
python -c "foo = map(None,'moc.liamg(a)surazal.ssor'); foo.reverse();
print ''.join(foo)"