galaxy-dev - lists.galaxyproject.org

[hg] galaxy 2843: Updated several sample index loc files to make...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/ebe3e881ac25
changeset: 2843:ebe3e881ac25
user: Kelly Vincent <kpvincent(a)bx.psu.edu>
date: Wed Oct 07 15:31:18 2009 -0400
description:
Updated several sample index loc files to make it clearer how the actual loc files should appear

4 file(s) affected in this change:

tool-data/bowtie_indices.loc.sample
tool-data/sam_fa_indices.loc.sample
tool-data/sequence_index_base.loc.sample
tool-data/sequence_index_color.loc.sample

diffs (89 lines):

diff -r 31c577c6fd49 -r ebe3e881ac25 tool-data/bowtie_indices.loc.sample
--- a/tool-data/bowtie_indices.loc.sample Wed Oct 07 15:25:16 2009 -0400
+++ b/tool-data/bowtie_indices.loc.sample Wed Oct 07 15:31:18 2009 -0400
@@ -1,8 +1,8 @@
 #This is a sample file distributed with Galaxy that enables tools
 #to use a directory of Bowtie indexed sequences data files. You will need
 #to create these data files and then create a bowtie_indices.loc file
-#similar to this one (store it in this directory ) that points to
-#the directories in which those files are stored. The bowtie_indices.loc
+#similar to this one (store it in this directory) that points to
+#the directories in which those files are stored. The bowtie_indices.loc
 #file has this format (white space characters are TAB characters):
 #
 #<build> <file_base>
@@ -26,3 +26,4 @@
 #exist, but it is the prefix for the actual index files. For example:
 #
 #hg18 /depot/data2/galaxy/bowtie/hg18/hg18
+#hg19 /depot/data2/galaxy/bowtie/hg19/hg19
diff -r 31c577c6fd49 -r ebe3e881ac25 tool-data/sam_fa_indices.loc.sample
--- a/tool-data/sam_fa_indices.loc.sample Wed Oct 07 15:25:16 2009 -0400
+++ b/tool-data/sam_fa_indices.loc.sample Wed Oct 07 15:31:18 2009 -0400
@@ -1,17 +1,17 @@
 #This is a sample file distributed with Galaxy that enables tools
 #to use a directory of Samtools indexed sequences data files. You will need
 #to create these data files and then create a sam_fa_indices.loc file
-#similar to this one (store it in this directory ) that points to
-#the directories in which those files are stored. The sam_fa_indices.loc
+#similar to this one (store it in this directory) that points to
+#the directories in which those files are stored. The sam_fa_indices.loc
 #file has this format (white space characters are TAB characters):
 #
-#<index> <seq> <location>
+#index <seq> <location>
 #
 #So, for example, if you had hg18 indexed stored in
 #/depot/data2/galaxy/sam/,
 #then the sam_fa_indices.loc entry would look like this:
 #
-#hg18 /depot/data2/galaxy/sam/hg18.fa
+#index hg18 /depot/data2/galaxy/sam/hg18.fa
 #
 #and your /depot/data2/galaxy/sam/ directory
 #would contain hg18.fa and hg18.fa.fai files:
@@ -24,4 +24,5 @@
 #exist, but it should never be directly used. Instead, the name serves
 #as a prefix for the index file. For example:
 #
-#hg18 /depot/data2/galaxy/sam/hg18.fa
+#index hg18 /depot/data2/galaxy/sam/hg18.fa
+#index hg19 /depot/data2/galaxy/sam/hg19.fa
diff -r 31c577c6fd49 -r ebe3e881ac25 tool-data/sequence_index_base.loc.sample
--- a/tool-data/sequence_index_base.loc.sample Wed Oct 07 15:25:16 2009 -0400
+++ b/tool-data/sequence_index_base.loc.sample Wed Oct 07 15:31:18 2009 -0400
@@ -1,8 +1,8 @@
 #This is a sample file distributed with Galaxy that enables tools
 #to use a directory of BWA indexed sequences data files. You will need
 #to create these data files and then create a sequence_index_base.loc file
-#similar to this one (store it in this directory ) that points to
-#the directories in which those files are stored. The sequence_index_base.loc
+#similar to this one (store it in this directory) that points to
+#the directories in which those files are stored. The sequence_index_base.loc
 #file has this format (white space characters are TAB characters):
 #
 #<build> <file_base>
@@ -26,3 +26,4 @@
 #exist, but it is the prefix for the actual index files. For example:
 #
 #phiX /depot/data2/galaxy/phiX/base/phiX.fa
+#hg18 /depot/data2/galaxy/hg18/base/hg18.fa
diff -r 31c577c6fd49 -r ebe3e881ac25 tool-data/sequence_index_color.loc.sample
--- a/tool-data/sequence_index_color.loc.sample Wed Oct 07 15:25:16 2009 -0400
+++ b/tool-data/sequence_index_color.loc.sample Wed Oct 07 15:31:18 2009 -0400
@@ -1,8 +1,8 @@
 #This is a sample file distributed with Galaxy that enables tools
 #to use a directory of BWA indexed sequences data files. You will need
 #to create these data files and then create a sequence_index_color.loc file
-#similar to this one (store it in this directory ) that points to
-#the directories in which those files are stored. The sequence_index_color.loc
+#similar to this one (store it in this directory) that points to
+#the directories in which those files are stored. The sequence_index_color.loc
 #file has this format (white space characters are TAB characters):
 #
 #<build> <file_base>
@@ -26,3 +26,4 @@
 #exist, but it is the prefix for the actual index files. For example:
 #
 #phiX /depot/data2/galaxy/phiX/color/phiX.fa
+#hg18 /depot/data2/galaxy/hg18/color/hg18.fa
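
As a rough illustration of the two-column layout these sample files describe (<build> TAB <file_base>, with every sample line commented out by '#'), the sketch below reads such entries into a dict. The name parse_loc_lines and the dict return shape are assumptions made for this example; it is not Galaxy's own loader, it ignores the three-column sam_fa_indices layout, and it is written for Python 3 rather than the Python 2 of the changeset.

```python
def parse_loc_lines(lines):
    """Map build name -> index file prefix from two-column .loc lines.

    Hypothetical helper for illustration only; not part of Galaxy.
    """
    entries = {}
    for line in lines:
        line = line.rstrip("\r\n")
        # Blank lines and '#' comment lines carry no entry.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 2:
            # Not a <build> TAB <file_base> pair; this sketch skips it.
            continue
        build, file_base = fields
        entries[build] = file_base
    return entries


sample = [
    "#hg18\t/depot/data2/galaxy/bowtie/hg18/hg18\n",  # commented out, skipped
    "hg18\t/depot/data2/galaxy/bowtie/hg18/hg18\n",
    "hg19\t/depot/data2/galaxy/bowtie/hg19/hg19\n",
]
print(parse_loc_lines(sample))
```

The point of the sketch is only that an actual .loc file carries the same lines as the sample with the leading '#' removed and real TAB characters between columns.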

[hg] galaxy 2844: Incorporate code to provide UCSC and Gbrowse i...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/e73efc9387ee changeset: 2844:e73efc9387ee user: Greg Von Kuster <greg(a)bx.psu.edu> date: Wed Oct 07 16:37:48 2009 -0400 description: Incorporate code to provide UCSC and Gbrowse integration for wiggle files contributed by Brad Chapman - handles ticket # 134. 3 file(s) affected in this change: lib/galaxy/datatypes/genetics.py lib/galaxy/datatypes/interval.py lib/galaxy/datatypes/tabular.py diffs (263 lines): diff -r ebe3e881ac25 -r e73efc9387ee lib/galaxy/datatypes/genetics.py --- a/lib/galaxy/datatypes/genetics.py Wed Oct 07 15:31:18 2009 -0400 +++ b/lib/galaxy/datatypes/genetics.py Wed Oct 07 16:37:48 2009 -0400 @@ -56,10 +56,6 @@ def get_estimated_display_viewport( self, dataset ): """Return a chrom, start, stop tuple for viewing a file.""" raise notImplemented - - def as_ucsc_display_file( self, dataset, **kwd ): - """Returns file""" - return file(dataset.file_name,'r') def ucsc_links( self, dataset, type, app, base_url ): """ from the ever-helpful angie hinrichs angie(a)soe.ucsc.edu diff -r ebe3e881ac25 -r e73efc9387ee lib/galaxy/datatypes/interval.py --- a/lib/galaxy/datatypes/interval.py Wed Oct 07 15:31:18 2009 -0400 +++ b/lib/galaxy/datatypes/interval.py Wed Oct 07 16:37:48 2009 -0400 @@ -493,7 +493,6 @@ """Initialize datatype, by adding GBrowse display app""" Tabular.__init__(self, **kwd) self.add_display_app ( 'c_elegans', 'display in Wormbase', 'as_gbrowse_display_file', 'gbrowse_links' ) - def set_meta( self, dataset, overwrite = True, **kwd ): i = 0 for i, line in enumerate( file ( dataset.file_name ) ): @@ -508,7 +507,6 @@ except: pass Tabular.set_meta( self, dataset, overwrite = overwrite, skip = i ) - def make_html_table( self, dataset, skipchars=[] ): """Create HTML table, used for displaying peek""" out = ['<table cellspacing="0" cellpadding="3">'] @@ -524,11 +522,6 @@ except Exception, exc: out = "Can't create peek %s" % exc return out - - def as_gbrowse_display_file( self, dataset, **kwd ): - 
"""Returns file contents that can be displayed in GBrowse apps.""" - return open( dataset.file_name ) - def get_estimated_display_viewport( self, dataset ): """ Return a chrom, start, stop tuple for viewing a file. There are slight differences between gff 2 and gff 3 @@ -568,7 +561,6 @@ return ( seqid, str( start ), str( stop ) ) else: return ( '', '', '' ) - def gbrowse_links( self, dataset, type, app, base_url ): ret_val = [] if dataset.has_data: @@ -582,7 +574,6 @@ link = "%s?start=%s&stop=%s&ref=%s&dbkey=%s" % ( site_url, start, stop, seqid, dataset.dbkey ) ret_val.append( ( site_name, link ) ) return ret_val - def sniff( self, filename ): """ Determines whether the file is in gff format @@ -639,7 +630,6 @@ def __init__(self, **kwd): """Initialize datatype, by adding GBrowse display app""" Gff.__init__(self, **kwd) - def set_meta( self, dataset, overwrite = True, **kwd ): i = 0 for i, line in enumerate( file ( dataset.file_name ) ): @@ -666,7 +656,6 @@ if valid_start and valid_end and start < end and strand in self.valid_gff3_strand and phase in self.valid_gff3_phase: break Tabular.set_meta( self, dataset, overwrite = overwrite, skip = i ) - def sniff( self, filename ): """ Determines whether the file is in gff version 3 format @@ -740,9 +729,70 @@ MetadataElement( name="columns", default=3, desc="Number of columns", readonly=True, visible=False ) + def __init__( self, **kwd ): + Tabular.__init__( self, **kwd ) + self.add_display_app( 'ucsc', 'display at UCSC', 'as_ucsc_display_file', 'ucsc_links' ) + self.add_display_app( 'gbrowse', 'display in Gbrowse', 'as_gbrowse_display_file', 'gbrowse_links' ) + def get_estimated_display_viewport( self, dataset ): + value = ( "", "", "" ) + num_check_lines = 100 # only check up to this many non empty lines + for i, line in enumerate( file( dataset.file_name ) ): + line = line.rstrip( '\r\n' ) + if line and line.startswith( "browser" ): + chr_info = line.split()[-1] + wig_chr, coords = chr_info.split( ":" ) + start, end = 
coords.split( "-" ) + value = ( wig_chr, start, end ) + break + if i > num_check_lines: + break + return value + def _get_remote_call_url( self, redirect_url, site_name, dataset, type, app, base_url ): + """Retrieve the URL to call out to an external site and retrieve data. + This routes our external URL through a local galaxy instance which makes + the data available, followed by redirecting to the remote site with a + link back to the available information. + """ + internal_url = "%s" % url_for( controller='dataset', dataset_id=dataset.id, action='display_at', filename='%s_%s' % ( type, site_name ) ) + base_url = app.config.get( "display_at_callback", base_url ) + if base_url.startswith( 'https://' ): + base_url = base_url.replace( 'https', 'http', 1 ) + display_url = urllib.quote_plus( "%s%s/display_as?id=%i&display_app=%s&authz_method=display_at" % \ + ( base_url, url_for( controller='root' ), dataset.id, type ) ) + link = '%s?redirect_url=%s&display_url=%s' % ( internal_url, redirect_url, display_url ) + return link + def _get_viewer_range( self, dataset ): + """Retrieve the chromosome, start, end for an external viewer.""" + if dataset.has_data: + viewport_tuple = self.get_estimated_display_viewport( dataset ) + if viewport_tuple: + chrom = viewport_tuple[0] + start = viewport_tuple[1] + stop = viewport_tuple[2] + return ( chrom, start, stop ) + return ( None, None, None ) + def gbrowse_links( self, dataset, type, app, base_url ): + ret_val = [] + chrom, start, stop = self._get_viewer_range( dataset ) + if chrom is not None: + for site_name, site_url in util.get_gbrowse_sites_by_build( dataset.dbkey ): + if site_name in app.config.gbrowse_display_sites: + redirect_url = urllib.quote_plus( "%s%s/?ref=%s&start=%s&stop=%s&eurl=%%s" % ( site_url, dataset.dbkey, chrom, start, stop ) ) + link = self._get_remote_call_url( redirect_url, site_name, dataset, type, app, base_url ) + ret_val.append( ( site_name, link ) ) + return ret_val + def ucsc_links( self, dataset, 
type, app, base_url ): + ret_val = [] + chrom, start, stop = self._get_viewer_range( dataset ) + if chrom is not None: + for site_name, site_url in util.get_ucsc_by_build( dataset.dbkey ): + if site_name in app.config.ucsc_display_sites: + redirect_url = urllib.quote_plus( "%sdb=%s&position=%s:%s-%s&hgt.customText=%%s" % ( site_url, dataset.dbkey, chrom, start, stop ) ) + link = self._get_remote_call_url( redirect_url, site_name, dataset, type, app, base_url ) + ret_val.append( ( site_name, link ) ) + return ret_val def make_html_table( self, dataset ): return Tabular.make_html_table( self, dataset, skipchars=['track', '#'] ) - def set_meta( self, dataset, overwrite = True, **kwd ): i = 0 for i, line in enumerate( file ( dataset.file_name ) ): @@ -761,7 +811,6 @@ if do_break: break Tabular.set_meta( self, dataset, overwrite = overwrite, skip = i ) - def sniff( self, filename ): """ Determines wether the file is in wiggle format @@ -792,7 +841,6 @@ return False except: return False - def get_track_window(self, dataset, data, start, end): """ Assumes we have a numpy file. @@ -817,7 +865,6 @@ y = data[ t_start : t_end ] return zip(x.tolist(), y.tolist()) - def get_track_resolution( self, dataset, start, end): range = end - start # Determine appropriate resolution to plot ~1000 points @@ -826,7 +873,6 @@ resolution = min( resolution, 100000 ) resolution = max( resolution, 1 ) return resolution - def get_track_type( self ): return "LineTrack" @@ -882,8 +928,6 @@ except: #return "." 
return ('', '', '') - def as_ucsc_display_file( self, dataset ): - return open(dataset.file_name) def ucsc_links( self, dataset, type, app, base_url ): ret_val = [] if dataset.has_data: @@ -948,58 +992,6 @@ return False return True -class GBrowseTrack ( Tabular ): - """GMOD GBrowseTrack""" - file_ext = "gbrowsetrack" - - def __init__(self, **kwd): - """Initialize datatype, by adding GBrowse display app""" - Tabular.__init__(self, **kwd) - self.add_display_app ('c_elegans', 'display in Wormbase', 'as_gbrowse_display_file', 'gbrowse_links' ) - - def set_readonly_meta( self, dataset, skip=1, **kwd ): - """Resets the values of readonly metadata elements.""" - Tabular.set_readonly_meta( self, dataset, skip = skip, **kwd ) - - def set_meta( self, dataset, overwrite = True, **kwd ): - Tabular.set_meta( self, dataset, overwrite = overwrite, skip = 1 ) - - def make_html_table( self, dataset ): - return Tabular.make_html_table( self, dataset, skipchars=['track', '#'] ) - - def get_estimated_display_viewport( self, dataset ): - #TODO: fix me... - return ('', '', '') - - def gbrowse_links( self, dataset, type, app, base_url ): - ret_val = [] - if dataset.has_data: - viewport_tuple = self.get_estimated_display_viewport(dataset) - if viewport_tuple: - chrom = viewport_tuple[0] - start = viewport_tuple[1] - stop = viewport_tuple[2] - for site_name, site_url in util.get_gbrowse_sites_by_build(dataset.dbkey): - if site_name in app.config.gbrowse_display_sites: - display_url = urllib.quote_plus( "%s%s/display_as?id=%i&display_app=%s" % (base_url, url_for( controller='root' ), dataset.id, type) ) - link = "%sname=%s&ref=%s:%s..%s&eurl=%s" % (site_url, dataset.dbkey, chrom, start, stop, display_url ) - ret_val.append( (site_name, link) ) - return ret_val - - def as_gbrowse_display_file( self, dataset, **kwd ): - """Returns file contents that can be displayed in GBrowse apps.""" - #TODO: fix me... 
- return open(dataset.file_name) - - def sniff( self, filename ): - """ - Determines whether the file is in gbrowsetrack format. - - GBrowseTrack files are built within Galaxy. - TODO: Not yet sure what this file will look like. Fix this sniffer and add some unit tests here as soon as we know. - """ - return False - if __name__ == '__main__': import doctest, sys doctest.testmod(sys.modules[__name__]) diff -r ebe3e881ac25 -r e73efc9387ee lib/galaxy/datatypes/tabular.py --- a/lib/galaxy/datatypes/tabular.py Wed Oct 07 15:31:18 2009 -0400 +++ b/lib/galaxy/datatypes/tabular.py Wed Oct 07 16:37:48 2009 -0400 @@ -205,6 +205,10 @@ def display_peek( self, dataset ): """Returns formatted html of peek""" return self.make_html_table( dataset ) + def as_gbrowse_display_file( self, dataset, **kwd ): + return open( dataset.file_name ) + def as_ucsc_display_file( self, dataset, **kwd ): + return open( dataset.file_name ) class Taxonomy( Tabular ): def __init__(self, **kwd):
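
The viewport logic added to the Wiggle datatype in this changeset can be sketched on its own, assuming the file carries a line such as "browser position chr19:59302001-59304700". The function name viewport_from_wiggle_lines is hypothetical, the code is Python 3 rather than the Python 2 of the diff, and the guard that skips browser lines without a chrom:start-end token (for example "browser hide all") is an addition of this sketch, not something the changeset does.

```python
def viewport_from_wiggle_lines(lines, max_lines=100):
    """Return (chrom, start, end) from a 'browser' line, else ('', '', '').

    Hypothetical stand-alone version of the idea in the diff above.
    """
    for i, line in enumerate(lines):
        line = line.rstrip("\r\n")
        if line.startswith("browser"):
            # The last whitespace-separated token is expected to look
            # like "chr19:59302001-59304700".
            chr_info = line.split()[-1]
            chrom, colon, coords = chr_info.partition(":")
            start, dash, end = coords.partition("-")
            if colon and dash:
                return (chrom, start, end)
        # Stop scanning once the line index exceeds max_lines.
        if i > max_lines:
            break
    return ("", "", "")


example = [
    "track type=wiggle_0 name=example\n",
    "browser position chr19:59302001-59304700\n",
    "variableStep chrom=chr19 span=150\n",
]
print(viewport_from_wiggle_lines(example))
```

As in the diff, start and end stay strings because they are only interpolated into display URLs.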

[hg] galaxy 2841: Added fastq (generic) datatype and deleted fas...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/6252781aa157 changeset: 2841:6252781aa157 user: Kelly Vincent <kpvincent(a)bx.psu.edu> date: Wed Oct 07 12:43:15 2009 -0400 description: Added fastq (generic) datatype and deleted fastqsolexa datatype 8 file(s) affected in this change: datatypes_conf.xml.sample lib/galaxy/datatypes/registry.py lib/galaxy/datatypes/sequence.py lib/galaxy/datatypes/test/1.fastq lib/galaxy/datatypes/test/2.fastq test-data/1.fastq test-data/2gen.fastq test/functional/test_sniffing_and_metadata_settings.py diffs (380 lines): diff -r ecb6d86a5a9c -r 6252781aa157 datatypes_conf.xml.sample --- a/datatypes_conf.xml.sample Wed Oct 07 11:48:30 2009 -0400 +++ b/datatypes_conf.xml.sample Wed Oct 07 12:43:15 2009 -0400 @@ -22,11 +22,8 @@ <datatype extension="fasta" type="galaxy.datatypes.sequence:Fasta" display_in_upload="true"> <converter file="fasta_to_tabular_converter.xml" target_datatype="tabular"/> </datatype> + <datatype extension="fastq" type="galaxy.datatypes.sequence:Fastq" display_in_upload="true"/> <datatype extension="fastqsanger" type="galaxy.datatypes.sequence:FastqSanger" display_in_upload="true"/> - <datatype extension="fastqsolexa" type="galaxy.datatypes.sequence:FastqSolexa" display_in_upload="true"> - <converter file="fastqsolexa_to_fasta_converter.xml" target_datatype="fasta"/> - <converter file="fastqsolexa_to_qual_converter.xml" target_datatype="qualsolexa"/> - </datatype> <datatype extension="genetrack" type="galaxy.datatypes.tracks:GeneTrack"/> <datatype extension="gff" type="galaxy.datatypes.interval:Gff" display_in_upload="true"> <converter file="gff_to_bed_converter.xml" target_datatype="bed"/> @@ -200,8 +197,8 @@ <sniffer type="galaxy.datatypes.qualityscore:QualityScoreSOLiD"/> <sniffer type="galaxy.datatypes.qualityscore:QualityScore454"/> <sniffer type="galaxy.datatypes.sequence:Fasta"/> - <sniffer type="galaxy.datatypes.sequence:FastqSolexa"/> <sniffer type="galaxy.datatypes.sequence:FastqSanger"/> + <sniffer 
type="galaxy.datatypes.sequence:Fastq"/> <sniffer type="galaxy.datatypes.interval:Wiggle"/> <sniffer type="galaxy.datatypes.images:Html"/> <sniffer type="galaxy.datatypes.sequence:Axt"/> diff -r ecb6d86a5a9c -r 6252781aa157 lib/galaxy/datatypes/registry.py --- a/lib/galaxy/datatypes/registry.py Wed Oct 07 11:48:30 2009 -0400 +++ b/lib/galaxy/datatypes/registry.py Wed Oct 07 12:43:15 2009 -0400 @@ -119,8 +119,8 @@ 'customtrack' : interval.CustomTrack(), 'csfasta' : sequence.csFasta(), 'fasta' : sequence.Fasta(), + 'fastq' : sequence.Fastq(), 'fastqsanger' : sequence.FastqSanger(), - 'fastqsolexa' : sequence.FastqSolexa(), 'gff' : interval.Gff(), 'gff3' : interval.Gff3(), 'genetrack' : tracks.GeneTrack(), @@ -149,8 +149,8 @@ 'customtrack' : 'text/plain', 'csfasta' : 'text/plain', 'fasta' : 'text/plain', + 'fastq' : 'text/plain', 'fastqsanger' : 'text/plain', - 'fastqsolexa' : 'text/plain', 'gff' : 'text/plain', 'gff3' : 'text/plain', 'interval' : 'text/plain', @@ -179,8 +179,8 @@ qualityscore.QualityScoreSOLiD(), qualityscore.QualityScore454(), sequence.Fasta(), - sequence.FastqSolexa(), sequence.FastqSanger(), + sequence.Fastq(), interval.Wiggle(), images.Html(), sequence.Axt(), diff -r ecb6d86a5a9c -r 6252781aa157 lib/galaxy/datatypes/sequence.py --- a/lib/galaxy/datatypes/sequence.py Wed Oct 07 11:48:30 2009 -0400 +++ b/lib/galaxy/datatypes/sequence.py Wed Oct 07 12:43:15 2009 -0400 @@ -1,5 +1,5 @@ """ -Image classes +Sequence classes """ import data @@ -134,10 +134,10 @@ pass return False -class FastqSolexa( Sequence ): - """Class representing a FASTQ sequence ( the Solexa variant )""" - file_ext = "fastqsolexa" - +class Fastq ( Sequence ): + """Class representing a generic FASTQ sequence""" + file_ext = "fastq" + def set_peek( self, dataset ): if not dataset.dataset.purged: dataset.peek = data.get_file_peek( dataset.file_name ) @@ -145,102 +145,46 @@ else: dataset.peek = 'file does not exist' dataset.blurb = 'file purged from disk' - - def sniff( self, filename 
): + + def sniff ( self, filename ): """ - Determines whether the file is in fastqsolexa format (Solexa Variant) + Determines whether the file is in generic fastq format For details, see http://maq.sourceforge.net/fastq.shtml - Note: There are two kinds of FASTQ files, known as "Sanger" (sometimes called "Standard") and Solexa + Note: There are three kinds of FASTQ files, known as "Sanger" (sometimes called "Standard"), Solexa, and Illumina These differ in the representation of the quality scores - >>> fname = get_test_fname( '1.fastqsolexa' ) - >>> FastqSolexa().sniff( fname ) + >>> fname = get_test_fname( '1.fastqsanger' ) + >>> Fastq().sniff( fname ) True - >>> fname = get_test_fname( '2.fastqsolexa' ) - >>> FastqSolexa().sniff( fname ) + >>> fname = get_test_fname( '2.fastqsanger' ) + >>> Fastq().sniff( fname ) True """ headers = get_headers( filename, None ) - bases_regexp = re.compile( "^[NGTAC]*$" ) + bases_regexp = re.compile( "^[NGTAC]*" ) + # check that first block looks like a fastq block try: if len( headers ) >= 4 and headers[0][0] and headers[0][0][0] == "@" and headers[2][0] and headers[2][0][0] == "+" and headers[1][0]: # Check the sequence line, make sure it contains only G/C/A/T/N if not bases_regexp.match( headers[1][0] ): return False - - # Check quality score: integer or ascii char. 
- try: - check = int(headers[3][0]) - qscore_int = True - except: - qscore_int = False - - # check length and range of quality scores - if qscore_int: - if len( headers[3] ) != len( headers[1][0] ): - return False - if not self.check_qual_values_within_range(headers[3], 'int'): - return False - try: - if not self.check_qual_values_within_range(headers[7], 'int'): - return False - try: - if not self.check_qual_values_within_range(headers[11], 'int'): - return False - except IndexError: - pass - except IndexError: - pass - else: - if len( headers[3][0] ) != len( headers[1][0] ): - return False - if not self.check_qual_values_within_range(headers[3][0], 'char'): - return False - try: - if not self.check_qual_values_within_range(headers[7][0], 'char'): - return False - try: - if not self.check_qual_values_within_range(headers[11][0], 'char'): - return False - except IndexError: - pass - except IndexError: - pass return True return False except: return False - def check_qual_values_within_range( self, qual_seq, score_type ): - if score_type == 'char': - for val in qual_seq: - if ord(val) < 59 or ord(val) > 104: - return False - elif score_type == 'int': - for val in qual_seq: - if int(val) < -5 or int(val) > 40: - return False - return True - -class FastqSanger( Sequence ): + +class FastqSanger( Fastq ): """Class representing a FASTQ sequence ( the Sanger variant )""" file_ext = "fastqsanger" - - def set_peek( self, dataset ): - if not dataset.dataset.purged: - dataset.peek = data.get_file_peek( dataset.file_name ) - dataset.blurb = data.nice_size( dataset.get_size() ) - else: - dataset.peek = 'file does not exist' - dataset.blurb = 'file purged from disk' def sniff( self, filename ): """ Determines whether the file is in fastqsanger format (Sanger Variant) For details, see http://maq.sourceforge.net/fastq.shtml - Note: There are two kinds of FASTQ files, known as "Sanger" (sometimes called "Standard") and Solexa + Note: There are three kinds of FASTQ files, known as 
"Sanger" (sometimes called "Standard"), Solexa, and Illumina These differ in the representation of the quality scores >>> fname = get_test_fname( '1.fastqsanger' ) @@ -254,60 +198,33 @@ bases_regexp = re.compile( "^[NGTAC]*$" ) try: if len( headers ) >= 4 and headers[0][0] and headers[0][0][0] == "@" and headers[2][0] and headers[2][0][0] == "+" and headers[1][0]: - # Check the sequence line, make sure it contains only G/C/A/T/N - if not bases_regexp.match( headers[1][0] ): - return False - # Check quality score: integer or ascii char. - try: - check = int(headers[3][0]) - qscore_int = True - except: - qscore_int = False - - # check length and range of quality scores - if qscore_int: - if len( headers[3] ) != len( headers[1][0] ): - return False - if not self.check_qual_values_within_range(headers[3], 'int'): - return False + # look through first 20 blocks and make sure bases valid and qualities valid + for i in range( 1, 80, 4 ): try: - if not self.check_qual_values_within_range(headers[7], 'int'): + # check that bases are legitimate + if not bases_regexp.match( headers[i][0] ): return False - try: - if not self.check_qual_values_within_range(headers[11], 'int'): - return False - except IndexError: - pass + # check length of qualities (matching bases) + if len( headers[i+2][0] ) != len( headers[1][0] ): + return False + # check qualities within fastqsanger range + if not self.check_qual_values_within_range( headers[i+2][0] ): + return False except IndexError: pass - else: - if len( headers[3][0] ) != len( headers[1][0] ): - return False - if not self.check_qual_values_within_range(headers[3][0], 'char'): - return False - try: - if not self.check_qual_values_within_range(headers[7][0], 'char'): - return False - try: - if not self.check_qual_values_within_range(headers[11][0], 'char'): - return False - except IndexError: - pass - except IndexError: - pass - return True - return False + return True + return False except: return False - def 
check_qual_values_within_range( self, qual_seq, score_type ): - if score_type == 'char': - for val in qual_seq: - if ord(val) >= 33 and ord(val) <= 126: - return True - elif score_type == 'int': - for val in qual_seq: - if int(val) >= 0 and int(val) <= 93: - return True + def check_qual_values_within_range( self, qual_seq ): + under59 = False + for val in qual_seq: + if ord(val) < 33 or ord(val) > 126: + return False + if not under59 and ord(val) < 59: + under59 = True + if under59: + return True return False try: @@ -521,7 +438,7 @@ >>> fname = get_test_fname( 'alignment.lav' ) >>> Axt().sniff( fname ) False - """ + """ headers = get_headers( filename, None ) if len(headers) < 4: return False diff -r ecb6d86a5a9c -r 6252781aa157 lib/galaxy/datatypes/test/1.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/lib/galaxy/datatypes/test/1.fastq Wed Oct 07 12:43:15 2009 -0400 @@ -0,0 +1,8 @@ +@HANNIBAL_1_FC302VTAAXX:2:1:228:167 +GAATTGATCAGGACATAGGACAACTGTAGGCACCAT ++HANNIBAL_1_FC302VTAAXX:2:1:228:167 +40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4 +@HANNIBAL_1_FC302VTAAXX:2:1:156:340 +GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG ++HANNIBAL_1_FC302VTAAXX:2:1:156:340 +40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9 \ No newline at end of file diff -r ecb6d86a5a9c -r 6252781aa157 lib/galaxy/datatypes/test/2.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/lib/galaxy/datatypes/test/2.fastq Wed Oct 07 12:43:15 2009 -0400 @@ -0,0 +1,8 @@ +@seq1 +GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT ++seq1 +hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh +@seq2 +GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG ++seq2 +hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO \ No newline at end of file diff -r ecb6d86a5a9c -r 6252781aa157 test-data/1.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/1.fastq Wed Oct 07 12:43:15 2009 -0400 @@ -0,0 +1,8 @@ +@HANNIBAL_1_FC302VTAAXX:2:1:228:167 
+GAATTGATCAGGACATAGGACAACTGTAGGCACCAT ++HANNIBAL_1_FC302VTAAXX:2:1:228:167 +40 40 40 40 35 40 40 40 25 40 40 26 40 9 33 11 40 35 17 40 40 33 40 7 9 15 3 22 15 30 11 17 9 4 9 4 +@HANNIBAL_1_FC302VTAAXX:2:1:156:340 +GAGTTCTCGTCGCCTGTAGGCACCATCAATCGTATG ++HANNIBAL_1_FC302VTAAXX:2:1:156:340 +40 15 40 17 6 36 40 40 40 25 40 9 35 33 40 14 14 18 15 17 19 28 31 4 24 18 27 14 15 18 2 8 12 8 11 9 \ No newline at end of file diff -r ecb6d86a5a9c -r 6252781aa157 test-data/2gen.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/2gen.fastq Wed Oct 07 12:43:15 2009 -0400 @@ -0,0 +1,8 @@ +@seq1 +GACAGCTTGGTTTTTAGTGAGTTGTTCCTTTCTTT ++seq1 +hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh +@seq2 +GCAATGACGGCAGCAATAAACTCAACAGGTGCTGG ++seq2 +hhhhhhhhhhhhhhYhhahhhhWhAhFhSIJGChO \ No newline at end of file diff -r ecb6d86a5a9c -r 6252781aa157 test/functional/test_sniffing_and_metadata_settings.py --- a/test/functional/test_sniffing_and_metadata_settings.py Wed Oct 07 11:48:30 2009 -0400 +++ b/test/functional/test_sniffing_and_metadata_settings.py Wed Oct 07 12:43:15 2009 -0400 @@ -81,16 +81,6 @@ assert latest_hda is not None, "Problem retrieving fasta hda from the database" if not latest_hda.name == '1.fasta' and not latest_hda.extension == 'fasta': raise AssertionError, "fasta data type was not correctly sniffed." - def test_030_fastqsolexa_datatype( self ): - """Testing correctly sniffing fastqsolexa ( the Solexa variant ) data type upon upload""" - self.upload_file( '1.fastqsolexa' ) - self.verify_dataset_correctness( '1.fastqsolexa' ) - self.check_history_for_string( '1.fastqsolexa format: <span class="fastqsolexa">fastqsolexa</span>, database: \? 
Info: uploaded fastqsolexa file' ) - latest_hda = galaxy.model.HistoryDatasetAssociation.query() \ - .order_by( desc( galaxy.model.HistoryDatasetAssociation.table.c.create_time ) ).first() - assert latest_hda is not None, "Problem retrieving fastqsolexa hda from the database" - if not latest_hda.name == '1.fastqsolexa' and not latest_hda.extension == 'fastqsolexa': - raise AssertionError, "fastqsolexa data type was not correctly sniffed." def test_035_gff_datatype( self ): """Testing correctly sniffing gff data type upon upload""" self.upload_file( '5.gff' ) @@ -236,6 +226,16 @@ assert latest_hda is not None, "Problem retrieving sam hda from the database" if not latest_hda.name == '1.sam' and not latest_hda.extension == 'sam': raise AssertionError, "sam data type was not correctly sniffed." + def test_095_fastq_datatype( self ): + """Testing correctly sniffing fastq ( generic ) data type upon upload""" + self.upload_file( '2gen.fastq' ) + self.verify_dataset_correctness( '2gen.fastq' ) + self.check_history_for_string( '2gen.fastq format: <span class="fastq">fastq</span>, database: \? Info: uploaded fastq file' ) + latest_hda = galaxy.model.HistoryDatasetAssociation.query() \ + .order_by( desc( galaxy.model.HistoryDatasetAssociation.table.c.create_time ) ).first() + assert latest_hda is not None, "Problem retrieving fastq hda from the database" + if not latest_hda.name == '2gen.fastq' and not latest_hda.extension == 'fastq': + raise AssertionError, "fastq data type was not correctly sniffed." def test_9999_clean_up( self ): self.delete_history( id=self.security.encode_id( history1.id ) ) self.logout()
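
The new check_qual_values_within_range in FastqSanger accepts a quality string only if every character code lies in 33..126 and at least one code is below 59. Since the removed FastqSolexa check accepted codes 59..104, a code below 59 appears to be what separates the Sanger variant from the others here. The sketch below restates that rule in Python 3; the function name looks_like_sanger_qualities is an assumption for this example and is not the method in the changeset.

```python
def looks_like_sanger_qualities(qual_seq):
    """Apply the range rule from the diff to one quality line.

    Hypothetical helper: True only if all codes are in 33..126
    and at least one code is below 59.
    """
    saw_below_59 = False
    for ch in qual_seq:
        code = ord(ch)
        if code < 33 or code > 126:
            return False  # outside the printable range used by Sanger FASTQ
        if code < 59:
            saw_below_59 = True
    return saw_below_59


# Quality line taken from the 2.fastq test file added in this changeset:
# every code is 59 or above, so it is not classified as Sanger.
print(looks_like_sanger_qualities("hhhhhhhhhhhhhhhhhhhhhhhhhhPW@hhhhhh"))
# A made-up line containing '5' (53) and '+' (43), both below 59.
print(looks_like_sanger_qualities("IIII5+I"))
```

Under this rule a file with only high quality codes falls through to the generic fastq datatype, which matches the order of the sniffers in the diff (FastqSanger before Fastq).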

[hg] galaxy 2840: Reorganization and rewording of history Option...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/ecb6d86a5a9c
changeset: 2840:ecb6d86a5a9c
user: jeremy goecks <jeremy.goecks at emory.edu>
date: Wed Oct 07 11:48:30 2009 -0400
description:
Reorganization and rewording of history Options menu.

1 file(s) affected in this change:

templates/root/index.mako

diffs (56 lines):

diff -r 3ad620871b25 -r ecb6d86a5a9c templates/root/index.mako
--- a/templates/root/index.mako Wed Oct 07 11:18:14 2009 -0400
+++ b/templates/root/index.mako Wed Oct 07 11:48:30 2009 -0400
@@ -6,12 +6,18 @@
 $(function(){
     $("#history-options-button").css( "position", "relative" );
     make_popupmenu( $("#history-options-button"), {
-        "List your histories": null,
-        "Stored by you": function() {
+        "History Lists": null,
+        "My Saved Histories": function() {
             galaxy_main.location = "${h.url_for( controller='history', action='list')}";
         },
+        "My Shared Histories": function() {
+            galaxy_main.location = "${h.url_for( controller='history', action='list', operation='sharing' )}";
+        },
+        "Histories Shared with Me": function() {
+            galaxy_main.location = "${h.url_for( controller='history', action='list_shared')}";
+        },
         "Current History": null,
-        "Create new": function() {
+        "Create New": function() {
             galaxy_history.location = "${h.url_for( controller='root', action='history_new' )}";
         },
         "Clone": function() {
@@ -20,13 +26,13 @@
         "Share": function() {
             galaxy_main.location = "${h.url_for( controller='history', action='share' )}";
         },
-        "Extract workflow": function() {
+        "Extract Workflow": function() {
             galaxy_main.location = "${h.url_for( controller='workflow', action='build_from_current_history' )}";
         },
-        "Dataset security": function() {
+        "Dataset Security": function() {
             galaxy_main.location = "${h.url_for( controller='root', action='history_set_default_permissions' )}";
         },
-        "Show deleted datasets": function() {
+        "Show Deleted Datasets": function() {
             galaxy_history.location = "${h.url_for( controller='root', action='history', show_deleted=True)}";
         },
         "Delete": function()
@@ -35,13 +41,6 @@
             {
                 galaxy_main.location = "${h.url_for( controller='history', action='delete_current' )}";
             }
-        },
-        "Manage shared histories": null,
-        "Shared by you": function() {
-            galaxy_main.location = "${h.url_for( controller='history', action='list', operation='sharing' )}";
-        },
-        "Shared with you": function() {
-            galaxy_main.location = "${h.url_for( controller='history', action='list_shared')}";
         }
     });
 });

[hg] galaxy 2842: Fixed bug in bed_to_interval_index converter.
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/31c577c6fd49
changeset: 2842:31c577c6fd49
user: guru
date: Wed Oct 07 15:25:16 2009 -0400
description:
Fixed bug in bed_to_interval_index converter.

1 file(s) affected in this change:

lib/galaxy/datatypes/converters/bed_to_interval_index_converter.py

diffs (14 lines):

diff -r 6252781aa157 -r 31c577c6fd49 lib/galaxy/datatypes/converters/bed_to_interval_index_converter.py
--- a/lib/galaxy/datatypes/converters/bed_to_interval_index_converter.py Wed Oct 07 12:43:15 2009 -0400
+++ b/lib/galaxy/datatypes/converters/bed_to_interval_index_converter.py Wed Oct 07 15:25:16 2009 -0400
@@ -15,8 +15,8 @@
     offset = 0
 
     for line in open(input_fname, "r"):
-        feature = line.split()
-        if not feature or feature[0] == "track" or feature[0] == "#":
+        feature = line.strip().split()
+        if not feature or feature[0].startswith("track") or feature[0].startswith("#"):
             offset += len(line)
             continue
         chrom = feature[0]
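
The change above widens the header test from an exact match to a prefix match, so a comment such as "#comment" (no space after the marker) is now skipped as well. The following is a minimal Python sketch of that check, not the converter's actual code: the names is_header_line and data_line_offsets are invented for illustration, and the assumption that the offset also advances after data lines is not visible in the diff.

```python
def is_header_line(line):
    """Return True for blank lines, 'track' lines, and '#' comments."""
    feature = line.strip().split()
    # startswith() also catches tokens such as "#comment", where no
    # whitespace separates the marker from the text that follows it.
    return (not feature
            or feature[0].startswith("track")
            or feature[0].startswith("#"))


def data_line_offsets(lines):
    """Yield (offset, chrom) for every data line.

    The offset counts the characters of all preceding lines, skipped or
    not, mirroring the `offset += len(line)` bookkeeping in the diff.
    Advancing the offset after data lines is an assumption of this sketch.
    """
    offset = 0
    for line in lines:
        if not is_header_line(line):
            yield offset, line.split()[0]
        offset += len(line)
```

With the old exact comparison, a first token of "#comment" would not equal "#" and the line would have been treated as data.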

[hg] galaxy 2837: refactor trackster, better failure message
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/cb842f737d46 changeset: 2837:cb842f737d46 user: Kanwei Li <kanwei(a)gmail.com> date: Tue Oct 06 23:47:08 2009 -0400 description: refactor trackster, better failure message 6 file(s) affected in this change: lib/galaxy/datatypes/indexers/coverage.py lib/galaxy/datatypes/indexers/wiggle.py lib/galaxy/web/controllers/tracks.py static/scripts/trackster.js templates/tracks/browser.mako templates/tracks/new_browser.mako diffs (252 lines): diff -r b25297e88f96 -r cb842f737d46 lib/galaxy/datatypes/indexers/coverage.py --- a/lib/galaxy/datatypes/indexers/coverage.py Tue Oct 06 21:25:01 2009 -0400 +++ b/lib/galaxy/datatypes/indexers/coverage.py Tue Oct 06 23:47:08 2009 -0400 @@ -2,7 +2,7 @@ """ Read a chromosome of coverage data, and write it as a npy array, as -well as averages over regions of progessively larger size in powers of 10 +well as averages over regions of progressively larger size in powers of 10 """ from __future__ import division diff -r b25297e88f96 -r cb842f737d46 lib/galaxy/datatypes/indexers/wiggle.py --- a/lib/galaxy/datatypes/indexers/wiggle.py Tue Oct 06 21:25:01 2009 -0400 +++ b/lib/galaxy/datatypes/indexers/wiggle.py Tue Oct 06 23:47:08 2009 -0400 @@ -2,7 +2,7 @@ """ Read a chromosome of wiggle data, and write it as a npy array, as -well as averages over regions of progessively larger size in powers of 10 +well as averages over regions of progressively larger size in powers of 10 """ from __future__ import division diff -r b25297e88f96 -r cb842f737d46 lib/galaxy/web/controllers/tracks.py --- a/lib/galaxy/web/controllers/tracks.py Tue Oct 06 21:25:01 2009 -0400 +++ b/lib/galaxy/web/controllers/tracks.py Tue Oct 06 23:47:08 2009 -0400 @@ -49,7 +49,8 @@ # FIXME: hardcoding this for now, but it should be derived from the available # converters -browsable_types = set( ["wig", "bed" ] ) +browsable_types = ( "wig", "bed" ) + class TracksController( BaseController ): """ @@ -92,7 +93,7 @@ if dataset.metadata.dbkey 
== dbkey and dataset.extension in browsable_types: datasets[dataset.id] = (dataset.extension, dataset.name) # Render the template - return trans.fill_template( "tracks/new_browser.mako", dbkey=dbkey, dbkey_set=dbkey_set, datasets=datasets ) + return trans.fill_template( "tracks/new_browser.mako", converters=browsable_types, dbkey=dbkey, dbkey_set=dbkey_set, datasets=datasets ) @web.expose def browser(self, trans, dataset_ids, chrom=""): diff -r b25297e88f96 -r cb842f737d46 static/scripts/trackster.js --- a/static/scripts/trackster.js Tue Oct 06 21:25:01 2009 -0400 +++ b/static/scripts/trackster.js Tue Oct 06 23:47:08 2009 -0400 @@ -5,35 +5,6 @@ var DENSITY = 1000, DATA_ERROR = "There was an error in indexing this dataset.", DATA_NONE = "No data for this chrom/contig."; - -var DataCache = function( type, track ) { - this.type = type; - this.track = track; - this.cache = Object(); -}; -$.extend( DataCache.prototype, { - get: function( resolution, position ) { - var cache = this.cache; - if ( !( cache[resolution] && cache[resolution][position] ) ) { - if ( !cache[resolution] ) { - cache[resolution] = Object(); - } - var low = position * DENSITY * resolution; - var high = ( position + 1 ) * DENSITY * resolution; - cache[resolution][position] = { state: "loading" }; - - $.getJSON( data_url, { track_type: this.track.track_type, chrom: this.track.view.chrom, low: low, high: high, dataset_id: this.track.dataset_id }, function ( data ) { - if( data == "pending" ) { - setTimeout( fetcher, 5000 ); - } else { - cache[resolution][position] = { state: "loaded", values: data }; - } - $(document).trigger( "redraw" ); - }); - } - return cache[resolution][position]; - } -}); var View = function( chrom, max_length ) { this.chrom = chrom; @@ -234,7 +205,7 @@ this.container_div.addClass( "line-track" ); this.content_div.css( "height", this.height_px + "px" ); this.dataset_id = dataset_id; - this.cache = new DataCache( "", this ); + this.cache = new Cache(50); }; $.extend( 
LineTrack.prototype, TiledTrack.prototype, { init: function() { @@ -254,6 +225,21 @@ } }); }, + get_data: function( resolution, position ) { + var key = resolution + '-' + position, + cache = this.cache; + + if ( !cache[key] ) { + var low = position * DENSITY * resolution, + high = ( position + 1 ) * DENSITY * resolution; + + $.getJSON( data_url, { track_type: this.track_type, chrom: this.view.chrom, low: low, high: high, dataset_id: this.dataset_id }, function ( data ) { + cache[key] = data; + $(document).trigger( "redraw" ); + }); + } + return cache[key]; + }, draw_tile: function( resolution, tile_index, parent_element, w_scale, h_scale ) { if (!this.vertical_range) // We don't have the necessary information yet return; @@ -261,13 +247,13 @@ var tile_low = tile_index * DENSITY * resolution, tile_high = ( tile_index + 1 ) * DENSITY * resolution, tile_length = DENSITY * resolution; - var chunk = this.cache.get( resolution, tile_index ); - var element; - if ( chunk.state == "loading" ) { - element = $("<div class='loading tile'></div>"); - } else { - element = $("<canvas class='tile'></canvas>"); + var data = this.get_data( resolution, tile_index ); + if ( !data ) { + in_path = false; + return null; } + var element = $("<canvas class='tile'></canvas>"); + element.css( { position: "absolute", top: 0, @@ -275,18 +261,13 @@ }); parent_element.append( element ); // Chunk is still loading, do nothing - if ( chunk.state == "loading" ) { - in_path = false; - return null; - } + var canvas = element; canvas.get(0).width = Math.ceil( tile_length * w_scale ); canvas.get(0).height = this.height_px; var ctx = canvas.get(0).getContext("2d"); var in_path = false; ctx.beginPath(); - var data = chunk.values; - if (!data) return; for ( var i = 0; i < data.length - 1; i++ ) { var x = data[i][0] - tile_low; var y = data[i][1]; diff -r b25297e88f96 -r cb842f737d46 templates/tracks/browser.mako --- a/templates/tracks/browser.mako Tue Oct 06 21:25:01 2009 -0400 +++ 
b/templates/tracks/browser.mako Tue Oct 06 23:47:08 2009 -0400 @@ -7,7 +7,7 @@ <%def name="javascripts()"> ${parent.javascripts()} -${h.js( "jquery", "jquery.event.drag", "jquery.mousewheel", "trackster" )} +${h.js( "jquery", "jquery.event.drag", "jquery.mousewheel", "lrucache", "trackster" )} <script type="text/javascript"> diff -r b25297e88f96 -r cb842f737d46 templates/tracks/new_browser.mako --- a/templates/tracks/new_browser.mako Tue Oct 06 21:25:01 2009 -0400 +++ b/templates/tracks/new_browser.mako Tue Oct 06 23:47:08 2009 -0400 @@ -11,39 +11,48 @@ </script> </%def> -<div class="form"> - <div class="form-title">Select datasets to include in browser</div> - <div id="dbkey" class="form-body"> - <form id="form" method="POST"> - <div class="form-row"> - <label for="dbkey">Reference genome build (dbkey): </label> - <div class="form-row-input"> - <select name="dbkey" id="dbkey" refresh_on_change="true"> - %for tmp_dbkey in dbkey_set: - <option value="${tmp_dbkey}" - %if tmp_dbkey == dbkey: - selected="selected" - %endif - >${tmp_dbkey}</option> - %endfor - </select> +% if not converters: + <div class="errormessagelarge"> + There are no available converters needed for visualization. Please verify that your tool_conf.xml file contains + converters for datatypes (see tool_conf.xml.sample) for examples. 
+ </div> + +% else: + <div class="form"> + <div class="form-title">Select datasets to include in browser</div> + + <div id="dbkey" class="form-body"> + <form id="form" method="POST"> + <div class="form-row"> + <label for="dbkey">Reference genome build (dbkey): </label> + <div class="form-row-input"> + <select name="dbkey" id="dbkey" refresh_on_change="true"> + %for tmp_dbkey in dbkey_set: + <option value="${tmp_dbkey}" + %if tmp_dbkey == dbkey: + selected="selected" + %endif + >${tmp_dbkey}</option> + %endfor + </select> + </div> + <div style="clear: both;"></div> </div> - <div style="clear: both;"></div> + <div class="form-row"> + <label for="dataset_ids">Datasets to include: </label> + %for dataset_id, (dataset_ext, dataset_name) in datasets.iteritems(): + <div> + <input type="checkbox" id="${dataset_id}" name="dataset_ids" value="${dataset_id}" /> + <label style="display:inline; font-weight: normal" for="${dataset_id}">[${dataset_ext}] ${dataset_name}</label> + </div> + %endfor + + <div style="clear: both;"></div> + </div> </div> <div class="form-row"> - <label for="dataset_ids">Datasets to include: </label> - %for dataset_id, (dataset_ext, dataset_name) in datasets.iteritems(): - <div> - <input type="checkbox" id="${dataset_id}" name="dataset_ids" value="${dataset_id}" /> - <label style="display:inline; font-weight: normal" for="${dataset_id}">[${dataset_ext}] ${dataset_name}</label> - </div> - %endfor - - <div style="clear: both;"></div> + <input type="submit" name="browse" value="Browse"/> </div> - </div> - <div class="form-row"> - <input type="submit" name="browse" value="Browse"/> - </div> - </form> -</div> + </form> + </div> +% endif
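
The new get_data method in trackster.js keys each cached tile by resolution and position and derives the requested range from DENSITY. For readers who want to check tile boundaries by hand, here is a rough Python rendering of that arithmetic. The function names tile_key and tile_range are invented for this sketch; the authoritative code is the JavaScript in the diff above.

```python
DENSITY = 1000  # same constant as declared at the top of trackster.js


def tile_key(resolution, position):
    """Build the cache key the same way as `resolution + '-' + position`."""
    return "%s-%s" % (resolution, position)


def tile_range(resolution, position, density=DENSITY):
    """Return the (low, high) range that a tile request would cover."""
    low = position * density * resolution
    high = (position + 1) * density * resolution
    return low, high
```

Adjacent tiles at the same resolution share a boundary: the high value of one tile equals the low value of the next.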

[hg] galaxy 2839: ngs updates
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/3ad620871b25 changeset: 2839:3ad620871b25 user: Anton Nekrutenko <anton(a)bx.psu.edu> date: Wed Oct 07 11:18:14 2009 -0400 description: ngs updates 6 file(s) affected in this change: tool_conf.xml.sample tools/fastx_toolkit/fastq_quality_converter.xml tools/fastx_toolkit/fastq_to_fasta.xml tools/fastx_toolkit/fastx_quality_statistics.xml tools/metag_tools/split_paired_reads.xml tools/next_gen_conversion/fastq_gen_conv.xml diffs (251 lines): diff -r 9a75d2428e21 -r 3ad620871b25 tool_conf.xml.sample --- a/tool_conf.xml.sample Wed Oct 07 11:12:39 2009 -0400 +++ b/tool_conf.xml.sample Wed Oct 07 11:18:14 2009 -0400 @@ -72,10 +72,6 @@ <tool file="maf/maf_to_fasta.xml" /> <tool file="fasta_tools/tabular_to_fasta.xml" /> <tool file="fastx_toolkit/fastq_to_fasta.xml" /> - <tool file="next_gen_conversion/solid_to_fastq.xml" /> - <tool file="next_gen_conversion/fastq_conversions.xml" /> - <tool file="fastx_toolkit/fastq_quality_converter.xml" /> - <tool file="next_gen_conversion/fastq_gen_conv.xml" /> </section> <section name="Extract Features" id="features"> <tool file="filters/ucsc_gene_bed_to_exon_bed.xml" /> @@ -175,32 +171,27 @@ </section> <section name="NGS: QC and manipulation" id="cshl_library_information"> <label text="Generic FASTQ data" id="fastq" /> + <tool file="next_gen_conversion/fastq_gen_conv.xml" /> + <tool file="fastx_toolkit/fastq_quality_converter.xml" /> <tool file="fastx_toolkit/fastx_quality_statistics.xml" /> <tool file="fastx_toolkit/fastq_quality_boxplot.xml" /> <tool file="fastx_toolkit/fastx_nucleotides_distribution.xml" /> -  -  - <tool file="fastx_toolkit/fastx_trimmer.xml" /> - <tool file="fastx_toolkit/fastx_renamer.xml" /> - <tool file="fastx_toolkit/fastx_reverse_complement.xml" /> - <tool file="fastx_toolkit/fastx_artifacts_filter.xml" /> - <tool file="fastx_toolkit/fastq_quality_filter.xml" /> -  <tool file="metag_tools/split_paired_reads.xml" /> <label text="Roche-454 data" id="454" /> <tool 
file="metag_tools/short_reads_figure_score.xml" /> <tool file="metag_tools/short_reads_trim_seq.xml" /> <label text="AB-SOLiD data" id="solid" /> + <tool file="next_gen_conversion/solid_to_fastq.xml" /> <tool file="solid_tools/solid_qual_stats.xml" /> <tool file="solid_tools/solid_qual_boxplot.xml" /> </section> <section name="NGS: Mapping" id="solexa_tools">  + <tool file="sr_mapping/bowtie_wrapper.xml" /> + <tool file="sr_mapping/bwa_wrapper.xml" /> <tool file="metag_tools/megablast_wrapper.xml" /> <tool file="metag_tools/megablast_xml_parser.xml" /> - <tool file="sr_mapping/bowtie_wrapper.xml" /> - <tool file="sr_mapping/bwa_wrapper.xml" /> - </section> + </section> <section name="NGS: SAM Tools" id="samtools"> <tool file="samtools/sam_bitwise_flag_filter.xml" /> <tool file="samtools/sam2interval.xml" /> diff -r 9a75d2428e21 -r 3ad620871b25 tools/fastx_toolkit/fastq_quality_converter.xml --- a/tools/fastx_toolkit/fastq_quality_converter.xml Wed Oct 07 11:12:39 2009 -0400 +++ b/tools/fastx_toolkit/fastq_quality_converter.xml Wed Oct 07 11:18:14 2009 -0400 @@ -2,7 +2,7 @@ <description>(ASCII-Numeric)</description> <command>zcat -f $input | fastq_quality_converter $QUAL_FORMAT -o $output -Q $offset</command> <inputs> - <param format="fastqsolexa,fastqsanger" name="input" type="data" label="Library to convert" /> + <param format="fastq" name="input" type="data" label="Library to convert" /> <param name="QUAL_FORMAT" type="select" label="Desired output format"> <option value="-a">ASCII (letters) quality scores</option> @@ -11,7 +11,7 @@ <param name="offset" type="select" label="FASTQ ASCII offset"> <option value="33">33</option> - <option value="64">64</option> + <option selected="true" value="64">64</option> </param> </inputs> @@ -47,7 +47,7 @@ </tests> <outputs> - <data format="fastqsolexa" name="output" metadata_source="input" /> + <data format="fastq" name="output" metadata_source="input" /> </outputs> <help> diff -r 9a75d2428e21 -r 3ad620871b25 
tools/fastx_toolkit/fastq_to_fasta.xml --- a/tools/fastx_toolkit/fastq_to_fasta.xml Wed Oct 07 11:12:39 2009 -0400 +++ b/tools/fastx_toolkit/fastq_to_fasta.xml Wed Oct 07 11:18:14 2009 -0400 @@ -3,7 +3,7 @@ <command>gunzip -cf $input | fastq_to_fasta $SKIPN $RENAMESEQ -o $output -v </command> <inputs> - <param format="fastqsolexa,fastqsanger" name="input" type="data" label="FASTQ Library to convert" /> + <param format="fastq" name="input" type="data" label="FASTQ Library to convert" /> <param name="SKIPN" type="select" label="Discard sequences with unknown (N) bases "> <option value="">yes</option> diff -r 9a75d2428e21 -r 3ad620871b25 tools/fastx_toolkit/fastx_quality_statistics.xml --- a/tools/fastx_toolkit/fastx_quality_statistics.xml Wed Oct 07 11:12:39 2009 -0400 +++ b/tools/fastx_toolkit/fastx_quality_statistics.xml Wed Oct 07 11:18:14 2009 -0400 @@ -3,11 +3,8 @@ <command>zcat -f $input | fastx_quality_stats -o $output -Q $offset</command> <inputs> - <param format="fasta,fastqsolexa,fastqsanger" name="input" type="data" label="Library to analyse" /> - <param name="offset" type="select" label="FASTQ ASCII offset"> - <option value="33">33</option> - <option value="64">64</option> - </param> + <param format="fastqsanger" name="input" type="data" label="Library to analyse" /> + <param name="offset" type="hidden" value="33"/> </inputs> <tests> diff -r 9a75d2428e21 -r 3ad620871b25 tools/metag_tools/split_paired_reads.xml --- a/tools/metag_tools/split_paired_reads.xml Wed Oct 07 11:12:39 2009 -0400 +++ b/tools/metag_tools/split_paired_reads.xml Wed Oct 07 11:18:14 2009 -0400 @@ -4,7 +4,7 @@ split_paired_reads.py $input $output1 $output2 </command> <inputs> - <param name="input" type="data" format="fastqsolexa,fastqsanger" label="Your paired-end file" /> + <param name="input" type="data" format="fastqsanger" label="Your paired-end file" /> </inputs> <outputs> <data name="output1" format="input"/> @@ -12,8 +12,8 @@ </outputs> <tests> <test> - <param name="input" 
value="split_paired_reads_test1.fastq" ftype="fastqsolexa" /> - <output name="output1" file="split_paired_reads_test1.out1" fype="fastqsolexa" /> + <param name="input" value="split_paired_reads_test1.fastq" ftype="fastqsanger"/> + <output name="output1" file="split_paired_reads_test1.out1" ftype="fastqsanger"/> </test> </tests> <help> diff -r 9a75d2428e21 -r 3ad620871b25 tools/next_gen_conversion/fastq_gen_conv.xml --- a/tools/next_gen_conversion/fastq_gen_conv.xml Wed Oct 07 11:12:39 2009 -0400 +++ b/tools/next_gen_conversion/fastq_gen_conv.xml Wed Oct 07 11:18:14 2009 -0400 @@ -1,5 +1,5 @@ <tool id="fastq_gen_conv" name="FASTQ Groomer" version="1.0.0"> - <description>converts any type of FASTQ file to Sanger type and validates data</description> + <description>converts any FASTQ to Sanger</description> <command interpreter="python"> fastq_gen_conv.py --input=$input @@ -18,24 +18,24 @@ --output=$output </command> <inputs> - <param name="input" type="data" format="fastq" label="FASTQ file to check:" /> + <param name="input" type="data" format="fastq" label="Groom this dataset" /> <conditional name="origTypeChoice"> - <param name="origType" type="select" label="What type of FASTQ do you think this is?"> - <option value="solexa">Solexa</option> - <option value="illumina">Illumina</option> - <option value="sanger">Sanger</option> + <param name="origType" type="select" label="How do you think quality values are scaled?" 
help="See below for explanation"> + <option value="solexa">Solexa/Illumina 1.0</option> + <option value="illumina">Illumina 1.3+</option> + <option value="sanger">Sanger (validation only)</option> </param> <when value="solexa" /> <when value="illumina" /> <when value="sanger"> <conditional name="howManyBlocks"> - <param name="allOrNot" type="select" label="Do you want to do a subset of lines, or do the whole file?"> - <option value="all">Check all</option> - <option value="not">Select blocks</option> + <param name="allOrNot" type="select" label="Since your fastq is already in Sanger format you can check it for consistency"> + <option value="all">Check all (may take a while)</option> + <option selected="true" value="not">Check selected number of blocks</option> </param> <when value="all" /> <when value="not"> - <param name="blocks" type="integer" value="1000" label="How many blocks (four lines each) do you want to do?" /> + <param name="blocks" type="integer" value="1000" label="How many blocks (four lines each) do you want to check?" /> </when> </conditional> </when> @@ -62,39 +62,45 @@ **What it does** -This tool takes a FASTQ file (Solexa or Illumina) and converts it to Sanger format. It only converts valid blocks. It also can confirm the validity of Sanger FASTQ. +Galaxy pipeline for mapping of Illumina data requires data to be in fastq format with quality values conforming to so called "Sanger" format. Unfortunately there are many other types of fastq. Thus the main objective of this tool is to "groom" multiple types of fastq into Sanger-conforming fastq that can be used in downstream application such as mapping. + +.. class:: infomark + +**TIP**: If the input dataset is already in Sanger format the tool does not perform conversion. However validation (described below) is still performed. 
----- -**Example** +**Types of fastq datasets** -- Converting the following Solexa FASTQ file:: +A good description of fastq datasets can be found `here`__, while a description of Galaxy's fastq "logic" can be found `here`__. Because ranges of quality values within different types of fastq datasets overlap it very difficult to detect them automatically. This tool supports conversion of two commonly found types (Solexa/Illumina 1.0 and Illumina 1.3+) into fastq Sanger. - @seq1 - AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT - + - ;>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno - @seq2 - AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT - + - ;>@BCElcH@KLMNOPQ>STZVWbYu[\]^_?a=;d?fghijklmno - @seq3 - AGTCGTCGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT - + - 7>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno + .. __: http://en.wikipedia.org/wiki/FASTQ_format + .. __: http://bitbucket.org/galaxy/galaxy-central/wiki/NGS -- will produce the following Sanger FASTQ data:: +.. class:: warningmark - @seq1 - AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT - + - "#$%%''()+,-./0123456789:;<=>?@#BCDEFGHIJKLMNOP - @seq2 - AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT - + - "#$%%'MD)$,-./012#45;78C:V%lt;=>?@#B""E#GHIJKLMNOP - -- Note that seq3 was not converted, because it contained an invalid Solexa quality value (7). +**NOTE** that there is also a type of fastq format where quality values are represented by a list of space-delimited integers (e.g., 40 40 20 15 -5 20 ...). This tool **does not** handle such fastq. If you have such a dataset, it needs to be converted into ASCII-type fastq (where quality values are encoded by characters) by "Numeric-to-ASCII" utility before it can accepted by this tool. + +----- + +**Validation** + +In addition to converting quality values to Sanger format the tool also checks the input dataset for consistency. Specifically, it performs these four checks: + +- skips empty lines +- checks that blocks are properly formed by making sure that: + + #. 
there are four lines per block + #. the first line starts with "@" + #. the third line starts with "+" + #. lengths of second line (sequences) and the fourth line (quality string) are identical + +- checks that quality values are within range for the chosen fastq format (e.g., the format provided by the user in **How do you think quality values are scaled?** drop down. + +To see exactly what the tool does you can take a look at its source code `here`__. + + .. __: http://bitbucket.org/galaxy/galaxy-central/src/tip/tools/next_gen_conversio… + </help> </tool>
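
The validation rules listed in the new help text can be sketched in a few lines of Python. This is a simplified illustration, not the tool's implementation: is_valid_block and iter_blocks are hypothetical names, and the sketch groups lines into blocks up front, whereas fastq_gen_conv.py walks the file line by line. The default quality bounds (33 to 126) are the Sanger limits used in fastq_gen_conv.py; that script uses 59 and 64 as the lower bounds for Solexa and Illumina 1.3+ input.

```python
def iter_blocks(lines):
    """Drop empty lines, then group the rest into chunks of four."""
    kept = [line for line in lines if line.strip()]
    for i in range(0, len(kept), 4):
        yield kept[i:i + 4]


def is_valid_block(block, min_ord=33, max_ord=126):
    """Check one FASTQ block against the rules in the help text."""
    if len(block) != 4:
        return False
    name, seq, plus, qual = (line.rstrip("\n") for line in block)
    # First line starts with "@", third line starts with "+".
    if not name.startswith("@") or not plus.startswith("+"):
        return False
    # Sequence and quality string must have identical lengths.
    if len(seq) != len(qual):
        return False
    # Every quality character must fall inside the chosen format's range.
    return all(min_ord <= ord(c) <= max_ord for c in qual)
```

Calling is_valid_block with min_ord=59 on a block whose quality string starts with "7" returns False, which corresponds to seq3 being dropped from the Solexa test data.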

[hg] galaxy 2836: Added FASTQ \"Groomer\" tool to converters sec...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/b25297e88f96 changeset: 2836:b25297e88f96 user: Kelly Vincent <kpvincent(a)bx.psu.edu> date: Tue Oct 06 21:25:01 2009 -0400 description: Added FASTQ \"Groomer\" tool to converters section. Relies on new datatype (fastq) which will be added later. 7 file(s) affected in this change: test-data/fastq_gen_conv_in1.fastq test-data/fastq_gen_conv_in2.fastq test-data/fastq_gen_conv_out1.fastqsanger test-data/fastq_gen_conv_out2.fastqsanger tool_conf.xml.sample tools/next_gen_conversion/fastq_gen_conv.py tools/next_gen_conversion/fastq_gen_conv.xml diffs (370 lines): diff -r 2fb0a64c6aaa -r b25297e88f96 test-data/fastq_gen_conv_in1.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fastq_gen_conv_in1.fastq Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,16 @@ +@seq1 +AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +;>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno +@seq2 +AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +;>@BCElcH@KLMNOPQ>STZVWbYu[\]^_?a=;d?fghijklmno +@seq3 +AGTCGTCGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +7>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno +@seq4 +AGTCGTAGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +;>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno diff -r 2fb0a64c6aaa -r b25297e88f96 test-data/fastq_gen_conv_in2.fastq --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fastq_gen_conv_in2.fastq Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,24 @@ +@seq1 +AAAGGTTTCTCTTTTGGAAATATCTAAATCCC ++ +!"#$%&\'()*+,-./0123456789:;<=>. 
+@seq2 +GGGTCTCCCAGAATGATTAGAGCCGTATAGGA ++ +?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\] +@seq3 +GCGGTTCAATACGATTACCACCATGATAAATA ++ +?Aa.1ghB2K!#lk(02GY[[II])Kwl+,5M +@seq4 +AGTCTTTTCCTCTAAAATAACATAGGATACTA ++ +ghY)N375Nh.,Ol>==/<:2#i&d%#KdNII +@seq5 +GAGGACTCATGGTAGGTATTTTACATGACATT ++ +IIgy%hf6#394bd&hNMWL$OPB63II*,+- +@seq6 +GGCCTACATTCATTTACGAGACTAATTAGGGA ++ +IIIIIgd6#5%jKO&.,D+s3aW=cdGB#a1$ \ No newline at end of file diff -r 2fb0a64c6aaa -r b25297e88f96 test-data/fastq_gen_conv_out1.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fastq_gen_conv_out1.fastqsanger Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,12 @@ +@seq1 +AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +"#$%%''()+,-./0123456789:;<=>?@#BCDEFGHIJKLMNOP +@seq2 +AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +"#$%%'MD)$,-./012#45;78C:V<=>?@#B""E#GHIJKLMNOP +@seq4 +AGTCGTAGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT ++ +"#$%%''()+,-./0123456789:;<=>?@#BCDEFGHIJKLMNOP diff -r 2fb0a64c6aaa -r b25297e88f96 test-data/fastq_gen_conv_out2.fastqsanger --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/fastq_gen_conv_out2.fastqsanger Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,12 @@ +@seq1 +AAAGGTTTCTCTTTTGGAAATATCTAAATCCC ++ +!"#$%&\'()*+,-./0123456789:;<=>. 
+@seq2 +GGGTCTCCCAGAATGATTAGAGCCGTATAGGA ++ +?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\] +@seq3 +GCGGTTCAATACGATTACCACCATGATAAATA ++ +?Aa.1ghB2K!#lk(02GY[[II])Kwl+,5M diff -r 2fb0a64c6aaa -r b25297e88f96 tool_conf.xml.sample --- a/tool_conf.xml.sample Tue Oct 06 16:55:47 2009 -0400 +++ b/tool_conf.xml.sample Tue Oct 06 21:25:01 2009 -0400 @@ -75,6 +75,7 @@ <tool file="next_gen_conversion/solid_to_fastq.xml" /> <tool file="next_gen_conversion/fastq_conversions.xml" /> <tool file="fastx_toolkit/fastq_quality_converter.xml" /> + <tool file="next_gen_conversion/fastq_gen_conv.xml" /> </section> <section name="Extract Features" id="features"> <tool file="filters/ucsc_gene_bed_to_exon_bed.xml" /> diff -r 2fb0a64c6aaa -r b25297e88f96 tools/next_gen_conversion/fastq_gen_conv.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/next_gen_conversion/fastq_gen_conv.py Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,169 @@ +""" +Converts any type of FASTQ file to Sanger type and makes small adjustments if necessary. 
+ +usage: %prog [options] + -i, --input=i: Input FASTQ candidate file + -r, --origType=r: Original type + -a, --allOrNot=a: Whether or not to check all blocks + -b, --blocks=b: Number of blocks to check + -o, --output=o: Output file + +usage: %prog input_file oroutput_file +""" + +import math, sys +from galaxy import eggs +import pkg_resources; pkg_resources.require( "bx-python" ) +from bx.cookbook import doc_optparse + +def stop_err( msg ): + sys.stderr.write( "%s\n" % msg ) + sys.exit() + +def all_bases_valid(seq): + """Confirm that the sequence contains only bases""" + valid_bases = ['a', 'A', 'c', 'C', 'g', 'G', 't', 'T', 'N'] + for base in seq: + if base not in valid_bases: + return False + return True + +def __main__(): + #Parse Command Line + options, args = doc_optparse.parse( __doc__ ) + orig_type = options.origType + if orig_type == 'sanger' and options.allOrNot == 'not': + max_blocks = int(options.blocks) + else: + max_blocks = -1 + fin = file(options.input, 'r') + fout = file(options.output, 'w') + range_min = 1000 + range_max = -5 + block_num = 0 + bad_blocks = 0 + base_len = -1 + line_count = 0 + lines = [] + line = fin.readline() + while line: + if max_blocks >= 0 and block_num > 0 and orig_type == 'sanger' and max_blocks < block_num: + print 'break' + break + if line.strip(): + # the line that starts of a block, with a name + if line_count % 4 == 0 and line.startswith('@'): + lines.append(line) + block_num += 1 + else: + # if we expect a sequence of bases + if line_count % 4 == 1 and all_bases_valid(line.strip()): + lines.append(line) + base_len = len(line.strip()) + # if we expect the second name line + elif line_count % 4 == 2 and line.startswith('+'): + lines.append(line) + # if we expect a sequence of qualities and it's the expected length + elif line_count % 4 == 3: + split_line = line.strip().split() + # decimal qualities + if len(split_line) == base_len: + # convert + phred_list = [] + for ch in split_line: + int_ch = int(ch) + if int_ch < 
range_min: + range_min = int_ch + if int_ch > range_max: + range_max = int_ch + if int_ch >= 0 and int_ch <= 93: + phred_list.append(chr(int_ch + 33)) + # make sure we haven't lost any quality values + if len(phred_list) == base_len: + # print first three lines + for l in lines: + fout.write(l) + # print converted quality line + fout.write(''.join(phred_list)) + # reset + lines = [] + base_len = -1 + # abort if so + else: + bad_blocks += 1 + lines = [] + base_len = -1 + # ascii qualities + elif len(split_line[0]) == base_len: + qualities = [] + # print converted quality line + if orig_type == 'illumina': + for c in line.strip(): + if ord(c) - 64 < range_min: + range_min = ord(c) - 64 + if ord(c) - 64 > range_max: + range_max = ord(c) - 64 + if ord(c) < 64 or ord(c) > 126: + bad_blocks += 1 + base_len = -1 + lines = [] + break + else: + qualities.append( chr( ord(c) - 31 ) ) + quals = ''.join(qualities) + elif orig_type == 'solexa': + for c in line.strip(): + if ord(c) - 64 < range_min: + range_min = ord(c) - 64 + if ord(c) - 64 > range_max: + range_max = ord(c) - 64 + if ord(c) < 59 or ord(c) > 126: + bad_blocks += 1 + base_len = -1 + lines = [] + break + else: + p = 10.0**( ( ord(c) - 64 ) / -10.0 ) / ( 1 + 10.0**( ( ord(c) - 64 ) / -10.0 ) ) + qualities.append( chr( int( -10.0*math.log10( p ) ) + 33 ) ) + quals = ''.join(qualities) + else: # 'sanger' + for c in line.strip(): + if ord(c) - 33 < range_min: + range_min = ord(c) - 33 + if ord(c) - 33 > range_max: + range_max = ord(c) - 33 + if ord(c) < 33 or ord(c) > 126: + bad_blocks += 1 + base_len = -1 + lines = [] + break + else: + qualities.append(c) + quals = ''.join(qualities) + # make sure we don't have bad qualities + if len(quals) == base_len: + # print first three lines + for l in lines: + fout.write(l) + # print out quality line + fout.write(quals+'\n') + # reset + lines = [] + base_len = -1 + else: + bad_blocks += 1 + base_len = -1 + lines = [] + line_count += 1 + line = fin.readline() + fout.close() + 
fin.close() + if range_min != 1000 and range_min != -5: + outmsg = 'The range of quality values found were: %s to %s' % (range_min, range_max) + else: + outmsg = '' + if bad_blocks > 0: + outmsg += '\nThere were %s bad blocks skipped' % (bad_blocks) + sys.stdout.write(outmsg) + +if __name__=="__main__": __main__() \ No newline at end of file diff -r 2fb0a64c6aaa -r b25297e88f96 tools/next_gen_conversion/fastq_gen_conv.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/next_gen_conversion/fastq_gen_conv.xml Tue Oct 06 21:25:01 2009 -0400 @@ -0,0 +1,100 @@ +<tool id="fastq_gen_conv" name="FASTQ Groomer" version="1.0.0"> + <description>converts any type of FASTQ file to Sanger type and validates data</description> + <command interpreter="python"> + fastq_gen_conv.py + --input=$input + --origType=$origTypeChoice.origType + #if $origTypeChoice.origType == "sanger": + --allOrNot=$origTypeChoice.howManyBlocks.allOrNot + #if $origTypeChoice.howManyBlocks.allOrNot == "not": + --blocks=$origTypeChoice.howManyBlocks.blocks + #else: + --blocks="None" + #end if + #else: + --allOrNot="None" + --blocks="None" + #end if + --output=$output + </command> + <inputs> + <param name="input" type="data" format="fastq" label="FASTQ file to check:" /> + <conditional name="origTypeChoice"> + <param name="origType" type="select" label="What type of FASTQ do you think this is?"> + <option value="solexa">Solexa</option> + <option value="illumina">Illumina</option> + <option value="sanger">Sanger</option> + </param> + <when value="solexa" /> + <when value="illumina" /> + <when value="sanger"> + <conditional name="howManyBlocks"> + <param name="allOrNot" type="select" label="Do you want to do a subset of lines, or do the whole file?"> + <option value="all">Check all</option> + <option value="not">Select blocks</option> + </param> + <when value="all" /> + <when value="not"> + <param name="blocks" type="integer" value="1000" label="How many blocks (four lines each) do you want to do?" 
/> + </when> + </conditional> + </when> + </conditional> + </inputs> + <outputs> + <data name="output" format="fastqsanger"/> + </outputs> + <tests> + <test> + <param name="input" value="fastq_gen_conv_in1.fastq" ftype="fastq" /> + <param name="origType" value="solexa" /> + <output name="output" format="fastqsanger" file="fastq_gen_conv_out1.fastqsanger" /> + </test> + <test> + <param name="input" value="fastq_gen_conv_in2.fastq" ftype="fastq" /> + <param name="origType" value="sanger" /> + <param name="allOrNot" value="not" /> + <param name="blocks" value="3" /> + <output name="output" format="fastqsanger" file="fastq_gen_conv_out2.fastqsanger" /> + </test> + </tests> + <help> + +**What it does** + +This tool takes a FASTQ file (Solexa or Illumina) and converts it to Sanger format. It only converts valid blocks. It also can confirm the validity of Sanger FASTQ. + +----- + +**Example** + +- Converting the following Solexa FASTQ file:: + + @seq1 + AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT + + + ;>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno + @seq2 + AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT + + + ;>@BCElcH@KLMNOPQ>STZVWbYu[\]^_?a=;d?fghijklmno + @seq3 + AGTCGTCGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT + + + 7>@BCEFGHJKLMNOPQRSTUVWXYZ[\]^_?abcdefghijklmno + +- will produce the following Sanger FASTQ data:: + + @seq1 + AGTCGTGGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT + + + "#$%%''()+,-./0123456789:;<=>?@#BCDEFGHIJKLMNOP + @seq2 + AGTCGTTGTCATCGTGACTAGTCGATCTAGCTAGCTCTCTAGAGTGT + + + "#$%%'MD)$,-./012#45;78C:V%lt;=>?@#B""E#GHIJKLMNOP + +- Note that seq3 was not converted, because it contained an invalid Solexa quality value (7). + + </help> +</tool>
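The quality-score arithmetic in the fastq_gen_conv.py diff above can be sketched in isolation as follows. This is a minimal illustration, not the committed script: the function names (`solexa_char_to_sanger`, `illumina_char_to_sanger`, `convert_quality_line`) are hypothetical, and it is written for Python 3, whereas the commit targets the Python 2 of its time. The offsets and valid character ranges (59-126 for Solexa, 64-126 for Illumina, 33-126 for Sanger) are taken from the diff.

```python
import math

# Lowest valid ASCII code per input type, as checked in the diff above.
# The upper bound is 126 in all three cases.
MIN_ORD = {'solexa': 59, 'illumina': 64, 'sanger': 33}


def solexa_char_to_sanger(c):
    """Convert one Solexa quality character to a Sanger (Phred+33) character."""
    q_solexa = ord(c) - 64
    # Solexa scores are log-odds, so recover the error probability first.
    odds = 10.0 ** (q_solexa / -10.0)
    p = odds / (1 + odds)
    # int() truncates, matching the expression used in the diff.
    return chr(int(-10.0 * math.log10(p)) + 33)


def illumina_char_to_sanger(c):
    """Illumina is Phred+64 and Sanger is Phred+33, so shift down by 31."""
    return chr(ord(c) - 31)


def convert_quality_line(line, orig_type):
    """Return the Sanger-encoded quality string, or None for a bad block."""
    converted = []
    for c in line.strip():
        if ord(c) < MIN_ORD[orig_type] or ord(c) > 126:
            return None  # one invalid character invalidates the whole block
        if orig_type == 'solexa':
            converted.append(solexa_char_to_sanger(c))
        elif orig_type == 'illumina':
            converted.append(illumina_char_to_sanger(c))
        else:  # 'sanger' input is only validated, not changed
            converted.append(c)
    return ''.join(converted)


if __name__ == '__main__':
    # First seven quality characters of seq1 from the tool's help text.
    print(convert_quality_line(';>@BCEF', 'solexa'))
    # seq3 starts with '7' (ASCII 55), below the Solexa minimum of 59.
    print(convert_quality_line('7>@BCEF', 'solexa'))
```

Returning None for a bad block mirrors how the script skips the whole four-line block rather than emitting a partially converted quality line.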

[hg] galaxy 2838: Add Arabidopsis integration at TAIR and UCLA p...
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/9a75d2428e21 changeset: 2838:9a75d2428e21 user: Greg Von Kuster <greg(a)bx.psu.edu> date: Wed Oct 07 11:12:39 2009 -0400 description: Add Arabidopsis integration at TAIR and UCLA patch provided by Brad Chapman ( handles # 135 ). 6 file(s) affected in this change: lib/galaxy/config.py tool-data/shared/gbrowse/gbrowse_build_sites.txt tool-data/shared/ucsc/builds.txt tool-data/shared/ucsc/manual_builds.txt tool-data/shared/ucsc/ucsc_build_sites.txt universe_wsgi.ini.sample diffs (66 lines): diff -r cb842f737d46 -r 9a75d2428e21 lib/galaxy/config.py --- a/lib/galaxy/config.py Tue Oct 06 23:47:08 2009 -0400 +++ b/lib/galaxy/config.py Wed Oct 07 11:12:39 2009 -0400 @@ -73,8 +73,8 @@ self.use_memdump = string_as_bool( kwargs.get( 'use_memdump', 'False' ) ) self.log_memory_usage = string_as_bool( kwargs.get( 'log_memory_usage', 'False' ) ) self.log_events = string_as_bool( kwargs.get( 'log_events', 'False' ) ) - self.ucsc_display_sites = kwargs.get( 'ucsc_display_sites', "main,test,archaea" ).lower().split(",") - self.gbrowse_display_sites = kwargs.get( 'gbrowse_display_sites', "main,test" ).lower().split(",") + self.ucsc_display_sites = kwargs.get( 'ucsc_display_sites', "main,test,archaea,ucla" ).lower().split(",") + self.gbrowse_display_sites = kwargs.get( 'gbrowse_display_sites', "main,test,tair" ).lower().split(",") self.genetrack_display_sites = kwargs.get( 'genetrack_display_sites', "main,test" ).lower().split(",") self.brand = kwargs.get( 'brand', None ) self.wiki_url = kwargs.get( 'wiki_url', 'http://g2.trac.bx.psu.edu/' ) diff -r cb842f737d46 -r 9a75d2428e21 tool-data/shared/gbrowse/gbrowse_build_sites.txt --- a/tool-data/shared/gbrowse/gbrowse_build_sites.txt Tue Oct 06 23:47:08 2009 -0400 +++ b/tool-data/shared/gbrowse/gbrowse_build_sites.txt Wed Oct 07 11:12:39 2009 -0400 @@ -1,3 +1,4 @@ # wormbase sites / supported genomes -main http://www.wormbase.org/db/seq/gbgff/c_elegans/ 
c_elegans,c_briggsae,c_remanei,c_brenneri,c_japonica,p_pristionchus,b_malayi -test http://dev.wormbase.org/db/seq/gbrowse/c_elegans/ c_elegans,c_briggsae,c_remanei,c_brenneri,c_japonica,p_pristionchus,b_malayi +main http://www.wormbase.org/db/seq/gbgff/c_elegans/ c_elegans,c_briggsae,c_remanei,c_brenneri,c_japonica,p_pristionchus,b_malayi +test http://dev.wormbase.org/db/seq/gbrowse/c_elegans/ c_elegans,c_briggsae,c_remanei,c_brenneri,c_japonica,p_pristionchus,b_malayi +tair http://arabidopsis.org/cgi-bin/gbrowse/ arabidopsis_tair8,arabidopsis diff -r cb842f737d46 -r 9a75d2428e21 tool-data/shared/ucsc/builds.txt --- a/tool-data/shared/ucsc/builds.txt Tue Oct 06 23:47:08 2009 -0400 +++ b/tool-data/shared/ucsc/builds.txt Wed Oct 07 11:12:39 2009 -0400 @@ -786,3 +786,6 @@ aeroHydr_ATCC7966 Aeromonas hydrophila subsp. hydrophila ATCC 7966 (aeroHydr_ATCC7966) baciAnth_AMES Bacillus anthracis str. Ames (baciAnth_AMES) shewOnei Shewanella oneidensis MR-1 (shewOnei) +arabidopsis Arabidopsis thaliana TAIR9 +arabidopsis_tair8 Arabidopsis thaliana TAIR8 +araTha1 Arabidopsis thaliana TAIR7 diff -r cb842f737d46 -r 9a75d2428e21 tool-data/shared/ucsc/manual_builds.txt --- a/tool-data/shared/ucsc/manual_builds.txt Tue Oct 06 23:47:08 2009 -0400 +++ b/tool-data/shared/ucsc/manual_builds.txt Wed Oct 07 11:12:39 2009 -0400 @@ -665,3 +665,6 @@ shewOnei Shewanella oneidensis MR-1 plasmid_pMR-1=161613,chr=4969803 15217 Human herpesvirus 1 NC_001806=152261 lMaj5 Leishmania major 2005 chr1=268984,chr2=355714,chr3=384518,chr4=441313,chr5=465823,chr6=516874,chr7=596348,chr8=574972,chr9=573441,chr10=570864,chr11=582575,chr12=675347,chr13=654604,chr14=622648,chr15=629514,chr16=714659,chr17=684831,chr18=739751,chr19=702212,chr20=742551,chr21=772974,chr22=716608,chr23=772567,chr24=840950,chr25=912849,chr26=1091579,chr27=1130447,chr28=1160128,chr29=1212674,chr30=1403454,chr31=1484336,chr32=1604650,chr33=1583673,chr34=1866754,chr35=2090491,chr36=2682183 +arabidopsis Arabidopsis thaliana TAIR9 
+arabidopsis_tair8 Arabidopsis thaliana TAIR8 +araTha1 Arabidopsis thaliana TAIR7 diff -r cb842f737d46 -r 9a75d2428e21 tool-data/shared/ucsc/ucsc_build_sites.txt --- a/tool-data/shared/ucsc/ucsc_build_sites.txt Tue Oct 06 23:47:08 2009 -0400 +++ b/tool-data/shared/ucsc/ucsc_build_sites.txt Wed Oct 07 11:12:39 2009 -0400 @@ -4,3 +4,4 @@ archaea http://archaea.ucsc.edu/cgi-bin/hgTracks? alkaEhrl_MLHE_1,shewW318,idioLoih_L2TR,sulSol1,erwiCaro_ATROSEPTICA,symbTher_IAM14863,moorTher_ATCC39073,therFusc_YX,methHung1,bradJapo,therElon,shewPutrCN32,pediPent_ATCC25745,mariMari_MCS10,nanEqu1,baciSubt,chlaTrac,magnMagn_AMB_1,chroViol,ralsSola,acidCryp_JF_5,erytLito_HTCC2594,desuVulg_HILDENBOROUG,pyrAer1,sulfToko1,shewANA3,paraSp_UWE25,geobKaus_HTA426,rhizEtli_CFN_42,uncuMeth_RCI,candBloc_FLORIDANUS,deinRadi,yersPest_CO92,saccEryt_NRRL_2338,rhodRHA1,candCars_RUDDII,burkMall_ATCC23344,eschColi_O157H7,burk383,psycIngr_37,rhodSpha_2_4_1,wolbEndo_OF_DROSOPHIL,burkViet_G4,propAcne_KPA171202,enteFaec_V583,campJeju_81_176,acidJS42,heliPylo_26695,pseuHalo_TAC125,chroSale_DSM3043,methVann1,archFulg1,neisMeni_Z2491_1,fusoNucl,vermEise_EF01_2,anabVari_ATCC29413,tropWhip_TW08_27,heliHepa,acinSp_ADP1,anapMarg_ST_MARIES,natrPhar1,haheChej_KCTC_2396,therPetr_RKU_1,neisGono_FA1090_1,colwPsyc_34H,desuPsyc_LSV54,hyphNept_ATCC15444,vibrC 
hol1,deinGeot_DSM11300,strePyog_M1_GAS,franCcI3,salmTyph,metaSedu,lactSali_UCC118,trepPall,neisMeni_MC58_1,syntWolf_GOETTINGEN,flavJohn_UW101,methBoon1,haemSomn_129PT,shewLoihPV4,igniHosp1,haemInfl_KW20,haloHalo_SL1,ferrAcid1,sphiAlas_RB2256,candPela_UBIQUE_HTCC1,caldSacc_DSM8903,aerPer1,lactPlan,carbHydr_Z_2901,therTher_HB8,vibrVuln_YJ016_1,rhodPalu_CGA009,acidCell_11B,siliPome_DSS_3,therVolc1,haloWals1,rubrXyla_DSM9941,shewAmaz,nocaJS61,vibrVuln_CMCP6_1,sinoMeli,ureaUrea,baciHalo,bartHens_HOUSTON_1,nitrWino_NB_255,hypeButy1,methBurt2,polaJS66,mesoLoti,methMari_C7,caulCres,neisMeni_FAM18_1,acidBact_ELLIN345,caldMaqu1,salmEnte_PARATYPI_ATC,glucOxyd_621H,cytoHutc_ATCC33406,nitrEuro,therMari,coxiBurn,woliSucc,heliPylo_HPAG1,mesoFlor_L1,pyrHor1,methAeol1,procMari_CCMP1375,pyroArse1,oenoOeni_PSU_1,alcaBork_SK2,wiggBrev,actiPleu_L20,lactLact,methJann1,paraDeni_PD1222,borrBurg,pyroIsla1,orieTsut_BORYONG,shewMR4,methKand1,methCaps_BATH,onioYell_PHYTOPLASMA,bordBron,cenaSymb1,burkCe no_HI2424,franTula_TULARENSIS,pyrFur2,mariAqua_VT8,heliPylo_J99,psycArct_273_4,vibrChol_MO10_1,vibrPara1,rickBell_RML369_C,metAce1,buchSp,ehrlRumi_WELGEVONDEN,methLabrZ_1,chlaPneu_CWL029,thioCrun_XCL_2,pyroCali1,chloTepi_TLS,stapAure_MU50,novoArom_DSM12444,magnMC1,zymoMobi_ZM4,salmTyph_TY2,chloChlo_CAD3,azoaSp_EBN1,therTher_HB27,bifiLong,picrTorr1,listInno,bdelBact,gramFors_KT0803,sulfAcid1,geobTher_NG80_2,peloCarb,ralsEutr_JMP134,mannSucc_MBEL55E,syneSp_WH8102,methTherPT1,clavMich_NCPPB_382,therAcid1,syntAcid_SB,porpGing_W83,therNeut0,leifXyli_XYLI_CTCB0,shewFrig,photProf_SS9,thioDeni_ATCC25259,methMaze1,desuRedu_MI_1,burkThai_E264,campFetu_82_40,blocFlor,jannCCS1,nitrMult_ATCC25196,streCoel,soliUsit_ELLIN6076,pastMult,saliRube_DSM13855,methTher1,nostSp,shigFlex_2A,saccDegr_2_40,oceaIhey,dehaEthe_195,rhodRubr_ATCC11170,arthFB24,shewMR7,pireSp,anaeDeha_2CP_C,haloVolc1,dichNodo_VCS1703A,tricEryt_IMS101,mycoGeni,thioDeni_ATCC33889,methSmit1,geobUran_RF4,shewDeni,halMar1,desuHa 
fn_Y51,methStad1,granBeth_CGDNIH1,therPend1,legiPneu_PHILADELPHIA,vibrChol_O395_1,nitrOcea_ATCC19707,campJeju_RM1221,methPetr_PM1,heliAcin_SHEEBA,eschColi_APEC_O1,peloTher_SI,haloHalo1,syntFuma_MPOB,xyleFast,gloeViol,leucMese_ATCC8293,bactThet_VPI_5482,xantCamp,sodaGlos_MORSITANS,geobSulf,roseDeni_OCH_114,coryEffi_YS_314,brucMeli,mycoTube_H37RV,vibrFisc_ES114_1,pyrAby1,burkXeno_LB400,polyQLWP,stapMari1,peloLute_DSM273,burkCeno_AU_1054,shewBalt,nocaFarc_IFM10152,ente638,mculMari1,saliTrop_CNB_440,neorSenn_MIYAYAMA,aquiAeol,dechArom_RCB,myxoXant_DK_1622,burkPseu_1106A,burkCepa_AMMD,methMari_C5_1,azorCaul2,methFlag_KT,leptInte,eschColi_K12,synePCC6,baumCica_HOMALODISCA,methBark1,pseuAeru,geobMeta_GS15,eschColi_CFT073,photLumi,metMar1,hermArse,campJeju,therKoda1,aeroHydr_ATCC7966,baciAnth_AMES,shewOnei,therTeng,lawsIntr_PHE_MN1_00 #Harvested from http://genome-test.cse.ucsc.edu/cgi-bin/das/dsn test http://genome-test.cse.ucsc.edu/cgi-bin/hgTracks? anoCar1,ce4,ce3,ce2,ce1,loxAfr1,rn2,eschColi_O157H7_1,rn4,droYak1,heliPylo_J99_1,droYak2,dp3,dp2,caeRem2,caeRem1,oryLat1,eschColi_K12_1,homIni13,homIni14,droAna1,droAna2,oryCun1,sacCer1,heliHepa1,droGri1,sc1,dasNov1,choHof1,tupBel1,mm9,mm8,vibrChol1,mm5,mm4,mm7,mm6,mm3,mm2,rn3,venter1,galGal3,galGal2,ornAna1,equCab1,cioSav2,rheMac2,eutHer13,droPer1,droVir2,droVir1,heliPylo_26695_1,euaGli13,calJac1,campJeju1,droSim1,hg13,hg15,hg16,hg17,monDom1,monDom4,droMoj1,petMar1,droMoj2,vibrChol_MO10_1,vibrPara1,gliRes13,vibrVuln_YJ016_1,braFlo1,cioSav1,lauRas13,dm1,canFam1,canFam2,ci1,echTel1,ci2,caePb1,dm3,ponAbe2,falciparum,xenTro1,xenTro2,nonAfr13,fr2,fr1,gasAcu1,dm2,apiMel1,apiMel2,eschColi_O157H7EDL933_1,priPac1,panTro1,hg18,panTro2,campJeju_RM1221_1,canHg12,vibrChol_O395_1,vibrFisc_ES114_1,danRer5,danRer4,danRer3,danRer2,danRer1,tetNig1,afrOth13,bosTau1,eschColi_CFT073_1,bosTau3,bosTau2,bosTau4,rodEnt13,droEre1,priMat13,vibrVu ln_CMCP6_1,cb2,cb3,cb1,borEut13,droSec1,felCat3,strPur1,strPur2,otoGar1,catArr1,anoGam1,triCas2 +ucla 
http://epigenomics.mcdb.ucla.edu/cgi-bin/hgTracks? araTha1 diff -r cb842f737d46 -r 9a75d2428e21 universe_wsgi.ini.sample --- a/universe_wsgi.ini.sample Tue Oct 06 23:47:08 2009 -0400 +++ b/universe_wsgi.ini.sample Wed Oct 07 11:12:39 2009 -0400 @@ -91,8 +91,8 @@ use_new_layout = true # Comma separated list of UCSC / gbrowse / GeneTrack browsers to use for viewing -ucsc_display_sites = main,test,archaea -gbrowse_display_sites = main,test +ucsc_display_sites = main,test,archaea,ucla +gbrowse_display_sites = main,test,tair genetrack_display_sites = main,test # Serving static files (needed if running standalone)
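As a rough illustration of how the `*_display_sites` settings in this changeset are turned into lists, the sketch below reproduces the `kwargs.get(...).lower().split(",")` expression from lib/galaxy/config.py. The helper name `parse_display_sites` is hypothetical and exists only for this example; the defaults are the new values from the diff.

```python
def parse_display_sites(kwargs, key, default):
    """Lower-case a comma-separated setting and split it into site names."""
    return kwargs.get(key, default).lower().split(",")


if __name__ == '__main__':
    # No override in the ini file: the new defaults from the diff apply.
    print(parse_display_sites({}, 'ucsc_display_sites', "main,test,archaea,ucla"))
    print(parse_display_sites({}, 'gbrowse_display_sites', "main,test,tair"))

    # An explicit setting replaces the default entirely.
    ini = {'ucsc_display_sites': 'Main,UCLA'}
    print(parse_display_sites(ini, 'ucsc_display_sites', "main,test,archaea,ucla"))
```

Because the value is split on a bare comma without stripping, an entry written as `main, test` would yield `' test'` with a leading space, so the ini setting is best written without spaces, as in universe_wsgi.ini.sample.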

[hg] galaxy 2835: Fixing broken tool configs
by Greg Von Kuster 09 Oct '09

details: http://www.bx.psu.edu/hg/galaxy/rev/2fb0a64c6aaa changeset: 2835:2fb0a64c6aaa user: guru date: Tue Oct 06 16:55:47 2009 -0400 description: Fixing broken tool configs 3 file(s) affected in this change: tools/fastx_toolkit/fastx_reverse_complement.xml tools/fastx_toolkit/fastx_trimmer.xml tools/filters/trimmer.xml diffs (35 lines): diff -r b14f99a4f736 -r 2fb0a64c6aaa tools/fastx_toolkit/fastx_reverse_complement.xml --- a/tools/fastx_toolkit/fastx_reverse_complement.xml Tue Oct 06 15:00:32 2009 -0400 +++ b/tools/fastx_toolkit/fastx_reverse_complement.xml Tue Oct 06 16:55:47 2009 -0400 @@ -47,6 +47,7 @@ TACCNNCTTTGAATTACAAGGANGAGGCTACAGACA +CSHL_1_FC42AGWWWXX:8:1:3:740 26 27 17 15 5 5 24 26 29 31 32 33 27 21 27 33 33 33 33 33 33 27 5 27 33 33 33 33 33 33 33 33 34 33 33 33 + ------ This tool is based on `FASTX-toolkit`__ by Assaf Gordon. diff -r b14f99a4f736 -r 2fb0a64c6aaa tools/fastx_toolkit/fastx_trimmer.xml --- a/tools/fastx_toolkit/fastx_trimmer.xml Tue Oct 06 15:00:32 2009 -0400 +++ b/tools/fastx_toolkit/fastx_trimmer.xml Tue Oct 06 16:55:47 2009 -0400 @@ -3,7 +3,7 @@ <command>zcat -f '$input' | fastx_trimmer -v -f $first -l $last -o $output</command> <inputs> - <param format="fasta,fastasanger" name="input" type="data" label="Library to clip" /> + <param format="fasta,fastqsanger" name="input" type="data" label="Library to clip" /> <param name="first" size="4" type="integer" value="1"> <label>First base to keep</label> diff -r b14f99a4f736 -r 2fb0a64c6aaa tools/filters/trimmer.xml --- a/tools/filters/trimmer.xml Tue Oct 06 15:00:32 2009 -0400 +++ b/tools/filters/trimmer.xml Tue Oct 06 16:55:47 2009 -0400 @@ -4,7 +4,7 @@ trimmer.py -a -f $input1 -c $col -s $start -e $end -i $ignore $fastq > $out_file1 </command> <inputs> - <param format="tabular,text" name="input1" type="data" label="this dataset"/> + <param format="tabular,txt" name="input1" type="data" label="this dataset"/> <param name="col" type="integer" value="0" label="Trim this column only" 
help="0 = process entire line" /> <param name="start" type="integer" size="10" value="1" label="Trim from the beginning to this position" help="1 = do not trim the beginning"/> <param name="end" type="integer" size="10" value="0" label="Remove everything from this position to the end" help="0 = do not trim the end"/>
